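1. Classifying artificial intelligence papers on arxiv.org (7)
・Examined word trends in the abstracts of papers registered in 2017 under six representative artificial intelligence categories on arxiv.org
・state-of-the-art is as popular as expected, ranking 79th
・About 13,700 papers were registered over the year, but the word frequency trend resembles that of the December CV-only set, so classifying all six categories in one pass looks infeasible
2. Word frequency trends in the six representative artificial intelligence categories on arxiv.org
The table below lists word counts from the abstracts of papers registered on arxiv.org in 2017 under the six representative categories, collected with a crawler.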
| Rank | Word | Occurrences |
|---|---|---|
| 1 | the | 109,397 |
| 2 | of | 76,443 |
| 3 | and | 60,080 |
| 4 | a | 55,859 |
| 5 | to | 52,235 |
| 6 | in | 35,461 |
| 7 | for | 30,482 |
| 8 | is | 27,326 |
| 9 | that | 23,573 |
| 10 | on | 21,859 |
| 11 | with | 18,629 |
| 12 | we | 18,485 |
| 13 | We | 16,074 |
| 14 | this | 13,751 |
| 15 | are | 13,694 |
| 16 | as | 13,262 |
| 17 | by | 12,773 |
| 18 | from | 11,724 |
| 19 | an | 11,678 |
| 20 | The | 11,369 |
| 21 | In | 9,969 |
| 22 | can | 9,611 |
| 23 | which | 9,476 |
| 24 | learning | 9,077 |
| 25 | be | 8,666 |
| 26 | our | 8,463 |
| 27 | data | 7,641 |
| 28 | model | 7,556 |
| 29 | using | 6,980 |
| 30 | method | 6,102 |
| 31 | show | 5,928 |
| 32 | proposed | 5,778 |
| 33 | neural | 5,497 |
| 34 | based | 5,493 |
| 35 | propose | 5,363 |
| 36 | network | 5,244 |
| 37 | results | 5,093 |
| 38 | approach | 5,061 |
| 39 | This | 4,983 |
| 40 | or | 4,961 |
| 41 | such | 4,829 |
| 42 | it | 4,800 |
| 43 | deep | 4,756 |
| 44 | image | 4,747 |
| 45 | have | 4,735 |
| 46 | has | 4,471 |
| 47 | performance | 4,376 |
| 48 | models | 4,249 |
| 49 | methods | 4,151 |
| 50 | algorithm | 4,072 |
| 51 | new | 4,054 |
| 52 | different | 4,015 |
| 53 | two | 4,013 |
| 54 | training | 3,986 |
| 55 | also | 3,939 |
| 56 | Our | 3,888 |
| 57 | problem | 3,856 |
| 58 | used | 3,827 |
| 59 | these | 3,822 |
| 60 | not | 3,711 |
| 61 | between | 3,584 |
| 62 | at | 3,533 |
| 63 | more | 3,518 |
| 64 | networks | 3,464 |
| 65 | both | 3,427 |
| 66 | paper, | 3,408 |
| 67 | use | 3,237 |
| 68 | A | 3,210 |
| 69 | their | 3,137 |
| 70 | been | 3,128 |
| 71 | paper | 3,105 |
| 72 | novel | 3,014 |
| 73 | information | 3,012 |
| 74 | Learning | 2,967 |
| 75 | over | 2,956 |
| 76 | its | 2,955 |
| 77 | features | 2,926 |
| 78 | each | 2,925 |
| 79 | state-of-the-art | 2,906 |
| 80 | present | 2,893 |
| 81 | than | 2,851 |
| 82 | demonstrate | 2,850 |
| 83 | into | 2,835 |
| 84 | images | 2,764 |
| 85 | number | 2,735 |
| 86 | classification | 2,730 |
| 87 | where | 2,603 |
| 88 | framework | 2,560 |
| 89 | large | 2,540 |
| 90 | algorithms | 2,530 |
| 91 | feature | 2,468 |
| 92 | when | 2,462 |
| 93 | Neural | 2,440 |
| 94 | only | 2,438 |
| 95 | To | 2,407 |
| 96 | other | 2,383 |
| 97 | However, | 2,343 |
| 98 | Deep | 2,327 |
| 99 | one | 2,325 |
| 100 | system | 2,305 |
| 101 | set | 2,294 |
| 102 | but | 2,235 |
| 103 | machine | 2,234 |
| 104 | time | 2,196 |
| 105 | first | 2,129 |
| 106 | analysis | 2,106 |
| 107 | well | 2,088 |
| 108 | detection | 2,077 |
| 109 | existing | 2,075 |
| 110 | accuracy | 2,062 |
| 111 | convolutional | 2,046 |
| 112 | how | 2,039 |
| 113 | many | 2,037 |
| 114 | Networks | 1,983 |
| 115 | human | 1,974 |
| 116 | while | 1,962 |
| 117 | all | 1,953 |
| 118 | task | 1,934 |
| 119 | work | 1,915 |
| 120 | provide | 1,913 |
| 121 | 3D | 1,885 |
| 122 | multiple | 1,847 |
| 123 | learn | 1,843 |
| 124 | dataset | 1,843 |
| 125 | experiments | 1,835 |
| 126 | several | 1,818 |
| 127 | most | 1,817 |
| 128 | data. | 1,816 |
| 129 | object | 1,796 |
| 130 | trained | 1,794 |
| 131 | better | 1,764 |
| 132 | high | 1,725 |
| 133 | recognition | 1,716 |
| 134 | visual | 1,708 |
| 135 | function | 1,691 |
| 136 | datasets | 1,688 |
| 137 | approaches | 1,676 |
| 138 | study | 1,640 |
| 139 | then | 1,624 |
| 140 | optimization | 1,622 |
| 141 | input | 1,601 |
| 142 | some | 1,552 |
| 143 | language | 1,547 |
| 144 | introduce | 1,531 |
| 145 | they | 1,528 |
| 146 | through | 1,523 |
| 147 | representation | 1,519 |
| 148 | without | 1,502 |
| 149 | semantic | 1,480 |
| 150 | via | 1,478 |
| 151 | compared | 1,469 |
| 152 | order | 1,465 |
| 153 | given | 1,464 |
| 154 | real | 1,461 |
| 155 | efficient | 1,430 |
| 156 | segmentation | 1,418 |
| 157 | under | 1,417 |
| 158 | important | 1,412 |
| 159 | tasks | 1,406 |
| 160 | structure | 1,392 |
| 161 | prediction | 1,391 |
| 162 | For | 1,389 |
| 163 | various | 1,370 |
| 164 | improve | 1,357 |
| 165 | any | 1,355 |
| 166 | problems | 1,348 |
| 167 | single | 1,343 |
| 168 | knowledge | 1,343 |
| 169 | linear | 1,340 |
| 170 | very | 1,339 |
| 171 | recent | 1,338 |
| 172 | computational | 1,332 |
| 173 | achieve | 1,315 |
| 174 | systems | 1,302 |
| 175 | three | 1,289 |
| 176 | It | 1,282 |
| 177 | may | 1,277 |
| 178 | outperforms | 1,263 |
| 179 | Network | 1,257 |
| 180 | process | 1,256 |
| 181 | often | 1,256 |
| 182 | local | 1,249 |
| 183 | challenging | 1,248 |
| 184 | techniques | 1,243 |
| 185 | video | 1,237 |
| 186 | simple | 1,236 |
| 187 | standard | 1,221 |
| 188 | including | 1,217 |
| 189 | significantly | 1,213 |
| 190 | same | 1,213 |
| 191 | best | 1,206 |
| 192 | was | 1,200 |
| 193 | complex | 1,194 |
| 194 | optimal | 1,188 |
| 195 | natural | 1,187 |
| 196 | architecture | 1,180 |
| 197 | due | 1,170 |
| 198 | further | 1,169 |
| 199 | about | 1,163 |
| 200 | available | 1,157 |
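A count like the one in this table could be produced along the following lines. This is only a minimal sketch: the post does not show its counting code, so the whitespace split with no lowercasing or punctuation stripping is an assumption, inferred from entries such as "we" / "We" and "paper," / "paper" appearing as separate rows. The `count_words` function and the sample abstracts are hypothetical.

```python
from collections import Counter


def count_words(abstracts):
    """Count whitespace-separated tokens, keeping case and punctuation as-is."""
    counts = Counter()
    for text in abstracts:
        # str.split() with no arguments splits on any run of whitespace.
        counts.update(text.split())
    return counts


# Hypothetical sample abstracts standing in for the crawled 2017 data.
abstracts = [
    "In this paper, we propose a novel method.",
    "We show that the proposed method works on image data.",
]

counts = count_words(abstracts)

# Print the most frequent tokens in the same row layout as the table.
for rank, (word, n) in enumerate(counts.most_common(5), start=1):
    print(f"| {rank} | {word} | {n:,} |")
```

Because nothing is normalized, "method" and "method." are counted separately here, just as "data" and "data." are in the table. Lowercasing and stripping punctuation before counting would merge such rows.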
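state-of-the-art ranks 79th, which supports the view that its high rank last time was not a bias specific to the December registrations in Computer Vision and Pattern Recognition. However, the top-ranked words other than common English words, such as image, images, 3D, video, and convolutional, also resemble the top words from the December Computer Vision and Pattern Recognition registrations. It would be convenient to classify all six fields at once, but it seems better to examine the word frequency trends of each of the six fields carefully and then perform clustering.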
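One possible way to examine each field separately is to take the top words per category after dropping common English words, then measure how much the resulting sets overlap. The sketch below is an illustration under stated assumptions, not the method used in this series: the stopword list, the `top_words` and `jaccard` helpers, the category labels, and the sample abstracts are all hypothetical.

```python
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the post).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "for", "is", "that",
             "on", "with", "we", "this"}


def top_words(abstracts, k):
    """Return the set of the k most frequent non-stopword tokens."""
    counts = Counter()
    for text in abstracts:
        for token in text.split():
            # Normalize so that "Image," and "image" count as the same word.
            word = token.strip(".,;:()").lower()
            if word and word not in STOPWORDS:
                counts[word] += 1
    return {word for word, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-category abstracts; the labels are example arXiv categories.
by_category = {
    "cs.CV": ["A convolutional network for image segmentation."],
    "cs.CL": ["A neural network for language translation."],
}

tops = {cat: top_words(texts, k=10) for cat, texts in by_category.items()}
similarity = jaccard(tops["cs.CV"], tops["cs.CL"])
print(f"cs.CV vs cs.CL top-word overlap: {similarity:.3f}")
```

A low overlap between two categories would suggest their vocabularies are distinct enough to separate, while a high overlap would suggest that raw word frequency alone is a weak signal for telling them apart.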

