1.I want to classify artificial intelligence papers on arxiv.org (7)
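・Examined word-occurrence trends in the abstracts of papers registered in 2017 in six representative AI categories on arxiv.org
・As expected, state-of-the-art is popular and ranks 79th
・About 13,700 papers are registered per year, but their word-occurrence trend resembles that of the December CV (Computer Vision and Pattern Recognition) submissions alone, so classifying all six categories in one pass looks unworkable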
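2.Word-occurrence trends in six representative AI categories on arxiv.org
The table below lists word counts computed from the abstracts of papers registered on arxiv.org in 2017 in six representative categories, which were collected with a crawler.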
Rank | Word | Occurrences |
1 | the | 109,397 |
2 | of | 76,443 |
3 | and | 60,080 |
4 | a | 55,859 |
5 | to | 52,235 |
6 | in | 35,461 |
7 | for | 30,482 |
8 | is | 27,326 |
9 | that | 23,573 |
10 | on | 21,859 |
11 | with | 18,629 |
12 | we | 18,485 |
13 | We | 16,074 |
14 | this | 13,751 |
15 | are | 13,694 |
16 | as | 13,262 |
17 | by | 12,773 |
18 | from | 11,724 |
19 | an | 11,678 |
20 | The | 11,369 |
21 | In | 9,969 |
22 | can | 9,611 |
23 | which | 9,476 |
24 | learning | 9,077 |
25 | be | 8,666 |
26 | our | 8,463 |
27 | data | 7,641 |
28 | model | 7,556 |
29 | using | 6,980 |
30 | method | 6,102 |
31 | show | 5,928 |
32 | proposed | 5,778 |
33 | neural | 5,497 |
34 | based | 5,493 |
35 | propose | 5,363 |
36 | network | 5,244 |
37 | results | 5,093 |
38 | approach | 5,061 |
39 | This | 4,983 |
40 | or | 4,961 |
41 | such | 4,829 |
42 | it | 4,800 |
43 | deep | 4,756 |
44 | image | 4,747 |
45 | have | 4,735 |
46 | has | 4,471 |
47 | performance | 4,376 |
48 | models | 4,249 |
49 | methods | 4,151 |
50 | algorithm | 4,072 |
51 | new | 4,054 |
52 | different | 4,015 |
53 | two | 4,013 |
54 | training | 3,986 |
55 | also | 3,939 |
56 | Our | 3,888 |
57 | problem | 3,856 |
58 | used | 3,827 |
59 | these | 3,822 |
60 | not | 3,711 |
61 | between | 3,584 |
62 | at | 3,533 |
63 | more | 3,518 |
64 | networks | 3,464 |
65 | both | 3,427 |
66 | paper, | 3,408 |
67 | use | 3,237 |
68 | A | 3,210 |
69 | their | 3,137 |
70 | been | 3,128 |
71 | paper | 3,105 |
72 | novel | 3,014 |
73 | information | 3,012 |
74 | Learning | 2,967 |
75 | over | 2,956 |
76 | its | 2,955 |
77 | features | 2,926 |
78 | each | 2,925 |
79 | state-of-the-art | 2,906 |
80 | present | 2,893 |
81 | than | 2,851 |
82 | demonstrate | 2,850 |
83 | into | 2,835 |
84 | images | 2,764 |
85 | number | 2,735 |
86 | classification | 2,730 |
87 | where | 2,603 |
88 | framework | 2,560 |
89 | large | 2,540 |
90 | algorithms | 2,530 |
91 | feature | 2,468 |
92 | when | 2,462 |
93 | Neural | 2,440 |
94 | only | 2,438 |
95 | To | 2,407 |
96 | other | 2,383 |
97 | However, | 2,343 |
98 | Deep | 2,327 |
99 | one | 2,325 |
100 | system | 2,305 |
101 | set | 2,294 |
102 | but | 2,235 |
103 | machine | 2,234 |
104 | time | 2,196 |
105 | first | 2,129 |
106 | analysis | 2,106 |
107 | well | 2,088 |
108 | detection | 2,077 |
109 | existing | 2,075 |
110 | accuracy | 2,062 |
111 | convolutional | 2,046 |
112 | how | 2,039 |
113 | many | 2,037 |
114 | Networks | 1,983 |
115 | human | 1,974 |
116 | while | 1,962 |
117 | all | 1,953 |
118 | task | 1,934 |
119 | work | 1,915 |
120 | provide | 1,913 |
121 | 3D | 1,885 |
122 | multiple | 1,847 |
123 | learn | 1,843 |
124 | dataset | 1,843 |
125 | experiments | 1,835 |
126 | several | 1,818 |
127 | most | 1,817 |
128 | data. | 1,816 |
129 | object | 1,796 |
130 | trained | 1,794 |
131 | better | 1,764 |
132 | high | 1,725 |
133 | recognition | 1,716 |
134 | visual | 1,708 |
135 | function | 1,691 |
136 | datasets | 1,688 |
137 | approaches | 1,676 |
138 | study | 1,640 |
139 | then | 1,624 |
140 | optimization | 1,622 |
141 | input | 1,601 |
142 | some | 1,552 |
143 | language | 1,547 |
144 | introduce | 1,531 |
145 | they | 1,528 |
146 | through | 1,523 |
147 | representation | 1,519 |
148 | without | 1,502 |
149 | semantic | 1,480 |
150 | via | 1,478 |
151 | compared | 1,469 |
152 | order | 1,465 |
153 | given | 1,464 |
154 | real | 1,461 |
155 | efficient | 1,430 |
156 | segmentation | 1,418 |
157 | under | 1,417 |
158 | important | 1,412 |
159 | tasks | 1,406 |
160 | structure | 1,392 |
161 | prediction | 1,391 |
162 | For | 1,389 |
163 | various | 1,370 |
164 | improve | 1,357 |
165 | any | 1,355 |
166 | problems | 1,348 |
167 | single | 1,343 |
168 | knowledge | 1,343 |
169 | linear | 1,340 |
170 | very | 1,339 |
171 | recent | 1,338 |
172 | computational | 1,332 |
173 | achieve | 1,315 |
174 | systems | 1,302 |
175 | three | 1,289 |
176 | It | 1,282 |
177 | may | 1,277 |
178 | outperforms | 1,263 |
179 | Network | 1,257 |
180 | process | 1,256 |
181 | often | 1,256 |
182 | local | 1,249 |
183 | challenging | 1,248 |
184 | techniques | 1,243 |
185 | video | 1,237 |
186 | simple | 1,236 |
187 | standard | 1,221 |
188 | including | 1,217 |
189 | significantly | 1,213 |
190 | same | 1,213 |
191 | best | 1,206 |
192 | was | 1,200 |
193 | complex | 1,194 |
194 | optimal | 1,188 |
195 | natural | 1,187 |
196 | architecture | 1,180 |
197 | due | 1,170 |
198 | further | 1,169 |
199 | about | 1,163 |
200 | available | 1,157 |
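The post does not show the counting code. Judging by the table itself, tokens appear to have been split on whitespace only, without lowercasing or stripping punctuation: "We" and "we" are ranked separately (12th and 13th), and "paper," and "data." keep their trailing punctuation. The following is a minimal sketch under that assumption. The names count_words and format_ranking and the normalize option are hypothetical, not the author's code, and the crawler is out of scope: abstracts are assumed to already be a list of strings.

```python
from collections import Counter
from typing import Iterable, List


def count_words(abstracts: Iterable[str], normalize: bool = False) -> Counter:
    """Count word occurrences across abstracts.

    With normalize=False, tokens are split on whitespace only, so "We"/"we"
    and "paper"/"paper," are counted separately (assumed to match the table).
    """
    counts: Counter = Counter()
    for text in abstracts:
        for token in text.split():
            if normalize:
                # Hypothetical refinement: lowercase and strip surrounding punctuation.
                token = token.lower().strip(".,;:()[]\"'")
                if not token:
                    continue
            counts[token] += 1
    return counts


def format_ranking(counts: Counter, top_n: int = 200) -> List[str]:
    """Render the top_n words as 'rank | word | count |' rows like the table."""
    return [
        f"{rank} | {word} | {n:,} |"
        for rank, (word, n) in enumerate(counts.most_common(top_n), start=1)
    ]


if __name__ == "__main__":
    sample = [
        "We propose a novel method. We show state-of-the-art results.",
        "In this paper, we study the data.",
    ]
    for row in format_ranking(count_words(sample), top_n=5):
        print(row)
```

Passing normalize=True would merge variants such as "We"/"we" and "paper,"/"paper", which would likely change the ranking somewhat; hyphenated terms like state-of-the-art survive either way because only leading and trailing punctuation is stripped.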
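state-of-the-art ranks 79th, which confirms that its high rank last time was not merely a bias of the December Computer Vision and Pattern Recognition submissions. However, the top words other than common English words, such as image, images, 3D, video, and convolutional, also resemble the top of the December Computer Vision and Pattern Recognition list. Classifying all six fields in one go would be convenient, but it seems better to carefully examine the word-occurrence trend of each of the six fields first and then cluster them.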
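To judge whether two word distributions are "similar" more objectively than by eyeballing rankings, one option (not used in the original post) is the cosine similarity between word-count vectors, optionally excluding common English words so that only domain terms such as image or convolutional contribute. The sketch below assumes both corpora were counted the same way and lowercased; STOPWORDS is a small hypothetical list for illustration, not a standard one.

```python
import math
from collections import Counter
from typing import List, Optional

# Hypothetical, deliberately small stop-word list for illustration only.
STOPWORDS = {
    "the", "of", "and", "a", "to", "in", "for", "is", "that", "on",
    "with", "we", "this", "are", "as", "by", "from", "an", "can",
    "which", "be", "our",
}


def cosine_similarity(a: Counter, b: Counter, exclude: Optional[set] = None) -> float:
    """Cosine similarity of two word-count vectors, ignoring words in exclude."""
    exclude = exclude or set()
    keys = (set(a) | set(b)) - exclude
    # Counter returns 0 for missing keys, so a word absent from one corpus counts as 0.
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in keys))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in keys))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_terms(counts: Counter, n: int, exclude: set) -> List[str]:
    """Return the n most frequent words after dropping those in exclude."""
    return [w for w, _ in counts.most_common() if w not in exclude][:n]
```

For example, with two hypothetical Counter objects for the full 2017 six-category corpus and the December CV submissions, cosine_similarity(full_year, cv_december, exclude=STOPWORDS) close to 1 would quantify the resemblance noted above. Computing the same score between each pair of the six fields would be one possible starting point for the per-field clustering.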