I want to classify the artificial intelligence papers on arxiv.org (7)

1. I want to classify the artificial intelligence papers on arxiv.org (7)

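・Examined word-frequency trends in the abstracts of papers registered in 2017 in six representative artificial intelligence categories on arxiv.org
・"state-of-the-art" is indeed popular, ranking 79th
・About 13,700 papers were registered during the year, but the word-frequency trend resembles that of the December Computer Vision (CV) papers alone, so classifying everything in one pass looks infeasible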

2. Word-frequency trends in six representative artificial intelligence categories on arxiv.org

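The table below shows word counts for the abstracts of papers registered on arxiv.org in 2017 in the six representative categories, collected with a crawler.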

 

 

Rank Word Count
1 the 109,397
2 of 76,443
3 and 60,080
4 a 55,859
5 to 52,235
6 in 35,461
7 for 30,482
8 is 27,326
9 that 23,573
10 on 21,859
11 with 18,629
12 we 18,485
13 We 16,074
14 this 13,751
15 are 13,694
16 as 13,262
17 by 12,773
18 from 11,724
19 an 11,678
20 The 11,369
21 In 9,969
22 can 9,611
23 which 9,476
24 learning 9,077
25 be 8,666
26 our 8,463
27 data 7,641
28 model 7,556
29 using 6,980
30 method 6,102
31 show 5,928
32 proposed 5,778
33 neural 5,497
34 based 5,493
35 propose 5,363
36 network 5,244
37 results 5,093
38 approach 5,061
39 This 4,983
40 or 4,961
41 such 4,829
42 it 4,800
43 deep 4,756
44 image 4,747
45 have 4,735
46 has 4,471
47 performance 4,376
48 models 4,249
49 methods 4,151
50 algorithm 4,072
51 new 4,054
52 different 4,015
53 two 4,013
54 training 3,986
55 also 3,939
56 Our 3,888
57 problem 3,856
58 used 3,827
59 these 3,822
60 not 3,711
61 between 3,584
62 at 3,533
63 more 3,518
64 networks 3,464
65 both 3,427
66 paper, 3,408
67 use 3,237
68 A 3,210
69 their 3,137
70 been 3,128
71 paper 3,105
72 novel 3,014
73 information 3,012
74 Learning 2,967
75 over 2,956
76 its 2,955
77 features 2,926
78 each 2,925
79 state-of-the-art 2,906
80 present 2,893
81 than 2,851
82 demonstrate 2,850
83 into 2,835
84 images 2,764
85 number 2,735
86 classification 2,730
87 where 2,603
88 framework 2,560
89 large 2,540
90 algorithms 2,530
91 feature 2,468
92 when 2,462
93 Neural 2,440
94 only 2,438
95 To 2,407
96 other 2,383
97 However, 2,343
98 Deep 2,327
99 one 2,325
100 system 2,305
101 set 2,294
102 but 2,235
103 machine 2,234
104 time 2,196
105 first 2,129
106 analysis 2,106
107 well 2,088
108 detection 2,077
109 existing 2,075
110 accuracy 2,062
111 convolutional 2,046
112 how 2,039
113 many 2,037
114 Networks 1,983
115 human 1,974
116 while 1,962
117 all 1,953
118 task 1,934
119 work 1,915
120 provide 1,913
121 3D 1,885
122 multiple 1,847
123 learn 1,843
124 dataset 1,843
125 experiments 1,835
126 several 1,818
127 most 1,817
128 data. 1,816
129 object 1,796
130 trained 1,794
131 better 1,764
132 high 1,725
133 recognition 1,716
134 visual 1,708
135 function 1,691
136 datasets 1,688
137 approaches 1,676
138 study 1,640
139 then 1,624
140 optimization 1,622
141 input 1,601
142 some 1,552
143 language 1,547
144 introduce 1,531
145 they 1,528
146 through 1,523
147 representation 1,519
148 without 1,502
149 semantic 1,480
150 via 1,478
151 compared 1,469
152 order 1,465
153 given 1,464
154 real 1,461
155 efficient 1,430
156 segmentation 1,418
157 under 1,417
158 important 1,412
159 tasks 1,406
160 structure 1,392
161 prediction 1,391
162 For 1,389
163 various 1,370
164 improve 1,357
165 any 1,355
166 problems 1,348
167 single 1,343
168 knowledge 1,343
169 linear 1,340
170 very 1,339
171 recent 1,338
172 computational 1,332
173 achieve 1,315
174 systems 1,302
175 three 1,289
176 It 1,282
177 may 1,277
178 outperforms 1,263
179 Network 1,257
180 process 1,256
181 often 1,256
182 local 1,249
183 challenging 1,248
184 techniques 1,243
185 video 1,237
186 simple 1,236
187 standard 1,221
188 including 1,217
189 significantly 1,213
190 same 1,213
191 best 1,206
192 was 1,200
193 complex 1,194
194 optimal 1,188
195 natural 1,187
196 architecture 1,180
197 due 1,170
198 further 1,169
199 about 1,163
200 available 1,157
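
The table lists "we" and "We", and "paper," and "paper", as separate entries, which suggests the counts came from plain whitespace splitting with no lowercasing or punctuation stripping. The following is a minimal sketch of that kind of count, not the script actually used; the function name `count_words` and the sample abstracts are hypothetical.

```python
from collections import Counter


def count_words(abstracts):
    """Count whitespace-separated tokens, case-sensitively, across all abstracts."""
    counts = Counter()
    for text in abstracts:
        # str.split() with no argument splits on any run of whitespace,
        # so "paper," and "paper" remain distinct tokens.
        counts.update(text.split())
    return counts


# Hypothetical sample abstracts, used only for illustration.
sample_abstracts = [
    "We propose a novel method. In this paper, we show results.",
    "In this paper we propose a model. We show state-of-the-art results.",
]

word_counts = count_words(sample_abstracts)

# Print a small ranking in the same "rank word count" layout as the table.
for rank, (word, n) in enumerate(word_counts.most_common(5), start=1):
    print(rank, word, f"{n:,}")
```

With this tokenization, capitalized and punctuated variants compete for separate ranks, so a word's total frequency is spread over several rows.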

 

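state-of-the-art ranks 79th, which confirms that its prominence was not a quirk of the December Computer Vision and Pattern Recognition submissions examined last time. However, the non-generic words near the top, such as image, images, 3D, video, and convolutional, also resemble the top words from the December Computer Vision and Pattern Recognition submissions. Classifying all six fields in one pass would be convenient, but it looks better to examine the word-frequency trends of each of the six fields carefully and then perform clustering.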
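
One way to start the per-field comparison is to build a top-word list for each category and measure how much two lists overlap. The sketch below is only an illustration under stated assumptions: the stop-word list is a tiny hand-picked sample, the category labels and abstracts are made up, and Jaccard similarity is one possible overlap measure rather than the method this series settles on.

```python
import string
from collections import Counter

# Small illustrative stop-word list (an assumption; a real list would be longer).
STOP_WORDS = {
    "the", "of", "and", "a", "to", "in", "for", "is", "that", "on",
    "with", "we", "this", "are", "as", "by", "from", "an", "our",
}


def normalize(token):
    """Lowercase a token and strip punctuation from both ends only."""
    return token.lower().strip(string.punctuation)


def top_words(abstracts, k):
    """Return the k most frequent non-stop-words in a list of abstracts."""
    counts = Counter()
    for text in abstracts:
        for token in text.split():
            word = normalize(token)
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]


def jaccard(words_a, words_b):
    """Jaccard similarity of two word lists: |A and B| / |A or B|."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-category abstracts; labels and text are illustrative only.
abstracts_by_category = {
    "cs.CV": ["We propose a deep network for image segmentation."],
    "cs.CL": ["We propose a deep network for language modeling."],
}

top = {cat: top_words(texts, 10) for cat, texts in abstracts_by_category.items()}
print(jaccard(top["cs.CV"], top["cs.CL"]))
```

Categories whose top-word lists overlap heavily would be hard to separate by word frequency alone, which is the concern raised by the similarity to the Computer Vision ranking.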