I want to classify the artificial intelligence papers on arxiv.org (7)

1. I want to classify the artificial intelligence papers on arxiv.org (7)

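・Examined word-frequency trends in the abstracts of papers registered in 2017 in six representative artificial intelligence categories on arxiv.org
・"state-of-the-art" is, as expected, popular, ranking 79th
・About 13,700 papers were registered over the year, yet the word-frequency trend resembles that of the December CV papers alone, so classifying everything in one pass looks impractical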

2. Word-frequency trends in six representative artificial intelligence categories on arxiv.org

The table below shows word counts for the abstracts of papers registered on arxiv.org in 2017 in six representative categories, collected with a crawler.

Rank	Word	Occurrences
1	the	109,397
2	of	76,443
3	and	60,080
4	a	55,859
5	to	52,235
6	in	35,461
7	for	30,482
8	is	27,326
9	that	23,573
10	on	21,859
11	with	18,629
12	we	18,485
13	We	16,074
14	this	13,751
15	are	13,694
16	as	13,262
17	by	12,773
18	from	11,724
19	an	11,678
20	The	11,369
21	In	9,969
22	can	9,611
23	which	9,476
24	learning	9,077
25	be	8,666
26	our	8,463
27	data	7,641
28	model	7,556
29	using	6,980
30	method	6,102
31	show	5,928
32	proposed	5,778
33	neural	5,497
34	based	5,493
35	propose	5,363
36	network	5,244
37	results	5,093
38	approach	5,061
39	This	4,983
40	or	4,961
41	such	4,829
42	it	4,800
43	deep	4,756
44	image	4,747
45	have	4,735
46	has	4,471
47	performance	4,376
48	models	4,249
49	methods	4,151
50	algorithm	4,072
51	new	4,054
52	different	4,015
53	two	4,013
54	training	3,986
55	also	3,939
56	Our	3,888
57	problem	3,856
58	used	3,827
59	these	3,822
60	not	3,711
61	between	3,584
62	at	3,533
63	more	3,518
64	networks	3,464
65	both	3,427
66	paper,	3,408
67	use	3,237
68	A	3,210
69	their	3,137
70	been	3,128
71	paper	3,105
72	novel	3,014
73	information	3,012
74	Learning	2,967
75	over	2,956
76	its	2,955
77	features	2,926
78	each	2,925
79	state-of-the-art	2,906
80	present	2,893
81	than	2,851
82	demonstrate	2,850
83	into	2,835
84	images	2,764
85	number	2,735
86	classification	2,730
87	where	2,603
88	framework	2,560
89	large	2,540
90	algorithms	2,530
91	feature	2,468
92	when	2,462
93	Neural	2,440
94	only	2,438
95	To	2,407
96	other	2,383
97	However,	2,343
98	Deep	2,327
99	one	2,325
100	system	2,305
101	set	2,294
102	but	2,235
103	machine	2,234
104	time	2,196
105	first	2,129
106	analysis	2,106
107	well	2,088
108	detection	2,077
109	existing	2,075
110	accuracy	2,062
111	convolutional	2,046
112	how	2,039
113	many	2,037
114	Networks	1,983
115	human	1,974
116	while	1,962
117	all	1,953
118	task	1,934
119	work	1,915
120	provide	1,913
121	3D	1,885
122	multiple	1,847
123	learn	1,843
124	dataset	1,843
125	experiments	1,835
126	several	1,818
127	most	1,817
128	data.	1,816
129	object	1,796
130	trained	1,794
131	better	1,764
132	high	1,725
133	recognition	1,716
134	visual	1,708
135	function	1,691
136	datasets	1,688
137	approaches	1,676
138	study	1,640
139	then	1,624
140	optimization	1,622
141	input	1,601
142	some	1,552
143	language	1,547
144	introduce	1,531
145	they	1,528
146	through	1,523
147	representation	1,519
148	without	1,502
149	semantic	1,480
150	via	1,478
151	compared	1,469
152	order	1,465
153	given	1,464
154	real	1,461
155	efficient	1,430
156	segmentation	1,418
157	under	1,417
158	important	1,412
159	tasks	1,406
160	structure	1,392
161	prediction	1,391
162	For	1,389
163	various	1,370
164	improve	1,357
165	any	1,355
166	problems	1,348
167	single	1,343
168	knowledge	1,343
169	linear	1,340
170	very	1,339
171	recent	1,338
172	computational	1,332
173	achieve	1,315
174	systems	1,302
175	three	1,289
176	It	1,282
177	may	1,277
178	outperforms	1,263
179	Network	1,257
180	process	1,256
181	often	1,256
182	local	1,249
183	challenging	1,248
184	techniques	1,243
185	video	1,237
186	simple	1,236
187	standard	1,221
188	including	1,217
189	significantly	1,213
190	same	1,213
191	best	1,206
192	was	1,200
193	complex	1,194
194	optimal	1,188
195	natural	1,187
196	architecture	1,180
197	due	1,170
198	further	1,169
199	about	1,163
200	available	1,157
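
The table keeps "we" and "We" as separate entries and leaves punctuation attached ("paper,", "data."), which is consistent with plain whitespace splitting and no normalization. A minimal sketch of such a count follows, assuming the abstracts are already available as a list of strings. The function names `count_words` and `format_ranking` and the sample sentences are hypothetical, and this is not necessarily how the original count was produced.

```python
from collections import Counter


def count_words(abstracts):
    """Count whitespace-separated tokens across all abstracts.

    Case and punctuation are left untouched (an assumption inferred
    from the table, where "we"/"We" and "paper,"/"paper" are distinct).
    """
    counts = Counter()
    for text in abstracts:
        counts.update(text.split())
    return counts


def format_ranking(counts, top_n=200):
    """Return tab-separated "rank, word, count" rows for the top_n tokens."""
    rows = []
    for rank, (word, n) in enumerate(counts.most_common(top_n), start=1):
        rows.append(f"{rank}\t{word}\t{n:,}")
    return rows


if __name__ == "__main__":
    # Hypothetical sample input; real input would be the crawled abstracts.
    sample = [
        "We propose a novel method.",
        "In this paper, we propose a model.",
    ]
    for row in format_ranking(count_words(sample), top_n=5):
        print(row)
```

Leaving tokens unnormalized keeps the sketch close to the table, at the cost of splitting one word across several rows.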

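With state-of-the-art at 79th place, this supports the view that its prominence was not a bias peculiar to the December registrations in Computer Vision and Pattern Recognition examined last time. However, apart from common English words, the top-ranked words (image, images, 3D, video, convolutional, and so on) still resemble the top of the list for the December Computer Vision and Pattern Recognition registrations. It would be convenient to classify all six fields in one go, but it seems better to examine the word-frequency trend of each of the six fields carefully and then run clustering.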
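
A per-category comparison could be sketched as follows, assuming the abstracts are grouped by category in a dictionary. The stop-word list, the normalization rule, the category labels in the sample, and the function names are all illustrative assumptions, and the Jaccard overlap is one simple way to compare top-word lists rather than the method used in this series.

```python
from collections import Counter

# Small illustrative stop-word list (an assumption, not the author's list).
STOP_WORDS = {
    "the", "of", "and", "a", "to", "in", "for", "is", "that", "on",
    "with", "we", "this", "are", "as", "by", "from", "an",
}


def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.strip(".,;:()\"'").lower()


def top_words_by_category(abstracts_by_category, top_n=10):
    """Return the top_n non-stop-words for each category."""
    result = {}
    for category, abstracts in abstracts_by_category.items():
        counts = Counter()
        for text in abstracts:
            for token in text.split():
                word = normalize(token)
                if word and word not in STOP_WORDS:
                    counts[word] += 1
        result[category] = counts.most_common(top_n)
    return result


def overlap(top_a, top_b):
    """Jaccard similarity between the word sets of two top-word lists."""
    words_a = {word for word, _ in top_a}
    words_b = {word for word, _ in top_b}
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    # Hypothetical sample data; the labels are only examples.
    sample = {
        "cs.CV": [
            "We propose a deep network for image segmentation.",
            "Image features from a convolutional network.",
        ],
        "cs.CL": ["We propose a language model for translation."],
    }
    tops = top_words_by_category(sample)
    for category, words in tops.items():
        print(category, words)
    print(overlap(tops["cs.CV"], tops["cs.CL"]))
```

A low overlap between two categories would suggest that their vocabularies differ enough to be worth handling separately.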
