| Lerman Twitter 2010 Dataset | 3 | 2014-08-15 | 292.17MB | 3,438 | 10+ | 0 |
| Yale YouTube Video Text | 1 | 2014-10-20 | 434.77MB | 7,762 | 6+ | 1 |
| Enwiki Word2vec model 1000 Dimensions | 1 | 2015-04-09 | 8.63GB | 3,474 | 10 | 0 |
| SMS Spam Collection Data Set | 2 | 2015-11-28 | 695.38kB | 806 | 7+ | 0 |
| Structured Web Data Extraction Dataset (SWDE) | 1 | 2015-11-29 | 207.31MB | 2,773 | 4 | 0 |
| Online News Popularity Data Set | 1 | 2016-02-11 | 7.48MB | 3,072 | 3+ | 1 |
| Sentiment Labelled Sentences Data Set | 1 | 2016-08-26 | 512.21kB | 508 | 4+ | 1 |
| MovieLens 20M Dataset | 1 | 2016-12-16 | 198.70MB | 2,046 | 5+ | 0 |
| Microsoft Academic Graph - 2016/02/05 | 1 | 2016-12-25 | 28.94GB | 249 | 2+ | 0 |
| IMDb Large Movie Review Dataset | 1 | 2018-10-16 | 26.40MB | 891 | 5+ | 0 |
| Wikitext-103 | 1 | 2018-10-16 | 190.20MB | 592 | 7+ | 1 |
| Wikitext-2 | 1 | 2018-10-16 | 4.07MB | 248 | 3+ | 1 |
| WMT 2015 French/English parallel texts | 1 | 2018-10-16 | 2.60GB | 1,723 | 3+ | 0 |
| AG News | 1 | 2018-10-16 | 11.78MB | 220 | 3+ | 0 |
| Amazon reviews - Full | 1 | 2018-10-16 | 643.70MB | 1,113 | 5+ | 0 |
| Amazon reviews - Polarity | 1 | 2018-10-16 | 688.34MB | 1,096 | 2+ | 0 |
| DBPedia ontology | 1 | 2018-10-16 | 68.34MB | 126 | 2+ | 1 |
| Sogou news | 1 | 2018-10-16 | 384.27MB | 256 | 2+ | 0 |
| Yelp reviews - Full | 1 | 2018-10-16 | 196.15MB | 384 | 2+ | 0 |
| Yelp reviews - Polarity | 1 | 2018-10-16 | 166.37MB | 441 | 2+ | 0 |
| Indiana University - Chest X-Rays (XML Reports) | 1 | 2018-11-22 | 1.11MB | 40,671 | 37+ | 0 |
| 30M Factoid Question-Answer Corpus (30MQA) | 2 | 2018-11-29 | 529.34MB | 3,960 | 8+ | 1 |
| Phishing corpus | 4555 | 2019-01-02 | 37.48MB | 976 | 2+ | 0 |
| Europarl v7 - training-parallel-europarl-v7.tgz (CS-EN, DE-EN, ES-EN, FR-EN) | 1 | 2019-02-04 | 657.63MB | 48 | 2+ | 0 |
| UN corpus - training-parallel-un.tgz (ES-EN, FR-EN) | 1 | 2019-02-04 | 2.37GB | 57 | 2+ | 0 |
| Common Crawl corpus - training-parallel-commoncrawl.tgz (CS-EN, DE-EN, ES-EN, FR-EN, RU-EN) | 1 | 2019-02-04 | 918.31MB | 113 | 2+ | 0 |
| Flickr8k Dataset | 2 | 2019-03-09 | 1.12GB | 12,161 | 29+ | 0 |
| OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized | 395 | 2019-06-01 | 16.02GB | 207 | 3 | 0 |
| r/WritingPrompts, Text (2018) | 1 | 2019-06-19 | 87.47MB | 400 | 4 | 0 |
| PMC Open Access Subset | 16 | 2020-05-24 | 84.14GB | 238 | 4+ | 0 |
| Reading Text in the Wild with Convolutional Neural Networks | 1 | 2021-11-12 | 10.68GB | 39,705 | 28 | 0 |
| Synthetic Data for Text Localisation in Natural Images | 15 | 2021-11-15 | 73.50GB | 3,710 | 12 | 3 |