Innotescus has compiled a list of public datasets your team can utilize on your next machine learning project.
General-Purpose Image Datasets
COCO: Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset.
ImageNet: A collaboration between Stanford and Princeton. The widely used ILSVRC subset of this dataset spans 1000 object classes and contains 1,281,167 training images, 50,000 validation images, and 100,000 test images.
Data Europa: The official portal for European data. It hosts datasets related to economics, agriculture, education, employment, climate, finance, science, and more.
Video Datasets
YouTube-8M: The YouTube-8M Segments dataset provides human-verified labels on approximately 237K segments across 1000 classes, collected from the validation set of the YouTube-8M dataset.
Kinetics-700: A high-quality dataset of video URLs. Each clip is human annotated with a single action class and lasts around 10 seconds.
Autonomous Driving Datasets
Waymo: A great source of data for a wide range of tasks in autonomous driving.
nuScenes: Large-scale public dataset for autonomous driving developed by the team at Motional (formerly nuTonomy).
Lyft Level 5: Data collected and processed from Lyft's Level 5 autonomous fleet and shared publicly.
Berkeley DeepDrive: Over 100k videos of driving experiences, each running 40 seconds at 30 frames per second.
Unmanned Autonomous Vehicle Datasets
SenseFly: A collection of aerial videos that can be used to train a variety of unmanned autonomous vehicles.
Stanford Drone Dataset: The dataset includes images and videos of various types of agents (not just pedestrians, but also bicyclists, skateboarders, cars, buses, and golf carts) that navigate in a real-world outdoor environment such as a university campus.
Medical Datasets
OpenfMRI: Dedicated to the free and open sharing of raw magnetic resonance imaging (MRI) datasets.
Centaur Labs: Explore datasets by size, category, modality (including X-ray, Ultrasound, Whole Slide Images, CT Scans, ECGs), and more.
Dataset Collections
Kaggle: Our favorite source for free datasets, collaboration, and competition.
Microsoft Research Open Data: Since 2018, Microsoft Research Open Data has been collaborating across the research community to collect datasets in a variety of categories.