Data Links & Resources

Datasets

Data.gov: The home of the U.S. Government’s open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more.
https://www.data.gov

Awesome Public Datasets
https://github.com/awesomedata/awesome-public-datasets

Kaggle Datasets: Kaggle supports a range of dataset publication formats and strongly recommends that publishers share their data in an accessible, non-proprietary format.
https://www.kaggle.com/datasets

Google’s Open Images Dataset
https://storage.googleapis.com/openimages/web/index.html

UCI Machine Learning Dataset Repository
https://archive.ics.uci.edu/ml/index.php

COVID-19 Open Research Dataset
https://www.semanticscholar.org/cord19

Best FREE Datasets | Open-Source data for machine learning projects

10 Popular Machine Learning Datasets, Explained

How to Create a Dataset for Machine Learning

Instagram-Scraper: A command-line application written in Python that scrapes and downloads an Instagram user’s photos and videos.
https://github.com/arc298/instagram-scraper

Web Scraping With Python 101

Google Images Download: A Python script for searching and downloading hundreds of Google images to the local hard disk.
https://github.com/hardikvasa/google-images-download

Memory of the World Library
https://library.memoryoftheworld.org

EasyOCR: Ready-to-use optical character recognition supporting 70+ languages, including Chinese, Japanese, Korean, and Thai.
https://github.com/JaidedAI/EasyOCR

Sound Maker: Made using NSynth, a research project that trained a neural network on over 300,000 instrument sounds.
https://experiments.withgoogle.com/ai/sound-maker/view/

NSynth Dataset
https://magenta.tensorflow.org/datasets/nsynth

How Machine Learning Is Generating Strange, New Sounds

Listening to data from the Large Hadron Collider