AWS Free Datasets: Part 2

Mar 24, 2024 by mtxvp

Continuing from Part 1 of select AWS datasets, let's look at a few more datasets:

📓 Pre- and post-purchase product questions

This dataset provides product-related questions, including their textual content and the gap, in hours, between purchase and posting time. Each question is also associated with related product details, including its id and title. The questions were extracted from the Amazon website.
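To make the "gap, in hours" field concrete, here is a minimal sketch of how such a value could be computed from two timestamps. The function names and the sign convention (posting time minus purchase time, so a negative gap means the question was asked before the purchase) are assumptions for illustration, not the dataset's documented schema.

```python
from datetime import datetime


def gap_hours(purchase_time: datetime, posting_time: datetime) -> float:
    """Hours elapsed from purchase to posting.

    Assumed convention: a negative value means the question was
    posted before the purchase was made.
    """
    return (posting_time - purchase_time).total_seconds() / 3600.0


def question_phase(gap: float) -> str:
    """Label a question as pre- or post-purchase from its gap in hours."""
    return "pre-purchase" if gap < 0 else "post-purchase"


# Hypothetical timestamps: bought on March 1 at noon, asked on March 3 at 18:00.
purchase = datetime(2024, 3, 1, 12, 0)
posted = datetime(2024, 3, 3, 18, 0)

gap = gap_hours(purchase, posted)
print(gap, question_phase(gap))  # 54.0 post-purchase
```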

📕 Open City Model (OCM)

Open City Model is an initiative to provide CityGML data for all the buildings in the United States. By using other open datasets in conjunction with our own code and algorithms, it is our goal to provide 3D geometries for every US building. This data contains roughly 125 million buildings.

📘 Daylight Map Distribution of OpenStreetMap

Daylight is a complete distribution of global, open map data that’s freely available with support from community and professional mapmakers. In addition to the standard OpenStreetMap PBF format, Daylight is available in two Parquet formats that are optimized for AWS Athena and include geometries (Points, LineStrings, Polygons, or MultiPolygons). The first, Daylight OSM Features, contains the nearly 1B renderable OSM features; the second contains all of OpenStreetMap data, including all 7B nodes without attributes, and relations that do not contain geometries, such as turn restrictions. Daylight Earth Table is a new data schema that classifies OpenStreetMap-style tags into a 3-level ontology (theme, class, subclass) and is the result of running the earth table classification over the latest release (v1.18) of the Daylight Map Distribution.
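As a rough illustration of what classifying OpenStreetMap-style tags into a 3-level ontology might look like, here is a small sketch. The rule table and the theme, class, and subclass values are invented for this example; they are not the actual Daylight Earth Table rules or vocabulary.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class EarthClass(NamedTuple):
    """One (theme, class, subclass) triple in a 3-level ontology."""
    theme: str
    class_: str
    subclass: str


# Hypothetical rules mapping a single OSM key/value pair to a triple.
# The real classification is more involved; these values are made up.
RULES: Dict[Tuple[str, str], EarthClass] = {
    ("highway", "motorway"): EarthClass("transportation", "road", "motorway"),
    ("natural", "wood"): EarthClass("landcover", "forest", "wood"),
    ("waterway", "river"): EarthClass("water", "stream", "river"),
}


def classify(tags: Dict[str, str]) -> Optional[EarthClass]:
    """Return the first matching triple for a feature's tags, or None."""
    for (key, value), earth_class in RULES.items():
        if tags.get(key) == value:
            return earth_class
    return None


feature_tags = {"highway": "motorway", "name": "Example Road"}
print(classify(feature_tags))
```

Features whose tags match no rule return `None` here, which is one simple way to keep unclassified features separate from classified ones.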

📗 Natural Earth

Natural Earth is a public domain map dataset available at 1:10 million, 1:50 million, and 1:110 million scales, featuring tightly integrated vector and raster data. We developed Natural Earth as a convenient resource for making custom maps. Unlike other map data intended for scientific analysis or military mapping, Natural Earth is designed to meet the needs of production cartographers using a variety of software applications. Maximum flexibility is a goal.

📔 Smithsonian Open Access

The Smithsonian’s mission is the "increase and diffusion of knowledge," and the institution has been collecting since 1846. The Smithsonian, through its efforts to digitize its multidisciplinary collections, has created millions of digital assets and related metadata describing the collection objects. On February 25th, 2020, the Smithsonian released over 2.8 million CC0 interdisciplinary 2-D and 3-D images and related metadata, as well as research data from researchers across the Smithsonian. The 2.8 million "open access" collections are a subset of the Smithsonian’s 155 million objects, 2.1 million library volumes and 156,000 cubic feet of archival collections held in 19 museums, 9 research centers, libraries, archives and the National Zoo. Digitization of collections is ongoing.

dataset | opendata