The Datasette Ecosystem

Datasette sits at the center of a growing ecosystem of open source tools aimed at making it as easy as possible to gather, analyze and publish interesting data.

These tools are divided into two main groups: tools for building SQLite databases (for use with Datasette) and plugins that extend Datasette’s functionality.

Tools for creating SQLite databases

csvs-to-sqlite

csvs-to-sqlite lets you take one or more CSV files and load them into a SQLite database. It can also extract repeated columns out into a separate table and configure SQLite full-text search against the contents of specific columns.

sqlite-utils

sqlite-utils is a Python library and CLI tool that provides shortcuts for loading data into SQLite. It can be used programmatically (e.g. in a Jupyter notebook) to load data, and will automatically create SQLite tables with the necessary schema.

The CLI tool can consume JSON streams directly and use them to create tables. It can also be used to query SQLite databases and output the results as CSV or JSON.

See sqlite-utils: a Python library and CLI tool for building SQLite databases for more.

db-to-sqlite

db-to-sqlite is a CLI tool that builds on top of SQLAlchemy and allows you to connect to any database supported by that library (including MySQL, oracle and PostgreSQL), run a SQL query and save the results to a new table in a SQLite database.

You can mirror an entire database (including copying foreign key relationships) with the --all option:

$ db-to-sqlite --connection="postgresql://simonw@localhost/myblog" --all blog.db

dbf-to-sqlite

dbf-to-sqlite works with dBase files such as those produced by Visual FoxPro. It is a command-line tool that can convert one or more .dbf file to tables in a SQLite database.

markdown-to-sqlite

markdown-to-sqlite reads Markdown files with embedded YAML metadata (e.g. for Jekyll Front Matter) and creates a SQLite table with a schema matching the metadata. This is useful if you want to keep structured data in text form in a GitHub repository and use that to build a SQLite database.

socrata2sql

socrata2sql is a tool by Andrew Chavez at the Dallas Morning News. It works with Socrata, a widely used platform for local and national government open data portals. It uses the Socrata API to pull down government datasets and store them in a local SQLite database (it can also export data to PostgreSQL, MySQL and other SQLAlchemy-supported databases).

For example, to create a SQLite database of the City of Dallas Payment Register you would run the following command:

$ socrata2sql insert www.dallasopendata.com 64pp-jeba

Datasette Plugins

Datasette’s plugin system makes it easy to enhance Datasette with additional functionality.

datasette-cluster-map

datasette-cluster-map is the original Datasette plugin, described in Datasette plugins, and building a clustered map visualization.

The plugin works against any table with latitude and longitude columns. It can load over 100,000 points onto a map to visualize the geographical distribution of the underlying data.

datasette-vega

datasette-vega exposes the powerful Vega charting library, allowing you to construct line, bar and scatter charts against your data and share links to your visualizations.

datasette-auth-github

datasette-auth-github adds an authentication layer to Datasette. Users will have to sign in using their GitHub account before they can view data or interact with Datasette. You can also use it to restrict access to specific GitHub users, or to members of specified GitHub organizations or teams.

datasette-json-html

datasette-json-html renders HTML in Datasette’s table view driven by JSON returned from your SQL queries. This provides a way to embed images, links and lists of links directly in Datasette’s main interface, defined using custom SQL statements.

datasette-jellyfish

datasette-jellyfish exposes custom SQL functions for a range of common fuzzy string matching functions, including soundex, porter stemming and levenshtein distance. It builds on top of the Jellyfish Python library.

datasette-doublemetaphone

datasette-doublemetaphone by Matthew Somerville adds custom SQL functions for applying the Double Metaphone fuzzy “sounds like” algorithm.

datasette-jq

datasette-jq adds a custom SQL function for filtering and transforming values from JSON columns using the jq expression language.

datasette-render-images

datasette-render-images works with SQLite tables that contain binary image data in BLOB columns. It converts any images it finds into data-uri image elements, allowing you to view them directly in the Datasette interface.

datasette-render-binary

datasette-render-binary renders binary data in a slightly more readable fashion: it shows ASCII characters as they are, and shows all other data as monospace octets. Useful as a tool for exploring new unfamiliar databases as it makes it easier to spot if a binary column may contain a decipherable binary format.

datasette-pretty-json

datasette-pretty-json seeks out JSON values in Datasette’s table browsing interface and pretty-prints them, making them easier to read.

datasette-sqlite-fts4

datasette-sqlite-fts4 provides search relevance ranking algorithms that can be used with SQLite’s FTS4 search module. You can read more about it in Exploring search relevance algorithms with SQLite.

datasette-bplist

datasette-bplist provides tools for working with Apple’s binary plist format embedded in SQLite database tables. If you use OS X you already have dozens of SQLite databases hidden away in your ~/Library folder that include data in this format - this plugin allows you to view the decoded data and run SQL queries against embedded values using a bplist_to_json(value) custom SQL function.

datasette-cors

datasette-cors allows you to configure CORS headers for your Datasette instance. You can use this to enable JavaScript running on a whitelisted set of domains to make fetch() calls to the JSON API provided by your Datasette instance.