Free Open-Source Test Databases

Published by Cristian Scutaru on

Besides the sample databases included for test or practice by RDBMS vendors – presented in a separate article – there are many relational databases that you can download and use for free in your tests. Some have the SQL scripts shared from repositories such as GitHub. Others are published by various people on their websites. Book authors also expose their source code, with database scripts. All databases presented here are relational, with open-source SQL scrips, and can be used for free.

Table of Contents

Employees Test Databases

This is a very simple database, with just a few tables, one database view, around a dozen entries per table and basic relationships: one self-join and two one-to-many relationships.

EmployeesQX (from the “Employees for Query Xtractor” initial test database) is our own small generic test database. It is also included in the setup files with db-specific SQL create/insert scripts. And as ready-to-use SQLite, SQL CE, SQL Anywhere, Firebird and Microsoft Access binary database files. Published now as free open-source database on GitHub, it offers SQL scripts for many other relational database systems, such as Oracle and IBM Db2, Microsoft SQL Server and Azure, PostgreSQL, MySQL and MariaDB, Amazon Redshift and Aurora, SAP/Sybase ASE. More about our Employee database used as a sample here.

The database has been inspired from early versions of some Oracle or Db2 samples. Oracle’s scott/tiger (scott schema/user name and tiger password) was such an early sample. IBM Db2’s sample database (called indeed sample) is another similar example.

Employees Test Database

Each department (dept) links to all the employees (emp) from that department. With emp shape selected, the previous Data Xtractor model diagram shows a lookup of all emp records below. Both model and data are available only in Data Xtractor. Model Xtractor handles just database models, and Query Xtractor just designed queries, through a visual SQL query builder.

Some employees work on projects (proj) between a start and end date. Managers is a simple test view, with all manager-subordinate name pairs. And datatypes table has database-specific data types.

Here is a list with the number of rows per table, for an Oracle database. Remark the estimated Size and the number of columns in datatypes is database-specific. One single entry in datatypes is enough to test each specific data type.

Employees Test Database Tables

The table row count and the eventual size per table are estimated calculated values. If your database changes, they must be re-calculated (for many tables, this may take a while). To do so, show all table names, under the Tables node, in the Objects viewer. Select all the list items, right-click and call the contextual Update Row Count menu command.

Chinook Test Database

Chinook is another generic test database, created a while ago by Luis Rocha. You can download from GitHub the latest SQL scripts for Microsoft SQL Server and SQL Server Compact, SQLite, PostgreSQL and MySQL, Oracle and DB2. Chinook is the second sample database we also include as SQLite, SQL CE, SQL Anywhere, Firebird and Microsoft Access database files in all our Xtractor applications. With SQL scripts for all our supported databases.

Chinook is inspired from the classic Sakila sample database from MySQL. But while Sakila keeps DVDs with movies, Chinook is a back-end database of a fictional music DVD store. Most data was collected by the author from iTunes.

Chinook Test Database

There are two main areas: inventory and billing. Inventory keeps the catalog of Tracks for each Album, with albums produced by Artists. MediaType and Genre act like categories. Tracks can be also grouped within Playlists.

Invoice is the main billing table, with order details kept in InvoiceLine. Store’s Employees sell Tracks to Customers all over the world.

The table below shows a large enough number of records in Invoice and InvoiceLine (for billing), and Tracks (for inventory) with many of them saved in Playlists:

Chinook Test Database Tables

SportsDB Test Databases

SportsDB is an open-source database with real statistical data for different sports (baseball, American football, ice hockey, tennis, basketball etc), licensed under Creative Commons GNU GPL. You can download either the SQL script files for MySQL or PostgreSQL, or a detached Microsoft SQL Server MDF database file. The SQL script is over 15MB in size, so attaching and using rather the SQL Server database could be more practical.

The database has over 100 tables, many of them isolated, with no relationships and foreign keys. The group below shows a bunch of related tables as collapsed shapes.

SportsDB Test Databases

This is a database small in size, but with many tables. The great part is it contains real data, and if you’re a sports fan you can enjoy it even more. It could be interesting to find associations between related tables and data. And drill-down on different events.

SportsDB Test Databases Tables

Baseball Test Databases

Sean Lahman, former sports reporter and columnist, created a freely available database of baseball statistics. The database contains batting and pitching statistics from 1871 to 2018, plus other statistical real data. The work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

You can download the archived files with either a MDB Microsoft Access database (to load directly in Data Xtractor), a Microsoft SQL Server bak file or compressed CSV (Comma-Separated Values) files. The database has 24 tables with no referential integrity set, i.e. no foreign keys defined. We kept just a few tables expanded here below:

Baseball Test Database

This is a large database in size, with 70MB for the decompressed MDB Access file. Some tables like Fielding, Batting and Appearances contain over 100,000 rows, so that’s a good test database for data analysis.

Baseball Test Database Tables

GolfClub Microsoft Access Database

Beginning SQL Queries was written by Clare Churther in 2016, and published by Apress. The SQL scripts and an ACCDB Microsoft Access database file are shared on Github. This test database is small, but you can walk through the defined relationships to query data in different ways, that’s a good practice.

GolfClub Microsoft Access Test Database

The database is very small in size, with just a few records per table. But this makes it also ideal for practice and to learn SQL, whether you buy the book or not.

GolfClub Microsoft Access Database Tables

Waterfalls Test Database

SQL Pocket Guide was written by Jonathan Gennick in 2010, and the source code is publicly available on O’Reilly’s website, whether you buy the book or not. The author created SQL database scripts for Microsoft SQL Server, Oracle, IBM Db2, PostgreSQL and MySQL/MariaDB.

The database has 14 tables and one view, and we called it Waterfalls. This original repository contains an inventory with a few waterfalls across the US.

Waterfalls Test Database

There are just 16 waterfalls recorded in the upfall table, but enough for the use cases described in the book. The database is rather sparse, as many tables are empty.

Waterfalls Test Database Tables

Bank Test Database

This large enough database has been used by Alan Beaulieu in his Learning SQL book, published in 2009. The SQL database create script can still be found on O’Reilly’s website. You can use it to create tables in a new MySQL or MariaDB database, and populate them with data. You may also have to call SET SQL_SAFE_UPDATES = 0; prior running the script.

The database keeps transactions processed by a bank with multiple branches. Customers have bank accounts. Bank employees are grouped by department, just like in the Employees database. The ER model is interesting to analyze and play with:

Bank Test Database

There are 11 tables and no views. As size, this is a very small database, with just a few entries per every table, but enough to demo all use cases described in the book.

Bank Test Database Tables

WordPress Internal Database

Everybody heard about WordPress, and chances are you already use one or more remote MySQL or MariaDB WordPress databases for your websites.

This is a typical database with no foreign keys to enforce the referential integrity. The database diagram below shows just the basic tables (with names usually prefixed by wp_), that in practice can be joined by other tables created by plugins.

WordPress Test Database

Both posts and pages are kept in wp_posts, While wp_comments can keep blog comments, when allowed. These are the main database tables.

Categories: Data Modeling

Cristian Scutaru

I designed and implemented the Data Xtractor suite, with Model Xtractor, Query Xtractor, and Visual Xtractor as separate modules. I am a software architect and developer with over 30 years professional experience. I’ve been working with relational databases for almost three decades and I was constantly unhappy with the relative limitation of those tools used to connect directly to a platform, and instantly extract and display data in flexible ways.