Free Open-Source Test Databases
Besides the sample databases included for test or practice by RDBMS vendors – presented in a separate article – there are many relational databases that you can download and use for free in your tests. Some have the SQL scripts shared from repositories such as GitHub. Others are published by various people on their websites. Book authors also expose their source code, with database scripts. All databases presented here are relational, with open-source SQL scrips, and can be used for free.
Table of Contents
Employees Test Databases
This is a very simple database, with just a few tables, one database view, around a dozen entries per table and basic relationships: one self-join and two one-to-many relationships.
EmployeesQX (from the “Employees for Query Xtractor” initial test database) is our own small generic test database. It is also included in the setup files with db-specific SQL create/insert scripts. And as ready-to-use SQLite, SQL CE, SQL Anywhere, Firebird and Microsoft Access binary database files. Published now as free open-source database on GitHub, it offers SQL scripts for many other relational database systems, such as Oracle and IBM Db2, Microsoft SQL Server and Azure, PostgreSQL, MySQL and MariaDB, Amazon Redshift and Aurora, SAP/Sybase ASE. More about our Employee database used as a sample here.
The database has been inspired from early versions of some Oracle or Db2 samples. Oracle’s scott/tiger (scott schema/user name and tiger password) was such an early sample. IBM Db2’s sample database (called indeed sample) is another similar example.
Each department (dept) links to all the employees (emp) from that department. With emp shape selected, the previous Data Xtractor model diagram shows a lookup of all emp records below. Both model and data are available only in Data Xtractor. Model Xtractor handles just database models, and Query Xtractor just designed queries, through a visual SQL query builder.
Some employees work on projects (proj) between a start and end date. Managers is a simple test view, with all manager-subordinate name pairs. And datatypes table has database-specific data types.
Here is a list with the number of rows per table, for an Oracle database. Remark the estimated Size and the number of columns in datatypes is database-specific. One single entry in datatypes is enough to test each specific data type.
The table row count and the eventual size per table are estimated calculated values. If your database changes, they must be re-calculated (for many tables, this may take a while). To do so, show all table names, under the Tables node, in the Objects viewer. Select all the list items, right-click and call the contextual Update Row Count menu command.
Chinook Test Database
Chinook is another generic test database, created a while ago by Luis Rocha. You can download from GitHub the latest SQL scripts for Microsoft SQL Server and SQL Server Compact, SQLite, PostgreSQL and MySQL, Oracle and DB2. Chinook is the second sample database we also include as SQLite, SQL CE, SQL Anywhere, Firebird and Microsoft Access database files in all our Xtractor applications. With SQL scripts for all our supported databases.
Chinook is inspired from the classic Sakila sample database from MySQL. But while Sakila keeps DVDs with movies, Chinook is a back-end database of a fictional music DVD store. Most data was collected by the author from iTunes.
There are two main areas: inventory and billing. Inventory keeps the catalog of Tracks for each Album, with albums produced by Artists. MediaType and Genre act like categories. Tracks can be also grouped within Playlists.
Invoice is the main billing table, with order details kept in InvoiceLine. Store’s Employees sell Tracks to Customers all over the world.
The table below shows a large enough number of records in Invoice and InvoiceLine (for billing), and Tracks (for inventory) with many of them saved in Playlists:
SportsDB Test Databases
SportsDB is an open-source database with real statistical data for different sports (baseball, American football, ice hockey, tennis, basketball etc), licensed under Creative Commons GNU GPL. You can download either the SQL script files for MySQL or PostgreSQL, or a detached Microsoft SQL Server MDF database file. The SQL script is over 15MB in size, so attaching and using rather the SQL Server database could be more practical.
The database has over 100 tables, many of them isolated, with no relationships and foreign keys. The group below shows a bunch of related tables as collapsed shapes.
This is a database small in size, but with many tables. The great part is it contains real data, and if you’re a sports fan you can enjoy it even more. It could be interesting to find associations between related tables and data. And drill-down on different events.
Baseball Test Databases
Sean Lahman, former sports reporter and columnist, created a freely available database of baseball statistics. The database contains batting and pitching statistics from 1871 to 2018, plus other statistical real data. The work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
You can download the archived files with either a MDB Microsoft Access database (to load directly in Data Xtractor), a Microsoft SQL Server bak file or compressed CSV (Comma-Separated Values) files. The database has 24 tables with no referential integrity set, i.e. no foreign keys defined. We kept just a few tables expanded here below:
This is a large database in size, with 70MB for the decompressed MDB Access file. Some tables like Fielding, Batting and Appearances contain over 100,000 rows, so that’s a good test database for data analysis.
GolfClub Microsoft Access Database
Beginning SQL Queries was written by Clare Churther in 2016, and published by Apress. The SQL scripts and an ACCDB Microsoft Access database file are shared on Github. This test database is small, but you can walk through the defined relationships to query data in different ways, that’s a good practice.
The database is very small in size, with just a few records per table. But this makes it also ideal for practice and to learn SQL, whether you buy the book or not.
Waterfalls Test Database
SQL Pocket Guide was written by Jonathan Gennick in 2010, and the source code is publicly available on O’Reilly’s website, whether you buy the book or not. The author created SQL database scripts for Microsoft SQL Server, Oracle, IBM Db2, PostgreSQL and MySQL/MariaDB.
The database has 14 tables and one view, and we called it Waterfalls. This original repository contains an inventory with a few waterfalls across the US.
There are just 16 waterfalls recorded in the upfall table, but enough for the use cases described in the book. The database is rather sparse, as many tables are empty.
Bank Test Database
This large enough database has been used by Alan Beaulieu in his Learning SQL book, published in 2009. The SQL database create script can still be found on O’Reilly’s website. You can use it to create tables in a new MySQL or MariaDB database, and populate them with data. You may also have to call SET SQL_SAFE_UPDATES = 0; prior running the script.
The database keeps transactions processed by a bank with multiple branches. Customers have bank accounts. Bank employees are grouped by department, just like in the Employees database. The ER model is interesting to analyze and play with:
There are 11 tables and no views. As size, this is a very small database, with just a few entries per every table, but enough to demo all use cases described in the book.
WordPress Internal Database
Everybody heard about WordPress, and chances are you already use one or more remote MySQL or MariaDB WordPress databases for your websites.
This is a typical database with no foreign keys to enforce the referential integrity. The database diagram below shows just the basic tables (with names usually prefixed by wp_), that in practice can be joined by other tables created by plugins.
Both posts and pages are kept in wp_posts, While wp_comments can keep blog comments, when allowed. These are the main database tables.