Use SQLite to Improve Excel Spreadsheets
4 min read

Use SQLite to Improve Excel Spreadsheets

I'm often asked to find rows in one list that aren't in another list (or some similar task) and I spend too much time with vlookup as a result. I hate vlookup. It's finicky and I feel like it never works right. If it works for you then that's fine. But I'm ditching it in favor of plain old sql.

I find it easier to export my spreadsheets as CSV files, import them into SQLite, and run regular old sql queries. It's surprisingly fast and easy. Do whatever you want with your data. Stop looking at Excel formula reference websites and instead use the SQL statements that you already know and love.

SQLite is fast, self-contained, lightweight, and is file-based. It's very popular and is often used as a file format.

It's even easy to script using Powershell. More on that towards the end.

A Tale of Two Spreadsheets

Here are two spreadsheets with fake data in them. Which email addresses exist in both? We'll figure it out quickly with SQLite.

Import the Data into SQLite

First you need to download SQLite. Download the "bundle of command line tools" for your operating system from the downloads page.

Export both spreadsheets as CSV files and put them in the same directory as the SQLite exes you downloaded - it's just easier to reference in our examples without having to use fully-qualified paths.

Here they are for you to download:
email1.csv
email2.csv

Open a command prompt in your SQLite directory. By default SQLite will create an in-memory database but by issuing an "open" command a new database will be created. Called "dot commands", statements that start with a period are executed by sqlite3.exe and we use them to:

1) Create a new database (.open),
2) Tell the program to expect a CSV file (.mode csv), then
3) Import the CSV data into a new table (.import email1.csv email1).

".schema" tells us the structure of the tables that were created, which used the first row of the csv file as the column name.

sqlite3
.open email.db
.mode csv
.import email1.csv email1
.import email2.csv email2
.schema

And here's what it looks like in my command line:

Run SQL Queries

Now we can find duplicate email addresses with an easy sql query:

select x.emailaddress from email1 x where x.emailaddress in (select y.emailaddress from email2 y);

While we're here let's also find out how many different domains are represented in our list of email addresses:

select count(*), substring(emailaddress, instr(emailaddress, '@')+1) from email1 group by substring(emailaddress, instr(emailaddress, '@')+1);

Exporting as a CSV

Exporting to a CSV file is as simple as importing a CSV file.

".headers on" adds the column name to the output.
".output duplicates.csv" writes the output to a file. (".output" without any parameters switches back to standard output.)

Then execute your sql statement and it will be written to disk.

.headers on
.mode csv
.output duplicates.csv
select x.emailaddress from email1 x where x.emailaddress in (select y.emailaddress from email2 y);
.output domain-count.csv
select count(*), substring(emailaddress, instr(emailaddress, '@')+1) from email1 group by substring(emailaddress, instr(emailaddress, '@')+1);

Powershell

Here are two Powershell functions that handle importing and exporting CSV data with a SQLite database. It uses the same code that we used above. Add them to a script you've created or save them as modules.

function Import-CSVToNewSqliteTable
{
    <#
        .SYNOPSIS
            Import a CSV file to sqlite.
A new sqlite database will be created if it doesn't already exist.
If it does exist, the table named $tableName will be dropped before the import if it exists.

    #>

Param
(
# Full path to the sqlite database you want to use or create.
[string]$fullPathToSqliteDatabase,

# Full path to the CSV to import.
[string]$fullPathtoCSV,

# Name of the database table to insert into. Will be dropped if it exists already.
[string]$tableName,

# Full path to sqlite exe. For example c:\sqlite\sqlite3.exe
[string]$fullPathToSqliteExe
)

# there can't be any spaces before the new line and the dot (.) or else sqlite throws an exception
$sqliteCommand = ".open '$fullPathToSqliteDatabase'
drop table if exists $tableName;
.mode csv
.import '$fullPathtoCSV' $tableName
.exit"

$sqliteCommand | & "$fullPathToSqliteExe"
}

function Export-SqliteQueryToCSV
{
    <#
        .SYNOPSIS
            Exports a sqlite query to a CSV.

    #>

Param
(
# Full path to the sqlite database.
[string]$fullPathToSqliteDatabase,

# select command to use to create the CSV.
[string]$sqlCommand,

# full path to the output CSV.
[string]$outputCSVFullPath,

# Full path to sqlite exe. For example c:\sqlite\sqlite3.exe
[string]$fullPathToSqliteExe
)

$sqliteCommand = ".open '$fullPathToSqliteDatabase'
.headers on
.mode csv
.output '$outputCSVFullPath'
$sqlCommand
.exit"

$sqliteCommand | & "$fullPathToSqliteExe"
}

Conclusion

This is way easier than messing around with Excel formulas, isn't it? And once your data is in a sql database you can run any query you can think of.

If you would rather have a nice graphical interface for your queries you can download the free and excellent SQLite Studio.

SQLiteStudio

Enjoy! And remember that just because you were given an Excel document doesn't mean you're stuck with it!