Please note: this document is about a work-in-progess to develop a program to load CSV files into a database. I'm documenting a program I'm writing before I've written the code. Things may change! There is a FAQ containing random questions about the project. This document is partly a specification.
****** THE AUTHOR OF THESE PROGRAMS ACCEPTS NO RESPONSIBLITY FOR ANY LOSS OF DATA HOWSOEVER CAUSED. ******
This Python3 script loads csv files downloaded from British banks into a Sqlite3 database.
Windows or macOS or Linux or *BSD. Any system that can run Python3 and sqlite3
Python 3, preferably a reasonably recent version. It is available already installed on recent versions of macOS. For Linux it can be installed using apt-get, or the equivalent program on other installations of Linux. Python 2 will NOT work.
Python3 is available from www.python.org
sqlite3. This is an SQL database manager. It may already be on a macOS system. On Linux it can be installed as above.
sqlite3 can be obtained from www.sqlite.org
(Optional), a spreadsheet program such as Excel or Numbers.
tload.py is a test program that will create a database from a CSV file. It has been tested on CSVs from Natwest.
tload.py can be obtained from Github. Click on the "Latest" button under Releases on the right-hand side. You can also clone the repository with git.
Extract the files from the zip or tar.
Create an empty directory. Copy tload.py into it. Copy a CSV file into it and rename it to "testme.csv"
Run the python script with "python3 tload.py"
The database will be named "trans.db"
Here's a sample configuration file. It's in JSON. Hopefully, it's not too intimidating for a novice user to edit.
It defines the configuration for two users named "bart" and "homer"
{
"bart": {
"account_number": "50128900",
"account_sort_code": "71-72-03",
"account_type": "current",
"account_name": "SIMPSON-B",
"csv_filename_contains_account_number": true,
"csv_format": ["nationwide","nationwide-2"],
"database_filename": "bart.db"
},
"homer": {
"account_number": "60123422",
"account_sort_code": "22-40-77",
"account_type": "saving",
"account_name": "SIMPSONH",
"csv_filename_contains_account_number": false,
"csv_format": ["starling"],
"database_filename": "homer.db"
}
}
(from Wikipedia) Comma-separated values (CSV) is a plain text data format for storing tabular data where the fields (values) of a record are separated by a comma and each record is a line (i.e. newline separated). CSV is commonly-used in software that generally deals with tabular data such as a database or a spreadsheet. Benefits cited for using CSV include simplicity of use and human readability. CSV is a form of delimiter-separated values. A CSV file is a file that contains CSV-formatted data.
Banks provide downloadable CSV files as an additional method of giving their customer information about the transactions in their account.
The customer can then import the CSV files into their spreadsheets and/or databases.
There does not appear to be one. The format of the date value can be "02/01/2026" or "02 Jan 2026". Neither of these formats is collatable in a meaingful order in a database or spreadsheet column. The transactions can be in ascending or descending date order. The amount of a transaction can be in two columns for "Paid Out" and "Paid In" or one column labelled "Amount" with a signed number.
Sometimes numerical values are preceded by a £ sign. Sometimes the account number is present in a column, and sometimes it's not.
None of the banks I've seen so far give any indication of the order of transactions within a particular day. This makes it hard to sort the transactions to determine if there is an inconsistency or missing transactions.
Sometimes the transaction type is described in detail, and other times a 3 letter abbreviation is used. See this document from Natwest for a list of the abbreviations they use.
It's a complete mess!
Natwest allow up to 1 years worth in a single file. The transactions are only available for the last seven years. Multiple files can be downloaded in one session logged in to the website.
Importing a single CSV into a speadsheet is a perfectly acceptable way of working. It's probable that banks think this is how their customers use their CSVs, as an alternative to downloading a PDF statement or receiving a paper statement in the post or even visiting the web site to view their transactions online. (Do banks still supply paper statements?)
The difficulty arises when multiple CSVs are imported into one spreadsheet. Banks have been known to change the date order of their CSVs. The format of the date could differ. (I'm looking at you, Natwest). There is no unique transaction ID. Unless you are very disciplined and organised it's easy to get duplicated and missing transactions. This is a disaster in the making if you are compiling data for your tax return.
You don't actually need to know very much about SQL. It helps to know about the SELECT command.
This reasonably simple query will go a long way:
SELECT * FROM mytable WHERE date >= '2026-01-01' ORDER BY tid ASC;
This will select all transactions beginning in January 2026
www.sqlite.org is the obvious place, but the syntax diagrams are a bit intimidating. I also like www.sqlitetutorial.net
sqlite3 has a great variety of output formats. It is capable of showing the results of a query in a web browser as a nicely formatted table.
Bearing in mind that you cannot get CSV data from the bank for transactions older than seven years, it is important to keep the files safe.
I create a directory which only contains pristine CSV files exactly as downloaded from the bank. I do not create any databases in this directory or use the files in any program.
I check the files into version control programs like Git or Fossil. I also upload the files to the cloud.
When all the files are safely stored, I copy them to my working directory or directories
CSV files are precious. Do not rely on your database as your sole archive as there are SQL commands like "DROP TABLE" which will delete all your data. :-( The creator of these programs is NOT responsible for any loss of data.
"tid" means "transaction id." It is an integer calculated by taking the number of days since 17 November 1858, multiplying it by 100, and then adding 1 for the first transaction for the day, 2 for the second, and so on for all the other transactions on the day.
The number of transactions for the day can be no more than 99, so this program is not suitable for use in business.
17 November 1858 is day 0 of the Modified Julian Day (MJD). Wikipedia MJD.
The MJD for 6 April 2026 is 61136, so the first transaction tid is 6113601, the second tid is 6113602.
Unlike dates, tids are unique within a table. They are also collatable.
The script will load a database from a CSV with each transaction given an unique id (a "tid").
No transaction will be duplicated in the database based on the tid.
Transactions will be verified to see if the balance of the previous transaction plus the value of the current transaction is equal to the balance of the current transaction.
The script will give a list of date ranges where transactions are missing.