What Year Is It?

Lately I’ve been getting a lot of complaints that the year is wrong in Bank Statement Converter’s resulting CSV. At first when I got these complaints I thought “What the hell you talking about? All we do is find the transaction data and then write it out to a CSV file. How can the year be wrong?”. Let’s walk through an what’s going on with one of my HSBC bank statements.

When converted it looks like this

After downloading and opening up the text in a text editor like like Sublime Text, it looks like this.

Date,Transaction Details,Deposit,Withdrawal,Balance
20 Feb,B/F BALANCE,,,"XXX"
27 Feb,CREDIT INTEREST,3.84,,"XXX"

If you open it up in Excel it looks like this

Take a look at the date at the top of the image. The year is set to 2022, but the CSV does not have a year in it. Excel is a jerk and has added in the current year.

Not our problem?

I could say to my customers “Well actually Excel is the one adding in the year. The CSV we generate is correct, it’s a problem with Excel”. That’s tempting, but surely there is a way to figure out the correct year for a bank statement. Let’s take a look at the top of my HSBC bank statement.

There’s a mysterious unlabeled date that reads “20 March 2021”. We can’t really know for sure, but let’s assume this date is the “Statement Date” or the date the statement was generated. Let’s see if we can figure out a method to add in the correct year for the dates that we have.

Example #1 - March

Input
Statement Date: 20 March 2021
Dates: 20 Feb, 27 Feb, 2 Mar, 4 Mar, 9 Mar, 12 Mar, 16 Mar

Output
20 Feb 2021
27 Feb 2021
2 Mar 2021
4 Mar 2021
9 Mar 2021
12 Mar 2021
16 Mar 2021

This looks pretty easy to deal with. Let’s just grab the year from the statement date and append it to the dates.

Example #2 - January

Input
Statement Date: 20 January 2021
Dates: 22 Dec, 24 Dec, 25 Dec, 31 Dec, 2 Jan, 14 Jan

Output
22 Dec 2020
24 Dec 2020
25 Dec 2020
31 Dec 2020
2 Jan 2021
14 Jan 2021

Our strategy from the previous example isn’t going to work. The first thing that comes to mind is to detect the cross over point. Set the statement year minus one for all dates before the cross over. Set the statement year for all dates after the cross over point. Detecting the cross over point is a little bit annoying. One way to do it is put a dummy year on all the dates, and then find the first date that is “less than” the preceding date.

Let’s walk through the example. First we put on a dummy year on all the dates:
22 Dec 1999
24 Dec 1999
25 Dec 1999
31 Dec 1999
2 Jan 1999
14 Jan 1999

Then we find the first date that is less than the preceding date. Which is “2 Jan 1999”. We then set the statement year for all dates after and including the cross over point:

22 Dec 1999
24 Dec 1999
25 Dec 1999
31 Dec 1999
2 Jan 2021
14 Jan 2021

We then set the year before the statement year for all dates before the cross over point:

22 Dec 2020
24 Dec 2020
25 Dec 2020
31 Dec 2020
2 Jan 2021
14 Jan 2021

That’s pretty complicated, but it works! It does smell like a loser though. If the code smells like a loser, it probably is.

Example #3 - January Statement with no January transactions

Input
Statement Date: 20 January 2021
Dates: 22 Dec, 24 Dec, 25 Dec, 31 Dec

Output
22 Dec 2020
24 Dec 2020
25 Dec 2020
31 Dec 2020

There is no cross over point in this data. If we ran our imaginary code it would probably set the year to 2021 for all of these dates. Which is not what we want. This set of data stumped me for quite some time. I eventually called up my good friend Jason Traish. He immediately came up with the following algorithm.

“Assume statements only go back in time by three months. Create a mapping for the previous three months. January goes to 2021, December to 2020 and November to 2020. Then you transform the dates, grab the correct year from your mapping”

This works pretty well. This method doesn’t care about the order of the dates, ascending, descending or randomly ordered dates will work.

What about combined PDFs?

Some people like to combine multiple PDFs into one and then upload it to Bank Statement Converter. This usually works fine. However this could really mess with our year adding technique. What if a user uploads three years worth of bank statements merged into one PDF? We’d need to look for multiple statement dates and use the closest one in the document.

What about statements that span multiple years?

Ah… pass. I’m sure they exist, but I don’t think they’re common. I’ll deal with this if it ever comes up.

Does a Generic Solution Exist?

It might be possible, I don’t see it at the moment though. A generic solution using the strategy I mentioned above would require a generic way of finding all the statement dates in the PDF. It would also require a generic way of adding the year to a transaction date. Perhaps Jason Traish will think of a generic solution, until then I’ll continue playing whack-a-mole by fixing up statements as customers complain.

Join The Mailing List