I own a limited liability company in Hong Kong called Dragon King Creation Limited. The company has no employees, I’m the sole director and I own 100% of the shares of the company. I created the company in 2015 to manage the revenues from my Android and iPhone application sales. The company doesn’t make a lot of money, but it’s still officially a company. Every year limited companies in Hong Kong must go through an audit. When I go through this audit process my accountant asks me a bunch of questions and asks for various files, including bank statements and credit card statements.
I have two bank accountants and one credit card. My bank accountant and credit cards are issued by HSBC, and HSBC does not provide Excel or CSV dumps. They only provide PDF bank statements and PDF credit card statements. So every year around February I go into my internet banking accounts and download 24 bank statements and 12 credit card statements. Then I email them over to my accountant and continue living my life. This year I sent over the files to my accountant and they sent me an Excel document with all the transactions. I asked him “How did you get these PDFs into Excel format”.
“I copy and pasted the data into Excel”
I then thought, jeez he probably has to do this for all his client, and some of them probably have thousands of transactions per month and tens of bank accounts. I talked to a few friends who are accountants and they told me the same thing “We manually copy the data into Excel.”. They also told me they have tried using software that converts PDFs to Excel but it doesn’t really work properly, and can take several minutes to process the files. I then thought “hmmm I have a feeling other people have this problem, maybe I can produce something that automatically extracts the important information from a bank statement and writes it out into an Excel file”.
I’ve got access to four bank statement/credit card statement formats:
- HSBC Personal
- HSBC Credit Card
- HSBC Commercial
- Westpac Personal
Here’s what the bank statement data looks like in PDF:
This is what I wanted the resulting data to look like
It doesn’t sound that hard to do, but figuring out where the transaction table in a PDF document is tricky. It’s also tricky to figure out the bottom of the transaction table. Additioanlly, the “Transaction Details” column can split across multiple lines, when really it is describing one transfer. Figuring out when to merge the lines and when not to is not easy!
I’ll go into detail how it all works in a future post, this post is more about why I’ve been spending my time working on converting PDF bank statements into Excel files.