Psst! Want to see a cool bug?

Yesterday I was going through error reports and I saw this email.

Email from user

At first I was kind of annoyed. Users often merge multiple PDFs into one large PDF to abuse my free plan and convert more than one PDF a day for free. Then I looked at the PDF and thought “Ah, actually this is supposed to convert correctly even if the PDF has been merged into one mega PDF”.

Date Enrichment

I’ll give you a quick recap on what date enrichment is and how it works in Bank Statement Converter. If you’re very interested you can read this blog post.

Often bank statements will render partial dates with transaction data. You might see transaction data like this:

Date Description Amount
09/15 Coffee Shop Purchase -$4.50
09/16 ATM Withdrawal -$100.00
09/17 Direct Deposit +$2,500.00
09/18 Grocery Store -$87.32
09/19 Gas Station -$45.20

It looks reasonable, but the problem is the year is missing from the dates. That’s okay if you’re converting one PDF, or multiple PDFs from the same year, but it’s not very good if you’re converting multiple years’ worth of PDFs into one CSV file. To solve this problem, I capture the statement date from the PDF and use that to add in the year.
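Here is a minimal sketch of that idea in Kotlin. It is not the converter's real code: `addYear` is a hypothetical helper, and it assumes the partial date always looks like `MM/dd`. It also assumes a transaction can't be dated after its statement, so a date that would land after the statement date gets pushed back a year.

```kotlin
import java.time.LocalDate
import java.time.MonthDay
import java.time.format.DateTimeFormatter

// Hypothetical helper for illustration only: turns "09/15" into a full date.
fun addYear(partial: String, statementDate: LocalDate): LocalDate {
    // Assumption: the statement prints partial dates as MM/dd.
    val monthDay = MonthDay.parse(partial, DateTimeFormatter.ofPattern("MM/dd"))
    val candidate = monthDay.atYear(statementDate.year)

    // Assumption: a transaction is never dated after its statement,
    // so a "future" date must belong to the previous year.
    return if (candidate.isAfter(statementDate)) {
        monthDay.atYear(statementDate.year - 1)
    } else {
        candidate
    }
}
```

With a statement date of 2025-09-30, `addYear("09/15", ...)` gives 2025-09-15. With a statement date of 2025-01-15, `addYear("12/28", ...)` gives 2024-12-28, because 28 December 2025 would be after the statement.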

Back to the bug

I figured that the bug was happening because we were capturing a statement date early in the PDF and failing to capture the later ones. So perhaps we were using a statement date of “28 February 2021” to enrich data that actually had a statement date of “31 August 2025”.

I then ran the document and saw that was not the case. All the statement dates across the document were captured accurately. Hmmm… what’s going on? The following is the code that enriches a partial date using the statement date.

fun enrichDates(
    result: PageParseResult,
    dates: List<LocalDateTime>
): PageParseResult {
    if (dates.isEmpty()) {
        logger.info("No dates provided, will not enrich")
        return result
    }

    val statementDate = dates.last()

    return transformColumns(setOf(ColumnType.DATE), result) { _, value ->
        val enrichedDate = enrich(statementDate, value)

        if (enrichedDate == null) {
            value
        }
        else {
            enrichedDate.format(DateTimeFormatter.ISO_DATE)
        }
    }
}

fun enrich(statementDate: LocalDateTime, date: String): LocalDateTime? {
    for (builder in formatterBuilders) {
        val formatter = builder
            .parseDefaulting(ChronoField.YEAR, statementDate.year.toLong())
            .parseDefaulting(ChronoField.MILLI_OF_DAY, 0)
            .toFormatter()

        try {
            val temp = LocalDateTime.parse(date, formatter)

            if (statementDateInclusive && temp == statementDate) {
                return temp
            }
            if (temp < statementDate) {
                return temp
            }

            return temp.withYear(statementDate.year - 1)
        }
        catch (dpe: DateTimeParseException) {
            logger.error("Failed to parse date = $date")
            continue
        }
    }

    return null
}

I debugged for a little while and then came across something weird. If I tried to enrich the value “01 Feb” with a statement date of “2025-02-28” it returned a LocalDateTime of “2021-02-01”. That was not the behaviour I wanted. Very strange. Then I looked at the first of the above functions and I saw that the statementDate value was being captured by the subsequent lambda. Maybe lambda variable capture behaved differently to how I expected? I showed the code to my good friend Grok and it told me the lambda behaved as I wanted. I read about lambda variable capture in Kotlin and it seemed to behave as I thought.

I then thought “Maybe the List class behaves differently to how I expected”. The function doesn’t really need to accept a List, so I modified it to take just one statement date instead of a list of them. Still the bug persisted.

As I debugged I got annoyed with setting conditional breakpoints and then stepping through to where the bug occurred. So I set up a unit test to make the bug appear.

val result = dateEnricher.enrich("2025-02-28".toLocalDateTime(), "01 Feb")
assertThat(result, equalTo("2025-02-01".toLocalDateTime()))

The test passed. Strange, I was unable to replicate the bug in a unit test. Quite the stumper. Looking at the class, there didn’t seem to be any state being stored. All I had was a boolean and a list of Builder objects for DateTimeFormatters.

class DateEnricher(
    private val formatterBuilders: List<DateTimeFormatterBuilder>,
    private val statementDateInclusive: Boolean = true
)

The boolean probably didn’t commit the crime, so I focused on the formatterBuilders. As I was debugging I noticed an enormous number of properties on one of the DateTimeFormatterBuilder objects.

val formatter = builder
    .parseDefaulting(ChronoField.YEAR, statementDate.year.toLong())
    .parseDefaulting(ChronoField.MILLI_OF_DAY, 0)
    .toFormatter()

Then I realised this code was probably just appending properties onto the builder instead of replacing them. This meant the builder had statementDates on it from across the document. Perhaps the builder was using the first one set? I modified the code so that DateEnricher created new builders each time instead of reusing the old ones. That fixed the bug.
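The accumulation is easy to reproduce outside the converter. The sketch below is a stripped-down illustration, not the real DateEnricher: `sharedBuilder`, `parseWithSharedBuilder` and `parseWithFreshBuilder` are made-up names, and the `dd MMM` pattern is an assumption about what one of the builders looked like. As far as I understand the JDK, each `parseDefaulting` call appends another default to the builder, and a default only applies if the field has not been set yet, so the earliest year wins.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField
import java.util.Locale

// Buggy shape: one long-lived builder (hypothetical name), shared across calls.
val sharedBuilder: DateTimeFormatterBuilder =
    DateTimeFormatterBuilder().appendPattern("dd MMM")

fun parseWithSharedBuilder(text: String, year: Int): LocalDate {
    // parseDefaulting mutates sharedBuilder, so each call stacks
    // another YEAR default on top of the ones from earlier calls.
    val formatter = sharedBuilder
        .parseDefaulting(ChronoField.YEAR, year.toLong())
        .toFormatter(Locale.ENGLISH)
    return LocalDate.parse(text, formatter)
}

// Fixed shape: a brand new builder per call, so only one YEAR default exists.
fun parseWithFreshBuilder(text: String, year: Int): LocalDate {
    val formatter = DateTimeFormatterBuilder()
        .appendPattern("dd MMM")
        .parseDefaulting(ChronoField.YEAR, year.toLong())
        .toFormatter(Locale.ENGLISH)
    return LocalDate.parse(text, formatter)
}
```

Calling `parseWithSharedBuilder("01 Feb", 2021)` and then `parseWithSharedBuilder("01 Feb", 2025)` gives 2021-02-01 both times, while `parseWithFreshBuilder("01 Feb", 2025)` gives 2025-02-01. It also explains why my unit test passed: a freshly constructed enricher had never seen an earlier statement date, so there was only one default on the builder.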

I deployed the fix, told the user and then the user was happy.

Email from user
