The Hours of Work Behind a Simple Feature

Ever since I started this website in April 2021, I’ve known users would upload password-protected PDF files. I figured supporting them would be too annoying, so I put it off. Recently a user asked me to support password-protected PDFs, so I did it.

It’s a simple feature, and it works pretty well. I spent all of yesterday working on it and pushed it into production at 2:32 AM this morning. From the small amount of data I gathered over the last few days, it looks like around 5% of users upload PDFs with passwords. Previously I would reject the upload without even displaying an error message. A lot of those users probably went away and assumed the website didn’t work.

I was surprised and annoyed about the amount of time it took me to implement this feature, so I thought it would be good to share it with you so that you can appreciate the work I have done for you. 1

Saturday 11:46 PM - Adding fields to the file_mapping table

The file_mapping table stores information about each PDF a user uploads. I added two fields: password and requires_password. The idea is to store the password in the database along with other document metadata.

Sunday 11:10 AM - When the API receives a protected PDF

I changed the API so that when it receives a protected PDF, it creates a file_mapping record with requires_password set to true and pdf_type set to UNKNOWN. Previously pdf_type could only be TEXT_BASED or IMAGE_BASED. Since we can’t read the file without a password, we can’t analyse it to determine whether it is TEXT_BASED or IMAGE_BASED.

We classify PDFs this way because we run different code paths for TEXT_BASED and IMAGE_BASED PDFs.
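Here is a minimal sketch of that rule, not the site’s actual code. The names NewFileMapping and mappingForUpload are hypothetical, the enum is my guess at how PDFType is defined, and classify stands in for whatever analysis reads the PDF.

```kotlin
// Assumed definition; the post only names the three values.
enum class PDFType { TEXT_BASED, IMAGE_BASED, UNKNOWN }

// Hypothetical slice of a file_mapping row; the real table has more columns.
data class NewFileMapping(
    val uuid: String,
    val requiresPassword: Boolean,
    val pdfType: PDFType
)

/**
 * Builds the record for a fresh upload. [classify] stands in for the analysis
 * that reads the PDF, so it is only invoked when the file is not encrypted.
 */
fun mappingForUpload(uuid: String, isEncrypted: Boolean, classify: () -> PDFType): NewFileMapping =
    if (isEncrypted) {
        // Can't read the file yet, so the type stays UNKNOWN until a password arrives.
        NewFileMapping(uuid, requiresPassword = true, pdfType = PDFType.UNKNOWN)
    } else {
        NewFileMapping(uuid, requiresPassword = false, pdfType = classify())
    }
```

Passing the classifier in as a lambda makes it obvious that an encrypted upload never touches the analysis code.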

Sunday 1:14 PM - Wrote SQL to update file_mapping records

UPDATE file_mapping SET password = ? WHERE uuid = ?;

Sunday 2:55 PM - Changed the SQL and created an API to set a password on a document

UPDATE file_mapping SET password = ?, pdf_type = ?, page_count = ? WHERE uuid = ?;

As I coded the API, I realised I would want to change more fields on the file_mapping after a user successfully sets a password for a document. Once you can read the document, you can classify its pdf_type and count the number of pages it has.

val fileMappings = repository.getFileMappings(body.passwords.map { it.uuid })
val updates = mutableMapOf<String, UpdateFileMapping>()

val results = fileMappings.map { mapping ->
    mapping.validateOwnership(userId, ipAddress)
    val password = body.passwords.first { it.uuid == mapping.uuid }.password
    val file = File(mapping.filename)
    val result = uploadAction.analysePdf(file, password, userId, ipAddress, mapping.uuid, mapping.originalFilename)

    if (result.state != UploadResponse.State.REQUIRES_PASSWORD) {
        updates[mapping.uuid] = UpdateFileMapping(password, result.pdfType.toString(), result.numberOfPages)
    }

    result
}

// Set the passwords
repository.updateFileMappings(updates)

call.respondText(contentType = ContentType.Application.Json) {
    stringify(results)
}

The API accepts document identifiers and passwords. It then tries to open the document with the provided password. It writes correct passwords into the database. It responds by indicating which documents were successfully opened with the provided passwords.
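To make that flow easier to follow without the surrounding framework, here is a stripped-down sketch. Everything in it is hypothetical: PasswordAttempt, OpenState, OpenResult and checkPasswords are names I made up for illustration, and canOpen stands in for really opening the PDF with the password.

```kotlin
// Hypothetical shapes; the post doesn't show the real request or result classes.
data class PasswordAttempt(val uuid: String, val password: String)

enum class OpenState { READY, REQUIRES_PASSWORD }

data class OpenResult(val uuid: String, val state: OpenState)

/**
 * Tries each supplied password and splits the outcome in two:
 * the passwords worth persisting (keyed by document uuid), and a
 * per-document result to send back in the response.
 */
fun checkPasswords(
    attempts: List<PasswordAttempt>,
    canOpen: (uuid: String, password: String) -> Boolean
): Pair<Map<String, String>, List<OpenResult>> {
    val toPersist = mutableMapOf<String, String>()
    val results = attempts.map { attempt ->
        if (canOpen(attempt.uuid, attempt.password)) {
            // Only passwords that actually open the document get stored.
            toPersist[attempt.uuid] = attempt.password
            OpenResult(attempt.uuid, OpenState.READY)
        } else {
            OpenResult(attempt.uuid, OpenState.REQUIRES_PASSWORD)
        }
    }
    return toPersist to results
}
```

A wrong password never reaches the database; the document simply comes back as still requiring one.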

Sunday 3:21 PM - Changed document opening code to use a password when available

val document = if (password == null) Loader.loadPDF(statement) else Loader.loadPDF(statement, password)

If we’ve got a password, use it. This could actually be simpler, since Loader.loadPDF is happy to receive a null value for the password.

Sunday 11:57 PM - Changing all Loader.loadPDF calls to provide a password

Pretty easy to do, but I also wrote a bunch of test cases to verify protected PDFs can go through all the conversion paths. There’s a big time gap because I went over to my old man’s house and had a nap, had dinner and then watched the Hong Kong Film Awards.

Monday 12:30 AM - Fixed a folder deletion bug in DEV

Periodically I run a clean up job to delete user data like PDFs, rendered images and optical character recognition results.

for (directory in directories) {
    for (file in directory.walk()) {
        // delete file
    }
}

If the directory is empty, the code above will delete the directory itself, since walk() yields the starting directory as well as its contents. This doesn’t happen on the production servers because the directories are never empty: people constantly use the app. It happens in DEV though! I fixed this bug with the code below.

fun start(): AccountingProApp {
    tempFileDirectories.forEach {
        if (!it.exists()) {
            it.mkdirs()
        }
    }
    startServerWithRetries()
    cleanup.scheduleExecution()
    imagePdfWorker.scheduleExecution()
    return this
}

On application launch, if the directory does not exist, the code will create it. As I write this post I realise this fix isn’t great. The empty folders can still be deleted, and to get them back again I need to bounce the server. The dev server bounces quite a lot because it re-deploys every time I commit code. I reckon a better fix would be to not delete folders in the clean up job.
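That better fix could look something like the sketch below. It is not the real clean up job: deleteOldFiles is a hypothetical name, and the age cutoff is my assumption about how the job decides what to delete.

```kotlin
import java.io.File

/**
 * Deletes regular files under each directory that were last modified before
 * [cutoffMillis], leaving every directory (including empty ones) in place.
 * Returns the number of files deleted.
 */
fun deleteOldFiles(directories: List<File>, cutoffMillis: Long): Int {
    var deleted = 0
    for (directory in directories) {
        // walk() yields the starting directory and every subdirectory as well
        // as the files, so filter down to regular files before deleting.
        for (file in directory.walk().filter { it.isFile }) {
            if (file.lastModified() < cutoffMillis && file.delete()) {
                deleted++
            }
        }
    }
    return deleted
}
```

With the directories filtered out, the job can never remove a folder, so the mkdirs() check at startup becomes a safety net rather than the fix.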

Monday 2:24 AM - UI to allow setting a password for a document

<Input onChange={onChange} value={value} placeholder="Password"/>
<Button
  onClick={onClick}
  css={{ marginLeft: '0.25em' }}
  size="compact"
  emphasis="primary">
  Submit
</Button>

Provide an input element when an uploaded PDF is protected. Hit the API when the submit button is pressed. Reconcile the API response with the local state in the browser. It was all very fiddly, especially since I’m not very good at writing front end code.

It doesn’t look so good: the input boxes are too large. I don’t like writing CSS, and the functionality worked, so I stopped there.

Monday 2:32 AM - Fixed a logic bug in the API

I noticed that text-based PDFs were being rendered and OCRed. That isn’t correct, so I fixed the bug. I pushed the code into production and went to sleep.

Monday 11:13 AM - UI finishing touches

I go into my office, check my Grafana dashboard and see that someone successfully used the set password feature. Nice!

I add loading indicators and make the Input element smaller.

<Input
  size="compact"
  onChange={onChange}
  value={value}
  placeholder="Password"
  disabled={disabled}
/>
<Button
  onClick={onClick}
  css={{ marginLeft: '0.25em' }}
  size="compact"
  emphasis="primary"
  disabled={disabled}
  decoratorLeft={decoratorLeft}
>
  Submit
</Button>

Monday 11:38 AM - Fixed another API bug

// Before
val state = if (it.uuid in inProgressUuids) UploadResponse.State.PROCESSING else UploadResponse.State.READY

// After
val state = when {
    it.uuid in inProgressUuids -> UploadResponse.State.PROCESSING
    it.pdfType == PDFType.UNKNOWN -> UploadResponse.State.REQUIRES_PASSWORD
    else -> UploadResponse.State.READY
}

The old code assumed only two states, READY and PROCESSING. This new feature added the REQUIRES_PASSWORD state.

Conclusion

Here’s a demo of the feature one more time. It’s hard to believe such a simple feature required sleeping twice and watching the Hong Kong Film Awards.

Was all this work worth it? Only time will tell. If you’re reading this in 2032, please contact me and I’ll do some analysis to figure out how much money this feature made: “Sum the revenue from users whose first uploaded file was protected”. If the year is 2033, I’m not going to do the analysis.


  1. Your money
