The Hours of Work Behind a Simple Feature
Posted by Angus Cheng
Ever since I started this website in April 2021, I knew some users would upload password-protected PDF files. I figured supporting them would be too annoying, so I put it off. Recently a user asked me to support password-protected PDFs, so I did it.
It’s a simple feature, and it works pretty well. I spent all of yesterday working on it and pushed it into production at 2:32 AM this morning. From the small amount of data I gathered over the last few days, it looks like around 5% of users upload PDFs with passwords. Previously I would reject the upload without even displaying an error message. A lot of those users probably went away and assumed the website didn’t work.
I was surprised and annoyed by how long this feature took to implement, so I thought I’d share the timeline with you so that you can appreciate the work I have done for you.[1]
Saturday 11:46 PM - Adding fields to the file_mapping table
The file_mapping table stores information about each PDF a user uploads. I added two fields, password and requires_password. The idea is to store the password in the database along with the other document metadata.
Sunday 11:10 AM - When the API receives a protected PDF
I changed the API so that, when it receives a protected PDF, it creates a file_mapping record with requires_password set to true and pdf_type set to UNKNOWN. Previously pdf_type could only be TEXT_BASED or IMAGE_BASED. Since we can’t read the file without a password, we can’t analyse it to determine which of the two it is.
That classification matters because we run different code paths for TEXT_BASED and IMAGE_BASED PDFs.
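A minimal Kotlin sketch of the idea, for illustration only: FileMappingRow, OpenAttempt, newFileMapping and pipelineFor are hypothetical names, not the site’s real code, and the pipeline labels are made up. It assumes the upload handler can tell when a file can’t be opened without a password (if the site uses Apache PDFBox, as the Loader.loadPDF calls suggest, that typically shows up as an InvalidPasswordException), and shows how UNKNOWN keeps such a document out of both processing paths.

```kotlin
// Hypothetical mirror of the pdf_type column.
enum class PdfType { TEXT_BASED, IMAGE_BASED, UNKNOWN }

// Hypothetical mirror of a file_mapping row, including the two new fields.
data class FileMappingRow(
    val uuid: String,
    val pdfType: PdfType,
    val pageCount: Int?,           // unknown until the file can be opened
    val requiresPassword: Boolean,
    val password: String? = null   // filled in later, once the user supplies it
)

// Assumed outcome of trying to open the uploaded file without a password.
sealed interface OpenAttempt {
    object NeedsPassword : OpenAttempt
    data class Opened(val hasTextLayer: Boolean, val pageCount: Int) : OpenAttempt
}

fun newFileMapping(uuid: String, attempt: OpenAttempt): FileMappingRow = when (attempt) {
    // Can't read the file, so we can't classify it or count its pages yet.
    is OpenAttempt.NeedsPassword ->
        FileMappingRow(uuid, PdfType.UNKNOWN, null, requiresPassword = true)
    is OpenAttempt.Opened -> FileMappingRow(
        uuid,
        if (attempt.hasTextLayer) PdfType.TEXT_BASED else PdfType.IMAGE_BASED,
        attempt.pageCount,
        requiresPassword = false
    )
}

// pdf_type decides which code path a document takes (labels are illustrative).
fun pipelineFor(row: FileMappingRow): String = when (row.pdfType) {
    PdfType.TEXT_BASED -> "extract text"
    PdfType.IMAGE_BASED -> "render and OCR"
    PdfType.UNKNOWN -> "wait for password"
}
```

Making pdfType an enum and matching on it with an exhaustive when means the compiler points out every place that needs a decision once UNKNOWN is added.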
Sunday 1:14 PM - Wrote SQL to update file_mapping records
UPDATE file_mapping SET password = ? WHERE uuid = ?;
Sunday 2:55 PM - Changed the SQL and created an API to set a password on a document
UPDATE file_mapping SET password = ?, pdf_type = ?, page_count = ? WHERE uuid = ?;
As I coded the API, I realised I would want to update more fields on the file_mapping record after a user successfully sets a password for a document. Once you can read the document, you can classify its pdf_type and count its pages.
val fileMappings = repository.getFileMappings(body.passwords.map { it.uuid })
val updates = mutableMapOf<String, UpdateFileMapping>()
val results = fileMappings.map { mapping ->
    // Check that this user is allowed to touch the document
    mapping.validateOwnership(userId, ipAddress)
    val password = body.passwords.first { it.uuid == mapping.uuid }.password
    val file = File(mapping.filename)
    // Try to open and classify the PDF with the submitted password
    val result = uploadAction.analysePdf(file, password, userId, ipAddress, mapping.uuid, mapping.originalFilename)
    // Only record the password, PDF type and page count if the password opened the document
    if (result.state != UploadResponse.State.REQUIRES_PASSWORD) {
        updates[mapping.uuid] = UpdateFileMapping(password, result.pdfType.toString(), result.numberOfPages)
    }
    result
}
// Set the passwords (along with pdf_type and page_count)
repository.updateFileMappings(updates)
// Tell the browser which documents were unlocked
call.respondText(contentType = ContentType.Application.Json) {
    stringify(results)
}
The API accepts a list of document identifiers and passwords. For each document it tries to open the file with the provided password, writes the correct passwords (along with the newly detected pdf_type and page count) into the database, and responds by indicating which documents were successfully opened.
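The loop above can be sketched as a pure function, which makes it easy to test without a database or real PDFs. Everything here is hypothetical: Mapping, PasswordSubmission, Update and SetPasswordResult are simplified stand-ins for the post’s types, the ownership check is reduced to comparing user IDs, and tryOpen stands in for uploadAction.analysePdf, returning the page count or null when the password is wrong.

```kotlin
// Hypothetical, simplified stand-ins for the post's types.
data class Mapping(val uuid: String, val ownerId: String)
data class PasswordSubmission(val uuid: String, val password: String)
data class Update(val password: String, val pageCount: Int)
data class SetPasswordResult(val uuid: String, val unlocked: Boolean)

/**
 * Tries each submitted password and collects updates only for documents that opened.
 * `tryOpen` is an assumed stand-in for the real PDF analysis: page count, or null on failure.
 */
fun setPasswords(
    userId: String,
    mappings: List<Mapping>,
    submissions: List<PasswordSubmission>,
    tryOpen: (uuid: String, password: String) -> Int?
): Pair<Map<String, Update>, List<SetPasswordResult>> {
    // Index submissions once instead of scanning the list for every mapping.
    val passwordByUuid = submissions.associate { it.uuid to it.password }
    val updates = mutableMapOf<String, Update>()
    val results = mappings.map { mapping ->
        require(mapping.ownerId == userId) { "document ${mapping.uuid} belongs to another user" }
        val password = passwordByUuid.getValue(mapping.uuid)
        val pages = tryOpen(mapping.uuid, password)
        // Wrong passwords are reported back but never written to the database.
        if (pages != null) updates[mapping.uuid] = Update(password, pages)
        SetPasswordResult(mapping.uuid, unlocked = pages != null)
    }
    return updates to results
}
```

Using associate avoids rescanning the submitted passwords for every mapping, while getValue still fails loudly if a UUID is missing, much like first does in the original.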
Sunday 3:21 PM - Changed document opening code to use a password when available
val document = if (password == null) Loader.loadPDF(statement) else Loader.loadPDF(statement, password)
If we’ve got a password, use it. This could actually be simpler: Loader.loadPDF is happy to receive null for the password, so the branch isn’t needed.
Sunday 11:57 PM - Changing all Loader.loadPDF calls to provide a password
Pretty easy to do, but I also wrote a bunch of test cases to verify protected PDFs can go through all the conversion paths. There’s a big time gap because I went over to my old man’s house and had a nap, had dinner and then watched the Hong Kong Film Awards.
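One way to structure tests like these is table-driven: run every conversion path against the same protected file and list the paths that fail. This is a sketch with made-up stand-ins (FakePdf, load and the three path names) rather than PDFBox itself; the fake load mirrors the behaviour described above, where a null password only works for unprotected files.

```kotlin
// Hypothetical stand-in for a PDF on disk; requiredPassword is null for unprotected files.
class FakePdf(val requiredPassword: String?, val pageCount: Int)

class WrongPasswordException : Exception("wrong or missing password")

// Stand-in for Loader.loadPDF: a null password only works for unprotected files.
fun load(pdf: FakePdf, password: String?): FakePdf {
    if (pdf.requiredPassword != null && pdf.requiredPassword != password) {
        throw WrongPasswordException()
    }
    return pdf
}

// Hypothetical conversion paths; each must accept the password and pass it on to load().
val conversionPaths: Map<String, (FakePdf, String?) -> Int> = mapOf(
    "text extraction" to { pdf: FakePdf, password: String? -> load(pdf, password).pageCount },
    "page rendering" to { pdf: FakePdf, password: String? -> load(pdf, password).pageCount },
    "OCR" to { pdf: FakePdf, password: String? -> load(pdf, password).pageCount }
)

// Runs every path against one file and returns the names of the paths that threw.
fun failingPaths(pdf: FakePdf, password: String?): List<String> =
    conversionPaths
        .filter { (_, convert) -> runCatching { convert(pdf, password) }.isFailure }
        .keys
        .toList()
```

Adding a new conversion path then means adding one entry to the table, and every existing protected-PDF test covers it automatically.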
Monday 12:30 AM - Fixed a folder deletion bug in DEV
Periodically I run a clean-up job to delete user data like PDFs, rendered images and optical character recognition (OCR) results.
for (directory in directories) {
    for (file in directory.walk()) {
        // delete file
    }
}
If a directory is empty, the code above deletes the directory itself, because File.walk() yields the starting directory as well as everything inside it. This doesn’t happen on the production servers because people constantly use the app, so the directories are never empty. It does happen in DEV, though! I fixed this bug with the code below.
fun start(): AccountingProApp {
    // Recreate any temp directories that are missing (e.g. removed by the clean-up job)
    tempFileDirectories.forEach {
        if (!it.exists()) {
            it.mkdirs()
        }
    }
    startServerWithRetries()
    cleanup.scheduleExecution()
    imagePdfWorker.scheduleExecution()
    return this
}
On application launch, the code creates any directory that does not exist. As I write this post I realise this fix isn’t great. The empty folders can still be deleted, and to get them back I need to bounce the server. The dev server bounces quite a lot because it re-deploys every time I commit code. I reckon a better fix would be to not delete folders in the clean-up job at all.
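A sketch of that better fix, assuming the job decides what to delete by file age (the post doesn’t say how it picks files, so maxAgeMillis and cleanUp are hypothetical). Because File.walk() yields the starting directory along with everything inside it, filtering to regular files is enough to leave every folder in place.

```kotlin
import java.io.File

/**
 * Deletes regular files older than maxAgeMillis under each directory and returns how many
 * were deleted. Directories (including the starting one that walk() yields) are never touched.
 */
fun cleanUp(directories: List<File>, maxAgeMillis: Long, now: Long = System.currentTimeMillis()): Int {
    var deleted = 0
    for (directory in directories) {
        directory.walk()
            .filter { it.isFile }                              // skip folders entirely
            .filter { now - it.lastModified() > maxAgeMillis } // only stale files
            .forEach { if (it.delete()) deleted++ }
    }
    return deleted
}
```

With this version the folders survive even when DEV is idle, so the mkdirs() call on launch becomes a safety net rather than the fix.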
Monday 2:24 AM - UI to allow setting a password for a document
<Input onChange={onChange} value={value} placeholder="Password"/>
<Button
    onClick={onClick}
    css={{ marginLeft: '0.25em' }}
    size="compact"
    emphasis="primary">
    Submit
</Button>
Provide an input element when an uploaded PDF is protected. Hit the API when the submit button is pressed. Reconcile the API response with the local state in the browser. It was all very fiddly, especially since I’m not very good at writing front-end code.
It doesn’t look great: the input boxes are too large. I don’t like writing CSS, and the functionality worked, so I stopped there.
Monday 2:32 AM - Fixed a logic bug in the API
I noticed that text-based PDFs were being rendered and OCRed. That isn’t correct, so I fixed the bug. Then I pushed the code into production and went to sleep.
Monday 11:13 AM - UI finishing touches
I go into my office, check my Grafana dashboard and see that someone successfully used the set password feature. Nice!
I add loading indicators and make the Input element smaller.
<Input
    size="compact"
    onChange={onChange}
    value={value}
    placeholder="Password"
    disabled={disabled}
/>
<Button
    onClick={onClick}
    css={{ marginLeft: '0.25em' }}
    size="compact"
    emphasis="primary"
    disabled={disabled}
    decoratorLeft={decoratorLeft}
>
    Submit
</Button>
Monday 11:38 AM - Fixed another API bug
// Before
val state = if (it.uuid in inProgressUuids) UploadResponse.State.PROCESSING else UploadResponse.State.READY
// After
val state = when {
    it.uuid in inProgressUuids -> UploadResponse.State.PROCESSING
    it.pdfType == PDFType.UNKNOWN -> UploadResponse.State.REQUIRES_PASSWORD
    else -> UploadResponse.State.READY
}
The old code assumed only two states, READY and PROCESSING. This feature adds a third, REQUIRES_PASSWORD.
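A small variation, sketched with simplified hypothetical types (Doc, PdfType and State stand in for the real ones): matching on pdfType with a when that has no else branch means the compiler flags this spot if another PDF type is added later. It returns the same states in the same precedence order as the new code: in-progress documents first, then documents that still need a password.

```kotlin
// Hypothetical, simplified stand-ins for PDFType and UploadResponse.State.
enum class PdfType { TEXT_BASED, IMAGE_BASED, UNKNOWN }
enum class State { PROCESSING, REQUIRES_PASSWORD, READY }
data class Doc(val uuid: String, val pdfType: PdfType)

fun stateOf(doc: Doc, inProgressUuids: Set<String>): State =
    if (doc.uuid in inProgressUuids) {
        State.PROCESSING
    } else {
        // No else branch: a new PdfType constant becomes a compile error here.
        when (doc.pdfType) {
            PdfType.UNKNOWN -> State.REQUIRES_PASSWORD
            PdfType.TEXT_BASED, PdfType.IMAGE_BASED -> State.READY
        }
    }
```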
Conclusion
Here’s a demo of the feature one more time. It’s hard to believe such a simple feature required sleeping twice and watching the Hong Kong Film Awards.
Was all this work worth it? Only time will tell. If you’re reading this in 2032, please contact me and I’ll do some analysis to figure out how much money this feature made: “Sum the revenue from users whose first uploaded file was protected.” If the year is 2033, I’m not going to do the analysis.
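For the record, the 2032 query might look something like this in Kotlin. The Upload and Payment records, and storing amounts in cents, are assumptions; the post doesn’t describe the schema. A user counts if their earliest upload required a password.

```kotlin
import java.time.Instant

// Hypothetical event records; the real schema isn't described in the post.
data class Upload(val userId: String, val at: Instant, val requiresPassword: Boolean)
data class Payment(val userId: String, val amountCents: Long)

// "Sum the revenue from users whose first uploaded file was protected."
fun revenueFromProtectedFirstUploads(uploads: List<Upload>, payments: List<Payment>): Long {
    val qualifyingUsers = uploads
        .groupBy { it.userId }
        // Keep users whose earliest upload needed a password.
        .filterValues { userUploads -> userUploads.minByOrNull { it.at }?.requiresPassword == true }
        .keys
    return payments.filter { it.userId in qualifyingUsers }.sumOf { it.amountCents }
}
```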
[1] Your money