Handling Library Version Upgrades
- #Product Development
- #OSS
- #Productivity
If you develop, operate, and maintain a web service, you need to keep upgrading the libraries you depend on. Our team batches library upgrades on a regular cadence.
This post covers a recent round of that upgrade work on our product.
Why we upgrade
In my view, library upgrades serve two major purposes: addressing vulnerabilities and adopting new features.
Addressing vulnerabilities
When a vulnerability is discovered, the fix usually ships in a new release. If you are several versions behind, you may first have to work through the intervening upgrades, so you risk being unable to apply the patch right away.
You also need to keep track of each library’s support window.
Once a version falls out of support, newly discovered vulnerabilities in it may never be fixed, which is unacceptable from a security standpoint.
That is why I believe it is important to stay within the supported version range.
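Keeping track of support windows is easy to automate. The following is a minimal sketch, assuming you maintain a small table of end-of-support months yourself. The Django entries reflect its published LTS roadmap as we understand it (3.2 until April 2024, 4.2 until April 2026), so double-check the official download page before relying on them. The function name and the 180-day warning threshold are illustrative choices, not part of any library.

```python
from datetime import date

# End-of-support months (year, month), maintained by hand.
# Django values are based on its published roadmap; verify before use.
END_OF_SUPPORT = {
    "3.2": (2024, 4),
    "4.2": (2026, 4),
}


def support_status(version: str, today: date, warn_days: int = 180) -> str:
    """Classify a release series as 'unsupported', 'ending-soon', or 'supported'.

    The first day of the end-of-support month is treated as the cutoff,
    which is a deliberately conservative assumption.
    """
    year, month = END_OF_SUPPORT[version]
    cutoff = date(year, month, 1)
    if today >= cutoff:
        return "unsupported"
    if (cutoff - today).days <= warn_days:
        return "ending-soon"
    return "supported"


if __name__ == "__main__":
    for series in END_OF_SUPPORT:
        print(series, support_status(series, date.today()))
```

Running something like this on a schedule turns "keep track of support windows" into a reminder that arrives months before a version drops out of support.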
Adopting new features
New releases, especially major version bumps, tend to ship new functionality, whether features that make development easier or improvements that boost performance.
New capabilities also keep motivation high for those of us building with the library, which I consider another benefit.
For those reasons we try to use recent versions to keep development productive.
Our upgrade workflow
Here is the workflow we follow in the product we are building.
- Review the release notes
We use Django, a Python web framework. Whenever a new version comes out, the official site publishes release notes.
For example, the release notes for Django 4.2, an LTS release published in April 2023, are here:
https://docs.djangoproject.com/en/4.2/releases/4.2/
We read the notes to identify areas that might affect our system.
Deprecations deserve particular attention. Classes and methods marked as deprecated are usually removed in a later release, so we try to switch to the recommended replacement as soon as something is deprecated.
We take the same approach with other libraries: if release notes are available, we review them and assess the impact.
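Release notes are not the only way to find deprecated calls. Django's upgrade guide suggests running the test suite with Python warnings enabled (`python -Wa manage.py test`) so deprecation warnings become visible. The sketch below shows the underlying standard-library mechanism: it records the `DeprecationWarning`s (and `PendingDeprecationWarning`s) raised while calling some code. `old_api` and `new_api` are hypothetical stand-ins for a deprecated function and its replacement.

```python
import warnings


def new_api() -> str:
    """Hypothetical replacement API."""
    return "result"


def old_api() -> str:
    """Hypothetical deprecated API: still works, but emits a warning."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()


def collect_deprecations(func, *args, **kwargs):
    """Call func and return (result, deprecation messages it triggered)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" prevents repeated warnings from being deduplicated away.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        # Some projects use PendingDeprecationWarning for removals that are
        # further out; it is not a subclass of DeprecationWarning.
        if issubclass(w.category, (DeprecationWarning, PendingDeprecationWarning))
    ]
    return result, messages


if __name__ == "__main__":
    _, found = collect_deprecations(old_api)
    for message in found:
        print("DEPRECATED:", message)
```

To make deprecations fail loudly instead of just printing, the same filter machinery can escalate them to errors, for example with `python -W error::DeprecationWarning`.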
- Upgrade locally and verify behavior
After we understand the impact and fix the deprecated pieces, we run the application.
We start by verifying behavior on our own machines. This is where we identify and fix code that no longer works or behaves unexpectedly.
Upgrading web application frameworks and front-end libraries can touch a wide surface area, so we take our time testing.
We also run the test suite locally. Tests can fail because of the upgrades, and running them helps reveal impacted areas. When tests break, we review and fix them.
Having a solid test suite really pays off during upgrades. Writing tests regularly keeps the system resilient to change.
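One general technique for making a test suite flag behavior changes introduced by an upgrade is a snapshot (golden-output) test: record the output of important functions before upgrading, then assert it is unchanged afterwards. This is a sketch of that technique, not necessarily how our own suite is written; `render_invoice_summary` and the snapshot data are hypothetical.

```python
import json
import unittest


def render_invoice_summary(items):
    """Hypothetical application function whose output we want to pin down."""
    total = sum(price * qty for _, price, qty in items)
    return {"count": len(items), "total": total}


# Hypothetical snapshot recorded before the upgrade. In a real project this
# would usually live in a file committed to the repository.
SNAPSHOTS = {
    "invoice_summary": json.dumps({"count": 2, "total": 350}, sort_keys=True),
}


class UpgradeSnapshotTest(unittest.TestCase):
    def assertMatchesSnapshot(self, name, value):
        # Serialize deterministically so dict key order cannot cause false failures.
        actual = json.dumps(value, sort_keys=True)
        self.assertEqual(SNAPSHOTS[name], actual, f"snapshot {name!r} changed")

    def test_invoice_summary_unchanged(self):
        items = [("book", 100, 2), ("pen", 50, 3)]
        self.assertMatchesSnapshot("invoice_summary", render_invoice_summary(items))
```

Run it with `python -m unittest` before and after the upgrade; a failure points directly at output that changed.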
- Verify on infrastructure that matches production
Once the local upgrade and fixes are complete, we test on infrastructure that mirrors production.
This environment runs on AWS, just like production.
It is common to see issues in production-like environments that never appear locally. That is why we always test in an environment that matches production as closely as possible.
We use Docker locally and run containers on AWS Fargate in production to keep the two environments aligned, but we still run into bugs that only surface on AWS. For that reason, we give verification in the production-like environment even more weight than local testing.
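A quick automated smoke check right after deploying to the production-like environment can catch obvious breakage, such as pages returning 500, before anyone starts manual verification. This is a minimal standard-library sketch; the staging URL and paths are hypothetical placeholders, and it complements rather than replaces hands-on testing.

```python
import urllib.error
import urllib.request

# Hypothetical staging URL and paths; replace with your own environment.
STAGING_BASE_URL = "https://staging.example.com"
SMOKE_PATHS = ["/", "/login/", "/healthz"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to url.

    Redirects are followed by urlopen; connection failures (URLError)
    are not caught and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx, but the status code is still useful.
        return error.code


def smoke_check(base_url, paths, fetch=fetch_status):
    """Return (path, status) pairs for responses outside the 2xx/3xx range."""
    failures = []
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if not 200 <= status < 400:
            failures.append((path, status))
    return failures
```

Calling `smoke_check(STAGING_BASE_URL, SMOKE_PATHS)` from a deploy script and failing when the list is non-empty gives a fast first signal; the `fetch` parameter exists so the check can be tested without network access.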
- Deploy to production after all verifications pass
Examples of issues we encountered this time
Here are two concrete issues we ran into during the upgrade.
HTML changes not updating locally
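After the Django upgrade, changes to HTML templates no longer showed up in the local environment. Starting with Django 4.1, the cached template loader is enabled by default even in development (when DEBUG is True), as long as OPTIONS['loaders'] is not specified.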
https://docs.djangoproject.com/en/4.1/ref/templates/api/#django.template.loaders.cached.Loader
To fix this, we disabled template caching for the local environment only.
Because we upgraded Django from 3.2 to 4.2, we also had to read the release notes for every version in between; this particular change was introduced in 4.1.
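Disabling the cache locally can be sketched as a `settings.py` fragment. This is a minimal sketch, assuming the local environment is detected through a hypothetical `APP_ENV` environment variable; it is not necessarily our exact configuration. It relies on the documented behavior that an explicit `OPTIONS["loaders"]` list replaces Django's default (cached) loaders, while production keeps caching by wrapping the same loaders in `django.template.loaders.cached.Loader`.

```python
import os
from pathlib import Path

# Hypothetical switch: how you detect the local environment (an env var,
# a separate settings module, ...) depends on your project.
IS_LOCAL = os.environ.get("APP_ENV", "local") == "local"

BASE_LOADERS = [
    "django.template.loaders.filesystem.Loader",
    "django.template.loaders.app_directories.Loader",
]


def build_templates(base_dir: Path, is_local: bool) -> list:
    """Build a TEMPLATES setting with caching disabled only for local use."""
    if is_local:
        # Plain loaders: template files are read from disk on every render.
        loaders = list(BASE_LOADERS)
    else:
        # Production: wrap the same loaders in the cached loader.
        loaders = [("django.template.loaders.cached.Loader", list(BASE_LOADERS))]
    return [
        {
            "BACKEND": "django.template.backends.django.DjangoTemplates",
            "DIRS": [base_dir / "templates"],
            # APP_DIRS must not be True when OPTIONS["loaders"] is set,
            # so it is omitted here.
            "OPTIONS": {
                "loaders": loaders,
                # Trimmed list; keep whatever context processors you already use.
                "context_processors": [
                    "django.template.context_processors.request",
                    "django.contrib.auth.context_processors.auth",
                    "django.contrib.messages.context_processors.messages",
                ],
            },
        }
    ]


# In a real settings.py the base directory is usually derived from __file__.
TEMPLATES = build_templates(Path(".").resolve(), IS_LOCAL)
```

Keeping the cached loader in production matters: re-reading and recompiling templates on every request is only acceptable while developing.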
Stack traces missing from server error logs
We also noticed that stack traces for 500-level server errors stopped appearing in the logs.
We closely monitor server errors to maintain quality. When they occur, we receive alerts, investigate the root cause, and remediate.
Django provides a built-in error page for server errors, and we extend that in our service.
After the upgrade, our customizations ended up swallowing the stack trace for server errors, so it never reached the logs. We fixed this by explicitly logging the stack trace at error level.
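The key detail is logging the exception together with its traceback. The following is a plain-Python sketch of that pattern using the standard `logging` module; `handle`, `render_error_page`, and the logger name are hypothetical, and in a Django project the equivalent call would sit wherever the error page is customized (for example a custom `handler500` view or a middleware's `process_exception`).

```python
import logging

logger = logging.getLogger("app.errors")  # hypothetical logger name


def render_error_page():
    """Hypothetical stand-in for rendering a customized 500 page."""
    return 500, "<h1>Something went wrong</h1>"


def handle(view, *args):
    """Run a view-like callable, turning uncaught errors into a custom 500 page."""
    try:
        return view(*args)
    except Exception:
        # logger.exception logs at ERROR level and attaches the active
        # traceback. A plain logger.error("...") here would drop the stack
        # trace, which is exactly the failure mode described above.
        logger.exception("Unhandled error while handling request")
        return render_error_page()
```

Outside an `except` block, the same effect is available with `logger.error(msg, exc_info=exception)`, passing the exception instance explicitly.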
As you can see, upgrading impactful libraries can trigger issues in unexpected places. That is why I believe multi-faceted testing is essential.
Closing thoughts
This was a look at how we handle library upgrades.
Web development almost always involves external libraries. If you rely on them, you have to upgrade them intentionally and stay within the supported window to reduce incident risk.
Many of us still remember the Log4j vulnerability from December 2021. Keeping your dependencies up to date makes it easier to apply patches quickly. It is a strong reminder of how important it is to stay on top of upgrades.