Summer 2022
Handling Tiered Access Tokens
Covie makes insurance data more accessible. It offers an API and developer tools to collect and monitor insurance policies.
In early 2022, we embarked on building the first version of a user dashboard. We already had customers connecting data who were manually onboarded with the API. With policy data flowing through the system, we needed to ship as soon as possible.
I thought I'd highlight one of the coding challenges I faced during this project. There's also a post on the design of the dashboard.
Tiered Access Tokens
As we're dealing with potentially sensitive personal information, secure access to the dashboard was a must. The backend team engineered a solution of tiered access tokens. To make sense of this, I'll highlight the key types of tokens that exist:
- Session: only used during the authentication process
- User: scoped to account level information associated with the current user
- Account: scoped to application level information associated with that account
- Application: scoped to a single application and used to access most API functionality
- Elevated: reserved for sensitive requests with a short life span
With most OAuth or token-based authentication patterns, you'll usually have an access token with a limited life span (hours) alongside a refresh token with a longer life span (days). When an access token expires, the user will exchange their refresh token for a new token pair. If this renewal fails, they will be booted out and asked to log in again.
As users can belong to multiple accounts, it was important to have this clear separation so that a single access token could not potentially read information from another entity. Similarly, while we don't restrict users to a single application, having that additional layer gives us extra peace of mind and flexibility.
Refreshing Tokens
On the frontend, you can bake standard refresh functionality into your API service layer. Let's assume we're using Axios:
With this in place, if a request fails with a 401 (unauthenticated), we will attempt to grab a new access token using our refresh token and finally retry the original request. If we can't refresh the token, the original request will ultimately fail and the UI should react accordingly.
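Here's a minimal sketch of that refresh-and-retry flow in TypeScript. To keep it self-contained it doesn't import Axios; with Axios, the same logic would typically sit in a response interceptor registered via axios.interceptors.response.use. The ApiResponse, ApiRequest and TokenStore shapes and the requestWithRefresh helper are assumptions made for illustration, and refreshAccessToken is passed in rather than implemented.

```typescript
// Hypothetical shapes for illustration; none of these come from a real library.
interface ApiResponse<T> {
  status: number;
  data?: T;
}

// A request is any function that performs the call with a given access token.
type ApiRequest<T> = (accessToken: string) => Promise<ApiResponse<T>>;

interface TokenStore {
  accessToken: string;
  refreshToken: string;
}

// Exchanges the refresh token for a new token pair; rejects if renewal fails.
type RefreshAccessToken = (refreshToken: string) => Promise<TokenStore>;

async function requestWithRefresh<T>(
  request: ApiRequest<T>,
  store: TokenStore,
  refreshAccessToken: RefreshAccessToken,
): Promise<ApiResponse<T>> {
  const first = await request(store.accessToken);
  if (first.status !== 401) return first;

  // A failed renewal is mapped to null so the original 401 can be surfaced.
  const renewed = await refreshAccessToken(store.refreshToken).catch(() => null);
  if (renewed === null) return first;

  store.accessToken = renewed.accessToken;
  store.refreshToken = renewed.refreshToken;

  // Retry once only, so a persistent 401 cannot loop forever.
  return request(store.accessToken);
}
```

If the renewal fails, the caller receives the original 401 response and can redirect the user to the login screen.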
Refreshing Tiered Tokens
To handle the tiered tokens, we'll need to make some changes to this approach. Only the user access token comes with a refresh token in our case. Both account and application level tokens require their parent's access token in order to regenerate.
Let's say we're down the chain and an application access token has expired. To get a new one, we'll need to use the account access token. Now what happens if the account level token has also expired? We'll need to generate that first before attempting to generate the application access token as originally intended.
To achieve this, we need to replace refreshAccessToken() with something that does the following:
1. If an application token expires, first check if we have a valid account access token
2. If the account access token has expired, attempt to renew it
3. Generate a new application token using the new account token from step 2
4. Retry the original request using the new application token from step 3
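The steps above can be sketched as a small recursive helper. This is an illustration under assumptions rather than the dashboard's actual code: the Tier names, the PARENT map, the Token shape and the injected issue function are all hypothetical, and refresh-token rotation is left out for brevity.

```typescript
// Tiers that can be regenerated, from parent to child.
type Tier = "user" | "account" | "application";

// Each tier's parent; the user tier has none and relies on its refresh token.
const PARENT: Record<Tier, Tier | null> = {
  user: null,
  account: "user",
  application: "account",
};

interface Token {
  value: string;
  expiresAt: number; // epoch milliseconds
}

type TokenSet = Partial<Record<Tier, Token>>;

// Hypothetical backend call: issues a token for `tier` using the parent's
// access token, or the user's refresh token when `tier` is "user".
type IssueToken = (tier: Tier, credential: string) => Promise<Token>;

const isValid = (token: Token | undefined, now: number): token is Token =>
  token !== undefined && token.expiresAt > now;

// Regenerates the token for `tier`, renewing any expired ancestor first.
async function renewToken(
  tier: Tier,
  tokens: TokenSet,
  userRefreshToken: string,
  issue: IssueToken,
  now: number = Date.now(),
): Promise<Token> {
  const parent = PARENT[tier];
  let credential = userRefreshToken;
  if (parent !== null) {
    const held = tokens[parent];
    // Steps 1 and 2: reuse a valid parent token, otherwise renew it.
    const parentToken = isValid(held, now)
      ? held
      : await renewToken(parent, tokens, userRefreshToken, issue, now);
    credential = parentToken.value;
  }
  // Step 3: generate the requested token from its parent's credential.
  const fresh = await issue(tier, credential);
  tokens[tier] = fresh;
  return fresh;
}
```

Step 4 is then a matter of retrying the original request with the value returned by renewToken("application", ...).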
There are some considerations to take into account with this:
- We need to know what level token is being attempted at each stage so we know which ancestor token we need to reference and potentially renew
- We need to take into account what state management solution we're using to avoid race conditions. For example, if another request comes in subsequently, is it using the new tokens that have been generated as part of the refresh attempt?
The state consideration was one we battled with during the first iteration of the dashboard. We'd sometimes run into a scenario where concurrent requests referenced stale tokens and then failed to refresh because the refresh token had already changed.
We were using Zustand persisted storage to store tokens. This basically provides global state and synchronises changes with local storage to persist it. There seemed to be a small delay, or occasional stale data, on requests that happened in quick succession.
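One common way to reduce this kind of race, and not necessarily what the dashboard did, is to make sure overlapping requests share a single in-flight refresh instead of each starting their own. The singleFlight helper below is a hypothetical name for that pattern, sketched in TypeScript.

```typescript
// Wraps an async refresh so that overlapping callers share one in-flight call.
function singleFlight<T>(refresh: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;

  return () => {
    // A refresh is already running: hand back the same promise.
    if (inFlight !== null) return inFlight;

    const started = refresh();
    inFlight = started;

    // Clear the slot once the refresh settles, whether it succeeded or failed,
    // so that a later expiry can trigger a new refresh.
    const clear = () => {
      inFlight = null;
    };
    started.then(clear, clear);

    return started;
  };
}
```

Every request that hits an expired token while a refresh is running then waits on the same promise and picks up the same new token, rather than presenting a refresh token that has already been exchanged.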
Server Side Rendering (SSR)
After monitoring usage patterns since the original design, I identified a range of improvements we could make to the dashboard - many of these impacting its core design, structure and performance. I embarked on a complete redesign and rewrite.
This rebuild also gave me an opportunity to revisit authentication handling from the ground up. I knew at the heart of it I'd need some reliable way to keep tokens in sync and up to date. I also wanted to try being a little more proactive on the refresh approach. When making a successful request, could I proactively refresh the access token behind the scenes if it's close to expiring - avoiding a retry in the near future?
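As a rough illustration of that proactive check, the sketch below decides whether a token is close enough to expiry to renew in the background. The five minute threshold and both function names are assumptions for the example, not values from the dashboard.

```typescript
// Hypothetical threshold: renew when under five minutes of life remain.
const REFRESH_THRESHOLD_MS = 5 * 60 * 1000;

function shouldRefreshProactively(
  expiresAt: number, // epoch milliseconds
  now: number = Date.now(),
  thresholdMs: number = REFRESH_THRESHOLD_MS,
): boolean {
  const remaining = expiresAt - now;
  // Already-expired tokens are left to the normal refresh-and-retry path.
  return remaining > 0 && remaining <= thresholdMs;
}

// Call after a successful request. Returns true if a refresh was started.
function maybeRefreshInBackground(
  expiresAt: number,
  refresh: () => Promise<unknown>,
  now: number = Date.now(),
): boolean {
  if (!shouldRefreshProactively(expiresAt, now)) return false;
  // Fire and forget: the request has already succeeded, so a failed
  // background refresh is ignored and handled on the next 401 instead.
  refresh().catch(() => undefined);
  return true;
}
```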
Leveraging SSR improvements that had been introduced with Next.js 13 and its app directory, I took the following approach:
- All tokens would be stored as cookies with corresponding expiry dates
- When a token expired, it simply wouldn't exist due to it being deleted by the browser
- Use middleware to ensure tokens are refreshed between page loads
- Fetch global and initial page data via SSR
- Use CSR for dynamic data and mutations (with token refresh handling)
The third step would happen server side and synchronously. It would also only refresh tokens that had expired. In the worst-case scenario where all tokens had expired bar the user token, it would result in only two synchronous requests. As middleware runs between each page transition, the only time a client-side request would have to refresh a token is if the user has been inactive on the page for 30 minutes.
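The middleware logic described above might look something like the following sketch. It is framework-agnostic on purpose: the cookie names, the CookieJar shape and the injected regenerate function are assumptions, and the case where the user token itself is missing is simply handed back to the caller.

```typescript
// Order matters: each token is regenerated from the one before it.
const TOKEN_COOKIES = ["user_token", "account_token", "application_token"] as const;
type TokenCookie = (typeof TOKEN_COOKIES)[number];

// Cookies the browser still holds; expired ones are simply absent.
type CookieJar = Partial<Record<TokenCookie, string>>;

// Hypothetical backend call: mints `name` from its parent's token value.
type Regenerate = (name: TokenCookie, parentValue: string) => Promise<string>;

// Fills in missing child tokens one after another. Returns null when the
// user token is missing, leaving that case (refresh or login) to the caller.
async function ensureTokenCookies(
  jar: CookieJar,
  regenerate: Regenerate,
): Promise<CookieJar | null> {
  const user = jar.user_token;
  if (user === undefined) return null;

  const result: CookieJar = { ...jar };
  let parentValue = user;
  for (const name of TOKEN_COOKIES.slice(1)) {
    // Only tokens that no longer exist are regenerated.
    const value = result[name] ?? (await regenerate(name, parentValue));
    result[name] = value;
    parentValue = value;
  }
  return result;
}
```

With only the user token present, this makes exactly two sequential calls: one for the account token and one for the application token.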
I did originally tinker with fetching core data in middleware; however, as it was being run between each page load, it was overkill and resulted in poor performance. Instead, I only used middleware to ensure that the user had a valid set of tokens. Fetching the global data in the root layout via SSR instead meant that it would only run on hard reloads due to caching behaviour.
I had to battle some strange behaviour with how Next.js interacts with cookies in the app directory. Essentially I had to abstract cookie logic and create different initiators depending on the environment they were being accessed from (middleware, server, client). I'm pretty sure this is why Supabase has different initiators when creating instances.
I had never implemented a tiered token approach before building this dashboard. No doubt, had I tinkered with our state solution and written directly to cookies on the initial build, I could have solved the race conditions and avoided flaky requests.
The rebuild resulted in a way more stable and performant experience. It also gave me an effective crash course on Next.js's shiny new app directory, warts and all.
You can read about the design of the dashboard too if you fancy it.