Summer 2022

Handling Tiered Access Tokens

Covie makes insurance data more accessible. It offers an API and developer tools to collect and monitor insurance policies.

In early 2022, we embarked on building the first version of a user dashboard. We already had customers who had been manually onboarded and were connecting data via the API. With policy data flowing through the system, we needed to ship as soon as possible.

I thought I'd highlight one of the coding challenges I faced during this project. There's also a separate post on the design of the dashboard.

Tiered Access Tokens

As we're dealing with potentially sensitive personal information, secure access to the dashboard was a must. The backend team engineered a solution built around tiered access tokens. To make sense of it, I'll highlight the key types of tokens that exist:

  • Session: only used during the authentication process
  • User: scoped to account level information associated with the current user
  • Account: scoped to application level information associated with that account
  • Application: scoped to a single application and used to access most API functionality
  • Elevated: reserved for sensitive requests with a short life span

With most OAuth or token-based authentication patterns, you'll usually have an access token with a limited life span (hours) alongside a refresh token with a longer life span (days). When an access token expires, the user will exchange their refresh token for a new token pair. If this renewal fails, they will be booted out and asked to log in again.

As users can belong to multiple accounts, it was important to have this clear separation so that a single access token could not potentially read information from another entity. Similarly, while we don't restrict users to a single application, having that additional layer gives us extra peace of mind and flexibility.

Refreshing Tokens

On the frontend, you can bake standard refresh functionality into your API service layer. Let's assume we're using Axios:

const axios = require("axios")
 
// Define the base API service
const api = axios.create({
	baseURL: process.env.NEXT_PUBLIC_API_URL,
})
 
// Modify default request logic
api.interceptors.request.use((config) => {
	// Get access token
	const token = JSON.parse(localStorage.getItem("access_token"))
 
	// Append token
	config.headers.Authorization = `Bearer ${token}`
 
	// Return the updated config
	return config
})
 
// Modify default response logic
api.interceptors.response.use(
	// Successful response
	function (response) {
		return response
	},
 
	// Failed response
	async function (error) {
		let originalRequest = error.config
 
		// Have we failed as a result of a rejected token?
		if (error?.response?.status === 401 && !originalRequest?._retry) {
			// Generate new access token with your refresh token
			const hasTokenRefreshed = await refreshAccessToken()
 
			// Retry with new token
			if (hasTokenRefreshed) {
				// Set the retry flag so we don't attempt this again
				originalRequest._retry = true
 
				// Return the API service
				return api(originalRequest)
			}
		}
 
		// Reject by default
		return Promise.reject(error)
	}
)

Using this code, if a request fails with a 401 (unauthorized), we will attempt to grab a new access token using our refresh token and then retry the original request. If we can't refresh the token, the original request will ultimately fail and the UI should react accordingly.
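
The refreshAccessToken() helper isn't shown above. A minimal sketch, assuming a hypothetical /auth/refresh endpoint that exchanges the refresh token for a new pair, might look like this:

// Minimal sketch - the endpoint and storage keys are placeholders
async function refreshAccessToken() {
	try {
		// Grab the long-lived refresh token
		const refreshToken = JSON.parse(localStorage.getItem("refresh_token"))

		// Exchange it for a new token pair (hypothetical endpoint)
		// Note: uses the bare axios instance so a failure here doesn't
		// re-enter the interceptor above
		const { data } = await axios.post(
			`${process.env.NEXT_PUBLIC_API_URL}/auth/refresh`,
			{ refresh_token: refreshToken }
		)

		// Persist the new pair for subsequent requests
		localStorage.setItem("access_token", JSON.stringify(data.access_token))
		localStorage.setItem("refresh_token", JSON.stringify(data.refresh_token))

		return true
	} catch {
		// Renewal failed - the original request will be rejected and the
		// UI can send the user back to login
		return false
	}
}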

Refreshing Tiered Tokens

To handle the tiered tokens, we'll need to make some changes to this approach. Only the user access token comes with a refresh token in our case. Both account and application level tokens require their parent's access token in order to regenerate.

Let's say we're down the chain and an application access token has expired. To get a new one, we'll need to use the account access token. Now what happens if the account level token has also expired? We'll need to generate that first before attempting to generate the application access token as originally intended.

To achieve this, we need to replace refreshAccessToken() with something that does the following:

  1. If an application token expires, first check if we have a valid account access token
  2. If the account access token has expired, attempt to renew it
  3. Generate a new application token using the new account token from step 2
  4. Retry the original request using the new application token from step 3

There are some considerations to take into account with this:

  • We need to know what level of token is being requested at each stage, so we know which ancestor token we need to reference and potentially renew
  • We need to take into account what state management solution we're using to avoid race conditions. For example, if another request comes in subsequently, is it using the new tokens that have been generated as part of the refresh attempt?

With the scope of the failing request known, the refresh step inside the failure handler becomes a switch:
// Generate ancestor tokens based on requested scope
// We introduce the scope property to determine which token to use / refresh
let hasTokenRefreshed = false
switch (originalRequest.headers.scope) {
	case "session":
	case "user":
		hasTokenRefreshed = await refreshUserAccessToken()
		break
	case "account":
		hasTokenRefreshed = await generateAccountAccessToken()
		break
	case "application":
		hasTokenRefreshed = await generateApplicationAccessToken()
		break
}
 
// Re-attempt original request with new token
if (hasTokenRefreshed) {
	// Set the retry flag so we don't attempt this again
	originalRequest._retry = true
 
	// Return the API service
	return api(originalRequest)
}
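
None of the generate helpers are shown above, so here's a rough sketch of what generateApplicationAccessToken() could look like, following steps 1-3 from the list above. The endpoint path, storage keys and the isExpired() helper are placeholders rather than our actual API:

// Illustrative sketch - endpoint, storage keys and isExpired() are placeholders
async function generateApplicationAccessToken() {
	// 1. Check we have a valid account access token to generate from
	let accountToken = JSON.parse(localStorage.getItem("account_token"))

	// 2. If the account access token has expired, attempt to renew it first
	if (!accountToken || isExpired(accountToken)) {
		const hasAccountToken = await generateAccountAccessToken()

		if (!hasAccountToken) return false

		accountToken = JSON.parse(localStorage.getItem("account_token"))
	}

	try {
		// 3. Generate a new application token using the account token
		const { data } = await axios.post(
			`${process.env.NEXT_PUBLIC_API_URL}/tokens/application`,
			{},
			{ headers: { Authorization: `Bearer ${accountToken}` } }
		)

		localStorage.setItem("application_token", JSON.stringify(data.access_token))

		return true
	} catch {
		return false
	}
}

Step 4 - retrying the original request - is handled by the interceptor once this returns true. generateAccountAccessToken() would follow the same shape one level up, leaning on refreshUserAccessToken() if the user token has also lapsed.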

The state consideration was one we battled with during the first iteration of the dashboard. We'd sometimes run into a scenario where asynchronous requests referenced stale tokens and then failed to refresh because the refresh token had already changed.

We were using Zustand's persisted storage to hold tokens. This essentially provides global state and synchronises changes with local storage so they survive reloads. It seemed there was a small delay, or occasional stale data being returned, when requests happened in quick succession.
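
For context, a token store built with Zustand's persist middleware looks roughly like this (a simplified sketch, not the exact store we shipped):

import { create } from "zustand"
import { persist } from "zustand/middleware"

// Simplified sketch of a persisted token store
const useAuthStore = create(
	persist(
		(set) => ({
			userToken: null,
			accountToken: null,
			applicationToken: null,

			// Replace a single token by key
			setToken: (key, value) => set({ [key]: value }),
		}),
		{
			// Key used when syncing the store to local storage
			name: "auth-storage",
		}
	)
)

export default useAuthStore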

Server Side Rendering (SSR)

Since its original design, and after monitoring usage patterns, I identified a range of improvements we could make to the dashboard - many of them affecting its core design, structure and performance. I embarked on a complete redesign and rewrite.

This rebuild also gave me an opportunity to revisit authentication handling from the ground up. I knew that at the heart of it I'd need some reliable way to keep tokens in sync and up to date. I also wanted to be a little more proactive with the refresh approach: when making a successful request, could I refresh the access token behind the scenes if it was close to expiring, avoiding a retry in the near future?

Leveraging SSR improvements that had been introduced with Next.js 13 and its app directory, I took the following approach:

  1. All tokens would be stored as cookies with corresponding expiry dates
  2. When a token expired, it simply wouldn't exist due to it being deleted by the browser
  3. Use middleware to ensure tokens are refreshed between page loads
  4. Fetch global and initial page data via SSR
  5. Use CSR for dynamic data and mutations (with token refresh handling)

The third step would happen server side and synchronously, and it would only refresh tokens that had expired. In the worst case scenario, where all tokens bar the user token had expired, it would result in only two synchronous requests. As middleware runs on each page transition, the only time a client side request would have to refresh a token is if the user had been inactive on the page for 30 minutes.
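
A rough sketch of that middleware step, trimmed to just the account token and with placeholder cookie names and helpers:

// middleware.js - illustrative sketch, cookie names and helpers are placeholders
import { NextResponse } from "next/server"

export async function middleware(request) {
	const response = NextResponse.next()

	// An expired token has already been deleted by the browser, so a
	// missing cookie is the signal to refresh
	const userToken = request.cookies.get("user_token")?.value
	const accountToken = request.cookies.get("account_token")?.value

	// No user token means the chain is broken - back to login
	if (!userToken) {
		return NextResponse.redirect(new URL("/login", request.url))
	}

	// Regenerate the account token from the user token if it has lapsed
	// (fetchNewAccountToken is a hypothetical helper that calls the API
	// with the user token and returns the new account token)
	if (!accountToken) {
		const newAccountToken = await fetchNewAccountToken(userToken)

		// Hypothetical 30 minute expiry to mirror the token's life span
		response.cookies.set("account_token", newAccountToken, { maxAge: 60 * 30 })
	}

	return response
}

// Only run for dashboard routes
export const config = {
	matcher: ["/dashboard/:path*"],
}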

I did originally tinker with fetching core data in middleware; however, because it runs on every page load, this was overkill and resulted in poor performance. Instead, I only used middleware to ensure that the user had a valid set of tokens. Fetching the global data in the root layout via SSR meant that it would only run on hard reloads due to caching behaviour.
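
The global fetch in the root layout then looks broadly like this (again a sketch - the endpoint and cookie name are placeholders):

// app/layout.js - illustrative sketch
import { cookies } from "next/headers"

export default async function RootLayout({ children }) {
	// Read the application token previously set by middleware
	const applicationToken = cookies().get("application_token")?.value

	// Fetch global data server side; thanks to caching this effectively
	// only runs on hard reloads rather than on every navigation
	const res = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/me`, {
		headers: { Authorization: `Bearer ${applicationToken}` },
	})
	const globalData = await res.json()

	return (
		<html lang="en">
			{/* globalData would be handed to a client side provider here */}
			<body>{children}</body>
		</html>
	)
}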

I had to battle some strange behaviour with how Next.js interacts with cookies in the app directory. Essentially, I had to abstract cookie logic and create different initiators depending on the environment they were being accessed in (middleware, server, client). I'm pretty sure this is why Supabase has different initiators when creating instances.
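
A sketch of that kind of abstraction - the shape of it rather than what actually shipped - with one initiator per environment:

// cookies.js - illustrative sketch of environment-specific initiators

// Middleware: read from the incoming NextRequest
export function middlewareCookies(request) {
	return {
		get: (name) => request.cookies.get(name)?.value,
	}
}

// Server components and route handlers: use next/headers
// (imported lazily so this module can also be bundled for the client)
export async function serverCookies() {
	const { cookies } = await import("next/headers")

	return {
		get: (name) => cookies().get(name)?.value,
	}
}

// Client: fall back to document.cookie
export function clientCookies() {
	return {
		get: (name) => {
			const match = document.cookie.match(new RegExp(`(?:^|; )${name}=([^;]*)`))

			return match ? decodeURIComponent(match[1]) : undefined
		},
	}
}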


I had never implemented a tiered token approach before building this dashboard. No doubt, if I had tinkered with our state solution and written directly to cookies on the initial build, I might have solved the race conditions and avoided flaky requests.

The rebuild resulted in a way more stable and performant experience. It also gave me an effective crash course on Next.js's shiny new app directory, warts and all.

You can read about the design of the dashboard too if you fancy it.