Safely Expose Azure Cosmos DB to Untrusted Client Applications - A Case Study for a Hobby Game

I’ll once again use my game as a vessel for making things concrete. I’ll start off with a diagram that explains how persistence in my game works. This will be the central point of discussion for the entire rest of my story today:

Figure 1: client-side apps with cloud storage

This diagram may contain a lot of information to absorb in one go. But don’t worry, we’ll dive deep into it in this blog. There are a few things I want to focus on right now. First, notice the MSAL part at the bottom. This represents my game (a WinUI 3 app). With my game, I want to read and write Cosmos DB data for users all over the world (as it will soon conquer the world and get played globally, obviously). My game will run on the local Windows computers of my players and thus, as an untrusted client-side app, it cannot be trusted with secrets. However, in part 5 of the flow above, we can see that my game accesses the cloud database directly, without going through an in-between web API! Apparently, we can safely access the database. In fact, we can even provide users with secure access to their own data on a granular level!

Now, if we keep in mind the massive horizontal scaling capabilities of Cosmos DB (which I explained in detail in my previous blog), and if I now tell you that Cosmos DB data can also be replicated to locations all over the world, the true power of Cosmos DB becomes visible. My game is in its early stages of development, but thanks to Cosmos DB, I already have a solid strategy for letting my users store and retrieve data lightning fast, all over the world! If you’re curious to learn how this works, keep reading. I’ll cover, in detail, the technical implementation I have used to accomplish this. I’ll also touch on slightly different approaches to securely accessing data in Cosmos DB, and on when to go for which strategy.

To kick things off, I’ll dive briefly into the topic of geo-replication in Cosmos DB and its ability to provide low-latency read and write access to stored data almost anywhere in the world. We have a lot of ground to cover, so let’s go!

Worldwide availability in Azure Cosmos DB, and consistency levels

Microsoft Learn explains this topic concisely, so let me start off by referencing their docs: Distribute data globally with Azure Cosmos DB | Microsoft Learn. In (very brief) summary, while creating a Cosmos database, you can choose to make it available in any number of Azure regions. Note that this option is only available for the provisioned throughput payment mode. All data from the Cosmos database will then be replicated to the selected regions. To paint a clearer picture of this, consider the following screenshot of the Cosmos DB resource in the Azure portal:

Figure 2: worldwide data replication in Cosmos DB

Here, we can see that I have selected a core (starting) region in Europe (the light blue checked region). From here, my data is replicated to my other selected data center in West US. I only allow writing to the database from my original core region (the toggle at the bottom left). However, on saving my configuration I get the warning at the top: picking another region to replicate my database to will, in this case, “increase my traffic to a level above my configured maximum”. Hence, I received an error while attempting this action. This illustrates how you can prevent unexpected costs while experimenting with Cosmos DB.

In any case, imagine that I have indeed configured my database to be replicated over 2 regions, West Europe and East US. What would accessing the data from either region look like?

Well, we can simply choose which region to connect to when using the CosmosClient, which is the conventional way to interact with Cosmos DB from code:

Figure 3: connecting to Cosmos DB with the CosmosClient

In this small example, we can get an impression of how easy it is to set up a client in C# that targets a specific replication region. (In the sample above I hardcoded the region, but you’d normally use an environment variable.) Keep in mind that when geo-replicating data, Cosmos DB always does it for the entire database (strictly speaking, for the whole Cosmos DB account, including every database and container in it). You configure it from the Azure portal, or in your Bicep/ARM template when creating the account.
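Since the screenshot is not reproduced here, the following is a minimal sketch of what such a region-aware setup could look like with the Microsoft.Azure.Cosmos SDK (v3). It is not necessarily the code from the figure: the class name and the environment variable names (COSMOS_ENDPOINT, COSMOS_KEY, COSMOS_REGION) are assumptions for illustration, and passing an account key like this only makes sense in a trusted, server-side context.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class RegionalClientFactory
{
    // Builds client options that prefer the given Azure region, e.g. "East US".
    // Falls back to West Europe when no region is configured (an arbitrary choice).
    public static CosmosClientOptions BuildOptions(string? region) =>
        new CosmosClientOptions
        {
            ApplicationRegion = string.IsNullOrWhiteSpace(region) ? Regions.WestEurope : region
        };

    public static CosmosClient Create()
    {
        // Hypothetical environment variable names; adapt them to your own setup.
        string endpoint = Environment.GetEnvironmentVariable("COSMOS_ENDPOINT")
            ?? throw new InvalidOperationException("COSMOS_ENDPOINT is not set.");
        string key = Environment.GetEnvironmentVariable("COSMOS_KEY")
            ?? throw new InvalidOperationException("COSMOS_KEY is not set.");
        string? region = Environment.GetEnvironmentVariable("COSMOS_REGION");

        return new CosmosClient(endpoint, key, BuildOptions(region));
    }
}
```

Reading the region from configuration means the same build can be deployed next to each replica without a code change.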

Horizontal scaling, as explained in my last blog post, takes place separately in each geo instance. After a user writes data in one region, it typically becomes available to read in another region very quickly, although never faster than the network latency between the two regions. The amount of data traffic needed to accomplish this cross-region data access is impacted by another setting, called the consistency level, whose default is configured on the Cosmos DB account. This is not the focus of this blog today, but it’s important to know that Cosmos DB offers 5 different consistency levels (strong, bounded staleness, session, consistent prefix and eventual), which balance the consistency of the order of reads and writes to the database (locally and globally) against throughput in terms of request units. For example, for my save games, I may not have strict requirements for data becoming available to users in the exact order in which it was written. So, I will choose a lower consistency level and accept that it will take a little bit of time for a universal truth to appear regarding my save games across the world. For a financial transaction system that uses Cosmos DB, requirements may be different, and you would use the strong consistency level. In summary, when configuring Cosmos DB, think about where in the world users will be using your app, and think about how much out-of-order reading and writing your app can tolerate.
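As a small illustration of the save-game scenario, the sketch below relaxes consistency from the client side with the Microsoft.Azure.Cosmos SDK. The class and method names are made up for this example. Note that, as far as I know, a client or request can only ask for a level that is equal to or weaker than the default configured on the account, never a stronger one.

```csharp
using Microsoft.Azure.Cosmos;

public static class SaveGameConsistency
{
    // Applies to every request made through a client built with these options.
    public static CosmosClientOptions RelaxedClientOptions() =>
        new CosmosClientOptions
        {
            ConsistencyLevel = ConsistencyLevel.Eventual
        };

    // Applies to a single item operation, e.g. one point read of a save game.
    public static ItemRequestOptions RelaxedReadOptions() =>
        new ItemRequestOptions
        {
            ConsistencyLevel = ConsistencyLevel.Eventual
        };
}
```

A financial system would do the opposite: configure strong consistency as the account default and not relax it anywhere.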

Secure access to Cosmos DB

Now let’s get into the meat of this blog: securing database access for the users of our game. In Cosmos DB, you can restrict access to the database in 3 fundamental ways:

1) With a connection string.
2) With role-based access.
3) With Resource tokens (granular access).

The first two options are relatively easy to set up. You use a connection string, or a managed identity with a set of built-in (or custom) roles (RBAC), to get read or write access to Cosmos DB at the container level. These are great for many simple scenarios in which the identity that you use to approach the database can work in an ‘all-or-nothing’ manner. This is often the case for trusted server-side applications (they can protect secrets and can therefore be trusted with full access to a container). According to Microsoft, RBAC is the recommended way to secure Cosmos DB: https://learn.microsoft.com/en-us/azure/cosmos-db/security#resource-tokens-
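To make the second option concrete, here is a minimal sketch of how a trusted server-side app could connect with RBAC instead of a key. It assumes the Azure.Identity package and an identity (for example a managed identity) that has already been granted a Cosmos DB data-plane role; the class name and the endpoint parameter are illustrative.

```csharp
using Azure.Identity;
using Microsoft.Azure.Cosmos;

public static class TrustedServerClient
{
    // DefaultAzureCredential picks up a managed identity in Azure,
    // or developer credentials (Azure CLI, Visual Studio) when running locally.
    public static CosmosClient Create(string accountEndpoint) =>
        new CosmosClient(accountEndpoint, new DefaultAzureCredential());
}
```

No secret is stored in the app this way, but the identity still gets the same access for every end user, which is exactly the ‘all-or-nothing’ behaviour described above.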

There is a big ‘however’ with the ‘recommended’ option, in that you need an extra HTTP request each time, which results in extra costs and latency. If my game later reaches millions of people all over the world (who knows, huh?), this can become quite an issue!

Figure 4: gaming and Cosmos DB suggestion by Microsoft Learn

In the setup from figure 4 (taken from Microsoft Learn), an Azure landscape with several Azure components is shown, which could be leveraged to deliver a complex, modern game: a CDN for delivering media to the client app, Databricks for analytics and machine learning, and the Cosmos DB change feed for notifying parties of changes in the database. I won’t cover these topics in this blog, but wanted to briefly point them out for a broader perspective on how Azure can be utilized in gaming. What I do especially want to point out is the part of figure 4 within the red rectangle. Here, we see that a server-side Azure backend API is used as an in-between layer for the client app (mobile phone) to communicate with the database. This is a completely different approach from figure 1, where the client app queries Cosmos DB directly.
There may be some future in which I need to answer the question of which approach is best for my game. For sure, the middleware approach will offer security and monitoring benefits, at the very least. Basically, it will come down to a trade-off: extra costs and latency versus the extra options that this in-between API layer would provide, such as Application Insights, a traffic manager, a firewall, etc. For this blog, however, I will focus on the relatively more straightforward approach of figure 1: direct database access from the client app. I have found the documentation on this topic lacking, while it is mechanically quite interesting. So, let’s figure this out ourselves!

Sample app

Luckily, I found this excellent sample application a few years ago: link. Next, there is my own app, which represents the architecture I’m currently using to save and retrieve my game state. It borrows heavily from the first application by 1iveOwl, but is slightly cleaned up, uses more modern frameworks and contains a client app implementation. However, I mostly worked on this project last year and technology moves fast, so note that as of publishing this blog, some elements are already mildly outdated:

  • The auth token broker runs ‘in-process’, while the default for Azure Functions is ‘out-of-process’.
  • .NET 10 was recently released. My apps run on .NET 8.
  • For the sample app, I’m using Azure AD B2C with the Microsoft Authentication Library for .NET (MSAL). I set this up last year and B2C has since been superseded by Microsoft Entra External ID, but you’ll find that it has largely the same concepts. (Maybe a blog topic for another day.)

Implementation details
Ok, so let’s dive into the app. It’s quite a bit of code, but don’t worry, I’ll walk you through the key concepts.

Figure 5: project structure

Figure 5 shows the projects that constitute my sample application. ‘Soulsseeker’ contains my WinUI 3 game, which needs users who can log in to retrieve and store game state. It has a dependency on AuthTokenBroker.Core for interacting with the Cosmos DB client and a dependency on B2CAuthClient to sign users in with Azure AD B2C. ‘AuthTokenBroker’ is the Azure Function shown in figure 1, and it has a dependency on AuthTokenBroker.Core for interacting with Cosmos DB. I’ll now explain how these components work in sequence to store and retrieve game state:

-Step 1: Authenticate a user

My WinUI 3 application (called Soulsseeker in the sample code) has a dependency on the B2CAuthClient project. This project is essentially a wrapper for a Microsoft.Identity.Client.PublicClientApplication, which signs the user in and returns an access token (JWT) via the OAuth2 authorization code flow for public client applications.
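A wrapper like this could look roughly as follows. This is a minimal sketch using MSAL.NET, not the actual B2CAuthClient code: the class name is made up, and the client ID, B2C authority URL, scopes and the `http://localhost` redirect URI are placeholders that depend on your own tenant and app registration.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Identity.Client;

public sealed class B2CSignIn
{
    private readonly IPublicClientApplication _app;
    private readonly string[] _scopes;

    // All three arguments are placeholders for values from your own B2C tenant.
    public B2CSignIn(string clientId, string b2cAuthority, string[] scopes)
    {
        _scopes = scopes;
        _app = PublicClientApplicationBuilder
            .Create(clientId)
            .WithB2CAuthority(b2cAuthority)
            .WithRedirectUri("http://localhost") // assumed redirect URI for a desktop app
            .Build();
    }

    public async Task<AuthenticationResult> SignInAsync()
    {
        var accounts = await _app.GetAccountsAsync();
        try
        {
            // Try the token cache first, so a returning user is not prompted again.
            return await _app.AcquireTokenSilent(_scopes, accounts.FirstOrDefault()).ExecuteAsync();
        }
        catch (MsalUiRequiredException)
        {
            // Nothing usable in the cache: fall back to the interactive sign-in UI.
            // In a WinUI 3 app you may also want to pass the window handle
            // via WithParentActivityOrWindow.
            return await _app.AcquireTokenInteractive(_scopes).ExecuteAsync();
        }
    }
}
```

The `AccessToken` property of the returned `AuthenticationResult` is the JWT that is sent to the token broker in the next step.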

-Step 2: Get a Resource token

We use the JWT access token from step 1, together with the signed-in user info, to construct a Cosmos DB client. The client in turn calls an Azure Function called ‘AuthTokenBroker’. This function validates the JWT access token with B2C, and then constructs a new ‘resource token’ by interacting with the native ‘Permission’ and ‘User’ concepts of Cosmos DB. It hands this (short-lived) resource token to the Cosmos DB client of the WinUI application.
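The validation part of such a broker could be sketched as below. This is one common way to validate a B2C-issued JWT by hand, assuming the System.IdentityModel.Tokens.Jwt and Microsoft.IdentityModel.Protocols.OpenIdConnect packages; it is not necessarily how the sample app does it. The class name, the metadata URL and the audience value are assumptions, as is the choice of claim that carries the user id.

```csharp
using System.IdentityModel.Tokens.Jwt;
using System.Security.Claims;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.IdentityModel.Protocols;
using Microsoft.IdentityModel.Protocols.OpenIdConnect;
using Microsoft.IdentityModel.Tokens;

public sealed class B2CTokenValidator
{
    private readonly ConfigurationManager<OpenIdConnectConfiguration> _configManager;
    private readonly string _audience;

    // metadataAddress: the OpenID Connect metadata URL of your B2C policy.
    // audience: the client ID of the API registration the token was issued for.
    public B2CTokenValidator(string metadataAddress, string audience)
    {
        _audience = audience;
        _configManager = new ConfigurationManager<OpenIdConnectConfiguration>(
            metadataAddress, new OpenIdConnectConfigurationRetriever());
    }

    // Returns the user id from a valid token; ValidateToken throws when the
    // signature, issuer, audience or lifetime check fails.
    public async Task<string> ValidateAndGetUserIdAsync(string jwt, CancellationToken ct = default)
    {
        // Downloads (and caches) the issuer name and public signing keys.
        OpenIdConnectConfiguration config = await _configManager.GetConfigurationAsync(ct);

        var parameters = new TokenValidationParameters
        {
            ValidIssuer = config.Issuer,
            ValidAudience = _audience,
            IssuerSigningKeys = config.SigningKeys
        };

        ClaimsPrincipal principal = new JwtSecurityTokenHandler()
            .ValidateToken(jwt, parameters, out _);

        // Assumption: the user id lives in the "sub" claim, which the handler
        // maps to ClaimTypes.NameIdentifier by default.
        return principal.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? principal.FindFirst("sub")?.Value
            ?? throw new SecurityTokenException("Token has no user id claim.");
    }
}
```

The user id that comes out of this check is what the broker uses to look up or create the matching Cosmos DB user, so it should never be taken from an unvalidated request parameter.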

-2a: Creating new user and permissions

In case a user or permission for a user to a certain resource does not exist yet, the AuthTokenBroker will create it on the spot. The call to get the database access token looks like this (figure 6):

Figure 6
Figure 7

So, these users and permissions are used (by the .NET Cosmos DB client) to construct a resource token, which the WinUI app can use to access specifically the partitions of a certain PermissionUser. I kindly refer you to part 1 of this blog series in case you’re confused about partitions in Cosmos DB. Users and Permissions are native Cosmos DB concepts that are somewhat poorly documented and easy to miss for developers interested in using Cosmos DB. If you look at the Cosmos DB resource explorer in the Azure portal, for example, you will not see them at all after creating them! However, they are an interesting and sophisticated concept that allows you to create resource tokens for Cosmos DB that offer data access on a granular level! Perhaps the reason the Microsoft documentation is so minimalistic on this topic is that Microsoft recommends the RBAC approach for most projects. However, granular access to partition data is a very powerful and interesting idea that I feel is worth trying out and exploring!
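To give an impression of how Users and Permissions turn into a resource token, here is a minimal sketch with the Microsoft.Azure.Cosmos SDK. It is an illustration under assumptions rather than the sample app’s code: the class name, the permission id scheme and the one-hour expiry are made up, and it assumes the container uses the user id as its partition key. The `database` and `container` objects are expected to come from a client with enough rights to manage users and permissions, which typically means the account key, and that key stays on the server side.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ResourceTokenIssuer
{
    // Hypothetical naming scheme: one permission per user per container.
    public static string PermissionId(string userId, string containerId) =>
        $"{userId}-{containerId}-all";

    // Returns a short-lived token that only covers the caller's own partition.
    public static async Task<string> IssueAsync(
        Database database, Container container, string userId, int expirySeconds = 3600)
    {
        // Creates the Cosmos DB user on first use, otherwise returns the existing one.
        UserResponse userResponse = await database.UpsertUserAsync(userId);
        User user = userResponse.User;

        // Read and write access, restricted to items whose partition key equals userId.
        var permission = new PermissionProperties(
            PermissionId(userId, container.Id),
            PermissionMode.All,
            container,
            new PartitionKey(userId));

        // Upserting returns the permission together with a freshly signed token.
        PermissionResponse response = await user.UpsertPermissionAsync(permission, expirySeconds);
        return response.Resource.Token;
    }
}
```

Using `PermissionMode.Read` instead would give the player read-only access, which could be useful for data such as shared leaderboards.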

-Step 3: Query the database, provided the Resource token

The client app can finally use the resource token to continuously, and directly, access the Cosmos DB API to retrieve and store data on a granular level. From inside my WinUI 3 app this looks as follows (figure 8):

Figure 8

The Cosmos DB client, which handles updating the Cosmos DB documents via (amongst others) the Replace method in figure 8, directly takes the resource token broker response to construct its connection with Cosmos DB. This construction therefore ensures that the logged-in user can only access information in partitions with a partition key that matches the user and (user) permission combination that the resource token was issued for (figure 9):

Figure 9

As you can see, it even gets the location (EndpointUrl) of where to access the database, as well as the database name and container that we need, from the token broker response (in addition to the user-permission scoped access token, of course). How convenient! Note that Cosmos DB enforces these permission tokens using cryptographic verification at both the gateway and the data replication layer of Cosmos DB, on every request made via the Cosmos DB client. I won't dive into these specific topics in this blog, but especially the data replication of partitions, which Cosmos DB does by default, is good to take note of.
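Put together, the client side of step 3 could be sketched like this. The shape of `TokenBrokerResponse`, the `SaveGame` document and the `/userId` partition key path are assumptions for illustration; only `EndpointUrl` is a name taken from the description above. The SDK calls are from Microsoft.Azure.Cosmos, whose default serializer is Newtonsoft.Json.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

// Hypothetical shape of the broker's response; the real contract may differ.
public sealed record TokenBrokerResponse(
    string EndpointUrl, string DatabaseName, string ContainerName,
    string ResourceToken, string UserId);

public sealed class SaveGame
{
    [JsonProperty("id")] public string Id { get; set; } = "";
    // Assumed partition key path of the container: /userId
    [JsonProperty("userId")] public string UserId { get; set; } = "";
    [JsonProperty("level")] public int Level { get; set; }
}

public sealed class SaveGameStore
{
    private readonly Container _container;
    private readonly PartitionKey _partitionKey;

    public SaveGameStore(TokenBrokerResponse broker)
    {
        // The resource token takes the place of an account key,
        // so no long-lived secret ever reaches the player's machine.
        var client = new CosmosClient(broker.EndpointUrl, broker.ResourceToken);
        _container = client.GetContainer(broker.DatabaseName, broker.ContainerName);
        _partitionKey = new PartitionKey(broker.UserId);
    }

    // Point read of one save game inside the user's own partition.
    public async Task<SaveGame> LoadAsync(string id) =>
        (await _container.ReadItemAsync<SaveGame>(id, _partitionKey)).Resource;

    // Overwrites an existing save game; requests outside the partition
    // that the token was issued for are rejected by Cosmos DB.
    public async Task<SaveGame> ReplaceAsync(SaveGame save) =>
        (await _container.ReplaceItemAsync(save, save.Id, _partitionKey)).Resource;
}
```

Because the token is short-lived, a real client would also need to ask the broker for a fresh one (and build a new `CosmosClient` with it) once requests start failing with an authorization error.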

Wrapping up

I hope to have helped you with your Cosmos DB efforts or inspired you to check out Cosmos DB!
I have explained how to achieve direct, safe, granular access to Cosmos DB from untrusted client-side apps and provided you with the conceptual understanding of Cosmos DB required to follow what’s going on. I think it’s a very powerful and cool concept and, together with Cosmos DB’s worldwide replication strategy, it can mean extremely low-latency and flexible access to data anywhere in the world. I created a video game using this technology, mostly for fun, but I imagine it could easily have applications in many different domains that need low-latency, tightly controlled access to data. There remain some interesting questions that I could dive into deeper:

  • How much faster is this than communicating via an in-between web API?
  • How much in Azure costs does this strategy save you exactly?
  • How can I effectively utilize caching for my game data?
  • How can we utilize the new full change feed/transactional history capabilities of Cosmos DB?

Maybe I’ll dive into these topics in the future. For now, I have a solid grasp on the kind of things you can do with Cosmos DB and will first move on to different topics. I plan to keep you posted on these topics via my blog. I hope to see you next time!

 

Thomas Slippens

Thomas is an all-round software developer with an interest in all aspects of application development, in particular Azure and .NET Core.