Getting the Most of AWS DynamoDB Primary Keys

Simple primary keys in AWS DynamoDB are already cool, providing data retrieving by key at almost constant time. But their lesser-known composite counterparts can be even cooler when it comes to handling complex relationships and data organization.

Primary Keys in DynamoDB

While the “primary key” term is well-known in the database world, DynamoDb also operates with the terms “partition key” and “sort key.” The “partition key” refers to a column defining the partition where an item is stored, ensuring efficient distribution across nodes. The exact partition is calculated through a hash function from the column’s value. Which is why it was originally called a “hash attribute.” A simple single-column primary key consists of just a partition key.

Things get more interesting when we add a sort key (or a “range attribute”) to the primary key. Together, they form a composite primary key with two columns. The values of the primary key should be unique, but since we have two columns now, we can have multiple records with the same partition key as long as their sort keys are different. But why would we want to have the data stored this way?

Relations

Imagine we’re storing comments for blog posts and need to fetch them by post ID. In a relational database, we might structure it like this:

CommentId	PostId	Comment
1	123	First comment
2	123	Second comment
3	456	First comment, but on another blog post

To make queries efficient, we’d create an index on the PostId column. This way, retrieving comments by post ID would be fast.

But how do we achieve the same in DynamoDB? We can’t use PostId as the primary key since its values aren’t unique. Using CommentId as the primary key wouldn’t work either, as searching by PostId would require a full table scan. What we can do is create a composite key with PostId as the partition key and CommentId as the sort key:

PostId	CommentId	Comment
123	1	First comment
123	2	Second comment
456	1	First comment, but on another blog post

Don’t worry about duplicate comment IDs in the example above. The values of the sort key don’t need to be globally unique; they only need to be unique within the same partition key, or in our case, the same post ID.

The cool thing about this design? Querying by partition key and sort key is still highly efficient, only slightly slower than querying a single-column primary key. Plus, you can retrieve all items for a given partition key in one query (or just a subset if needed, as we’ll discuss further).

The power of a sort key

As we mentioned earlier, the sort key is a range attribute. If we define our table in YML format, it would look something like this:

CommentsTable:
    Type: AWS::DynamoDB::Table
    Properties:
    AttributeDefinitions:
        - AttributeName: PostId
        AttributeType: N
        - AttributeName: CommentId
        AttributeType: N
    KeySchema:
        - AttributeName: PostId
        KeyType: HASH
        - AttributeName: CommentId
        KeyType: RANGE

A range type provides two immediate benefits: when querying by a partition key, the items are returned in order based on the sort key, and we can use range queries with operators like begins_with, between, >, and <. Let’s take advantage of this! Instead of using a meaningless comment ID, we can replace it with a comment timestamp, either in a date-time string format or as a Unix timestamp (seconds or milliseconds):

PostId	CommentId	Comment
123	1737407675	First comment
123	1737407678	Second comment
456	1737407676	First comment, but on another blog post

Now, all comments are sorted by time, and there’s no need for an extra date column. Nice! And if we need to reference a specific comment, we can build the ID by combining the PostId and CommentId, like this: 123-1737407678.

A composite value inside a composite primary key

It might feel like we’re in the movie Inception, but this is where we can unlock even more power from composite keys. As we mentioned earlier, we can use range operators in queries for composite keys. So, what can we do with begins_with? Let’s imagine our comments aren’t just dull single-level comments but instead a fancy tree of comments. In this case, we’d need a way to track the parent comment for each one. We can store this as a separate attribute, using 0 for first-level comments:

PostId	CommentId	ParentCommentId	Comment
123	1737407675	0	First comment
123	1737407678	1737407675	Second comment, a reply to the first one
456	1737407676	0	First comment, but on another blog post

To display comments, we can fetch all comments for a given post and then build the comment tree on the server or client. But what if we have thousands of comments and want to show only the first-level ones, expanding the tree as the user requests? We can achieve this by storing both comment ID and parent comment ID in a single sort key column (and the column type now becomes a string): [parent_comment_id]#[comment_id]

Our table would now look like this:

PostId	CommentId	Comment
123	0#1737407675	First comment
123	1737407675#1737407678	Second comment, a reply to the first one
456	0#1737407676	First comment, but on another blog post

All comments under the same parent are still sorted by date. However, since our sort key follows a structured pattern, we can also query nested comments efficiently. This is where the begins_with enters the stage! For example, to fetch all replies to a specific comment, we query for comments where CommentId begins with [parent_id]#:

Query({
  TableName: "Comments",
  KeyConditionExpression: "PostId = :postId AND begins_with(CommentId, :commentPrefix)",
  ExpressionAttributeValues: {
    ":postId": 123,
    ":commentPrefix": "1737407675#"
  }
})

Concatenating values in the sort key opens up many possibilities for organizing data. It’s not just limited to two values; you can include as many as needed to match your data hierarchy. However, keep in mind that very long strings can impact performance.

How much does it cost?

Another advantage of composite keys is that there are no additional computation costs. You pay for storage of the sort key column and might pay extra for read and write capacity units as they are affected by the item size. However, in our blog comment example, we would likely store comment dates anyway. By making it the sort key, the total record size remains unchanged. No extra cost!

When we switched to storing the comment tree and changed CommentId to a string, the item size did increase, but not dramatically. So, be mindful of this when designing your table.

***

As you see, composite keys in DynamoDB aren’t just about storing data. With the right design, you can speed things up and keep your data organized in a way that just makes sense. So, next time you’re setting up a table, it’s worth considering composite keys to make the most of DynamoDB’s capabilities.

Last modified on 2025-01-22