Simple primary keys in AWS DynamoDB are already cool, providing data retrieving by key at almost constant time. But their lesser-known composite counterparts can be even cooler when it comes to handling complex relationships and data organization.
Primary Keys in DynamoDB
While the “primary key” term is well-known in the database world, DynamoDb also operates with the terms “partition key” and “sort key.” The “partition key” refers to a column defining the partition where an item is stored, ensuring efficient distribution across nodes. The exact partition is calculated through a hash function from the column’s value. Which is why it was originally called a “hash attribute.” A simple single-column primary key consists of just a partition key.
Things get more interesting when we add a sort key (or a “range attribute”) to the primary key. Together, they form a composite primary key with two columns. The values of the primary key should be unique, but since we have two columns now, we can have multiple records with the same partition key as long as their sort keys are different. But why would we want to have the data stored this way?
Relations
Imagine we’re storing comments for blog posts and need to fetch them by post ID. In a relational database, we might structure it like this:
CommentId | PostId | Comment |
---|---|---|
1 | 123 | First comment |
2 | 123 | Second comment |
3 | 456 | First comment, but on another blog post |
To make queries efficient, we’d create an index on the PostId
column. This way, retrieving
comments by post ID would be fast.
But how do we achieve the same in DynamoDB? We can’t use PostId
as the primary key since
its values aren’t unique. Using CommentId
as the primary key wouldn’t work either, as searching
by PostId
would require a full table scan. What we can do is create a composite key with PostId
as the partition key and CommentId
as the sort key:
PostId | CommentId | Comment |
---|---|---|
123 | 1 | First comment |
123 | 2 | Second comment |
456 | 1 | First comment, but on another blog post |
Don’t worry about duplicate comment IDs in the example above. The values of the sort key don’t need to be globally unique; they only need to be unique within the same partition key, or in our case, the same post ID.
The cool thing about this design? Querying by partition key and sort key is still highly efficient, only slightly slower than querying a single-column primary key. Plus, you can retrieve all items for a given partition key in one query (or just a subset if needed, as we’ll discuss further).
The power of a sort key
As we mentioned earlier, the sort key is a range attribute. If we define our table in YML format, it would look something like this:
CommentsTable:
Type: AWS::DynamoDB::Table
Properties:
AttributeDefinitions:
- AttributeName: PostId
AttributeType: N
- AttributeName: CommentId
AttributeType: N
KeySchema:
- AttributeName: PostId
KeyType: HASH
- AttributeName: CommentId
KeyType: RANGE
A range type provides two immediate benefits: when querying by a partition key, the items
are returned in order based on the sort key, and we can use range queries with operators like
begins_with
, between
, >
, and <
. Let’s take advantage of this! Instead of using
a meaningless comment ID, we can replace it with a comment timestamp, either in a date-time
string format or as a Unix timestamp (seconds or milliseconds):
PostId | CommentId | Comment |
---|---|---|
123 | 1737407675 | First comment |
123 | 1737407678 | Second comment |
456 | 1737407676 | First comment, but on another blog post |
Now, all comments are sorted by time, and there’s no need for an extra date column. Nice!
And if we need to reference a specific comment, we can build the ID by combining the PostId
and CommentId
, like this: 123-1737407678
.
A composite value inside a composite primary key
It might feel like we’re in the movie Inception, but this is where we can unlock even more
power from composite keys. As we mentioned earlier, we can use range operators in queries for
composite keys. So, what can we do with begins_with
? Let’s imagine our comments aren’t just
dull single-level comments but instead a fancy tree of comments. In this case, we’d need a way
to track the parent comment for each one. We can store this as a separate attribute, using 0
for first-level comments:
PostId | CommentId | ParentCommentId | Comment |
---|---|---|---|
123 | 1737407675 | 0 | First comment |
123 | 1737407678 | 1737407675 | Second comment, a reply to the first one |
456 | 1737407676 | 0 | First comment, but on another blog post |
To display comments, we can fetch all comments for a given post and then build the comment
tree on the server or client. But what if we have thousands of comments and want to show only
the first-level ones, expanding the tree as the user requests? We can achieve this by storing
both comment ID and parent comment ID in a single sort key column (and the column type now
becomes a string): [parent_comment_id]#[comment_id]
Our table would now look like this:
PostId | CommentId | Comment |
---|---|---|
123 | 0#1737407675 | First comment |
123 | 1737407675#1737407678 | Second comment, a reply to the first one |
456 | 0#1737407676 | First comment, but on another blog post |
All comments under the same parent are still sorted by date. However, since our sort key
follows a structured pattern, we can also query nested comments efficiently. This is where
the begins_with
enters the stage! For example, to fetch all replies to a specific comment,
we query for comments where CommentId
begins with [parent_id]#
:
Query({
TableName: "Comments",
KeyConditionExpression: "PostId = :postId AND begins_with(CommentId, :commentPrefix)",
ExpressionAttributeValues: {
":postId": 123,
":commentPrefix": "1737407675#"
}
})
Concatenating values in the sort key opens up many possibilities for organizing data. It’s not just limited to two values; you can include as many as needed to match your data hierarchy. However, keep in mind that very long strings can impact performance.
How much does it cost?
Another advantage of composite keys is that there are no additional computation costs. You pay for storage of the sort key column and might pay extra for read and write capacity units as they are affected by the item size. However, in our blog comment example, we would likely store comment dates anyway. By making it the sort key, the total record size remains unchanged. No extra cost!
When we switched to storing the comment tree and changed CommentId
to a string, the item size
did increase, but not dramatically. So, be mindful of this when designing your table.
***
As you see, composite keys in DynamoDB aren’t just about storing data. With the right design, you can speed things up and keep your data organized in a way that just makes sense. So, next time you’re setting up a table, it’s worth considering composite keys to make the most of DynamoDB’s capabilities.
Last modified on 2025-01-22