Flooding DynamoDB

Dale Frohman
Jun 12, 2021


There are three DynamoDB limits you must know:

  1. An individual record in DynamoDB is called an item, and a single DynamoDB item cannot exceed 400 KB.
  2. A single Query or Scan call returns at most 1 MB of data; larger result sets have to be paginated.
  3. A single partition can serve up to 3,000 Read Capacity Units (RCUs) and up to 1,000 Write Capacity Units (WCUs) per second.
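To make limit #3 concrete, here is a minimal sketch of the read-capacity arithmetic. It relies on DynamoDB's documented accounting: one RCU covers one strongly consistent read per second of an item up to 4 KB, and an eventually consistent read costs half that. The function names and the sample traffic numbers are hypothetical, chosen only for illustration.

```python
import math

# Per-partition limit from the list above.
PARTITION_RCU_LIMIT = 3000


def rcus_per_read(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs consumed by reading one item once.

    Item size is rounded up to the next 4 KB block; an eventually
    consistent read costs half as much as a strongly consistent one.
    """
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    return float(blocks) if strongly_consistent else blocks / 2


def partition_read_headroom(
    reads_per_second: float,
    item_size_bytes: int,
    strongly_consistent: bool = True,
) -> float:
    """Spare RCUs on one partition; a negative value means throttling is likely."""
    used = reads_per_second * rcus_per_read(item_size_bytes, strongly_consistent)
    return PARTITION_RCU_LIMIT - used


# Hypothetical burst: 4,000 reads/s of a 1 KB item that lives on one partition.
print(partition_read_headroom(4000, 1000))         # strongly consistent
print(partition_read_headroom(4000, 1000, False))  # eventually consistent
```

With these made-up numbers, strongly consistent reads overshoot the partition limit while eventually consistent reads still fit, which is why a small, hot data set can hit the ceiling even when the table as a whole is lightly used.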

During a high-velocity event one afternoon, we learned the important lesson of #3:

A single DynamoDB partition can serve at most 3,000 RCUs per second.

Our user traffic is sporadic and arrives in bursts: a large number of requests can land within just 10 seconds!

We got through the rest of the day, painfully, but we made it. We were then tasked with solving the problem before the next day.

In 12 hours!

The stored data is small and lives in only a handful of items, so partitioning and sharding would not help. We didn’t have time to incorporate DAX (DynamoDB Accelerator).

The team came up with two options:

  1. Set up a Redis cache
  2. Cache the data at the edge in our CDN

I took ownership of setting up the CDN proof of concept. I found the following header:

The Cache-Control max-age directive lets you specify how long (in seconds) that you want an object to remain in the cache before the CDN gets the object again from the origin server. The minimum expiration time our CDN supports is 0 seconds. The maximum value is 100 years.
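As a rough illustration of the idea, the sketch below shows an origin that attaches a one-second `max-age` to its responses, using only Python's standard library. The handler, the `load_data` placeholder, and the port are assumptions made for this example; the post does not describe the actual origin service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def cache_control_header(max_age_seconds: int) -> str:
    """Build a Cache-Control value telling caches how long to keep the object."""
    if max_age_seconds < 0:
        raise ValueError("max-age cannot be negative")
    return f"max-age={max_age_seconds}"


def load_data() -> dict:
    """Hypothetical placeholder for the read that would otherwise hit DynamoDB."""
    return {"status": "ok"}


class OriginHandler(BaseHTTPRequestHandler):
    """Minimal origin: every GET response may be cached for one second."""

    def do_GET(self) -> None:
        body = json.dumps(load_data()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Cache-Control", cache_control_header(1))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), OriginHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8080/` should show `Cache-Control: max-age=1` in the response headers. A CDN in front of such an origin could then answer repeat requests from its cache for up to a second before going back to the origin.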

Hmmm... how would 1 second work?

What would our ratio of cache hits to cache misses be?

We ran some tests and experienced favorable results:

1.6 cache misses per second = 84% cache hits

This exceeded our expectations.

We ran more tests and averaged 81% cache hits.
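For anyone repeating this kind of test, the hit ratio is simply one minus the share of requests that miss the cache. The sketch below shows the arithmetic. Note that the post does not state the request rate used in the tests; the roughly 10 requests per second that falls out of the 1.6 misses per second and 84% figures is an inference, not a measured value.

```python
def hit_ratio(requests_per_second: float, misses_per_second: float) -> float:
    """Fraction of requests served from cache."""
    if requests_per_second <= 0:
        raise ValueError("request rate must be positive")
    return 1 - misses_per_second / requests_per_second


def implied_request_rate(misses_per_second: float, ratio: float) -> float:
    """Request rate implied by a miss rate and a hit ratio."""
    if not 0 <= ratio < 1:
        raise ValueError("hit ratio must be in [0, 1)")
    return misses_per_second / (1 - ratio)


# 1.6 misses/s at an 84% hit ratio implies a request rate of about 10 per second.
print(round(implied_request_rate(1.6, 0.84), 2))
print(round(hit_ratio(10, 1.6), 2))
```

Every miss is a request that still reaches the origin, so the miss rate, not the raw request rate, is what counts against the DynamoDB read limit.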

Ultimately the Redis cache prevailed and saved the day. It remains in place while we architect the right solution, but it was good to know that we could use the CDN if needed, both for this challenge and for similar future use cases.
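The post does not show how the Redis cache was wired in, so what follows is only a sketch of the general cache-aside pattern such a setup typically uses: check the cache first, and only on a miss or an expired entry go to the database and store the result with a short expiry. To keep it self-contained, an in-memory dictionary stands in for Redis; the class name, the TTL, and the loader function are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry (cache-aside)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, calling loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = loader(key)  # cache miss: one read against the database
        self._entries[key] = (now + self._ttl, value)
        return value


def read_from_database(key: str) -> dict:
    """Hypothetical placeholder for the DynamoDB read."""
    return {"id": key}


cache = TTLCache(ttl_seconds=1.0)
first = cache.get_or_load("config", read_from_database)   # miss, loads the item
second = cache.get_or_load("config", read_from_database)  # hit within the TTL
```

With only a handful of items being read, even a short expiry caps database reads at roughly one per key per TTL window, no matter how large the burst.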

Looking ahead, we plan to use edge computing with MQTT to solve this challenge, reducing both calls to the origin and response time to the client.

Hope this helps if you are stuck in the same predicament with time ticking…


Dale Frohman

Principal Site Reliability Engineer. Cyber Security Professional. Technologist. Leader.