AWS (Amazon Web Services) DynamoDB - Roadmap
DynamoDB is a NoSQL database provided by Amazon Web Services, and it has been used within Amazon for years. Among the NoSQL database types, it can be described as a key-value store. Conceptually, a table is one large hash map in which values are stored under their keys.
Schemaless
All key-value NoSQL databases are designed without a schema. Because the table is schemaless, items with different attributes are kept in the same table.
Consistency
Most NoSQL databases do not offer full ACID (Atomicity, Consistency, Isolation, Durability) guarantees. DynamoDB offers two read consistency models: Eventually Consistent and Strongly Consistent. An Eventually Consistent read may not reflect a recent update. The DynamoDB documentation describes the replicas as usually becoming consistent within one second. A Strongly Consistent read returns the latest version of the data on every read. Strongly Consistent reads cost more than Eventually Consistent reads.
While a relational database (RDS) cannot be expanded beyond a certain point, DynamoDB can scale up and down virtually without limit and without downtime by adjusting its Provisioned Capacity.
DynamoDB keeps replicas in three different Availability Zones (AZs). One of them is the primary replica. Reading from the primary replica gives Strong Consistency; reading from any of the other replicas gives Eventual Consistency. Because writes go to the primary, its data is always current, while the other replicas may not have received the latest update yet. Compared with Eventual Consistency, Strong Consistency needs twice the read capacity. Therefore, if the queries do not need consistency and delays of a few milliseconds are not a problem, choosing Eventual Consistency saves cost. Default: Eventual Consistency.
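As a small illustration of choosing between the two read modes, the sketch below only builds the arguments for a GetItem request. ConsistentRead is the real request parameter; the table name Orders and the key order_id are made-up examples, and the boto3 call is left as a comment because it needs an AWS account.

```python
def get_item_params(key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call."""
    return {
        "Key": key,
        # ConsistentRead=False (the default) asks for an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical usage with boto3 (needs AWS credentials, so it is not run here):
#   import boto3
#   table = boto3.resource("dynamodb").Table("Orders")
#   table.get_item(**get_item_params({"order_id": "o-1"}, strongly_consistent=True))

cheap_read = get_item_params({"order_id": "o-1"})
exact_read = get_item_params({"order_id": "o-1"}, strongly_consistent=True)
print(cheap_read)
print(exact_read)
```

Leaving the flag off keeps the cheaper default, so it only has to be set for the reads that really need the latest value.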
Scalability
Before DynamoDB can scale a provisioned table, it asks for the expected number of reads and writes per second. Based on this information, it decides how many resources to use and how to distribute the table across servers, and it charges accordingly. If more reads or writes per second are performed than were provisioned, the requests are throttled and an error such as ProvisionedThroughputExceededException (or ThrottlingException) is returned.
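A common way to live with throttling is to retry with exponential backoff. The AWS SDKs already do this on their own, so the sketch below is only meant to show the idea. ThrottledError and flaky_write are hypothetical stand-ins, not part of any AWS library.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a DynamoDB throttling error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff and jitter when throttled."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Wait a random time between 0 and base_delay * 2^attempt seconds.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))


# Example: a write that is throttled twice before it succeeds.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError("capacity exceeded")
    return "written"


result = call_with_backoff(flaky_write, sleep=lambda seconds: None)
print(result, attempts["count"])
```

The sleep function is passed in as a parameter so the example can run without actually waiting.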
Partition Key
Although DynamoDB is called a key-value store, being schemaless also gives it document-store-like features. The Partition Key gets its name because it also determines how the data is distributed. DynamoDB scales by sharding, and this key decides which data goes to which server. DynamoDB keeps data in partitions.

A partition whose data is used very frequently is called a Hot Partition. The overall capacity of the table is divided equally among the partitions. However, some partitions are accessed very often and others rarely, so some exhaust their capacity while others leave theirs unused. To address this, the Adaptive Capacity feature shifts capacity between partitions.

With the TTL (Time to Live) feature, items can be expired automatically after a specified period of time. Because all capacity is divided among partitions, the capacity per partition decreases as the number of partitions grows. Here TTL helps considerably: rarely used data can be moved out to a different table, so that a high-capacity table serves the frequently used data and a low-capacity table serves the rest. This avoids the cost of raising capacity without end.
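To make the Hot Partition idea concrete, here is a rough sketch of how a key could be mapped to a partition. DynamoDB's internal hash function is not public, so the MD5-and-modulo scheme and the key names below are assumptions for illustration only.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, partition_count):
    """Map a partition key to a partition number with a stable hash (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def requests_per_partition(requested_keys, partition_count):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key, partition_count) for key in requested_keys)


# One key ("user-1") is requested far more often than the others.
traffic = ["user-1"] * 90 + [f"user-{n}" for n in range(2, 12)]
load = requests_per_partition(traffic, partition_count=4)
hot_partition, hot_count = load.most_common(1)[0]
print(f"partition {hot_partition} received {hot_count} of {len(traffic)} requests")
```

Whatever the real hash looks like, the effect is the same: a single popular key always lands on the same partition, so that partition can run out of its share of capacity while the others sit idle.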
Sort Key
Although it is possible to search over every attribute of a table, a quick search needs keys: the Hash Key (Partition Key) and, in addition, a Sort Key.
Primary Key
The Primary Key (PK) identifies a single item, as in an RDB (Relational Database). In fact, DynamoDB has no separate PK definition; the key attributes of the table form the PK. The PK of a DynamoDB table can be of two types:
1. Partition Key
2. Partition Key + Sort Key
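The two PK types above correspond to the KeySchema part of a CreateTable request, where HASH marks the Partition Key and RANGE marks the Sort Key. The helper below is a minimal sketch; the attribute names customer_id and order_date are invented, and all keys are assumed to be strings.

```python
def key_definition(partition_key, sort_key=None):
    """Build the KeySchema / AttributeDefinitions part of a CreateTable request.

    All key attributes are assumed to be strings ("S") to keep the sketch short.
    """
    names = [partition_key] + ([sort_key] if sort_key else [])
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "KeySchema": key_schema,
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"} for name in names
        ],
    }


simple_pk = key_definition("customer_id")
composite_pk = key_definition("customer_id", "order_date")
print(simple_pk["KeySchema"])
print(composite_pk["KeySchema"])
```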
Secondary Indexes
Additional attributes can be indexed for quick searches. Be careful when doing this: depending on the index type, an additional copy of the data is maintained for the index, and this is reflected in the cost. The index structure should be thought through when the table is defined. A Local Secondary Index can only be created together with the table, so it needs particular attention up front. A Global Secondary Index needs its own capacity, separate from the base capacity of the table. For this reason, it should not be defined unless it is needed, because it increases the cost.
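The extra cost of a Global Secondary Index is visible in its definition, because it carries its own ProvisionedThroughput block. The sketch below builds one such entry for a CreateTable request; the index name by_email, the attribute email, and the capacity numbers are made-up examples.

```python
def global_secondary_index(index_name, partition_key, read_units, write_units):
    """Build one entry for the GlobalSecondaryIndexes list of a CreateTable request."""
    return {
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        # Copy every attribute into the index: simple, but the most storage-hungry choice.
        "Projection": {"ProjectionType": "ALL"},
        # Capacity of the index, paid in addition to the table's own capacity.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


table_units = {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5}
index = global_secondary_index("by_email", "email", read_units=5, write_units=5)
total_write_units = (
    table_units["WriteCapacityUnits"]
    + index["ProvisionedThroughput"]["WriteCapacityUnits"]
)
print(total_write_units)
```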
Provisioned Throughput / Capacity Units
Provisioned Throughput, the projected load, matters for both scaling and pricing. If it is set too low, the load will frequently exceed it and the application will not get the full benefit of DynamoDB. If it is set higher than necessary, the cost will increase. Capacity Units are calculated from the size of the data you read and write in DynamoDB, and the unit size differs between reads and writes.
If the capacity is set incorrectly, or is not reviewed regularly, the cost will increase. To avoid this problem, the DynamoDB auto-scaling feature scales capacity up or down automatically when consumption stays above (or below) a defined utilization value for a certain period of time.
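The auto-scaling decision can be pictured with a little arithmetic. The real feature works through Application Auto Scaling and CloudWatch alarms, so the function below is only a simplified sketch; the 70% target, the limits, and the sample loads are assumptions chosen for the example.

```python
import math


def next_capacity(current_units, recent_consumed, target_utilization=0.7,
                  min_units=5, max_units=1000):
    """Return the provisioned units to use next, given recent consumed units per second.

    Capacity changes only when every recent sample is on the same side of the
    target, which approximates "for a certain period of time".
    """
    if not recent_consumed:
        return current_units
    utilizations = [consumed / current_units for consumed in recent_consumed]
    sustained_high = all(u > target_utilization for u in utilizations)
    sustained_low = all(u < target_utilization for u in utilizations)
    if not (sustained_high or sustained_low):
        return current_units  # mixed signal: leave capacity unchanged
    # Size capacity so that the peak recent load sits at the target utilization.
    wanted = math.ceil(max(recent_consumed) / target_utilization)
    return max(min_units, min(max_units, wanted))


print(next_capacity(100, [90, 95, 85]))  # sustained high load: scale up
print(next_capacity(100, [20, 25, 30]))  # sustained low load: scale down
print(next_capacity(100, [60, 90, 40]))  # mixed load: unchanged
```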
Query operations should be used instead of Scan operations. A Scan works across all partitions, while a Query operates within a single partition. Using Query therefore gains both speed and Capacity Units.
Because of the NoSQL structure of DynamoDB, complex queries are better kept off the table itself and run in Redshift instead. For reporting in particular, a snapshot of the DynamoDB data at a given moment can be copied into Redshift, where SQL queries can be written and reports generated. With Redshift, SQL runs only on the copied data, whereas with AWS EMR, SQL can run on real-time DynamoDB data. AWS EMR is a managed Apache Hadoop cluster, and it uses the Apache Hive data warehouse running on Hadoop. In this way, real-time data can be queried with SQL.
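Coming back to Query versus Scan, the difference in work can be shown with a toy in-memory table. TinyTable is a hypothetical stand-in written for this example, not the DynamoDB API; it only counts how many items each operation has to examine.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table: items grouped by partition key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, partition_key, item):
        self.partitions[partition_key].append(item)

    def scan(self, predicate):
        """Examine every item in every partition; return (matches, items_read)."""
        items = [item for part in self.partitions.values() for item in part]
        return [i for i in items if predicate(i)], len(items)

    def query(self, partition_key, predicate=lambda item: True):
        """Examine only the items of one partition; return (matches, items_read)."""
        items = self.partitions.get(partition_key, [])
        return [i for i in items if predicate(i)], len(items)


table = TinyTable()
for customer in range(100):
    for order in range(10):
        key = f"customer-{customer}"
        table.put(key, {"customer": key, "order": order})

scan_matches, scan_reads = table.scan(lambda i: i["customer"] == "customer-7")
query_matches, query_reads = table.query("customer-7")
print(scan_reads, query_reads)
```

Both calls return the same ten orders, but the scan has to read the whole table to find them, and in DynamoDB the items read are what the Capacity Units are spent on.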
Read
Reads are measured in units of 4 KB: if the item to be read is 8.1 KB, 3 RCUs must be spent for a Strongly Consistent read.
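The read arithmetic can be written down as a small helper. This is a sketch for a single item read; the function name is my own, and it uses the rule that an Eventually Consistent read costs half of a Strongly Consistent one.

```python
import math

READ_UNIT_KB = 4  # one read capacity unit covers up to 4 KB


def read_capacity_units(item_size_kb, strongly_consistent=True):
    """RCUs consumed by reading one item of the given size."""
    units = math.ceil(item_size_kb / READ_UNIT_KB)
    # An eventually consistent read costs half as much.
    return units if strongly_consistent else units / 2


print(read_capacity_units(8.1))
print(read_capacity_units(8.1, strongly_consistent=False))
```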
Write
Writes are measured in units of 1 KB: if the item to be written is 8.1 KB, 9 WCUs must be spent.
One of the DynamoDB constraints is that an item can be at most 400 KB in size. Data larger than 400 KB can be stored in three ways:
1. Shrink it with GZIP compression
2. Put it in an S3 bucket and keep the link in DynamoDB
3. Split it across several items that share a partition key and differ by sort key
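The first option can be tried out locally. The sketch below compresses a large, repetitive text with GZIP, checks it against the 400 KB limit, and computes the write units for the result. The sample text and function names are invented, and the size check ignores attribute names, which also count toward the real item size.

```python
import gzip
import math

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB item size limit


def write_capacity_units(size_bytes):
    """WCUs consumed by writing one item: one unit per started 1 KB."""
    return math.ceil(size_bytes / 1024)


def compress_text(text):
    """GZIP-compress text so it can be stored as a binary attribute."""
    return gzip.compress(text.encode("utf-8"))


def decompress_text(blob):
    """Reverse compress_text after reading the attribute back."""
    return gzip.decompress(blob).decode("utf-8")


document = "status=OK;" * 60_000  # about 586 KB of repetitive text
blob = compress_text(document)

print(len(document.encode("utf-8")) > MAX_ITEM_BYTES)  # too large uncompressed
print(len(blob) <= MAX_ITEM_BYTES)                     # fits after compression
print(write_capacity_units(len(blob)))
```

How much is gained depends entirely on the data: repetitive text shrinks very well, while already compressed content such as images barely shrinks at all, in which case the S3 option is the safer choice.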
DAX (DynamoDB Accelerator), which works with Eventual Consistency, addresses the Hot Partition problem by putting DynamoDB's own cache system in front of the table. In other words, frequently requested data is served through DAX without consuming the capacity of the partition itself.
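The effect of a cache in front of the table can be sketched in a few lines. ReadThroughCache is a hypothetical toy, not the DAX client: it never expires entries and ignores writes, and a plain dictionary plays the role of the table.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; only cache misses reach the table."""

    def __init__(self, load_from_table):
        self._load_from_table = load_from_table  # function: key -> item
        self._cache = {}
        self.table_reads = 0

    def get(self, key):
        if key not in self._cache:
            self.table_reads += 1  # this read would consume table capacity
            self._cache[key] = self._load_from_table(key)
        return self._cache[key]


backing_table = {"product-42": {"name": "kettle", "price": 30}}
cache = ReadThroughCache(backing_table.get)

for _ in range(1000):  # a "hot" key read 1000 times
    item = cache.get("product-42")

print(item, cache.table_reads)
```

Only the first read reaches the table; the remaining 999 are answered from the cache, which is why a hot key stops eating into the capacity of its partition.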
The DAX cluster and the application that uses it must be designed to run in the same VPC. If the default cache retention time (5 minutes) is not changed when the DAX cluster is created, cached data expires after 5 minutes and is deleted.
With DynamoDB Streams, a Lambda function can be triggered automatically on any change.
Apart from regular backups, which can be taken manually or with Lambda, the feature called Point in Time Recovery can restore DynamoDB to any desired moment within the last 35 days. It only needs to be enabled and then manages itself automatically.
Using the Global Table feature, it is possible to work in multiple regions simultaneously. If the Europe (Frankfurt) region goes down, the table continues to serve through another replica located in, for example, Asia Pacific (Tokyo).
DynamoDB data can be exported as a dump to an Amazon S3 bucket using AWS Data Pipeline. Similarly, dump data can be imported from Amazon S3 to easily create a new table.
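For the DynamoDB Streams trigger mentioned above, a Lambda function receives the changes as a batch of records. The handler below is a minimal sketch that only counts the change types. The sample event is hand-written in the shape of a Streams batch, and its keys and values are made up.

```python
from collections import Counter


def handler(event, context):
    """Lambda entry point: summarize the changes in one DynamoDB Streams batch."""
    changes = Counter()
    for record in event.get("Records", []):
        # eventName is INSERT, MODIFY or REMOVE.
        changes[record["eventName"]] += 1
        print(record["eventName"], record["dynamodb"]["Keys"])
    return dict(changes)


# Hand-written sample event in the shape of a Streams batch (values are made up).
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"order_id": {"S": "o-1"}},
                "NewImage": {"order_id": {"S": "o-1"}, "total": {"N": "30"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"order_id": {"S": "o-0"}}},
        },
    ]
}

summary = handler(sample_event, context=None)
print(summary)
```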
Free Tier
The Amazon DynamoDB Free Tier offers 25 GB of storage, 25 RCUs, and 25 WCUs. This offer does not end after the first year.
See you in my next article …