How do pick the database more correctly?
Last updated
Last updated
There are variety databases that we can choose. So that, many developers choose the database by feeling or by popularity. This article give the developer some advices to choose database by some criterial.
The history of databases (the development of databases) helps developer pick which types of databases in rough. (RDBMS, NoSQL, NewSQL)
The theories of data processing mechanism helps developer pick which types of databases more precisely. (RDBMS, NoSQL, NewSQL)
Through data structure inside of a database, help developer picks appropriately NoSQL database.
Consider the database security in various databases.
[1] According to the survey paper, the data store history start at the tape to NewSQL during 40 years. The Fg 1 pic shows the major database from 1960 to 2020 and explains the character of each type.
[2] In 2020, the major of database type are RDBMS (Relational Database Management System), NoSQL (Not Only SQL) and NewSQL. What are the RDBMS, NoSQL and NewSQL:
RDBMS: Rising from the web age, it is good at struct data. Use the relational table to Store data, and support ACID (automatic, consistency, isolation, durability) in the transaction process.
NoSQL: Rising from big data Analytics demands. The problem is how to store semi-structure or un-structure data. It's good at store these data and have horizontal scaling ability. (p.s RDBMS only have vertical scaling ability )
NewSQL: Be proposed in a 2011 paper. It means having the ACID feature on the transaction process on NoSQL. So it has good performance to store all data types (structure, semi-structure, or un-structure).
ACID will introduce in the other paragraph.
There are a few aspects from the development history:
Concerns data type you want to store. If the data type is structured, the RDBMS is better and more stable. Neither structure data, the NoSQL or NewSQL is better.
Concerning the scaling ability, if have a big demand for horizontal scaling should pick the NoSQL Database.
If the developer decides to choose a NoSQL database, should consider a question about the role of the database. If should guarantee the transaction process, the NewSQL is better.
There are three ways to evaluate the database aspects in the database development history.
<1> ACID(automatic, consistency, isolation, durability): In late 1970s Jim Gray defined the most widely accepted transaction model and later it became popularized as ACID transactions[1].
Automaticity: Means a transaction must finish or do not change from origin status.
consistency: Means the database data status is consistent before and after the transaction.
isolation: Means the transaction executes will not impact the other transactions.
durability: Means the data is durability in any condition
The ACID term is to evaluate RDBMS aspects and it's a feature for all RDBMS. Like MySQL, MSSQL, and Postgres... these popular databases support this term.
<2> CAP(Consistency, Availability, and Partition tolerance ): In 2000, Eric Brewer presented a conjecture explaining trade-offs in distributed systems, later popularized as CAP theorem[1].
Consistency: All clients have the same view of the data.
Availability: The system continues to operate even in the presence of node failures.
Partition-Tolerance: The system continues to operate and upholds its consistency guarantees in spite of network partitions.
BASE(Basically Available, Eventually consistent): has been proposed which was derived from the CAP theorem[1].
concerns the purpose of using the data then choose the proper database: If focus on:
Data Consistency and Transaction Stability: NewSQL and RDBMS are better because they support ACID.
High throughput and don't care about consistency: It must be NoSQL series, but should consider the database supports AP in CAP.
High throughput and care about consistency: It must be NoSQL series, but should consider the database supports CP in CAP.
<relation>: The data model focus on structure data that present with a form that can use the primary key and foreign key to link tables. It's the most popular database in the development of a product like MySQL, Postgres, and MariaDB.
In Relational Database, familiar tools help developers to design the data model like "entity-relationship diagram" (ERD) which is an old method but the most efficient method to design RDBMS. ERD is simply the diagram or model that is used to represent or show the relationship between the entities or data objects that are stored in a database[9].
<Key Value>: These are the simplest and most popular NoSQL stores, in which data are managed and represented as pairs. There are many data structures to handle data efficiently, highly scalable, and key-based lookups such as Distributed Hash Tables (DHTs) and Log-Structured Merge-trees (LSM-trees).
In-memory key-value stores, such as Memory cached, provide an extremely fast access to information by keeping it in memory.
Persistent key-value stores, such as RiakKV and Oracle NoSQL, provide a highly available access to non-transient information by storing it in HDD/SSD.
Hybrid key-value stores, such as Redis and Aerospike, first keep data in memory and then persist them when some conditions are satisfied.
<Wide column> Data in wide-column stores can be efficiently partitioned horizontally (by rows) and vertically (by column-families), which make them suitable for storing huge datasets.
Row-key: it's like a new table in RDBMS
Column Family: it's similar the column RMDBS that have name and type still a key-value structure. Column Family + RowKey that is primary key in RMDBS
Super Column Family: It is a tuple (pair) that consists of a key–value pair, where the key is mapped to a value that are column families[4]. In the other works, that can combine the columns be a single type of column.
<Document> These are extended key-value stores in which the value is represented as a document encoded in standard semistructured formats such as XML, JSON, or BSON.
With its rise on Web2.0, the application schema is constant. The Document store can support these data directly instead change the database schema, and allow querying data inside a document without having to retrieve the whole document (via its key) and then inspect it.
so that it's a low maintenance and cost in modern application.
<Graph> [5]The last big NoSQL database type is the most complex one, geared toward storing relations between entities in an efficient manner. When the data is highly interconnected, such as for social networks, scientific paper citations, or capital asset clusters, graph databases are the answer.
Node: The entities themselves. In a social network this could be people.
Edge: The relationship between two entities. This relationship is represented by a line and has its own properties. An edge can have a direction, for example, if the arrow indicates who is whose boss.
there is a positive point in the graph design:
The Graph DB is shrunk the size can compare with RDBMS.
Comparison of Database Data Models
Relation
Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables[8].
It has good performance OLTP (online transaction process). It's stable and has diversity tool pool in open source.
Key-value
Objects are only accessed via a single Key, object caching, and where objects are not related.
Scalable, a very fast random access via Key, and ease of data partitioning
Wide column
Batch-oriented parallel processing of large aggregated datasets
With regard to query workload, a hierarchy of aggregates, such as column-families, are designed that, in turn, increase the performance of queries.
Document
Data can be easily interpreted as documents and are constantly evolving.
A rich data model to store data with arbitrary complexity, such as nested structures, arrays, and scalar values; each component of a document can be accessed via secondary indices.
Graph
There is the need to traverse several levels of relationships among intensely related data.
Fast and simple querying of linked datasets, and easy mapping of entity-relationship diagrams
According to the data, the model chooses the properest database. Can Refer to the Comparison form that can help to choose.
General Architecture of Database Design in the whole system
The Database always does not access by outside services that hide insight into the service with middleware. So the authentication ways in database alway for internal service, such as Password base, Directory based services, PKI base services, and Kerberos.
Password base
The basic password is a string to access the database.
Directory base services
Directory service or name service maps the names of network resources to their respective network addresses[10].
PKI (public key infrastructure) base
It is a set of roles, policies, hardware, software, and procedures needed to create, manage, distribute, use, store and revoke digital certificates and manage public-key encryption[11].
Kerberos
Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography[12].
There are no suggestions because the database supports auth method is familiar so we can ignore that.
RDBMS
MySQL
Several auth methods
Role base Access Control
ACID
PostgreSQL
Several auth methods
Role base Access Control
ACID
DB2
Through OS,
Domain controller,
Kerberos
Role base Access Control
ACID
MsSQL
Through OS or mixed auth mode
Role base Access Control
ACID
NoSQL- Key-Val
Redis
Using password
No
CP
Riak Kv
Using password,
Pluggable auth module,
client certificate
using different permissions.
AP
NoSQL- Wide Column
Cassandra
Using password, External mechanism
Role base Access Control
AP
HBase
simple auth by password Kerberos
Role base Access Control.
Attribute-base access control by groups and access control list.
AP
NoSQL- Document
mongoDB
Using password, External mechanism, Kerberos
Role base Access Control
ACID
CouchDB
Using password, External mechanism, Kerberos
Role base Access Control
BASE
NoSQL- Graph
neo4j
Using password, External mechanism, Kerberos
Role base Access Control
ACID
EdgeDB
Using password, External mechanism, Kerberos
Role base Access Control
ACID[13]
NewSQL
Apache Ignite
simple auth by password
No
ACID
VoltDB
Kerberos
using procedure base
ACID
[1] Davoudian, Ali, Liu Chen, and Mengchi Liu. "A survey on NoSQL stores." ACM Computing Surveys (CSUR) 51.2 (2018): 1-43. [2] Samaraweera, G. Dumindu, and J. Morris Chang. "Security and privacy implications on database systems in Big Data era: A survey." IEEE Transactions on Knowledge and Data Engineering 33.1 (2019): 239-258. [3] https://www.geeksforgeeks.org/acid-properties-in-dbms/ [4] https://en.wikipedia.org/wiki/Super_column_family [5] https://dzone.com/articles/nosql-database-types-1 [6] https://neo4j.com/developer/graph-db-vs-nosql/ [7] https://db-engines.com/en/ranking [8] https://www.oracle.com/database/what-is-a-relational-database/ [9] https://www.geeksforgeeks.org/difference-between-dfd-and-erd/?ref=gcse [10] https://en.wikipedia.org/wiki/Directory_service [11] https://en.wikipedia.org/wiki/Public_key_infrastructure [12] https://web.mit.edu/kerberos/ [13] https://www.edgedb.com/blog/edgedb-a-new-beginning