How do pick the database more correctly?

There are variety databases that we can choose. So that, many developers choose the database by feeling or by popularity. This article give the developer some advices to choose database by some criterial.

  • The history of databases (the development of databases) helps developer pick which types of databases in rough. (RDBMS, NoSQL, NewSQL)

  • The theories of data processing mechanism helps developer pick which types of databases more precisely. (RDBMS, NoSQL, NewSQL)

  • Through data structure inside of a database, help developer picks appropriately NoSQL database.

  • Consider the database security in various databases.

The Development of Database

[1] According to the survey paper, the data store history start at the tape to NewSQL during 40 years. The Fg 1 pic shows the major database from 1960 to 2020 and explains the character of each type.

[2] In 2020, the major of database type are RDBMS (Relational Database Management System), NoSQL (Not Only SQL) and NewSQL. What are the RDBMS, NoSQL and NewSQL:

  • RDBMS: Rising from the web age, it is good at struct data. Use the relational table to Store data, and support ACID (automatic, consistency, isolation, durability) in the transaction process.

  • NoSQL: Rising from big data Analytics demands. The problem is how to store semi-structure or un-structure data. It's good at store these data and have horizontal scaling ability. (p.s RDBMS only have vertical scaling ability )

  • NewSQL: Be proposed in a 2011 paper. It means having the ACID feature on the transaction process on NoSQL. So it has good performance to store all data types (structure, semi-structure, or un-structure).

ACID will introduce in the other paragraph.

the suggestions:

There are a few aspects from the development history:

  1. Concerns data type you want to store. If the data type is structured, the RDBMS is better and more stable. Neither structure data, the NoSQL or NewSQL is better.

  2. Concerning the scaling ability, if have a big demand for horizontal scaling should pick the NoSQL Database.

  3. If the developer decides to choose a NoSQL database, should consider a question about the role of the database. If should guarantee the transaction process, the NewSQL is better.

Evaluate The Database Theory

There are three ways to evaluate the database aspects in the database development history.

<1> ACID(automatic, consistency, isolation, durability): In late 1970s Jim Gray defined the most widely accepted transaction model and later it became popularized as ACID transactions[1].

  • Automaticity: Means a transaction must finish or do not change from origin status.

  • consistency: Means the database data status is consistent before and after the transaction.

  • isolation: Means the transaction executes will not impact the other transactions.

  • durability: Means the data is durability in any condition

The ACID term is to evaluate RDBMS aspects and it's a feature for all RDBMS. Like MySQL, MSSQL, and Postgres... these popular databases support this term.

<2> CAP(Consistency, Availability, and Partition tolerance ): In 2000, Eric Brewer presented a conjecture explaining trade-offs in distributed systems, later popularized as CAP theorem[1].

  • Consistency: All clients have the same view of the data.

  • Availability: The system continues to operate even in the presence of node failures.

  • Partition-Tolerance: The system continues to operate and upholds its consistency guarantees in spite of network partitions.

  • BASE(Basically Available, Eventually consistent): has been proposed which was derived from the CAP theorem[1].

the suggestions:

concerns the purpose of using the data then choose the proper database: If focus on:

  1. Data Consistency and Transaction Stability: NewSQL and RDBMS are better because they support ACID.

  2. High throughput and don't care about consistency: It must be NoSQL series, but should consider the database supports AP in CAP.

  3. High throughput and care about consistency: It must be NoSQL series, but should consider the database supports CP in CAP.

The Data Model of Databases

<relation>: The data model focus on structure data that present with a form that can use the primary key and foreign key to link tables. It's the most popular database in the development of a product like MySQL, Postgres, and MariaDB.

In Relational Database, familiar tools help developers to design the data model like "entity-relationship diagram" (ERD) which is an old method but the most efficient method to design RDBMS. ERD is simply the diagram or model that is used to represent or show the relationship between the entities or data objects that are stored in a database[9].

<Key Value>: These are the simplest and most popular NoSQL stores, in which data are managed and represented as pairs. There are many data structures to handle data efficiently, highly scalable, and key-based lookups such as Distributed Hash Tables (DHTs) and Log-Structured Merge-trees (LSM-trees).

  • In-memory key-value stores, such as Memory cached, provide an extremely fast access to information by keeping it in memory.

  • Persistent key-value stores, such as RiakKV and Oracle NoSQL, provide a highly available access to non-transient information by storing it in HDD/SSD.

  • Hybrid key-value stores, such as Redis and Aerospike, first keep data in memory and then persist them when some conditions are satisfied.

<Wide column> Data in wide-column stores can be efficiently partitioned horizontally (by rows) and vertically (by column-families), which make them suitable for storing huge datasets.

  • Row-key: it's like a new table in RDBMS

  • Column Family: it's similar the column RMDBS that have name and type still a key-value structure. Column Family + RowKey that is primary key in RMDBS

  • Super Column Family: It is a tuple (pair) that consists of a key–value pair, where the key is mapped to a value that are column families[4]. In the other works, that can combine the columns be a single type of column.

<Document> These are extended key-value stores in which the value is represented as a document encoded in standard semistructured formats such as XML, JSON, or BSON.

With its rise on Web2.0, the application schema is constant. The Document store can support these data directly instead change the database schema, and allow querying data inside a document without having to retrieve the whole document (via its key) and then inspect it.

so that it's a low maintenance and cost in modern application.

<Graph> [5]The last big NoSQL database type is the most complex one, geared toward storing relations between entities in an efficient manner. When the data is highly interconnected, such as for social networks, scientific paper citations, or capital asset clusters, graph databases are the answer.

  • Node: The entities themselves. In a social network this could be people.

  • Edge: The relationship between two entities. This relationship is represented by a line and has its own properties. An edge can have a direction, for example, if the arrow indicates who is whose boss.

there is a positive point in the graph design:

  • The Graph DB is shrunk the size can compare with RDBMS.

Comparison of Database Data Models

DatabaseFit scenario(s)Strength

Relation

Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables[8].

It has good performance OLTP (online transaction process). It's stable and has diversity tool pool in open source.

Key-value

Objects are only accessed via a single Key, object caching, and where objects are not related.

Scalable, a very fast random access via Key, and ease of data partitioning

Wide column

Batch-oriented parallel processing of large aggregated datasets

With regard to query workload, a hierarchy of aggregates, such as column-families, are designed that, in turn, increase the performance of queries.

Document

Data can be easily interpreted as documents and are constantly evolving.

A rich data model to store data with arbitrary complexity, such as nested structures, arrays, and scalar values; each component of a document can be accessed via secondary indices.

Graph

There is the need to traverse several levels of relationships among intensely related data.

Fast and simple querying of linked datasets, and easy mapping of entity-relationship diagrams

the suggestions:

According to the data, the model chooses the properest database. Can Refer to the Comparison form that can help to choose.

Security Issues of Database

General Architecture of Database Design in the whole system

The Database always does not access by outside services that hide insight into the service with middleware. So the authentication ways in database alway for internal service, such as Password base, Directory based services, PKI base services, and Kerberos.

  • Password base

    • The basic password is a string to access the database.

  • Directory base services

    • Directory service or name service maps the names of network resources to their respective network addresses[10].

  • PKI (public key infrastructure) base

    • It is a set of roles, policies, hardware, software, and procedures needed to create, manage, distribute, use, store and revoke digital certificates and manage public-key encryption[11].

  • Kerberos

    • Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography[12].

the suggestions:

There are no suggestions because the database supports auth method is familiar so we can ignore that.

Attachements

  • RDBMS

DatabaseAuthenticationAccess ControlConsistency Model

MySQL

Several auth methods

Role base Access Control

ACID

PostgreSQL

Several auth methods

Role base Access Control

ACID

DB2

Through OS,

Domain controller,

Kerberos

Role base Access Control

ACID

MsSQL

Through OS or mixed auth mode

Role base Access Control

ACID

  • NoSQL- Key-Val

DatabaseAuthenticationAccess ControlConsistency Model

Redis

Using password

No

CP

Riak Kv

Using password,

Pluggable auth module,

client certificate

using different permissions.

AP

  • NoSQL- Wide Column

DatabaseAuthenticationAccess ControlConsistency Model

Cassandra

Using password, External mechanism

Role base Access Control

AP

HBase

simple auth by password Kerberos

Role base Access Control.

Attribute-base access control by groups and access control list.

AP

  • NoSQL- Document

DatabaseAuthenticationAccess ControlConsistency Model

mongoDB

Using password, External mechanism, Kerberos

Role base Access Control

ACID

CouchDB

Using password, External mechanism, Kerberos

Role base Access Control

BASE

  • NoSQL- Graph

DababaseAuthenticationAccess ControlConsistency Model

neo4j

Using password, External mechanism, Kerberos

Role base Access Control

ACID

EdgeDB

Using password, External mechanism, Kerberos

Role base Access Control

ACID[13]

  • NewSQL

DatabaseAuthenticationAccess ControlConsistency Model

Apache Ignite

simple auth by password

No

ACID

VoltDB

Kerberos

using procedure base

ACID

Reference

[1] Davoudian, Ali, Liu Chen, and Mengchi Liu. "A survey on NoSQL stores." ACM Computing Surveys (CSUR) 51.2 (2018): 1-43. [2] Samaraweera, G. Dumindu, and J. Morris Chang. "Security and privacy implications on database systems in Big Data era: A survey." IEEE Transactions on Knowledge and Data Engineering 33.1 (2019): 239-258. [3] https://www.geeksforgeeks.org/acid-properties-in-dbms/ [4] https://en.wikipedia.org/wiki/Super_column_family [5] https://dzone.com/articles/nosql-database-types-1 [6] https://neo4j.com/developer/graph-db-vs-nosql/ [7] https://db-engines.com/en/ranking [8] https://www.oracle.com/database/what-is-a-relational-database/ [9] https://www.geeksforgeeks.org/difference-between-dfd-and-erd/?ref=gcse [10] https://en.wikipedia.org/wiki/Directory_service [11] https://en.wikipedia.org/wiki/Public_key_infrastructure [12] https://web.mit.edu/kerberos/ [13] https://www.edgedb.com/blog/edgedb-a-new-beginning

Last updated