24 - NoSQL Databases

24 - NoSQL Databases#

Pros and Cons of Relational Databases#

Relational Positives#

Can represent relationships in data
Easy to understand relational model/SQL
Disk-oriented storage
Indexing structures
Consistent values in database (transactions)

Relational Negatives#

RDBS can be strict and complex
- Want more freedom and/or simplicity
Overhead of object to relational mapping
- Want to store as is
Cannot always partition/distribute from a single database server
- Want to distribute data
RDBS limited in throughput
- Want higher throughput
Must scale up (expensive servers)
- Want to scale out (wide - cheap servers)
RDBS providers were slow to move to the cloud
- Everyone wants to use the cloud

Data Today#

Structured Data#

Information in databases

Data organized into chunks, similar entities grouped together
Descriptions for entities in groups (same format, length, etc.)

Semi-structured Data#

Data has a certain structure, but not all items are identical

Similar entities are grouped together (may have different attributes)
Schema may be mixed in with data values
Self-describing data, e.g. XML
May be displayed as a graph

Unstructured Data#

Data can be of any type, may have no format or sequence

Cannot be represented by any type of schema
- HTML web pages
- Video, sound, images

NoSQL#

NoSQL is rejection of the relational model. It can stand for “No to SQL” or “Not only SQL.”

There are 4 main types of NoSQL databases:

Column stores (BigTable, Hbase, Cassandra)
Key-value stores (DynamoDB, Voldemort)
Document stores (MongoDB, CouchDB)
Graph-based stores (Neo4j)

Column vs Row Stores#

Row Storage#

A relational table is serialized as rows that are appended and flushed to disk
Whole datasets can be R/W with a I/O operation
Good locality of access on disk
Operations on columns are expensive since extra data must be read

Column Storage#

Serializes tables by appending columns and flushing to disk
Operations on columns are fast and cheap
Operations on rows are expensive since extra data must be read
Good for aggregations

Column Storage with Locality Groups#

Like column storage, but groups columns expected to be accessed together
Store groups together and physically separated from other column groups
- Google’s BigTable

Column Stores#

Stores data as tables
- Advantages for data warehouses, customer relationship management (CRM) systems
- More efficient for:
  - Aggregates
  - Update rows in same column
  - Easier to compress

Keys#

Most NoSQL databases utilize the concept of keys
In column store - called key or row key
Row keys are strings that must have the property that they can be lexicographically ordered
Each column/column family data stored along with key

HBase Column Store#

HBase is an open-source, distributed, versioned, non-relational, column-oriented data store
It is an Apache project whose goal is to provide storage for the Hadoop Distributed Computing
Data is logically organized into tables, rows, and columns
Based on Google’s BigTable
Each column family is stored in a separate file
Different sets of column families may have different properties and access patterns
Tables have one primary index: the row key
- Tables sorted by row key
HBase treats everything as bites

Basic CRUD Operations#

Creating a table: create <tablename>, <column family>, <column family>, ...
Inserting data: put <tablename>, <rowid>, <column family>:<column qualifier>, <value>
Reading data (all in a table): scan <tablename>
Retrieve data (one item): get <tablename>, <rowid>

Examples#

Creating a table called EMPLOYEE with three column families: Name, Address, and Details

create 'EMPLOYEE', 'Name', 'Address', 'Details'

Inserting data

put 'EMPLOYEE', 'row1', 'Name:Fname', 'John'
put 'EMPLOYEE', 'row1', 'Name:Lname', 'Smith'
put 'EMPLOYEE', 'row1', 'Name:Nickname', 'Johnny'
put 'EMPLOYEE', 'row1', 'Details:Job', 'Engineer'
put 'EMPLOYEE', 'row1', 'Details:Review', 'Good'