26 - MongoDB#
Document Stores#
Documents encapsulate and encode data in some standard formats or encodings. These encodings include:
JSON and XML
Binary forms like BSON, PDF, and Microsoft Office documents
Documents are good for storing semi-structured data, but only okay for structured and unstructured data.
Document stores are similar to key-value stores, but provide more functionality. They recognize the structure of the objects stored in the database, and these objects are grouped into collections. Document stores have simple query mechanisms to search collections for attribute values.
Structure of Document Stores#
Collections correspond to tables in an RDBMS
Documents correspond to rows in an RDBMS
Fields correspond to attributes (columns) in an RDBMS
However, not all documents in a collection have the same attributes
Documents are addressed in a database via a unique key
Queries can go beyond simple key-value lookup (think nested lists and dictionaries in JSON)
APIs and query languages allow retrieval of documents based on their fields
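The structure described above can be sketched in plain JavaScript without a database server. This is only a toy model: the `people` collection, its documents, and the `findByField` helper are hypothetical names chosen for illustration.

```javascript
// A toy in-memory "collection": a Map from unique key to document.
// Illustration only, not how a real document store is implemented.
const people = new Map();

// Documents in the same collection need not share the same fields.
people.set(1, { name: "Sue", age: 39, teaches: ["database", "cloud"] });
people.set(2, { name: "Bob", address: { city: "Chicago", state: "IL" } });

// Key-value style lookup: address a document by its unique key.
const byKey = people.get(1);

// Field-based retrieval: scan the collection for documents that satisfy
// a condition on their fields, including fields of a sub-document.
function findByField(collection, predicate) {
  return [...collection.values()].filter(predicate);
}

const inChicago = findByField(
  people,
  (doc) => doc.address !== undefined && doc.address.city === "Chicago"
);

console.log(byKey.name);                   // Sue
console.log(inChicago.map((d) => d.name)); // only Bob has an address in Chicago
```

The second lookup is what separates a document store from a pure key-value store: the store understands the structure of the value and can search inside it.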
MongoDB Overview#
MongoDB (from huMONGOus) is a product of MongoDB Inc. In MongoDB,
Each document has an ID (key-value pair)
Collections can be created at run-time
Documents in a collection are not required to have the same structure, although they may
MongoDB stores objects in BSON format
Binary encoding of JSON
Uses associative arrays
A field in MongoDB can be any BSON data type
{
name: {first: "Sue", last: "Sky"},
age: 39,
teaches: ["database", "cloud"],
degrees: [{school: "UIUC", degree: "PhD"}, {school: "SIU", degree: "MS"}, {school: "Northwestern", degree: "BA"}]
}
JSON data types
An object (JSON object)
An array
A string
A number
A boolean
null
Operations within MongoDB queries are limited, and additional operations must be supported by a programming language.
MongoDB has no SQL-style JOIN, but the $lookup aggregation stage performs a left outer join against another collection
Mongo shell scripts are also an option
Many performance optimizations must be implemented by the developer.
MongoDB uses indexes:
Single-field indexes can be used at the top level and in sub-documents
Text indexes support searching the string content of documents
Hashed indexes
Geospatial indexes and queries
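To make the idea of a single-field index concrete, here is a minimal sketch in plain JavaScript. It assumes an index can be modeled as a map from field value to matching documents; real indexes use more elaborate structures. `getPath`, `buildIndex`, and the `staff` data are hypothetical.

```javascript
// Resolve a dotted path such as "name.last" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Build a toy index on one field (top-level or inside a sub-document):
// each distinct field value maps to the documents that contain it.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = getPath(doc, field);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

const staff = [
  { name: { first: "Sue", last: "Sky" }, age: 39 },
  { name: { first: "Al", last: "Sky" }, age: 52 },
  { name: { first: "Bo", last: "Lee" }, age: 39 },
];

const byAge = buildIndex(staff, "age");            // top-level field
const byLastName = buildIndex(staff, "name.last"); // field in a sub-document

// A lookup goes straight to the matching documents instead of scanning all of them.
console.log(byAge.get(39).length);         // 2
console.log(byLastName.get("Lee")[0].age); // 39
```

The trade-off is the usual one for indexes: faster reads on the indexed field, at the cost of extra storage and extra work on every write.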
Using MongoDB#
To issue a command in MongoDB, you must first specify the database:
use DatabaseName
Collection methods:
CRUD
insert(), find(), update(), remove()
Aggregate
count(), aggregate(), etc.
db.collectionName.methodName() (db refers to the database selected with use)
Create a collection:
db.createCollection(name, options)
db.createCollection("project", {capped: true, size: 1310720, max: 500})
You can specify the size, index, and max number of documents
If the collection is capped, the size is fixed and going over the limit will overwrite old data
Alternatively, you can simply use insert, and the collection will be created automatically
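The overwrite behavior of a capped collection can be sketched as a bounded queue. The `CappedCollection` class below is a hypothetical toy, not MongoDB's implementation. It models only the document-count limit (`max`) and ignores the byte-size limit (`size`).

```javascript
// Toy model of a capped collection limited by document count (like `max`).
class CappedCollection {
  constructor(max) {
    this.max = max; // maximum number of documents to keep
    this.docs = []; // documents in insertion order, oldest first
  }

  insert(doc) {
    this.docs.push(doc);
    // Going over the limit discards the oldest document.
    if (this.docs.length > this.max) {
      this.docs.shift();
    }
  }

  find() {
    return [...this.docs];
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i });
}

// Only the three most recent documents (seq 3, 4, 5) remain.
console.log(log.find());
```

This is why capped collections suit log-like data: inserts never fail for lack of space, and the oldest entries age out on their own.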
With MongoDB, you can build a database incrementally without modifying the schema, since there is no schema.
Each document in a database automatically gets an “_id” field.
db.hotels.insert({name: "Motel 6", options: {smoking: "yes", pet: "yes"}});
d1 = {name: "Metro Blu", address: "Chicago, IL", rating: 3.5};
db.hotels.insert(d1);
CRUD#
Create
db.createCollection(collection)
Insert
db.collection.insert({name: "Sue", age: 39})
Remove can be used to delete all documents or just some documents
db.collection.remove({}) // removes all docs
db.collection.remove({status: "D"}) // some docs
Update
db.collection.update({age: {$gt: 21}}, // criteria
{$set: {status: "A"}}, // action
{multi: true}) // updates multiple docs
Read returns a cursor that can be used in subsequent cursor methods
db.hotels.find({rating: 4.5});
db.hotels.find({address: {$regex: "CA"}});
The find() query can be generalized as either of the following:
db.collection.find(<criteria>, <projection>)
or
db.collection.find({select conditions}, {project columns})
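The statement that a read returns a cursor can be illustrated with a small chainable object. This is a sketch under assumptions: `ToyCursor` is a hypothetical class over an in-memory array, its method names mirror the shell's sort(), limit(), and toArray(), and the ratings are made-up sample data.

```javascript
// Toy cursor over an in-memory array; not MongoDB's implementation.
class ToyCursor {
  constructor(docs) {
    this.docs = docs;
    this.sortSpec = null;
    this.max = Infinity;
  }

  // sort({field: 1}) for ascending, sort({field: -1}) for descending.
  sort(spec) {
    this.sortSpec = spec;
    return this; // returning the cursor allows chaining
  }

  limit(n) {
    this.max = n;
    return this;
  }

  // Nothing is evaluated until the results are actually requested.
  toArray() {
    const result = [...this.docs];
    if (this.sortSpec) {
      const [field, direction] = Object.entries(this.sortSpec)[0];
      result.sort((a, b) => {
        const order = a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0;
        return order * direction;
      });
    }
    return result.slice(0, this.max);
  }
}

const hotels = [
  { name: "Motel 6", rating: 2.5 },
  { name: "Metro Blu", rating: 3.5 },
  { name: "Grand", rating: 4.5 },
];

// Chain cursor methods, then materialize the two highest-rated hotels.
const top2 = new ToyCursor(hotels).sort({ rating: -1 }).limit(2).toArray();
console.log(top2.map((h) => h.name)); // Grand first, then Metro Blu
```

Because each method returns the cursor itself, the query can be refined step by step before any documents are fetched.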
Selection#
To match the value of a field, use:
db.collection.find({c1:5})
For multiple “and” conditions, you can list them:
db.collection.find({c1:5, c2: "Sue"})
We can also use other comparators, e.g. $gt, $lt, $regex, etc:
db.collection.find({c1: {$gt: 5}})
We can connect several conditions with $and or $or and brackets []:
db.collection.find({$and:[{c1:{$gt:5}},{c2:{$lt:2}}]})
Note that this is the same as:
db.collection.find({c1:{$gt:5}, c2:{$lt:2}})
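To see why the explicit $and form and the comma-separated form select the same documents, it helps to evaluate query documents by hand. The `matches` function below is a hypothetical toy evaluator that supports only equality, $gt, $lt, $and, and $or on top-level fields. It is a sketch of the semantics, not MongoDB's query engine.

```javascript
// Toy evaluator for a small subset of query documents.
function matches(doc, query) {
  // Every top-level entry of a query document must hold (implicit "and").
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$or") return cond.some((q) => matches(doc, q));

    const value = doc[key];
    // A plain value means equality; an object holds comparison operators.
    if (cond === null || typeof cond !== "object") return value === cond;
    return Object.entries(cond).every(([op, operand]) => {
      if (op === "$gt") return value > operand;
      if (op === "$lt") return value < operand;
      throw new Error("unsupported operator: " + op);
    });
  });
}

const docs = [
  { c1: 7, c2: 1 },
  { c1: 7, c2: 3 },
  { c1: 4, c2: 1 },
];

const explicitAnd = { $and: [{ c1: { $gt: 5 } }, { c2: { $lt: 2 } }] };
const implicitAnd = { c1: { $gt: 5 }, c2: { $lt: 2 } };
const either = { $or: [{ c1: { $lt: 5 } }, { c2: { $gt: 2 } }] };

// Both "and" forms select only the first document.
console.log(docs.filter((d) => matches(d, explicitAnd)));
console.log(docs.filter((d) => matches(d, implicitAnd)));
console.log(docs.filter((d) => matches(d, either)).length); // 2
```

The explicit $and form becomes necessary when the same field or operator must appear more than once, since an object literal cannot repeat a key.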
Projection#
To specify a subset of fields, use 0 to exclude and 1 to include. Note that _id is set to 1 by default. You cannot mix 0s and 1s, except for _id.
db.collection.find({Name: "Sue"}, {Name:1, Address:1, _id:0})
You can also specify a set of fields without any select conditions:
db.collection.find({},{Name:1, Address:1, _id:0})
When referencing a field within a document, use dot notation with quotes around the dotted name (e.g. “address.zipcode”)
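The projection rules and the dot notation can be sketched in plain JavaScript. The `project` and `getPath` helpers and the `guest` document are hypothetical. The sketch handles only top-level fields in the projection and is not MongoDB's implementation.

```javascript
// Toy projection following the 0/1 convention; _id defaults to 1.
function project(doc, spec) {
  const { _id: idFlag = 1, ...fields } = spec;
  const flags = Object.values(fields);
  const including = flags.some((f) => f === 1);
  if (including && flags.some((f) => f === 0)) {
    throw new Error("cannot mix 0s and 1s (except for _id)");
  }
  const out = {};
  if (idFlag === 1 && "_id" in doc) out._id = doc._id;
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") continue; // already handled above
    // Inclusion mode keeps listed fields; exclusion mode drops listed fields.
    const keep = including ? fields[key] === 1 : fields[key] !== 0;
    if (keep) out[key] = value;
  }
  return out;
}

// Resolve a dotted name such as "Address.zipcode" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const guest = {
  _id: 1,
  Name: "Sue",
  Address: { city: "Chicago", zipcode: "60601" },
  Age: 39,
};

// Keeps only Name and Address; _id is explicitly excluded.
console.log(project(guest, { Name: 1, Address: 1, _id: 0 }));

// The dotted name must be quoted: an unquoted key cannot contain a dot.
const query = { "Address.zipcode": "60601" };
const [path, wanted] = Object.entries(query)[0];
console.log(getPath(guest, path) === wanted); // true
```

Treating `_id` separately from the other fields is what allows `_id: 0` to appear alongside inclusions without violating the no-mixing rule.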