26 - MongoDB#

Document Stores#

Documents encapsulate and encode data in some standard formats or encodings. These encodings include:

  • JSON and XML

  • Binary forms like BSON, PDF, and Microsoft Office documents

Documents are good for storing semi-structured data, but only okay for structured and unstructured data.

Document stores are similar to key-value stores, but provide more functionality. They recognize the structure of the objects stored in the database, and these objects are grouped into collections. Document stores have simple query mechanisms to search collections for attribute values.

Structure of Document Stores#

  • Collections correspond to tables in an RDBMS

  • Documents correspond to rows in an RDBMS

  • Fields correspond to attributes (columns) in an RDBMS

    • However, not all documents in a collection have the same attributes

  • Documents are addressed in a database via a unique key

  • Allows structures beyond simple key-value pairs (think nested lists and dictionaries in JSON)

  • APIs and query languages allow retrieval of documents based on their fields
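To make the mapping above concrete, here is a minimal plain-JavaScript sketch that models a collection as a map from unique key to document. The `hotels` data and the `findByField` helper are hypothetical, made up for illustration; this is a model of the idea, not a real document-store API.

```javascript
// A collection modeled as a Map: unique key -> document.
// All names and data here are hypothetical.
const hotels = new Map([
  [1, { name: "Motel 6", options: { smoking: "yes", pet: "yes" } }],
  [2, { name: "Metro Blu", address: "Chicago, IL", rating: 3.5 }],
]);

// Key lookup, exactly as in a key-value store.
const byKey = hotels.get(2);

// Field-based retrieval: the store can look inside the documents.
function findByField(collection, field, value) {
  return [...collection.values()].filter((doc) => doc[field] === value);
}

const byField = findByField(hotels, "name", "Motel 6");
console.log(byKey.address);
console.log(byField[0].options.pet);
```

Note that the two documents do not share the same fields, which a relational table would not allow without NULL columns.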

MongoDB Overview#

MongoDB (from huMONGOus) is a product of MongoDB Inc. In MongoDB:

  • Each document has an ID (key-value pair)

  • Collections can be created at run-time

  • Documents in a collection are not required to have the same structure, although they may

MongoDB stores objects in BSON format

  • Binary encoding of JSON

  • Uses associative arrays

A field in MongoDB can be any BSON data type

{
    name: {first: "Sue", last: "Sky"},
    age: 39,
    teaches: ["database", "cloud"],
    degrees: [{school: "UIUC", degree: "PhD"}, {school: "SIU", degree: "MS"}, {school: "Northwestern", degree: "BA"}]
}

JSON data types

  • An object (JSON object)

  • An array

  • A string

  • A number

  • A boolean

  • null
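As a rough illustration of these six types, the sketch below parses a strict-JSON version of a document like the one above and classifies each field. The `jsonType` helper and the `tenured` and `office` fields are hypothetical additions for the example. Note that strict JSON requires quoted field names, unlike the shell syntax shown earlier.

```javascript
// Classify a parsed JSON value into one of the six JSON data types.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "object", "string", "number", or "boolean"
}

// `tenured` and `office` are made-up fields added to cover all six types.
const doc = JSON.parse(`{
  "name": {"first": "Sue", "last": "Sky"},
  "age": 39,
  "teaches": ["database", "cloud"],
  "tenured": true,
  "office": null
}`);

for (const [field, value] of Object.entries(doc)) {
  console.log(field, "->", jsonType(value));
}
```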

Operations within MongoDB queries are limited, and additional operations must be supported by a programming language.

  • MongoDB has no SQL-style join, but the $lookup aggregation stage performs a left outer join against another collection

  • Mongo shell scripts are also an option
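To illustrate what a $lookup-style left outer join produces, here is a plain-JavaScript model. The `lookup` helper and the sample `teachers` and `courses` arrays are hypothetical; the option names mirror the stage's `from`, `localField`, `foreignField`, and `as` fields, but this is only a sketch of the semantics, not MongoDB's implementation.

```javascript
// Hypothetical sample data standing in for two collections.
const teachers = [
  { _id: 1, name: "Sue" },
  { _id: 2, name: "Tom" },
];
const courses = [
  { title: "database", teacherId: 1 },
  { title: "cloud", teacherId: 1 },
];

// For every input document, attach an array (named by `as`) holding the
// documents in `from` whose foreignField equals the document's localField.
// Documents with no match keep an empty array: a left outer join.
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const joined = lookup(teachers, {
  from: courses,
  localField: "_id",
  foreignField: "teacherId",
  as: "courses",
});
console.log(JSON.stringify(joined, null, 2));
```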

Many performance optimizations must be implemented by the developer.

MongoDB uses indexes:

  • Single-field indexes can be used at the top level and in sub-documents

  • Text indexes support searching for string content in documents

  • Hashed indexes

  • Geospatial indexes and queries
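The idea behind a single-field index can be sketched as a lookup table from field value to the documents holding that value. The code below is a simplified plain-JavaScript model with hypothetical data and a hypothetical `buildIndex` helper; MongoDB's real index structures are more sophisticated.

```javascript
// Hypothetical data; a real collection would live in the database.
const hotels = [
  { _id: 1, name: "Motel 6", rating: 2.5, options: { pet: "yes" } },
  { _id: 2, name: "Metro Blu", rating: 3.5, options: { pet: "no" } },
  { _id: 3, name: "Lake Inn", rating: 3.5, options: { pet: "yes" } },
];

// Build an index: field value -> list of _id values having that value.
// `getField` extracts the indexed value, so it also works for sub-documents.
function buildIndex(docs, getField) {
  const index = new Map();
  for (const doc of docs) {
    const key = getField(doc);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const ratingIndex = buildIndex(hotels, (d) => d.rating);   // top-level field
const petIndex = buildIndex(hotels, (d) => d.options.pet); // sub-document field

console.log(ratingIndex.get(3.5)); // ids of hotels rated 3.5
console.log(petIndex.get("yes"));  // ids of pet-friendly hotels
```

With the index in place, an equality query becomes a single map lookup instead of a scan over every document.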

Using MongoDB#

To issue a command in MongoDB, you must first specify the database:

use DatabaseName

Collection methods:

  • CRUD

    • insert(), find(), update(), remove()

  • Aggregate

    • count(), aggregate(), etc.

db.collectionName.methodName()

Create a collection:

db.createCollection(name, options)
db.createCollection("project", {capped: true, size: 1310720, max: 500})

  • Can specify the size, index, and maximum number of documents

  • If the collection is capped, its size is fixed, and going over the limit overwrites the oldest data

  • Alternatively, skip createCollection: inserting into a collection that does not exist creates it
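The overwrite behavior of a capped collection can be pictured with a small toy model. The `CappedCollection` class below is hypothetical and only models the document-count limit (`max`); the byte-based `size` limit is not modeled.

```javascript
// Toy model of a capped collection limited by document count only.
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Over the limit: drop the oldest document to make room.
    if (this.docs.length > this.max) this.docs.shift();
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ n: i });

// Only the three most recent documents survive.
console.log(log.docs.map((d) => d.n)); // [ 3, 4, 5 ]
```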

With MongoDB, you can build a database incrementally without modifying the schema, since there is no schema.

Each document in a database automatically gets an “_id” field.

db.hotels.insert({name: "Motel 6", options: {smoking: "yes", pet: "yes"}});

d1 = {name: "Metro Blu", address: "Chicago, IL", rating: 3.5};
db.hotels.insert(d1);

CRUD#

Create

db.createCollection(collection)

Insert

db.collection.insert({name: "Sue", age: 39})

Remove can be used to delete all documents or just some documents

db.collection.remove({})            // removes all docs
db.collection.remove({status: "D"}) // some docs

Update

db.collection.update({age: {$gt: 21}},      // criteria
                     {$set: {status: "A"}}, // action
                     {multi: true})         // updates multiple docs
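The effect of the criteria, the $set action, and the multi flag can be sketched in plain JavaScript. The `updateWhereGt` helper and the sample `people` array are hypothetical; only a "greater than" criterion and a $set-style action are modeled.

```javascript
// Toy model of update({field: {$gt: bound}}, {$set: setFields}, {multi}).
// Returns the number of documents modified.
function updateWhereGt(docs, field, bound, setFields, multi = false) {
  let modified = 0;
  for (const doc of docs) {
    if (doc[field] > bound) {
      Object.assign(doc, setFields); // $set: add or overwrite fields
      modified++;
      if (!multi) break; // without multi, stop after the first match
    }
  }
  return modified;
}

// Hypothetical sample documents.
const people = [
  { name: "Sue", age: 39 },
  { name: "Tom", age: 18 },
  { name: "Ann", age: 25 },
];

const n = updateWhereGt(people, "age", 21, { status: "A" }, true);
console.log(n); // 2
console.log(people);
```

Here Sue and Ann gain a status field while Tom is left untouched, which also shows that documents in one collection can end up with different fields.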

Read returns a cursor that can be used in subsequent cursor methods

db.hotels.find({rating: 4.5});
db.hotels.find({address: {$regex: "CA"}});

The find() query can be generalized as either of the following:

db.collection.find(<criteria>, <projection>)

or

db.collection.find({select conditions}, {project columns})

Selection#

To match the value of a field, use:

db.collection.find({c1:5})

For multiple “and” conditions, you can list them:

db.collection.find({c1:5, c2: "Sue"})

We can also use other comparators, e.g. $gt, $lt, $regex, etc:

db.collection.find({c1: {$gt: 5}})

We can connect several conditions with $and or $or and brackets []:

db.collection.find({$and:[{c1:{$gt:5}},{c2:{$lt:2}}]})

Note that this is the same as:

db.collection.find({c1:{$gt:5}, c2:{$lt:2}})
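To see why the explicit $and form and the comma-separated form select the same documents, here is a toy matcher in plain JavaScript. The `matches` function and the sample `docs` are hypothetical, and only equality, $gt, $lt, $regex, $and, and $or are handled; it is a sketch of the semantics, not MongoDB's query engine.

```javascript
// Toy matcher for a small subset of the query syntax.
// Limitation: any object-valued condition is treated as an operator object.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$or") return cond.some((q) => matches(doc, q));
    const value = doc[key];
    if (cond !== null && typeof cond === "object") {
      // Operator object such as {$gt: 5, $lt: 9}: all operators must hold.
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$gt") return value > arg;
        if (op === "$lt") return value < arg;
        if (op === "$regex") return new RegExp(arg).test(value);
        throw new Error(`unsupported operator: ${op}`);
      });
    }
    return value === cond; // plain equality
  });
}

// Hypothetical sample documents.
const docs = [
  { c1: 5, c2: "Sue" },
  { c1: 7, c2: 1 },
  { c1: 9, c2: 3 },
];
const find = (query) => docs.filter((d) => matches(d, query));

const explicitAnd = find({ $and: [{ c1: { $gt: 5 } }, { c2: { $lt: 2 } }] });
const implicitAnd = find({ c1: { $gt: 5 }, c2: { $lt: 2 } });
const either = find({ $or: [{ c1: 5 }, { c2: { $lt: 2 } }] });

console.log(explicitAnd); // same result as implicitAnd
console.log(implicitAnd);
console.log(either.length);
```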

Projection#

To specify a subset of fields, use 0 to exclude and 1 to include. Note that _id is set to 1 by default. You cannot mix 0s and 1s, except for _id.

db.collection.find({Name: "Sue"}, {Name:1, Address:1, _id:0})

You can also specify a set of fields without any select conditions:

db.collection.find({},{Name:1, Address:1, _id:0})
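The inclusion and exclusion rules can be sketched with a toy `project` function in plain JavaScript. The function and the sample document (including its `Age` field) are hypothetical; the sketch follows the rules stated above: _id is kept unless set to 0, and 0s and 1s cannot be mixed for other fields.

```javascript
// Toy projection: spec maps field names to 1 (include) or 0 (exclude).
function project(doc, spec) {
  const { _id: idFlag = 1, ...rest } = spec;
  const flags = Object.values(rest);
  if (flags.includes(0) && flags.includes(1)) {
    throw new Error("cannot mix 0s and 1s, except for _id");
  }
  const out = {};
  if (flags.includes(1)) {
    // Inclusion mode: keep only the listed fields.
    for (const field of Object.keys(rest)) {
      if (field in doc) out[field] = doc[field];
    }
  } else {
    // Exclusion mode (or empty spec): keep everything not listed.
    for (const [field, value] of Object.entries(doc)) {
      if (field !== "_id" && !(field in rest)) out[field] = value;
    }
  }
  // _id is included by default and dropped only when set to 0.
  if (idFlag === 1 && "_id" in doc) out._id = doc._id;
  return out;
}

const doc = { _id: 7, Name: "Sue", Address: "Chicago, IL", Age: 39 };
const included = project(doc, { Name: 1, Address: 1, _id: 0 });
const excluded = project(doc, { Age: 0 });
console.log(included); // Name and Address only
console.log(excluded); // everything except Age, _id kept
```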

When referencing a field within a sub-document, use dot notation with quotes around the dotted name (e.g. "address.zipcode").
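A short plain-JavaScript sketch shows both why the quotes are needed and how a dotted name can be resolved against a nested document. The `getPath` helper and the sample `hotel` document (including its zipcode value) are hypothetical.

```javascript
// Resolve a dotted path such as "address.zipcode" against a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Hypothetical document with a sub-document.
const hotel = {
  name: "Metro Blu",
  address: { city: "Chicago", zipcode: "60601" },
};

console.log(getPath(hotel, "address.zipcode")); // 60601

// The quotes matter: an unquoted key containing a dot is a syntax error
// in a JavaScript object literal, while the quoted form is valid.
const query = { "address.zipcode": "60601" };
console.log(Object.keys(query)[0]); // address.zipcode
```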