+ - 0:00:00
Notes for current slide
Notes for next slide

Designing Data Intensive Applications

Chapter 2: Data Models and Query Languages

By Tarik Eshaq and Co-pilot 🐙
1 / 48

Why Data Models matter 🐬

2 / 48

Why Data Models matter 🐬

Abstractions all the way down!

3 / 48

Why Data Models matter 🐬

Application developers model the real world using data structures

4 / 48

Why Data Models matter 🐬

Application developers model the real world using data structures

Data structures are modeled and stored using general-purpose data models (e.g., JSON, XML, CSV, etc.)

5 / 48

Why Data Models matter 🐬

Application developers model the real world using data structures

Data structures are modeled and stored using general-purpose data models (e.g., JSON, XML, CSV, etc.)

The database software you use stores those general purpose data models in terms of bytes in memory or on disk

6 / 48

Why Data Models matter 🐬

Application developers model the real world using data structures

Data structures are modeled and stored using general-purpose data models (e.g., JSON, XML, CSV, etc.)

The database software you use stores those general purpose data models in terms of bytes in memory or on disk

The hardware represents those bytes in terms of electrical currents and pulses of light
7 / 48

So many models nowadays 📷 💃 🕺

8 / 48

So many models nowadays 📷 💃 🕺

So pick the right model for the job

9 / 48

To SQL or not to SQL 🐢

The Relational Model

10 / 48

The Relational Model

How it works

11 / 48

The Relational Model

How it works

  • Data is organized into relations or tables
  • Each relation is an unordered collection of tuples
12 / 48

The Relational Model

Example Relational table

The people table

id name age
1 Tarik 21
2 Miguel 5
3 Ethan 69
13 / 48

The Relational Model

Example Relational table

The people table

id name age
1 Tarik 21
2 Miguel 5
3 Ethan 69

The friends table

friend1 friend2
1 2
1 3
2 3
14 / 48

The Relational Model

Lets write a query to find out Ethan's friends

SELECT name FROM people
WHERE id IN (SELECT friend2 FROM friends WHERE friend1 = 3 or friend2 = 3));
15 / 48

NoSql 📦

16 / 48

The Document Model

Motivation

17 / 48

The Document Model

Motivation

  • Need for greater scalability that relational databases can easily provide, when it comes to very high write throughput
  • Free and open source software FTW
  • Specialized queries that SQL may not support well
  • SQL is pretty darn restrictive
18 / 48

The Document Model

What it is

  • A document is a collection of key-value pairs
  • Documents are stored in a collection
  • Documents are identified by a unique key or ID
19 / 48

The one-to-many problem 🤔

20 / 48

The one-to-many problem 🤔

  • SQL kinda craps out when it comes to one-to-many relationships
  • You have one of the following three options:
    • Store the one-to-many relationship in a separate table
    • Use structured data types to represent the one-to-many relationship (e.g., JSON, XML, CSV, etc.)
    • Use a structured datatype and encode it into text
21 / 48

The one-to-many problem

SQL

id name location
1 Ethan New York

id job_id job_title
1 1 Software Engineer at Hootsuite
1 2 Software Engineer at PDT
1 3 Floor mopper at The Miguel Company
22 / 48

The one-to-many problem

NoSQL

{
"id": 1,
"name": "Ethan",
"jobs": [
{
"id": 1,
"title": "Software Engineer at Hootsuite"
},
{
"id": 2,
"title": "Software Engineer at PDT"
},
{
"id": 3,
"title": "Floor mopper at The Miguel Company"
}
],
"location": "New York"
}
23 / 48

EVERYTHING GETS A TABLE 💻

id name location
1 Ethan New York
2 Miguel New York

What if The city of New York was renamed?

24 / 48

EVERYTHING GETS A TABLE 💻

id name location
1 Ethan 314141414
2 Miguel 314141414

location_id location_name
314141414 Goblins
25 / 48

NoSQL trying to join the cool table

People Locations
{
"id": 1,
"name": "Ethan",
"jobs": [
{
"id": 1,
"title": "Software Engineer at Hootsuite"
},
{
"id": 2,
"title": "Software Engineer at PDT"
},
{
"id": 3,
"title": "Floor mopper at The Miguel Company"
}
],
"location": 314141414
},
{
"id": 2,
"name": "Miguel",
"jobs": [],
"location": 314141414
}
{
"location_id": 314141414,
"location_name": "Goblins",
}
f0
26 / 48

NoSQL joining the table

// Joins the people table with the locations table
// to find out where Ethan and Miguel live
db.people.find({
$or: [
{ id: 1 },
{ id: 2 }
]
}, {
_id: 0,
name: 1,
location: 1
}).forEach(function(person) {
db.locations.findOne({
location_id: person.location
}, {
_id: 0,
location_name: 1
}, function(location) {
console.log(person.name + " lives in " + location.location_name);
});
});
$ node index.js
Ethan lives in Goblins
Miguel lives in Goblins
27 / 48

NoSQL vs SQL - Fight! 🤜 🤛

28 / 48

NoSQL vs SQL - Fight! 🤜 🤛

  • Pros for Document Database:
    • Schema flexibility
    • Better performance due to locality
    • Closer to the data structures the used by the application
29 / 48

NoSQL vs SQL - Fight! 🤜 🤛

  • Pros for Document Database:
    • Schema flexibility
    • Better performance due to locality
    • Closer to the data structures the used by the application
  • Pros for Relational Database:
    • Better support for joins
    • Many-to-many and many-to-one relationships are better supported
30 / 48

NoSQL vs SQL - Fight! 🤜 🤛

What data model leads to simpler application code?

  • If your data has a document like structure (with one-to-many relationships), then most likely document database!
  • If your data has a relational structure (with many-to-many relationships), then most likely relational database!
31 / 48

NoSQL vs SQL - Fight! 🤜 🤛

Schema Flexibility

  • Document databases are schema-on-read:
    • The data assumes the structure of the schema, when it is read
    • The schema is not enforced
  • Relational databases are schema-on-write:
    • The schema is enforced
    • The data is assumed to have the structure of the schema when it is written to the database
32 / 48

NoSQL vs SQL - Fight! 🤜 🤛

Schema Flexibility

How to change the schema

  • SQL:
    --- Migrate the people table to have a first and last name
    --- Then move the split the old name column into first and last name columns
    ALTER TABLE people
    ADD COLUMN first_name VARCHAR(255)
    ADD COLUMN last_name VARCHAR(255);
    --- Split the name column into first and last name columns
    UPDATE people
    SET first_name = SUBSTRING(name, 1, LOCATE(' ', name) - 1),
    last_name = SUBSTRING(name, LOCATE(' ', name) + 1);
    --- Drop the old name column
    ALTER TABLE people
    DROP COLUMN name;
33 / 48

NoSQL vs SQL - Fight! 🤜 🤛

Schema Flexibility

How to change the schema

  • NoSQL:
// If the user doesn't have a `first_name` or `last_name` field,
// then we can add them by splitting the `name` field
if (!person.first_name || !person.last_name) {
person.first_name = person.name.split(' ')[0];
person.last_name = person.name.split(' ')[1];
}
// Then we can assume the user has a `first_name` and `last_name` field...
34 / 48

NoSQL vs SQL - Fight! 🤜 🤛

Schema Flexibility

And what if not all documents have the same structure? 😱

35 / 48

NoSQL vs SQL - Fight! 🤜 🤛

Data locality

  • In the Document model, all the data is stored in the same document, so there is an advantage of loading all of it up
  • In the Relational model, the data could be split across multiple tables, requiring multiple index lookups
  • That said, newer technologies have done wonders..
36 / 48

NoSQL and SQL - Unite! 🤝

  • SQL started supporting structured data types, like JSON and XML
  • MongoDB and others started supporting native joins..
37 / 48

Query Languages and Abstractions

38 / 48

Declarative vs Imperative Query Languages

  • Declarative languages hide the implementation details of the query, and describe only the data that it needs
  • Imperative languages describe the steps needed to retrieve the data
39 / 48

Declarative vs Imperative Query Languages

SQL

--- We join the location table to get the people
--- who live in the Goblins
SELECT people.first_name
FROM people
JOIN locations
ON people.location = locations.location_id
WHERE locations.location_name = 'Goblins';

JavaScript

// we have to look at the people array and
// location array to get the people that live in Goblins
const result = [];
for (const person of people) {
for (const location of locations) {
if (person.location === location.location_id &&
location.location_name === 'Goblins') {
result.push(person.first_name);
}
}
}
40 / 48

Declarative vs Imperative Query Languages

MapReduce

  • We'll talk about MapReduce later (psst Batch Processing!)
  • It's not either of the two, but it's a nice abstraction.
  • Consumers write two functions, a map function and a reduce function. The map function is called once for each document in the collection. The reduce function is called once for each key in the collection.
41 / 48

Declarative vs Imperative Query Languages

// MongoDB mapReduce example
// to emit the number of people in each location
const res = db.people.mapReduce(
// The map function
function() {
emit(this.location, 1);
},
// The reduce function
function(key, values) {
return Array.sum(values);
},
{
out: 'people_by_location'
}
);
// output the results
res.find().forEach(function(doc) {
const location = db.locations.findOne({
location_id: doc.key
});
console.log(doc.value + " people live in " + location.location_name);
});
$ node index.js
2 people live in Goblins
42 / 48

ENTER GRAPH DATA MODEL 📈 📈 📈

43 / 48

ENTER GRAPH DATA MODEL 📈 📈 📈

  • Graph data models are a way to represent relationships between data
  • Graph data models are great for modeling very connected data
  • In graph data models, relationships are represented by edges and nodes represent entities
  • Graph data models are often used to represent social networks
44 / 48

ENTER GRAPH DATA MODEL 📈 📈 📈

// dgraph query to get the people and their locations (Looks like GraphQL!)
const query = `
query {
people(func: has(location)) {
name
location {
location_name
}
}
}
`;
const res = await dgraph.newTxn().query(query);
console.log(res);
45 / 48

ENTER GRAPH DATA MODEL 📈 📈 📈

$ node index.js
{
"people": [
{
"name": "Ethan",
"location": {
"location_name": "Goblins"
}
},
{
"name": "Miguel",
"location": {
"location_name": "Goblins"
}
}
]
}
46 / 48

Summary

  • Deciding on a data model is a big decision
  • It's important to understand the data models you can choose from
  • There are a few models that are great for different types of data
  • There are many many models out there, the following are the most common nowadays
    • Relational data models
    • Document data models
    • Graph data models
  • The query language matters too, and there are two main types:
    • Declarative query languages
    • Imperative query languages
47 / 48

PS I didn't cover those:

  • Network data models
  • The history of it all
  • RDF data models
  • SPARQL
  • Cypher query languages
  • Datalog and functional programming (almost)
48 / 48

Why Data Models matter 🐬

2 / 48
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow