Database Schema & MySQL/Node.js Integration

Carrying on with the MySQL course…

Database Schema Design

Picking up from where I left off yesterday, today we built the schema for the Instagram clone database. Here’s what I could have improved on in my attempt, and some new things I learned to consider:

Don’t forget about using UNIQUE on fields that aren’t a primary key, but should also still be unique (like a username, email address, etc.)
If data in a table won’t be referenced by other tables, they don’t need to be given an id field. For example likes on a photo and follower-followee relationships may not need to be referenced specifically, so no need to create that extra field or use this memory.
You can set PRIMARY KEY to multiple columns; doing so will make the combination of those columns the primary key. This is useful when you want to allow a relationship only once (for example, only 1 like per user, per photo).

My method for the hashtags table was very inefficient; it would require hashtags to be stored multiple times and to be uniquely attributed to each photo. Instead it would be better to create two tables: one only for hashtags, and another solely for the relationships between tags & photos:

-- Not Good:
CREATE TABLE hashtags (
    id INT AUTO_INCREMENT PRIMARY KEY,
    hashtag VARCHAR(50),
    photo_id INT,
    FOREIGN KEY(photo_id) REFERENCES photos(id)
);

--Better:
CREATE TABLE tags (
    id INT AUTO_INCREMENT PRIMARY KEY,
    tag_name VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE photo_tags (
    photo_id INT NOT NULL,
    tag_id INT NOT NULL,
    FOREIGN KEY(photo_id) REFERENCES photos(id),
    FOREIGN KEY(tag_id) REFERENCES tags(id),
    PRIMARY KEY(photo_id, tag_id)
);

Note on the hashtags: another good option for smaller (compared to hashtags on Instagram) data set might be to store the tags as text with each photo, as it would make searching for them faster. There’s a study on comparing the query times for all three of these hashtag search methods but apparently it was taken offline…but it’s in the course video for future reference.

Database Queries

Once the schema was written we got a huge data set to work with in order to practice writing queries. In going through the exercises another tip come up:

When looking for the “top 5” of a category one way to do it is ORDER BY category LIMIT 5. In the event of ties, the results will display whatever comes first—which would be a direct result of how you structure the query. So probably better to limit the result to slightly higher so that you can check for ties:

-- Notice a tie for 5th place:
+----------+-------+
| tag_name | count |
+----------+-------+
| smile    |    59 |
| beach    |    42 |
| party    |    39 |
| fun      |    38 |
| food     |    24 |
| lol      |    24 |
| concert  |    24 |
+----------+-------+

-- This query returns FOOD as number 5:
SELECT
    tag_name,
    COUNT(photo_id) AS count
FROM tags
JOIN photo_tags
    ON tags.id = photo_tags.tag_id
GROUP BY tags.id
ORDER BY count DESC
LIMIT 5;

-- This query returns CONCERT as number 5:
SELECT
    COUNT(pt.photo_id),
    t.tag_name
FROM photo_tags AS pt
INNER JOIN tags AS t
    ON pt.tag_id = t.id
GROUP BY pt.tag_id
ORDER BY COUNT(pt.photo_id) DESC
LIMIT 5;

Using MySQL with Node.js

Next we started building a Node app in order to see how MySQL can be integrated with a web application. The most popular Node ORM for MySQL is appropriately named mysql:

npm install mysql

We create a database from the MySQL CLI, then setting up the connection is similar to what I’ve seen before with Sequelize and Mongoose:

var mysql = require('mysql');

var connection = mysql.createConnection({
  host     : 'localhost',
  user     : 'username',
  database : 'join_us'  
});

And that simply we can write database queries from Node!:

var q = 'SELECT CURTIME() as time, CURDATE() as date, NOW() as now';
connection.query(q, function (error, results, fields) {
  if (error) throw error;
  console.log(results[0].time);
  console.log(results[0].date);
  console.log(results[0].now);
});

Another way to write queries or commands using objects of data (rather than a single datapoint) is as follows:

var person = {
    email: faker.internet.email(),
    created_at: faker.date.past()
};

var end_result = connection.query('INSERT INTO users SET ?', person, function(err, result) {
  if (err) throw err;
  console.log(result);
});

In this case the mysql ORM translates the query into INSTER INTO users (email, created_at) VALUES (e@b.com, 2017-01-08 14:28:39).

Up Next

Finish up this class tomorrow and then look for some other resources to practice writing queries.