Guide Area

Social Media Network Database Design using Graph Database (Neo4j)

Although relational databases are a powerful tool, they don’t fit in well in situations where many, many joins are required. The execution time becomes very high as soon as you do 3 or 4 joins on tables with tens of thousands records in them. Social Media Network is one of these cases – their complicated functionality require a lot of database tables, as well as many JOIN statements.

When it comes to social media, one of the most important things is response time. Imagine opening your favourite social media website’s home page and waiting for 8 seconds to see your newsfeed. You see your friends post and want to add a comment; you do so, and then you wait for another 5 seconds until the comment is added. That sound horrible to me!

This is where advantages of graph database kick in. Difference in performance between RDBMS and a Graph DB is not huge initially. Although, as your database grows to millions of entries, you will notice a huge difference. Of course, there are situations where relational database performs better and the fact is that the perfect social media database setup would consist of cluster of RDBMS/NoSQL databases, in-memory databases (such as Redis) and Graph databases, each used in specific scenario due to their advantages. But for a small home project, or a startup, a single graph DB will be more than enough.

Before you start, I suggest you read about how graph databases work. There is an excellent introduction article written by Neo4J community – give it a read.

Graph Databases for Beginners: Why Graph Technology Is the Future

The world of graph technology has changed (and is still changing), so we’re rebooting our “Graph Databases for Beginners” series to reflect what’s new in the world of graph tech – while also helping newcomers catch up to speed with the graph paradigm.

Examples, which you will see in this tutorial, were programmed in Python. You can of course use any language you want, as long as there is a Neo4J driver available for it.

A little about this tutorial

This tutorial is meant to do one and one thing only: get you started very quickly, by providing code which you can copy-paste into your project. I will not try to explain theory behind Neo4j, nor will I try to defend its advantages over relational/NoSQL databases.

The code examples that you will see on this page come from a working prototype of a social network based on links. People do not upload content; they either write a text-based status, or upload a link to their asset, such as news article, image on Tumlbr, or video on Youtube. After importing code from this article, you will have a backend with following functionality:

  • Create, update, retrieve user
  • Send and accept friend requests, delete friends
  • Display feed with your posts and posts of your friends (timeline view)
  • Add new feed item (a new link)
  • Like and comment on other posts

Disclaimer: Queries in this tutorial are not perfect …

This is my first Neo4j project and I simply wanted to share this experience with the world. There is a high possibility that you will find a piece of query which does not make a lot of sense to you the way it is, or could be optimized. I no longer have time for development of this project. The code provided in this article will work just fine, but needs to be polished if you plan to publish it.

To give you a piece of inspiration, my implementation of the social network looked like this:

You can extend properties of any node based on your requirements. I tried to simplify code examples as much as possible. If you’re having troubles with the code in this tutorial, don’t hesitate to leave a comment on the bottom of this page and I will do my best to help you 🙂

Regarding syntax…

Graph operations in Neo4j are executed using a language called Cypher. In order to write Cypher queries in a readable way, you will need to split them to multiple lines. In Python, this is possible by simply starting and ending each line with a double quote character. Remember to end each line by a whitespace character, otherwise Neo4j will throw exception.

Graph nodes in my tutorial carry one character names. I like to keep things simple. If you want to make your code more readable, go ahead – for instance, you can rename n:User to defaultuser:User if you like.

Requirements

You need a Neo4j installation to execute queries in this tutorial. The easiest way is to create a Docker instance. Because you also need the APOC library, you need to download it to a folder somewhere on your drive and attach this folder as a volume to the Neo4j container. This is a full example from the official Neo4j APOC documentation, modified by me:

mkdir plugins
cd plugins
wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.4/apoc-3.4-all.jar
cd ..
docker run --rm -e NEO4J_AUTH=none -p 7474:7474 -v $PWD/plugins:/plugins -v neo4j:/data -p 7687:7687 neo4j:3.4

I added the volume neo4j:/data so that you don’t lose your data every time you start Neo4j. Running the docker run command will create a neo4j/ directory in the path from which you call this command, with all DB data in it.

Create user

Command for creation of a new user with specific email, username and password, and retrieving the newly created user’s ID, goes like this:

userid = tx.run(
  "CREATE (n:User { email: $email, username: $username, password: $password }) "
  "RETURN ID(n)",
  email=your_email, 
  username=your_username, 
  password=your_password, 
)

Get user (for authentication)

This command will retrieve a single user filtered by specific username. You can of course filter by any other field, or by multiple fields at the same time.

user = tx.run( 
  "MATCH (m:User { username: $username }) RETURN ID(m) as id, "
  "m.username as username, m.email as email, "
  "m.password as password", 
  username=your_username
).single()

You can then work with user’s fields in the following manner:

result = {
  'username': user['username'],
  'email': user['email'],
  'password': user['password']
}
return result

Update user

Update user based on username:

tx.run(
  "MATCH (u:User { username: $username }) "
  "SET u.private = $private "
  username=your_username, private=private
)

Or based on ID:

tx.run(
  "MATCH (u:User) "
  "WHERE ID(u)=$userid "
  "SET u.email = $email"
  userid=your_userid, email=your_email
)

Get feed items

So far, things were easy, but feed operations are a bit more complicated. In order to retrieve and display all information about a feed (or a timeline if you wish), we need to request following data:

  • Find all feed items that either I or my friends created
  • If I did not create feed item, find the owner, so that I can display his name, profile picture, etc.
  • Get number of likes, comments, and a date of creation
  • Get all comments of all feed items, sorted by date and limited to a specific number, in order to “paginate” them

We can get all of it by executing this huge Cypher query:

feed_items = session.run(
    "MATCH (c:User { username: $username })<-[v:BELONGS_TO|VISIBLE_TO|VISIBLE_TO_AND_LIKED]-(a:FeedItem) "
    "OPTIONAL MATCH (a)--(b:Comment)-[:CREATED_BY]-(e:User) "
    "OPTIONAL MATCH (a)-[:BELONGS_TO]-(d:User) "
    "WITH a,c,d,v,b{.*, created_by: e.username, author_profile_image: e.profile_image} as ct "
    "RETURN ID(a) as id, a.text as text, a.type as type, a.data as data,  "
    "a.created_at as created_at,  "
    "c.username as username, TYPE(v) as rel, d.username as author_username,  "
    "d.profile_image as author_profile_image, count(ct) as comment_count, "
    "a.likes as likes_count, "
    "apoc.coll.sortMaps(collect(ct), 'created_at')[0..2] as comments "
    "ORDER BY a.created_at DESC "
    "SKIP $skip LIMIT $limit ",
    username=username, skip=skip, limit=limit
)

Let me try to explain. First, we try to find any a:FeedItem that the specific c:User can see. The user can either be an owner of the feed item, or it can be visible to him. After that, we need to find comments added to all nodes of type a:FeedItem found. This is an OPTIONAL MATCH, because any a:FeedItem can have 0 or more b:Comment nodes attached to it. If any exist, we also find their owners. Next step is also optional – if we are not owners of the a:FeedItem, we need to find the d:User who is.

Next step consists of some changes that we need to do in the b:Comment nodes. We want to keep all of their existing properties (.*) and add the username of the e:User who created the b:Comment and also URL to his profile picture, so that we can display it next to the user’s name.

Last thing that needs to be explained is the line

apoc.coll.sortMaps(collect(ct), 'created_at')[0..2] as comments

Here, we’re using a Neo4j library called APOC (click here to read more about the library). What happened to our nodes b:Comment so far is that we get all of them for every single feed item and enrich them with information about author. Now it’s time to sort them and retrieve a subset, for instance last two. The problem with Cypher is that you cannot do ORDER BY on two collections in one query. If you try to sort comments first, and then feed items (both by timestamp), Cypher will only sort the last collection – feed items. APOC library extends Cypher functionality and let’s us sort collections as we go. After we enriched the collection of b:Comment, we got a list of comments. Now, we transform them into a map using collect() function and use this map, together with field by which we want to sort, in function apoc.coll.sortMaps().

After that, we retrieve a subset 0..2 because the latest two comments are enough to be shown on timeline. If you want to read more comments, you either click “Comments” link, or switch to the specific feed item view, where you can show more of them at once. Check my example of a social media network in section #AboutThisTutorial again to see what I mean.

As soon as we have comments sorted and split into subset, we can sort the a:FeedItem result by timestamp using ORDER BY a.created_at DESC and limit the number of results. This allows us to do an interactive scrolling. At first, we only load five results using SKIP 0 LIMIT 5. As soon as we scroll on bottom of the page, we call this huge query again, but this time we start at position 5 and end at 10, so that we load another five: SKIP 5 LIMIT 5.

The result of this query will be an array of feed items, each having an array of comments. Therefore, to process them, iteration is needed:

result = []
for item in feed_items:
    comments = []
    for comment in item['comments']:
        comments.append({
            'created_by': comment['created_by'],
            'created_at': comment['created_at'],
            'author_profile_image': comment['author_profile_image'],
            'text': comment['text']
        })
    result.append({
        'id': item['id'],
        'text': item['text'],
        'type': item['type'],
        'data': json.loads(item['data']),
        'created_at': item['created_at'],
        'author_username': item['author_username'],
        'author_profile_image': item['author_profile_image'],
        'username': item['username'],
        'rel': item['rel'],
        'comments': comments,
        'comment_count': item['comment_count'],
        'likes_count': item['likes_count']
    })
return result

Get a single feed item

The functionality here is very similar to the one above. There are two differences: we only search for one specific feed item, not all of them, and we do not retrieve only a subset of comments, but all of them. The only thing you need to do here is to insert username who tries to display the feed item (in order to check that the user has correct relationships to view the item) and feed_item_id.

item = tx.run(
    "MATCH (n:User { username: $username })-[v:BELONGS_TO|VISIBLE_TO|VISIBLE_TO_AND_LIKED]-(f:FeedItem) "
    "WHERE ID(f) = $feed_item_id "
    "OPTIONAL MATCH (f)--(b:Comment)-[:CREATED_BY]-(e:User) "
    "OPTIONAL MATCH (f)-[:BELONGS_TO]-(d:User) "
    "WITH n,v,f,d,b{.*, created_by: e.username, author_profile_image: e.profile_image} as ct "
    "RETURN ID(f) as id, f.text as text, f.type as type, f.data as data,  "
    "f.created_at as created_at,  "
    "n.username as username, TYPE(v) as rel, d.username as author_username,  "
    "d.profile_image as author_profile_image, count(ct) as comment_count, f.likes as likes_count, "
    "apoc.coll.sortMaps(collect(ct), 'created_at') as comments ",
    username=username, feed_item_id=int(feed_item_id)
).single()

comments = []
for comment in item['comments']:
    comments.append({
        'created_by': comment['created_by'],
        'created_at': comment['created_at'],
        'author_profile_image': comment['author_profile_image'],
        'text': comment['text']
    })

result = {
    'id': item['id'],
    'text': item['text'],
    'type': item['type'],
    'data': json.loads(item['data']),
    'created_at': item['created_at'],
    'author_username': item['author_username'],
    'author_profile_image': item['author_profile_image'],
    'username': item['username'],
    'rel': item['rel'],
    'comments': comments,
    'comment_count': item['comment_count'],
    'likes_count': item['likes_count']
}

return result

Create feed item

When you create a feed item, two things need to happen. Firstly, you have to create the item itself and attach it to its owner using BELONGS_TO relationship. Secondly, you need to remember that when you add a feed item to your feed, you need to add it to feeds of your friends too by creating a VISIBLE_TO relationship between the feed item and all user’s that have a FRIEND relationship with you.

In this tutorial, we work with a social media network based on links. Every feed item has a data property in JSON format. When user is creating a new feed item and enters the URL (link) that he/she wants to add, you should fetch that URL and retrieve some information from its metadata. In my example, it’s the title, description and image. These three metadata fields should be stored in the data property. There are plenty of libraries that can crawl a URL and extract metadata, so go ahead and find one that you think is the best.

Even though the metadata of the link being added is stored in data property, we want to give users a possibility to enter their own status too. What you need to change is the username of the user that tries to create feed item, user’s text and a data JSON object which contains information about the link in feed item.

feed_item_id = tx.run(
    "CREATE (a:FeedItem { text: $text, type: $type, data: $data, "
    "created_at: $created_at, belongs_to: $username, likes: 0 }) "
    "WITH a "
    "MATCH (c:User { username: $username }) "
    "MERGE (a)-[:BELONGS_TO]->(c) "
    "RETURN ID(a)",
    username=username, text=text, type=data_type, 
    data=json.dumps(data), created_at=int(time.time())
).single().value()

tx.run(
    "MATCH (n:User { username: $username })-[:FRIEND]-(m:User) "
    "MATCH (i:FeedItem) WHERE ID(i) = $feed_item_id "
    "MERGE (i)-[:VISIBLE_TO]->(m)",
    username=your_username, feed_item_id=feed_item_id
)

Add comment to feed item

What happens here is that we check if the n:User has rights to view the specific f:FeedItem selected by feed_item_id. Then, we create the c:Comment node and we attach it to its owner n using CREATED_BY relationship, and to the corresponding feed item f using BELONGS_TO relationship.

You need to enter username of author of the comment, text of the comment and feed_item_id of the item to which the comment c is being added.

tx.run(
    "MATCH (n:User { username: $username })-[v:BELONGS_TO|VISIBLE_TO|VISIBLE_TO_AND_LIKED]-(f:FeedItem) "
    "WHERE ID(f) = $feed_item_id "
    "CREATE (c:Comment { text: $text, created_at: $created_at }) "
    "MERGE (c)-[:BELONGS_TO]->(f) "
    "MERGE (c)-[:CREATED_BY]->(n) "
    "RETURN c", 
    text=text, username=username, created_at=int(time.time()),
    feed_item_id=int(feed_item_id)
)

Like/Unlike feed item

When user likes a feed item, the relationship between these two nodes become of type VISIBLE_TO_AND_LIKED. But if the user already liked the specific feed item (meaning that this relationship already exists), he/she shouldn’t be able to like the item. Instead, an Unlike button should be shown. Therefore, before actually giving the user a possibility to like the feed item, we need to recognize what the current relationship between these two nodes is.

If the relationship is of type VISIBLE_TO_AND_LIKED and the user clicks the button, he/she unlikes the feed item. If the relationship is of type VISIBLE_TO, he/she likes the item.

can_like = tx.run(
    "MATCH (n:User { username: $username })-[v]-(f:FeedItem) "
    "WHERE ID(f) = $feed_item_id "
    "RETURN v", 
    username=username, feed_item_id=feed_item_id
).single().value()

if can_like.type == "VISIBLE_TO":
    tx.run(
        "MATCH (n:User { username: $username })-[v:VISIBLE_TO]-(f:FeedItem) "
        "WHERE ID(f) = $feed_item_id "
        "DETACH DELETE v "
        "MERGE (n)<-[:VISIBLE_TO_AND_LIKED]-(f) "
        "SET f.likes = f.likes + 1 "
        "RETURN f", username=username, feed_item_id=feed_item_id
    )

if can_like.type == "VISIBLE_TO_AND_LIKED":
    tx.run(
        "MATCH (n:User { username: $username })-[v:VISIBLE_TO_AND_LIKED]-(f:FeedItem) "
        "WHERE ID(f) = $feed_item_id "
        "DETACH DELETE v "
        "MERGE (n)<-[:VISIBLE_TO]-(f) "
        "SET f.likes = f.likes - 1 "
        "RETURN f", username=username, feed_item_id=feed_item_id
    )

What happens here is that we verify if the n:User has any relationship [v] with the f:FeedItem. Then, we check what the relationship is and based on this, we choose if the feed item should be liked or unliked. We delete the old relationship using DETACH DELETE v and apply a new relationship using MERGE. Also, we increase or decrease the f.likes count, depending on user action (like or unlike).

Get user (for profile view)

Remember the chapter #GetUser? This is a good method for getting user information upon login and storing them into session. But in order to show specific user’s profile and see, if we are friends with them, or not, we need an extended query.

result = tx.run(
    "MATCH (m:User { username: $friend_username }) "
    "OPTIONAL MATCH (m)-[r:FRIEND|FRIEND_REQUEST]-(n:User { username: $username }) "
    "RETURN m.username as username, m.private as private, " 
    "m.profile_image as profile_image, m.bio as bio, "
    "TYPE(r) as relationship, (startNode(r)).username as friend_requester",
    username=username, friend_username=friend_username
).single()

You can then work with user’s fields in the following manner:

n_result = {
    'username': result['username'],
    'private': result['private'],
    'profile_image': result['profile_image'],
    'bio': result['bio'],
    'relationship': result['relationship'],
    'friend_requester': result['friend_requester']
}

What happened in the query is that we check if we have relationship r of type either FRIEND or FRIEND_REQUEST. This would mean that we either already are friends, ore one of us asked the other to be his/her friends. If this relationship was found, we need to find the user who initiated the friend request, in order to display either “Friend request sent” on our profile page if it was us who sent the request, or “Accept friend request” button if it was the other user who asked us to be his/her friends. We figure this out by using query (startNode(r)).username as friend_requester, which tells us who initiated the friend request. Read the next chapter #SendFriendRequest to see how relationship directions are created.

Send friend request

This snippet is very simple, but there is one thing that you need to be careful with. When u:User asks v:User to become friends, the direction of the friend request needs to be created so that u-[]->v (notice the arrow). This is important because we need to display correct information in the UI. User u should see text “Friend request sent” and user v should see a button with text “Accept friend request“. Direction of the relationship allows us to differentiate between these two situations.

tx.run(
    "MATCH (u:User { username: $username }) "
    "MATCH (v:User { username: $friend_username }) "
    "MERGE (u)-[:FRIEND_REQUEST]->(v)",
    username=username, friend_username=friend_username
)

Now, let’s say we want to find out who initiated friend request. Having direction in place, it’s well knows who was the initiator. We can then match two users using any relationship, for instance u:User-[r:FRIEND_REQUEST]-v:User (no arrows here, Neo4j will find out if there is any direction and return it) and then find the initiator by using (startNode(r)).username. In our case, this would return user u.

Accept friend request

Accepting friend request means that we become friends with the target user. It also means that we should now be able to see all of their feed items. Remember chapter #CreateFeedItem? When creating feed item, we traversed through the graph to find all of our friends and linking our newly created feed item to their feed by adding a VISIBLE_TO relationship.

But this only covers feed items which will be created in the future. In order to also add feed items created in the past, a link is required for each of them.

tx.run(
    "MATCH (u:User { username: $username })<-[rel:FRIEND_REQUEST]-(t:User { username: $friend_username }) "
    "DELETE rel "
    "MERGE (u)-[:FRIEND]->(t) ",
    username=username, friend_username=friend_username
)

tx.run(
    "MATCH (f:FeedItem)-[:BELONGS_TO]->(u:User { username: $friend_username }) "
    "MATCH (t:User { username: $username }) "
    "MERGE (f)-[:VISIBLE_TO]->(t)",
    username=username, friend_username=friend_username
)

tx.run(
    "MATCH (f:FeedItem)-[:BELONGS_TO]->(u:User { username: $username }) "
    "MATCH (t:User { username: $friend_username }) "
    "MERGE (f)-[:VISIBLE_TO]->(t)",
    username=username, friend_username=friend_username
)

In the first run statement, we delete the old relationship between u:User and t:User and replace it with a FRIEND relationship, because we accepted their friend request. Great, now we’re friends. Let’s make sure that user u sees feed items of user t, and vice versa. This is exactly what we do in the second and third run statement. We find all f:FeedItem nodes which belong to user u and we make them VISIBLE_TO to user t. And we do the same with the other user.

Delete friend

Functionality in this chapter is very similar to the chapter #AcceptFriendRequest above, so I suggest you compare these two to find a difference. When deleting friend, we again need to update their relationship. Because we will no longer be friends, there won’t be any relationship at all, so we DELETE it. And because we shouldn’t be able to see each other’s feed items anymore, we also DELETE the relationship between our feed items and user nodes.

tx.run(
    "MATCH (u:User { username: $username })<-[rel:BELONGS_TO]-(f:FeedItem)-[rel2:VISIBLE_TO]->(t:User { username: $friend_username }) "
    "DELETE rel2 ",
    username=username, friend_username=friend_username
)

tx.run(
    "MATCH (u:User { username: $friend_username })<-[rel:BELONGS_TO]-(f:FeedItem)-[rel2:VISIBLE_TO]->(t:User { username: $username }) "
    "DELETE rel2 ",
    username=username, friend_username=friend_username
)

tx.run(
    "MATCH (u:User { username: $username })-[rel:FRIEND]-(t:User { username: $friend_username }) "
    "DELETE rel ",
    username=username, friend_username=friend_username
)

Interesting thing to notice is that we need to be a bit more selective when deleting relationships between t:User and f:FeedItem. During lifetime of the social media website, each user will have thousands, if not millions feed items linked to their account, but only a small subset of these feed items will belong to them. At the same time, these feed items can be visible to many other users, not only user t. So it is very important that we find feed items that BELONG_TO our user u:User and at the same time are VISIBLE_TO user t:User who we plan to unfriend. Other feed items have to remain untouched.

And that’s about it!

Coding a social media website takes a lot of work. You have to implement an authentication process and role management, develop a backend and a frontend. I was aiming to take weight off your shoulders by providing backend database queries. Feel free to pick any programming language to include them in your backend.

Leave a comment if you have any issues and I will try to help 🙂  I hope this tutorial helped you and good luck with development!

Vladimir Marton

DevOps Engineer focused on cloud infrastructure, automation, CI/CD and programming in Javascript, Python, PHP and SQL. Guidearea is my oldest project where I write articles about programming, marketing, SEO and others.

1 comment