Graph Databases Simplified: A Neo4j Exploration


What are graph databases: A graph database is a collection of nodes and edges. It stores relationships in a way that resembles how we naturally look at things. For example, if we need to store information about a family, in a relational DB we could model it as a table with columns like [id, name, fathers_id, married_to]. Looking at this table, we can't see the relationships directly; to answer a question like "who is the father of X?" we need joins and other SQL clauses.
But if we use a graph database for the same data, we can see the relationships in a natural way.
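To make the contrast concrete, here is a minimal sketch in plain Python (not SQL or Neo4j). The rows, names, and the FATHER / MARRIED_TO edge types are made up for illustration; the point is only that the table needs a join-like lookup, while the graph form stores the relationship itself.

```python
# Relational-style rows: [id, name, fathers_id, married_to] (illustrative data).
people = [
    {"id": 1, "name": "Grandpa", "fathers_id": None, "married_to": None},
    {"id": 2, "name": "Dad", "fathers_id": 1, "married_to": 3},
    {"id": 3, "name": "Mom", "fathers_id": None, "married_to": 2},
    {"id": 4, "name": "X", "fathers_id": 2, "married_to": None},
]


def father_via_join(name):
    """Emulate a self-join: child.fathers_id = father.id."""
    for child in people:
        if child["name"] == name:
            for father in people:
                if father["id"] == child["fathers_id"]:
                    return father["name"]
    return None


# Graph-style: each relationship is stored explicitly as (start, type, end).
edges = [
    ("X", "FATHER", "Dad"),
    ("Dad", "FATHER", "Grandpa"),
    ("Dad", "MARRIED_TO", "Mom"),
]


def father_via_edge(name):
    """Follow the FATHER edge that starts at the given person."""
    for start, rel_type, end in edges:
        if start == name and rel_type == "FATHER":
            return end
    return None


print(father_via_join("X"))
print(father_via_edge("X"))
```

Both functions answer the same question; the second one reads the answer straight off a stored relationship instead of matching ids across rows.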


Up to this point only one-to-one relations are covered. What if we need to store the mother's name as well? In a graph DB we just add a relationship (edge) from the son to the mother, with a relationship name. In a relational DB we would need to alter the schema. Graph DBs (mostly) don't have a fixed schema; the schema depends on the data and evolves with it.
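A rough way to picture this, again as a plain Python sketch with made-up names rather than real database code: when relationships are stored as individual edges, a new kind of relationship is just one more edge, and nothing about the existing data has to change.

```python
# Edges as (start, relationship_type, end); names are illustrative.
edges = [("Son", "FATHER", "Dad")]


def add_relationship(start, rel_type, end):
    """Adding a new kind of relationship is just appending another edge."""
    edges.append((start, rel_type, end))


def related(start, rel_type):
    """Return every node reachable from `start` over one edge of `rel_type`."""
    return [e for s, t, e in edges if s == start and t == rel_type]


# A new relationship type appears without touching any existing structure.
add_relationship("Son", "MOTHER", "Mom")

print(related("Son", "FATHER"))
print(related("Son", "MOTHER"))
```

In the table model from the family example, the equivalent change would mean adding a new column such as mothers_id to every row.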

Now comes the question: if graph DBs are so good, why don't we use them everywhere? When and where should we use a graph DB, and when a relational DB?
While reading about graph DBs I came across a very nice answer to this question.

The primary difference is that in a graph database, the relationships are stored at the individual record level, while in a relational database, the structure is defined at a higher level (the table definitions).

This has important ramifications:

A relational database is much faster when operating on huge numbers of records. In a graph database, each record has to be examined individually during a query in order to determine the structure of the data, while this is known ahead of time in a relational database.
Relational databases use less storage space, because they don’t have to store all of those relationships.
Storing all of the relationships at the individual-record level only makes sense if there is going to be a lot of variation in the relationships; otherwise you are just duplicating the same things over and over. This means that graph databases are well-suited to irregular, complex structures. But in the real world, most databases require regular, relatively simple structures. This is why relational databases predominate.

Some popular graph DBs are Neo4j, OrientDB, ArangoDB, MarkLogic, etc.
Here we will be discussing Neo4j.




Neo4j is an open-source NoSQL native graph database which provides an ACID-compliant transactional backend for your applications.
Neo4j is referred to as a native graph database because it implements the Property Graph Model efficiently down to the storage level. As opposed to graph processing or in-memory libraries, Neo4j provides full database characteristics including ACID transaction compliance, cluster support, and runtime failover, making it suitable to use graph data in production scenarios.

Some particular features make Neo4j very popular among developers, architects and DBAs:

Cypher, a declarative query language similar to SQL, but optimized for graphs. It is now used by other databases like SAP HANA Graph and RedisGraph via the openCypher project.
Constant time traversals in big graphs both in depth and in breadth due to efficient representation of nodes and relationships. Enables scale-up to billions of nodes on moderate hardware.
Flexible property graph schema that can adapt over time, making it possible to materialize and use new relationships later on to “shortcut” and speed up the domain data when the business needs change.
Drivers for popular programming languages, including Java, JavaScript, .NET and Python.

While learning Neo4j, I focused on two things:
1: Understanding what Neo4j is
2: Basic CRUD operations using Cypher

Running Neo4j on localhost

But before we start, we need to run Neo4j.
I used the Neo4j Docker image to run it locally:

docker run --publish=7474:7474 --volume=$HOME/neo4j/data:/data neo4j:2.3

Create: single node, relationships

Creating a single node:
To create a node, the basic syntax is

create (variable:Label {attribute: value})

The variable can be referred to in the later parts of the same Cypher query.


create (:Person {name:'Me'})

Here we haven't used a variable because we only want to create the node and don't refer to it later in the query. The label is 'Person', and 'name' is an attribute with "Me" as its value.
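For readers who think in code, the property graph model behind this can be sketched in a few lines of Python. This is an in-memory toy, not Neo4j's API: the Node and Graph classes and their methods are hypothetical names chosen for the sketch. A node is simply a set of labels plus a dictionary of attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


class Graph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = []

    def create_node(self, labels, **properties):
        """Roughly what a create clause does: add a node with labels and attributes."""
        node = Node(set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def match(self, label, **properties):
        """Find nodes that carry the label and all the given attribute values."""
        return [
            n for n in self.nodes
            if label in n.labels
            and all(n.properties.get(k) == v for k, v in properties.items())
        ]


graph = Graph()
graph.create_node(["Person"], name="Me")
print(graph.match("Person", name="Me"))
```

The return value of create_node plays the role of the Cypher variable: it is only needed if the rest of the operation wants to refer to the new node.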

Creating other nodes and relationships:
Let's create some of my friends and the relationships with them. For this, we first need the node that contains my info, so that the relationships can start from it.
The match operation is used for that.

match (p:Person {name:'Me'})
foreach (name in ['A','B','C','D'] | create (p)-[:Friend]->(:Person {name:name}))

Here a foreach loop is used to create nodes and relationships from a list of names.
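The same find-then-loop pattern can be sketched in plain Python. The dictionaries and tuples below are stand-ins chosen for this sketch, not how Neo4j stores data: first locate the starting node, then create one new node and one relationship per name.

```python
nodes = [{"labels": {"Person"}, "name": "Me"}]
relationships = []  # each entry is (start_node, type, end_node)

# Equivalent of the match step: find the node that represents "Me".
me = next(n for n in nodes if "Person" in n["labels"] and n["name"] == "Me")

# Equivalent of the foreach step: one new node and one Friend edge per name.
for name in ["A", "B", "C", "D"]:
    friend = {"labels": {"Person"}, "name": name}
    nodes.append(friend)
    relationships.append((me, "Friend", friend))

friend_names = [end["name"] for start, rel, end in relationships if start is me]
print(friend_names)
```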

Creating a Friend relationship between A and B:

match (a:Person {name:'A'}), (b:Person {name:'B'})
create (a)-[relation:Friend]->(b)
return a,relation,b

Adding A's other relationships:

match (a:Person {name:'A'})
create (a)-[c:owns]->(b:Animal {name:'J'})
return a,c,b

match (a:Person {name:'A'})
create (a)-[c:Friend]->(b:Person {name:'E'})
return a,c,b

Creating D's friends:

match (a:Person {name:'D'})
create (a)-[:Friend]->(b:Person {name:'G'})
create (b)-[:Friend]->(c:Person {name:'H'})

Creating C's friend F, who is both a Person and an Expert and has worked on the Neo database:

match (a:Person {name:'C'})
create (a)-[:Friend]->(b:Person:Expert {name:'F'})
create (b)-[:Worked_on]->(c:Database {name:'Neo'})

After running all these queries and a few more, the final graph looks like this:


Read: match queries for nodes and relationships

To search directly by a property value:

match (p {name:'J'})
return p

Finding through relationships:
Finding my friends:

match (a:Person {name:'Me'})-[:Friend]->(my_friends)
return a,my_friends

Friend of friend:

match (a:Person {name:'Me'})-[:Friend]->(my_friends)-[:Friend]->(my_friends_friend)
return a,my_friends,my_friends_friend

3rd-degree friends (friend of a friend of a friend):

match (a:Person {name:'Me'})-[:Friend*3]->(my_friends)
return a,my_friends
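What these friend queries compute can be sketched as a hop-by-hop expansion in plain Python. The adjacency dictionary below only mirrors the Friend relationships created earlier in this post, and the function name is made up for the sketch. It follows directed edges and does not model Cypher's rule that a relationship is used at most once per path, which makes no difference on this small acyclic graph.

```python
# Directed Friend edges, mirroring the nodes created earlier in the post.
friends = {
    "Me": ["A", "B", "C", "D"],
    "A": ["B", "E"],
    "C": ["F"],
    "D": ["G"],
    "G": ["H"],
}


def friends_at_depth(start, depth):
    """Return the people reached by following exactly `depth` Friend edges."""
    frontier = {start}
    for _ in range(depth):
        frontier = {f for person in frontier for f in friends.get(person, [])}
    return frontier


print(sorted(friends_at_depth("Me", 1)))  # direct friends
print(sorted(friends_at_depth("Me", 2)))  # friends of friends
print(sorted(friends_at_depth("Me", 3)))  # 3rd-degree friends
```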

Shortest path to an expert on a database:

match (user:Person {name:'Me'}), (exp:Expert)-[knows:Worked_on]->(db:Database {name:'Neo'})
match path = shortestPath((user)-[*..3]->(exp))
return user,path,exp,knows,db
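Conceptually, a bounded shortest-path search like this can be pictured as a breadth-first search that gives up after a maximum number of hops. The sketch below is plain Python over a hand-written copy of the example graph; it illustrates the idea and is not a description of how Neo4j implements shortestPath internally.

```python
from collections import deque

# Directed edges of any relationship type, mirroring the example graph.
edges = {
    "Me": ["A", "B", "C", "D"],
    "A": ["B", "E", "J"],
    "C": ["F"],
    "D": ["G"],
    "G": ["H"],
    "F": ["Neo"],
}


def shortest_path(start, goal, max_hops=3):
    """Breadth-first search that returns the first path of at most max_hops edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # do not extend paths beyond the hop limit
        for nxt in edges.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path("Me", "F"))
print(shortest_path("Me", "H", max_hops=2))
```

Because breadth-first search explores paths in order of length, the first path that reaches the goal is also a shortest one.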

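Update: the set command is used for updates.
The approach is the same here: first find the node or relationship with match, bind it to a variable, and then use set on the variable's attribute. For example, assuming we want to rename the Database node (the new value 'Neo4j' is only illustrative):

match (db:Database {name:'Neo'})
set db.name = 'Neo4j'
return db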

Delete: to delete everything, we use the command below. For a selective delete, we can narrow down the node in the match clause and then delete it. A plain delete fails if the node still has relationships, because the relationships have to be deleted first; detach delete removes the node's relationships along with the node.

match (p)
detach delete p
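The difference between delete and detach delete can be illustrated with a small in-memory sketch in Python. The Graph class and its method names are hypothetical and exist only for this example; they mimic the behaviour described above rather than any real Neo4j API.

```python
class Graph:
    """Toy graph that mimics delete vs. detach delete (illustration only)."""

    def __init__(self):
        self.nodes = set()
        self.relationships = set()  # each entry is (start, type, end)

    def delete(self, node):
        """Plain delete: refuse while the node still has relationships."""
        if any(node in (s, e) for s, _, e in self.relationships):
            raise ValueError(f"{node!r} still has relationships")
        self.nodes.discard(node)

    def detach_delete(self, node):
        """Detach delete: drop the node's relationships, then the node."""
        self.relationships = {
            r for r in self.relationships if node not in (r[0], r[2])
        }
        self.nodes.discard(node)


graph = Graph()
graph.nodes = {"Me", "A"}
graph.relationships = {("Me", "Friend", "A")}

try:
    graph.delete("A")
    plain_delete_failed = False
except ValueError:
    plain_delete_failed = True

graph.detach_delete("A")
print(plain_delete_failed, graph.nodes, graph.relationships)
```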
