In computer science, a graph data structure is a collection of nodes and edges. It is used to represent relationships between different entities. Graph algorithms are methods used to manipulate and analyse graphs, solving problems such as finding the shortest path or detecting cycles. - GeeksforGeeks (2024). https://www.geeksforgeeks.org/graph-data-structure-and-algorithms/
In mathematics, a graph is a set of points (vertices) joined by lines (edges). The idea is generally traced back to the Swiss mathematician Leonhard Euler, whose 1736 work on the Seven Bridges of Königsberg problem laid the foundations of graph theory. Many mathematical problems can be solved by constructing a graph from the given data or set of points.
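To make the idea concrete, here is a minimal Python sketch of a graph held as an adjacency list, with a breadth-first search that finds a shortest path by number of edges. The node names A to E and the function name shortest_path are made up for illustration; this is one simple way to do it, not the only one.

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(graph, "A", "E"))
```

Breadth-first search visits nodes in order of distance from the start, so the first time it reaches the goal it has found a path with the fewest edges.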
A graph database stores information with an emphasis on the connections between data entities. Graph databases are a kind of NoSQL database that draws on mathematical graph theory to represent relationships in data. Unlike traditional relational databases with fixed table structures, graph databases organize data as a web of entities and connections, which can offer better performance and flexibility when modelling highly connected real-world situations.
There are many graph database providers. Some of the popular cloud and SaaS offerings are:
Neo4j
Amazon Neptune
Azure Cosmos DB
Ontotext GraphDB
Let's visualize the following use case: There’s a user, Jane. Jane is friends with Jack, Sophie and Mary, and each of them has friends of their own. A graphical representation of this would look like:
By treating individuals as nodes and their connections as edges, it's possible to go beyond a specific person's direct friends and identify the people their friends know, for example Howard's "friends of friends."
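A small Python sketch of the "friends of friends" lookup, using a plain in-memory adjacency structure rather than a graph database. The edge list is an assumption: the text only says Jane is friends with Jack, Sophie and Mary, so Howard's connections and the extra person Alice are hypothetical.

```python
from collections import defaultdict

# Hypothetical friendship edges. Only Jane's three friendships come from
# the text; the rest are invented for illustration.
friendships = [
    ("Jane", "Jack"),
    ("Jane", "Sophie"),
    ("Jane", "Mary"),
    ("Jack", "Howard"),
    ("Sophie", "Howard"),
    ("Mary", "Alice"),
]


def build_graph(edges):
    """Build an undirected graph: each person maps to a set of friends."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not
    already direct friends, and not the person themselves."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


graph = build_graph(friendships)
print(sorted(friends_of_friends(graph, "Jane")))
print(sorted(friends_of_friends(graph, "Howard")))
```

A graph database answers this kind of question by following relationships directly from node to node, which is roughly what the two nested set lookups here imitate.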
Neo4j has documented some great use cases:
https://neo4j.com/use-cases/
I was curious about graph databases, so I decided to set up a sandbox to tinker around with. I chose to base it on a project I'm currently working on. I wanted to build a recommendation engine for a senior leader at a development-focused company.
There are multiple developers with a very particular set of skills, skills they have acquired over a very long career, skills that make them a nightmare for people. Alongside them, there are projects that need these developers in order to be completed.
We have to create a recommendation engine that will suggest developers based on their “particular set of skills” and availability.
Like any solution, this also hinges on data. Here is how I modelled the data:
I had three node types: Developers, Skills, and Projects.
Developers just had a name and an ID for now.
Skills had the skill name, such as Angular, React, or Docker, and a proficiency rating.
Projects had a name and ID.
For the relationships between them, I had: Works on, Has Skills, and Requires.
Works on has an ID and a capacity, which indicates the percentage of a developer's time assigned to a project.
Has Skills, as the label suggests, is a simple relationship between a developer and a skill.
Requires has a skill and a proficiency, indicating which skill a project needs and at what level.
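The model above can be sketched as plain Python dataclasses, just to make the shape of the nodes and relationships explicit. The class and field names are my own guesses based on the description, and storing proficiency as a string such as "Intermediate" is an assumption; the actual sandbox stores these as nodes and relationships in the graph database.

```python
from dataclasses import dataclass


# --- Node types ---
@dataclass(frozen=True)
class Developer:
    id: int
    name: str


@dataclass(frozen=True)
class Skill:
    name: str          # e.g. "Angular", "React", "Docker"
    proficiency: str   # assumed to be a label such as "Intermediate"


@dataclass(frozen=True)
class Project:
    id: int
    name: str


# --- Relationship types ---
@dataclass(frozen=True)
class WorksOn:
    id: int
    developer_id: int
    project_id: int
    capacity: int      # percentage of the developer's time on the project


@dataclass(frozen=True)
class HasSkill:
    developer_id: int
    skill: Skill


@dataclass(frozen=True)
class Requires:
    project_id: int
    skill: str
    proficiency: str


# Hypothetical sample records
dev = Developer(id=1, name="Dev A")
project = Project(id=10, name="Project X")
has_skill = HasSkill(developer_id=dev.id, skill=Skill("Angular", "Intermediate"))
requires = Requires(project_id=project.id, skill="Angular", proficiency="Intermediate")
works_on = WorksOn(id=100, developer_id=dev.id, project_id=project.id, capacity=50)
print(works_on)
```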
The next stage of the problem is to ingest data that makes sense and supports this model. I wrote a fairly simple script to insert “generative data”. Here’s the script:
This script creates ~70 developers, ~14 skills, and ~20 projects. A snapshot of the nodes looks like this:
Let us look at each node type's relationships.
A developer has a “has_skill” relationship with the skills nodes. A snapshot of the graph will look like this:
Each project has a “requires” relationship with the skills nodes. This is to signify the skills a project would need in order to be completed.
Each developer has a “Works_on” relationship with projects. This relationship is created by matching the required skills of the project and the availability of the developer.
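A rough Python sketch of how such a matching step could work, using in-memory dictionaries instead of the database. This is not the script used in the sandbox: the proficiency ordering (including the "Advanced" level), the fixed 50% share per assignment, and all sample names are assumptions made for illustration.

```python
# Assumed ordering of proficiency labels; "Advanced" is hypothetical.
LEVELS = {"Beginner": 1, "Intermediate": 2, "Advanced": 3}


def meets(dev_skills, skill, required_level):
    """True if the developer has the skill at or above the required level."""
    level = dev_skills.get(skill)
    return level is not None and LEVELS[level] >= LEVELS[required_level]


def assign(developers, requirements, share=50):
    """Create works_on edges by matching required skills and free capacity.

    For each (project, skill, level) requirement, picks the first developer
    who qualifies and still has at least `share` percent free.
    Note: updates each chosen developer's "allocated" value in place.
    """
    works_on = []
    for project, skill, level in requirements:
        for name, dev in developers.items():
            free = 100 - dev["allocated"]
            if meets(dev["skills"], skill, level) and free >= share:
                dev["allocated"] += share
                works_on.append(
                    {"developer": name, "project": project, "capacity": share}
                )
                break  # one developer per requirement in this sketch
    return works_on


# Hypothetical sample data
developers = {
    "Dev A": {"skills": {"Angular": "Intermediate", "Docker": "Beginner"}, "allocated": 0},
    "Dev B": {"skills": {"React": "Advanced"}, "allocated": 100},
    "Dev C": {"skills": {"React": "Intermediate", "Angular": "Beginner"}, "allocated": 50},
}
requirements = [
    ("Project X", "Angular", "Intermediate"),
    ("Project X", "React", "Intermediate"),
]

edges = assign(developers, requirements)
for edge in edges:
    print(edge)
```

Here Dev B has the right skill but no free capacity, so the React requirement falls through to Dev C.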
Now that we have the data in place, let us see if we can write a query that will work as a recommendation engine. To test this out, let us create a new project that needs the skills of an intermediate Angular developer, a beginner Node.js developer, and an intermediate AWS specialist.
Now if we run the following query, it should show us the developers who can work on this project:
The resulting graph for the above Cypher query would be:
All the developers (in red) are available to be assigned to the project.
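For readers without a database at hand, the same recommendation idea can be sketched in plain Python. This is not the Cypher query from the sandbox; it is an in-memory approximation that assumes a developer qualifies when they hold a required skill at or above the required level and are not already fully allocated. The proficiency ordering, the "Advanced" level, and the sample developers are all hypothetical.

```python
# Assumed ordering of proficiency labels; "Advanced" is hypothetical.
LEVELS = {"Beginner": 1, "Intermediate": 2, "Advanced": 3}


def recommend(developers, required):
    """Map each required skill to the available developers who qualify."""
    recommendations = {}
    for skill, level in required.items():
        recommendations[skill] = sorted(
            name
            for name, dev in developers.items()
            if dev["allocated"] < 100  # still has some free capacity
            and LEVELS.get(dev["skills"].get(skill), 0) >= LEVELS[level]
        )
    return recommendations


# The new project's needs, as described in the text
new_project = {
    "Angular": "Intermediate",
    "NodeJS": "Beginner",
    "AWS": "Intermediate",
}

# Hypothetical developers
developers = {
    "Dev A": {"skills": {"Angular": "Advanced", "NodeJS": "Beginner"}, "allocated": 50},
    "Dev B": {"skills": {"AWS": "Intermediate"}, "allocated": 0},
    "Dev C": {"skills": {"Angular": "Intermediate", "AWS": "Advanced"}, "allocated": 100},
    "Dev D": {"skills": {"Angular": "Beginner", "AWS": "Beginner"}, "allocated": 20},
}

result = recommend(developers, new_project)
for skill, names in result.items():
    print(skill, "->", names)
```

Dev C has the right skills but is fully allocated, and Dev D is available but below the required level, so neither is recommended.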
This was just a sandbox I had created to get some hands-on understanding of how the tech works. A graph database is an interesting way to represent data, and it offers a wide range of ways to extract meaning from the data stored. It is most useful for data that has lots of connections and for tasks where you need to find both obvious and not-so-obvious relationships.