Databases. Which technology to choose? There is a multitude of information out in cyberspace about how different technologies work and which would be better for any given situation. I’d like to take some time to break down in simple terms, the difference between modeling data with a relational database and comparing to a graph database. I am not including a No-Sql or Document database in this comparison.
Let’s start with a simple scenario. We have people, who go to a library and interact with books.
In a relational database, everything is stored in tables (also called entities). A designer would create one table each for people, one for library and one for books. Each table has attributes (properties or columns). In the People table, you would store the name of that person. You would also have to store a unique identifier. There may be two John Smiths, so you would have to have a way to identify which John Smith you are referencing. This unique identifier (ID) is usually handled by the database automatically, for example the first record is 1, and the next is 2 and so on.
Next we create the table for library. Again, we add attributes for library name and a unique identifier (ID). Now with just this information we can connect people to libraries. In English we could say “John Smith belongs to Sunnyville Library” and to show that in a relational database we have to create another table. This table is called a join table because we are storing relationships between two other tables. Our join table is called PeopleLibraryJoin and it holds two attributes:
With this setup it is a MANY TO MANY join, meaning many people can belong to many libraries. The opposite saying is also true, many libraries have many people. Only the unique identifiers are stored, which matches with the unique identifier of the entity. If “John Smith” was the first person entered into the people table and his ID (identifier) was 1, then ID would store 1. And if the first library entered was “Sunnyville Library” and its ID was 1, then 1 is what its ID would contain.
Now we add books to our database structure. We start with table called Books and add properties of ID (unique identifier), book title, and author. We want to connect the books to a library that it is stored at. Since a book can only be stored at a single library at any given time, this type of join is a ONE TO MANY join. We create the relationship within the Books table (since it is the many side of the join) by storing LibraryID as a property of Books.
Lastly, we would like to show each book that a person has read. Since one person can read MANY books, and one book can be read by MANY people, we have to create another MANY TO MANY join table which we will call PeopleBooksJoin and the properties would be PeopleID and BookID.
This diagram of a basic relational database shows the complexity of even very simple joins.
With this all “wired up” (as I like to call it), here are some things we can do:
- Find all the libraries a person belongs to: From People, search for a person name (let’s say John Smith) and grab the ID of that person (1). Look in the PeopleLibraryJoin table and for each record where PeopleID = 1, grab the corresponding LibraryID and store it in a list. For each LibraryID in that list, look for the match in the actual Library table to return the library name.
- Find all the people that belong to a library: From Library, search for a library name (let’s say Sunnyville Library) and grab the ID of that library (1). Look in the PeopleLibraryJoin table and for each record where LibraryID = 1, grab the corresponding PeopleID and store it in a list. For each PeopleID in that list, look for the match in the actual People table and return the person name. This is the opposite as the last item, but same technique.
- Find all the books in a library: From Library, search a library name and grab the ID (Library.ID). Look in the Books table and create a list of all book titles that contain a match in the Books table property of LibraryID (Books.LibraryID). In code it would look something like, “where Library.ID equals Books.LibraryID”.
- Find all the books that a person has read: From People, search a person name and grab the ID (People.ID). In the PeopleBooksJoin table, create a list of all the records where that ID equals the PeopleID and return that list with corresponding BooksID. Match the BooksID to the ID in the Books table so you can return the book title property.
So how does all this compare to a graph database?
NOTE: I am trying to generalize graph databases, but each one is specific with different features and terminology. Since I have experience with Neo4j and they are one of the most popular graph database solutions, this information mostly aligns with their technology.
Some of the terminology is different. A table is a node and the relationships are normally referred to as edges. The edges can have direction. In our example, People read Books. Books, however, don’t read People back. Therefore, you can create a direction of People –Read-->Books. It makes sense. Now to find all the books a person has read, you can pass in the person, grab all the edges of “Read” and return the books on the other end of that relationship or edge. This is efficient and easy to comprehend.
The edges are treated as the same level as the nodes. This is referred to as a “First Class Citizen (or Entity)”. Compare to a relational database where another table has to be created to hold the matching ID’s as an attribute of the table and then go into the table to pull out those ID’s, a graph databases edges are stored in the same hierarchy as the nodes.
Another example: Library—Stores--> Books
There are two nodes, Library and Books, which are connected with the edge (relationship) of Stores. Again Stores is directional. You might be thinking that it seems obvious of the direction, so why have a direction? Let’s now say we need to know who follows who in our People. John—Follows-->Sally but Sally does not have to follow John back. We can simply add an edge of “Follows” and start connecting nodes.
As one can see by this diagram, a graph database can be more simple to model and easier to understand.
Another difference is handling of nulls (no data). A relational database is structured with what’s called a schema. The schema is explicitly defined, which tells the relational database what tables and attributes (columns) exist and what kind of data is allowed in each column. Think of it as a grid of squares. Each square is defined as to what kind of data it is expecting to hold. This square exists whether or not it contains data. In programming with relational databases, often times errors occur when trying to read data that isn’t there. Specific null checks have to be done and then dealt with. In graph databases, the properties of the nodes or the edges are either there or they are not. It doesn’t return a null; it simply doesn’t return anything it doesn’t find.
This article is meant to touch on the subject and show some key differences. Generally speaking, anything that can be created in one type of database can be mimicked in another with varying levels of complexity. Allow us at Code Level: Expert, LLC to analyse your unique business requirements and see which technique would be the most beneficial.