Gremlinq With Cosmos DB

One of the most impressive features of Cosmos DB is its support for several different APIs, including the Gremlin API. When using the Gremlin API, Cosmos DB effectively becomes a graph database. Graph databases are particularly suitable for data models whose entities are highly connected, with many relationships between them.

Here’s an example of a typical Gremlin query. This query retrieves users who are followers of the user with the id a24c891d-a118-46be-ae7c-1fe5faa02996.

g.V('a24c891d-a118-46be-ae7c-1fe5faa02996').in('Follows').hasLabel('User')

Notice how the Gremlin query has a nice, fluent syntax? Wouldn’t it be great if we could create Gremlin queries in C# in a similarly fluent way, while maintaining strong typing? Fortunately, the Gremlinq library lets us do just that. Gremlinq is like Entity Framework for graph databases. The above Gremlin query could look like this in C# when using Gremlinq:

var bobsFollowers = await g
    .V(bob.Id)
    .In<Follows>()
    .OfType<User>()
    .ToArrayAsync();

We’re going to demonstrate the power of using Gremlinq with Cosmos DB by implementing a simple Twitter clone. The source code for the complete solution can be found here.

Scenario

The image above represents the situation we want to model. We’re going to use Gremlinq together with the Cosmos DB emulator to create the data above. Then we will write code to run queries that answer the following questions:

  1. Who are Bob’s followers?
  2. Who liked Tweet1?
  3. Who liked and retweeted Tweet1?
  4. Who does Bob follow that follows him back?

Prerequisites

In order to follow this tutorial, we will need the following software installed:

After the Cosmos DB emulator is installed, we also need to run the following command in a command prompt to enable the Gremlin API. If the emulator is not installed in C:\Program Files\Azure Cosmos DB Emulator, we need to adjust the command accordingly.

C:\"Program Files"\"Azure Cosmos DB Emulator"\CosmosDB.Emulator.exe /EnableGremlinEndpoint

Project Setup

Let’s start by creating a C# console app. Let’s run the following commands in a command prompt to generate the console app:

dotnet new sln -n GremlinqDemo 
dotnet new console -n GremlinqDemo
dotnet sln add GremlinqDemo/GremlinqDemo.csproj

Now, let’s run the following commands in the same command prompt to add the required NuGet packages to the console app:

dotnet add .\GremlinqDemo\GremlinqDemo.csproj package Microsoft.Azure.Cosmos
dotnet add .\GremlinqDemo\GremlinqDemo.csproj package ExRam.Gremlinq.Providers.CosmosDb

The first NuGet package Microsoft.Azure.Cosmos will be used to create a new Cosmos DB database and container in the Cosmos DB emulator. The second NuGet package ExRam.Gremlinq.Providers.CosmosDb is the Gremlinq package, which includes the Cosmos DB provider for Gremlinq and the core Gremlinq functionality.

Data Modelling

Graphs are made up of vertices and edges. Vertices represent entities and edges represent the relationships between entities. In the diagram of our data model, the arrows represent edges and the circles and rectangles represent vertices.
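To make the vertex and edge terminology concrete, here is a minimal in-memory sketch that uses neither Gremlinq nor Cosmos DB. The GraphSketch class, the tuple shapes and the ids "u1" and "u2" are made up for illustration; only the user names and the Follows label come from our scenario.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: a graph held in plain collections.
static class GraphSketch
{
    // Vertices are entities: (id, label, name).
    public static readonly List<(string Id, string Label, string Name)> Vertices =
        new List<(string Id, string Label, string Name)>
        {
            ("u1", "User", "Alice"),
            ("u2", "User", "Bob"),
        };

    // Edges are directed relationships: (label, source id, target id).
    public static readonly List<(string Label, string FromId, string ToId)> Edges =
        new List<(string Label, string FromId, string ToId)>
        {
            ("Follows", "u1", "u2"), // Alice follows Bob
        };

    // Renders each edge in the form "Alice -[Follows]-> Bob".
    public static IEnumerable<string> Describe() =>
        Edges.Select(e => $"{NameOf(e.FromId)} -[{e.Label}]-> {NameOf(e.ToId)}");

    // Looks up the display name of the vertex with the given id.
    private static string NameOf(string id) =>
        Vertices.Single(v => v.Id == id).Name;
}
```

Calling GraphSketch.Describe() yields a single line, "Alice -[Follows]-> Bob". A graph database stores the same two kinds of things, and lets us traverse them efficiently.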

From the diagram above we can see that there are two unique types of vertices:

  1. Users
  2. Tweets

We can also see that there are four unique types of edges:

  1. Follows
  2. Liked
  3. Tweeted
  4. Retweeted

We will need to add a class for every type of edge and every type of vertex. However, Gremlinq requires us to have a base Vertex class which all vertex classes will inherit from and a base Edge class which all edge classes will inherit from. Let’s open the solution GremlinqDemo.sln in Visual Studio and add those base classes first:

//Vertex.cs
namespace GremlinqDemo.Models
{
    class Vertex
    {
        public string Id { get; set; }
        public string PartitionKey { get; } = "default";
    }
}
//Edge.cs
namespace GremlinqDemo.Models
{
    class Edge
    {
        public string Id { get; set; }
        public string PartitionKey { get; } = "default";
    }
}

Note that we have hardcoded the PartitionKey property to always have the same value for simplicity. In real applications, we should instead choose an appropriate value for the partition key.

Now that we’ve added our base classes, let’s add the classes for all our vertex and edge types:

//User.cs
namespace GremlinqDemo.Models
{
    class User : Vertex
    {
        public string Name { get; set; }
    }
}
//Tweet.cs
namespace GremlinqDemo.Models
{
    class Tweet : Vertex
    {
        public string Content { get; set; }
    }
}
//Follows.cs
namespace GremlinqDemo.Models
{
    class Follows : Edge
    {
    }
}
//Liked.cs
namespace GremlinqDemo.Models
{
    class Liked : Edge
    {
    }
}
//Retweeted.cs
namespace GremlinqDemo.Models
{
    class Retweeted : Edge
    {
    }
}

//Tweeted.cs
namespace GremlinqDemo.Models
{
    class Tweeted : Edge
    {
    }
}

Code Overview

Now, let’s open up the Program.cs file and replace its contents with the following code:

using ExRam.Gremlinq.Core;
using GremlinqDemo.Models;
using Cosmos = Microsoft.Azure.Cosmos;
using System;
using System.Threading.Tasks;
using System.Linq;

namespace GremlinqDemo
{
    class Program
    {
        private static readonly string DatabaseName = "twitter";
        private static readonly string GraphName = "twitter";
        private static readonly string PartitionKeyPath = "/PartitionKey";
        private static readonly string CosmosConnectionString = "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
        private static readonly string GremlinEndpointUrl = "ws://localhost:8901/";
        private static readonly string CosmosDbAuthKey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

        static async Task Main(string[] args)
        {
            await CreateDb();

            IGremlinQuerySource g = GetGremlinSource();

            await CreateData(g);

            await RunQueries(g);

            await DeleteData(g);
        }

        private static async Task CreateDb()
        {
            throw new NotImplementedException();
        }

        private static IGremlinQuerySource GetGremlinSource()
        {
            throw new NotImplementedException();
        }

        private static async Task CreateData(IGremlinQuerySource g)
        {
            throw new NotImplementedException();
        }

        private static async Task RunQueries(IGremlinQuerySource g)
        {
            throw new NotImplementedException();
        }

        private static async Task DeleteData(IGremlinQuerySource g)
        {
            throw new NotImplementedException();
        }

    }
}

We can see from our Main method that our console app performs five actions:

  1. Create the database
  2. Get the Gremlin query source
  3. Create the data
  4. Run queries
  5. Delete the data

We’re going to implement each of these actions, one at a time.

Note: If we were using a real Cosmos DB account instead of the emulator, we would have to change the values for the CosmosConnectionString, GremlinEndpointUrl and CosmosDbAuthKey fields appropriately in the code above.
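One way to avoid hardcoding real account values is to read them from environment variables and fall back to the emulator defaults. The sketch below is only an illustration: the DemoSettings class and the variable name in the usage note are hypothetical and not part of Gremlinq or the Cosmos DB SDK.

```csharp
using System;

// Hypothetical helper: not part of Gremlinq or the Cosmos DB SDK.
static class DemoSettings
{
    // Returns the environment variable's value, or the fallback
    // (for example an emulator default) when it is unset or empty.
    public static string Get(string variableName, string fallback)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        return string.IsNullOrEmpty(value) ? fallback : value;
    }
}
```

A field could then be initialised with something like DemoSettings.Get("GREMLIN_ENDPOINT_URL", "ws://localhost:8901/"), where GREMLIN_ENDPOINT_URL is a name we chose for this example. This keeps the emulator working out of the box while allowing a real endpoint and key to be supplied without editing the code.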

Create The Database

We’re going to create the Cosmos DB database and container from within our console app. To do this we will implement our CreateDb method with the following code:

private static async Task CreateDb()
{
    var client = new Cosmos.CosmosClient(CosmosConnectionString);
    //Create Database
    var createResponse = await client.CreateDatabaseIfNotExistsAsync(DatabaseName, 4000);
    var db = createResponse.Database;
    //Create Container
    await db.CreateContainerIfNotExistsAsync(GraphName, PartitionKeyPath);
}

The above code uses the CosmosClient class from the Microsoft.Azure.Cosmos package in order to create the database and container.

Get The Gremlin Query Source

When using Gremlinq, all interaction with data happens through an object of type IGremlinQuerySource. We need to configure this object and make it available to our application so that we can create and query data.

When configuring the IGremlinQuerySource for use with Cosmos DB, we need to provide it the following information:

  • Base class types for edges and vertices
  • Cosmos DB Gremlin endpoint URL
  • Cosmos DB database name
  • Cosmos DB container name
  • Cosmos DB access key

Let’s configure the IGremlinQuerySource object by implementing our GetGremlinSource method with the following code:

private static IGremlinQuerySource GetGremlinSource()
{
    return GremlinQuerySource.g
        .ConfigureEnvironment(env => env
            .UseModel(GraphModel
                //Specify base classes for vertices(Vertex) and edges(Edge)
                .FromBaseTypes<Vertex, Edge>(lookup => lookup.IncludeAssembliesOfBaseTypes()))
            .UseCosmosDb(builder => builder
                //Specify CosmosDb Gremlin endpoint URL, DB name and graph name. 
                .At(new Uri(GremlinEndpointUrl), DatabaseName, GraphName)
                //Specify CosmosDb access key
                .AuthenticateBy(CosmosDbAuthKey)
                ));
}

Create The Data

Now that we’ve configured our Gremlin query source, we can use it to create our data. We know that we need to create the following:

  • Four users (Alice, Bob, Charlie, Diana)
  • One tweet
  • Follows relationships between users
  • Liked, tweeted and retweeted relationships between users and tweets

Let’s start by creating a small helper method in the Program class to create a user:

static async Task<User> CreateUser(IGremlinQuerySource g, string name) =>
    await g
        //AddV means add vertex (in this case the vertex is a user)
        .AddV(new User { Name = name })
        //FirstAsync will return the created user
        .FirstAsync();

This helper method uses the IGremlinQuerySource to add a user vertex to the graph by using the AddV method.

Now let’s add another helper method to create a Follows relationship. The method will take in two users and create an edge between them such that the first user follows the second user:

static async Task Follows(IGremlinQuerySource g, User follower, User followee) =>
    await g
        //Get the first user vertex
        .V(follower.Id)
        //Add an edge from the first user of type "Follows" to the second user
        .AddE<Follows>()
        .To(__ => __.V(followee.Id))
        //awaiting FirstAsync will ensure the "Follows" edge gets created
        .FirstAsync();

Let’s add yet another helper method. This one will be used to create a Liked relationship between a user and the tweet which they liked. The definition of this helper method looks very similar to the previous one, except that the relationship is between a user and a tweet instead of a user and another user.

static async Task Liked(IGremlinQuerySource g, User liker, Tweet tweet) =>
    await g
        .V(liker.Id)
        //Add an edge from the user of type "Liked" to the tweet
        .AddE<Liked>()
        .To(__ => __.V(tweet.Id))
        .FirstAsync();

Let’s add one more method to create a Retweeted relationship between a user and the tweet they retweeted. This will be almost identical to the previous method above.

static async Task Retweeted(IGremlinQuerySource g, User retweeter, Tweet tweet) =>
    await g.V(retweeter.Id)
        .AddE<Retweeted>()
        .To(__ => __.V(tweet.Id))
        .FirstAsync();

We have just one more helper method to define before implementing the CreateData method. This is the most important one of all! The method will create a tweet vertex and add a Tweeted relationship from the user to that tweet in the same query:

static async Task<Tweet> CreateTweet(IGremlinQuerySource g, User tweeter, string content) =>
    await g
        //Get the tweeter(user) vertex
        .V(tweeter.Id)
        // add a "Tweeted" edge from the user to a new tweet vertex created in the same query
        .AddE<Tweeted>()
        .To(__ => __.AddV(new Tweet { Content = content }))
        //Traverse from the "Tweeted" edge to the actual tweet
        //InV means go from the edge to the vertex it is going into
        .InV<Tweet>()
        .FirstAsync();

With all our helper methods in place, we’re ready to create our data. We can use our diagram above as a reference and implement our CreateData method with the following code:

private static async Task CreateData(IGremlinQuerySource g)
{
    //Create users
    var alice = await CreateUser(g, "Alice");
    var bob = await CreateUser(g, "Bob");
    var charlie = await CreateUser(g, "Charlie");
    var diana = await CreateUser(g, "Diana");

    //Create "Follows" edges
    await Follows(g, alice, bob);
    await Follows(g, bob, charlie);
    await Follows(g, bob, diana);
    await Follows(g, charlie, bob);

    //Create tweet
    var tweet1 = await CreateTweet(g, bob, "I love using gremlinq!");

    //Create "Liked" edges
    await Liked(g, alice, tweet1);
    await Liked(g, charlie, tweet1);

    //Create "Retweeted" edge
    await Retweeted(g, alice, tweet1);
}

Run Queries

We’ve created our data, so we can now run some queries to answer our questions. Our first query will answer the question: who follows Bob? We’ll write a helper method to run this query and print the results to the console:

private static async Task WhoFollowsBob(IGremlinQuerySource g, User bob)
{
    var bobsFollowers = await g
        //Get all users with a Follows edge going "In" (pointing) to Bob
        .V(bob.Id)
        .In<Follows>()
        .OfType<User>()
        .ToArrayAsync();

    Console.WriteLine();
    Console.WriteLine("Users who follow Bob:");
    foreach (var user in bobsFollowers)
        Console.WriteLine(user.Name);
}

Let’s break down the above query. When we run V(bob.Id), we get the vertex which represents Bob. Then when we run In<Follows>() we get all vertices which have an edge of type Follows going into Bob. Running OfType<User>() filters those vertices to just ones which are of type User. Finally, calling and awaiting ToArrayAsync() will execute the query and return users who follow Bob.

Next let’s write a query to answer our second question: who liked Tweet1? The code for this query will look very similar to the previous one. The only difference is that we’re searching for Liked relationships instead of Follows relationships.

private static async Task WhoLikedTweet1(IGremlinQuerySource g, Tweet tweet1)
{
    var tweetLikers = await g
        .V(tweet1.Id)
        .In<Liked>()
        .OfType<User>().ToArrayAsync();

    Console.WriteLine();
    Console.WriteLine("Users who liked Tweet1:");
    foreach (var user in tweetLikers)
        Console.WriteLine(user.Name);
}

Now let’s try a slightly more interesting query. Let’s find all the users that both liked and retweeted Tweet1:

private static async Task WhoLikedAndRetweetedTweet1(IGremlinQuerySource g, Tweet tweet1)
{
    var likedAndRetweeted = await g
        //retrieve the list of users who liked the tweet
        .V(tweet1.Id)
        .In<Liked>()
        .OfType<User>()
        .Fold()
        //From the users who liked the tweet, get only the ones who also retweeted it.
        .As((__, likers) => __
            .V(tweet1.Id)
            .In<Retweeted>()
            .OfType<User>()
            .Where(retweeter => likers.Value.Contains(retweeter)))
        .ToArrayAsync();

    Console.WriteLine();
    Console.WriteLine("Users who liked and retweeted Tweet1:");
    foreach (var user in likedAndRetweeted)
        Console.WriteLine(user.Name);
}

The query above can be broken down into two parts. The first part is almost identical to the previous query, which retrieves users who liked Tweet1: V(tweet1.Id).In<Liked>().OfType<User>().Fold(). The only difference is the added Fold() method. This aggregates the users who liked the tweet into an array, so that we can run the second part of the query against that array.

The second part of the query begins with the As method. This lets us run a query using the array of users who liked the tweet. We pass a lambda function to the As method which represents this query. It starts like this: (__, likers) => __.V(tweet1.Id).In<Retweeted>().OfType<User>(). This gets us the users who retweeted Tweet1. The second part of the query ends with Where(retweeter => likers.Value.Contains(retweeter)). This Where call ensures that only users who both retweeted and liked Tweet1 are returned from the query.
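The fold-then-filter shape of this query can be mimicked with plain LINQ over in-memory collections. The following is only an analogy and makes no Gremlinq calls; the FoldSketch class and its method are names invented for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory analogy of the Fold/As/Where query shape.
static class FoldSketch
{
    public static string[] LikedAndRetweeted(
        IEnumerable<string> likers, IEnumerable<string> retweeters)
    {
        // Analogous to Fold(): collect all likers into one array.
        var folded = likers.ToArray();

        // Analogous to Where(retweeter => likers.Value.Contains(retweeter)):
        // keep only the retweeters that also appear among the likers.
        return retweeters
            .Where(retweeter => folded.Contains(retweeter))
            .ToArray();
    }
}
```

With the data from CreateData, the likers are Alice and Charlie and the only retweeter is Alice, so FoldSketch.LikedAndRetweeted(new[] { "Alice", "Charlie" }, new[] { "Alice" }) returns an array containing just "Alice".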

We have just one more question to answer: who does Bob follow that also follows him? Here’s a query which will give us the answer:

private static async Task WhoFollowsBobBack(IGremlinQuerySource g, User bob)
{
    var followsBobBack = await g
        //Find the users who are followed by Bob
        .V(bob.Id)
        .Out<Follows>()
        .OfType<User>()
        .Fold()
        //Filter to the users who follow Bob back
        .As((__, bobFollows) => __
            .V(bob.Id)
            .In<Follows>()
            .OfType<User>()
            .Where(bobsFollower => bobFollows.Value.Contains(bobsFollower)))
        .ToArrayAsync();

    Console.WriteLine();
    Console.WriteLine("Users who follow Bob and are followed by him:");
    foreach (var user in followsBobBack)
        Console.WriteLine(user.Name);
}

This query has a very similar structure to the previous one. The first part of the query finds users who are followed by Bob: V(bob.Id).Out<Follows>().OfType<User>().Fold(). We’ve already seen quite a few queries like this, but there is one different part here. Instead of using In<Follows> we used Out<Follows>. This means that we are searching for the users who have a Follows edge which is coming out of Bob, meaning that Bob follows them.

The second part of the query filters the list of returned users to ones who also follow Bob: As((__, bobFollows) => __.V(bob.Id).In<Follows>().OfType<User>().Where(bobsFollower => bobFollows.Value.Contains(bobsFollower))).
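To see how edge direction drives this result, here is a small in-memory sketch over the same Follows relationships that CreateData sets up. It does not use Gremlinq; the DirectionSketch class and its Out, In and FollowsBack methods are hypothetical names chosen to echo the query steps.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory analogy of Out<Follows>() and In<Follows>().
static class DirectionSketch
{
    // Follows edges as (follower, followee) pairs, matching CreateData.
    public static readonly (string From, string To)[] Follows =
    {
        ("Alice", "Bob"),
        ("Bob", "Charlie"),
        ("Bob", "Diana"),
        ("Charlie", "Bob"),
    };

    // Like Out<Follows>(): targets of edges that leave the given user.
    public static IEnumerable<string> Out(string user) =>
        Follows.Where(e => e.From == user).Select(e => e.To);

    // Like In<Follows>(): sources of edges that arrive at the given user.
    public static IEnumerable<string> In(string user) =>
        Follows.Where(e => e.To == user).Select(e => e.From);

    // Users the given user follows who also follow that user back.
    public static string[] FollowsBack(string user)
    {
        var followed = Out(user).ToArray();
        return In(user).Where(f => followed.Contains(f)).ToArray();
    }
}
```

For Bob, Out gives Charlie and Diana, In gives Alice and Charlie, and DirectionSketch.FollowsBack("Bob") returns an array containing only "Charlie".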

So now that we have all our queries we just need to put them together by implementing the RunQueries method:

private static async Task RunQueries(IGremlinQuerySource g)
{
    var bob = await g
        .V()
        .OfType<User>()
        .Where(u => u.Name == "Bob")
        .FirstAsync();

    //We only created one tweet so we know tweet1 will be the first
    var tweet1 = await g
        .V()
        .OfType<Tweet>()
        .FirstAsync();

    await WhoFollowsBob(g, bob);
    await WhoLikedTweet1(g, tweet1);
    await WhoLikedAndRetweetedTweet1(g, tweet1);
    await WhoFollowsBobBack(g, bob);
}

When we finally run our console app and this method is called, we will see the results of all of our queries. But before we run the console app, we have one more step to finish: implementing the deletion of all the created data.

Delete The Data

We have one last method to implement: DeleteData. This method should delete all data in the database so we can run our console app again with perfectly repeatable results. The implementation consists of two lines of code, one to delete all the edges in the graph, and one to delete all the vertices:

private static async Task DeleteData(IGremlinQuerySource g)
{
    //Get all edges in the database and delete them 
    await g.E().Drop();
    //Get all vertices in the database and delete them 
    await g.V().Drop();
}

Run The App

We can now go ahead and run the console app from Visual Studio. Our results should be as follows:

If we compare this against our graph diagram, we can see that the results are correct:

More On Gremlinq and Cosmos DB

We hope that this post was helpful for anyone learning how to use Gremlinq with Cosmos DB. Here are some more resources on this subject: