Skip to content

Mark Needham
Syndicate content
Thoughts on Software Development
Updated: 2 hours 39 min ago

Neo4j/R: Analysing London NoSQL meetup membership

Sat, 05/31/2014 - 23:32

In my spare time I’ve been working on a Neo4j application that runs on tops of meetup.com’s API and Nicole recently showed me how I could wire up some of the queries to use her Rneo4j library:

@markhneedham pic.twitter.com/8014jckEUl

— Nicole White (@_nicolemargaret) May 31, 2014

The query used in that visualisation shows the number of members that overlap between each pair of groups but a more interesting query is the one which shows the % overlap between groups based on the unique members across the groups.

The query is a bit more complicated than the original:

MATCH (group1:Group), (group2:Group)
OPTIONAL MATCH (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
 
WITH group1, group2, COUNT(*) as commonMembers
MATCH (group1)<-[:MEMBER_OF]-(group1Member)
 
WITH group1, group2, commonMembers, COLLECT(id(group1Member)) AS group1Members
MATCH (group2)<-[:MEMBER_OF]-(group2Member)
 
WITH group1, group2, commonMembers, group1Members, COLLECT(id(group2Member)) AS group2Members
WITH group1, group2, commonMembers, group1Members, group2Members
 
UNWIND(group1Members + group2Members) AS combinedMember
WITH DISTINCT group1, group2, commonMembers, combinedMember
 
WITH group1, group2, commonMembers, COUNT(combinedMember) AS combinedMembers
 
RETURN group1.name, group2.name, toInt(round(100.0 * commonMembers / combinedMembers)) AS percentage		 
ORDER BY group1.name, group1.name

The next step is to wire that up to use Rneo4j and ggplot2. First we’ll get the libraries installed and loaded:

install.packages("devtools")
devtools::install_github("nicolewhite/Rneo4j")
install.packages("ggplot2")
 
library(Rneo4j)
library(ggplot2)

And now we’ll execute the query and create a chart from the results:

graph = startGraph("http://localhost:7474/db/data/")
 
query = "MATCH (group1:Group), (group2:Group)
         WHERE group1 <> group2
         OPTIONAL MATCH p = (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
         WITH group1, group2, COLLECT(p) AS paths
         RETURN group1.name, group2.name, LENGTH(paths) as commonMembers
         ORDER BY group1.name, group2.name"
 
group_overlap = cypher(graph, query)
 
ggplot(group_overlap, aes(x=group1.name, y=group2.name, fill=commonMembers)) + 
geom_bin2d() +
geom_text(aes(label = commonMembers)) +
labs(x= "Group", y="Group", title="Member Group Member Overlap") +
scale_fill_gradient(low="white", high="red") +
theme(axis.text = element_text(size = 12, color = "black"),
      axis.title = element_text(size = 14, color = "black"),
      plot.title = element_text(size = 16, color = "black"),
      axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
 
// as percentage
 
query = "MATCH (group1:Group), (group2:Group)
         WHERE group1 <> group2
         OPTIONAL MATCH path = (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
 
         WITH group1, group2, COLLECT(path) AS paths
 
         WITH group1, group2, LENGTH(paths) as commonMembers
         MATCH (group1)<-[:MEMBER_OF]-(group1Member)
 
         WITH group1, group2, commonMembers, COLLECT(id(group1Member)) AS group1Members
         MATCH (group2)<-[:MEMBER_OF]-(group2Member)
 
         WITH group1, group2, commonMembers, group1Members, COLLECT(id(group2Member)) AS group2Members
         WITH group1, group2, commonMembers, group1Members, group2Members
 
         UNWIND(group1Members + group2Members) AS combinedMember
         WITH DISTINCT group1, group2, commonMembers, combinedMember
 
         WITH group1, group2, commonMembers, COUNT(combinedMember) AS combinedMembers
 
         RETURN group1.name, group2.name, toInt(round(100.0 * commonMembers / combinedMembers)) AS percentage
 
         ORDER BY group1.name, group1.name"
 
group_overlap = cypher(graph, query)
 
ggplot(group_overlap, aes(x=group1.name, y=group2.name, fill=percentage)) + 
  geom_bin2d() +
  geom_text(aes(label = percentage)) +
  labs(x= "Group", y="Group", title="Member Group Member Overlap") +
  scale_fill_gradient(low="white", high="red") +
  theme(axis.text = element_text(size = 12, color = "black"),
        axis.title = element_text(size = 14, color = "black"),
        plot.title = element_text(size = 16, color = "black"),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
2014 05 31 21 54 42

A first glance at the visualisation suggests that the Hadoop, Data Science and Big Data groups have the most overlap which seems to make sense as they do cover quite similar topics.

Thanks to Nicole for the library and the idea of the visualisation. Now we need to do some more analysis on the data to see if there are any more interesting insights.

Categories: Blogs

Thoughts on meetups

Sat, 05/31/2014 - 21:50

I recently came across an interesting blog post by Zach Tellman in which he explains a new approach that he’s been trialling at The Bay Area Clojure User Group.

Zach explains that a lecture based approach isn’t necessarily the most effective way for people to learn and that half of the people attending the meetup are likely to be novices and would struggle to follow more advanced content.

He then goes on to explain an alternative approach:

We’ve been experimenting with a Clojure meetup modelled on a different academic tradition: office hours.

At a university, students who have questions about the lecture content or coursework can visit the professor and have a one-on-one conversation.

At the beginning of every meetup, we give everyone a name tag, and provide a whiteboard with two columns, “teachers” and “students”.

Attendees are encouraged to put their name and interests in both columns. From there, everyone can [...] go in search of someone from the opposite column who shares their interests.

While running Neo4j meetups we’ve had similar observations and my colleagues Stefan and Cedric actually ran a meetup in Paris a few months ago which sounds very similar to Zach’s ‘office hours’ style one.

However, we’ve also been experimenting with the idea that one size doesn’t need to fit all by running different styles of meetups aimed at different people.

For example, we have:

  • An introductory meetup which aims to get people to the point where they can follow talks about more advanced topics.
  • A more hands on session for people who want to learn how to write queries in cypher, Neo4j’s query language.
  • An advanced session for people who want to learn how to model a problem as a graph and import data into a graph.

I’m also thinking of running something similar to the Clojure Dojo but focused on data and graphs where groups of people could work together and build an app.

I noticed that Nick Manning has been doing a similar thing with the New York City Neo4j meetup as well, which is cool.

I’d be interested in hearing about different / better approaches that other people have come across so if you know of any let me know in the comments.

Categories: Blogs

Neo4j: Cypher – UNWIND vs FOREACH

Sat, 05/31/2014 - 16:19

I’ve written a couple of posts about the new UNWIND clause in Neo4j’s cypher query language but I forgot about my favourite use of UNWIND, which is to get rid of some uses of FOREACH from our queries.

Let’s say we’ve created a timetree up front and now have a series of events coming in that we want to create in the database and attach to the appropriate part of the timetree.

Before UNWIND existed we might try to write the following query using FOREACH:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
FOREACH (event IN events | 
  CREATE (e:Event {name: event.name})
  MATCH (year:Year {year: event.timetree.year }), 
        (year)-[:HAS_MONTH]->(month {month: event.timetree.month }),
        (month)-[:HAS_DAY]->(day {day: event.timetree.day })
  CREATE (e)-[:HAPPENED_ON]->(day))

Unfortunately we can’t use MATCH inside a FOREACH statement so we’ll get the following error:

Invalid use of MATCH inside FOREACH (line 5, column 3)
"  MATCH (year:Year {year: event.timetree.year }), "
   ^
Neo.ClientError.Statement.InvalidSyntax

We can work around this by using MERGE instead in the knowledge that it’s never going to create anything because the timetree already exists:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
FOREACH (event IN events | 
  CREATE (e:Event {name: event.name})
  MERGE (year:Year {year: event.timetree.year })
  MERGE (year)-[:HAS_MONTH]->(month {month: event.timetree.month })
  MERGE (month)-[:HAS_DAY]->(day {day: event.timetree.day })
  CREATE (e)-[:HAPPENED_ON]->(day))

If we replace the FOREACH with UNWIND we’d get the following:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
UNWIND events AS event
CREATE (e:Event {name: event.name})
WITH e, event.timetree AS timetree
MATCH (year:Year {year: timetree.year }), 
      (year)-[:HAS_MONTH]->(month {month: timetree.month }),
      (month)-[:HAS_DAY]->(day {day: timetree.day })
CREATE (e)-[:HAPPENED_ON]->(day)

Although the lines of code has slightly increased the query is now correct and we won’t accidentally correct new parts of our time tree.

We could also pass on the event that we created to the next part of the query which wouldn’t be the case when using FOREACH.

Categories: Blogs

Neo4j: Cypher – Neo.ClientError.Statement.ParameterMissing and neo4j-shell

Sat, 05/31/2014 - 14:44

Every now and then I get sent Neo4j cypher queries to look at and more often than not they’re parameterised which means you can’t easily run them in the Neo4j browser.

For example let’s say we have a database which has a user called ‘Mark’:

CREATE (u:User {name: "Mark"})

Now we write a query to find ‘Mark’ with the name parameterised so we can easily search for a different user in future:

MATCH (u:User {name: {name}}) RETURN u

If we run that query in the Neo4j browser we’ll get this error:

Expected a parameter named name
Neo.ClientError.Statement.ParameterMissing

If we try that in neo4j-shell we’ll get the same exception to start with:

$ MATCH (u:User {name: {name}}) RETURN u;
ParameterNotFoundException: Expected a parameter named name

However, as Michael pointed out to me, the neat thing about neo4j-shell is that we can define parameters by using the export command:

$ export name="Mark"
$ MATCH (u:User {name: {name}}) RETURN u;
+-------------------------+
| u                       |
+-------------------------+
| Node[1923]{name:"Mark"} |
+-------------------------+
1 row

export is a bit sensitive to spaces so it’s best to keep them to a minimum. e.g. the following tries to create the variable ‘name ‘ which is invalid:

$ export name = "Mark"
name  is no valid variable name. May only contain alphanumeric characters and underscores.

The variables we create in the shell don’t have to only be primitives. We can create maps too:

$ export params={ name: "Mark" }
$ MATCH (u:User {name: {params}.name}) RETURN u;
+-------------------------+
| u                       |
+-------------------------+
| Node[1923]{name:"Mark"} |
+-------------------------+
1 row

A simple tip but one that saves me from having to rewrite queries all the time!

Categories: Blogs