Skip to content

Mark Needham
Syndicate content
Thoughts on Software Development
Updated: 47 min 30 sec ago

Neo4j 2.0.0: Query not prepared correctly / Type mismatch: expected Map

Sun, 04/13/2014 - 19:40

I was playing around with Neo4j’s Cypher last weekend and found myself accidentally running some queries against an earlier version of the Neo4j 2.0 series (2.0.0).

My first query started with a map and I wanted to create a person from an identifier inside the map:

WITH {person: {id: 1}} AS params
MERGE (p:Person {id:})

When I ran the query I got this error:

==> SyntaxException: Type mismatch: expected Map but was Boolean, Number, String or Collection<Any> (line 1, column 62)
==> "WITH {person: {id: 1}} AS params MERGE (p:Person {id:}) RETURN p"

If we try the same query in 2.0.1 it works as we’d expect:

==> +---------------+
==> | p             |
==> +---------------+
==> | Node[1]{id:} |
==> +---------------+
==> 1 row
==> Nodes created: 1
==> Properties set: 1
==> Labels added: 1
==> 47 ms

My next query was the following which links topics of interest to a person:

WITH {topics: [{name: "Java"}, {name: "Neo4j"}]} AS params
MERGE (p:Person {id: 2})
FOREACH(t IN params.topics | 
  MERGE (topic:Topic {name:})
  MERGE (p)-[:INTERESTED_IN]->(topic)

In 2.0.0 that query fails like so:

==> InternalException: Query not prepared correctly!

but if we try it in 2.0.1 we’ll see that it works as well:

==> +---------------+
==> | p             |
==> +---------------+
==> | Node[4]{id:2} |
==> +---------------+
==> 1 row
==> Nodes created: 1
==> Relationships created: 2
==> Properties set: 1
==> Labels added: 1
==> 53 ms

So if you’re seeing either of those errors then get yourself upgraded to 2.0.1 as well!

Categories: Blogs

install4j and AppleScript: Creating a Mac OS X Application Bundle for a Java application

Mon, 04/07/2014 - 02:04

We have a few internal applications at Neo which can be launched using ‘java -jar ‘ and I always forget where the jars are so I thought I’d wrap a Mac OS X application bundle around it to make life easier.

My favourite installation pattern is the one where when you double click the dmg it shows you a window where you can drag the application into the ‘Applications’ folder, like this:

2014 04 07 00 38 41

I’m not a fan of the installation wizards and the installation process here is so simple that a wizard seems overkill.

I started out learning about the structure of an application bundle which is well described in the Apple Bundle Programming guide. I then worked my way through a video which walks you through bundling a JAR file in a Mac application.

I figured that bundling a JAR was probably a solved problem and had a look at App Bundler, JAR Bundler and Iceberg before settling on Install4j which we used for Neo4j desktop.

I started out by creating an installer using Install4j and then manually copying the launcher it created into an Application bundle template but it was incredibly fiddly and I ended up with a variety of indecipherable messages in the system error log.

Eventually I realised that I didn’t need to create an installer and that what I actually wanted was a Mac OS X single bundle archive media file.

After I’d got install4j creating that for me I just needed to figure out how to create the background image telling the user to drag the application into their ‘Applications’ folder.

Luckily I came across this StackOverflow post which provided some AppleScript to do just that and with a bit of tweaking I ended up with the following shell script which seems to do the job:

rm target/DBench_macos_1_0_0.tgz
/Applications/install4j\ 5/bin/install4jc TestBench.install4j
rm -rf target/dmg && mkdir -p target/dmg
tar -C target/dmg -xvf target/DBench_macos_1_0_0.tgz
cp -r src/packaging/.background target/dmg
ln -s /Applications target/dmg
cd target
rm "${finalDMGName}"
umount -f /Volumes/"${title}"
hdiutil create -volname ${title} -size 100m -srcfolder dmg/ -ov -format UDRW pack.temp.dmg
device=$(hdiutil attach -readwrite -noverify -noautoopen "pack.temp.dmg" | egrep '^/dev/' | sed 1q | awk '{print $1}')
sleep 5
echo '
   tell application "Finder"
     tell disk "'${title}'"
           set current view of container window to icon view
           set toolbar visible of container window to false
           set statusbar visible of container window to false
           set the bounds of container window to {400, 100, 885, 430}
           set theViewOptions to the icon view options of container window
           set arrangement of theViewOptions to not arranged
           set icon size of theViewOptions to 72
           set background picture of theViewOptions to file ".background:'${backgroundPictureName}'"
           set position of item "'${applicationName}'" of container window to {100, 100}
           set position of item "Applications" of container window to {375, 100}
           update without registering applications
           delay 5
     end tell
   end tell
' | osascript
hdiutil detach ${device}
hdiutil convert "pack.temp.dmg" -format UDZO -imagekey zlib-level=9 -o "${finalDMGName}"
rm -f pack.temp.dmg
cd ..

To summarise, this script creates a symlink to ‘Applications’, puts a background image in a directory titled ‘.background’, sets that as the background of the window and positions the symlink and application appropriately.

Et voila:

2014 04 07 00 59 56

The Firefox guys wrote a couple of blog posts detailing their experiences writing an installer which were quite an interesting read as well.

Categories: Blogs

Clojure: Not so lazy sequences a.k.a chunking behaviour

Mon, 04/07/2014 - 00:07

I’ve been playing with Clojure over the weekend and got caught out by the behaviour of lazy sequences due to chunking – something which was obvious to experienced Clojurians although not me.

I had something similar to the following bit of code which I expected to only evaluate the first item of the infinite sequence that the range function generates:

> (take 1 (map (fn [x] (println (str "printing..." x))) (range)))

The reason this was annoying is because I wanted to shortcut the lazy sequence using take-while, much like the poster of this StackOverflow question.

As I understand it when we have a lazy sequence the granularity of that laziness is 32 items at a time a.k.a one chunk, something that Michael Fogus wrote about 4 years ago. This was a bit surprising to me but it sounds like it makes sense for the majority of cases.

However, if we want to work around that behaviour we can wrap the lazy sequence in the following unchunk function provided by Stuart Sierra:

(defn unchunk [s]
  (when (seq s)
      (cons (first s)
            (unchunk (next s))))))

Now if we repeat our initial code we’ll see it only prints once:

> (take 1 (map (fn [x] (println (str "printing..." x))) (unchunk (range))))
Categories: Blogs

Soulver: For all your random calculations

Sun, 03/30/2014 - 16:48

I often find myself doing random calculations and I used to do so part manually and part using Alfred‘s calculator until Alistair pointed me at Soulver, a desktop/iPhone/iPad app, which is even better.

I thought I’d write some examples of calculations I use it for, partly so I’ll remember the syntax in future!

Calculating how much memory Neo4j memory mapping will take up

800 mb + 2660mb + 6600mb + 9500mb + 40mb in GB = 19.6 GB

How long would it take to cover 20,000 km at 100 km / day?

20,000 km / 100 km/day in months = 6.57097681677241832481 months

How long did an import of some data using the Neo4j shell take?

4550855 ms in minutes = 75.84758333333333333333 minutes

Bit shift 1 by 32 places

1 << 32 = 4,294,967,296

Translating into easier to digest units

32381KB / second in MB per minute = 1,942.86 MB/minute
500,000 / 3 years in per hour = 19.01324310408685857874 per hour^2

How long would it take to process a chunk of data?

100 GB / (32381KB / second in MB per minute)  = 51.47051254336390681778 minutes

Hexadecimal to base 10

0x1111 = 4,369
1 + 16 + 16^2 + 16^3 = 4,369

I’m sure there’s much more that you can do that I haven’t figured out yet but even for these simple examples it saves me a bunch of time.

Categories: Blogs

Remote profiling Neo4j using yourkit

Tue, 03/25/2014 - 01:44

yourkit is my favourite JVM profiling tool and whilst it’s really easy to profile a local JVM process, sometimes I need to profile a process on a remote machine.

In that case we need to first have the remote JVM started up with a yourkit agent parameter passed as one of the args to the Java program.

Since I’m mostly working with Neo4j this means we need to add the following to conf/neo4j-wrapper.conf:

If we run lsof with the Neo4j process ID we’ll see that there’s now a socket listening on port 8888:

java    4388 markhneedham   20u    IPv6 0x901df453b4e9a125       0t0      TCP *:8888 (LISTEN)

We can connect to that via the ‘Monitor Remote Applications’ section of yourkit:

2014 03 24 23 39 59

In this case I’m demonstrating how to connect to it on my laptop and am using localhost but usually we’d specify the remote machine’s host name instead.

We also need to ensure that port 8888 is open on any firewalls we have in front of the machine.

The file we refer to in the ‘agentpath’ flag is a bit different depending on the operating system we’re using. All the details are on the yourkit website.

Categories: Blogs

Functional Programming in Java – Venkat Subramaniam: Book Review

Sun, 03/23/2014 - 23:18

I picked up Venkat Subramaniam’s ‘Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions‘ to learn a little bit more about Java 8 having struggled to find any online tutorials which did that.

A big chunk of the book focuses on lambdas, functional collection parameters and lazy evaluation which will be familiar to users of C#, Clojure, Scala, Haskell, Ruby, Python, F# or libraries like totallylazy and Guava.

Although I was able to race through the book quite quickly it was still interesting to see how Java 8 is going to reduce the amount of code we need to write to do simple operations on collections.

I wrote up my thoughts on lambda expressions instead of auto closeable, using group by on collections and sorting values in collections in previous blog posts.

I noticed a couple of subtle differences in the method names added to collection e.g. skip/limit are there instead of take/drop for grabbing a subset of said collection.

There are also methods such as ‘mapToInt’ and ‘mapToDouble’ where in other languages you’d just have a single ‘map’ and it would handle everything.

Over the last couple of years I’ve used totallylazy on Java projects to deal with collections and while I like the style of code it encourages you end up with a lot of code due to all the anonymous classes you have to create.

In Java 8 lambdas are a first class concept which should make using totallylazy even better.

In a previous blog post I showed how you’d go about sorted a collection of people by age. In Java 8 it would look like this:

List<Person> people = Arrays.asList(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28)); -> p.getAge())).forEach(System.out::println)

I find the ‘comparing’ function that we have to use a bit unintuitive and this is what we’d have using totallylazy pre Java 8:

Sequence<Person> people = sequence(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
people.sortBy(new Callable1<Person, Integer>() {
    public Integer call(Person person) throws Exception {
        return person.getAge();

Using Java 8 lambdas the code is much simplified:

Sequence<Person> people = sequence(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));

If we use ‘forEach’ to print out each person individually we end up with the following:

Sequence<Person> people = sequence(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
people.sortBy(Person::getAge).forEach((Consumer<? super Person>) System.out::println);

The compiler can’t work out whether we want to use the forEach method from totallylazy or from Iterable so we end up having to cast which is a bit nasty.

I haven’t yet tried converting the totallylazy code I’ve written but my thinking is that the real win of Java 8 will be making it easier to use libraries like totallylazy and Guava.

Overall the book describes Java 8′s features very well but if you’ve used any of the languages I mentioned at the top it will all be very familiar – finally Java has caught up with the rest!

Categories: Blogs

Neo4j 2.1.0-M01: LOAD CSV with Rik Van Bruggen’s Tube Graph

Mon, 03/03/2014 - 18:34

Last week we released the first milestone of Neo4j 2.1.0 and one its features is a new function in cypher – LOAD CSV – which aims to make it easier to get data into Neo4j.

I thought I’d give it a try to import the London tube graph – something that my colleague Rik wrote about a few months ago.

I’m using the same data set as Rik but I had to tweak it a bit as there were naming differences when describing the connection from Kennington to Waterloo and Kennington to Oval. My updated version of the dataset is on github.

With the help of Alistair we now have a variation on the original which takes into account the various platforms at stations and the waiting time of a train on the platform. This will also enable us to add in things like getting from the ticket hall to the various platforms more easily.

The model looks like this:

2014 03 03 16 15 58

Now we need to create a graph and the first step is to put an index on station name as we’ll be looking that up quite frequently in the queries that follow:

CREATE INDEX on :Station(stationName)

Now that’s in place we can make use of LOAD CSV. The data is very de-normalised which works out quite nicely for us and we end up with the following script:

LOAD CSV FROM "file:/Users/markhneedham/code/tube/runtimes.csv" AS csvLine
WITH csvLine[0] AS lineName, 
     csvLine[1] AS direction, 
     csvLine[2] AS startStationName,
     csvLine[3] AS destinationStationName, 
     toFloat(csvLine[4]) AS distance, 
     toFloat(csvLine[5]) AS runningTime
MERGE (start:Station { stationName: startStationName}) 
MERGE (destination:Station { stationName: destinationStationName}) 
MERGE (line:Line { lineName: lineName}) 
MERGE (line) - [:DIRECTION] -> (dir:Direction { direction: direction}) 
CREATE (inPlatform:InPlatform {name: "In: " + destinationStationName + " " + lineName + " " + direction})
CREATE (outPlatform:OutPlatform {name: "Out: " + startStationName + " " + lineName + " " + direction}) 
CREATE (inPlatform) - [:AT] -> (destination) 
CREATE (outPlatform) - [:AT] -> (start) 
CREATE (inPlatform) - [:ON] -> (dir) 
CREATE (outPlatform) - [:ON] -> (dir) 
CREATE (outPlatform) - [r:TRAIN {distance: distance, runningTime: runningTime}] -> (inPlatform)

This file doesn’t contain any headers so we’ll simulate them by using a WITH clause so that we don’t have index lookups all over the place. In this case we’re pointing to a file on the local file system but we could choose to point to a CSV file on the web if we wanted to.

Since stations, lines and directions appear frequently we’ll use MERGE to ensure they don’t get duplicated.

After that we have a post processing step to connect the ‘in’ and ‘out’ platforms shown in the diagram.

MATCH (station:Station) <-[:AT]- (platformIn:InPlatform), 
      (station:Station) <-[:AT]- (platformOut:OutPlatform), 
      (direction:Direction) <-[:ON]- (platformIn:InPlatform), 
      (direction:Direction) <-[:ON]- (platformOut:OutPlatform) 
CREATE (platformIn) -[:WAIT {runningTime: 0.5}]-> (platformOut)

After running a few queries on the graph I realised that it wasn’t possible to combine some journies through Kennington and Euston so I had to add some relationships in there as well:

// link the Euston stations
MATCH (euston:Station {stationName: "EUSTON"})<-[:AT]-(eustonIn:InPlatform)
MATCH (eustonCx:Station {stationName: "EUSTON (CX)"})<-[:AT]-(eustonCxIn:InPlatform)
MATCH (eustonCity:Station {stationName: "EUSTON (CITY)"})<-[:AT]-(eustonCityIn:InPlatform)
CREATE UNIQUE (eustonIn)-[:WAIT {runningTime: 0.0}]->(eustonCxIn)
CREATE UNIQUE (eustonIn)-[:WAIT {runningTime: 0.0}]->(eustonCityIn)
CREATE UNIQUE (eustonCxIn)-[:WAIT {runningTime: 0.0}]->(eustonCityIn)
// link the Kennington stations
MATCH (kenningtonCx:Station {stationName: "KENNINGTON (CX)"})<-[:AT]-(kenningtonCxIn:InPlatform)
MATCH (kenningtonCity:Station {stationName: "KENNINGTON (CITY)"})<-[:AT]-(kenningtonCityIn:InPlatform)
CREATE UNIQUE (kenningtonCxIn)-[:WAIT {runningTime: 0.0}]->(kenningtonCityIn)

I’ve been playing around with the A* algorithm to find the quickest route between stations based on the distances between stations.

The next step is to put a timetable graph alongside this so we can do quickest routes at certain parts of the day and the next step after that will be to take delays into account.

If you’ve got some data you want to get into the graph give LOAD CSV a try and let us know how you get on, the cypher team are keen to get feedback on this.

Categories: Blogs

Neo4j: Cypher – Finding directors who acted in their own movie

Sat, 03/01/2014 - 00:57

I’ve been doing quite a few Intro to Neo4j sessions recently and since it contains a lot of problems for the attendees to work on I get to see how first time users of Cypher actually use it.

A couple of hours in we want to write a query to find directors who acted in their own film based on the following model.

2014 02 28 22 40 02

A common answer is the following:

MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)

We’re matching an actor ‘a’, finding the movie they acted in and then finding the director of that movie. We now have pairs of actors and directors which we filter down by comparing their ‘name’ property.

I haven’t written SQL for a while but if my memory serves me correctly comparing properties or attributes in this way is quite a common way to test for equality.

In a graph we don’t need to compare properties – what we actually want to check is if ‘a’ and ‘d’ are the same node:

MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
WHERE a = d

We’ve simplifed the query a bit but we can actually go one better by binding the director to the same identifier as the actor like so:

MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)

So now we’re matching an actor ‘a’, finding the movie they acted in and then finding the director if they happen to be the same person as ‘a’.

The code is now much simpler and more revealing of its intent too.

Categories: Blogs

Java 8: Lambda Expressions vs Auto Closeable

Wed, 02/26/2014 - 09:32

If you used earlier versions of Neo4j via its Java API with Java 6 you probably have code similar to the following to ensure write operations happen within a transaction:

public class StylesOfTx
    public static void main( String[] args ) throws IOException
        String path = "/tmp/tx-style-test";
        FileUtils.deleteRecursively(new File(path));
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
        Transaction tx = db.beginTx();

In Neo4j 2.0 Transaction started extending AutoCloseable which meant that you could use ‘try with resources’ and the ‘close’ method would be automatically called when the block finished:

public class StylesOfTx
    public static void main( String[] args ) throws IOException
        String path = "/tmp/tx-style-test";
        FileUtils.deleteRecursively(new File(path));
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
        try ( Transaction tx = db.beginTx() )
            Node node = db.createNode();

This works quite well although it’s still possible to have transactions hanging around in an application when people don’t use this syntax – the old style is still permissible.

In Venkat Subramaniam’s Java 8 book he suggests an alternative approach where we use a lambda based approach:

public class StylesOfTx
    public static void main( String[] args ) throws IOException
        String path = "/tmp/tx-style-test";
        FileUtils.deleteRecursively(new File(path));
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
        Db.withinTransaction(db, neo4jDb -> {
            Node node = neo4jDb.createNode();
    static class Db {
        public static void withinTransaction(GraphDatabaseService db, Consumer<GraphDatabaseService> fn) {
            try ( Transaction tx = db.beginTx() )

The ‘withinTransaction’ function would actually go on GraphDatabaseService or similar rather than being on that Db class but it was easier to put it on there for this example.

A disadvantage of this style is that you don’t have explicit control over the transaction for handling the failure case – it’s assumed that if ‘tx.success()’ isn’t called then the transaction failed and it’s rolled back. I’m not sure what % of use cases actually need such fine grained control though.

Brian Hurt refers to this as the ‘hole in the middle pattern‘ and I imagine we’ll start seeing more code of this ilk once Java 8 is released and becomes more widely used.

Categories: Blogs

Jersey: Ignoring SSL certificate –

Wed, 02/26/2014 - 02:12

Last week Alistair and I were working on an internal application and we needed to make a HTTPS request directly to an AWS machine using a certificate signed to a different host.

We use jersey-client so our code looked something like this:

Client client = Client.create();
// and so on

When we ran this we predictably ran into trouble:

com.sun.jersey.api.client.ClientHandlerException: No subject alternative DNS name matching found.
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(
	at com.sun.jersey.api.client.Client.handle(
	at com.sun.jersey.api.client.WebResource.handle(
	at com.neotechnology.testlab.manager.bootstrap.ManagerAdmin.takeBackup(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(
	at org.junit.runners.ParentRunner.runLeaf(
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(
	at org.junit.runners.ParentRunner$
	at org.junit.runners.ParentRunner$1.schedule(
	at org.junit.runners.ParentRunner.runChildren(
	at org.junit.runners.ParentRunner.access$000(
	at org.junit.runners.ParentRunner$2.evaluate(
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(
	at com.intellij.rt.execution.junit.JUnitStarter.main(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at com.intellij.rt.execution.application.AppMain.main(
Caused by: No subject alternative DNS name matching found.
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(
	... 31 more
Caused by: No subject alternative DNS name matching found.
	... 45 more

We figured that we needed to get our client to ignore the certificate and came across this Stack Overflow thread which had some suggestions on how to do this.

None of the suggestions worked on their own but we ended up with a combination of a couple of the suggestions which did the trick:

public Client hostIgnoringClient() {
        SSLContext sslcontext = SSLContext.getInstance( "TLS" );
        sslcontext.init( null, null, null );
        DefaultClientConfig config = new DefaultClientConfig();
        Map<String, Object> properties = config.getProperties();
        HTTPSProperties httpsProperties = new HTTPSProperties(
                new HostnameVerifier()
                    public boolean verify( String s, SSLSession sslSession )
                        return true;
                }, sslcontext
        properties.put( HTTPSProperties.PROPERTY_HTTPS_PROPERTIES, httpsProperties );
        config.getClasses().add( JacksonJsonProvider.class );
        return Client.create( config );
    catch ( KeyManagementException | NoSuchAlgorithmException e )
        throw new RuntimeException( e );

You’re welcome Future Mark.

Categories: Blogs

Java 8: Group by with collections

Sun, 02/23/2014 - 21:16

In my continued reading of Venkat Subramaniam’s ‘Functional Programming in Java‘ I’ve reached the part of the book where the Stream#collect function is introduced.

We want to take a collection of people, group them by age and return a map of (age -> people’s names) for which this comes in handy.

To refresh, this is what the Person class looks like:

static class Person {
    private String name;
    private int age;
    Person(String name, int age) {
 = name;
        this.age = age;
    public String toString() {
        return String.format("Person{name='%s', age=%d}", name, age);

And we can write the following code in Java 8 to get a map of people’s names grouped by age:

Stream<Person> people = Stream.of(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
Map<Integer, List<String>> peopleByAge = people
    .collect(groupingBy(p -> p.age, mapping((Person p) ->, toList())));
{24=[Paul], 28=[Will], 30=[Mark]}

We’re running the ‘collect’ function over the collection, grouping by the ‘age’ property as we go and grouping the names of people rather than the people themselves.

This is a little bit different to what you’d do in Ruby where there’s a ‘group_by’ function which you can call on a collection:

> people = [ {:name => "Paul", :age => 24}, {:name => "Mark", :age => 30}, {:name => "Will", :age => 28}]
> people.group_by { |p| p[:age] }
=> {24=>[{:name=>"Paul", :age=>24}], 30=>[{:name=>"Mark", :age=>30}], 28=>[{:name=>"Will", :age=>28}]}

This gives us back lists of people grouped by age but we need to apply an additional ‘map’ operation to change that to be a list of names instead:

> people.group_by { |p| p[:age] }.map { |k,v| [k, { |person| person[:name] } ] }
=> [[24, ["Paul"]], [30, ["Mark"]], [28, ["Will"]]]

At this stage we’ve got an array of (age, names) pairs but luckily Ruby 2.1.0 has a function ‘to_h’ which we can call to get back to a hash again:

> people.group_by { |p| p[:age] }.map { |k,v| [k, { |person| person[:name] } ] }.to_h
=> {24=>["Paul"], 30=>["Mark"], 28=>["Will"]}

If we want to follow the Java approach of grouping by a property while running a reduce over the collection we’d have something like the following:

> people.reduce({}) { |acc, item| acc[item[:age]] ||=[]; acc[item[:age]] << item[:name]; acc }
=> {24=>["Paul"], 30=>["Mark"], 28=>["Will"]}

If we’re using Clojure then we might end up with something like this instead:

(def people
  [{:name "Paul", :age 24} {:name "Mark", :age 30} {:name "Will", :age 28}])
> (reduce (fn [acc [k v]] (assoc-in acc [k] (map :name v))) {} (group-by :age people))
{28 ("Will"), 30 ("Mark"), 24 ("Paul")}

I thought the Java version looked a bit weird to begin with but it’s actually not too bad having worked through the problem in a couple of other languages.

It’d be good to know whether there’s a better way of doing this the Ruby/Clojure way though!

Categories: Blogs

Java 8: Sorting values in collections

Sun, 02/23/2014 - 16:43

Having realised that Java 8 is due for its GA release within the next few weeks I thought it was about time I had a look at it and over the last week have been reading Venkat Subramaniam’s book.

I’m up to chapter 3 which covers sorting a collection of people. The Person class is defined roughly like so:

static class Person {
    private String name;
    private int age;
    Person(String name, int age) { = name;
        this.age = age;
    public String toString() {
        return String.format("Person{name='%s', age=%d}", name, age);

In the first example we take a list of people and then sort them in ascending age order:

List<Person> people = Arrays.asList(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));, p2) -> p1.age - p2.age).forEach(System.out::println);
Person{name='Paul', age=24}
Person{name='Will', age=28}
Person{name='Mark', age=30}

If we were to write a function to do the same thing in Java 7 it’d look like this:

Collections.sort(people, new Comparator<Person>() {
    public int compare(Person o1, Person o2) {
        return o1.age - o2.age;
for (Person person : people) {

Java 8 has reduced the amount of code we have to write although it’s still more complicated than what we could do in Ruby:

> people = [ {:name => "Paul", :age => 24}, {:name => "Mark", :age => 30}, {:name => "Will", :age => 28}]
> people.sort_by { |p| p[:age] }
=> [{:name=>"Paul", :age=>24}, {:name=>"Will", :age=>28}, {:name=>"Mark", :age=>30}]

A few pages later Venkat shows how you can get close to this by using the Comparator#comparing function:

Function<Person, Integer> byAge = p -> p.age ;;

I thought I could make this simpler by inlining the ‘byAge’ lambda like this: -> p.age)).forEach(System.out::println);

This seems to compile and run correctly although IntelliJ 13.0 suggests there is a ‘cyclic inference‘ problem. IntelliJ is happy if we explicitly cast the lambda like this:<Person, Integer>) p -> p.age)).forEach(System.out::println);

IntelliJ also seems happy if we explicitly type ‘p’ in the lambda, so I think I’ll go with that for the moment: p) -> p.age)).forEach(System.out::println);
Categories: Blogs

Automating Skype’s ‘This message has been removed’

Fri, 02/21/2014 - 01:16

One of the stranger features of Skype is that that it allows you to delete the contents of a message that you’ve already sent to someone – something I haven’t seen on any other messaging system I’ve used.

For example if I wrote a message in Skype and wanted to edit it I would press the ‘up’ arrow:

2014 02 20 23 02 28

Once I’ve deleted the message I’d see this in the space where the message used to be:

2014 02 20 23 00 41

I almost certainly am too obsessed with this but I find it quite amusing when I see people posting and retracting messages so I wanted to see if it could be automated.

Alistair showed me Automator, a built in tool on the Mac for automating work flows.

Automator allows you to execute Applescript so we wrote the following code which selects the current chat in Skype, writes a message and then deletes it one character at a time:

on run {input, parameters}
	tell application "Skype"
	end tell
	tell application "System Events"
		set message to "now you see me, now you don't"
		keystroke message
		keystroke return
		keystroke (ASCII character 30) --up arrow
		repeat length of message times
			keystroke (ASCII character 8) --backspace
		end repeat
		keystroke return
	end tell
	return input
end run

We wired up the Applescript via the Utilities > Run Applescript menu option in Automator:

2014 02 20 23 12 38

We can then go further and wire that up to a keyboard shortcut if we want by saving the workflow as a service in Automator but for my messing around purposes clicking the ‘Run’ button from Automator didn’t seem too much of a hardship!

Categories: Blogs

Neo4j: Cypher – Set Based Operations

Thu, 02/20/2014 - 20:22

I was recently reminded of a Neo4j cypher query that I wrote a couple of years ago to find the colleagues that I hadn’t worked with in the ThoughtWorks London office.

The model looked like this:

2014 02 18 17 04 01

And I created the following fake data set of the aforementioned model:

public class SetBasedOperations
    private static final Label PERSON = DynamicLabel.label( "Person" );
    private static final Label OFFICE = DynamicLabel.label( "Office" );
    private static final DynamicRelationshipType COLLEAGUES = DynamicRelationshipType.withName( "COLLEAGUES" );
    private static final DynamicRelationshipType MEMBER_OF = DynamicRelationshipType.withName( "MEMBER_OF" );
    public static void main( String[] args ) throws IOException
        Random random = new Random();
        String path = "/tmp/set-based-operations";
        FileUtils.deleteRecursively( new File( path ) );
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
        Transaction tx = db.beginTx();
            Node me = db.createNode( PERSON );
            me.setProperty( "name", "me" );
            Node londonOffice = db.createNode( OFFICE );
            londonOffice.setProperty( "name", "London Office" );
            me.createRelationshipTo( londonOffice, MEMBER_OF );
            for ( int i = 0; i < 1000; i++ )
                Node colleague = db.createNode( PERSON );
                colleague.setProperty( "name", "person" + i );
                colleague.createRelationshipTo( londonOffice, MEMBER_OF );
                if(random.nextInt( 10 ) >= 8) {
                    me.createRelationshipTo( colleague, COLLEAGUES );
        CommunityNeoServer server = CommunityServerBuilder
                .usingDatabaseDir( path )
                .onPort( 9001 )

I’ve created a node representing me and 1,000 people who work in the London office. Out of those 1,000 people I made it so that ~150 of them have worked with me.

If I want to write a cypher query to find the exact number of people who haven’t worked with me I might start with the following:

MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)

We start by finding me, then find the London office which I was a member of, and then find the other people who are members of that office. On the second line we remove people that I’ve previously worked with and then return a count of the people who are left.

When I ran this through my Cypher query tuning tool the average time to evaluate this query was 7.46 seconds.

That is obviously a bit too slow if we want to run the query on a web page and as far as I can tell the reason for that is that for each potential colleague we are searching through my ‘COLLEAGUES’ relationships and checking whether they exist. We’re doing that 1,000 times which is a bit inefficient.

I chatted to David about this, and he suggested that a more efficient query would be to work out all my colleagues up front once and then do the filtering from that set of people instead.

The re-worked query looks like this:

MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)

When we run that through the query tuner the average time reduces to 150 milliseconds which is much better.

This type of query seems to be more about set operations than graph ones because we’re looking for what isn’t there rather than what is. When that’s the case getting the set of things that we want to compare against up front is more profitable.

Categories: Blogs

Neo4j: Creating nodes and relationships from a list of maps

Mon, 02/17/2014 - 16:11

Last week Alistair and I were porting some Neo4j cypher queries from 1.8 to 2.0 and one of the queries we had to change was an interesting one that created a bunch of relationships from a list/array of maps.

In the query we had a user ‘Mark’ and wanted to create ‘FRIENDS_WITH’ relationships to Peter and Michael.

2014 02 17 13 39 08

The application passed in a list of maps representing Peter and Michael as a parameter but if we remove the parameters the query looked like this:

MERGE (me:User {userId: 1} )
SET = "Mark"
FOREACH (f IN [{userId: 2, name: "Michael"}, {userId: 3, name: "Peter"}] | 
    MERGE (u:User {userId: f.userId})
    SET u = f
    MERGE (me)-[:FRIENDS_WITH]->(u))

We first ensure that a user with ‘id’ of 1 exists in the database and then make sure their name is set to ‘Mark’. After we’ve done that we iterate over a list of maps containing Mark’s friends and ensure there is a ‘FRIENDS_WITH’ relationship from Mark to them.

The parameterised version of that query looks like this:

MERGE (me:User { userId: {userId} }) 
SET = {name} 
FOREACH(f IN {friends} | 
    MERGE (u:User {userId: f.userId }) 
    SET u = f 
    MERGE (me)-[:FRIENDS_WITH]->(u))

We can then execute that query using Jersey:

public class ListsOfMapsCypher
    public static void main( String[] args )
        ObjectNode request = JsonNodeFactory.instance.objectNode();
                "MERGE (me:User { userId: {userId} }) " +
                "SET = {name} " +
                "FOREACH(f IN {friends} | " +
                    "MERGE (u:User {userId: f.userId }) " +
                    "SET u = f " +
                    "MERGE (me)-[:FRIENDS_WITH]->(u)) ");
        ObjectNode params = JsonNodeFactory.instance.objectNode();
        params.put("userId", 1);
        params.put("name", "Mark");
        ArrayNode friends = JsonNodeFactory.instance.arrayNode();
        ObjectNode friend1 = JsonNodeFactory.instance.objectNode();
        friend1.put( "userId", 2 );
        friend1.put( "name", "Michael" );
        friends.add( friend1 );
        ObjectNode friend2 = JsonNodeFactory.instance.objectNode();
        friend2.put( "userId", 3 );
        friend2.put( "name", "Peter" );
        friends.add( friend2 );
        params.put("friends", friends );
        request.put("params", params );
        ClientResponse clientResponse = client()
                .resource( "http://localhost:7474/db/data/cypher" )
                .accept( MediaType.APPLICATION_JSON )
                .entity( request, MediaType.APPLICATION_JSON_TYPE )
                .post( ClientResponse.class );
        System.out.println(clientResponse.getEntity( String.class ));
    private static Client client()
        DefaultClientConfig defaultClientConfig = new DefaultClientConfig();
        return Client.create(defaultClientConfig);

We can then write a query to check Mark and his friends were persisted:

2014 02 17 14 10 12

And that’s it!

Categories: Blogs