Skip to content

Feed aggregator

R: A first attempt at linear regression

Mark Needham - Wed, 10/01/2014 - 00:20

I’ve been working through the videos that accompany the Introduction to Statistical Learning with Applications in R book and thought it’d be interesting to try out the linear regression algorithm against my meetup data set.

I wanted to see how well a linear regression algorithm could predict how many people were likely to RSVP to a particular event. I started with the following code to build a data frame containing some potential predictors:

library(RNeo4j)
officeEventsQuery = "MATCH (g:Group {name: \"Neo4j - London User Group\"})-[:HOSTED_EVENT]->(event)<-[:TO]-({response: 'yes'})<-[:RSVPD]-(),
                           (event)-[:HELD_AT]->(venue)
                     WHERE (event.time + event.utc_offset) < timestamp() AND venue.name IN [\"Neo Technology\", \"OpenCredo\"]
                     RETURN event.time + event.utc_offset AS eventTime,event.announced_at AS announcedAt, event.name, COUNT(*) AS rsvps"
 
events = subset(cypher(graph, officeEventsQuery), !is.na(announcedAt))
events$eventTime <- timestampToDate(events$eventTime)
events$day <- format(events$eventTime, "%A")
events$monthYear <- format(events$eventTime, "%m-%Y")
events$month <- format(events$eventTime, "%m")
events$year <- format(events$eventTime, "%Y")
events$announcedAt<- timestampToDate(events$announcedAt)
events$timeDiff = as.numeric(events$eventTime - events$announcedAt, units = "days")

If we preview ‘events’ it contains the following columns:

> head(events)
            eventTime         announcedAt                                        event.name rsvps       day monthYear month year  timeDiff
1 2013-01-29 18:00:00 2012-11-30 11:30:57                                   Intro to Graphs    24   Tuesday   01-2013    01 2013 60.270174
2 2014-06-24 18:30:00 2014-06-18 19:11:19                                   Intro to Graphs    43   Tuesday   06-2014    06 2014  5.971308
3 2014-06-18 18:30:00 2014-06-08 07:03:13                         Neo4j World Cup Hackathon    24 Wednesday   06-2014    06 2014 10.476933
4 2014-05-20 18:30:00 2014-05-14 18:56:06                                   Intro to Graphs    53   Tuesday   05-2014    05 2014  5.981875
5 2014-02-11 18:00:00 2014-02-05 19:11:03                                   Intro to Graphs    35   Tuesday   02-2014    02 2014  5.950660
6 2014-09-04 18:30:00 2014-08-26 06:34:01 Hands On Intro to Cypher - Neo4j's Query Language    20  Thursday   09-2014    09 2014  9.497211

We want to predict ‘rsvps’ from the other columns so I started off by creating a linear model which took all the other columns into account:

> summary(lm(rsvps ~., data = events))
 
Call:
lm(formula = rsvps ~ ., data = events)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-8.2582 -1.1538  0.0000  0.4158 10.5803 
 
Coefficients: (14 not defined because of singularities)
                                                                    Estimate Std. Error t value Pr(>|t|)   
(Intercept)                                                       -9.365e+03  3.009e+03  -3.113  0.00897 **
eventTime                                                          3.609e-06  2.951e-06   1.223  0.24479   
announcedAt                                                        3.278e-06  2.553e-06   1.284  0.22339   
event.nameGraph Modelling - Do's and Don'ts                        4.884e+01  1.140e+01   4.286  0.00106 **
event.nameHands on build your first Neo4j app for Java developers  3.735e+01  1.048e+01   3.562  0.00391 **
event.nameHands On Intro to Cypher - Neo4j's Query Language        2.560e+01  9.713e+00   2.635  0.02177 * 
event.nameIntro to Graphs                                          2.238e+01  8.726e+00   2.564  0.02480 * 
event.nameIntroduction to Graph Database Modeling                 -1.304e+02  4.835e+01  -2.696  0.01946 * 
event.nameLunch with Neo4j's CEO, Emil Eifrem                      3.920e+01  1.113e+01   3.523  0.00420 **
event.nameNeo4j Clojure Hackathon                                 -3.063e+00  1.195e+01  -0.256  0.80203   
event.nameNeo4j Python Hackathon with py2neo's Nigel Small         2.128e+01  1.070e+01   1.989  0.06998 . 
event.nameNeo4j World Cup Hackathon                                5.004e+00  9.622e+00   0.520  0.61251   
dayTuesday                                                         2.068e+01  5.626e+00   3.676  0.00317 **
dayWednesday                                                       2.300e+01  5.522e+00   4.165  0.00131 **
monthYear01-2014                                                  -2.350e+02  7.377e+01  -3.185  0.00784 **
monthYear02-2013                                                  -2.526e+01  1.376e+01  -1.836  0.09130 . 
monthYear02-2014                                                  -2.325e+02  7.763e+01  -2.995  0.01118 * 
monthYear03-2013                                                  -4.605e+01  1.683e+01  -2.736  0.01805 * 
monthYear03-2014                                                  -2.371e+02  8.324e+01  -2.848  0.01468 * 
monthYear04-2013                                                  -6.570e+01  2.309e+01  -2.845  0.01477 * 
monthYear04-2014                                                  -2.535e+02  8.746e+01  -2.899  0.01336 * 
monthYear05-2013                                                  -8.672e+01  2.845e+01  -3.049  0.01011 * 
monthYear05-2014                                                  -2.802e+02  9.420e+01  -2.975  0.01160 * 
monthYear06-2013                                                  -1.022e+02  3.283e+01  -3.113  0.00897 **
monthYear06-2014                                                  -2.996e+02  1.003e+02  -2.988  0.01132 * 
monthYear07-2014                                                  -3.123e+02  1.054e+02  -2.965  0.01182 * 
monthYear08-2013                                                  -1.326e+02  4.323e+01  -3.067  0.00976 **
monthYear08-2014                                                  -3.060e+02  1.107e+02  -2.763  0.01718 * 
monthYear09-2013                                                          NA         NA      NA       NA   
monthYear09-2014                                                  -3.465e+02  1.164e+02  -2.976  0.01158 * 
monthYear10-2012                                                   2.602e+01  1.959e+01   1.328  0.20886   
monthYear10-2013                                                  -1.728e+02  5.678e+01  -3.044  0.01020 * 
monthYear11-2012                                                   2.717e+01  1.509e+01   1.800  0.09704 . 
month02                                                                   NA         NA      NA       NA   
month03                                                                   NA         NA      NA       NA   
month04                                                                   NA         NA      NA       NA   
month05                                                                   NA         NA      NA       NA   
month06                                                                   NA         NA      NA       NA   
month07                                                                   NA         NA      NA       NA   
month08                                                                   NA         NA      NA       NA   
month09                                                                   NA         NA      NA       NA   
month10                                                                   NA         NA      NA       NA   
month11                                                                   NA         NA      NA       NA   
year2013                                                                  NA         NA      NA       NA   
year2014                                                                  NA         NA      NA       NA   
timeDiff                                                                  NA         NA      NA       NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 5.287 on 12 degrees of freedom
Multiple R-squared:  0.9585,	Adjusted R-squared:  0.8512 
F-statistic: 8.934 on 31 and 12 DF,  p-value: 0.0001399

As I understand it we can look at the R-squared value to understand how much of the variance in the data has been explained by the model – in this case it’s 85%.

A lot of the coefficients seem to be based around specific event names which seems a bit too specific to me so I wanted to see what would happen if I derived a feature which indicated whether a session was practical:

events$practical = grepl("Hackathon|Hands on|Hands On", events$event.name)

We can now run the model again with the new column having excluded ‘event.name’ field:

> summary(lm(rsvps ~., data = subset(events, select = -c(event.name))))
 
Call:
lm(formula = rsvps ~ ., data = subset(events, select = -c(event.name)))
 
Residuals:
    Min      1Q  Median      3Q     Max 
-18.647  -2.311   0.000   2.908  23.218 
 
Coefficients: (13 not defined because of singularities)
                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)      -3.980e+03  4.752e+03  -0.838   0.4127  
eventTime         2.907e-06  3.873e-06   0.751   0.4621  
announcedAt       3.336e-08  3.559e-06   0.009   0.9926  
dayTuesday        7.547e+00  6.080e+00   1.241   0.2296  
dayWednesday      2.442e+00  7.046e+00   0.347   0.7327  
monthYear01-2014 -9.562e+01  1.187e+02  -0.806   0.4303  
monthYear02-2013 -4.230e+00  2.289e+01  -0.185   0.8553  
monthYear02-2014 -9.156e+01  1.254e+02  -0.730   0.4742  
monthYear03-2013 -1.633e+01  2.808e+01  -0.582   0.5676  
monthYear03-2014 -8.094e+01  1.329e+02  -0.609   0.5496  
monthYear04-2013 -2.249e+01  3.785e+01  -0.594   0.5595  
monthYear04-2014 -9.230e+01  1.401e+02  -0.659   0.5180  
monthYear05-2013 -3.237e+01  4.654e+01  -0.696   0.4952  
monthYear05-2014 -1.015e+02  1.509e+02  -0.673   0.5092  
monthYear06-2013 -3.947e+01  5.355e+01  -0.737   0.4701  
monthYear06-2014 -1.081e+02  1.604e+02  -0.674   0.5084  
monthYear07-2014 -1.110e+02  1.678e+02  -0.661   0.5163  
monthYear08-2013 -5.144e+01  6.988e+01  -0.736   0.4706  
monthYear08-2014 -1.023e+02  1.784e+02  -0.573   0.5731  
monthYear09-2013 -6.057e+01  7.893e+01  -0.767   0.4523  
monthYear09-2014 -1.260e+02  1.874e+02  -0.672   0.5094  
monthYear10-2012  9.557e+00  2.873e+01   0.333   0.7430  
monthYear10-2013 -6.450e+01  9.169e+01  -0.703   0.4903  
monthYear11-2012  1.689e+01  2.316e+01   0.729   0.4748  
month02                  NA         NA      NA       NA  
month03                  NA         NA      NA       NA  
month04                  NA         NA      NA       NA  
month05                  NA         NA      NA       NA  
month06                  NA         NA      NA       NA  
month07                  NA         NA      NA       NA  
month08                  NA         NA      NA       NA  
month09                  NA         NA      NA       NA  
month10                  NA         NA      NA       NA  
month11                  NA         NA      NA       NA  
year2013                 NA         NA      NA       NA  
year2014                 NA         NA      NA       NA  
timeDiff                 NA         NA      NA       NA  
practicalTRUE    -9.388e+00  5.289e+00  -1.775   0.0919 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 10.21 on 19 degrees of freedom
Multiple R-squared:  0.7546,	Adjusted R-squared:  0.4446 
F-statistic: 2.434 on 24 and 19 DF,  p-value: 0.02592

Now we’re only accounting for 44% of the variance and none of our coefficients are significant so this wasn’t such a good change.

I also noticed that we’ve got a bit of overlap in the date related features – we’ve got one column for monthYear and then separate ones for month and year. Let’s strip out the combined one:

> summary(lm(rsvps ~., data = subset(events, select = -c(event.name, monthYear))))
 
Call:
lm(formula = rsvps ~ ., data = subset(events, select = -c(event.name, 
    monthYear)))
 
Residuals:
     Min       1Q   Median       3Q      Max 
-16.5745  -4.0507  -0.1042   3.6586  24.4715 
 
Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -1.573e+03  4.315e+03  -0.364   0.7185  
eventTime      3.320e-06  3.434e-06   0.967   0.3425  
announcedAt   -2.149e-06  2.201e-06  -0.976   0.3379  
dayTuesday     4.713e+00  5.871e+00   0.803   0.4294  
dayWednesday  -2.253e-01  6.685e+00  -0.034   0.9734  
month02        3.164e+00  1.285e+01   0.246   0.8075  
month03        1.127e+01  1.858e+01   0.607   0.5494  
month04        4.148e+00  2.581e+01   0.161   0.8736  
month05        1.979e+00  3.425e+01   0.058   0.9544  
month06       -1.220e-01  4.271e+01  -0.003   0.9977  
month07        1.671e+00  4.955e+01   0.034   0.9734  
month08        8.849e+00  5.940e+01   0.149   0.8827  
month09       -5.496e+00  6.782e+01  -0.081   0.9360  
month10       -5.066e+00  7.893e+01  -0.064   0.9493  
month11        4.255e+00  8.697e+01   0.049   0.9614  
year2013      -1.799e+01  1.032e+02  -0.174   0.8629  
year2014      -3.281e+01  2.045e+02  -0.160   0.8738  
timeDiff              NA         NA      NA       NA  
practicalTRUE -9.816e+00  5.084e+00  -1.931   0.0645 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 10.19 on 26 degrees of freedom
Multiple R-squared:  0.666,	Adjusted R-squared:  0.4476 
F-statistic: 3.049 on 17 and 26 DF,  p-value: 0.005187

Again none of the coefficients are statistically significant which is disappointing. I think the main problem may be that I have very few data points (only 42) making it difficult to come up with a general model.

I think my next step is to look for some other features that could impact the number of RSVPs e.g. other events on that day, the weather.

I’m a novice at this but trying to learn more so if you have any ideas of what I should do next please let me know.

Categories: Blogs

Putting Down the Tools

Agile Zen - Tue, 09/30/2014 - 21:41

This is a guest post from Marcus Blankenship.

Putting Down the Tools Avoiding the #1 Mistake All New Software Managers Make in a Crisis

Yes, we’ve all done it.

I’ve done it. My managers have done it. And most likely, it’s the default setting for many battle-worn team leads.

In those crisis moments when expectations are high, everything is going off the rails, and the deadline was yesterday, all new software managers will utter one fateful sentence.

"It would be so much faster if I just did this myself."

Then they pick up the IDE, close the office door, and retreat into the code.

What they don’t realize is that they’ve just kicked off a chain reaction of negative work practices that will plague their team for months.

Save Yourself

As you probably understand all too well, the programming world is full of fires to extinguish. No matter what your company’s size or specialty, you are going to hit many, many pressure cooker days.

It’s up to you to anticipate the inevitable before it happens and make a firm, conscious choice not to try to code your way out of a crisis.

Why We Make This Classic Mistake

Making the switch from programmer to team lead can be flat out awkward at times. You’ve gone from playing in the orchestra from leading it, which really shakes up your identity.

The most significant shift comes with the realization that you’re no longer a production unit, but you’re still working in an industry that values production. If you’ve always taken pride in the tidy transaction of being paid for your output, management is a strange new world.

Mixed Messages

First off, while promotions are a much-coveted status symbol, they carry a built-in mixed message. It’s as if your company is saying, "You’re such a great programmer! Now let’s have you do less of that."

Doing less coding often feels like you’re doing less altogether or that you’re not as important to the company, especially when you’re just learning the ropes.

Dependent on Others

Then there’s the new reality of how production happens. As a programmer, your brainpower and clever workarounds could conjure magnificent code from your fingers.

The programs you created could keep manufacturing production humming or bring it to a standstill. Entire businesses hung on your creations.

Now you’re dependent on other people to create the code that works this magic. And making the switch from creating your own technological incantations to supporting other people’s efforts really feels awkward for a while.

It Gets Worse

That clumsy beginner’s feeling gets cranked up when you face the new aspects of team dynamics and the reality of being without your most powerful weapon: your ability to code.

But you really get triggered by array of new expectations staring at you, like:

  • Figuring out your boss’s agenda
  • Assigning work to former coworkers
  • Dealing with morale speed bumps
  • Steering through software quality problems

Stepping into a new management role can feel like stepping into a plane’s cockpit without any flight training. There are all these complicated, unfamiliar dials and gauges you don’t understand, and you don’t know if that blinking light means an engine has failed or the coffee’s ready.

In order to escape this anxiety loop, many new managers fall into one of two patterns:

  • Code less, but indulge their coding desires by putting out fires and debugging whenever possible.
  • Code even more intently, blocking out team members and undermining their credibility as a leader.

Each of these positions is a pattern that will cause you to fail.

Let me say it again. Retreating to the position of "I’ll do it myself" isn’t going to save you any time or frustration. It’s only going to crank up the problems.

## I’ll Do It Myself’: Say Hello to a Cascade of Problems

Once you’ve uttered those fateful words, you are no longer doing the job you were paid to do: leading. And if you’re not doing your job, especially when your team needs your leadership most, then no one is.

If you follow through on your natural desire to just do it yourself, you’ve chosen to avoid doing the following critical items:

Communication

Checking in with your team members and other departments is the heart of your work. But chaining yourself to your computer stops that cold. Suddenly, you’ve cut off any team discussions about quality, progress, or individual development. Everyone loses.

Connection

Jumping in to correct someone else’s work not only short-circuits their learning process, it teaches them to do substandard work. Pretty soon they only turn in partially completed work because they figure you’ll just redo it anyway.

"Why bother?", they think. They lose their drive, you lose their trust, and projects start to suffer.

It’s like a conductor wrestling the tuba away from the tuba player.

Momentum

When you move back into production mode, there’s no one available to keep team moving forward. Essential work like status updates, work review, and new project preparation is completely out the window until you pick your head up.

You’ve just traded away hours of non-replaceable time and energy that should have been devoted to doing the job you’re hired to do. And when you get back to the work that’s been piling up on your desk, you’ve got to dig out all over again.

Sustainability

Let’s be honest. It feels great to be a hero, swooping in to save the day with a brilliant solution. But the cost is just too high.

A "quick fix" always takes much longer, turning your one-hour job into six hours (or more!) of mind-numbing panic. While you code up a fix, your resentment and exhaustion build and you’re teaching your team to depend on your last-minute efforts to bail them out instead of their own skills.

Management Skills

Running to your comfort zone feels safe, but it’s a very limited strategy that keeps you from making the changes necessary for you to be a good manager.

Trust me. Take a chance, drop the IDE security blanket and interact with your team instead.

Best Practices for Crises

While you’re waiting for your team to complete their work, you don’t need to sit passively or beat them with the boss stick. Here are a few tips that will keep you sane and your team on track.

Don’t Do it Yourself

If there’s a problem that you desperately need to rework, *don’t do it yourself. Grab a programmer and say, "Let’s review this code together to fix it."

Sure, you just scratched your itch by getting involved, but you’ve also established much-needed rapport and created up a teaching moment–for both of you.

Communicate

Panicked about a deadline? Get a status update from the programmers, adjust the workload if needed, and contact the client (or your boss) to give them an update.

Once again, you’ve opened lines of communication and placed your focus on the overall project, not just your team’s to-do list or your stress.

OK, Go Ahead And Code…

Can’t keep yourself away from temptation? Fine. Then code. But don’t code anything that will be used in production.

I cannot emphasize this enough. If you absolutely, positively can’t resist, then choose a project that is on the back burner. You don’t want to muck up a project in process.

This Is About the Long Game

Great management practices depend on a solid, thoughtful foundation created outside the heat of crisis. But it’s worth remembering that reputations are earned and character is uncovered during crucial, adrenaline-filled moments.

Remember that your purpose as a manager is to create a team that is capable of producing excellent code.

In those instinct-driven moments, especially in beginner’s panic, please take a step back. Avoid this classic management blunder.

About the Author

Nearly 20 years ago I made the leap from senior-level hacker to full-on tech lead. Practically overnight I went from writing code to being in charge of actual human beings.

Without training or guidance, I suddenly had to deliver entire products on time and under budget, hit huge company goals, and do it all with a smile on my face. I share what I’ve learned here.

Categories: Companies

Test Automation in the Age of Continuous Delivery

TestDriven.com - Tue, 09/30/2014 - 19:57
  I spend a lot of my time with clients figuring out the minutia of how to implement optimal test automation strategies that support their transition to Continuous Delivery. The goal is typically to be able to release software after each 2-week iteration (Sprint). When faced with such a compressed development and testing schedule, most […]
Categories: Communities

Neo4j: Generic/Vague relationship names

Mark Needham - Tue, 09/30/2014 - 18:47

An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.

Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.

However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.

I created 4 different databases which had one node with 60,000 outgoing relationships – 10,000 which we wanted to retrieve and 50,000 that were irrelevant.

I modelled the ‘relationship’ in 4 different ways…

  • Using a specific relationship type
    (node)-[:HAS_ADDRESS]->(address)
  • Using a generic relationship type and then filtering by end node label
    (node)-[:HAS]->(address:Address)
  • Using a generic relationship type and then filtering by relationship property
    (node)-[:HAS {type: "address"}]->(address)
  • Using a generic relationship type and then filtering by end node property
    (node)-[:HAS]->(address {type: “address”})

…and then measured how long it took to retrieve the ‘has address’ relationships.

The code is on github if you want to take a look.

Although it’s obviously not as precise as a JMH micro benchmark I think it’s good enough to get a feel for the difference between the approaches.

I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):

Using a generic relationship type and then filtering by end node label
50%ile: 6.0    75%ile: 6.0    99%ile: 402.60999999999825
 
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0   75%ile: 22.0   99%ile: 504.85999999999785
 
Using a generic relationship type and then filtering by end node label
50%ile: 4.0    75%ile: 4.0    99%ile: 145.65999999999931
 
Using a specific relationship type
50%ile: 0.0    75%ile: 1.0    99%ile: 25.749999999999872

We can drill further into why there’s a difference in the times for each of the approaches by profiling the equivalent cypher query. We’ll start with the one which uses a specific relationship name

Using a specific relationship type

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +SimplePatternMatcher
      |
      +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+-----------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                 Other |
+----------------------+-------+--------+-----------------------------+-----------------------+
|         ColumnFilter |     1 |      0 |                             | keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                       |
| SimplePatternMatcher | 10000 |  10000 | n,   UNNAMED53,   UNNAMED35 |                       |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |          {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+-----------------------+
 
Total database accesses: 10001

Here we can see that there were 10,002 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.

By contrast the other approaches have to load in a lot more data only to then filter it out:

Using a generic relationship type and then filtering by end node label

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+----------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                            Other |
+----------------------+-------+--------+-----------------------------+----------------------------------+
|         ColumnFilter |     1 |      0 |                             |            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                  |
|               Filter | 10000 |  10000 |                             | hasLabel(  UNNAMED45:Address(0)) |
| SimplePatternMatcher | 10000 |  60000 | n,   UNNAMED45,   UNNAMED35 |                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+----------------------------------+
 
Total database accesses: 70001

Using a generic relationship type and then filtering by relationship property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED35,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED63,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
 
Total database accesses: 140001

Using a generic relationship type and then filtering by end node property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED45,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED45,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
 
Total database accesses: 140001

So in summary…specific relationships #ftw!

Categories: Blogs

Going to SAFe Program Consultant Training – London England

Learn more about our Scrum and Agile training sessions on WorldMindware.com

Travis Birch and I are going next week to the SAFe Program Consultant (SPC) training with Dean Leffingwell.  For Berteig Consulting, this will be an opportunity to expand our knowledge and to, perhaps, offer some new services including training and consulting.  Of course, there have been many articles written about SAFe from a Scrum perspective, but I’m hoping to write an article about it from an enterprise Agility perspective.  I have been involved as a coach and consultant in a number of such transformations, and I’m interested to see what I can learn from SAFe and perhaps how it can help to improve our Real Agility Program.  I currently consider SAFe to be a “pragmatic” approach to enterprise Agility vs. a “transformative” approach.  This perspective is based on some light reading and 3rd party reports about SAFe… clearly not a good enough base of knowledge!  I’m looking forward to bridging that gap!

Try out our Virtual Scrum Coach with the Scrum Team Assessment tool - just $500 for a team to get targeted advice and great how-to informationPlease share!
facebooktwittergoogle_plusredditpinterestlinkedinmail
Categories: Blogs

Agile Culture

Scrum Expert - Tue, 09/30/2014 - 15:58
If you asked software developers about Agile, there are chances that a majority will discuss it with words like “Scrum”, “sprints” or “retrospectives”. However Agile is not just a collection of techniques and practices, but it is more a state of mind or a culture. This is the topic of this book written by Pollyanna Pixton, Paul Gibson and Niel Nickolaisen. All the people and values aspects of Agile software development are discussed in this book that provides both conceptual material and real life stories. This is a book that I ...
Categories: Communities

Agile Slovenia, Ljubljana, Slovenia, October 10 2014

Scrum Expert - Tue, 09/30/2014 - 14:39
Agile Slovenia is a one day conference focused on Agile and Scrum topics that take place in Ljubljana. Agile Slovenia is an event where you can turn theory in action, meet agile people, listen to agile pioneers, expand your knowledge. It features both local and international Agile experts. In the agenda you can find topics like “Shifting from manufacturing product to creating a service”, “Complex Projects aren’t planable but controllable”, “There’s no such thing as an agile contract”, “How Agile Coaches help us win – the Agile Coach role at Spotify”, ...
Categories: Communities

Large Scale Scrum (LeSS)

Agile World - Venkatesh Krishnamurthy - Tue, 09/30/2014 - 13:52

Last week, I had the opportunity to speak about Large Scale Scrum (LeSS) at Agile PM meet up group  in Melbourne.  It was really an honor to speak with such an incredibly experienced, knowledgeable audience. At the end of the session, we had very engaging Q&A.

As part of the session, I shared some of the challenges of  scaling Agile and possible solutions as well. One of the solution being, applying the Large Scale Scrum(LeSS). 

Based on my experience of working on several large scale Agile projects, I have come to realize the following 4 types of challenges common across large enterprises.  They are People, Process, Tools/Technology and Org Structure/Culture. 

I have summarized the challenges into this diagram

image

Even though these challenges are common in small Agile projects but gets amplified while scaling Agile.

The popular  Scaling Frameworks are as follows:   Spotify,  XScale, SAFe, DAD (Disciplined Agile Delivery).

image

In addition to the above,  Large Scale Scrum(LeSS) by Craig Larman is popular as well.  I have personally applied this while working with Craig Larman during 2006 at Valtech India. LeSS and LeSS Huge are two variants for large scale projects.  LeSS huge can be depicted as shown in the diagram below:

image

LeSS is based on some of the proven principles around Queuing Theory,  Systems Thinking  and Empirical Process Control  as shown below.

image

If you want to learn more about  applying Large Scale Scrum on your projects, do drop me an email and happy to share the ideas.

Categories: Blogs

15 Useful Pacts for Agile Teams: An Agile Team Creed

Agile Management Blog - VersionOne - Tue, 09/30/2014 - 13:49

The Agile Manifesto values “Individuals and Interactions” over “Process and Tools.”  I suspect it was no accident that this was listed first.  Lack of communication, miscommunication, or the mistaken presumption that communication has occurred, are the root cause of many problems.  Yet, we still focus much discussion on the process and tools.

  • When and how do you conduct a certain Scrum ceremony?
  • Are you practicing pure Scrum?
  • Which agile tools do you use and why?
  • How do you use the tool?
  • What agile metrics are you using and how?
  • What best practices exist?
  • And lots more concerns and questions about how to “do” process…

However, most all change transformations to “be” agile struggle with organizational culture and many organizations and teams continue to fail to reach their full potential because of issues related to “individuals and interactions.”

Teams are the heartbeat of agile development.  It’s the people that produce business success.  Even when the organizational culture embraces agile values, the teams must also address their individuals and healthy interactions among them to maximize value.  This takes time and effort to mature.  I find we mistakenly assume that since people know each other already and even “behave” nicely, they think they can skip over the team-building activities.

Simply gathering individuals together and assigning them the label of “team” does not make a team.  Each team is comprised of unique personalities and thus there is no cookie-cutter best answer for every team.  Each team must find its own way and navigate the uniqueness of each individual to determine the best way to handle interactions that work for their team dynamic.  There should be “pacts” that everyone on the team agrees to.  If there are dysfunctions (egos, prima donnas, passive aggressive behavior, apathy, poor listening skills, a presumed hierarchy, and so on), then the challenge is an order of magnitude that’s much more difficult.  Each team is a unique, dynamic system, subject to changing moods and their environment.

Sadly, some agile teams never evolve beyond a collection of individuals who meet together at the prescribed Scrum ceremonies and then return to work independently with a focus on their individual tasks only.  Maybe the ability to track their work has improved, but they have failed to recognize and harness the true potential that comes from working as a high-performing team.  Perhaps you have seen teams who are full of energy and so you recognize the power of teams?  So, why do some teams never get there?  There are multiple factors that influence agile and Scrum teams.

Let’s assume that the organization’s leadership values the power of teams and has a supportive culture and vision in play.  So what can be done within the team to ensure that it sets a solid foundation for growth and success?

During the team forming stage, it is important for the team members to openly discuss behaviors and expectations.  There is great value in recording those discussions as a reminder to the members when they do encounter problems.  But, they need to dive deeply into what each member truly believes and feels.  Because what you believe and how you feel directly determines how you will behave.  These team discussions must go beyond the process-mechanics agreements such as what time to meet.  They need to communicate something more of an “agile team creed” or list of pacts they make with one another. Team members must be comfortable sharing their fears and concerns.

Having an agile team creed is a great starting point for deep team discussions to root out true beliefs.  It captures cultural expectations and behaviors for the team which I believe lay the foundation for a great agile team.  Here is my list of 15 useful pacts agile teams can make.  I call it the Agile Team Creed.

Agile Team Creed

Has your team had these deep conservations about what they believe?  Would they agree with these items as part of their Agile Team Creed?  If not, why?  Perhaps, it reveals a cultural issue that needs attention.

Categories: Companies

New Connections Functionality Now Available

LeanKit’s new Connections functionality has been released to all Portfolio Edition accounts. Find out how this enables you to track and manage work distributed across multiple teams and boards — without losing sight of the details. For some time now, LeanKit has provided the ability to create drill-through relationships. This functionality enabled you to establish parent-child associations between a […]

The post New Connections Functionality Now Available appeared first on Blog | LeanKit.

Categories: Companies

Swarming Context

Agile Tools - Tue, 09/30/2014 - 07:12

Rail_Bridge_Swarm_of_Starlings._-_geograph.org.uk_-_124591

The application of Swarming as a method can be broken down into four main contexts. For each context the process of swarming is different. Allowing for different contexts makes sense, because we really can’t expect the same process to work equally well in every situation. Even the simplest animals are able to exhibit variations in behavior based on the context, so why shouldn’t our processes? We change our behavior to match the circumstances. That is, unless we are using fixed methods like Scrum or Kanban. If you are using fixed methods, the proscription is to treat the process in a fractal fashion, repeating it everywhere. Practically speaking, by having only one process these methods ignore the context.

So what are the four contexts of Swarming? Here they are in no particular order:

  • Emergencies
  • Shifting Gears
  • Innovation
  • Building

Emergencies represent the simplest context for swarming. When a crisis occurs, it’ all hands on deck. Everyone joins the conversation and brings whatever specific expertise they have to the party. The group self-organizes to enable those present to contribute to solving the problem. You see this a lot in production operations environments when a “P1″ defect occurs or, heaven forbid, the production system goes down. When this happens, everyone swarms on the problem. Some are gathering information, some are listening and integrating the information, and some are taking action to try and remedy the situation. All of this is happening dynamically in the moment without central organization. All of these activities are critical to the success of the swarm. During a crisis, nobody is going to stop what they are doing for a standup meeting, and they sure as hell aren’t interested in seeing your Kanban board.

Shifting gears refers to when the system is in transition. The corporate ecosystems that we are all a part of are changing faster with every passing day. New products are coming to market and disrupting the old ones. It’s not enough to simply work within the existing system. You can’t keep up that way. These days corporations have to match their structure to the complexity of the environment. That’s hard, and that’s where swarming comes in. Like when honey bees form a swarm, the corporation reaches a critical mass where a new structure is necessary. Up until this point, the hive has been a stable and reliable structure, but with the presence of a new queen everything changes. A cascade of events takes place where the hive moves on. This can also happen with companies. When they reach a certain size, they can spin off subsidiaries, divisions, and even teams. We see this when teams reach critical mass and split into two teams (meiosis). On swarming teams, we use simple rules to enable groups to decide on their own when division should take place (Team size of 7 plus or minus 2). We use the swarming values and principles to help guide who works on each team – always leaning toward letting individuals decide based on where their own passions take them.

In swarming, Innovation is treated as foraging. We are foraging for new information and new ideas. In this context we are actively using our social networks to recruit new people and new ideas to our cause. This can be initiated as part of a special state (shifting gears) or it can be part of the ongoing activities of the team. When ants are foraging, they tend to follow the strongest pheromone trails to a food source. However this rule is not universal. There are ants who wander off the pheromone trail from time to time. These solitary explorers are the ones who have the unique opportunity to wander off the beaten path and potentially find rich new sources of food. So too, we want people on our team not to follow the team too closely. It’s best if they can wander off and explore side avenues and blind alleys. This isn’t something that is dictated, it’s a natural part of teams with rich diversity. People make these decisions on their own and either bring them back to the original team or they form a new team.

Building takes place when we are trying to strengthen our networks. As a team is growing it uses it’s social networks to strengthen bonds both within and without the team. This can be as simple as increasing the number of social “touches” on a team. Social touches are things like: greeting each other, going out to lunch together, supporting each other’s work. There are some people who are stronger at this than others. Some people tend to form many lightweight social contacts (which is very useful). On the other hand, there are those who only have a few deep, strong relationships. A good swarming team is composed of a healthy balance of both types of people.

In summary, swarming is used differently based on the context you are in. Understand the context, and you are prepared to take advantage of the power of swarming.

 


Filed under: Agile, Swarming, Teams Tagged: animals, building, context, emergencies, innovation, Kanban, methods, Scrum, shifting gears, Swarming, Teams
Categories: Blogs

PostgreSQL: ERROR: column does not exist

Mark Needham - Tue, 09/30/2014 - 00:40

I’ve been playing around with PostgreSQL recently and in particular the Northwind dataset typically used as an introductory data set for relational databases.

Having imported the data I wanted to take a quick look at the employees table:

postgres=# SELECT * FROM employees LIMIT 1;
 EmployeeID | LastName | FirstName |        Title         | TitleOfCourtesy | BirthDate  |  HireDate  |           Address           |  City   | Region | PostalCode | Country |   HomePhone    | Extension | Photo |                                                                                      Notes                                                                                      | ReportsTo |              PhotoPath               
------------+----------+-----------+----------------------+-----------------+------------+------------+-----------------------------+---------+--------+------------+---------+----------------+-----------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------------------------------
          1 | Davolio  | Nancy     | Sales Representative | Ms.             | 1948-12-08 | 1992-05-01 | 507 - 20th Ave. E.\nApt. 2A | Seattle | WA     | 98122      | USA     | (206) 555-9857 | 5467      | \x    | Education includes a BA IN psychology FROM Colorado State University IN 1970.  She also completed "The Art of the Cold Call."  Nancy IS a member OF Toastmasters International. |         2 | http://accweb/emmployees/davolio.bmp
(1 ROW)

That works fine but what if I only want to return the ‘EmployeeID’ field?

postgres=# SELECT EmployeeID FROM employees LIMIT 1;
ERROR:  COLUMN "employeeid" does NOT exist
LINE 1: SELECT EmployeeID FROM employees LIMIT 1;

I hadn’t realised (or had forgotten) that field names get lower cased so we need to quote the name if it’s been stored in mixed case:

postgres=# SELECT "EmployeeID" FROM employees LIMIT 1;
 EmployeeID 
------------
          1
(1 ROW)

From my reading the suggestion seems to be to have your field names lower cased to avoid this problem but since it’s just a dummy data set I guess I’ll just put up with the quoting overhead for now.

Categories: Blogs

R: Deriving a new data frame column based on containing string

Mark Needham - Mon, 09/29/2014 - 23:37

I’ve been playing around with R data frames a bit more and one thing I wanted to do was derive a new column based on the text contained in the existing column.

I started with something like this:

> x = data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
> x
             name
1  Java Hackathon
2 Intro to Graphs
3 Hands on Cypher

And I wanted to derive a new column based on whether or not the session was a practical one. The grepl function seemed to be the best tool for the job:

> grepl("Hackathon|Hands on|Hands On", x$name)
[1]  TRUE FALSE  TRUE

We can then add a column to our data frame with that output:

x$practical = grepl("Hackathon|Hands on|Hands On", x$name)

And we end up with the following:

> x
             name practical
1  Java Hackathon      TRUE
2 Intro to Graphs     FALSE
3 Hands on Cypher      TRUE

Not too tricky but it took me a bit too long to figure it out so I thought I’d save future Mark some time!

Categories: Blogs

JIRA is Not Agile

Notes from a Tool User - Mark Levison - Mon, 09/29/2014 - 22:00

Agile pyramid 2I’ve heard people say, “We started using Jira and GreenHopper, so we’re Agile now”. Similar things are said of Rally, VersionOne, LeanKit, TargetProcess, etc. In making those declarations, it’s clear that they don’t understand Agile at all.

At its core, Agile is a set of Values and Principles:

…. ·      Individuals and interactions over processes and tools ·      Working software over comprehensive documentation ·      Customer collaboration over contract negotiation ·      Responding to change over following a plan That is, while there is value in the items on the right, we value the items on the left more.

Underlying these is a mindset with a focus on self-discipline, self-organization, and adaption to change.

The Practices of Scrum (Sprint Planning, Daily Scrum, Sprint Review, and Sprint Retrospective), the Roles (ScrumMaster, Product Owner, and Development Team), and the Artifacts (Product Backlog, Sprint Backlog, and Product Increment) only help the team support the principles and achieve the goal of delivering working software.

Electronic tools (Task walls, Development Environments, …) or physical tools (Task walls, …) are only useful in so far as they provide support for those principles and practices.

If your Agile adoption starts with a tool and a scattering of practices, then the whole point has been missed and the core – the essence – of Agile needs to be carefully reviewed until that is obvious.

Categories: Blogs

The Edible Birthday

Portia Tung - Selfish Programming - Mon, 09/29/2014 - 21:33

Birthday Cake

I’m a September baby and this year, I treated my friends at work to the most scrumptious cake I’ve ever seen. What better way to remind us of transformative journeys along the Yellow Brick Road than with the Rainbow Cake by Hummingbird Cafe.

Folks ummed and ahhed at the sight of the multi-coloured layers of this beautiful cake only to be further and pleasantly surprised by the bubble gum flavoured icing as they savoured their slice.

“If I still wrote software, this would be the kind of software I’d create,” I mused, drunken on sugary goodness.

“Don’t you think that would smack of over-engineering?” remarked a fellow cake-connoisseur.

Of course, what I really meant to say is that I marvel at the thought of creating something so thoughtful and thought-provoking as the Rainbow cake. A work of exquisite (and, optionally, edible) beauty that would brighten the world and remind us of what passion, creativity and craftsmanship can produce.

All this serves as a timely reminder that we need to live our dreams to make them come true.

Here’s to enjoying the rainbows on your journey of a lifetime. Happy Birthday one and all!

Categories: Blogs

Combining PRINCE2 with Agile

TV Agile - Mon, 09/29/2014 - 20:01
PRINCE2 and Agile are now both mainstream and many organizations are combining these two approaches but with varying degrees of success. How can you benefit from linking these two approaches which, on the face of it, may not be seen as complementary? This presentation covers: * What are the relative strengths of each approach? * […]
Categories: Blogs

Getting the Best out of Scrum

Scrum Expert - Mon, 09/29/2014 - 19:30
The best known project management framework with an Agile approach is Scrum. For something that is relatively simple to understand there is a lot of hype surrounding it. But why? This video explains: * What Scrum is and more importantly what Scrum is not? * Is Scrum too simplistic or is that its strength? * Can you run a project using Scrum or is it just for product development? * How do you scale Scrum to work in complex situations and environments? * How does Scrum relate to other methods and approaches? * Are Scrum certifications worth ...
Categories: Communities

The Future of Jobs

J.D. Meier's Blog - Mon, 09/29/2014 - 17:46

Will you have a job in the future?

What will that job look like and how will the nature of work change?

Will automation take over your job in the near future?

These are the kinds of questions that Ruth Fisher, author of Winning the Hardware-Software Game, has tackled in a series of posts.

I wrote a summary post to distill her big ideas and insights about the future of jobs in my post:

The Future of Jobs

Fisher has done an outstanding job of framing out the landscape and walking the various arguments and perspectives on how automation will change the nature of work and shape the future of jobs.

One of the first things you might be wondering is, what jobs will automation take away?

Fisher addresses that.

Another question is, what new types jobs will be created?

While that’s an exercise for the reader, Fisher provides clues based on what industry luminaries have seen in terms of how jobs are changing.

The key is to know what automation can and can’t do, and to look at the pattern of work in terms of what’s better suited for humans, and what’s better suited for machines.

As one of my mentors puts it, “If the work can be automated, it’s not human.”

He’s a fan of people doing creative, non-routine work, where they can thrive and shine.

As I take on work, or push back on work, I look through a pretty simple lens:

  1. Is the work repetitive in nature? (in which case, something that should be automated)
  2. Is the work a high-value activity? (if not, why am I doing non high-value activities?)
  3. Does the work create greater capability? (for me, the team, the organization, etc.)
  4. Does the work play to my strengths? (if not, who is a better resource or provider.  You grow faster in your strengths, and in today’s world, if people aren’t giving their best where they have their best to give, it leads to a low-impact team that eventually gets out-executed, or put out to Pasteur.)
  5. Does the work lead to world-class impact?  (When everything gets exposed beyond the firewall, and when it’s a globally connected ecosystem, it’s really important to not only bring your A-game, but to play in a way where you can provide the best service in the world for your specific niche.   If you can’t be the best in your niche in a sustainable way, then you’re in the wrong niche.)

I find that by using this simple lens, I tend to take on high-value work that creates high-impact, that cannot be easily automated.  At the same time, while I perform the work, I look for way to turn things into repetitive activities that can be outsources or automated so that I can keep moving up the stack, and producing higher-value work … that’s more human.

Categories: Blogs

Introducing SAFe for Lean Systems Engineering Big Picture 0.21

Agile Product Owner - Mon, 09/29/2014 - 17:41

Hi,

In an earlier post, we announced the development of a new solution framework: SAFe for Lean Systems Engineering (SAFe LSE for short). Work continues at a pretty good pace, and the team (mostly Dean, Harry, Inbar, Alex, Regina so far) have been pretty busy putting up the new site, developing content, and of course, working on the Big Picture for SAFe LSE.

As we often discuss in SPC class, the BP serves as the domain model for the framework, as it it identifies the people that do the work, their major activities, and the work products that capture decisions and help guide the flow of work.

To that end, attached is “v 0.21″ of the SAFe LSE BP, being the 21st internal revision of this diagram. It’s far from done (SAFe reached over 100 revs before I stopped counting) but we think it tells an adequate story to start with. Of course, it isn’t fully self-explanatory, for that we need content and the website, but we think those building big complex systems can get an idea where we are headed with this version.

SAFe for Lean Systems Engineering Big Picture v0.21

SAFe for Lean Systems Engineering Big Picture v0.21

As for the near-future roadmap, we will go live with a preview version sometime in November. That one will be fully open for comments, so all can participate in adding value and criticism, as the case may be.

We do look forward to your input, and be assured that we have now fully committed to this initiative, so there will be a first GA release of SAFe for Lean systems Engineering sometime in 2015.

– Dean

PS: If you check the earlier post, you’ll see that Manifesto, which will also be elaborated with hyperlinked SAFe LSE principles, has also been updated, based on a lot of feedback we have garnered so far

PPS: We are also building a new 3-day training course:

Leading SAFe LSE: Leading Lean Systems Development with 
SAFe for Lean Systems Engineering.

The first version of this course will be delivered the first week in February, in the Boulder training center. We should have that course available for registration later this week.

Categories: Blogs

Knowledge Sharing


SpiraTeam is a agile application lifecycle management (ALM) system designed specifically for methodologies such as scrum, XP and Kanban.