Skip to content

Feed aggregator

Python/scikit-learn: Detecting which sentences in a transcript contain a speaker

Mark Needham - Sat, 02/21/2015 - 00:42

Over the past couple of months I’ve been playing around with How I met your mother transcripts and the most recent thing I’ve been working on is how to extract the speaker for a particular sentence.

This initially seemed like a really simple problem as most of the initial sentences I looked at weere structured like this:

<speaker>: <sentence>

If there were all in that format then we could write a simple regular expression and then move on but unfortunately they aren’t. We could probably write a more complex regex to pull out the speaker but I thought it’d be fun to see if I could train a model to work it out instead.

The approach I’ve taken is derived from an example in the NLTK book.

The first problem with this approach was that I didn’t have any labelled data to work with so I wrote a little web application that made it easy for me to train chunks of sentences at a time:

2015 02 20 00 44 38

I stored the trained words in a JSON file. Each entry looks like this:

import json
with open("data/import/trained_sentences.json", "r") as json_file:
    json_data = json.load(json_file)
>>> json_data[0]
{u'words': [{u'word': u'You', u'speaker': False}, {u'word': u'ca', u'speaker': False}, {u'word': u"n't", u'speaker': False}, {u'word': u'be', u'speaker': False}, {u'word': u'friends', u'speaker': False}, {u'word': u'with', u'speaker': False}, {u'word': u'Robin', u'speaker': False}, {u'word': u'.', u'speaker': False}]}
>>> json_data[1]
{u'words': [{u'word': u'Robin', u'speaker': True}, {u'word': u':', u'speaker': False}, {u'word': u'Well', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'it', u'speaker': False}, {u'word': u"'s", u'speaker': False}, {u'word': u'a', u'speaker': False}, {u'word': u'bit', u'speaker': False}, {u'word': u'early', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'but', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'of', u'speaker': False}, {u'word': u'course', u'speaker': False}, {u'word': u',', u'speaker': False}, {u'word': u'I', u'speaker': False}, {u'word': u'might', u'speaker': False}, {u'word': u'consider', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'I', u'speaker': False}, {u'word': u'moved', u'speaker': False}, {u'word': u'here', u'speaker': False}, {u'word': u',', u'speaker': False}, {u'word': u'let', u'speaker': False}, {u'word': u'me', u'speaker': False}, {u'word': u'think', u'speaker': False}, {u'word': u'.', u'speaker': False}]}

Each word in the sentence is represented by a JSON object which also indicates if that word was a speaker in the sentence.

Feature selection

Now that I’ve got some trained data to work with I needed to choose which features I’d use to train my model.

One of the most obvious indicators that a word is the speaker in the sentence is that the next word is ‘:’ so ‘next word’ can be a feature. I also went with ‘previous word’ and the word itself for my first cut.

This is the function I wrote to convert a word in a sentence into a set of features:

def pos_features(sentence, i):
    features = {}
    features["word"] = sentence[i]
    if i == 0:
        features["prev-word"] = "<START>"
        features["prev-word"] = sentence[i-1]
    if i == len(sentence) - 1:
        features["next-word"] = "<END>"
        features["next-word"] = sentence[i+1]
    return features

Let’s try a couple of examples:

import nltk
>>> pos_features(nltk.word_tokenize("Robin: Hi Ted, how are you?"), 0)
{'prev-word': '<START>', 'word': 'Robin', 'next-word': ':'}
>>> pos_features(nltk.word_tokenize("Robin: Hi Ted, how are you?"), 5)
{'prev-word': ',', 'word': 'how', 'next-word': 'are'}

Now let’s run that function over our full set of labelled data:

with open("data/import/trained_sentences.json", "r") as json_file:
    json_data = json.load(json_file)
tagged_sents = []
for sentence in json_data:
    tagged_sents.append([(word["word"], word["speaker"]) for word in sentence["words"]])
featuresets = []
for tagged_sent in tagged_sents:
    untagged_sent = nltk.tag.untag(tagged_sent)
    for i, (word, tag) in enumerate(tagged_sent):
        featuresets.append( (pos_features(untagged_sent, i), tag) )

Here’s a sample of the contents of featuresets:

>>> featuresets[:5]
[({'prev-word': '<START>', 'word': u'You', 'next-word': u'ca'}, False), ({'prev-word': u'You', 'word': u'ca', 'next-word': u"n't"}, False), ({'prev-word': u'ca', 'word': u"n't", 'next-word': u'be'}, False), ({'prev-word': u"n't", 'word': u'be', 'next-word': u'friends'}, False), ({'prev-word': u'be', 'word': u'friends', 'next-word': u'with'}, False)]

It’s nearly time to train our model, but first we need to split out labelled data into training and test sets so we can see how well our model performs on data it hasn’t seen before. sci-kit learn has a function that does this for us:

from sklearn.cross_validation import train_test_split
train_data,test_data = train_test_split(featuresets, test_size=0.20, train_size=0.80)
>>> len(train_data)
>>> len(test_data)

Now let’s train our model. I decided to try out Naive Bayes and Decision tree models to see how they got on:

>>> classifier = nltk.NaiveBayesClassifier.train(train_data)
>>> print nltk.classify.accuracy(classifier, test_data)
>>> classifier = nltk.DecisionTreeClassifier.train(train_data)
>>> print nltk.classify.accuracy(classifier, test_data)

It looks like both are doing a good job here with the decision tree doing slightly better. One thing to keep in mind is that most of the sentences we’ve trained at in the form ‘:‘ and we can get those correct with a simple regex so we should expect the accuracy to be very high.

If we explore the internals of the decision tree we’ll see that it’s massively overfitting which makes sense given our small training data set and the repetitiveness of the data:

>>> print(classifier.pseudocode(depth=2))
if next-word == u'!': return False
if next-word == u'$': return False
if next-word == u"'s": return False
if next-word == u"'ve": return False
if next-word == u'(':
  if word == u'!': return False
if next-word == u'*': return False
if next-word == u'*****': return False
if next-word == u',':
  if word == u"''": return False
if next-word == u'--': return False
if next-word == u'.': return False
if next-word == u'...':
  if word == u'who': return False
  if word == u'you': return False
if next-word == u'/i': return False
if next-word == u'1': return True
if next-word == u':':
  if prev-word == u"'s": return True
  if prev-word == u',': return False
  if prev-word == u'...': return False
  if prev-word == u'2030': return True
  if prev-word == '<START>': return True
  if prev-word == u'?': return False
if next-word == u'\u266a\u266a': return False

One update I may make to the features is to include the part of speech of the word rather than its actual value to see if that makes the model a bit more general. Another option is to train a bunch of decision trees against a subset of the data and build an ensemble/random forest of those trees.

Once I’ve got a working ‘speaker detector’ I want to then go and work out who the likely speaker is for the sentences which don’t contain a speaker. The plan is to calculate the word distributions of the speakers from sentences I do have and then calculate the probability that they spoke the unlabelled sentences.

This might not work perfectly as there could be new characters in those episodes but hopefully we can come up with something decent.

The full code for this example is on github if you want to have a play with it.

Any suggestions for improvements are always welcome in the comments.

Categories: Blogs

When Did Agile Become a Social Movement?

Leading Agile - Mike Cottmeyer - Fri, 02/20/2015 - 17:50

Occasionally on my personal Facebook page, I like to post controversial topics and see if we can get a civil conversation going around difficult issues. Civility can sometimes be difficult, but it is really interesting to hear different points of view and learn more about why people believe what they believe. In that spirit, I want to ask you guys a question…

Has agile become a social movement? If so, when did it happen? If so, why do you think it happened?

It seems to me back in the day, agile was about getting product into market faster… it was about working with customers to make sure we were building the stuff they really wanted… it was about craftsmanship and quality and excellence. There is a part of me that feels like some of us have taken things like self-organization, empowerment, and collaboration to an illogical extreme. Potentially to the detriment of some of our other goals.

I’m curious if this is just me or if anyone else feels this way. Please share your thoughts.

The post When Did Agile Become a Social Movement? appeared first on LeadingAgile.

Categories: Blogs

Three Things You MUST Know to Transform Any Sized Organization into an Agile Enterprise #Agile2015

Leading Agile - Mike Cottmeyer - Fri, 02/20/2015 - 17:12

Here is the talk I just submitted for the Agile2015 conference coming up this year in Washington DC. Noodling on submitting another one before the deadline in a few days, but I’m kinda one track on this topic right now, so it’s pretty much all I can really muster the energy to speak about ;-)

If you’ve got a minute head over to the submission site and give me your feedback.


The deeper we go down the path of scaled agile transformation, the more we are learning that adding additional process and additional complexity can only ever get us part of the way there. At some point size and complexity is going to limit our ability to be truly agile. To be truly agile, we have to reduce size and complexity and move toward greater organizational simplicity.

The challenge is that large organizations ARE often complex and usually anything but simple. Most agile transformations get started by either ignoring the complexity inherent in the system or by wrapping that complexity in planning constructs that can help in the short run, but are ultimately doomed to limit your business agility over time. There has to be another way.

To really achieve agility at scale, we have to stop chasing more advanced ways to manage complexity and seek out more effective patterns for moving toward greater simplicity. In short, it’s not the end state of an agile transformation that we must stay focused on, it’s the systematic process of reducing complexity that is critical to achieving your ultimate business goals.

To transform any sized organization into an agile enterprise, there are only three things you need to know to be successful:

  1. You must have complete cross functional teams
  2. You must have clear backlogs
  3. You must have the ability to produce a working, tested increment of product on regular intervals.

Every other benefit of agile is impossible without creating these three conditions for success; everything that gets in the way of creating these three conditions is an impediment to your transformation which has to be removed; and your transformation roadmap should be solely focused on how your going to make all this possible in your organization. Until you make this happen… nothing much else is going to matter.

This talk will explore patterns for creating cross-functional teams at scale, what that looks like, what get’s in the way, and how to get there. We’ll discover why clear backlogs are so hard to create and what you’ll need to do about it. Warning, this will not be easy! Finally, we’ll discuss why working tested software created on regular intervals is the secret sauce to actually getting the business benefits your organization is looking for.

Learning Outcomes:

  • Participants will learn the critical importance of forming complete cross-functional teams, why many common patterns for forming teams fail, patterns for forming teams successfully, and strategies for progressively reducing dependencies between teams over time.
  • Participants will learn common failure patterns we see around creating backlogs, how the guidance contained in Scrum and SAFe can actually prevent some organizations from creating effective backlogs, and patterns for creating backlogs that really get teams moving quick.
  • Participants will learn what it really means to create a working, tested, increment at the end of every sprint, release, or PI. They will learn how making this kind of progress support solid delivery metrics, increases predictability, and earns the trust of the business over time

Submission Link:

The post Three Things You MUST Know to Transform Any Sized Organization into an Agile Enterprise #Agile2015 appeared first on LeadingAgile.

Categories: Blogs

Exploring container platforms: StackEngine

Xebia Blog - Fri, 02/20/2015 - 16:51

Docker has been around for more than a year already, and there are a lot of container platforms popping up. In this series of blogposts I will explore these platforms and share some insights. This blogpost is about StackEngine.

TL;DR: StackEngine is (for now) just a nice frontend to the Docker binary. Nothing...

Categories: Companies

Agile on the Beach 2015: Call for Speakers

Scrum Expert - Fri, 02/20/2015 - 10:41
Agile on the Beach is a two-day conference that will take place September 3 and 4 2015 in Falmouth, UK. Agile on the Beach 2015 will again focus on three established themes (Business, Teams and Software Craftsmanship) plus the Product Management and Design theme. Agile on the Beach welcomes submissions on any subject that falls within the themes of the conference. In particular we are keen to encourage working practitioners to submit sessions and case studies. The conference encourages submissions in the business track especially to look outside and beyond Agile ...
Categories: Communities

Diamond Kata - TDD with only Property-Based Tests

Mistaeks I Hav Made - Nat Pryce - Fri, 02/20/2015 - 01:04
The Diamond Kata is a simple exercise that Seb Rose described in a recent blog post. Seb describes the Diamond Kata as: Given a letter, print a diamond starting with ‘A’ with the supplied letter at the widest point. For example: print-diamond ‘C’ prints A B B C C B B A Seb used the exercise to illustrate how he “recycles” tests to help him work incrementally towards a full solution. Seb’s approach prompted Alastair Cockburn to write an article in response in which he argued for more thinking before programming. Alastair’s article shows how he approached the Diamond Kata with more up-front analysis. Ron Jeffries and George Dinwiddie resonded to Alastair’s article, showing how they approached the Diamond Kata relying on emergent design to produce an elegant solution (“thinking all the time”, as Ron Jeffries put it). There was some discussion on Twitter, and several other people published their approaches. (I’ll list as many as I know about at the end of this article). The discussion sparked my interest, so I decided to have a go at the exercise myself. The problem seemed to me, at first glance, to be a good fit for property testing. So I decided to test-drive a solution using only property-based tests and see what happens. I wrote the solution in Scala and used ScalaTest to run and organise the tests and ScalaCheck for property testing. What follows is an unexpurgated, warts-and-all walkthrough of my progress, not just the eventual complete solution. I made wrong turns and stupid mistakes along the way. The walkthrough is pretty long, so if you want you don’t want to follow through step by step, jump straight to the complete solution and/or my conclusions on how the exercise went and what I learned. Alternatively, if you want to follow the walkthrough in more detail, the entire history is on GitHub, with a commit per TDD step (add a failing test, commit, make the implementation pass the test, commit, refactor, commit, … and repeat). Walkthrough Getting Started: Testing the Test Runner The first thing I like to do when starting a new project is make sure my development environment and test runner are set up right, that I can run tests, and that test failures are detected and reported. I use Gradle to bootstrap a new Scala project with dependencies on the latest versions of ScalaTest and ScalaCheck and import the Gradle project into IntelliJ IDEA. ScalaTest supports several different styles of test and assertion syntax. The user guide recommends writing an abstract base class that combines traits and annotations for your preferred testing style and test runner, so that’s what I do first: @RunWith(classOf[JUnitRunner]) abstract class UnitSpec extends FreeSpec with PropertyChecks { } My test class extends UnitSpec: class DiamondSpec extends UnitSpec { } I add a test that explicitly fails, to check that the test framework, IDE and build hang together correctly. When I see the test failure, I’m ready to write the first real test. The First Test Given that I’m writing property tests, I have to start with a simple property of the diamond function, not a simple example. The simplest property I can think of is: For all valid input character, the diamond contains one or more lines of text. To turn that into a property test, I must define “all valid input characters” as a generator. The description of the Diamond Kata defines valid input as a single upper case character. ScalaCheck has a predefined generator for that: val inputChar = Gen.alphaUpperChar At this point, I haven’t decided how I will represent the diamond. I do know that my test will assert on the number of lines of text, so I write the property with respect to an auxiliary function, diamondLines(c:Char):Vector[String], which will generate a diamond for input character c and return the lines of the diamond in a vector. "produces some lines" in { forAll (inputChar) { c => assert(diamondLines(c).nonEmpty) } } I like the way that the test reads in ScalaTest/ScalaCheck. It is pretty much a direct translation of my English description of the property into code. To make the test fail, I write diamondLines as: def diamondLines(c : Char) : Vector[String] = { Vector() } The entire test class is: import org.scalacheck._ class DiamondSpec extends UnitSpec { val inputChar = Gen.alphaUpperChar "produces some lines" in { forAll (inputChar) { c => assert(diamondLines(c).nonEmpty) } } def diamondLines(c : Char) : Vector[String] = { Vector() } } The simplest implementation that will make that property pass is to return a single string: object Diamond { def diamond(c: Char) : String = { "A" } } I make the diamondLines function in the test call the new function and split its result into lines: def diamondLines(c : Char) = { Diamond.diamond(c).lines.toVector } The implementation can be used like this: object DiamondApp extends App { import Diamond.diamond println(diamond(args.lift(0).getOrElse("Z").charAt(0))) } A Second Test, But It Is Not Very Helpful I now need to add another property, to more tightly constrain the solution. I notice that the diamond always has an odd number of lines, and decide to test that: For all valid input character, the diamond has an odd number of lines. This implies that the number of lines is greater than zero (because vectors cannot have a negative number of elements and zero is even), so I change the existing test rather than adding another one: "produces an odd number lines" in { forAll (inputChar) { c => assert(isOdd(diamondLines(c).length)) } } def isOdd(n : Int) = n % 2 == 1 But this new test has a problem: my existing solution already passes it. The diamond function returns a single line, and 1 is an odd number. This choice of property is not helping drive the development forwards. A Failing Test To Drive Development, But a Silly Mistake The next simplest property I can think of is the number of lines of the diamond. If ‘ord(c)’ is the number of letters between ‘A’ and c, (zero for A, 1 for B, 2 for C, etc.) then: For all valid input characters, c, the number of lines in a diamond for c is 2*ord(c)+1. At this point I make a silly mistake. I write my property as: "number of lines" in { forAll (inputChar) { c => assert(diamondLines(c).length == ord(c)+1) } } def ord(c: Char) : Int = c - 'A' I don’t notice the mistake immediately. When I do, I decide to leave it in the code as an experiment to see if the property tests will detect the error by becoming inconsistent, and how long it will take before they do so. This kind of mistake would easily be caught by an example test. It’s a good idea to have a few examples, as well as properties, to act as smoke tests. I make the test pass with the smallest amount of production code possible. I move the ord function from the test into the production code and use it to return the required number of lines that are all the same. def diamond(c: Char) : String = { "A\n" * (ord(c)+1) } def ord(c: Char) : Int = c - 'A' Despite sharing the ord function between the test and production code, there’s still some duplication. Both the production and test code calculate ord(c)+1. I want to address that before writing the next test. Refactor: Duplicated Calculation I replace ord(c)+1 with lineCount(c), which calculates number of lines generated for an input letter, and inline the ord(c) function, because it’s now only used in one place. object Diamond { def diamond(c: Char) : String = { "A\n" * lineCount(c) } def lineCount(c: Char) : Int = (c - 'A')+1 } And I use lineCount in the test as well: "number of lines" in { forAll (inputChar) { c => assert(diamondLines(c).length == lineCount(c)) } } On reflection, using the lineCount calculation from production code in the test feels like a mistake. Squareness The next property I add is: For all valid input character, the text containing the diamond is square Where “is square” means: The length of each line is equal to the total number of lines In Scala, this is: "squareness" in { forAll (inputChar) { c => assert(diamondLines(c) forall {_.length == lineCount(c)}) } } I can make the test pass like this: object Diamond { def diamond(c: Char) : String = { val side: Int = lineCount(c) ("A" * side + "\n") * side } def lineCount(c: Char) : Int = (c - 'A')+1 } Refactor: Rename the lineCount Function The lineCount is also being used to calculate the length of each line, so I rename it to squareSide. object Diamond { def diamond(c: Char) : String = { val side: Int = squareSide(c) ("A" * side + "\n") * side } def squareSide(c: Char) : Int = (c - 'A')+1 } Refactor: Clarify the Tests I’m now a little dissatisfied with the way the tests read: "number of lines" in { forAll (inputChar) { c => assert(diamondLines(c).length == squareSide(c)) } } "squareness" in { forAll (inputChar) { c => assert(diamondLines(c) forall {_.length == squareSide(c)}) } } The “squareness” property does not stand alone. It doesn’t communicate that the output is square unless combined with “number of lines” property. I refactor the test to disentangle the two properties: "squareness" in { forAll (inputChar) { c => val lines = diamondLines(c) assert(lines forall {line => line.length == lines.length}) } } "size of square" in { forAll (inputChar) { c => assert(diamondLines(c).length == squareSide(c)) } } The Letter on Each Line The next property I write specifies which characters are printed on each line. The characters of each line should be either a letter that depends on the index of the line, or a space. Because the diamond is vertically symmetrical, I only need to consider the lines from the top to the middle of the diamond. This makes the calculation of the letter for each line much simpler. I make a note to add a property for the vertical symmetry once I have made the implementation pass this test. "single letter per line" in { forAll (inputChar) { c => val allLines = diamondLines(c) val topHalf = allLines.slice(0, allLines.size/2 + 1) for ((line, index) <- topHalf.zipWithIndex) { val lettersInLine = line.toCharArray.toSet diff Set(' ') val expectedOnlyLetter = ('A' + index).toChar assert(lettersInLine == Set(expectedOnlyLetter), "line " + index + ": \"" + line + "\"") } } } To make this test pass, I change the diamond function to: def diamond(c: Char) : String = { val side: Int = squareSide(c) (for (lc <- 'A' to c) yield lc.toString * side) mkString "\n" } This repeats the correct letter for the top half of the diamond, but the bottom half of the diamond is wrong. This will be fixed by the property for vertical symmetry, which I’ve noted down to write next. Vertical Symmetry The property for vertical symmetry is: For all input character, c, the lines from the top to the middle of the diamond, inclusive, are equal to the reversed lines from the middle to the bottom of the diamond, inclusive. "is vertically symmetrical" in { forAll(inputChar) { c => val allLines = diamondLines(c) val topHalf = allLines.slice(0, allLines.size / 2 + 1) val bottomHalf = allLines.slice(allLines.size / 2, allLines.size) assert(topHalf == bottomHalf.reverse) } } The implementation is: def diamond(c: Char) : String = { val side: Int = squareSide(c) val topHalf = for (lc <- 'A' to c) yield lineFor(side, lc) val bottomHalf = topHalf.slice(0, topHalf.length-1).reverse (topHalf ++ bottomHalf).mkString("\n") } But this fails the “squareness” and “size of square” tests! My properties are now inconsistent. The test suite has detected the erroneous implementation of the squareSide function. The correct implementation of squareSide is: def squareSide(c: Char) : Int = 2*(c - 'A') + 1 With this change, the implementation passes all of the tests. The Position Of The Letter In Each Line Now I add a property that specifies the position and value of the letter in each line, and that all other characters in a line are spaces. Like the previous test, I can rely on symmetry in the output to simplify the arithmetic. This time, because the diamond has horizontal symmetry, I only need specify the position of the letter in the first half of the line. I add a specification for horizontal symmetry, and factor out generic functions to return the first and second half of strings and sequences. "is vertically symmetrical" in { forAll (inputChar) { c => val lines = diamondLines(c) assert(firstHalfOf(lines) == secondHalfOf(lines).reverse) } } "is horizontally symmetrical" in { forAll (inputChar) { c => for ((line, index) <- diamondLines(c).zipWithIndex) { assert(firstHalfOf(line) == secondHalfOf(line).reverse, "line " + index + " should be symmetrical") } } } "position of letter in line of spaces" in { forAll (inputChar) { c => for ((line, lineIndex) <- firstHalfOf(diamondLines(c)).zipWithIndex) { val firstHalf = firstHalfOf(line) val expectedLetter = ('A'+lineIndex).toChar val letterIndex = firstHalf.length - (lineIndex + 1) assert (firstHalf(letterIndex) == expectedLetter, firstHalf) assert (firstHalf.count(_==' ') == firstHalf.length-1, "number of spaces in line " + lineIndex + ": " + line) } } } def firstHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = { v.slice(0, (v.length+1)/2) } def secondHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = { v.slice(v.length/2, v.length) } The implementation is: object Diamond { def diamond(c: Char) : String = { val side: Int = squareSide(c) val topHalf = for (letter <- 'A' to c) yield lineFor(side, letter) (topHalf ++ topHalf.reverse.tail).mkString("\n") } def lineFor(length: Int, letter: Char): String = { val halfLength = length/2 val letterIndex = halfLength - ord(letter) val halfLine = " "*letterIndex + letter + " "*(halfLength-letterIndex) halfLine ++ halfLine.reverse.tail } def squareSide(c: Char) : Int = 2*ord(c) + 1 def ord(c: Char): Int = c - 'A' } It turns out the ord function, which I inlined into squareSide a while ago, is needed after all. The implementation is now complete. Running the DiamondApp application prints out diamonds. But there’s plenty of scope for refactoring both the production and test code. Refactoring: Delete the “Single Letter Per Line” Property The “position of letter in line of spaces” property makes the “single letter per line” property superflous, so I delete “single letter per line”. Refactoring: Simplify the Diamond Implementation I rename some parameters and simplify the implementation of the diamond function. object Diamond { def diamond(maxLetter: Char) : String = { val topHalf = for (letter <- 'A' to maxLetter) yield lineFor(maxLetter, letter) (topHalf ++ topHalf.reverse.tail).mkString("\n") } def lineFor(maxLetter: Char, letter: Char): String = { val halfLength = ord(maxLetter) val letterIndex = halfLength - ord(letter) val halfLine = " "*letterIndex + letter + " "*(halfLength-letterIndex) halfLine ++ halfLine.reverse.tail } def squareSide(c: Char) : Int = 2*ord(c) + 1 def ord(c: Char): Int = c - 'A' } The implementation no longer uses the squareSide function. It’s only used by the “size of square” property. Refactoring: Inline the squareSide function I inline the squareSide function into the test. "size of square" in { forAll (inputChar) { c => assert(diamondLines(c).length == 2*ord(c) + 1) } } I believe the erroneous calculation would have been easier to notice if I had done this from the start. Refactoring: Common Implementation of Symmetry There’s one last bit of duplication in the implementation. The expressions that create the horizontal and vertical symmetry of the diamond can be replaced with calls to a generic function. I’ll leave that as an exercise for the reader… Complete Tests and Implementation Tests: import Diamond.ord import org.scalacheck._ import scala.collection.generic.CanBuildFrom class DiamondSpec extends UnitSpec { val inputChar = Gen.alphaUpperChar "squareness" in { forAll (inputChar) { c => val lines = diamondLines(c) assert(lines forall {line => line.length == lines.length}) } } "size of square" in { forAll (inputChar) { c => assert(diamondLines(c).length == 2*ord(c) + 1) } } "is vertically symmetrical" in { forAll (inputChar) { c => val lines = diamondLines(c) assert(firstHalfOf(lines) == secondHalfOf(lines).reverse) } } "is horizontally symmetrical" in { forAll (inputChar) { c => for ((line, index) <- diamondLines(c).zipWithIndex) { assert(firstHalfOf(line) == secondHalfOf(line).reverse, "line " + index + " should be symmetrical") } } } "position of letter in line of spaces" in { forAll (inputChar) { c => for ((line, lineIndex) <- firstHalfOf(diamondLines(c)).zipWithIndex) { val firstHalf = firstHalfOf(line) val expectedLetter = ('A'+lineIndex).toChar val letterIndex = firstHalf.length - (lineIndex + 1) assert (firstHalf(letterIndex) == expectedLetter, firstHalf) assert (firstHalf.count(_==' ') == firstHalf.length-1, "number of spaces in line " + lineIndex + ": " + line) } } } def firstHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = { v.slice(0, (v.length+1)/2) } def secondHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = { v.slice(v.length/2, v.length) } def diamondLines(c : Char) = { Diamond.diamond(c).lines.toVector } } Implementation: object Diamond { def diamond(maxLetter: Char) : String = { val topHalf = for (letter <- 'A' to maxLetter) yield lineFor(maxLetter, letter) (topHalf ++ topHalf.reverse.tail).mkString("\n") } def lineFor(maxLetter: Char, letter: Char): String = { val halfLength = ord(maxLetter) val letterIndex = halfLength - ord(letter) val halfLine = " "*letterIndex + letter + " "*(halfLength-letterIndex) halfLine ++ halfLine.reverse.tail } def ord(c: Char): Int = c - 'A' } Conclusions In his article, “Thinking Before Programming”, Alastair Cockburn writes: The advantage of the Dijkstra-Gries approach is the simplicity of the solutions produced. The advantage of TDD is modern fine-grained incremental development. … Can we combine the two? I think property-based tests in the TDD process combined the two quite successfully in this exercise. I could record my half-formed thoughts about the problem and solution as generators and properties while using “modern fine-grained incremental development” to tighten up the properties and grow the code that met them. In Seb’s original article, he writes that when working from examples… it’s easy enough to get [the tests for ‘A’ and ‘B’] to pass by hardcoding the result. Then we move on to the letter ‘C’. The code is now screaming for us to refactor it, but to keep all the tests passing most people try to solve the entire problem at once. That’s hard, because we’ll need to cope with multiple lines, varying indentation, and repeated characters with a varying number of spaces between them. I didn’t encounter this problem when driving the implementation with properties. Adding a new property always required an incremental improvement to the implementation to get the tests passing again. Neither did I need to write throw-away tests for behaviour that was not actually desired of the final implementation, as Seb did with his “test recycling” approach. Every property I added applied to the complete solution. I only deleted properties that were implied by properties I added later, and so had become unnecessary duplication. I took the approach of starting from very generic properties and incrementally adding more specific properties as I refine the implementation. Generic properties were easy to come up with, and helped me make progress in the problem. The suite of properties reinforced one another, testing the tests, and detected the mistake I made in one property that caused it to be inconsistent with the rest. I didn’t know Scala, ScalaTest or ScalaCheck well. Now I’ve learned them better I wish I had written a minimisation strategy for the input character. This would have made test failure messages easier to understand. I also didn’t address what the diamond function would do with input outside the range of ‘A’ to ‘Z’. Scala doesn’t let one define a subtype of Char, so I can’t enforce the input constraint in the type system. I guess the Scala way would be to define diamond as a PartialFunction[Char,String]. Further Thoughts Thoughts on duplication and tests as documentation Thoughts on property-based tests and iterative/incremental development Other Solutions Mark Seeman has approached Diamond Kata with property-based tests, using F# and FsCheck. Solutions to the Diamond Kata using exmaple-based tests include: Seb Rose: Recycling Tests in TDD Alastair Cockburn: Thinking Before Programming Seb Rose: Diamond recycling (and painting yourself into a corner) Ron Jeffries: a detailed walkthrough of his solution George Dinwiddie: Another Approach to the Diamond Kata Ivan Sanchez: A walkthrough of his Clojure solution. Jon Jagger: print “squashed-circle” diamond Sandro Mancuso: A Java solution on GitHub Krzysztof Jelski: A Python solution on GitHub Philip Schwarz: A Clojure solution on GitHub
Categories: Blogs

The Deliberate Practice of Being Agile

Illustrated Agile - Len Lagestee - Fri, 02/20/2015 - 00:55

In Malcolm Gladwell’s book, Outliers, he discusses the need to deliberately practice for 10,000 hours before becoming an expert in a chosen endeavor. Ten. Thousand. Hours. If all you did for the next year was practice something non-stop, without sleep, you would be about 90% of the way there. There is however, a raging debate […]

The post The Deliberate Practice of Being Agile appeared first on Illustrated Agile.

Categories: Blogs

AgileCraft Gets $10 Million in Funding

Scrum Expert - Thu, 02/19/2015 - 19:28
AgileCraft, a vendor of software for scaling agile to the enterprise, has announced that it raised $10 million in Series A financing led by private equity firm Crane Nelson. The round included participation from former Bank of America senior executives Jim Kelly and Tim Arnoult. The investment will accelerate the company’s growth, geographic expansion and product innovation. Built from the top-down to support agile at scale, the AgileCraft platform helps align entire organizations around common goals and objectives while providing actionable views and metrics to everyone touching the technology product lifecycle. AgileCraft ...
Categories: Communities

Reliable database tests with Respawn

Jimmy Bogard - Thu, 02/19/2015 - 19:21

Creating reliable tests that exercise the database can be a tricky beast to tame. There are many different sub-par strategies for doing so, and most of the documented methods talk about resetting the database at teardown, either using rolled back transactions or table truncation.

I’m not a fan of either of these methods – for truly reliable tests, the fixture must have a known starting point at the start of the test, not be relying on something to clean up after itself. When a test fails, I want to be able to examine the data during or after the test run.

That’s why I created Respawn, a small tool to reset the database back to its clean beginning. Instead of using transaction rollbacks, database restores or table truncations, Respawn intelligently navigates the schema metadata to build out a static, correct order in which to clear out data from your test database, at fixture setup instead of teardown.

Respawn is available on NuGet, and can work with SQL Server or Postgres (or any ANSI-compatible database that supports INFORMATION_SCHEMA views correctly).

You create a checkpoint:

private static Checkpoint checkpoint = new Checkpoint
    TablesToIgnore = new[]
    SchemasToExclude = new []

You can supply tables to ignore and schemas to exclude for tables you don’t want cleared out. In your test fixture setup, reset your checkpoint:


Or if you’re using a database besides SQL Server, you can pass in an open DbConnection:

using (var conn = new NpgsqlConnection("ConnectionStringName"))

    var checkpoint = new Checkpoint {
        SchemasToInclude = new[]
        DbAdapter = DbAdapter.Postgres


Because Respawn stores the correct SQL in the right order to clear your tables, you don’t need to maintain a list of tables to delete or recalculate on every checkpoint reset. And since table truncation won’t work with tables that include foreign key constraints, DELETE will be faster than table truncation for test databases.

We’ve used this method at Headspring for the last six years or so, battle tested on a dozen projects we’ve put into production.

Stop worrying about unreliable database tests – respawn at the starting point instead!

Post Footer automatically generated by Add Post Footer Plugin for wordpress.

Categories: Blogs

Big Apple Scrum Day 2015 Call for Proposals

Scrum Expert - Thu, 02/19/2015 - 16:37
The Big Apple Scrum Day 2015 is a one-day conference brought to you by NYC Scrum User group that will take place in June. As a community conference, the Big Apple Scrum Day wants to encourage proposals from local Scrum and Agile practitioners on the topics of: * Scrum for Newbies * Living and Breathing Scrum * Beyond Scrum * Scrum/Agile in non-IT areasThe deadline to submit a proposal is February 28, 2015. Visit!speakers/ctpg for more information
Categories: Communities

Speak at RallyON 2015

Rally Agile Blog - Thu, 02/19/2015 - 15:00

Continuous improvement may be a fundamental tenet of Agile and Lean disciplines, but making real change in organizations is hard. If you’ve ever read a book, seen a talk, or had a conversation that changed your thinking or behavior, then you know that meaningful change often starts with someone else’s experience and advice.

That’s why we’re inviting you to share yours as a RallyON 2015 speaker.

RallyON (happening June 15–17 in Phoenix, Arizona) is our annual industry conference where innovative organizations come to learn, lead, and grow. RallyON shares the best thinking, strategies, and practices to help leaders keep pace with the new speed of business and build organizations that are fast, Lean, and adaptable.

RallyON 2015 Content Submission Guidelines

Be part of the RallyON 2015 agenda by nominating a speaker, proposing a session you’ll present, or suggesting a topic you’d like to see. Here’s what we’re looking for this year:

  • Use cases explaining how you’re building agility across your entire organization
  • Best practices, frameworks, and cutting-edge trends in Agile or Lean methods
  • Success stories around the business impact of agility
  • Interactive, hands-on learning experiences that drive mindset or behavioral change

Your audience will be broad, diverse, and leading-edge. This year we expect the conference to attract more than 500 attendees—from developers to product managers to executives
—representing companies in healthcare, IT, manufacturing, banking, media, and government.

How are you building a faster, smarter, better organization? Share your ideas with us at RallyON 2015. Submit now to our Call for Content.

Rally Software
Categories: Companies

ACE! Conference on Lean and Agile Software Development, Krakow, Poland, March 16–17 2015

Scrum Expert - Thu, 02/19/2015 - 10:04
The ACE! Conference on Lean and Agile Software Development is a two-day conference that brings together in Krakow some of the best-known Agile, Scrum and lean practitioners in Europe and abroad. It is the guaranty of a great networking experience for agile software development practitioners that will come away with new ideas and enthusiasm. In the agenda of the ACE! Conference on Lean and Agile Software Development you can find topics like “Execute a non-reactionary UX strategy”, “Selfish Accessibility”, “Remote User Testing”, “Designing to Learn: Creating Effective MVPs”, “Growing Your Discovery ...
Categories: Communities

The Power of Questions in Artifact Design

Leading Agile - Mike Cottmeyer - Thu, 02/19/2015 - 09:39

QuestionMarkEffective coaches use powerful questions, not to obtain answers but to stimulate thought so insight emerges and knowledge is generated.

Why not leverage this power by designing artifacts explicitly around questions? This will work for either formal artifacts used to capture and persist knowledge, or informal artifacts used to guide a conversation or workshop.

In a recent engagement this became so clear to me that I decided to capture my thoughts on this topic and some examples of changes made along the way to facilitate better outcomes for the particular group in question.

Questions Help Clarify the Intent of a Conversation
  • Getting from an artifact template to valuable knowledge is rarely a straight line, and people tend to simply regurgitate known information
  • Question-driven conversations will help surface latent dissonance and divergence
  • Helps the coach and stakeholders focus on where the knowledge resides and who may provide unique perspectives

Whatever question is asked, your brain goes right to work formulating answers. Use this to your advantage by asking open-ended questions. Since the brain will not be able to formulate an answer, deeper thought will result.

Some things to keep in mind for clarifying intent:

  • Mentally map out the journey and craft questions that serve as a compass rather than turn-by-turn GPS
  • Use open powerful questions
  • Based on the context, be willing to change the question to surface what may be hidden


Before: “Customers”

After: “Who is impacted by your work?”

  • The team in question’s thoughts in terms of the impact they are making rather than identifying “customers”
  • Within the team, refocusing on who they impact opened up the conversation to consider others who had not been viewed as “customers”
  • At the same time, other stakeholders were then identified and a good conversation ensued about the differences between customers and stakeholders
Questions Move the Focus from Results to Dialogue
  • Fields in a template or on a form appear to be “complete” regardless of quality as our minds will focus on filling in blanks, not the actual content
  • Questions can push a group beyond converging on a shared understanding to creating a vibrant, divergent pace for exploration and dialogue
  • Be aware of questions loaded with assumptions, initial results may continue to hide them

Some things to keep in mind for shifting focus from results to dialogue:

  • Wherever you can, replace artifact labels with a question
  • Use questions to discover new possibilities by challenging convention and challenging convergence
  • Review questions for loaded assumptions and either make them explicit or craft the question to remove the assumption


Before: “Customer Outcome”

After: “What stories would your customers tell after experiencing this value?”

  • Interesting conversations emerged about divergent views in the team about the true outcome they hoped to deliver
  • Helped to focus the conversation on the highest leverage dimensions of the outcome
  • Increased the energy of the entire group and even fed back into conversations about options considered
Questions Increase Engagement by Appealing to Credibility, Logic, and Emotion
  • Credibility should be apparent through the questions themselves
  • Logic can be expressed through the mental journey from answering one question to asking and answering the next
  • Emotional appeal can establish a state of receptivity for new ideas, for vigorous conversation, for openness to what is possible

Some things to consider for Ethos, Logos, and Pathos:

  • Use your expertise to craft questions based on principles rather than dogma
  • Design an end-to-end chain of questions that lead the group from surfacing what they do know, don’t know, and assumptions to meaningful dialogue
  • Craft questions that touch on the group sense of identity, self-interest, and passion. Engagement will improve and the resulting dialogue will be rich


Before: “Unique Value Proposition”

After: “Why is this team the best suited to work in this problem / solution space?”

  • Created energy in the team around why they were uniquely positioned to make the most impact
  • Helped a still forming team to rally around their passion and unique identity
  • Vibrant conversation that led directly to articulating a compelling vision for the team in short order
Wrapping Up

You must change the system to change the results – to change the results, change the thinking, to change the thinking, use questions. And use them not just in coaching; use them whenever and wherever you desire to influence the thought process and generate new points of view. The added benefit of not just using them to coach the group and facilitate the conversation, but to design or refine the artifact itself is the powerful combination of having gone through the thought process in the workshop and having the prompts persisted for reinforcement and continued refinement in the future.

Here are two partial iterations – of probably 5 total that I observed – of a template for facilitating a program level strategic conversation. However, the most impressive version I encountered was observing a new group take the most current template, and then update it even further to match how they needed to think through their particular strategy. That indicated to me that they embraced not just a template but an adaptive mental model for how to effectively frame the meaningful conversation they required.

Partial Canvas of an Early Version Partial Canvas of a Later Version

The post The Power of Questions in Artifact Design appeared first on LeadingAgile.

Categories: Blogs

Python’s pandas vs Neo4j’s cypher: Exploring popular phrases in How I met your mother transcripts

Mark Needham - Thu, 02/19/2015 - 02:52

I’ve previously written about extracting TF/IDF scores for phrases in documents using scikit-learn and the final step in that post involved writing the words into a CSV file for analysis later on.

I wasn’t sure what the most appropriate tool of choice for that analysis was so I decided to explore the data using Python’s pandas library and load it into Neo4j and write some Cypher queries.

To do anything with Neo4j we need to first load the CSV file into the database. The easiest way to do that is with Cypher’s LOAD CSV command.

First we’ll load the phrases in and then we’ll connect them to the episodes which were previously loaded:

LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/tfidf_scikit.csv" AS row
MERGE (phrase:Phrase {value: row.Phrase});
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/tfidf_scikit.csv" AS row
MATCH (phrase:Phrase {value: row.Phrase})
MATCH (episode:Episode {id: TOINT(row.EpisodeId)})
MERGE (phrase)-[:USED_IN_EPISODE {tfidfScore: TOFLOAT(row.Score)}]->(episode);

Now we’re ready to start writing some queries. To start with we’ll write a simple query to find the top 3 phrases for each episode.

In pandas this is quite easy – we just need to group by the appropriate field and then take the top 3 records in that grouping:

top_words_by_episode = df \
    .sort(["EpisodeId", "Score"], ascending = [True, False]) \
    .groupby(["EpisodeId"], sort = False) \
>>> print(top_words_by_episode.to_string())
        EpisodeId              Phrase     Score
3976            1                 ted  0.262518
2912            1              olives  0.195714
2441            1            marshall  0.155515
8143            2                 ted  0.292184
5197            2              carlos  0.227454
7482            2               robin  0.195150
12551           3                 ted  0.232662
9040            3              barney  0.187255
11254           3              mcneil  0.170619
15641           4             natalie  0.562485
16763           4                 ted  0.191873
16234           4               robin  0.102671
20715           5            subtitle  0.310866
18121           5          coat check  0.181682
20861           5                 ted  0.169973

The cypher version looks quite similar, the main difference being that we use the COLLECT to generate an array of phrases by episode and then take the top 3:

MATCH (e:Episode)<-[rel:USED_IN_EPISODE]-(phrase)
WITH e, rel, phrase
ORDER BY, rel.tfidfScore DESC
RETURN, e.title, COLLECT({phrase: phrase.value, score: rel.tfidfScore})[..3]
==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | | e.title                                     | COLLECT({phrase: phrase.value, score: rel.tfidfScore})[..3]                                                                                                               |
==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | 1    | "Pilot"                                     | [{phrase -> "ted", score -> 0.2625177493269755},{phrase -> "olives", score -> 0.19571419072701732},{phrase -> "marshall", score -> 0.15551468983363487}]                  |
==> | 2    | "Purple Giraffe"                            | [{phrase -> "ted", score -> 0.292184496766088},{phrase -> "carlos", score -> 0.22745438090499026},{phrase -> "robin", score -> 0.19514993122773566}]                      |
==> | 3    | "Sweet Taste of Liberty"                    | [{phrase -> "ted", score -> 0.23266190616714866},{phrase -> "barney", score -> 0.18725456678444408},{phrase -> "officer mcneil", score -> 0.17061872221616137}]           |
==> | 4    | "Return of the Shirt"                       | [{phrase -> "natalie", score -> 0.5624848345525686},{phrase -> "ted", score -> 0.19187323894701674},{phrase -> "robin", score -> 0.10267067360622682}]                    |
==> | 5    | "Okay Awesome"                              | [{phrase -> "subtitle", score -> 0.310865508347106},{phrase -> "coat check", score -> 0.18168178787561182},{phrase -> "ted", score -> 0.16997258596683185}]               |
==> | 6    | "Slutty Pumpkin"                            | [{phrase -> "mike", score -> 0.2966610054610693},{phrase -> "ted", score -> 0.19333276951599407},{phrase -> "robin", score -> 0.1656172994411056}]                        |
==> | 7    | "Matchmaker"                                | [{phrase -> "ellen", score -> 0.4947912795578686},{phrase -> "sarah", score -> 0.24462913913669443},{phrase -> "ted", score -> 0.23728319597607636}]                      |
==> | 8    | "The Duel"                                  | [{phrase -> "ted", score -> 0.26713931416222847},{phrase -> "marshall", score -> 0.22816702335751904},{phrase -> "swords", score -> 0.17841675237702592}]                 |
==> | 9    | "Belly Full of Turkey"                      | [{phrase -> "ericksen", score -> 0.43145756691027665},{phrase -> "mrs ericksen", score -> 0.1939318283559959},{phrase -> "kendall", score -> 0.1846969793866628}]         |
==> | 10   | "The Pineapple Incident"                    | [{phrase -> "ted", score -> 0.439756993033922},{phrase -> "trudy", score -> 0.36367907631894536},{phrase -> "carl", score -> 0.16413071244131686}]                        |
==> | 11   | "The Limo"                                  | [{phrase -> "moby", score -> 0.48314164479037003},{phrase -> "party number", score -> 0.30458929780262456},{phrase -> "ranjit", score -> 0.1991061739767796}]             |
==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In the cypher version we get one row per episode whereas with the Python version we get 3 rows. It might be possible to achieve this effect with pandas too but I wasn’t sure how to do so.

Next let’s find the top phrases for a single episode – the type of query that might be part of an episode page on a How I met your mother wiki:

top_words = df[(df["EpisodeId"] == 1)] \
    .sort(["Score"], ascending = False) \
>>> print(top_words.to_string())
      EpisodeId                Phrase     Score
3976          1                   ted  0.262518
2912          1                olives  0.195714
2441          1              marshall  0.155515
4732          1               yasmine  0.152279
3347          1                 robin  0.130418
209           1                barney  0.124412
2146          1                  lily  0.122925
3637          1                signal  0.103793
1366          1                goanna  0.098138
3524          1                 scene  0.095342
710           1                   cut  0.091734
2720          1              narrator  0.086462
1147          1             flashback  0.078296
1148          1        flashback date  0.070283
3224          1                ranjit  0.069393
4178          1           ted yasmine  0.058569
1149          1  flashback date robin  0.058569
525           1                  carl  0.058210
3714          1           smurf pen1s  0.054365
2048          1              lebanese  0.054365
MATCH (e:Episode {title: "Pilot"})<-[rel:USED_IN_EPISODE]-(phrase)
WITH phrase, rel
ORDER BY rel.tfidfScore DESC
RETURN phrase.value AS phrase, rel.tfidfScore AS score
==> +-----------------------------------------------+
==> | phrase                 | score                |
==> +-----------------------------------------------+
==> | "ted"                  | 0.2625177493269755   |
==> | "olives"               | 0.19571419072701732  |
==> | "marshall"             | 0.15551468983363487  |
==> | "yasmine"              | 0.15227880637176266  |
==> | "robin"                | 0.1304175242341549   |
==> | "barney"               | 0.12441175186690791  |
==> | "lily"                 | 0.12292497785945679  |
==> | "signal"               | 0.1037932464656365   |
==> | "goanna"               | 0.09813798750091524  |
==> | "scene"                | 0.09534236041231685  |
==> | "cut"                  | 0.09173366535740156  |
==> | "narrator"             | 0.08646229819848741  |
==> | "flashback"            | 0.07829592155397117  |
==> | "flashback date"       | 0.07028252601773662  |
==> | "ranjit"               | 0.06939276915589167  |
==> | "ted yasmine"          | 0.05856877168144719  |
==> | "flashback date robin" | 0.05856877168144719  |
==> | "carl"                 | 0.058210117288760355 |
==> | "smurf pen1s"          | 0.05436505297972703  |
==> | "lebanese"             | 0.05436505297972703  |
==> +-----------------------------------------------+

Our next query is a negation – find the episodes which don’t mention the phrase ‘robin’. In python we can do some simple set operations to work this out:

all_episodes = set(range(1, 209))
robin_episodes = set(df[(df["Phrase"] == "robin")]["EpisodeId"])
>>> print(set(all_episodes) - set(robin_episodes))
set([145, 198, 143])

In cypher land a query will suffice:

MATCH (episode:Episode), (phrase:Phrase {value: "robin"})
WHERE NOT (episode)<-[:USED_IN_EPISODE]-(phrase)
RETURN AS id, episode.season AS season, episode.number AS episode

And finally a mini recommendation engine type query – how many of the top phrases in Episode 1 were used in other episodes:

First python:

phrases_used = set(df[(df["EpisodeId"] == 1)] \
    .sort(["Score"], ascending = False) \
phrases = df[df["Phrase"].isin(phrases_used)]
print (phrases[phrases["EpisodeId"] != 1] \
    .groupby(["Phrase"]) \
    .size() \
    .order(ascending = False))

Here we’ve pulled it out into a few steps – first we identify the top phrases, then we find out where they occur across the whole data set and finally we filter out the occurrences in the first episode and count the other occurrences.

marshall    207
barney      207
ted         206
lily        206
robin       204
scene        36
signal        4
goanna        3
olives        1

In cypher we can write a query to do this as well:

MATCH (episode:Episode {title: "Pilot"})<-[rel:USED_IN_EPISODE]-(phrase)
WITH phrase, rel, episode
ORDER BY rel.tfidfScore DESC
MATCH (phrase)-[:USED_IN_EPISODE]->(otherEpisode)
WHERE otherEpisode <> episode
RETURN phrase.value AS phrase, COUNT(*) AS numberOfOtherEpisodes
ORDER BY numberOfOtherEpisodes DESC
==> +------------------------------------+
==> | phrase     | numberOfOtherEpisodes |
==> +------------------------------------+
==> | "barney"   | 207                   |
==> | "marshall" | 207                   |
==> | "ted"      | 206                   |
==> | "lily"     | 206                   |
==> | "robin"    | 204                   |
==> | "scene"    | 36                    |
==> | "signal"   | 4                     |
==> | "goanna"   | 3                     |
==> | "olives"   | 1                     |
==> +------------------------------------+

Overall there’s not much in it – for some of the queries I found it easier in cypher and for others easier with pandas. It’s always useful to have multiple tools in the toolbox!

Categories: Blogs

Diamond Kata - Thoughts on Incremental Development

Mistaeks I Hav Made - Nat Pryce - Thu, 02/19/2015 - 01:35
Some more thoughts on my experience doing the Diamond Kata with property-based tests… When test-driving development with example-based tests, an essential skill to be learned is how to pick each example to be most helpful in driving development forwards in small steps. You want to avoid picking examples that force you to take too big a step (A.K.A. “now draw the rest of the owl”). Conversely, you don’t want to get sidetracked into a boring morass of degenerate cases and error handling when you’ve not yet addressed the core of the problem to be solved. Property-based tests are similar: the skill is in picking the right next property to let you make useful progress in small steps. But the progress from nothing to solution is different. Doing TDD with example-based tests, I’ll start with an arbitrary, specific example (arbitrary but carefully chosen to help me make useful progress), and write a specific implementation to support just that one example. I’ll add more examples to “triangulate” the property I want to implement, and generalise the implementation to pass the tests. I continue adding examples and triangulating, and generalising the implementation until I have extensive coverage and a general implementation that meets the required properties. Example TDD progress Doing TDD with property-based tests, I’ll start with a very general property, and write a specific but arbitrary implementation that meets the property (arbitrary but carefully chosen to help me make useful progress). I’ll add more specific properties, which force me to generalise the the implementation to make it meet all the properties. The properties also double-check one another (testing the tests, so to speak). I continue adding properties and generalising the implementation until I have a general implementation that meets the required properties. Property TDD progress I find that property-based tests let me work more easily in larger steps than when testing individual examples. I am able to go longer without breaking the problem down to avoid duplication and boilerplate in my tests, because the property-based tests have less scope for duplication and boilerplate. For example, if solving the Diamond Kata with example-based tests, my reaction to the “now draw the rest of the owl” problem that Seb identified would be to move the implementation towards a compositional design so that I could define the overall behaviour declaratively and not need to write integration tests for the whole thing. For example, when I reached the test for “C”, I might break the problem down into two parts. Firstly, a higher-order mirroredLines function that is passed a function to generate each line, with the type (in Haskell syntax): mirroredLines :: (Char -> Char -> String) -> Char -> String I would test-drive mirroredLines with a stub function to generate fake lines, such as: let fakeLine ch maxCh = "line for " ++ [ch] Then, I would write a diamondLine function, that calculates the actual lines of the diamond. And declare the diamond function by currying: let diamond = mirroredLines diamondLine I wouldn’t feel the need to write tests for the diamond function given adequate tests of mirroredLines and diamondLine.
Categories: Blogs

Please Help Me Title Essays on Estimation

Johanna Rothman - Wed, 02/18/2015 - 23:40

I have finished the content for Essays on Estimation. But, I need a new title. The book is more than loosely coupled essays. It reads like a real book, with progression and everything.

I have a number of ideas. They are (in no particular order):

  1. Predicting the Unpredictable: Essays on Estimating Project Costs and Dates
  2. Essays on Estimation: Pragmatic Approaches for Estimating Cost and Schedule
  3. How Much Will This Project Cost or When Will it be Done? Essays on Estimation
  4. Essays on Estimation: How to Predict When Your Project Will be Done
  5. Pragmatic Estimation: How to Create and Update Schedule or Cost Estimates
  6. Practical Approaches to Estimation: Create and Update Your Prediction Without Losing Your Hair
  7. Essays on Estimation: Practical Approaches for Schedule and Cost Estimates

Do you like any of these ideas? Have a better idea? I would like a title that explains what’s in the book.

I numbered these so you could respond easily in the comments with the number, if you like. Or, you can type out the entire title or a new title. I am open to ideas.

Thank you in advance.

Categories: Blogs

Benefits of Pair Programming Revisited

Powers of Two - Rob Myers - Wed, 02/18/2015 - 22:55

Let's take a fresh look at pair programming, with an eye towards systems optimization.

Pair programming is perhaps the most controversial of the many agile engineering practices. It appears inefficient on the surface, and may often measure as such based on code output (the infamous KLOC metric) or number of coding tasks completed. But dismissing it without exploring the impact to your overall value-stream could actually slow down throughput.  We'll take a look at the benefits of pair programming--some subtle, some sublime--so you are well-equipped to evaluate the impact.

Pair programming is the practice of having two people collaborating simultaneously on a coding task. The pair can be physically co-located, as in the picture below, or they can use some form of screen-sharing and video-chat software.  The person currently using the keyboard and mouse is called the "Driver" and the other person is the "Navigator."  I say "currently" because people will often switch roles during a pairing session.

Low-stress, co-located pair programming looks like this:  Neither Driver nor Navigator has to lean sideways to see the screen, or to type on the keyboard. The font is large enough so both can read the code comfortably. We're not sitting so close that our chairs collide, but not so far that we need to mirror the displays.Misconceptions
There are many misconceptions about pair programming, leading people to conclude that "it's two doing the work of one." Here are a few of the more common misapprehensions...Navigator as ObserverThe Navigator is not watching over your shoulder, per se.

The Navigator is an active participant. She discusses design and implementation options where necessary; keeps the overall task, and the bigger design picture, in mind; manages the short list of emerging sub-tasks; selects the next step, or test, or sub-task to work on; and catches many things that the compiler may not catch. (If there is a compiler!)  There isn't really time for boredom.

If she is literally looking over your shoulder, then you're not at a workstation suitable for pairing. Cubes are generally bad for pairing, because the desks are small or curved inwards. Co-located pairing is side-by-side, at a table that has a straight edge or is curved outwards.

Often only one wireless keyboard and one wireless mouse are used.  Wireless devices make switching Drivers much easier.

Unless the pair is not co-located, one screen for the code is sufficient. Other monitors can be used for research, test results, whatever. You may want to avoid having the same image displayed on two adjacent screens.  It may seem easier at first, but eventually one of you will gesture at the screen and the other will have to lean over to see the target of the gesture.
Pairing as Just Sitting TogetherPairing is not two people working on two separate files, even if one file contains the unit tests and the other contains the implementation code.  Both people agree on the intent of the next test, and on the shape of the implementation.  They are collaborating, and sharing critical information with each other.

The Navigator may occasionally turn to use a second, nearby computer to do some related research (e.g., the exact syntax for the needed regular expression). This is always in response to the ongoing conversation and task. It is not "oh, look, I got e-mail from Dr. Oz again...!"
Navigator as AdvisorPair programming is not a master/apprentice relationship. It's the meeting of two professionals to collaborate on a programming task.  They both bring different talents and experience to the task. Both people tend to learn a lot, because everyone has various levels of experience with various tools and technologies.

In 2003 I was tech lead and XP coach on a growing team.  We had just hired an excellent candidate nearly fresh out of college. He and I sat down to pair on the next programming task.  I described how we were planning to redo the XML-emitting Java code to meet the schema of our biggest client, instead of supporting our long-lived internal schema.  I explained that we expected to have to change quite a bit of code, and perhaps the database schema as well, and that we'd be scheduling it in bits and pieces over the upcoming months.  I reassured him that we had plenty of test coverage to feel confident in our drastic overhaul.

He frowned, and said, "Why not just run the data through an XSLT transformation?!"  (XSLT is a rich programming language written as XML, designed for such transformations. Until this point, I hadn't given it much consideration.)

He saved us months of rework! To my delight, I learned a new technology (new for me anyway). My contribution to the efforts was to show him how we could use JUnit to test-drive the XSLT transformations.  Both parties learned a great deal from each other.

In software development, there are no "juniors" or "seniors," just people with varying degrees of knowledge and experience with a wide variety of technologies and techniques.

Systemic BenefitsFewer DefectsThis is the most-cited benefit of pair programming.  It's relatively easy to measure over time.

In my own experience, it's not clear that this is the main benefit.  I've always combined pair programming with TDD, and TDD catches a broad variety of defects very quickly.  In that productive but scientifically uncontrolled environment, measuring defects caught by pairing becomes much more difficult.

But this is where systems thinking comes in:  Pair programming reduces rework, allowing a task or story that is declared "done" to be done, without having to revisit the code excessively in the future.  Pair programming may be slower, the way quality craftsmanship always appears slower:  The code remains as valuable in the future.

The benefits that follow reflect this.  Pair programming is an investment.
Better DesignI've noticed that even the most experienced developer, when left to himself, will on occasion write code that only one person can quickly understand:  Himself.  And often even he won't understand it months from now.

But if two people write the code together, it instantly breaks that lone-wolf barrier, resulting in code that is understandable to, and changeable by, many.
Continuous Code ReviewBecause most (around 80%) of defects and design issues are caught and fixed immediately with pair programming, the need for code reviews diminishes.
Many times I've seen this nasty cycle:
All code must be reviewed by the Architect. The Architect is overburdened with code reviews.  The Architect rubber-stamps your changes in order to keep up with demand for code reviews. Defects slip past the code-review gate and make their way into production. All code must be reviewed by the Architect. This shows up in the value-stream (or your Kanban wall) as an overloaded code-review queue.

Also, if the Architect does catch a mistake, the task is sent back to the developers for repair and rework. This shows up in the value-stream as a loop backwards from code-review to development. And rework is waste.  The longer the delay between the time a defect is introduced and the time it is fixed, the greater the waste.

From a Lean, systems, or Theory of Constraints standpoint, the removal of a gated activity (the code review) in favor of a parallel or collaborative activity (pair programming) at the constraint (the most overburdened queue) may improve throughput.
Enhanced Development Skills The educational value of pair programming is immense, ongoing, and narrowed in scope to actual tasks that the team encounters.

An individual who encounters a challenging technological hurdle may assume he knows the right thing to do when he doesn't, or spend a great deal of time in detailed research, or succumb to feelings of inadequacy and try to find a circuitous, face-saving route around the hurdle.

When a pair encounters a hurdle that neither has seen before, they know immediately that it's a significant challenge rather than a perceived inadequacy, and that they have a number of options. Those options are explored in just enough detail to overcome the hurdle efficiently.

People don't often pair with the same person for an extended period of time, so there's opportunity for a broad, and just-deep-enough, continuous education in relevant technologies, tools, techniques.

Through this ongoing process of shared learning and cross pollination, the whole development team becomes more and more capable.

For example, perhaps your SQL-optimization expert pairs with someone who is interested in the topic today.  Tomorrow, the SQL-optimization expert can go on vacation, without bringing development to a halt, and without having a queue of unfinished work pile up on her desk while she's in Hawai'i.

Not everyone has to be an expert in everything.  The task can be completed sufficiently to complete the story, and perhaps a more specific story or task will bring the tanned-and-rested expert's attention to the mostly-optimized SQL query at a later time.

This is an important risk-mitigation strategy, because having too few people who know how to perform a critical task is asking for trouble.
Improved FlowImagine you are the leader of a development team.  You walk in after a nice relaxing weekend and see one of your developers hard at work. "Hey, Travis, how was your weekend?"

Travis gets this frustrated look on his face (generally, developers should not play poker), "Uh...what? Oh.  Fine!" and he waves you away dismissively.  You've pulled him from The Zone.

What if, instead, you had walked in to see Travis and Fred sitting together, conversing quietly, and looking intently at a single computer screen.  Wouldn't you save your greeting for later?

Or, what if you had something important to ask? "Hey guys, are you going to make the client meeting at 3pm today?"

Travis continues to stare intently at the screen, and types something; but Fred spins his chair, "Oh, right!  I'll add that to our calendars." He writes a brief note on a 3x5 card that was already in his hand, and smiles reassuringly, "We'll be there!"

See the difference? Fred has handled the priority interruption without pulling Travis out of The Zone, without forcing the pair to task-switch (another popular form of waste). And Travis will be able to get Fred caught up very quickly, and they'll be on their way to finishing the task.
Mutual Encouragement"Encouragement" contains the root word "courage."  With two, it's always easier to face our fears, our fatigue, and our foibles.

Even if both Driver and Navigator are fatigued (e.g., after a big team lunch, or a long morning of meetings), together they may muster enough energy and enthusiasm to carry on and complete a task.
Enhanced Self-ControlHave you ever taken a "brief" personal break, only to discover 90 minutes later that your still involved in that phone call with Mom, Facebook update, or silent daydream?

Don't feel bad. It's natural.

If you and your pair-programming partner agree to a 15 minute break, however, you will be more likely to time your activities to return to the pairing workstation in 15 minutes, and you're more likely to engage in actual restful activities, rather than checking e-mail for 13 minutes before walking to the coffee shop.

Also, while writing code, neither Driver nor Navigator will allow themselves to become repeatedly distracted by e-mail pop-ups or cell phone ringtones.  If it's not urgent, it can wait. Or, either person can call for a break.

Human Systems
We have to remember that humans make up this complex adaptive system we use to build things, and so human nature has an extremely large impact on how we build things.

Pairing helps alleviate distraction, fatigue, brittle code, skills-gaps, embarrassment over inadequacy, communication issues, fear of failure. Pairing thus improves overall throughput by decreasing rework and hidden, "deferred" work.

I find that pair programming is usually faster when measured by task completion over time.  On average, if you give two people two dev tasks, A and B, they will be done with both tasks sooner if they pair up and complete A and B sequentially, rather than if one takes A and the other takes B.

On the surface, this may seem to contradict my earlier systems-related advice about replacing a gate with collaboration.  But there is no gate, explicit or implicit, between these developers or between most software development tasks.

Also, much depends on where a change is applied relative to the actual constraint.  If you optimize upstream from the constraint, you'll likely gum up the system even more.  (You didn't think scaled agility was going to be delivered in a pretty, branded, simple, mass-produced, gift-wrapped box...did you?! ;-)

But if you discover that your current constraint is within the development process, then allowing the team to use pair programming may considerably improve overall throughput of value. (Emphasis on value. "Velocity" isn't value. Lines of code are not necessarily valuable either. Working, useable, deployed code is valuable.)

Try It
I've used pair programming on a number of teams since 1998, and I've always found it beneficial in the ways described above, and many other ways.

All participants in my technical classes, since 2002, pair on all coding labs.  It's a great experience for participants: they often learn a great deal from each other as well as from the labs. It also benefits me, the instructor:  I can tell when a pair is in The Zone, stuck, finished, or not getting along; all by listening carefully to the levels and tone of conversations in the room.

I recommend, as with all new practices, you and your team outline a simple plan (or "test") to determine whether or not the new practice is impacting the working environment for the better.  Then try it out in earnest for 30 days, and re-evaluate at a retrospective.  Pair programming, as with many seemingly counter-intuitive agile practices, may just surprise you.

Categories: Blogs

February Newsletter: Kanban for DevOps, 7 Lean Metrics, New Custom Card Icons, and More

Here’s the February 2015 edition of the LeanKit monthly newsletter. Make sure you catch the next issue in your inbox and subscribe today. Kanban for DevOps: 3 Reasons IT Ops Uses Lean Flow Heard the joke about being “done” and being “done-done?” In part one of this three-part blog series, Dominica DeGrandis explores why using a […]

The post February Newsletter: Kanban for DevOps, 7 Lean Metrics, New Custom Card Icons, and More appeared first on Blog | LeanKit.

Categories: Companies

The Agile Appraisals Manifesto

Growing Agile - Wed, 02/18/2015 - 10:55
Last week (9 – 11 February 2015) we attended the Scrum Coach Retreat in South Africa. Some friends of ours formed a group dedicated to looking at how appraisals are done at large corporates and if they could be made more agile. We loved what they came up with so much we had to blog about it!

The Authors : 

Philip (@7ft_phil), Justin (@the_jus), Candice (@candicemesk), Yusuf (@ykaloo)

The Situation Given      Appraisal must happen in a corporate agile team When    Doing agile and doing it right in terms of delivery of software. We want to    Ensure that the appraisal process supports the agile process we embrace. Eg. Adapting to change, reviews more regularly, inspect and adapt, teams over individuals, encourage courageous feedback. Proposed Elevator Pitch We’ve all experienced performance appraisals, and we’ve all realised that they violate the agile values that we hold dear. Perhaps subjectivity from the person in power led to an unfair or skewed result, perhaps the infrequency of assessments meant recent events dominated rather  than the big picture? Do you truly feel that the feedback received allowed you to improve as much as you can, and do you feel the reward is fair? Imagine guidelines that shape appraisals past the pitfalls and rewards an agile friendly outcome. AgileAppraisalsManifesto Agile Appraisals Manifesto Fairness over malice. Actionable feedback over numbers. Frequent reality checks over enticing gaming of the process. Rewards reflective of business value added over rewarding the status quo.  Principles of the Agile Appraisals Manifesto
  • We respect individuals regardless of the power dynamic in the process.
  • Criteria should be visible and the process transparent.
  • The process is fluid and can change as long as the manifesto is applied.
  • Appraisals should be regular enough to ensure no surprises.
  • We embrace and support and reward changing actions where business reality demands.
  • We recognise the value of work done.
  • Appraisals should be done by people with context.
  • We assess as frequently as possible, but no more frequent than would be disruptive.
  • Metrics should be easy to understand , assessable and achievable, and have a shelf life.
  • We provide frequent actionable feedback to help teams and people improve.

Thank you so much for sharing this with us Phil, Justin, Candice and Yusuf. We think its a great place to start conversations to improve how appraisals are done, especially for organisations where getting rid of appraisals is too big a step right now.

Categories: Companies

Knowledge Sharing

SpiraTeam is a agile application lifecycle management (ALM) system designed specifically for methodologies such as scrum, XP and Kanban.