Feed aggregator

Clean Tests: Isolating Internal State

Jimmy Bogard - Tue, 02/17/2015 - 19:46

One of the more difficult problems with slow tests that touch shared resources is building a clean starting point. In order for tests to be reliable, the environment in which the test executes needs to be in a reliable, consistent starting state. In slow tests, in which I’m accessing out-of-process dependencies, I’m worried about two things:

  • External state is known and consistent
  • Internal state is known and consistent

In order to keep my sanity, I want to put the responsibility of building that known starting point into a Standard Fixture. This fixture is responsible for creating that starting point, and it’s this starting point that ensures the long-term maintainability of my system.

Consistent internal state

Since I’m using AutoFixture for the creation and configuration of my fixture, it will be AutoFixture I use to build out my Standard Fixture. My standard fixture will be a single class that my tests interact with. Because the name “Fixture” is overused in many libraries, I have to name my class somewhat specifically. It starts by building out an isolated sandbox for my internal state:

public class SlowTestFixture
{
    private static IContainer Root = IoC.BuildCompositionRoot();

    public SlowTestFixture()
    {
        Container = Root.CreateChildContainer();
    }

    public IContainer Container { get; }
}

I use a DI container as my composition root in my systems, and this combined with child containers allows me to ensure that I have a unique, isolated sandbox for running my tests. The root container is my blueprint for an execution context, and represents what I do in production. The child container’s configuration, whatever I might do to it, lives only for the context of this one test.
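To make the isolation concrete outside of any particular DI container, here is a minimal Python 3 sketch that uses the standard library's collections.ChainMap as a stand-in for a root container with child containers. The service names and the string "implementations" are hypothetical, and a real container resolves types rather than dictionary keys, so treat this as an analogy only:

```python
from collections import ChainMap

# Hypothetical root registrations: the "blueprint" shared by every test.
root = ChainMap({"mailer": "SmtpMailer", "clock": "SystemClock"})

# Each test gets its own child; writes land in the child's own layer only.
child_a = root.new_child()
child_b = root.new_child()

# Test A swaps in a fake; lookups for anything else fall through to the root.
child_a["mailer"] = "FakeMailer"

resolved = {
    "a_mailer": child_a["mailer"],
    "a_clock": child_a["clock"],
    "b_mailer": child_b["mailer"],
    "root_mailer": root["mailer"],
}
print(resolved)
```

The override made through child_a is invisible to child_b and to the root, which is the property the child container provides for each test.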

Throughout the rest of my tests, I can access that container to build components as need be. The next piece I’ll need is to tell AutoFixture about this fixture, and to use it both when someone needs access to the context as well as when someone needs an instance of something.

In AutoFixture, this is done via fixture customizations:

public class SlowTestsCustomization : ICustomization
{
    public void Customize(IFixture fixture)
    {
        var contextFixture = new SlowTestFixture();

        fixture.Register(() => contextFixture);

        fixture.Customizations.Add(new ContainerBuilder(contextFixture.Container));
    }
}

Customizations alter the behavior of AutoFixture’s fixture object, effectively allowing me to add new links to a chain of responsibility. I want two behaviors added:

  • Access to the fixture
  • Building container-supplied instances

The first is simple: I can register individual instances with AutoFixture using the “Register” method. The second, since it depends on the type requested, needs its own specimen builder:

public class ContainerBuilder : ISpecimenBuilder
{
    private readonly IContainer _container;

    public ContainerBuilder(IContainer container)
    {
        _container = container;
    }

    public object Create(object request, ISpecimenContext context)
    {
        var type = request as Type;

        if (type == null || type.IsPrimitive)
        {
            return new NoSpecimen(request);
        }

        var service = _container.TryGetInstance(type);

        return service ?? new NoSpecimen(request);
    }
}

AutoFixture calls each specimen builder, one at a time, and each specimen builder either builds out an instance or returns a null object, the “NoSpecimen” object.
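The builder chain described above can be sketched in a few lines of Python 3. This is an illustration of the chain-of-responsibility idea only, not AutoFixture's implementation; container_builder, default_builder and resolve are hypothetical names, and the registry is a plain dictionary standing in for a container:

```python
class NoSpecimen:
    """Null object meaning 'this builder cannot satisfy the request'."""

    def __init__(self, request):
        self.request = request


def container_builder(registrations):
    """Return a builder that serves requests found in a (hypothetical) registry."""
    def build(request):
        if request in registrations:
            return registrations[request]()
        return NoSpecimen(request)
    return build


def default_builder(request):
    """Fallback builder: knows how to make a couple of simple values."""
    defaults = {int: 42, str: "text"}
    if request in defaults:
        return defaults[request]
    return NoSpecimen(request)


def resolve(request, builders):
    """Ask each builder in turn; the first real answer wins."""
    for build in builders:
        specimen = build(request)
        if not isinstance(specimen, NoSpecimen):
            return specimen
    raise LookupError("no builder could create %r" % (request,))


class InvoiceApprover:
    pass


builders = [container_builder({InvoiceApprover: InvoiceApprover}), default_builder]
approver = resolve(InvoiceApprover, builders)  # served by the registry builder
number = resolve(int, builders)                # falls through to the default builder
```

Putting the container-backed builder first means registered services win, while anything the container does not know about falls through to the next link.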

Ultimately, the goal is for my tests to be able to use a pre-built component, or to use the fixture as necessary:

public InvoiceApprovalTests(Invoice invoice,
    SlowTestFixture fixture,
    IInvoiceApprover invoiceApprover)
{
    _invoice = invoice;

    invoiceApprover.Approve(invoice);
    fixture.Save(invoice);
}

The last part I need to fill in is to modify Fixie to use my customizations when building up test instances. This is in my Fixie convention where I had previously configured Fixie to use AutoFixture to instantiate my test classes:

private object CreateFromFixture(Type type)
{
    var fixture = new Fixture();

    new SlowTestsCustomization().Customize(fixture);

    return new SpecimenContext(fixture).Resolve(type);
}

My tests now have an isolated sandbox for internal state, as each child container instance is isolated per fixture. If I need to inject stubs/fakes, I don’t affect any other tests because of how I’ve built the boundaries of my test in Fixie.

In the next post, I’ll look at isolating external state (the database).

Categories: Blogs

Designing a job crafting experience

Alastair Simpson created a Mentor Canvas intended for mentoring UX designers.

I generally like it because it provides a reasonable structure in a collaborative, canvas style.

However, to make it more appealing to me, I'd like to adjust it to generalise to a non-UX designer perspective and also to reflect some slightly different assumptions of what I consider important for developing oneself and others.  Specifically, I prefer a job crafting approach.

I've created a template on Google Drive:






Categories: Blogs

Should Agile Equal Being Happy?

Leading Agile - Mike Cottmeyer - Tue, 02/17/2015 - 15:50

Ever had a conversation with someone about what they thought “being” Agile meant?  I was having that conversation today.  The other guy said he was surprised that he wasn’t happier.  I asked him to help me understand what he meant by that.

An Agile team should be happy

Someone, somewhere, convinced this fellow that the Manifesto for Agile Software Development included life, liberty, and the pursuit of happiness.

The reality is that I feel he was misguided, just like all of those other people who think that if you’re on an Agile team then you don’t plan, you don’t test, or you don’t document. The idea that Agile is all teddy bears and rainbows has somehow spread to the far reaches of the Agile community.

When asked if Agile makes me happy, my response was simple.

No

Being an Agile coach, leading Agile transformations, and helping customers reach their potential does not make me happy.  It leaves me with a feeling of satisfaction.  Much like mowing my lawn every weekend in summer, it doesn’t make me happy. But, when I am done with the task at hand, I look at what I have accomplished and I feel satisfied.  Isn’t that a more realistic goal? The pursuit of satisfaction, as it relates to work?  Happiness is an emotional state that I reserve for my personal life, when I combine satisfaction from my work and positive emotions in my off-time.

Is the goal of happiness within an Agile team misguided?

I’m interested in your thoughts.

The post Should Agile Equal Being Happy? appeared first on LeadingAgile.

Categories: Blogs

UPscALE Agile in Medium & Large Enterprises, Stuttgart, Germany, March 11 2015

Scrum Expert - Tue, 02/17/2015 - 11:25
UPscALE Agile in Medium & Large Enterprises is a one-day event focused on scaling Scrum and other Agile software development approaches. All the talks are in German except for the keynote. On the agenda you will find topics like the “Scrum @ Scale – A Scaling Framework based on My Experiences” keynote delivered by Jeff Sutherland. The other presentations will be given by medium and large enterprises like Volkswagen and SAP about their experience in scaling Agile. Web site: http://www.up-scale.de/ Location for the ...
Categories: Communities

Cancelling $http requests for fun and profit

Xebia Blog - Tue, 02/17/2015 - 10:11

At my current client, we have a large AngularJS application that is configured to show a full-page error whenever one of the $http requests ends up in error. This is implemented with an error interceptor as you would expect it to be. However, we’re also using some calculation-intense resources that happen to time out once in a while. This combination is tricky: a user triggers a resource request when navigating to a certain page, navigates to a second page and suddenly ends up with an error message, as the request from the first page triggered a timeout error. This is a particularly unpleasant side effect that I’m going to address in a generic way in this post.

There are of course multiple solutions to this problem. We could create a more resilient implementation in the backend that will not time out, but accepts retries. We could change the full-page error into something less ‘in your face’ (but you still would get some out-of-place error notification). For this post I’m going to fix it using a different approach: cancel any running requests when a user switches to a different location (the route part of the URL). This makes sense; your browser does the same when navigating from one page to another, so why not mimic this behaviour in your Angular app?

I’ve created a pretty verbose implementation to explain how to do this. At the end of this post, you’ll find a link to the code as a packaged bower component that can be dropped in any Angular 1.2+ app.

Angular does not offer that many options for cancelling a running request. Under the hood there are some places you can hook into, but that won’t be necessary. If we look at the $http usage documentation, the timeout property is mentioned and it accepts a promise to abort the underlying call. Perfect! If we set a promise on all created requests, and abort these at once when the user navigates to another page, we’re (probably) all set.

Let’s write an interceptor to plug in the promise in each request:

angular.module('angularCancelOnNavigateModule')
  .factory('HttpRequestTimeoutInterceptor', function ($q, HttpPendingRequestsService) {
    return {
      request: function (config) {
        config = config || {};
        if (config.timeout === undefined && !config.noCancelOnRouteChange) {
          config.timeout = HttpPendingRequestsService.newTimeout();
        }
        return config;
      }
    };
  });

The interceptor will not overwrite the timeout property when it is explicitly set. Also, if the noCancelOnRouteChange option is set to true, the request won’t be cancelled. For better separation of concerns, I’ve created a new service (the HttpPendingRequestsService) that hands out new timeout promises and stores references to them.

Let’s have a look at that pending requests service:

angular.module('angularCancelOnNavigateModule')
  .service('HttpPendingRequestsService', function ($q) {
    var cancelPromises = [];

    function newTimeout() {
      var cancelPromise = $q.defer();
      cancelPromises.push(cancelPromise);
      return cancelPromise.promise;
    }

    function cancelAll() {
      angular.forEach(cancelPromises, function (cancelPromise) {
        cancelPromise.promise.isGloballyCancelled = true;
        cancelPromise.resolve();
      });
      cancelPromises.length = 0;
    }

    return {
      newTimeout: newTimeout,
      cancelAll: cancelAll
    };
  });

So, this service creates new timeout promises that are stored in an array. When the cancelAll function is called, all timeout promises are resolved (thus aborting all requests that were configured with the promise) and the array is cleared. By setting the isGloballyCancelled property on the promise object, a response promise method can check whether it was cancelled or another exception has occurred. I’ll come back to that one in a minute.
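The same bookkeeping pattern can be sketched independently of Angular. The Python 3 example below uses asyncio tasks in place of timeout promises; it is an analogy rather than the module's code, and PendingRequests, track, slow_request and navigate_away are hypothetical names:

```python
import asyncio


class PendingRequests:
    """Keeps a handle on every in-flight task so they can be cancelled together."""

    def __init__(self):
        self._tasks = set()

    def track(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Forget the task once it finishes, whatever the outcome.
        task.add_done_callback(self._tasks.discard)
        return task

    def cancel_all(self):
        for task in list(self._tasks):
            task.cancel()
        self._tasks.clear()


async def slow_request(name):
    await asyncio.sleep(10)  # stands in for a slow backend call
    return name


async def navigate_away():
    pending = PendingRequests()
    task = pending.track(slow_request("report"))
    await asyncio.sleep(0)   # give the request a chance to start
    pending.cancel_all()     # the "route change"
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"


outcome = asyncio.run(navigate_away())
print(outcome)
```

As in the service above, the registry only stores handles and clears itself after cancelling; deciding what a cancelled request should mean for the caller is left to a separate step.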

Now we hook up the interceptor and call the cancelAll function at a sensible moment. There are several events triggered on the root scope that are good hook candidates. Eventually I settled for $locationChangeSuccess. It is only fired when the location change is a success (hence the name) and not cancelled by any other event listener.

angular
  .module('angularCancelOnNavigateModule', [])
  .config(function($httpProvider) {
    $httpProvider.interceptors.push('HttpRequestTimeoutInterceptor');
  })
  .run(function ($rootScope, HttpPendingRequestsService) {
    $rootScope.$on('$locationChangeSuccess', function (event, newUrl, oldUrl) {
      if (newUrl !== oldUrl) {
        HttpPendingRequestsService.cancelAll();
      }
    })
  });

When writing tests for this setup, I found that the $locationChangeSuccess event is triggered at the start of each test, even though the location did not change yet. To circumvent this situation, the function does a simple difference check.

Another problem popped up during testing. When the request is cancelled, Angular creates an empty error response, which in our case still triggers the full-page error. We need to catch and handle those error responses. We can simply add a responseError function in our existing interceptor. And remember the special isGloballyCancelled property we set on the promise? That’s the way to distinguish between cancelled and other responses.

We add the following function to the interceptor:

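      responseError: function (response) {
        // config or timeout may be missing (e.g. noCancelOnRouteChange requests)
        var timeout = response.config && response.config.timeout;
        if (timeout && timeout.isGloballyCancelled) {
          return $q.defer().promise;
        }
        return $q.reject(response);
      }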

The responseError function must return a promise that normally re-throws the response as rejected. However, that’s not what we want: neither a success nor a failure callback should be called. We simply return a never-resolving promise for all cancelled requests to get the behaviour we want.

That’s all there is to it! To make it easy to reuse this functionality in your Angular application, I’ve packaged this module as a bower component that is fully tested. You can check the module out on this GitHub repo.

Categories: Companies

Python/pandas: Column value in list (ValueError: The truth value of a Series is ambiguous.)

Mark Needham - Mon, 02/16/2015 - 23:39

I’ve been using Python’s pandas library while exploring some CSV files and although for the most part I’ve found it intuitive to use, I had trouble filtering a data frame based on checking whether a column value was in a list.

A subset of one of the CSV files I’ve been working with looks like this:

$ cat foo.csv
"Foo"
1
2
3
4
5
6
7
8
9
10

Loading it into a pandas data frame is reasonably simple:

import pandas as pd
df = pd.read_csv('foo.csv', index_col=False, header=0)
>>> df
   Foo
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
9   10

If we want to find the rows which have a value of 1 we’d write the following:

>>> df[df["Foo"] == 1]
   Foo
0    1

Finding the rows with a value less than 7 is as you’d expect too:

>>> df[df["Foo"] < 7]
   Foo
0    1
1    2
2    3
3    4
4    5
5    6

Next I wanted to keep only the rows containing odd numbers, which I initially tried to do like this:

odds = [i for i in range(1,10) if i % 2 != 0]
>>> odds
[1, 3, 5, 7, 9]
 
>>> df[df["Foo"] in odds]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/pandas/core/generic.py", line 698, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Unfortunately that doesn’t work and I couldn’t get any of the suggestions from the error message to work either. Luckily pandas has a special isin function for this use case which we can call like this:

>>> df[df["Foo"].isin(odds)]
   Foo
0    1
2    3
4    5
6    7
8    9

Much better!
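Why does the plain `in` fail? A list membership test compares the candidate with each element using `==` and then asks for the truth value of the result. With a Series, `==` is element-wise, so the result is a whole column of booleans with no single truth value. The Python 3 sketch below reproduces the mechanism with a toy class; ToySeries is a hypothetical stand-in written for this illustration, not the pandas implementation:

```python
class ToySeries:
    """A tiny stand-in for a pandas Series (not the real class)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Comparison is element-wise and yields another ToySeries of booleans.
        return ToySeries(v == other for v in self.values)

    def __bool__(self):
        # A collection of booleans has no single obvious truth value.
        raise ValueError("The truth value of a ToySeries is ambiguous.")

    def isin(self, candidates):
        # Test each element for membership instead of the object as a whole.
        return ToySeries(v in candidates for v in self.values)


foo = ToySeries(range(1, 11))
odds = [1, 3, 5, 7, 9]

try:
    foo in odds  # the list compares 1 == foo, then calls bool() on the result
    error = None
except ValueError as exc:
    error = str(exc)

mask = foo.isin(odds).values
```

In pandas itself, the boolean mask returned by isin is what goes inside the square brackets; an element-wise condition such as df["Foo"] % 2 != 0 produces the same kind of mask.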

Categories: Blogs

When development resembles the ageing of wine

Xebia Blog - Mon, 02/16/2015 - 21:29

Once upon a time I was asked to help out a software product company.  The management briefing went something like this: "We need you to increase productivity. The guys in development seem to be unable to ship anything! And if they do ship something, it's only a fraction of what we expected".

And so the story begins. Now there are many ways we can improve the team's outcome and its output (the first matters more), but it always starts with observing what they do today and trying to figure out why.

It turns out that requests from the business were treated like a good wine and were allowed to "age" in the oak barrel that was called Jira. Not so much to add flavour in the form of details, requirements, designs, non-functional requirements or acceptance criteria, but mainly to see if the priority of the request would remain stable over a period of time.

In the days that followed I participated in the "Change Control Board" and saw what this meant in practice. Management would change priorities on the fly and make swift decisions on requirements that would take weeks to implement. To stay with the wine metaphor, wine was poured in and out of the barrels at such a rate that it bore more resemblance to a blender than to the art of wine making.

Though management was happy to learn I had unearthed the root cause of their problem, they were less pleased to learn that they themselves were responsible.  The Agile world created the Product Owner role for this, and it turned out that this is a hat that can only be worn by a single person.

Once we funnelled all the requests through a single person, both responsible for the success of the product and for the development, we saw a big change. Not only did the business get a reliable sparring partner, but the development team had a single voice when it came to setting the priorities. Once the team started finishing what they started, we began shipping at regular intervals, with features that we had all committed to.

Of course it did not take away the dynamics of the business, but it allowed us to deliver, and become reliable in how and when we responded to change. Perhaps not the most aged wine, but enough to delight our customers and learn what we should put in our barrel for the next round.

 

Categories: Companies

Managing Flow

TV Agile - Mon, 02/16/2015 - 20:45
This presentation shows the impact that wait time has on when the customer receives the product or service, and how, by focusing on wait time, we can improve the flow of products or services to our customers and significantly reduce the time to delivery. Deliver with higher frequency and better quality with example […]
Categories: Blogs

Agile Misconceptions: There Is One Right Approach

Johanna Rothman - Mon, 02/16/2015 - 17:59

I have an article up on agileconnection.com called Common Misconceptions about Agile: There Is Only One Approach.

If you read my Design Your Agile Project series, you know I am a fan of determining what approach works when for your organization or project.

Please leave comments over there. Thanks!

Two notes:

  1. If you would like to write an article for agileconnection.com, I’m the technical editor. Send me your article and we can go from there.
  2. If you would like more common-sense approaches to agile, sign up for the Influential Agile Leader. We’re leading it in San Francisco and London this year. Early bird pricing ends soon.
Categories: Blogs

What good are story points and velocity in Scrum?

Scrum Breakfast - Mon, 02/16/2015 - 12:10
We use velocity as a measure of how many story points to take into the next sprint. When you take in enough stories, and story points, so that you reach your average velocity, then, you can end the sprint planning meeting. Although this is a common approach, it is exactly how you should not use story points in Scrum. It leads to over-commitment and spillover (started, but unfinished work) at the end of the sprint. Both of these are bad for performance. How should you use story points in planning? How do you create the Forecast? And what do you do if the team runs out of work?

The first thing to remember is that the Development Team is self-organizing. They have exclusive jurisdiction over how much work they take on. The Product Owner has final say over the ordering of items in the backlog, but nobody tells the Development Team how much work to take on! Not the Product Owner, not the ScrumMaster, and certainly not the math!

As a Product Owner, I would use story points to help set medium and long-term expectations on what is really achievable. Wish and probable reality need to be more or less in sync with each other. If the disparity is too big, it's the Product Owner's job to fix the problem, and she has lots of options: less scope, simpler acceptance criteria, more time, more people, pivot, persevere, or even abandon.

As a ScrumMaster, I would use velocity to identify a number of dysfunctions. A wavy burn-down chart is often a symptom of stories that are too big, excessive spillover, or poorly understood acceptance criteria (to name the most likely causes). A flattening burn-down chart is often a sign of technical debt. An accelerating burn-down chart may be a sign of management pressure to perform (story point inflation). A lack of a burn-down or velocity chart may be a sign of flying blind!

As a member of the Development Team, I would use the estimate in story points to help decide whether stories are ready to take into the sprint. An individual story should represent on average 10% or less of the team's capacity.
How to create the Sprint Forecast

How much work should the team take on in a sprint? As ScrumMaster, I would ask the team: can you do the first story? Can you do the first and the second? Can you do the first, the second and the third? Keep asking until the team hesitates. As soon as they hesitate, stop. That is the forecast.

Why should you stop at this point? Taking on more stories will add congestion and slow down the team. Think of the highway at rush hour. Do more cars on the road mean the traffic moves faster? Would be nice.

Why do you even make a forecast? Some projects say, let's just get into a state of flow, and pull work as we are ready to take it. This can work too, but my own experience with that approach has been mixed. It is very easy to lose focus on getting things done and lose the ability to predict what can be done over a longer period of time. So I believe Sprint Forecasts are useful because they help us inspect-and-adapt en route to our longer term goal.

What about "yesterday's weather"? Can we use the results of the last sprint to reality check the forecast for this sprint? Sure! If your team promised 100 but only delivered 70 or less, this is a sign that they should not commit to more than 70, and quite probably less. I call this "throttling", and it is one of my 12 Tips for Product Owners who want better performance from their Scrum Teams. But yesterday's weather is not a target, it's a sanity check. If it becomes your target, it may be holding you down.
What if the team runs out of work?

On the one hand, this is easy. If the team runs out of work, they can just ask the Product Owner for more. A working agreement can streamline this process, for example: Team, if you run out of work, you can:

  • Take the top item from the product backlog.
  • Contact me (the Product Owner) if you get down to just one ready item in the backlog
  • Implement your top priority improvement to our code ("refactoring")

Implementing improvements from the last retrospective is usually a particularly good idea, unless you are very close to a release. There are investments in productivity that will often pay huge dividends, surprisingly quickly!


Categories: Blogs

Want best impact? Change yourself!

Manage Well - Tathagat Varma - Mon, 02/16/2015 - 12:02
A lot of us want to create an impact, especially the kind that comes in B-I-G font size. Change the world. Stop global warming. Establish world peace. Find a cure for cancer. Stop wars. Leave a legacy that lasts forever. We want to conquer the world with our ideas, our creation, our accomplishments.
Categories: Blogs

What happens if you combine and integrate two awesome tools?

tinyPM Team Blog - Mon, 02/16/2015 - 11:17

Have you heard about Slack?
No? So you need to catch up. Slack is an awesome platform for team communication – “everything in one place, instantly searchable, available wherever you go“.

It’s really great! You can easily improve your team communication by creating open channels, projects and topics that the whole team shares. What do we love most about Slack?
- splendid search feature
- easy way to attach pictures and others
- beautiful design
- simplicity

It’s just “everything in one place”!

 

To provide you with essential value of your work
We’ve integrated tinyPM with Slack! Now you can see what happens when two awesome tools work together to bring you everything you need to do a great job!

This is one of those tiny things that help you transform your good team into a great team! Now you can stay up to date with every change made in tinyPM. This integration will post updates to a channel in Slack whenever a story or task activity occurs. How cool is that?

That’s why we’ve decided to integrate tinyPM with Slack: to keep you always aware of what’s going on with your projects. Getting notified about every change will help you manage your team and your projects more efficiently.

 

How to connect your tinyPM with Slack?
It’s quite simple and intuitive. To enable Slack integration with tinyPM, just create an Incoming WebHook in Slack:

  • Open Slack
  • Click on the dropdown next to your team name and select Configure integrations
  • Select Incoming WebHooks
  • Copy the Webhook URL to your clipboard

 

Once you have created the Incoming WebHook in Slack:

  • Open your tinyPM
  • Go to “Application Settings”
  • Click on Slack integration
  • Paste WebHook URL

 

That’s it! As simple as that. Now, working with the tinyPM and Slack integration, you will save time, manage your projects way better than before and communicate more efficiently.

 

Tell us what you think about the tinyPM and Slack integration. We’re looking forward to receiving your feedback.

Categories: Companies

Kanban Thinking Workshop in London

AvailAgility - Karl Scotland - Mon, 02/16/2015 - 11:00

I have another public Kanban Thinking workshop coming up in London (March 5-6), in collaboration with Agil8, and to fill the last few places, I can offer a discount! Book now, using the code KS25 to get 25% off the standard price, and get 2 days of fun, discover how to design a kanban system by populating a kanban canvas, and learn how to make system interventions which have a positive impact.

To whet your appetite, here are a couple of photos from a recent workshop.

Categories: Blogs

Early Bird Ends Soon for Influential Agile Leader

Johanna Rothman - Sun, 02/15/2015 - 22:46

If you are a leader for your agile efforts in your organization, you need to consider participating in The Influential Agile Leader. If you are working on how to transition to agile, how to talk about agile, how to help your peers, managers, or teams, you want to participate.

Gil Broza and I designed it to be experiential and interactive. We’re leading the workshop in San Francisco, Mar 31-Apr 1. We’ll be in London April 14-15.

The early bird pricing ends Feb 20.

People who participate see great results, especially when they bring peers/managers from their organization. Sign up now.

Categories: Blogs

Python/scikit-learn: Calculating TF/IDF on How I met your mother transcripts

Mark Needham - Sun, 02/15/2015 - 17:56

Over the past few weeks I’ve been playing around with various NLP techniques to find interesting insights into How I met your mother from its transcripts and one technique that kept coming up is TF/IDF.

The Wikipedia definition reads like this:

tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.

It is often used as a weighting factor in information retrieval and text mining.

The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
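To make the definition concrete, here is a minimal Python 3 sketch of the textbook formula, tf × log(N / df), on a made-up three-document corpus. The documents and the tf_idf helper are hypothetical, and scikit-learn's vectorizer applies its own smoothing and normalisation, so its numbers will not match this simple version:

```python
import math
from collections import Counter

# A made-up corpus: each document is a list of words.
docs = [
    "ted meets robin".split(),
    "ted talks about olives".split(),
    "robin likes olives and ted likes olives".split(),
]


def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)             # how often the term occurs in this document
    df = sum(1 for d in docs if term in d)         # how many documents contain the term
    idf = math.log(len(docs) / df) if df else 0.0  # rarer across the corpus -> larger weight
    return tf * idf


ted_score = tf_idf("ted", docs[2], docs)        # "ted" is in every document
olives_score = tf_idf("olives", docs[2], docs)  # "olives" is in two of three
print(ted_score, olives_score)
```

A word that appears in every document gets an idf of log(1) = 0, so it scores zero no matter how often it occurs, while a word concentrated in fewer documents keeps a positive weight.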

I wanted to generate a TF/IDF representation of phrases used in the hope that it would reveal some common themes used in the show.

Python’s scikit-learn library gives you two ways to generate the TF/IDF representation:

  1. Generate a matrix of token/phrase counts from a collection of text documents using CountVectorizer and feed it to TfidfTransformer to generate the TF/IDF representation.
  2. Feed the collection of text documents directly to TfidfVectorizer and go straight to the TF/IDF representation skipping the middle man.

I started out using the first approach and hadn’t quite got it working when I realised there was a much easier way!

I have a collection of sentences in a CSV file so the first step is to convert those into a list of documents:

from collections import defaultdict
import csv
 
episodes = defaultdict(list)
with open("data/import/sentences.csv", "r") as sentences_file:
    reader = csv.reader(sentences_file, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        # column 1 holds the episode id, column 4 the sentence text
        episodes[row[1]].append(row[4])

# collapse each episode's sentences into one document string
for episode_id, text in episodes.items():
    episodes[episode_id] = "".join(text)

# order the documents by numeric episode id
corpus = []
for episode_id, episode in sorted(episodes.items(), key=lambda t: int(t[0])):
    corpus.append(episode)

corpus contains 208 entries (1 per episode), each of which is a string containing the transcript of that episode. Next it’s time to train our TF/IDF model which is only a few lines of code:

from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer='word', ngram_range=(1,3), min_df = 0, stop_words = 'english')

The most interesting parameter here is ngram_range – we’re telling it to generate 2 and 3 word phrases along with the single words from the corpus.

e.g. if we had the sentence “Python is cool” we’d end up with 6 phrases – ‘Python’, ‘is’, ‘cool’, ‘Python is’, ‘Python is cool’ and ‘is cool’.
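The phrase generation itself is easy to sketch by hand. The Python 3 function below is a hypothetical helper written for this illustration (it is not scikit-learn code); it slides a window of one, two and then three words over the sentence:

```python
def word_ngrams(sentence, min_n=1, max_n=3):
    """Return every run of min_n..max_n consecutive words in the sentence."""
    words = sentence.split()
    phrases = []
    for n in range(min_n, max_n + 1):
        # A window of n words can start at any position that leaves n words remaining.
        for start in range(len(words) - n + 1):
            phrases.append(" ".join(words[start:start + n]))
    return phrases


phrases = word_ngrams("Python is cool")
print(phrases)
```

Note that the real vectorizer also lowercases the text and, with stop_words = 'english', drops stop words such as "is" before building the phrases, so its output for this particular sentence would be shorter than the six phrases above.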

Let’s execute the model against our corpus:

tfidf_matrix =  tf.fit_transform(corpus)
feature_names = tf.get_feature_names()  # get_feature_names_out() in newer scikit-learn releases
>>> len(feature_names)
498254
 
>>> feature_names[50:70]
[u'00 does sound', u'00 don', u'00 don buy', u'00 dressed', u'00 dressed blond', u'00 drunkenly', u'00 drunkenly slurred', u'00 fair', u'00 fair tonight', u'00 fall', u'00 fall foliage', u'00 far', u'00 far impossible', u'00 fart', u'00 fart sure', u'00 friends', u'00 friends singing', u'00 getting', u'00 getting guys', u'00 god']

So we’ve got nearly 500,000 phrases, and if we look at tfidf_matrix we’d expect it to be a 208 x 498254 matrix, with one row per episode and one column per phrase:

>>> tfidf_matrix
<208x498254 sparse matrix of type '<type 'numpy.float64'>'
	with 740396 stored elements in Compressed Sparse Row format>

This is what we’ve got although under the covers it’s using a sparse representation to save space. Let’s convert the matrix to dense format to explore further and find out why:

dense = tfidf_matrix.todense()
>>> len(dense[0].tolist()[0])
498254

What I’ve printed out here is the size of one row of the matrix which contains the TF/IDF score for every phrase in our corpus for the 1st episode of How I met your mother. A lot of those phrases won’t have happened in the 1st episode so let’s filter those out:

episode = dense[0].tolist()[0]
phrase_scores = [pair for pair in zip(range(0, len(episode)), episode) if pair[1] > 0]
 
>>> len(phrase_scores)
4823

There are just under 5000 phrases used in this episode, roughly 1% of the phrases in the whole corpus.
The sparse matrix makes a bit more sense – if scipy used a dense matrix representation there’d be 493,000 entries with no score which becomes more significant as the number of documents increases.
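The saving is easy to see on a toy row. The Python 3 sketch below stores only the non-zero scores in a dictionary keyed by column; the numbers are made up, and scipy's Compressed Sparse Row format uses arrays of values, column indices and row pointers rather than a dictionary, so this only illustrates the idea:

```python
# A made-up dense row: mostly zeros, like one episode's TF/IDF scores.
dense_row = [0.0, 0.0, 0.26, 0.0, 0.0, 0.0, 0.19, 0.0]

# Sparse form: remember only the columns that actually carry a score.
sparse_row = {col: score for col, score in enumerate(dense_row) if score > 0}


def score_at(col):
    # Anything not stored is implicitly zero.
    return sparse_row.get(col, 0.0)


stored = len(sparse_row)
print(stored, "stored values instead of", len(dense_row))
```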

Next we’ll sort the phrases by score in descending order to find the most interesting phrases for the first episode of How I Met Your Mother:

>>> sorted(phrase_scores, key=lambda t: t[1] * -1)[:5]
[(419207, 0.2625177493269755), (312591, 0.19571419072701732), (267538, 0.15551468983363487), (490429, 0.15227880637176266), (356632, 0.1304175242341549)]

The first value in each tuple is the phrase’s position in our initial vector and also corresponds to the phrase’s position in feature_names which allows us to map the scores back to phrases. Let’s look up a couple of phrases:

>>> feature_names[419207]
u'ted'
>>> feature_names[312591]
u'olives'
>>> feature_names[356632]
u'robin'

Let’s automate that lookup:

sorted_phrase_scores = sorted(phrase_scores, key=lambda t: t[1] * -1)
for phrase, score in [(feature_names[word_id], score) for (word_id, score) in sorted_phrase_scores][:20]:
   print('{0: <20} {1}'.format(phrase, score))
 
ted                  0.262517749327
olives               0.195714190727
marshall             0.155514689834
yasmine              0.152278806372
robin                0.130417524234
barney               0.124411751867
lily                 0.122924977859
signal               0.103793246466
goanna               0.0981379875009
scene                0.0953423604123
cut                  0.0917336653574
narrator             0.0864622981985
flashback            0.078295921554
flashback date       0.0702825260177
ranjit               0.0693927691559
flashback date robin 0.0585687716814
ted yasmine          0.0585687716814
carl                 0.0582101172888
eye patch            0.0543650529797
lebanese             0.0543650529797
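
Sorting the full list just to read off the top few entries is more work than strictly needed. As an alternative sketch, the standard library’s heapq.nlargest returns the same top N without a full sort. The scores below are the five printed earlier, and the small feature_names dict is a stand-in for the real list, holding only the entries needed here.

```python
import heapq

# The five (phrase index, score) pairs printed earlier, deliberately shuffled.
phrase_scores = [
    (356632, 0.1304175242341549),
    (419207, 0.2625177493269755),
    (490429, 0.15227880637176266),
    (312591, 0.19571419072701732),
    (267538, 0.15551468983363487),
]

# Stand-in for the full feature_names list: just the entries used here.
feature_names = {419207: "ted", 312591: "olives", 267538: "marshall",
                 490429: "yasmine", 356632: "robin"}

# Pick the three highest scores without sorting everything.
top = heapq.nlargest(3, phrase_scores, key=lambda pair: pair[1])
top_phrases = [(feature_names[word_id], score) for word_id, score in top]

for phrase, score in top_phrases:
    print("{0: <20} {1}".format(phrase, score))
```

Passing reverse=True to sorted would be another way to avoid the multiply-by-minus-one trick.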

We see all the main characters’ names, which aren’t that interesting (perhaps they should be part of the stop list), but also ‘olives’, since this is the episode where the olive theory is first mentioned. I thought olives came up more often, but a quick search for the term suggests it isn’t mentioned again until Episode 9 in Season 9:

$ grep -rni --color "olives" data/import/sentences.csv | cut -d, -f 2,3,4 | sort | uniq -c
  16 1,1,1
   3 193,9,9

‘yasmine’ is also an interesting phrase in this episode but she’s never mentioned again:

$ grep -h -rni --color "yasmine" data/import/sentences.csv
49:48,1,1,1,"Barney: (Taps a woman names Yasmine) Hi, have you met Ted? (Leaves and watches from a distance)."
50:49,1,1,1,"Ted: (To Yasmine) Hi, I'm Ted."
51:50,1,1,1,Yasmine: Yasmine.
53:52,1,1,1,"Yasmine: Thanks, It's Lebanese."
65:64,1,1,1,"[Cut to the bar, Ted is chatting with Yasmine]"
67:66,1,1,1,Yasmine: So do you think you'll ever get married?
68:67,1,1,1,"Ted: Well maybe eventually. Some fall day. Possibly in Central Park. Simple ceremony, we'll write our own vows. But--eh--no DJ, people will dance. I'm not going to worry about it! Damn it, why did Marshall have to get engaged? (Yasmine laughs) Yeah, nothing hotter than a guy planning out his own imaginary wedding, huh?"
69:68,1,1,1,"Yasmine: Actually, I think it's cute."
79:78,1,1,1,"Lily: You are unbelievable, Marshall. No-(Scene splits in half and shows both Lily and Marshall on top arguing and Ted and Yasmine on the bottom mingling)"
82:81,1,1,1,Ted: (To Yasmine) you wanna go out sometime?
85:84,1,1,1,[Cut to Scene with Ted and Yasmine at bar]
86:85,1,1,1,Yasmine: I'm sorry; Carl's my boyfriend (points to bartender)

It would be interesting to filter out the phrases which don’t occur in any other episode and see what insights we get from doing that. For now though we’ll extract phrases for all episodes and write to CSV so we can explore more easily:
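
One possible shape for that filter, as a sketch only: count how many episodes each phrase appears in and drop the phrases seen in just one. The rows below are toy data in the (EpisodeId, Phrase, Score) layout of the CSV; the scores are made up for illustration and are not real output.

```python
from collections import defaultdict

# Toy rows shaped like the CSV (EpisodeId, Phrase, Score); values are made up.
rows = [
    (1, "olives", 0.20),
    (1, "yasmine", 0.15),
    (1, "ted", 0.26),
    (2, "ted", 0.22),
    (193, "olives", 0.05),
]

# Collect the set of episodes each phrase appears in.
episodes_by_phrase = defaultdict(set)
for episode_id, phrase, _ in rows:
    episodes_by_phrase[phrase].add(episode_id)

# Keep only the rows whose phrase shows up in at least two episodes.
shared = [row for row in rows if len(episodes_by_phrase[row[1]]) > 1]
print(shared)
```

With this toy input ‘yasmine’ is dropped because it only occurs in one episode, while ‘olives’ and ‘ted’ survive.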

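import csv

# Write one (EpisodeId, Phrase, Score) row for every phrase with a non-zero score.
with open("data/import/tfidf_scikit.csv", "w") as file:
    writer = csv.writer(file, delimiter=",")
    writer.writerow(["EpisodeId", "Phrase", "Score"])

    # Each matrix row is an episode; columns line up with feature_names.
    for doc_id, doc in enumerate(tfidf_matrix.todense()):
        print("Document %d" % doc_id)
        for word_id, score in enumerate(doc.tolist()[0]):
            if score > 0:
                word = feature_names[word_id]
                # Episode ids are 1-based; encode() is needed by Python 2's csv module.
                writer.writerow([doc_id + 1, word.encode("utf-8"), score])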

And finally a quick look at the contents of the CSV:

$ tail -n 10 data/import/tfidf_scikit.csv
208,york apparently laughs,0.012174304095213192
208,york aren,0.012174304095213192
208,york aren supposed,0.012174304095213192
208,young,0.013397275854758335
208,young ladies,0.012174304095213192
208,young ladies need,0.012174304095213192
208,young man,0.008437685963000223
208,young man game,0.012174304095213192
208,young stupid,0.011506395106658192
208,young stupid sighs,0.012174304095213192
Categories: Blogs

Diamond Kata - Some Thoughts on Tests as Documentation

Mistaeks I Hav Made - Nat Pryce - Sun, 02/15/2015 - 14:13
Comparing example-based tests and property-based tests for the Diamond Kata, I’m struck by how well property-based tests reduce duplication of test code. For example, in the solutions by Sandro Mancuso and George Dinwiddie, not only do multiple tests exercise the same property with different examples but the tests duplicate assertions. Property-based tests avoid the former by defining generators of input data, but I’m not sure why the latter occurs. Perhaps Seb’s “test recycling” approach would avoid this kind of duplication.

But compared to example-based tests, property-based tests do not work so well as an explanatory overview. Examples convey an overall impression of what the functionality is, but are not good at describing precise details. When reading example-based tests, you have to infer the properties of the code from multiple examples and informal text in identifiers and comments. The property-based tests I wrote for the Diamond Kata specify precise properties of the diamond function, but nowhere is there a test that describes that the function draws a diamond!

There’s a place for both examples and properties. It’s not an either/or decision. However, explanatory examples used for documentation need not be test inputs. If we’re generating inputs for property tests and generating documentation for our software, we can combine the two, and insert generated inputs and calculated outputs into generated documentation.
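
As a rough illustration of that contrast, here is a sketch in plain Python that is not taken from any of the solutions mentioned. The diamond function is my own minimal implementation, and the properties are checked with a simple loop over every letter rather than with a property-testing library. The single example shows at a glance what a diamond looks like; the properties pin down symmetry and size precisely but never show a diamond.

```python
import string


def diamond(letter):
    """Return the diamond for an uppercase letter as a list of text rows."""
    n = string.ascii_uppercase.index(letter)
    top = []
    for i in range(n + 1):
        ch = string.ascii_uppercase[i]
        # Left half of the row, up to and including the centre column.
        left = " " * (n - i) + ch + " " * i
        # Mirror the left half, skipping the shared centre column.
        top.append(left + left[-2::-1])
    # Mirror the top half downwards, skipping the shared middle row.
    return top + top[-2::-1]


# Example-based check: explanatory, but covers a single input.
assert diamond("C") == [
    "  A  ",
    " B B ",
    "C   C",
    " B B ",
    "  A  ",
]

# Property-style checks: precise, and applied to every possible input.
for index, letter in enumerate(string.ascii_uppercase):
    rows = diamond(letter)
    assert len(rows) == 2 * index + 1              # height grows with the letter
    assert rows == rows[::-1]                      # vertical symmetry
    assert all(row == row[::-1] for row in rows)   # horizontal symmetry
```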
Categories: Blogs

Sincere Seekers in Search of True Love

Portia Tung - Selfish Programming - Sat, 02/14/2015 - 22:43

Years ago, I made a wish. A wish that one day, I’d be brave enough and mad enough to take part in the movement that is taking the world by storm, or should I say love? I’m, of course, referring to the Free Hugs Campaign started by one man in an attempt to reconnect with humanity.

I first came across “free hugging” during a visit to Helsinki back in December 2008. It was a bitterly cold winter, the kind that made you worry about losing a toe or two if you spent too long stomping the white pavement on your own.

I was wandering around the city after a jam-packed day of Agile training and who did I find beaming with warm smiles and arms wide open towards me but two young women at the train station?

Incredibly, these two young women were offering free hugs. To anyone and everyone.

A Wish Come True

After 6 long years, this random wish of mine finally came true. On Sunday, 18 January 2015, to my great fear and delight, I was offered the chance to give free hugs to the people frequenting Pimlico (home of Tate Britain) on a chilly winter afternoon.

And in spite of the butterflies in my tummy screaming “No!!! Don’t do it!!!”, I knew my time had come. To connect with the rest of humanity like I’ve never dared to but have always longed to do.

Together with a bunch of well-wishing strangers in search of inner peace, I stomped the pavement and offered free hugs to anyone and everyone.

Between us, we hugged over 80 people in under an hour and didn’t get arrested.

For me, the most remarkable takeaway from that experience is that I learned more about what it means to be human in those 60 minutes than I have in my lifetime so far.

I learned that strangers can be kind and generous. That most of us want nothing more than to connect with one another. That we’re all in search of true love and when we find it, what better way to celebrate it than with a hug?

Happy Valentine’s Day!

Categories: Blogs

The Great Love Quotes Collection Revamped

J.D. Meier's Blog - Sat, 02/14/2015 - 21:30

A while back I put together a comprehensive collection of love quotes.   It’s a combination of the wisdom of the ages + modern sages.   In the spirit of Valentine’s Day, I gave it a good revamp.  Here it is:

The Great Love Quotes Collection

It's a serious collection of love quotes and includes lessons from the likes of Lucille Ball, Shakespeare, Socrates, and even The Princess Bride.

How I Organized the Categories for Love Quotes

I organized the quotes into a set of buckets:
Beauty
Broken Hearts and Loss
Falling in Love
Fear and Love
Fun and Love
Kissing
Love and Life
Significance and Meaning
The Power of Love
True Love

I think there’s a little something for everyone among the various buckets.   If you walk away with three new quotes that make you feel a little lighter, put a little skip in your step, or help you see love in a new light, then mission accomplished.

Think of Love as Warmth and Connection

If you think of love like warmth and connection, you can create more micro-moments of love in your life.

This might not seem like a big deal, but if you knew all the benefits for your heart, brain, bodily processes, and even your life span, you might think twice.

You might be surprised by how much your career can be limited if you don’t balance connection with conviction. It’s not uncommon to hear about turning points in the careers of developers, program managers, IT leaders, and business leaders who changed their game when they changed their heart.

In fact, on one of the teams I was on, the original mantra was “business before technology”, but people in the halls started to say, “people before business, business before technology” to remind people of what makes business go round.

When people treat each other better, work and life get better.

Love Quotes Help with Insights and Actions

Here are a few of my favorite love quotes from the collection …

“Love is like heaven, but it can hurt like hell.” – Unknown

“Love is not a feeling, it’s an ability.” — Dan in Real Life

“There is a place you can touch a woman that will drive her crazy. Her heart.” — Milk Money

“Hearts will be practical only when they are made unbreakable.”  – The Wizard of Oz

“Things are beautiful if you love them.” – Jean Anouilh

“Life is messy. Love is messier.” – Catch and Release

“To the world you may be just one person, but to one person you may be the world.” – Unknown

For many more quotes, explore The Great Love Quotes Collection.

You Might Also Like

Happiness Quotes Revamped

My Story of Personal Transformation

The Great Leadership Quotes Collection Revamped

The Great Personal Development Quotes Collection Revamped

The Great Productivity Quotes Collection

Categories: Blogs

Changing Behavior by Asking the Right Questions

George Dinwiddie’s blog - Sat, 02/14/2015 - 03:16

My article, Agile Adoption: Changing Behavior by Asking the Right Questions, has been published over on ProjectManagement.com (free registration required). It talks about when managers want change, but don’t want to squeeze the Agile out by force.

Categories: Blogs

Knowledge Sharing


SpiraTeam is an agile application lifecycle management (ALM) system designed specifically for methodologies such as Scrum, XP and Kanban.