Skip to content

Feed aggregator

Do What Works… Even If It’s Not Agile.

Leading Agile - Mike Cottmeyer - Mon, 06/29/2015 - 22:19

I think I’ve come to the conclusion that ‘agile’ as we know it isn’t the best starting place for everyone that wants to adopt agile. Some folks, sure… everyone, probably not.

For many companies something closer to a ‘team-based, iterative and incremental delivery approach, using some agile tools and techniques, wrapped within a highly-governed Lean/Kanban based program and portfolio management framework’ is actually a better place to start.

Why?

Well, many organizations really struggle forming complete cross-functional teams, building backlogs, producing working tested software on regular intervals, and breaking dependencies. In the absence of these, agile is pretty much impossible.

Scrum isn’t impossible, mind you.

Agile is impossible.

So how does a ‘team-based, iterative and incremental delivery approach, using some agile tools and techniques, wrapped within a highly-governed Lean/Kanban based program and portfolio management framework’ actually work?

Let me explain.

First, I want to form Scrum-like teams around as much of the delivery organization as I can. I’ll form teams around shared components, feature teams, services teams, etc. Ideally, I’d like to form teams around objects that would be end up being real Scrum teams in some future state.

These Scrum-like teams operate under the same rules as a normal Scrum team, they are complete and cross-functional, internally self-organizing, same roles, ceremonies, and artifacts as a normal Scrum team, but with much higher focus on stabilizing velocity and less on adaptation.

Why ‘Scrum-like’ teams?

Dependencies. #dependenciesareevil

These teams have business process and requirements dependencies all around them. They have architectural dependencies between them. They have organizational dependencies due to the current management structure and likely some degree of matrixing.

Until those dependencies are broken, it’s tough to operate as a small independent Scrum team that can inspect and adapt their way into success. Those dependencies have to be managed until they can be broken. We can’t pretend they aren’t there.

How do I manage dependencies?

This is where the ‘Lean/Kanban’ based program and portfolio governance comes in. Explicitly model the value stream from requirements identification all the way through delivery. Anyone that can’t be on a Scrum team gets modeled in this value stream.

We like to form small, dedicated, cross-functional teams (explicitly not Scrum teams) to decompose requirements, deal with cross-cutting concerns, flow work into the Scrum-like teams, and coordinate batch size along with all the upstream and downstream coordination.

Early on, we might be doing 3-6-9 even 12-18 month roadmapping, creating 3-6 month feature level release plans, and fine grained, risk adjusted release backlogs at the story level. The goal is to nail quarterly commitments and to start driving visibility for longer term planning.

Not agile?

Don’t really care, this is a great first step toward untangling a legacy organization that is struggling to get a foot hold adopting agile. Many companies we work with, this is agile enough, but ideally this is only the first step toward greater levels of agile maturity.

How do I increase maturity?

Goal #1 was to stabilize the system and build trust with the organization. This isn’t us against them, it’s not management against the people, it’s working within the constraints of the existing system to get better business results… and fast.

Over time, you want to continue working to reduce batch size at the enterprise level, you want to progressively reduce dependencies between teams, you want to start funding teams and business capabilities rather than projects, you want to invest to learn.

Lofty goals, huh?

That said, those are lofty goals for an organization that can’t form a team, build a backlog, or produce working tested software every few weeks. Those are lofty goals for an organization that is so mired in dependencies they can’t move, let alone self-organize.

Our belief is that we are past the early adopters. We are past the small project teams in big companies. We are past simply telling people to self-organize and inspect and adapt. We need a way to crack companies apart, to systematically refactor them into agile organizations.

Once we have the foundational structures, principles, guidelines in place, and a sufficient threshold of people is bought into the new system and understands how it operates, then we can start letting go, deprecating control structures, and really living the promise of agile.

The post Do What Works… Even If It’s Not Agile. appeared first on LeadingAgile.

Categories: Blogs

Book Review: Managing Humans

thekua.com@work - Mon, 06/29/2015 - 21:31

I remember hearing about Managing Humans several years ago but I only got around to buying it and getting through reading it.

Managing Humans

It is written by the well-known Michael Lopp otherwise known as Rands, who blogs at Rands and Repose.

The title is a clever take on working in software development and Rands shares his experiences working as a technical manager in various companies through his very unique perspective and writing style. If you follow his blog, you can see it shine through in the way that he tells stories, the way that he creates names around stereotypes and situations you might find yourself in the role of a Technical Manager.

He offers lots of useful advice that covers a wide variety of topics such as tips for interviewing, resigning, making meetings more effective, dealing with specific types of characters that are useful regardless of whether or not you are a Technical Manager or not.

He also covers a wider breath of topics such as handling conflict, tips for hiring, motivation and managing upwards (the last particularly necessary in large corporations). I felt like some of the topics felt outside the topic of “Managing Humans” and the intended target audience of a Technical Manager such as tips for resigning (yourself, not handling it from your team) and joining a start up.

His stories describe the people he has worked with and situations he has worked in. A lot of it will probably resonate very well with you if you have worked, or work in large software development firm or a “Borland” of our time.

The book is easy to digest in chunks, and with clear titles, is easy to pick up at different intervals or going back for future reference. The book is less about a single message, than a series of essays that offer another valuable insight into working with people in the software industry.

Categories: Blogs

Dealing with People You Can’t Stand

J.D. Meier's Blog - Mon, 06/29/2015 - 16:46

“If You Want To Go Fast, Go Alone. If You Want To Go Far, Go Together” – African Proverb

I blew the dust off some olds posts to rekindle some of the most important information for work and life.

It’s about dealing with people you can’t stand.

Whether you think of them as jerks, bullies, or just difficult people, the better you can deal with difficult people, the better you can get things done and make things happen.

And the more you learn how to bring out the best, in people at their worst, the less you’ll find people you can’t stand.

How To Bring Out the Best in People at Their Worst (Including Yourself)

Everything I needed to learn about dealing with difficult people, I learned from the book Dealing with People You Can’t Stand: How to Bring Out the Best in People at Their Worst, by Dr. Rick Brinkman and Dr. Rick Kirschner.

It’s one of the most brilliant, thoughtful books I’ve ever read on interpersonal skills and dealing with all sorts of bad behaviors.

The real key to dealing with difficult behavior is more than just recognizing bad behaviors in other people.

It’s recognizing bad behaviors in yourself, the kind that contribute to and amplify other people’s bad behaviors.

The more you know, the more you grow, and this is truly one of those transformational books.

Learn How To Deal with Difficult People (and Gain Some Mad Interpersonal Skills)

I’ve completely re-written my pot that provides an overview of the big ideas in Dealing with People You Can’t Stand:

Dealing with People You Can’t Stand

Even better, I’ve re-written all of my posts that talk through the 10 Types of Difficult People, and what to do about them.

I have to warn you:  Once you learn the 10 Types of Difficult People, you’ll be using the labels to classify bad behaviors that you experience in the halls, in meetings, behind your back, etc.

With that in mind, here they are …

10 Types of Difficult People

Here are the 10 Types of Difficult People at a glance:

  1. Grenade Person – After a brief period of calm, the Grenade person explodes into unfocused ranting and raving about things that have nothing to do with the present circumstances.
  2. Know-It-Alls – Seldom in doubt, the Know-It-All person has a low tolerance for correction and contradiction. If something goes wrong, however, the Know-It-All will speak with the same authority about who’s to blame – you!
  3. Maybe Person – In a moment of decision, the Maybe Person procrastinates in the hope that a better choice will present itself.
  4. No Person – A No Person kills momentum and creates friction for you. More deadly to morale than a speeding bullet, more powerful than hope, able to defeat big ideas with a single syllable.
  5. Nothing Person – A Nothing Person doesn’t contribute to the conversation. No verbal feedback, no nonverbal feedback, Nothing. What else could you expect from … the Nothing Person.
  6. Snipers – Whether through rude comments, biting sarcasm, or a well-timed roll of the eyes, making you look foolish is the Sniper’s specialty.
  7. Tanks – The Tank is confrontational, pointed and angry, the ultimate in pushy and aggressive behavior
  8. Think-They-Know-It-Alls – Think-They-Know-It-All people can’t fool all the people all the time, but they can fool some of the people enough of the time, and enough of the people all of the time – all for the sake of getting some attention.
  9. Whiners – Whiners feel helpless and overwhelmed by an unfair world. Their standard is perfection, and no one and nothing measures up to it.
  10. Yes Person – In an effort to please people and avoid confrontation, Yes People say “yes” without thinking things through.

I warned you.  Are you already thinking about some Snipers in a few meetings that you have, or is there a Yes Person driving you nuts (or are you that Yes Person?)

Have you talked to a Think-They-Know-It-All lately, or worse, a Know-It—All?

Never fear, I’ve included actionable insights and recommendations for dealing with all the various bad behaviors you’ll encounter.

The Lens of Human Understanding

If all this talk about dealing with difficult people, and having silly labels seems like a gimmick, it’s not.  It’s actually deep insight rooted in a powerful, but simple framework that Dr. Rick Brinkman and Dr. Rick Kirschner refer to as the Lens of Human Understanding:

The Lens of Human Understanding

Once I learned The Lens of Human Understanding, so many things fell into place.

Not only did I understand myself better, but I could instantly see what was driving other people, and how my behavior would either create more conflict or resolve it.

But when you don’t know what makes people tick, it’s very easy to get ticked off, or to tick them off.

Here’s looking at you … and other people … and their behaviors … in a brand new way.

You Might Also Like

25 Books the Most Successful Microsoft Leaders Read and Do

Interpersonal Skills Books

Personal Development Hub on Sources of Insight

Personal Development Resources at Sources of Insight

The Great Leadership Quotes Collection

Categories: Blogs

Can You Replace User Stories with Use Cases?

Scrum Expert - Mon, 06/29/2015 - 15:55
Agile requirements are a key success factor for Scrum projects. Many people criticize the minimalist format of user stories, often forgetting that they are mainly a support for a conversation and don’t have the objective to fully document requirements. In this article, Paul Raymond discusses how classical use cases can be use to expand user stories during requirements elicitation in Scrum sprints. Paul Raymond, Inflectra Corporation, http://www.inflectra.com/ User Stories are often characterized by relatively short, uncomplicated and informal descriptions, whereas Use Cases are often longer, more formally structured descriptions of not only ...
Categories: Communities

R: Speeding up the Wimbledon scraping job

Mark Needham - Mon, 06/29/2015 - 07:36

Over the past few days I’ve written a few blog posts about a Wimbledon data set I’ve been building and after running the scripts a few times I noticed that it was taking much longer to run that I expected.

To recap, I started out with the following function which takes in a URI and returns a data frame containing a row for each match:

library(rvest)
library(dplyr)
 
scrape_matches1 = function(uri) {
  matches = data.frame()
 
  s = html(uri)
  rows = s %>% html_nodes("div#scoresResultsContent tr")
  i = 0
  for(row in rows) {  
    players = row %>% html_nodes("td.day-table-name a")
    seedings = row %>% html_nodes("td.day-table-seed")
    score = row %>% html_node("td.day-table-score a")
    flags = row %>% html_nodes("td.day-table-flag img")
 
    if(!is.null(score)) {
      player1 = players[1] %>% html_text() %>% str_trim()
      seeding1 = ifelse(!is.na(seedings[1]), seedings[1] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag1 = flags[1] %>% html_attr("alt")
 
      player2 = players[2] %>% html_text() %>% str_trim()
      seeding2 = ifelse(!is.na(seedings[2]), seedings[2] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag2 = flags[2] %>% html_attr("alt")
 
      matches = rbind(data.frame(winner = player1, 
                                 winner_seeding = seeding1, 
                                 winner_flag = flag1,
                                 loser = player2, 
                                 loser_seeding = seeding2,
                                 loser_flag = flag2,
                                 score = score %>% html_text() %>% str_trim(),
                                 round = round), matches)      
    } else {
      round = row %>% html_node("th") %>% html_text()
    }
  } 
  return(matches)
}

Let’s run it to get an idea of the data that it returns:

matches1 = scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results")
 
> matches1 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals
2   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
3 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
6  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
7  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals

As I mentioned, it’s quite slow but I thought I’d wrap it in system.time so I could see exactly how long it was taking:

> system.time(scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results"))
   user  system elapsed 
 25.570   0.111  31.416

About 30 seconds! The first thing I tried was downloading the file separately and running the function against the local file:

> system.time(scrape_matches1("data/raw/2014.html"))
   user  system elapsed 
 25.662   0.123  25.863

Hmmm, that’s only saved us 5 seconds so the bottleneck must be somewhere else. Still there’s no point making a HTTP request every time we run the script so we’ll stick with the local file version.

While browsing rvest’s vignette I noticed a function called html_table which I was curious about. I decided to try and replace some of my code with a call to that:

matches2= html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round)
 
> matches2 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding           loser loser_seeding            score          round
1  Novak Djokovic            (1)   Roger Federer           (4) 677 64 764 57 64         Finals
2  Novak Djokovic            (1) Grigor Dimitrov          (11)    64 36 762 767    Semi-Finals
3   Roger Federer            (4)    Milos Raonic           (8)         64 64 64    Semi-Finals
4  Novak Djokovic            (1)     Marin Cilic          (26)  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)     Andy Murray           (3)        61 764 62 Quarter-Finals
6   Roger Federer            (4)   Stan Wawrinka           (5)     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)    Nick Kyrgios          (WC)    674 62 64 764 Quarter-Finals

I had to do some slightly clever stuff to get the ’round’ column into shape using zoo’s na.locf function which I wrote about previously.

Unfortunately I couldn’t work out how to extract the flag with this version – that value is hidden in the ‘alt’ tag of an img and presumably html_table is just grabbing the text value of each cell. This version is much quicker though!

system.time(html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round))
 
   user  system elapsed 
  0.545   0.002   0.548

What I realised from writing this version is that I need to match all the columns with one call to html_nodes rather than getting the row and then each column in a loop.

I rewrote the function to do that:

scrape_matches3 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  matches3 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim())
  )
  return(matches3)
}

Let’s run and time that to check we’re getting back the right results in a timely manner:

> matches3 %>% sample_n(10)
                   winner winner_seeding winner_flag               loser loser_seeding loser_flag         score
70           David Ferrer            (7)         ESP Pablo Carreno Busta                      ESP  60 673 61 61
128        Alex Kuznetsov           (26)         USA         Tim Smyczek           (3)        USA   46 63 63 63
220   Rogerio Dutra Silva                        BRA   Kristijan Mesaros                      CRO         62 63
83         Kevin Anderson           (20)         RSA        Aljaz Bedene          (LL)        GBR      63 75 62
73          Kei Nishikori           (10)         JPN   Kenny De Schepper                      FRA     64 765 75
56  Roberto Bautista Agut           (27)         ESP         Jan Hernych           (Q)        CZE   75 46 62 62
138            Ante Pavic                        CRO        Marc Gicquel          (29)        FRA  46 63 765 64
174             Tim Puetz                        GER     Ruben Bemelmans                      BEL         64 62
103        Lleyton Hewitt                        AUS   Michal Przysiezny                      POL 62 6714 61 64
35          Roger Federer            (4)         SUI       Gilles Muller           (Q)        LUX      63 75 63
 
> system.time(scrape_matches3("data/raw/2014.html"))
   user  system elapsed 
  0.815   0.006   0.827

It’s still quick – a bit slower than html_table but we can deal with that. As you can see, I also had to add some logic to separate the values for the winners and losers – the players, seeds, flags come back as as one big list. The odd rows represent the winner; the even rows the loser.

Annoyingly we’ve now lost the ’round’ column because that appears as a table heading so we can’t extract it the same way. I ended up cheating a bit to get it to work by working out how many matches each round should contain and generated a vector with that number of entries:

raw_rounds = s %>% html_nodes("th") %>% html_text()
 
> raw_rounds
 [1] "Finals"               "Semi-Finals"          "Quarter-Finals"       "Round of 16"          "Round of 32"         
 [6] "Round of 64"          "Round of 128"         "3rd Round Qualifying" "2nd Round Qualifying" "1st Round Qualifying"
 
rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
            sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
> rounds[1:10]
 [1] "Finals"         "Semi-Finals"    "Semi-Finals"    "Quarter-Finals" "Quarter-Finals" "Quarter-Finals" "Quarter-Finals"
 [8] "Round of 16"    "Round of 16"    "Round of 16"

Let’s put that code into the function and see if we end up with the same resulting data frame:

scrape_matches4 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  raw_rounds = s %>% html_nodes("th") %>% html_text()
  rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
              sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
  matches4 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim()),
    round          = rounds
  )
  return(matches4)
}
 
matches4 = scrape_matches4("data/raw/2014.html")
 
> matches4 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals
2  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
3   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
6   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals

We shouldn’t have added much to the time but let’s check:

> system.time(scrape_matches4("data/raw/2014.html"))
   user  system elapsed 
  0.816   0.004   0.824

Sweet. We’ve saved ourselves 29 seconds per page as long as the number of rounds stayed constant over the years. For the 10 years that I’ve looked at it has but I expect if you go back further the draw sizes will have been different and our script would break.

For now though this will do!

Categories: Blogs

R: dplyr – Update rows with earlier/previous rows values

Mark Needham - Mon, 06/29/2015 - 00:30

Recently I had a data frame which contained a column which had mostly empty values:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA))
  col1 col2
1    1    a
2    2 <NA>
3    3 <NA>
4    4    b
5    5 <NA>

I wanted to fill in the NA values with the last non NA value from that column. So I want the data frame to look like this:

1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

I spent ages searching around before I came across the na.locf function in the zoo library which does the job:

library(zoo)
library(dplyr)
 
> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA)) %>% 
    do(na.locf(.))
  col1 col2
1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

This will fill in the missing values for every column, so if we had a third column with missing values it would populate those too:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    do(na.locf(.))
 
  col1 col2 col3
1    1    a    A
2    2    a    A
3    3    a    B
4    4    b    B
5    5    b    B

If we only want to populate ‘col2′ and leave ‘col3′ as it is we can apply the function specifically to that column:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    mutate(col2 = na.locf(col2))
  col1 col2 col3
1    1    a    A
2    2    a <NA>
3    3    a    B
4    4    b <NA>
5    5    b <NA>

It’s quite a neat function and certainly comes in helpful when cleaning up data sets which don’t tend to be as uniform as you’d hope!

Categories: Blogs

An Illustrated Guide To Client Projects: Requirements vs Reality

Derick Bailey - new ThoughtStream - Sun, 06/28/2015 - 03:33

What the client claims the need vs what the end up getting are often two very different things. This isn’t always a bad thing, but it certainly can be. To that end, I wanted to illustrate what the reality of a client project often is vs what the client claimed it should have been.

How The Client Described The Project

In the client’s words, everything is an absolute must requirement. There are no features they cannot live without (in spite of them currently running their business without most of the “required” features).

Project requirements

The project is large, complex, expensive, and requires significant engineering effort.

How The Client Budgeted For The Project

Unless you are working with a client that has a history of using outsourced I.T. / development / design services, there is usually a very large gap between what the client claims they must have vs what they budgeted to get. The budget is usually enough to get the general shape of the requirements, but will be missing a lot of the fine detail.

Client budget

How The Project Was Delivered

Inevitably, there will be problems along the way. Technical issues will pop up. Interaction with the client will stall as they become busy. Questions will be left unanswered, and oh by the way, they won’t pay for “whistles and bells” or “design”. It just needs to be simple and work. With the constant battles that are fought, the project typically lacks important details even though it does solve the core problem.

Project delivery

What The Client Actually Needs

When it’s all said and done, the client might be reaching for this – with you, right there with them – as the answer to their actual problems.

What the client needs

Are You Sure You Need That Client?

Be careful who you take on as a client. It’s tempting to think that you can’t say no because you need the money, or frankly because you just don’t think you can ever say no. But you need to learn to say no and be picky about the clients you take. 

The beginning of a relationship with any client will set the tone. Are they argumentative, over-haggling and nit-picky about inane details? Or do they enjoy the conversation, speak openly and honestly, and look at you as a partner in the endeavor?

Your time and talent are worth more than you think. Be sure you are treated with the respect you deserve, and always treat the client with the respect that you wish you were getting.

Categories: Blogs

R: Command line – Error in GenericTranslator$new : could not find function “loadMethod”

Mark Needham - Sun, 06/28/2015 - 00:47

I’ve been reading Text Processing with Ruby over the last week or so and one of the ideas the author describes is setting up your scripts so you can run them directly from the command line.

I wanted to do this with my Wimbledon R script and wrote the following script which uses the ‘Rscript’ executable so that R doesn’t launch in interactive mode:

wimbledon

#!/usr/bin/env Rscript
 
library(rvest)
library(dplyr)
library(stringr)
library(readr)
 
# stuff

Then I tried to run it:

$ time ./wimbledon
 
...
 
Error in GenericTranslator$new : could not find function "loadMethod"
Calls: write.csv ... html_extract_n -> <Anonymous> -> Map -> mapply -> <Anonymous> -> $
Execution halted
 
real	0m1.431s
user	0m1.127s
sys	0m0.078s

As the error suggests, the script fails when trying to write to a CSV file – it looks like Rscript doesn’t load in something from the core library that we need. It turns out adding the following line to our script is all we need:

library(methods)

So we end up with this:

#!/usr/bin/env Rscript
 
library(methods)
library(rvest)
library(dplyr)
library(stringr)
library(readr)

And when we run that all is well!

Categories: Blogs

R: dplyr – squashing multiple rows per group into one

Mark Needham - Sun, 06/28/2015 - 00:36

I spent a bit of the day working on my Wimbledon data set and the next thing I explored is all the people that have beaten Andy Murray in the tournament.

The following dplyr query gives us the names of those people and the year the match took place:

library(dplyr)
 
> main_matches %>% filter(loser == "Andy Murray") %>% select(winner, year)
 
            winner year
1  Grigor Dimitrov 2014
2    Roger Federer 2012
3     Rafael Nadal 2011
4     Rafael Nadal 2010
5     Andy Roddick 2009
6     Rafael Nadal 2008
7 Marcos Baghdatis 2006
8 David Nalbandian 2005

As you can see, Rafael Nadal shows up multiple times. I wanted to get one row per player and list all the years in a single column.

This was my initial attempt:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year))
Source: local data frame [6 x 2]
 
            winner years
1     Andy Roddick  2009
2 David Nalbandian  2005
3  Grigor Dimitrov  2014
4 Marcos Baghdatis  2006
5     Rafael Nadal  2011
6    Roger Federer  2012

Unfortunately it just gives you the last matching row per group which isn’t quite what we want.. I realised my mistake while trying to pass a vector into paste and noticing that a vector came back when I’d expected a string:

> paste(c(2008,2009,2010))
[1] "2008" "2009" "2010"

The missing argument was ‘collapse’ – something I’d come across when using plyr last year:

> paste(c(2008,2009,2010), collapse=", ")
[1] "2008, 2009, 2010"

Now, if we apply that to our original function:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year, collapse=", "))
Source: local data frame [6 x 2]
 
            winner            years
1     Andy Roddick             2009
2 David Nalbandian             2005
3  Grigor Dimitrov             2014
4 Marcos Baghdatis             2006
5     Rafael Nadal 2011, 2010, 2008
6    Roger Federer             2012

That’s exactly what we want. Let’s tidy that up a bit:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% arrange(year) %>%
     summarise(years  = paste(year, collapse =","), times = length(year))  %>%
     arrange(desc(times), years)
Source: local data frame [6 x 3]
 
            winner          years times
1     Rafael Nadal 2008,2010,2011     3
2 David Nalbandian           2005     1
3 Marcos Baghdatis           2006     1
4     Andy Roddick           2009     1
5    Roger Federer           2012     1
6  Grigor Dimitrov           2014     1
Categories: Blogs

Link: Slashdot on Mob Programming

Learn more about our Scrum and Agile training sessions on WorldMindware.com

Slashdot has a post on mob programming that, as usual, has brought out the poorly socialized extreme introverts and their invective. There are always interesting and insightful comments as well.  I recommend checking it out.  The post links to some studies about mob programming that are also interesting.

Try out our Virtual Scrum Coach with the Scrum Team Assessment tool - just $500 for a team to get targeted advice and great how-to informationPlease share!
facebooktwittergoogle_plusredditpinterestlinkedinmail

The post Link: Slashdot on Mob Programming appeared first on Agile Advice.

Categories: Blogs

Scala development with GitHub's Atom editor

Xebia Blog - Sat, 06/27/2015 - 15:57
.code { font-family: monospace; background-color: #eeeeee; }

GitHub recently released version 1.0 of their Atom editor. This post gives a rough overview of its Scala support.

Basic features

Basic features such as Scala syntax highlighting are provided by the language-scala plugin.

Some work on worksheets as found in e.g. Eclipse has been done in the scala-worksheet-plus plugin, but this is still missing major features and not very useful at this time.

Navigation and completion Ctags

Atom supports basic 'Go to Declaration' (ctrl-alt-down) and 'Search symbol' (cmd-shift-r) support by way of the default ctags-based symbols-view.

While there are multiple sbt plugins for generating ctags, the easiest seems to be to have Ensime download the sources (more on that below) and invoke ctags manually: put this configuration in your home directory and run the 'ctags' command from your project root.

This is useful for searching for symbols, but limited for finding declarations: for example, when checking the declaration for Success, ctags doesn't know whether this is scala.util.Success, akka.actor.Status.Success, spray.http.StatusCodes.Success or some other 3rd-party or local symbol with that name.

Ensime

This is where the Ensime plugin comes in.

Ensime is a service for Scala IDE support, originally written for the Scala support in Emacs. The project metadata for Ensime can be generated with 'sbt gen-ensime' from the ensime-sbt sbt plugin.

Usage

Start the Ensime server from Atom with 'cmd-shift-p' 'Ensime: start'. After a small pause the status bar proclaims 'Indexer ready!' and you should be good to go.

At this point the main features are 'jump to definition' (alt-click), hover for type info, and auto-completion:

atom.io ensime completion

There are some rough edges, but this is a promising start based on a solid foundation.

Conclusions

While Atom is already a pleasant, modern, open source, cross platform editor, it is clearly still early days.

The Scala support in Atom is not yet as polished as in IDE's such as IntelliJ IDEA or as stable as in more mature editors such as Sublime Text, but is already practically useful and has serious potential. Startup is not instant, but I did not notice a 'sluggish feel' as reported by earlier reviewers.

Feel free to share your experiences in the comments, I will keep this post updated as the tools - and our experience with them - evolve.

Categories: Companies

R: ggplot – Show discrete scale even with no value

Mark Needham - Sat, 06/27/2015 - 00:48

As I mentioned in a previous blog post, I’ve been scraping data for the Wimbledon tennis tournament, and having got the data for the last ten years I wrote a query using dplyr to find out how players did each year over that period.

I ended up with the following functions to filter my data frame of all the mataches:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
    return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}
 
player_performance = function(name, matches) {
  player = data.frame()
  for(y in 2005:2014) {
    round = round_reached(name, filter(matches, year == y))
    if(length(round) == 1) {
      player = rbind(player, data.frame(year = y, round = round))      
    } else {
      player = rbind(player, data.frame(year = y, round = "Did not enter"))
    } 
  }
  return(player)
}

When we call that function we see the following output:

> player_performance("Andy Murray", main_matches)
   year          round
1  2005    Round of 32
2  2006    Round of 16
3  2007  Did not enter
4  2008 Quarter-Finals
5  2009    Semi-Finals
6  2010    Semi-Finals
7  2011    Semi-Finals
8  2012         Finals
9  2013         Winner
10 2014 Quarter-Finals

I wanted to create a chart showing Murray’s progress over the years with the round reached on the y axis and the year on the x axis. In order to do this I had to make sure the ’round’ column was being treated as a factor variable:

df = player_performance("Andy Murray", main_matches)
 
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
df$round = factor(df$round, levels =  rounds)
 
> df$round
 [1] Round of 32    Round of 16    Did not enter  Quarter-Finals Semi-Finals    Semi-Finals    Semi-Finals   
 [8] Finals         Winner         Quarter-Finals
Levels: Did not enter Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals Winner

Now that we’ve got that we can plot his progress:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds)

2015 06 26 23 37 32

This is a good start but we’ve lost the rounds which don’t have a corresponding entry on the x axis. I’d like to keep them so it’s easier to compare the performance of different players.

It turns out that all we need to do is pass ‘drop = FALSE’ to scale_y_discrete and it will work exactly as we want:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop = FALSE)

2015 06 26 23 41 01

Neat. Now let’s have a look at the performances of some of the other top players:

draw_chart = function(player, main_matches){
  df = player_performance(player, main_matches)
  df$round = factor(df$round, levels =  rounds)
 
  ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop=FALSE) + 
    ggtitle(player) + 
    theme(axis.text.x=element_text(angle=90, hjust=1))
}
 
a = draw_chart("Andy Murray", main_matches)
b = draw_chart("Novak Djokovic", main_matches)
c = draw_chart("Rafael Nadal", main_matches)
d = draw_chart("Roger Federer", main_matches)
 
library(gridExtra)
grid.arrange(a,b,c,d, ncol=2)

2015 06 26 23 46 15

And that’s all for now!

Categories: Blogs

Creative Collaboration

Doc On Dev - Michael Norton - Fri, 06/26/2015 - 20:17
I had the pleasure of presenting at NDC Oslo last week and the additional privilege of co-presenting a collaboration workshop along with Denise Jacobs and Carl Smith.

Creative Collaboration: Tools for Teams from Doc Norton
In this workshop, we cover Fist to Five voting, 5x7 Prioritization, and Collaboration Contracts. We had around 30 attendees for the workshop, allowing us to create 4 groups of approximately 8 people each.

After some ice-breakers, groups came up with product ideas by mashing two random words together and using first to five voting to rapidly identify a product idea they could all agree on. This was easier for some groups than others. It was interesting to see the dynamics as some groups discussed each combination prior to voting, some groups created multiple options before voting, and other groups ripped through options and found their product in a manner of minutes (as intended). It is often difficult for us to give up old habits even in pursuit of a better way.

Next up was brainstorming and prioritizing a list of items that needed to be done in order to launch our new awesome concept at a key conference in only three months. We started with each individual member writing at least two items they thought were critically important to prepare for the conference. We then removed duplicate items for each group and used 5x7 prioritization to come up with the top most important items for each group. At the end of the process, teams agreed that the resultant priorities were good and many were surprised at how easy and equitable the process was.

Finally, each group took their top 4 items and ran collaboration contracts against them. We did this in two passes; running the basic contract and resolving conflicts. We had one group that ended up with no conflicts. The other groups worked through their conflicts in relatively short order and the quality of conversation was high throughout. One group realized that even after they resolved the obvious conflicts, they had one individual who was in a decision making role on all four items. While this is not technically a conflict on a contract, it does indicate an issue. After some additional discussion, they were able to adjust the overall contract to everyone's satisfaction and eliminate the potential bottleneck.

This was our first time delivering this workshop and I thought it went quite well.

I'm planning to add Parallel Thinking to the workshop along with a couple more games to create a solid half-day collaboration tools workshop that can work for teams or groups.

If you're interested in this workshop for your team, let me know. Maybe, if we're lucky, Denise and Carl can come along too.

Categories: Blogs

To be a Profession or to Unionize in the Software Industry?

Agile Complexification Inverter - Fri, 06/26/2015 - 18:49

Which form of industry growth would you prefer - why?

Which path leads toward the culture you desire in a software development organization?

This is a wonderful article on the topic - read it and discuss with your colleagues.


Programmers don’t need a union. We need a profession.  BY 
"Unions work best for commodity labor, and I use that term non-pejoratively. Commodity work is easily measurable and people can often be individually evaluated for performance. For example, a fishing boat operator is measured according to the quantity of fish she procures. A lot of very important work is commodity labor, so I don’t intend to disparage anyone by using that term. Commodity work can be unionized because there aren’t large and often intangible discrepancies in quality of output, and collective bargaining is often the best way to ensure that the workers are fairly compensated for the value they produce. Software is not commodity work, however. It’s difficult to measure quality, and the field is so specialized that engineers are not remotely interchangeable. When the work is difficult to measure and large disparities of quality exist, you have a situation in which a certain less-egalitarian (in the sense of allowing top performers to receive high compensation, because it’s essential to encourage people to improve themselves) and more self-regulatory structure is required: a profession.""A profession is an attempt to impose global structure over a category of specialized, cognitively intensive work where the quality of output has substantial ramifications, but is difficult (if not, in the short term, impossible) to measure, giving ethics and competence primary importance. A profession is needed when it’s clear that not everyone can perform the work well, especially without specialized training. Here are some traits that signify the existence of a profession."1) Ethical obligations that supersede managerial authority.
2) Weak power relationships.
3) Continuing improvement and self-direction as requirements.
4) Allowance for self-direction.
5) Except in egregious cases, an agreement between employee and firm to serve each others’ interests, even after employment ends. 



Categories: Blogs

What Creates Trust in Your Organization?

Johanna Rothman - Fri, 06/26/2015 - 17:16

I published my most recent newsletter, Creating Trustworthy Estimates, this past week. I also noted on Twitter that one person said his estimates created trust in his organization. (He was responding to a #noestimate post that I had retweeted.)

Sometimes, estimates do create trust. They provide a comfortable feeling to many people that you have an idea of what size this beast is. That’s why I offer solutions for a gross estimate in Predicting the Unpredictable. I have nothing against gross estimates.

I don’t like gross estimates (or even detailed estimates) as a way to evaluate projects in the project portfolio because estimates are guesses. Estimates are not a great way to understand and discuss the value of a project. They might be one piece of the valuation discussion, but if you use them as the only way to value a project, you are missing the value discussion you need to have. See Why Cost is the Wrong Question for Evaluating Projects in Your Project Portfolio.

I have not found that only estimates create trust. I have found that delivering the product  (or interim product) creates more trust.

Way back, when I was a software developer, I had a difficult machine vision project. Back then, we invented as we went. We had some in-house libraries, but we had to develop new solutions for each customer.

I had an estimate of 8 weeks for that project. I prototyped and tried a gazillion things. Finally, at 6 weeks, I had a working prototype. I showed it to my managers and other interested people. I finished the project and we shipped it.

Many years later, when I was a consultant, I encountered one of those managers. He said to me, “We held our breath for 6 weeks until you showed us a prototype. You had gone dark and we were worried. We had no idea if you would finish.”

By that time, I had managed people like me. I asked them for visual updates on their status each week or two. I had learned from my experiences.

I asked that manager why they held their breath. I always used an engineering notebook. I could have explained my status at any time to anyone who wanted it. He replied, “We so desperately wanted your estimate to be true. We were so afraid it wasn’t. We had no idea what to do. When you showed us a working prototype, that’s when we started to believe you could finish the project.”

They trusted my initial estimate. It’s a good thing they didn’t ask for updated estimates each week. I remember that project as a series of highs and lows.

That’s the problem with invention/innovation. You can keep track of your progress. You can determine ways to make progress. And, with the highs, your meet or beat your estimate. With the lows, you extend your estimate. I remember that at the beginning of week 5 I was sure I was not going to meet my date. Then, I discovered a way to make the project work. I remember my surprise that it was something “that easy.” It wasn’t easy. I had tracked my experiments in my notebook. There wasn’t much more I could do.

Since then, I asked my managers, “When do you want to know my project is in trouble? As soon as it I think I’m not going to meet my date; after I do some experiments; or the last possible moment?” I create trust when I ask that question because it shows I’m taking their concerns seriously.

After that project, here is what I did to create trust:

  1. Created a first draft estimate.
  2. Tracked my work so I could show visible progress and what didn’t work.
  3. Delivered often. That is why I like inch-pebbles. Yes, after that project, I often had one- or two-day deliverables.
  4. If I thought I wasn’t going to make it, use the questions above to decide when to say, “I’m in trouble.”
  5. Delivered a working product.

PredictingUnpredictable-smallEstimates can be useful. They can show you the risks. And, I’m sure that only having estimates is insufficient for building trust. If you want to learn more about estimation, see Predicting the Unpredictable: Pragmatic Approaches to Estimating Cost or Schedule.

Categories: Blogs

How Do I Know If Agile Is Working?

AvailAgility - Karl Scotland - Fri, 06/26/2015 - 17:10
Moving the queen

Moving the queen, Gabriel Saldana, CC BY-SA

“How do I know if Agile is working?” This is a question I’ve been asked a lot recently in one form or another. If not Agile, its Scrum, or Kanban or SAFe or something similar. My usual response is something along the lines of “How do you know of anything is working?” And there generally isn’t a quick and easy answer to that!

I’ve come to the view that Lean and Agile practices and techniques are simply tactics. They are tactics chosen as part of a strategy to be more Lean and Agile. And becoming more Lean and Agile are seen as important to make necessary breakthroughs in performance in order to deliver desired results.

With that perspective, then the answer to “How do I know if Agile is working?” is that you achieve the desired results. That’s probably a long time to wait to find out, however, as it is a trailing measure. It is necessary, therefore, to identify some intermediate improvements which might indicate the results are achievable, and leading measures can be captured to give hat earlier feedback.

The lack of a quick and easy answer to “How do you know if anything is working?” is often because Lean and Agile have been introduced as a purely tactical initiative, without any thought to how they relates to strategy, what measurable improvements they might bring, and how any strategy and improvements will lead to desirable results. In fact very few people (if any) know exactly what those desirable results are!

I’m increasingly trying to work the other way – what the Lean community call Strategy Deployment. For any transformation to work, everybody in the organisation needs to know what results are being strived for, what the strategic goals are that will enable the necessary changes to get there, and what measurable improvements will indicate progress. Then the whole organisation can be engaged in designing and implementing tactical changes which might lead to improvement. Everything becomes a hypothesis and an experiment, which can be tested, the results shared and adjustments made.

In other words, Strategy Deployment leads to organisations becoming laboratories, where Lean and Agile can inform hypothesis on strategies, improvements and tactics. I think its the secret sauce to any transformation, which is why I’ll be talking about it more at various conferences over the rest of the year.

The first one is Agile Cymru in a couple of weeks. There’s a few tickets left, and considering the line-up of speakers, and the ticket cost, its incredible value. I highly recommend going, and I hope to see you there!

 

Categories: Blogs

Git Subproject Compile-time Dependencies in Sbt

Xebia Blog - Fri, 06/26/2015 - 14:33

When creating a sbt project recently, I tried to include a project with no releases. This means including it using libraryDependencies in the build.sbt does not work. An option is to clone the project and publish it locally, but this is tedious manual work that needs to be repeated every time the cloned project changes.

Most examples explain how to add a direct compile time dependency on a git repository to sbt, but they just show how to add a single project repository as a dependency using an RootProject. After some searching I found the solution to add projects from a multi-project repository. Instead of RootProject the ProjectRef should be used. This allows for a second argument to specify the subproject in the reposityr.

This is my current project/Build.scala file:

import sbt.{Build, Project, ProjectRef, uri}

object GotoBuild extends Build {
  lazy val root = Project("root", sbt.file(".")).dependsOn(staminaCore, staminaJson, ...)

  lazy val staminaCore = ProjectRef(uri("git://github.com/scalapenos/stamina.git#master"), "stamina-core")
  lazy val staminaJson = ProjectRef(uri("git://github.com/scalapenos/stamina.git#master"), "stamina-json")
  ...
}

These subprojects are now a compile time dependency and sbt will pull in and maintain the repository in ~/.sbt/0.13/staging/[sha]/stamina. So no manual checkout with local publish is needed. This is very handy when depending on an internal independent project/module and without needing to create a new release for every change. (One side note is that my IntelliJ currently does not recognize that the library is on the class/source path of the main project, so it complains it cannot find symbols and therefore cannot do proper syntax checking and auto completing.)

Categories: Companies

Inspirational Quotes, Inspirational Life Quotes, and Great Leadership Quotes

J.D. Meier's Blog - Fri, 06/26/2015 - 05:51

I know several people looking for inspiration.

I believe the right words ignite or re-ignite us.

There is no better way to prime your mind for great things to come than filling your head and hear with the greatest inspirational quotes that the world has ever known.

Of course, the challenge is finding the best inspirational quotes to draw from.

Well, here you go …

3 Great Inspirational Quotes Collections at Your Fingertips

I revamped a few of my best inspirational quotes collections to really put the gems of insight at your fingertips:

  1. Inspirational Quotes – light a fire from the inside out, or find your North Star that pulls you forward
  2. Inspirational Life Quotes -
  3. Great Leadership Quotes – learn what great leadership really looks like and how it helps lifts others up

Each of these inspirational quotes collection is hand-crafted with deep words of wisdom, insight, and action.

You'll find inspirational quotes from Charles Dickens, Confucius, Dr. Seuss, George Bernard Shaw, Henry David Thoreau, Horace, Lao Tzu,  Lewis Carroll, Mahatma Gandhi, Oprah Winfrey, Oscar Wilde, Paulo Coelho, Ralph Waldo Emerson, Stephen King, Tony Robbins, and more.

You'll even find an inspirational quote from The Wizard of Oz (and it’s not “There’s no place like home.”)

Inspirational Quotes Jump Start

Here are a few of my favorites inspirational quotes to get you started:

“Courage doesn’t always roar. Sometimes courage is the quiet voice at the end of the day saying, ‘I will try again tomorrow.’”

Mary Anne Radmacher

“Do not follow where the path may lead. Go, instead, where there is no path and leave a trail.”

Ralph Waldo Emerson

“Don’t cry because it’s over, smile because it happened.”

Dr. Seuss

“It is not length of life, but depth of life.”

Ralph Waldo Emerson

“Life is not measured by the number of breaths you take, but by every moment that takes your breath away.”

Anonymous

“You live but once; you might as well be amusing.”

Coco Chanel

“It is never too late to be who you might have been.”

George Eliot

“Smile, breathe and go slowly.”

Thich Nhat Hanh

“What lies behind us and what lies before us are tiny matters compared to what lies within us.”

Ralph Waldo Emerson

These inspirational quotes are living breathing collections.  I periodically sweep them to reflect new additions, and I re-organize or re-style the quotes if I find a better way.

I invest a lot of time on quotes because I’ve learned the following simple truth:

Quotes change lives.

The right words, at the right time, can be just that little bit you need, to breakthrough or get unstuck, or find your mojo again.

Have you had your dose of inspiration today?

Categories: Blogs

R: Scraping Wimbledon draw data

Mark Needham - Fri, 06/26/2015 - 01:14

Given Wimbledon starts next week I wanted to find a data set to explore before it gets underway. Having searched around and failed to find one I had to resort to scraping the ATP World Tour’s event page which displays the matches in an easy to access format.

We’ll be using the Wimbledon 2013 draw since Andy Murray won that year! This is what the page looks like:

2015 06 25 23 47 16

Each match is in its own row of a table and each column has a class attribute which makes it really easy to scrape. We’ll be using R’s rvest again. I wrote the following script which grabs the player names, seedings and score of the match and stores everything in a data frame:

library(rvest)
library(dplyr)
library(stringr)
 
s = html_session("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2013/results")
rows = s %>% html_nodes("div#scoresResultsContent tr")
 
matches = data.frame()
for(row in rows) {  
  players = row %>% html_nodes("td.day-table-name a")
  seedings = row %>% html_nodes("td.day-table-seed")
  score = row %>% html_node("td.day-table-score a")
 
  if(!is.null(score)) {
    player1 = players[1] %>% html_text() %>% str_trim()
    seeding1 = ifelse(!is.na(seedings[1]), seedings[1] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
 
    player2 = players[2] %>% html_text() %>% str_trim()
    seeding2 = ifelse(!is.na(seedings[2]), seedings[2] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
 
    matches = rbind(data.frame(winner = player1, 
                               winner_seeding = seeding1, 
                               loser = player2, 
                               loser_seeding = seeding2,
                               score = score %>% html_text() %>% str_trim(),
                               round = round), matches)
 
  } else {
    round = row %>% html_node("th") %>% html_text()
  }
}

This is what the data frame looks like:

> matches %>% sample_n(10)
               winner winner_seeding                       loser loser_seeding            score                round
61      Wayne Odesnik            (4)                Thiago Alves          <NA>            61 64 1st Round Qualifying
4     Danai Udomchoke           <NA>            Marton Fucsovics          <NA>       61 57 1210 1st Round Qualifying
233    Jerzy Janowicz           (24)                Lukasz Kubot          <NA>         75 64 64       Quarter-Finals
90       Malek Jaziri           <NA>             Illya Marchenko           (9)        674 75 64 2nd Round Qualifying
222      David Ferrer            (4)         Alexandr Dolgopolov          (26) 676 762 26 61 62          Round of 32
54  Michal Przysiezny           (11)                 Dusan Lojda          <NA>         26 63 62 1st Round Qualifying
52           Go Soeda           (13)               Nikola Mektic          <NA>            62 60 1st Round Qualifying
42    Ruben Bemelmans           (23) Jonathan Dasnieres de Veigy          <NA>            63 64 1st Round Qualifying
31        Mirza Basic           <NA>              Tsung-Hua Yang          <NA>     674 33 (RET) 1st Round Qualifying
179     Jurgen Melzer           <NA>              Julian Reister           (Q)    36 762 765 62          Round of 64

It also contains qualifying matches which I’m not so interested in. Let’s strip those out:

main_matches = matches %>% filter(!grepl("Qualifying", round)) %>% mutate(year = 2013)

We’ll also put a column in for ‘year’ so that we can handle the draws for multiple years later on.

Next I wanted to clean up the data a bit. I’d like to be able to do some queries based on the seedings of the players but at the moment that column contains numeric brackets in values as well as some other values which indicate whether a player is a qualifier, lucky loser or wildcard entry.

I started by adding a column to store this extra information:

main_matches$winner_type = NA
main_matches$winner_type[main_matches$winner_seeding == "(WC)"] = "wildcard"
main_matches$winner_type[main_matches$winner_seeding == "(Q)"] = "qualifier"
main_matches$winner_type[main_matches$winner_seeding == "(LL)"] = "lucky loser"
 
main_matches$loser_type = NA
main_matches$loser_type[main_matches$loser_seeding == "(WC)"] = "wildcard"
main_matches$loser_type[main_matches$loser_seeding == "(Q)"] = "qualifier"
main_matches$loser_type[main_matches$loser_seeding == "(LL)"] = "lucky loser"

And then I cleaned up the existing column:

tidy_seeding = function(seeding) {
  no_brackets = gsub("\\(|\\)", "", seeding)
  return(gsub("WC|Q|L", NA, no_brackets))
}
 
main_matches = main_matches %>% 
  mutate(winner_seeding = as.numeric(tidy_seeding(winner_seeding)),
         loser_seeding = as.numeric(tidy_seeding(loser_seeding)))

Now we can write a query against the data frame to find out when the underdog won i.e. a player with no seeding beat a player with a seeding or a lower seeded player beat a higher seeded one:

> main_matches %>%  filter((winner_seeding > loser_seeding) | (is.na(winner_seeding) & !is.na(loser_seeding)))
                  winner winner_seeding                 loser loser_seeding                  score          round year
1          Jurgen Melzer             NA         Fabio Fognini            30           675 75 63 62   Round of 128 2013
2          Bernard Tomic             NA           Sam Querrey            21       766 763 36 26 63   Round of 128 2013
3        Feliciano Lopez             NA          Gilles Simon            19             62 64 7611   Round of 128 2013
4             Ivan Dodig             NA Philipp Kohlschreiber            16 46 676 763 63 21 (RET)   Round of 128 2013
5         Viktor Troicki             NA      Janko Tipsarevic            14              63 64 765   Round of 128 2013
6         Lleyton Hewitt             NA         Stan Wawrinka            11               64 75 63   Round of 128 2013
7           Steve Darcis             NA          Rafael Nadal             5             764 768 64   Round of 128 2013
8      Fernando Verdasco             NA      Julien Benneteau            31             761 764 64    Round of 64 2013
9           Grega Zemlja             NA       Grigor Dimitrov            29       36 764 36 64 119    Round of 64 2013
10      Adrian Mannarino             NA            John Isner            18               11 (RET)    Round of 64 2013
11         Igor Sijsling             NA          Milos Raonic            17              75 64 764    Round of 64 2013
12     Kenny De Schepper             NA           Marin Cilic            10                  (W/O)    Round of 64 2013
13        Ernests Gulbis             NA    Jo-Wilfried Tsonga             6         36 63 63 (RET)    Round of 64 2013
14     Sergiy Stakhovsky             NA         Roger Federer             3         675 765 75 765    Round of 64 2013
15          Lukasz Kubot             NA          Benoit Paire            25               61 63 64    Round of 32 2013
16     Kenny De Schepper             NA           Juan Monaco            22              64 768 64    Round of 32 2013
17        Jerzy Janowicz             24       Nicolas Almagro            15              766 63 64    Round of 32 2013
18         Andreas Seppi             23         Kei Nishikori            12        36 62 674 61 64    Round of 32 2013
19         Bernard Tomic             NA       Richard Gasquet             9          767 57 75 765    Round of 32 2013
20 Juan Martin Del Potro              8          David Ferrer             4              62 64 765 Quarter-Finals 2013
21           Andy Murray              2        Novak Djokovic             1               64 75 64         Finals 2013

There are actually very few times when a lower seeded player beat a higher seeded one but there are quite a few instances of non seeds beating seeds. We’ve got 21 occurrences of underdogs winning out of a total of 127 matches.

Let’s filter that set of rows and see which seeds lost in the first round:

> main_matches %>%  filter(round == "Round of 128" & !is.na(loser_seeding))
           winner winner_seeding                 loser loser_seeding                  score        round year
1   Jurgen Melzer             NA         Fabio Fognini            30           675 75 63 62 Round of 128 2013
2   Bernard Tomic             NA           Sam Querrey            21       766 763 36 26 63 Round of 128 2013
3 Feliciano Lopez             NA          Gilles Simon            19             62 64 7611 Round of 128 2013
4      Ivan Dodig             NA Philipp Kohlschreiber            16 46 676 763 63 21 (RET) Round of 128 2013
5  Viktor Troicki             NA      Janko Tipsarevic            14              63 64 765 Round of 128 2013
6  Lleyton Hewitt             NA         Stan Wawrinka            11               64 75 63 Round of 128 2013
7    Steve Darcis             NA          Rafael Nadal             5             764 768 64 Round of 128 2013

Rafael Nadal is the most prominent but Stan Wawrinka also lost in the first round that year which I’d forgotten about! Next let’s make the ’round’ column an ordered factor one so that we can sort matches by round:

main_matches$round = factor(main_matches$round, levels =  c("Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals"))
 
> main_matches$round
...     
Levels: Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals

We can now really easily work out which unseeded players went the furthest in the tournament:

> main_matches %>% filter(is.na(loser_seeding)) %>% arrange(desc(round)) %>% head(5)
             winner winner_seeding             loser loser_seeding           score          round year
1    Jerzy Janowicz             24      Lukasz Kubot            NA        75 64 64 Quarter-Finals 2013
2       Andy Murray              2 Fernando Verdasco            NA  46 36 61 64 75 Quarter-Finals 2013
3 Fernando Verdasco             NA Kenny De Schepper            NA        64 64 64    Round of 16 2013
4      Lukasz Kubot             NA  Adrian Mannarino            NA  46 63 36 63 64    Round of 16 2013
5    Jerzy Janowicz             24     Jurgen Melzer            NA 36 761 64 46 64    Round of 16 2013

Next up I thought it’d be cool to write a function which showed which round each player exited in:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
    return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}

Our function isn’t vectorisable – it only works if we pass in a single player at a time so we’ll have to group the data frame by player before calling it. Let’s check it works by seeing how far Andy Murray and Rafael Nadal got:

> round_reached("Rafael Nadal", main_matches)
[1] "Round of 128"
> round_reached("Andy Murray", main_matches)
[1] "Winner"

Great. What about if we try it against each of the top 8 seeds?

> rbind(main_matches %>% filter(winner_seeding %in% 1:8) %>% mutate(name = winner, seeding = winner_seeding), 
        main_matches %>% filter(loser_seeding %in% 1:8) %>% mutate(name = loser, seeding = loser_seeding)) %>%
    select(name, seeding) %>%
    distinct() %>%
    arrange(seeding) %>%
    group_by(name) %>%
    mutate(round_reached = round_reached(name, main_matches))
Source: local data frame [8 x 3]
Groups: name
 
                   name seeding  round_reached
1        Novak Djokovic       1         Finals
2           Andy Murray       2         Winner
3         Roger Federer       3    Round of 64
4          David Ferrer       4 Quarter-Finals
5          Rafael Nadal       5   Round of 128
6    Jo-Wilfried Tsonga       6    Round of 64
7         Tomas Berdych       7 Quarter-Finals
8 Juan Martin Del Potro       8    Semi-Finals

Neat. Next up I want to do a comparison between the round they reached and the round you’d expect them to get to given their seeding but that’s for the weekend!

I’ve put a CSV file containing all the data in this gist in case you want to play with it. I’m planning to scrape a few more years worth of data before Monday and add in some extra fields as well but in case I don’t get around to it the full script in this blog post is included in the gist as well so feel free to tweak it if tennis is your thing.

Categories: Blogs

How Kanban Can Help Your Team

Is your team overburdened with too much work? Are there too many conflicting priorities? Watch this webinar to find out how Kanban can help your team. About This Webinar David Neal, Developer Advocate for LeanKit, recalls his experiences getting started with Kanban, sharing stories and lessons learned. You’ll also get tips on how to: Make […]

The post How Kanban Can Help Your Team appeared first on Blog | LeanKit.

Categories: Companies

Knowledge Sharing


SpiraTeam is a agile application lifecycle management (ALM) system designed specifically for methodologies such as scrum, XP and Kanban.