
NDC talk on SOLID in slices not layers video online

Jimmy Bogard - Thu, 07/02/2015 - 20:21

The talk I gave at NDC Oslo 2015 is up on SOLID architecture in slices not layers:

https://vimeo.com/131633177

In it I talk about flipping the traditional horizontally layered architecture to one that focuses on vertical, deliverable features.

Enjoy!



Agency Product Owner Training Starts in August

Johanna Rothman - Thu, 07/02/2015 - 16:42

We have an interesting problem in some projects. Agencies, consulting organizations, and consultants help their clients understand what the client needs in a product. Often, these people and their organizations then implement what the client and agency develop as ideas.

As the project continues, the agency manager continues to help the client identify and update the requirements. Because this is a limited-time contract, the client doesn’t have a product manager or product owner. The agency person—often the owner—acts as a product owner.

This is why Marcus Blankenship and I have teamed up to offer Product Owner Training for Agencies.

If you are an agency, consultant, or anyone outside your client’s organization who acts as a product owner, this training is for you. It’s based on my workshop Agile and Lean Product Ownership. We won’t do everything in that workshop. Because it’s an online workshop, you’ll work on your projects/programs in between our meetings.

If you are not part of an organization and you find yourself acting as a product owner, this training is for you. See Product Owner Training for Agencies.


SAFe 4.0 Sneak Peek 2!

Agile Product Owner - Wed, 07/01/2015 - 21:39

Note: This post was updated on July 16, 2015

Hi Everyone,

Since our last update, the team has been working on the next revision of the Big Picture, V4.0. In so doing, we have taken every opportunity to test assumptions and learn new insights from our practitioners in the field: SPCs, Agile coaches, and leaders who deal with different aspects of scaling development within large enterprises. Multiple SPC testing sessions in Boulder, CO, Herndon, VA, and Sydney, Australia are the latest examples. These sessions helped us finalize the current draft of the BP for SAFe 4.0. Thanks to all who contributed!

As a result, we have developed a more modular – and more expandable – version of the Big Picture. And as many of you know from our prior releases of SAFe LSE, the Expanded view incorporates the LSE-developed content and provides guidance for teams building larger cyber-physical systems and really big software solutions. The Expanded view is below:

[Image: SAFe 4.0 Big Picture, Expanded view]

The Collapsed view provides guidance for cases when the value streams can be delivered by a single ART (50–125 people).

[Image: SAFe 4.0 Big Picture, Collapsed view]

Obviously, there are other changes as well. These include:

  • New treatment and inclusion of Kanban systems at Portfolio, Value Stream and Program levels. Designed to be somewhat self-explanatory, but time will tell
  • The addition of the Enterprise icon, highlighting the connection of the program portfolio to the enterprise strategy
  • The inclusion of the Customer and Solution Context
  • Program and Solution epics
  • A new treatment for Solution Intent that now shows the dynamic nature of fixed and variable aspects. It will also serve as a launching pad for discussions of adaptive requirements and design, set-based development, and agile contracts
  • Value Stream Coordinator – a VS equivalent of the RTE
  • Solution Architect – multiple ART value streams typically require architectural governance and guidance
  • Updates to Engineering and Quality Practices, including the “XP”-inspired attribution
  • Communities of Practice – this important construct, orthogonal to cross-functional teams, has been used in SAFe, but now has its own object for better articulation
  • “Release Any Time” – better wording (we hope) to explain the fact that Releases do not have to occur on the cadence
  • Explicit representation of the value stream, solution and customer in the collapsed view. Obviously, value streams always exist, regardless of the size of the solution.
  • The new “spanning palette”, which illustrates, for example, how Metrics, Vision, Roadmap and Milestones (new object) occur at multiple levels
  • The newer third dimension (grey background), which will be used to navigate to Lean Values, SAFe Principles, and Implementation content. Lean-Agile Leaders have been repositioned there as well.
  • New softer color palette

We appreciate comments and thoughts on this future version. With every increment, we evolve one step forward, together.

Thank you and stay tuned for new updates!

—The SAFe Framework Team: Dean, Alex, Richard, Inbar


End-to-end Hypermedia: Building a React Client

Jimmy Bogard - Wed, 07/01/2015 - 18:06

In the last post, I walked through what is to me the most interesting part of REST – the client. It’s easy to build a server API, but no API is complete without someone actually using that API. This is where most REST examples fall down for me – they show all sorts of pretty pictures of hypermedia-rich JSON from the server, but no real examples of how to consume that API.

I walked through some jQuery code in the last post, but why stop with jQuery? That’s so 2010. Instead, I want to build around React. React is perfect for hypermedia because of its component-oriented nature. A resource’s representation can be broken down into its components, and React components then matched accordingly. But before we get into the client, I’ll need to modify my sample to consume React.

Installing React

As a shortcut, I’m just going to use ReactJS.Net to build React into my existing MVC app. I install the ReactJS.Net NuGet package, and add a script reference to my downloaded react.js library. Normally, I’d go through the whole Bower/npm route, but this seemed like the simplest way to integrate React into my sample.

I’m going to create just a blank JSX file for all my React components for this page, and slim down my Index view to the basics:

<h2>Instructors</h2>
<div id="content"></div>
@section scripts{
    <script src="@Url.Content("~/Scripts/react-0.13.3.js")"></script>
    <script src="@Url.Content("~/Scripts/InstructorInfo.jsx")"></script>
    @{
        var href = Url.Action("Index", "Instructor", new {httproute = ""});
    }
    <script>
        React.render(
            React.createElement(InstructorsInfo, {href: '@href'}),
            document.getElementById("content")
        );
    </script>
}

All of the div placeholders are removed except one, for content. I pull in the React library and my custom React components. The ReactJS.Net package takes my JSX file and transpiles it into JavaScript (as well as building the needed files for in-browser debugging). Finally, I render my base React component, passing in the root URL for kicking off the initial request for instructors, and the DOM element in which to render the React component.

Once I’ve got the basic React library up and running, it’s time to figure out how we would like to componentize our page.

Slicing our Page

If we look at the page we want to create, we need to break it down and create React components from the parts we find.

Looking at this, I see three individual tables populated with collection+json data. I’m thinking I create one overall component composed of three individual items. Inside the table, I can break things up into the table, rows, header, cells and links.

I might need a few more, but this is a good start. Next, we can start building our React components.

React Components

First up is our overall component that contains our three tables of collection+json data. Since I have an understanding of what’s getting returned on the server side, I’m going to make an assumption that I’m building out three tables, and I can navigate links to drill down to more. Additionally, this component will be responsible for making the initial AJAX call and keeping the overall state. State is important in React, and I’ve decided to keep the parent component responsible for the resource state rather than each table. My InstructorsInfo component is:

class InstructorsInfo extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      instructors: { },
      courses: { },
      students: { }
    };
    this._handleSelect = this._handleSelect.bind(this);
  }
  componentDidMount() {
    $.getJSON(this.props.href)
      .done(data => this.setState({ instructors: data }));
  }
  _handleSelect(e) {
    $.getJSON(e.href)
      .done(data => {
        var state = e.rel === "courses"
          ? { students: {}}
          : {};

        state[e.rel] = data;

        this.setState(state);
      });
  }
  render() {
    return (
      <div>
        <CollectionJsonTable data={this.state.instructors}
          onSelect={this._handleSelect} />
        <CollectionJsonTable data={this.state.courses}
          onSelect={this._handleSelect} />
        <CollectionJsonTable data={this.state.students}
          onSelect={this._handleSelect} />
      </div>
    )
  }
}

I’m using ES6 here, which makes building React components a bit nicer to work with. I first declare my React component, extending from React.Component. Next, in my constructor, I set up the initial state, an object with empty values for the instructors/courses/students state. Finally, I set up the binding for a callback function to bind to the React component as opposed to the function itself.

In the componentDidMount function, I perform the initial AJAX call and set the instructors collection state based on the data that comes back. The URL I use to make the initial call is based on the “href” of my component’s properties.

The _handleSelect function is the callback for a clicked link way down in one of the tables. I wanted to have the parent component manage fetching new collections instead of a child component figuring out what to do. That method makes the AJAX call based on the “href” passed in from the collection+json data, gets the data back and updates the relevant state based on the “rel” of the link. To make things easy, I matched up the state’s property names to the rels I knew about.
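For reference, here’s roughly the shape of collection+json these components are written against. This is just a sketch with illustrative values; the field names (collection, items, data, links, prompt, value, rel, href) come from the collection+json media type, while the URLs and instructor data here are made up:

// Illustrative collection+json response for the instructors resource.
// Each item's "data" entries become table cells ("prompt" feeds the
// header row), and each "links" entry becomes an anchor whose "rel"
// tells the parent component which piece of state to update.
const sampleData = {
  collection: {
    version: "1.0",
    href: "/api/instructors",
    items: [
      {
        data: [
          { name: "name", value: "Kim Abercrombie", prompt: "Name" },
          { name: "hireDate", value: "1995-03-11", prompt: "Hire Date" }
        ],
        links: [
          { rel: "courses", href: "/api/instructors/1/courses", prompt: "Courses" }
        ]
      }
    ]
  }
};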

Finally, the render function just has a div with my three CollectionJsonTable components, binding up the data and select functions. Let’s look at that component next:

class CollectionJsonTable extends React.Component {
  render() {
    if (!this.props.data.collection) {
      return <div></div>;
    }
    if (!this.props.data.collection.items.length){
      return <p>No items found.</p>;
    }

    var containsLinks = _(this.props.data.collection.items)
      .some(item => item.links && item.links.length);

    var rows = _(this.props.data.collection.items)
      .map((item, idx) => <CollectionJsonTableRow
        item={item}
        containsLinks={containsLinks}
        onSelect={this.props.onSelect}
        key={idx}
        />)
      .value();

    return (
      <table className="table">
        <CollectionJsonTableHeader
          data={this.props.data.collection.items}
          containsLinks={containsLinks} />
        <tbody>
          {rows}
        </tbody>
      </table>
    );
  }
}

This one is not quite as interesting. It only has the render method, and the first part is just to manage either no data or empty data. Since my data can conditionally have links, I found it easier to inform child components whether or not links exist (through the lodash code), rather than every component having to re-figure this out.

To build up each row, I map the collection+json items to CollectionJsonTableRow components, setting up the necessary props (the item, containsLinks, onSelect and key items). In React, there’s no event aggregator, so I have to pass a callback function all the way down to the lowest component via props. Finally, since I’m building a collection of components, it’s best practice to put some sort of key on these items so that React knows how to re-render correctly.

The final rendered component is a table with a CollectionJsonTableHeader and the rows. Let’s look at that header next:

class CollectionJsonTableHeader extends React.Component {
  render() {
    var headerCells = _(this.props.data[0].data)
      .map((datum, idx) => <th key={idx}>{datum.prompt}</th>)
      .value();

    if (this.props.containsLinks) {
      headerCells.push(<th key="links"></th>);
    }

    return (
      <thead>
        <tr>
          {headerCells}
        </tr>
      </thead>
    );
  }
}

This component also only has a render method. I map the data items from the first item in the collection, producing header cells based on the prompt from the collection+json data. If the collection contains links, I’ll add an empty header cell on the end. Finally, I render the header with the header cells in a row.

With the header done, I can circle back to the CollectionJsonTableRow:

class CollectionJsonTableRow extends React.Component {
  render() {
    var dataCells = _(this.props.item.data)
      .map((datum, idx) => <td key={idx}>{datum.value}</td>)
      .value();

    if (this.props.containsLinks) {
      dataCells.push(<CollectionJsonTableLinkCell
        key="links"
        links={this.props.item.links}
        onSelect={this.props.onSelect} />);
    }

    return (
      <tr>
        {dataCells}
      </tr>
    );
  }
}

The row’s responsibility is just to build up the collection of cells, plus the optional CollectionJsonTableLinkCell. As before, I have to pass down the callback for the link clicks. Similar to the header cells, I fill in the data value (instead of the prompt). Next up is our link cell:

class CollectionJsonTableLinkCell extends React.Component {
  render() {
    var links = _(this.props.links)
      .map((link, idx) => <CollectionJsonTableLink
        key={idx}
        link={link}
        onSelect={this.props.onSelect} />)
      .value();

    return (
      <td>{links}</td>
    );
  }
}

This one isn’t so interesting; it just loops through the links, building out a CollectionJsonTableLink component, filling in the link object, key, and callback. Finally, our CollectionJsonTableLink component:

class CollectionJsonTableLink extends React.Component {
  constructor(props) {
    super(props);
    this._handleClick = this._handleClick.bind(this);
  }
  _handleClick(e) {
    e.preventDefault();
    this.props.onSelect({
      href : this.props.link.href,
      rel: this.props.link.rel}
    );
  }
  render() {
    return (
      <a href='#' rel={this.props.link.rel} onClick={this._handleClick}>
        {this.props.link.prompt}
      </a>
    );
  }
}
CollectionJsonTableLink.propTypes = {
  onSelect: React.PropTypes.func.isRequired
};

The link clicks are the most interesting part here. I didn’t want the link itself to decide what happens on click, so I call my “onSelect” prop from the link’s click event. The _handleClick method calls the onSelect method, passing in the href/rel from the collection+json link object. In my render method, I just output a normal anchor tag, with the rel and prompt from the link object, and the onClick event bound to the _handleClick method. Finally, I indicate that the onSelect prop is required, so that I don’t have to check for its existence when the link is clicked.

With all these components, I’ve got a working example.

I found working with hypermedia and React to be a far nicer experience than just raw jQuery. I could reason about individual components at the same level as the hypermedia controls, matching what I was building much more effectively to the resource representation returned. I still have to have some sort of knowledge of how I’m going to navigate the links and what to do, but that logic is all encapsulated in my topmost component.

The sub-components aren’t tied to my overall logic and can be re-used as much as I want across my application, allowing me to use collection+json extensively without worrying about having to parse the result again and again. I’ve got a component that can effectively render a nice table based on a collection+json representation.

Next, we’ll kick things up a notch and build out a React Native implementation, pushing the limits of hypermedia with a dynamic native mobile client.


Pitfall of Scrum: Excessive Preparation/Planning


Regular big up-front planning is not necessary with Scrum. Instead, a team can just get started and use constant feedback in the Sprint Review to adjust its plans. Even the Product Backlog can be created after the first Sprint has started. All that is really necessary to get started is a Scrum Team, a product vision, and a decision on Sprint length. In this extreme case, the Scrum Team itself would decide what to build in its first Sprint and use the time of the Sprint to also prepare some initial Product Backlog Items. Then, the first Sprint Review would allow stakeholders to provide feedback and further develop the Product Backlog. The empirical nature of Scrum could even allow the Product Owner to emerge from the business stakeholders, rather than being assigned to the team right from the start.

Starting a Sprint without a Product Backlog is not easy, but it can be done. The team has to know at least a little about the business, and there should be some (possibly informal) project or product charter that they are aware of. The team uses this super basic information and decides on their own what to build in their first Sprint. Again, the focus should be on getting something that can be demoed (and potentially shippable). The team is likely to build some good stuff and some things that are completely wrong… but the point is to get the Inspect and Adapt cycle started as quickly as possible. Which means of course that they need to have stakeholders (customers, users) actually attend the demo at the end of the Sprint. The Product Owner may or may not even be involved in this first Sprint.

One important reason this is sometimes a good approach is the culture of “analysis paralysis” that exists in some organizations. In this situation, an organization is unable to do anything because they are so concerned about getting things right. Scrum is a framework for inspect and adapt and that can (and does) include the Product Backlog. Is it better for a team to sit idle while someone tries to do sufficient preparation? Or is it better to get started and inspect and adapt? This is actually a philosophical question (as well as a practical question). The mindset and philosophy of the Agile Manifesto and Scrum is that trying to produce valuable software is more important than documentation… that individuals and how they work together are more important than rigidly following a process or tool. I will agree that in many cases it is acceptable to do some up-front work, but it should be minimized, particularly when it is preventing people from starting to deliver value. The case of a team getting started without a product backlog is rare… but it can be a great way for a team to help an organization overcome analysis paralysis.

The Agile Manifesto is very clear: “The BEST architectures, requirements and designs emerge out of self-organizing teams.” [Emphasis added.]

Hugely memorable for me is the story that Ken Schwaber told in the CSM course that I took from him in 2003.  This is a paraphrase of that story:

I [Ken Schwaber] was talking to the CIO of a large IT organization.  The CIO told me that his projects last twelve to eighteen months and at the end, he doesn’t get what he needs.  I told him, “Scrum can give you what you don’t need in a month.”

I experienced this myself in a profound way just a couple years into my career as an Agile coach and trainer.  I was working with a department of a large technology organization.  They had over one hundred people who had been working on Agile pilot projects.  The department was responsible for a major product and executive management had approved a complete re-write.  The product managers and Product Owners had done a lot of work to prepare a product backlog (about 400 items!) that represented all the existing functionality of the product that needed to be re-written.  But, the big question, “what new technology platform do we use for the re-write?” had not yet been resolved.  The small team of architects were tasked with making this decision.  But they got stuck.  They got stuck for three months.  Finally, the director of the department, who had learned to trust my advice in other circumstances, asked me, “does Scrum have any techniques for making these kind of architectural decisions?”

I said, “yes, but you probably won’t like what Scrum recommends!”

She said, “actually, we’re pretty desperate.  I’ve got over a hundred people effectively sitting idle for the last three months.  What does Scrum recommend?”

“Just start.  Let the teams figure out the platform as they try to implement functionality.”

She thought for a few seconds.  Finally she said, “okay.  Come by this Monday and help me launch our first Sprint.”

The amazing thing was that the teams didn’t lynch me when on Monday she announced that “our Agile consultant says we don’t need to know our platform in order to get started.”

The first Sprint (two weeks long) was pretty chaotic.  But, with some coaching and active support of management, they actually delivered a working increment of their product.  And decided on the platform to use for the rest of the two-year project.

You must trust your team.

If your organization is spending more than a few days preparing for the start of a project, it is probably suffering from this pitfall.  This is the source of great waste and lost opportunity.  Use Scrum to rapidly converge on the correct solutions to your business problems instead of wasting person-years of time on analysis and planning.  We can help with training and coaching to give you the tools to start fast using Scrum and to fix your Scrum implementation.

This article is a follow-up to the 24 Common Scrum Pitfalls post written back in 2011.


On Running LeadingAgile

Leading Agile - Mike Cottmeyer - Wed, 07/01/2015 - 03:16

There is so much I’ve been wanting to write over the past year or so about the business of LeadingAgile. Those of you who have followed our blog for a while will know that it wasn’t all that long ago that I was working at VersionOne, left for Pillar, and then started out as an independent consultant and formed LeadingAgile.

I got really busy really fast and quickly started selling more work than I could do alone. So I began growing the team. Over the past few years we’ve built a really awesome group of consultants and an equally awesome group of support staff to help us run our operations. There are about 40 of us now and we are still growing.

I’ve always felt that our approach was unique enough that we needed a team of dedicated folks that were totally bought into our system, could grow with us as we evolved our understanding of our models, and were readily available when we needed them. For those reasons, we’ve always hired rather than using subcontractors.

People ask me all the time how we hire at LeadingAgile given that many of our contracts are short term and we never quite know where the work is going to come from. Lots of companies use subcontractors to mitigate that risk, but since we’ve decided to absorb that risk internally, we have to have a plan.

That’s a little of what I want to share with you today.

Starting LeadingAgile

We started LeadingAgile with effectively no money in the bank. There was no venture funding, no nest egg to draw on, not even a credit card to live off of when I got this started. Risk management went something like this… how much do I need to pay my bills and where do I think I can go to get that work.

I was fortunate to have friends in the industry that said they could subcontract work to me if I needed it… and that was my safety net. By that time I was pretty well networked here in Atlanta, so I took a chance. Fortunately for my wife and kids… everything worked out. We didn’t go bankrupt.

I started earning more money than we were spending, so we used the extra to get out of debt and build working capital. After about a year and a half, we used some of that working capital to hire our first consultant. Having someone to share the workload allowed me to write more, speak more, and sell more.

As our client list grew, we began hiring more consultants and building our staff. Over time we’ve gotten LeadingAgile to the size it is today, and we’ve built a foundation that can support doubling or tripling our consulting team as demand continues to rise. We’ve learned a lot doing it.

Running LeadingAgile

Over time, I’ve learned that we have four major risk areas… variables maybe… that we have to constantly manage in order to successfully run our company.

1. Working Capital – at any given time we need to have sufficient working capital available to weather a storm or to invest in new opportunities. As we make hiring and growth decisions, we never know which of those new opportunities are going to play out. Understanding how long we can maintain the company without having to let anyone go, or adjust our strategy, is a key factor in decision making.

2. Utilization – We always have the choice to deliver work or to hold off. We can often work with our clients to go faster or slower depending on what they are trying to accomplish and the availability of our team. Understanding the rate we want to burn down our backlog of work is critical. Consultants on the bench are like spoiled inventory. Once a day has passed, you can never bill for it.

3. Sales – We are constantly marketing LeadingAgile and our ideas. We are constantly building relationships with people that might buy from us now or someday in the future. We can attempt to accelerate our sales pipeline or slow it down. Sometimes we don’t want a deal to close because we can’t effectively service it. Other times we need deals to close because we have people on the bench.

4. Growth – If we sell more business than we have staff to deliver, then we have to think about growing our team. We can’t grow our team just in time, so we have to invest, bringing people in early, getting them up to speed with our approach, and having them shadow our current consultants, so they are fully prepared to be part of a transformation team if and when the new business gets signed.

If you graph these variables, it looks something like this…

[Graph: maximum (green), break-even (red), and expected (blue) monthly revenue]

The green line represents the maximum revenue we could realize in a given month. The red line represents our break even revenue for the month. The blue line shows what we are actually expecting to deliver in any given month. The general theory is that if we stay in between the maximum and minimum every month, we are in a good place and can sustain the company.

But here is the deal… we know our fixed costs, we know our variable costs, we sometimes can predict utilization in a given month, but more than a month or two out is a crap shoot. Even within a given month lots of things can happen. We never know exactly what deals are going to sell, what deals are going to continue, or what deals are going to end. The uncertainty at times can be mind-numbing.

One of my folks said yesterday that it feels like we are always on the verge of either having 10 people on the bench or hiring 10 people; we just don’t know which.

So here is how we think about managing the company in the face of this kind of extreme uncertainty.

Our job is to make sure that we have as many options available as possible at all times to maximize the chances of good stuff happening. We always want to have a fallback plan if something bad happens.

For us that means the following…

1. Working Capital – We always want to have enough working capital to survive if our revenue falls below the break even threshold in a given month. Ideally, I’d like to be able to survive 6-12 months running at a loss. Now granted, if that happened there would be other fundamentals to work on, and maybe we’d have to acknowledge we were spending too hot or had a bad strategy, but having financial safety prevents you from making rash decisions.

2. Utilization – We try to create options with our clients so we have some room on existing engagements in case one client slows down and another client needs some extra help. We encourage and support our consultants bringing other team members into their engagements whenever possible, even if it’s not in a billable capacity, so that there is greater familiarity with each other and we all know how each other works. The better we work together, the more portable we are between accounts.

3. Sales – We try to maintain about four times the level of active business development than we think we could ever win, or that we think we could even do, because you never know when something is going to fall through. You can never directly influence the buying cycle of a client, so having lots of opportunities in the pipeline at any one time increases the chances that something will work out when you need it. We spend a ton of time educating folks on our approach before the sale, so managing the pipe is a lot of work.

4. Growth – And when we are really good at all this, firing on all cylinders, and everything closes at once, we might find ourselves with more work than we can actually do. We maintain relationships with a ton of great consultants out in the industry, and are always on the lookout for more great people to add to the team. When we need to hire, it’s important that we have a list of folks that are ready to come work for us when we need them. Our people (and our approach) are fundamentally what we sell.

What this basically means is that we are constantly forecasting forward, assessing the probability that utilization is going to be high, that clients are going to continue, or that new deals are going to close. We assess our level of working capital and regularly place bets on future outcomes. We bet on deals, we bet on people, we bet on cash flow. The goal is to manage risk such that we make more good bets than bad.

Our experience has been that if we do the right things with our money, we are constantly doing the right things for our clients, we are super diligent about managing the sales process and making sure we are talking to new companies all the time, and we always have a solid pipeline of great people to come work for us… we have a shot at making all this work. So far, we’ve had a good run.

The key point in all of this is that nothing in anything we do is actually knowable. We can model it. We can make informed predictions. But we can never know it for sure. We live in a world of creating options, managing risk, making good decisions, and having safety in the bank if things don’t play out like we’ve planned. Having more options than we need is the key to sustaining and growing our company.

In Conclusion

As I take a step back and look at where we’ve been over the past five years, it’s a really fascinating journey. Applying concepts like Lean Startup and Real Options when you’ve got the livelihood of forty families, and the trust of a hundred or more clients, riding on your success is pretty sobering. For me it’s a testament to how powerful these concepts are when you strive to authentically apply them to your work.

 


R: write.csv – unimplemented type ‘list’ in ‘EncodeElement’

Mark Needham - Wed, 07/01/2015 - 00:26

Every now and then I want to serialise an R data frame to a CSV file so I can easily load it up again if my R environment crashes, without having to recalculate everything. Recently I ran into the following error:

> write.csv(foo, "/tmp/foo.csv", row.names = FALSE)
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  : 
  unimplemented type 'list' in 'EncodeElement'

If we take a closer look at the data frame in question it looks ok:

> foo
  col1 col2
1    1    a
2    2    b
3    3    c

However, one of the columns contains a list in each cell and we need to find out which one it is. I’ve found the quickest way is to run the typeof function over each column:

> typeof(foo$col1)
[1] "double"
 
> typeof(foo$col2)
[1] "list"

So ‘col2’ is the problem one, which isn’t surprising if you consider the way I created ‘foo’:

library(dplyr)
foo = data.frame(col1 = c(1,2,3)) %>% mutate(col2 = list("a", "b", "c"))

If we do have a list that we want to add to the data frame we need to convert it to a vector first so we don’t run into this type of problem:

foo = data.frame(col1 = c(1,2,3)) %>% mutate(col2 = list("a", "b", "c") %>% unlist())

And now we can write to the CSV file:

write.csv(foo, "/tmp/foo.csv", row.names = FALSE)
$ cat /tmp/foo.csv
"col1","col2"
1,"a"
2,"b"
3,"c"

And that’s it!


Should I Patch Built-In Objects / Prototypes? (Hint: NO!)

Derick Bailey - new ThoughtStream - Tue, 06/30/2015 - 15:49

A question was asked via twitter:

@derickbailey @rauschma What do you think about adding methods on built-in prototypes (e.g. String.prototype)?

— Boris Kozorovitzky (@zbzzn) June 30, 2015

So, I built a simple flow chart to answer the question (created w/ draw.io)

[Flow chart: Should I patch this?]

All joking aside, there’s only one situation where you should patch a built-in object or prototype: if you are building a polyfill to accurately reproduce new JavaScript features for browsers and runtimes that do not support the new feature yet.
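As a concrete illustration of that one legitimate case, here is a simplified sketch of a guarded polyfill for String.prototype.includes, a real ES2015 method. The guard means the patch only applies where the feature is genuinely missing, and the behavior follows the standard rather than inventing something new (a production polyfill would handle more edge cases):

// Only define the method if the runtime doesn't already provide it.
if (!String.prototype.includes) {
  String.prototype.includes = function (search, start) {
    // Delegate to indexOf, matching the spec'd result (simplified).
    var from = typeof start === 'number' ? start : 0;
    return this.indexOf(search, from) !== -1;
  };
}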

Any other patch of a built-in object is cause for serious discussion of the problems that will ensue.

And if you think you should be writing a polyfill, stop. Go find a community supported and battle-tested polyfill library that already provides the feature you need.

Want some examples of the problems?

Imagine this: you have a global variable in a browser, called “config”. Do you think anyone else has ever accidentally or purposely created a “config” variable? What happens when you run into this situation, and your code is clobbered by someone else’s code because they use the same variable name?

Now imagine this being done on built-in objects and methods, where behaviors are expected to be consistent and stable. If I patch a “format” method on to the String.prototype, and then you load a library that patches it with different behavior, which code will continue working? How will I know why my format function is now failing? What happens when you bring in a new developer and you forget to educate them on the patched and hacked built-in objects in your system?

Go read up on “monkey-patching” in the Ruby community. They learned these lessons the hard way, YEARS ago. You will find countless horror stories and problems caused by this practice.

Here are some examples, to get you started.

But, what if …

NO! There is always a better way to get the feature you need. Decorator / wrapper objects are a good place to start. Hiding the implementation behind your API layer, where you actually need the behavior, is also a good place to be.
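For example, instead of patching a format method onto String.prototype, a plain function (or a small wrapper object) gives you the same convenience without touching anything global. This is only a sketch; the format name and the {key} template syntax are made up for illustration:

// A module-level helper instead of a String.prototype patch.
// Nothing global is touched, so no other code can collide with it.
function format(template, values) {
  // Replace each {key} placeholder with the matching value, if present.
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match;
  });
}

// Usage:
// format("Hello, {name}!", { name: "world" });  // => "Hello, world!"

Because the helper lives in your own module, a new developer can find it, and a library that ships its own format can never clobber it.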

The point is…

DO NOT PATCH THE BUILT-IN OBJECTS OR PROTOTYPES

Ever.

Your code, your team and your sanity will thank you.


Story Splitting: Where Do I Start?

Leading Agile - Mike Cottmeyer - Tue, 06/30/2015 - 14:16

I don’t always follow the same story splitting approach when I need to split a story. It has become intuitive for me, so I might not be able to write down everything I do, what goes through my mind, or how I know. But I can put here what comes to mind at the moment:

Look at your acceptance criteria. There is often some aspect of business value in each acceptance criteria that can be split out into a separate story that is valuable to the Product Owner.

Consider the tasks that need to be done. Can any of them be deferred (to a later sprint)? (And  no, testing is not a task that can be deferred to a later sprint.) If so, then consider whether any of them are separately valuable to the Product Owner. If so, perhaps that would be a good story to split out.

If there are lots of unknowns, if it’s a 13 point story because of unanswered questions, make a list of the questions and uncertainties. For each, ask whether it’s a Business Analyst (BA) to-do or a Tech to-do. Also ask for each whether it’s easy and should be considered “grooming”. If it’s significant enough and technical, maybe you should split that out as a Research Spike. Then make an assumption about the likely outcome of the spike, or the desired outcome of the spike, note the assumption in the original story, and reestimate the original story given the assumption.

Look in the story description for conjunctions, since “and”s and “or”s are a clue that the story may be doing too much. Consider whether you can split the story along the lines of the conjunctions.

Other Story Splitting ideas:
  • Workflow steps: Identify the specific steps that a user takes to accomplish a specific workflow, and then implement the workflow in incremental stages
  • Business Rule Variations
  • Happy path versus error paths
  • Simple approach versus more and more complex approaches
  • Variations in data entry methods or sources
  • Support simple data first, then more complex data later
  • Variations in output formatting, simple first, then complex
  • Defer some system quality (an “ility”). Estimate or interpolate first, do real-time later. Support a speedier response later.
  • Split out parts of CRUD. Do you really really really need to be able to Delete if you can Update or Deactivate? Do you really really really need to Update if you can Create and Delete? Sure, you may need those functions, but you don’t have to have them all in the same sprint or in the same story.

Some of the phrases in the above list may be direct quotes or paraphrases from Dean Leffingwell’s book “Agile Software Requirements”.


Dev Lunch as a Power Tool

At my current company I’ve been going out to lunch pretty much every day I’m in the office. I know a lot of developers bring their lunches and like to eat alone in silence, but I’m probably less on the introvert side. I’ve always made it a point to have lunches with developers on a regular basis, and my current standing lunches have been an evolution of that.

Breaking bread and catching up on stories, families, etc. is a great way to bond with developers you don’t work with on a regular basis. Most of the engineers I go to lunch with work on another team, so I’m constantly keeping up to date with their current work and letting them know about what my team is doing. As a bonus I get out of the office and eat some hot food, though not necessarily the healthiest fare. Sharing and communication between teams really helps, even in a flat startup organization like ours. I know some companies have meals catered in the office, which works as well, but if you’re at a regular place, lunch out can work just as well.

As a suggestion if you tend to eat your lunches alone or at your desk, try to make a habit of eating out with some other devs or even other employees at least once a week.


Do What Works… Even If It’s Not Agile.

Leading Agile - Mike Cottmeyer - Mon, 06/29/2015 - 22:19

I think I’ve come to the conclusion that ‘agile’ as we know it isn’t the best starting place for everyone that wants to adopt agile. Some folks, sure… everyone, probably not.

For many companies something closer to a ‘team-based, iterative and incremental delivery approach, using some agile tools and techniques, wrapped within a highly-governed Lean/Kanban based program and portfolio management framework’ is actually a better place to start.

Why?

Well, many organizations really struggle forming complete cross-functional teams, building backlogs, producing working tested software at regular intervals, and breaking dependencies. In the absence of these, agile is pretty much impossible.

Scrum isn’t impossible, mind you.

Agile is impossible.

So how does a ‘team-based, iterative and incremental delivery approach, using some agile tools and techniques, wrapped within a highly-governed Lean/Kanban based program and portfolio management framework’ actually work?

Let me explain.

First, I want to form Scrum-like teams around as much of the delivery organization as I can. I’ll form teams around shared components, feature teams, services teams, etc. Ideally, I’d like to form teams around objects that would end up being real Scrum teams in some future state.

These Scrum-like teams operate under the same rules as a normal Scrum team, they are complete and cross-functional, internally self-organizing, same roles, ceremonies, and artifacts as a normal Scrum team, but with much higher focus on stabilizing velocity and less on adaptation.

Why ‘Scrum-like’ teams?

Dependencies. #dependenciesareevil

These teams have business process and requirements dependencies all around them. They have architectural dependencies between them. They have organizational dependencies due to the current management structure and likely some degree of matrixing.

Until those dependencies are broken, it’s tough to operate as a small independent Scrum team that can inspect and adapt their way into success. Those dependencies have to be managed until they can be broken. We can’t pretend they aren’t there.

How do I manage dependencies?

This is where the ‘Lean/Kanban’ based program and portfolio governance comes in. Explicitly model the value stream from requirements identification all the way through delivery. Anyone that can’t be on a Scrum team gets modeled in this value stream.

We like to form small, dedicated, cross-functional teams (explicitly not Scrum teams) to decompose requirements, deal with cross-cutting concerns, flow work into the Scrum-like teams, and coordinate batch size along with all the upstream and downstream coordination.

Early on, we might be doing 3-, 6-, 9-, even 12-to-18-month roadmapping, creating 3-6 month feature-level release plans, and fine-grained, risk-adjusted release backlogs at the story level. The goal is to nail quarterly commitments and to start driving visibility for longer term planning.

Not agile?

Don’t really care; this is a great first step toward untangling a legacy organization that is struggling to get a foothold adopting agile. For many companies we work with, this is agile enough, but ideally it is only the first step toward greater levels of agile maturity.

How do I increase maturity?

Goal #1 was to stabilize the system and build trust with the organization. This isn’t us against them, it’s not management against the people, it’s working within the constraints of the existing system to get better business results… and fast.

Over time, you want to continue working to reduce batch size at the enterprise level, you want to progressively reduce dependencies between teams, you want to start funding teams and business capabilities rather than projects, you want to invest to learn.

Lofty goals, huh?

That said, those are lofty goals for an organization that can’t form a team, build a backlog, or produce working tested software every few weeks. Those are lofty goals for an organization that is so mired in dependencies they can’t move, let alone self-organize.

Our belief is that we are past the early adopters. We are past the small project teams in big companies. We are past simply telling people to self-organize and inspect and adapt. We need a way to crack companies apart, to systematically refactor them into agile organizations.

Once we have the foundational structures, principles, guidelines in place, and a sufficient threshold of people is bought into the new system and understands how it operates, then we can start letting go, deprecating control structures, and really living the promise of agile.


Book Review: Managing Humans

thekua.com@work - Mon, 06/29/2015 - 21:31

I remember hearing about Managing Humans several years ago, but I only recently got around to buying it and reading it through.

[Book cover: Managing Humans]

It is written by the well-known Michael Lopp, otherwise known as Rands, who blogs at Rands in Repose.

The title is a clever take on working in software development, and Rands shares his experiences working as a technical manager in various companies through his very unique perspective and writing style. If you follow his blog, you can see it shine through in the way that he tells stories and the way that he creates names for stereotypes and situations you might find yourself in as a Technical Manager.

He offers lots of useful advice that covers a wide variety of topics, such as tips for interviewing, resigning, making meetings more effective, and dealing with specific types of characters, all useful whether or not you are a Technical Manager.

He also covers a wider breadth of topics such as handling conflict, tips for hiring, motivation, and managing upwards (the last particularly necessary in large corporations). Some of the topics felt outside the theme of “Managing Humans” and the intended target audience of a Technical Manager, such as tips for resigning (yourself, not handling it from your team) and joining a startup.

His stories describe the people he has worked with and the situations he has worked in. A lot of it will probably resonate very well with you if you have worked, or work, in a large software development firm or a “Borland” of our time.

The book is easy to digest in chunks, and with clear titles it is easy to pick up at different intervals or to go back to for future reference. The book is less about a single message than a series of essays, each offering valuable insight into working with people in the software industry.


Dealing with People You Can’t Stand

J.D. Meier's Blog - Mon, 06/29/2015 - 16:46

“If You Want To Go Fast, Go Alone. If You Want To Go Far, Go Together” – African Proverb

I blew the dust off some old posts to rekindle some of the most important information for work and life.

It’s about dealing with people you can’t stand.

Whether you think of them as jerks, bullies, or just difficult people, the better you can deal with difficult people, the better you can get things done and make things happen.

And the more you learn how to bring out the best, in people at their worst, the less you’ll find people you can’t stand.

How To Bring Out the Best in People at Their Worst (Including Yourself)

Everything I needed to learn about dealing with difficult people, I learned from the book Dealing with People You Can’t Stand: How to Bring Out the Best in People at Their Worst, by Dr. Rick Brinkman and Dr. Rick Kirschner.

It’s one of the most brilliant, thoughtful books I’ve ever read on interpersonal skills and dealing with all sorts of bad behaviors.

The real key to dealing with difficult behavior is more than just recognizing bad behaviors in other people.

It’s recognizing bad behaviors in yourself, the kind that contribute to and amplify other people’s bad behaviors.

The more you know, the more you grow, and this is truly one of those transformational books.

Learn How To Deal with Difficult People (and Gain Some Mad Interpersonal Skills)

I’ve completely re-written my post that provides an overview of the big ideas in Dealing with People You Can’t Stand:

Dealing with People You Can’t Stand

Even better, I’ve re-written all of my posts that talk through the 10 Types of Difficult People, and what to do about them.

I have to warn you:  Once you learn the 10 Types of Difficult People, you’ll be using the labels to classify bad behaviors that you experience in the halls, in meetings, behind your back, etc.

With that in mind, here they are …

10 Types of Difficult People

Here are the 10 Types of Difficult People at a glance:

  1. Grenade Person – After a brief period of calm, the Grenade person explodes into unfocused ranting and raving about things that have nothing to do with the present circumstances.
  2. Know-It-Alls – Seldom in doubt, the Know-It-All person has a low tolerance for correction and contradiction. If something goes wrong, however, the Know-It-All will speak with the same authority about who’s to blame – you!
  3. Maybe Person – In a moment of decision, the Maybe Person procrastinates in the hope that a better choice will present itself.
  4. No Person – A No Person kills momentum and creates friction for you. More deadly to morale than a speeding bullet, more powerful than hope, able to defeat big ideas with a single syllable.
  5. Nothing Person – A Nothing Person doesn’t contribute to the conversation. No verbal feedback, no nonverbal feedback, Nothing. What else could you expect from … the Nothing Person.
  6. Snipers – Whether through rude comments, biting sarcasm, or a well-timed roll of the eyes, making you look foolish is the Sniper’s specialty.
  7. Tanks – The Tank is confrontational, pointed and angry, the ultimate in pushy and aggressive behavior.
  8. Think-They-Know-It-Alls – Think-They-Know-It-All people can’t fool all the people all the time, but they can fool some of the people enough of the time, and enough of the people all of the time – all for the sake of getting some attention.
  9. Whiners – Whiners feel helpless and overwhelmed by an unfair world. Their standard is perfection, and no one and nothing measures up to it.
  10. Yes Person – In an effort to please people and avoid confrontation, Yes People say “yes” without thinking things through.

I warned you.  Are you already thinking about some Snipers in a few meetings that you have, or is there a Yes Person driving you nuts (or are you that Yes Person?)

Have you talked to a Think-They-Know-It-All lately, or worse, a Know-It-All?

Never fear, I’ve included actionable insights and recommendations for dealing with all the various bad behaviors you’ll encounter.

The Lens of Human Understanding

If all this talk about dealing with difficult people and silly labels seems like a gimmick, it’s not.  It’s actually deep insight rooted in a powerful but simple framework that Dr. Rick Brinkman and Dr. Rick Kirschner refer to as the Lens of Human Understanding:

[Diagram: The Lens of Human Understanding]

Once I learned The Lens of Human Understanding, so many things fell into place.

Not only did I understand myself better, but I could instantly see what was driving other people, and how my behavior would either create more conflict or resolve it.

But when you don’t know what makes people tick, it’s very easy to get ticked off, or to tick them off.

Here’s looking at you … and other people … and their behaviors … in a brand new way.



R: Speeding up the Wimbledon scraping job

Mark Needham - Mon, 06/29/2015 - 07:36

Over the past few days I’ve written a few blog posts about a Wimbledon data set I’ve been building, and after running the scripts a few times I noticed that they were taking much longer to run than I expected.

To recap, I started out with the following function which takes in a URI and returns a data frame containing a row for each match:

library(rvest)
library(dplyr)
library(stringr)  # needed for str_trim below
 
scrape_matches1 = function(uri) {
  matches = data.frame()
 
  s = html(uri)
  rows = s %>% html_nodes("div#scoresResultsContent tr")
  for(row in rows) {  
    players = row %>% html_nodes("td.day-table-name a")
    seedings = row %>% html_nodes("td.day-table-seed")
    score = row %>% html_node("td.day-table-score a")
    flags = row %>% html_nodes("td.day-table-flag img")
 
    if(!is.null(score)) {
      player1 = players[1] %>% html_text() %>% str_trim()
      seeding1 = ifelse(!is.na(seedings[1]), seedings[1] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag1 = flags[1] %>% html_attr("alt")
 
      player2 = players[2] %>% html_text() %>% str_trim()
      seeding2 = ifelse(!is.na(seedings[2]), seedings[2] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag2 = flags[2] %>% html_attr("alt")
 
      matches = rbind(data.frame(winner = player1, 
                                 winner_seeding = seeding1, 
                                 winner_flag = flag1,
                                 loser = player2, 
                                 loser_seeding = seeding2,
                                 loser_flag = flag2,
                                 score = score %>% html_text() %>% str_trim(),
                                 round = round), matches)      
    } else {
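      # rows without a score are round headers; capture the round name
      # so the match rows that follow can be tagged with it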
      round = row %>% html_node("th") %>% html_text()
    }
  } 
  return(matches)
}

Let’s run it to get an idea of the data that it returns:

matches1 = scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results")
 
> matches1 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals
2   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
3 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
6  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
7  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals

As I mentioned, it’s quite slow, but I thought I’d wrap it in system.time so I could see exactly how long it was taking:

> system.time(scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results"))
   user  system elapsed 
 25.570   0.111  31.416

About 30 seconds! The first thing I tried was downloading the file separately and running the function against the local file:

> system.time(scrape_matches1("data/raw/2014.html"))
   user  system elapsed 
 25.662   0.123  25.863

Hmmm, that’s only saved us 5 seconds, so the bottleneck must be somewhere else. Still, there’s no point making an HTTP request every time we run the script, so we’ll stick with the local file version.

While browsing rvest’s vignette I noticed a function called html_table which I was curious about. I decided to try and replace some of my code with a call to that:

library(zoo)  # needed for na.locf below
matches2 = html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round)
 
> matches2 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding           loser loser_seeding            score          round
1  Novak Djokovic            (1)   Roger Federer           (4) 677 64 764 57 64         Finals
2  Novak Djokovic            (1) Grigor Dimitrov          (11)    64 36 762 767    Semi-Finals
3   Roger Federer            (4)    Milos Raonic           (8)         64 64 64    Semi-Finals
4  Novak Djokovic            (1)     Marin Cilic          (26)  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)     Andy Murray           (3)        61 764 62 Quarter-Finals
6   Roger Federer            (4)   Stan Wawrinka           (5)     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)    Nick Kyrgios          (WC)    674 62 64 764 Quarter-Finals

I had to do some slightly clever stuff to get the ’round’ column into shape using zoo’s na.locf function which I wrote about previously.
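
If you haven’t come across na.locf before, here’s the idea in one line (values made up for illustration):

library(zoo)
 
na.locf(c("Finals", NA, NA, "Semi-Finals", NA))
# "Finals" "Finals" "Finals" "Semi-Finals" "Semi-Finals"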

Unfortunately I couldn’t work out how to extract the flag with this version – that value is hidden in the ‘alt’ attribute of an img, and presumably html_table just grabs the text value of each cell. This version is much quicker though!

system.time(html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round))
 
   user  system elapsed 
  0.545   0.002   0.548

What I realised from writing this version is that I need to match all the columns with one call to html_nodes rather than getting the row and then each column in a loop.

I rewrote the function to do that:

scrape_matches3 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  matches3 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim())
  )
  return(matches3)
}
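
One assumption about the R defaults of the day worth flagging: data.frame converts character columns to factors unless you tell it otherwise, so if plain strings are wanted, stringsAsFactors = FALSE is the fix – a tiny illustration:

df = data.frame(name = c("Federer", "Djokovic"))
class(df$name)                                             # "factor"
 
df = data.frame(name = c("Federer", "Djokovic"), stringsAsFactors = FALSE)
class(df$name)                                             # "character"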

Let’s run and time that to check we’re getting back the right results in a timely manner:

matches3 = scrape_matches3("data/raw/2014.html")
 
> matches3 %>% sample_n(10)
                   winner winner_seeding winner_flag               loser loser_seeding loser_flag         score
70           David Ferrer            (7)         ESP Pablo Carreno Busta                      ESP  60 673 61 61
128        Alex Kuznetsov           (26)         USA         Tim Smyczek           (3)        USA   46 63 63 63
220   Rogerio Dutra Silva                        BRA   Kristijan Mesaros                      CRO         62 63
83         Kevin Anderson           (20)         RSA        Aljaz Bedene          (LL)        GBR      63 75 62
73          Kei Nishikori           (10)         JPN   Kenny De Schepper                      FRA     64 765 75
56  Roberto Bautista Agut           (27)         ESP         Jan Hernych           (Q)        CZE   75 46 62 62
138            Ante Pavic                        CRO        Marc Gicquel          (29)        FRA  46 63 765 64
174             Tim Puetz                        GER     Ruben Bemelmans                      BEL         64 62
103        Lleyton Hewitt                        AUS   Michal Przysiezny                      POL 62 6714 61 64
35          Roger Federer            (4)         SUI       Gilles Muller           (Q)        LUX      63 75 63
 
> system.time(scrape_matches3("data/raw/2014.html"))
   user  system elapsed 
  0.815   0.006   0.827

It’s still quick – a bit slower than html_table but we can deal with that. As you can see, I also had to add some logic to separate the values for the winners and losers – the players, seedings and flags come back as one big list. The odd entries represent the winner; the even entries the loser.
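
The seq trick is easier to see on a toy vector (values invented for illustration):

pairs = c("winner 1", "loser 1", "winner 2", "loser 2")
 
pairs[seq(1, length(pairs), 2)]   # "winner 1" "winner 2"
pairs[seq(2, length(pairs), 2)]   # "loser 1"  "loser 2"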

Annoyingly we’ve now lost the ’round’ column because that value appears in a table heading, so we can’t extract it the same way. I ended up cheating a bit: I worked out how many matches each round should contain and generated a vector with that number of entries:

raw_rounds = s %>% html_nodes("th") %>% html_text()
 
> raw_rounds
 [1] "Finals"               "Semi-Finals"          "Quarter-Finals"       "Round of 16"          "Round of 32"         
 [6] "Round of 64"          "Round of 128"         "3rd Round Qualifying" "2nd Round Qualifying" "1st Round Qualifying"
 
rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
            sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
> rounds[1:10]
 [1] "Finals"         "Semi-Finals"    "Semi-Finals"    "Quarter-Finals" "Quarter-Finals" "Quarter-Finals" "Quarter-Finals"
 [8] "Round of 16"    "Round of 16"    "Round of 16"

Let’s put that code into the function and see if we end up with the same resulting data frame:

scrape_matches4 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  raw_rounds = s %>% html_nodes("th") %>% html_text()
  rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
              sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
  matches4 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim()),
    round          = rounds
  )
  return(matches4)
}
 
matches4 = scrape_matches4("data/raw/2014.html")
 
> matches4 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals
2  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
3   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
6   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals

We shouldn’t have added much to the time but let’s check:

> system.time(scrape_matches4("data/raw/2014.html"))
   user  system elapsed 
  0.816   0.004   0.824

Sweet. We’ve saved ourselves about 29 seconds per page, as long as the number of rounds stays constant over the years. For the ten years I’ve looked at it has, but I expect that if you go back further the draw sizes will have been different and our script would break.
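
If we wanted a cheap guard against that assumption breaking (a sketch, not something the function currently does), we could check that the generated rounds vector lines up with the number of matches before building the data frame:

# inside scrape_matches4, just before the data.frame() call:
# every match contributes two players, so these lengths must agree
stopifnot(length(rounds) == length(players) / 2)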

For now though this will do!

Categories: Blogs

R: dplyr – Update rows with earlier/previous rows values

Mark Needham - Mon, 06/29/2015 - 00:30

Recently I had a data frame which contained a column which had mostly empty values:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA))
  col1 col2
1    1    a
2    2 <NA>
3    3 <NA>
4    4    b
5    5 <NA>

I wanted to fill in the NA values with the last non NA value from that column. So I want the data frame to look like this:

1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

I spent ages searching around before I came across the na.locf function in the zoo library which does the job:

library(zoo)
library(dplyr)
 
> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA)) %>% 
    do(na.locf(.))
  col1 col2
1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

This will fill in the missing values for every column, so if we had a third column with missing values it would populate those too:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    do(na.locf(.))
 
  col1 col2 col3
1    1    a    A
2    2    a    A
3    3    a    B
4    4    b    B
5    5    b    B

If we only want to populate ‘col2’ and leave ‘col3’ as it is, we can apply the function specifically to that column:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    mutate(col2 = na.locf(col2))
  col1 col2 col3
1    1    a    A
2    2    a <NA>
3    3    a    B
4    4    b <NA>
5    5    b <NA>

It’s quite a neat function and certainly comes in handy when cleaning up data sets, which don’t tend to be as uniform as you’d hope!
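
One caveat worth knowing (this reflects zoo’s documented default of na.rm = TRUE): if the column starts with NA, na.locf drops those leading values, so the result comes back shorter than the input and mutate will complain. Passing na.rm = FALSE keeps them as NA instead:

> data.frame(col1 = c(1,2,3), col2 = c(NA, "a", NA)) %>% 
    mutate(col2 = na.locf(col2, na.rm = FALSE))
  col1 col2
1    1 <NA>
2    2    a
3    3    a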

Categories: Blogs

An Illustrated Guide To Client Projects: Requirements vs Reality

Derick Bailey - new ThoughtStream - Sun, 06/28/2015 - 03:33

What the client claims they need vs what they end up getting are often two very different things. This isn’t always a bad thing, but it certainly can be. To that end, I wanted to illustrate what the reality of a client project often is vs what the client claimed it should have been.

How The Client Described The Project

In the client’s words, everything is an absolute must-have requirement. There are no features they can live without (in spite of them currently running their business without most of the “required” features).

[Illustration: project requirements]

The project is large, complex, expensive, and requires significant engineering effort.

How The Client Budgeted For The Project

Unless you are working with a client that has a history of using outsourced I.T., development, or design services, there is usually a very large gap between what the client claims they must have and what they budgeted to get. The budget is usually enough to get the general shape of the requirements, but it will be missing a lot of the fine detail.

[Illustration: client budget]

How The Project Was Delivered

Inevitably, there will be problems along the way. Technical issues will pop up. Interaction with the client will stall as they become busy. Questions will be left unanswered – and, oh by the way, they won’t pay for “bells and whistles” or “design”; it just needs to be simple and work. With the constant battles that are fought, the project typically lacks important details, even though it does solve the core problem.

[Illustration: project delivery]

What The Client Actually Needs

When all is said and done, the client might be reaching for this – with you right there with them – as the answer to their actual problems.

[Illustration: what the client needs]

Are You Sure You Need That Client?

Be careful who you take on as a client. It’s tempting to think you can’t say no because you need the money, or frankly because you just don’t think you can ever say no. But you need to learn to say no, and to be picky about the clients you take.

The beginning of a relationship with any client will set the tone. Are they argumentative, over-haggling and nit-picky about inane details? Or do they enjoy the conversation, speak openly and honestly, and look at you as a partner in the endeavor?

Your time and talent are worth more than you think. Be sure you are treated with the respect you deserve, and always treat the client with the respect that you wish you were getting.

Categories: Blogs

R: Command line – Error in GenericTranslator$new : could not find function “loadMethod”

Mark Needham - Sun, 06/28/2015 - 00:47

I’ve been reading Text Processing with Ruby over the last week or so and one of the ideas the author describes is setting up your scripts so you can run them directly from the command line.

I wanted to do this with my Wimbledon R script and wrote the following script which uses the ‘Rscript’ executable so that R doesn’t launch in interactive mode:

wimbledon

#!/usr/bin/env Rscript
 
library(rvest)
library(dplyr)
library(stringr)
library(readr)
 
# stuff

Then I tried to run it:

$ time ./wimbledon
 
...
 
Error in GenericTranslator$new : could not find function "loadMethod"
Calls: write.csv ... html_extract_n -> <Anonymous> -> Map -> mapply -> <Anonymous> -> $
Execution halted
 
real	0m1.431s
user	0m1.127s
sys	0m0.078s

As the error suggests, the script fails when trying to write to a CSV file – unlike the interactive R shell, Rscript doesn’t attach the methods package by default, and that’s where loadMethod lives. It turns out adding the following line to our script is all we need:

library(methods)

So we end up with this:

#!/usr/bin/env Rscript
 
library(methods)
library(rvest)
library(dplyr)
library(stringr)
library(readr)

And when we run that all is well!
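
One small assumption baked into running ./wimbledon directly: the file needs its execute bit set, which is a one-off:

$ chmod +x wimbledon
$ ./wimbledon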

Categories: Blogs

R: dplyr – squashing multiple rows per group into one

Mark Needham - Sun, 06/28/2015 - 00:36

I spent a bit of the day working on my Wimbledon data set, and the next thing I explored was all the people who have beaten Andy Murray in the tournament.

The following dplyr query gives us the names of those people and the year the match took place:

library(dplyr)
 
> main_matches %>% filter(loser == "Andy Murray") %>% select(winner, year)
 
            winner year
1  Grigor Dimitrov 2014
2    Roger Federer 2012
3     Rafael Nadal 2011
4     Rafael Nadal 2010
5     Andy Roddick 2009
6     Rafael Nadal 2008
7 Marcos Baghdatis 2006
8 David Nalbandian 2005

As you can see, Rafael Nadal shows up multiple times. I wanted to get one row per player and list all the years in a single column.

This was my initial attempt:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year))
Source: local data frame [6 x 2]
 
            winner years
1     Andy Roddick  2009
2 David Nalbandian  2005
3  Grigor Dimitrov  2014
4 Marcos Baghdatis  2006
5     Rafael Nadal  2011
6    Roger Federer  2012

Unfortunately it just gives you the last matching row per group, which isn’t quite what we want. I realised my mistake while trying to pass a vector into paste and noticing that a vector came back when I’d expected a string:

> paste(c(2008,2009,2010))
[1] "2008" "2009" "2010"

The missing argument was ‘collapse’ – something I’d come across when using plyr last year:

> paste(c(2008,2009,2010), collapse=", ")
[1] "2008, 2009, 2010"

Now, if we apply that to our original query:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year, collapse=", "))
Source: local data frame [6 x 2]
 
            winner            years
1     Andy Roddick             2009
2 David Nalbandian             2005
3  Grigor Dimitrov             2014
4 Marcos Baghdatis             2006
5     Rafael Nadal 2011, 2010, 2008
6    Roger Federer             2012

That’s exactly what we want. Let’s tidy that up a bit:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% arrange(year) %>%
     summarise(years  = paste(year, collapse =","), times = length(year))  %>%
     arrange(desc(times), years)
Source: local data frame [6 x 3]
 
            winner          years times
1     Rafael Nadal 2008,2010,2011     3
2 David Nalbandian           2005     1
3 Marcos Baghdatis           2006     1
4     Andy Roddick           2009     1
5    Roger Federer           2012     1
6  Grigor Dimitrov           2014     1
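
As an aside, dplyr’s n() gives us the group size directly, and sorting the years inside paste saves the separate arrange step – a slightly tidier sketch of the same query:

main_matches %>% filter(loser == "Andy Murray") %>% 
  group_by(winner) %>%
  summarise(years = paste(sort(year), collapse = ","), times = n()) %>%
  arrange(desc(times), years)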

Categories: Blogs

Link: Slashdot on Mob Programming


Slashdot has a post on mob programming that, as usual, has brought out the poorly socialized extreme introverts and their invective. There are always interesting and insightful comments as well.  I recommend checking it out.  The post links to some studies about mob programming that are also interesting.


The post Link: Slashdot on Mob Programming appeared first on Agile Advice.

Categories: Blogs

R: ggplot – Show discrete scale even with no value

Mark Needham - Sat, 06/27/2015 - 00:48

As I mentioned in a previous blog post, I’ve been scraping data for the Wimbledon tennis tournament, and having got the data for the last ten years I wrote a query using dplyr to find out how players did each year over that period.

I ended up with the following functions to filter my data frame of all the matches:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
  return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}
 
player_performance = function(name, matches) {
  player = data.frame()
  for(y in 2005:2014) {
    round = round_reached(name, filter(matches, year == y))
    if(length(round) == 1) {
      player = rbind(player, data.frame(year = y, round = round))      
    } else {
      player = rbind(player, data.frame(year = y, round = "Did not enter"))
    } 
  }
  return(player)
}

When we call that function we see the following output:

> player_performance("Andy Murray", main_matches)
   year          round
1  2005    Round of 32
2  2006    Round of 16
3  2007  Did not enter
4  2008 Quarter-Finals
5  2009    Semi-Finals
6  2010    Semi-Finals
7  2011    Semi-Finals
8  2012         Finals
9  2013         Winner
10 2014 Quarter-Finals

I wanted to create a chart showing Murray’s progress over the years with the round reached on the y axis and the year on the x axis. In order to do this I had to make sure the ’round’ column was being treated as a factor variable:

df = player_performance("Andy Murray", main_matches)
 
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
df$round = factor(df$round, levels =  rounds)
 
> df$round
 [1] Round of 32    Round of 16    Did not enter  Quarter-Finals Semi-Finals    Semi-Finals    Semi-Finals   
 [8] Finals         Winner         Quarter-Finals
Levels: Did not enter Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals Winner

Now that we’ve got that we can plot his progress:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds)

[Chart: Andy Murray’s round reached by year]

This is a good start, but we’ve lost the rounds which don’t have a corresponding data point – ggplot has dropped them from the y axis. I’d like to keep them so it’s easier to compare the performance of different players.

It turns out that all we need to do is pass ‘drop = FALSE’ to scale_y_discrete and it will work exactly as we want:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop = FALSE)

[Chart: the same plot with every round kept on the y axis]

Neat. Now let’s have a look at the performances of some of the other top players:

draw_chart = function(player, main_matches){
  df = player_performance(player, main_matches)
  df$round = factor(df$round, levels =  rounds)
 
  ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop=FALSE) + 
    ggtitle(player) + 
    theme(axis.text.x=element_text(angle=90, hjust=1))
}
 
a = draw_chart("Andy Murray", main_matches)
b = draw_chart("Novak Djokovic", main_matches)
c = draw_chart("Rafael Nadal", main_matches)
d = draw_chart("Roger Federer", main_matches)
 
library(gridExtra)
grid.arrange(a,b,c,d, ncol=2)

[Charts: Murray, Djokovic, Nadal, and Federer side by side]

And that’s all for now!

Categories: Blogs