Using Economics to Encourage Testing Incrementally (or As You Go Along)

At TriAgile, I had an interesting conversation with a Product Owner. She described to me a problem where the testers could not keep up and their behaviors were actually holding them back. Let me describe her situation…

In her content development team, they had a couple of testers. They manually tested hyperlinks and other HTML/JavaScript/CSS elements towards the end of the iteration. While she would love to move to automated testing, there were some hurdles to get software approved for use, plus she had a whole behavioral mindset to overcome. The testers on her team felt that building and running tests incrementally, as a developer completed work on acceptance criteria, was wasteful. They preferred to test after a story was completed by the content developer, which always left them crunched at the end of the iteration. No matter how hard this Product Owner tried to convince them to test as they go, they resisted. Her Scrum Master was also not providing any influence one way or the other.

As we discussed this at TriAgile, I finally settled on economics to help her understand the situation.

Suppose a content developer produced a defect that prevented a CSS library from working by using a faulty assumption (let’s say it was as simple as a misspelled directory in the URL). And let’s suppose this faulty assumption caused the error to be reproduced 10 times. Further, let’s say it took the developer 10 minutes to implement each instance. Lastly, it took the tester 5 minutes to test EACH instance. So let’s do some math. (All of the times are hypothetical; they could be longer or shorter.)

So first up, testing at the end: 10 x 10 minutes implementation + 10 x 5 minutes testing = 150 minutes. But wait, we now have to fix those errors. So presuming that great information got passed back to the developer and it only takes them 5 min to correct each instance, we need to add: 10 x 5 minutes fix time + 10 x 5 minutes retesting = 100 minutes.  So our total time to get to done is 150 + 100 = 250 minutes to implement, test, correct, and retest the work. Our Product Owner had actually said that this kind of error replication had happened multiple times.

OK, what would have happened if the testing had happened incrementally? Well, our implementation time per item is the same, but after the first implementation occurs it would go to get tested. If an error is found, it goes back to the content developer, and having seen the error she or he was making, they can now avoid reproducing it. So the time would be something like this: 1 x 10 minutes implementation + 1 x 5 minutes testing = 15 min, then 1 reworked item x 5 min + 1 item retested x 5 min = 10 min, and finally 9 remaining items x 10 minutes implementation + 9 x 5 minutes testing = 135 min. Total time now is 160 min.

If the cost was $2/minute (assuming a $120/hour rate employee), testing at the end costs $500 and testing as you go costs $320, so you easily wasted

$500 – $320 or $180.

Now multiply this by however many teams are not testing as they go and how many times they have this happen.
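
To make the arithmetic easy to rerun with your own numbers, here is a minimal Python sketch of the two scenarios. The constant and function names are mine, not from any tool, and the figures are the hypothetical ones used above; it converts both totals from minutes to dollars at the $2/minute rate.

```python
# Hypothetical figures from the example above; adjust to your own context.
ITEMS = 10            # times the faulty assumption was repeated
IMPLEMENT_MIN = 10    # minutes to implement each instance
TEST_MIN = 5          # minutes to test (or retest) each instance
FIX_MIN = 5           # minutes to fix each defective instance
RATE_PER_MIN = 2      # dollars per minute ($120/hour)


def minutes_testing_at_end(items: int) -> int:
    """Every instance is built, tested, fixed, and retested."""
    first_pass = items * (IMPLEMENT_MIN + TEST_MIN)
    rework = items * (FIX_MIN + TEST_MIN)
    return first_pass + rework


def minutes_testing_as_you_go(items: int) -> int:
    """Only the first instance is reworked; the rest are built correctly."""
    first_item = IMPLEMENT_MIN + TEST_MIN
    rework = FIX_MIN + TEST_MIN
    remaining = (items - 1) * (IMPLEMENT_MIN + TEST_MIN)
    return first_item + rework + remaining


at_end = minutes_testing_at_end(ITEMS)
incremental = minutes_testing_as_you_go(ITEMS)
wasted_dollars = (at_end - incremental) * RATE_PER_MIN
print(at_end, incremental, wasted_dollars)  # prints: 250 160 180
```

The sketch assumes the developer repeats the mistake in every instance when testing is deferred and never repeats it after the first round of feedback; real teams will land somewhere in between.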

Of course there could be items caught that are not recurring, but the fact of the matter is, every recurrence of an error that has to be backed out introduces a lot of waste into the system (defect waste for you Lean/Six Sigma types). Testing as you go and stopping ‘the line’ to prevent future defects from occurring saves money in the long run since labor time is what we are talking about.

In addition to the direct savings calculated above, one ALSO has the queue time for each item that is waiting to be tested before it can be OK-ed to produce value. In the first instance, this queue may build up considerably, delaying production readiness. And suppose that, out of the 10 occurrences above, only 8 could be completed because we’re near the end of the iteration? Then we’re probably not going to get all of them tested and any fixes done in time. If we had been testing along the way, then if something didn’t get tested, we could talk with the Product Owner about releasing what was completed and successfully tested. Something of value is completed as opposed to deploying nothing. There is a real opportunity cost for this delay.

So there is something to be learned by each area with this. For the tester, testing completed work, even manually, incrementally keeps you from becoming a bottleneck to producing value. For the developer, giving developed items to the tester incrementally and getting feedback after each item allows you to correct along the way, possibly avoiding future errors. And for the business, having this occur incrementally actually reduces both the real and potential opportunity costs of the work.

Incorporating Security into User Stories

One of the biggest initial sources of resistance I have run across among Federal government stakeholders is information security personnel (and their supporting contractors). This is often because, when they learn how requirements are managed with user stories, they don’t see a fit for their requirements; in the Federal space these are guided by FIPS 200 and NIST 800-53 (currently at rev 4). Other writings on the subject do little to help them. Most advocate a separate and distinct type of story, such as this paper by SafeCode, or rely wholly on Dark or Evil stories written from the point of view of someone trying to gain access to a system or deny access to it.

There is no reason software-centric security controls can’t become user stories and/or acceptance criteria to user stories.  This post is going to attempt to show you a bit how to think on this. There is value in using dark stories as well, but I advocate first getting stories that incorporate NIST controls within the backlog.

First up, one must understand that the controls in NIST 800-53 cover a large number of dimensions including physical space control, configuration management, training, etc.  When attempting to convert the controls into user stories and/or acceptance criteria, focus only on the ones that are software-centric.  Controls that deal with authentication, authorization, audit logging, system monitoring, or encryption are great candidates.

Second, per NIST guidance, the organization is expected to establish the baseline needs of applications/systems and then select and tailor the controls needed. This responsibility can fall to the people managing the product features (in Scrum this would be the Product Owner) in concert with IT Security staff. By articulating these as user stories or acceptance criteria to user stories, these now have business value. The person managing what gets done no longer has to make a leap of faith or be told by someone externally that they just have to have it.

In the following examples, we’re going to follow the Specification by Example format and use a generic records review system as an example; it places records in a list for authoritative storage (a separate system) and handles Personally Identifiable Information. Let’s start with a story…

User Login Story

As a User Requiring Authentication,
I want to Login to the DataReview application
So that I can review incoming survey data for quality

//Standard Login Scenario
Given the username jsmith // valid userid
When I attempt a login using “nS3cure” // correct password
Then the main home portal page is displayed for me

Given the username jsmith // valid userid
When I attempt a login using “123” // incorrect password
Then I see the login page again with the statement “Incorrect userid or password” stated along with a count of Login attempts.

Given the username smithj // invalid userid
When I attempt a login using “nS3cure”     //password correctness is immaterial
Then I see the login page again with the statement “Incorrect userid or password” stated along with a count of Login attempts.

// Multiple Login Failure Scenario
Given two failed login attempts and the username jsmith // valid userid – 3 attempts require a lock out for a period of time
When I attempt a login using “123” // incorrect password
Then I see the login page again with the statement “Third failed login attempt, your IP address has been locked out for 3 hours” // 3 hrs by policy

Given two failed login attempts and the username smithj // invalid userid – 3 attempts require a lock out for a period of time
When I attempt a login using “nS3cure”     //password correctness is immaterial
Then I see the login page again with the statement “Third failed login attempt, your IP address has been locked out for 3 hours” // 3 hrs by policy

Given the IP address 130.3.55.121 is locked and the password lockout timer is less than or equal to 3 hours // we’re testing using a static IP
When I attempt a login using any username or password
Then I see the login page again with the statement “This IP address has been locked out.” and the password lockout timer is reset to 3 hours.

There could be numerous other additional acceptance criteria as well (password complexity for example), but at least now one can see how these user stories can articulate some of the NIST requirements (in this case AC-7 Unsuccessful Logon Attempts).
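
To show that these scenarios are concrete enough to build and test against, here is a minimal Python sketch of the lockout behavior they describe. Everything in it is an illustrative assumption on my part: the `LoginGate` class, the in-memory credential store, and passing the clock in as `now` so tests do not have to wait three hours. It is not how any particular product implements AC-7.

```python
from datetime import datetime, timedelta

MAX_ATTEMPTS = 3                      # third failure triggers the lockout
LOCKOUT = timedelta(hours=3)          # "3 hrs by policy"
VALID_USERS = {"jsmith": "nS3cure"}   # hypothetical in-memory credential store


class LoginGate:
    def __init__(self):
        self.failures = {}      # ip -> count of consecutive failed attempts
        self.locked_until = {}  # ip -> datetime at which the lockout ends

    def attempt(self, ip, username, password, now):
        locked = self.locked_until.get(ip)
        if locked is not None:
            if now < locked:
                # Any attempt from a locked IP restarts the 3 hour timer.
                self.locked_until[ip] = now + LOCKOUT
                return "This IP address has been locked out."
            # Lockout has expired: clear it and start counting afresh.
            del self.locked_until[ip]
            self.failures.pop(ip, None)

        if username in VALID_USERS and VALID_USERS[username] == password:
            self.failures.pop(ip, None)
            return "home"

        # Same message for a bad userid or a bad password.
        count = self.failures.get(ip, 0) + 1
        self.failures[ip] = count
        if count >= MAX_ATTEMPTS:
            self.locked_until[ip] = now + LOCKOUT
            return ("Third failed login attempt, your IP address "
                    "has been locked out for 3 hours")
        return f"Incorrect userid or password (attempt {count})"


gate = LoginGate()
t0 = datetime(2024, 1, 1, 9, 0)
ip = "130.3.55.121"
first = gate.attempt(ip, "jsmith", "123", t0)        # wrong password
second = gate.attempt(ip, "smithj", "nS3cure", t0)   # invalid userid
third = gate.attempt(ip, "jsmith", "123", t0)        # triggers the lockout
during = gate.attempt(ip, "jsmith", "nS3cure", t0 + timedelta(hours=1))
after = gate.attempt(ip, "jsmith", "nS3cure", t0 + timedelta(hours=5))
```

Note that the attempt made one hour into the lockout pushes the unlock time out to four hours after `t0`, which is why the final attempt is made at five hours.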

Let’s look at how security controls for auditing significant events might show up as acceptance criteria.  In the following example from our records management system, post QC review, we are adding the person’s record into a queue for inclusion into the authoritative system. It is fundamental that we know when this data is placed in the queue for transmittal. This has been determined to be an auditable event per control AU-2.

Approve Record Story

As a QC,
I want to approve records
So that they can be queued for entry into the official records repository.

//Approval
Given authenticated user jsmith has a valid QC role and bjones submitted the record “Victoria Virginia” to jsmith for approval
When I approve the record “Victoria Virginia”
Then I see the message “Victoria Virginia record approved”, the record is appended to the awaiting transmittal queue list, and an entry is made to the security audit log with <date-time>, “jsmith”, “QC”, “approved ‘Victoria Virginia’” // policy requires approvals to official records to be logged

//Disapproval
Given authenticated user jsmith has a valid QC role and bjones submitted the record “Mary Maryland” to jsmith for approval
When I disapprove the record “Mary Maryland”
Then I see the message “Mary Maryland record disapproved and returned to bjones” and  the record is returned as the first item in the review queue for bjones

Here we can see how the record gets logged (one form of auditing) when approved.  Because the record isn’t being readied to transition to the authoritative system when disapproved, it was determined that event wasn’t auditable.
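
The same idea can be sketched in a few lines of Python. The `ReviewSystem` class and its attribute names are hypothetical, chosen only to mirror the two scenarios above: approval appends to the transmittal queue and writes an audit entry, while disapproval returns the record to the submitter without auditing.

```python
from datetime import datetime, timezone


class ReviewSystem:
    def __init__(self):
        self.transmittal_queue = []   # records awaiting the authoritative system
        self.review_queues = {}       # submitter -> records returned for rework
        self.audit_log = []           # security audit entries (AU-2 events)

    def approve(self, user, role, record):
        if role != "QC":
            raise PermissionError("only the QC role may approve records")
        self.transmittal_queue.append(record)
        # Policy requires approvals to official records to be logged.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, user, role, f'approved "{record}"'))
        return f"{record} record approved"

    def disapprove(self, user, role, record, submitter):
        if role != "QC":
            raise PermissionError("only the QC role may disapprove records")
        # Returned as the first item in the submitter's queue; not audited.
        self.review_queues.setdefault(submitter, []).insert(0, record)
        return f"{record} record disapproved and returned to {submitter}"


system = ReviewSystem()
approved_msg = system.approve("jsmith", "QC", "Victoria Virginia")
rejected_msg = system.disapprove("jsmith", "QC", "Mary Maryland", "bjones")
```

After these two calls the audit log holds exactly one entry, for the approval, which is the behavior the acceptance criteria ask for.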

Hopefully, this helps folks understand how NIST 800-53 security controls can be incorporated into user stories. By putting them into this format, the development team can now develop to them and hook them into acceptance testing via something like Cucumber.

 

The Story of Codemess

It’s that time of the season, so it’s time for a story…

’Twas the night before review

The team stayed up late
To get all the stories done;
Tasks cleaned off their plate.

The CI server
Pulled all of the code
Mongo, Apache,
and a server named Node.

Each dev checked in their bits
Fast as light
Check the acceptance criteria? Pshaw!
I want to go home tonight.

So assumptions were made
With no consultation
And when the code built successfully,
The team squealed with elation!

But they skipped some tests
That kept showing up red
And just prayed that the demo
Would run and not be dead.

So next day was show time
They filed in with fake smiles
The Scrum Master put on all of her charms
And her witty guiles.

They fired up the screens
And showed all their work
The Product Owner turned red
He knew he was going to feel like a jerk.

He couldn’t accept not one
Not two, three, nor four
All stories had failed
Absolutely no score.

So nothing was right
While their efforts were of heroes
All of their stories
The points completed were zeroes.

A failed Sprint they had
One to be remembered
They should be glad
They had not been dismembered.

How they had worked
Needed serious reflection
But to hell with the retro
On to make this damn correction.

So off the team went
Stuck that they knew right
To code and recode
Many a more sleepless night.

The team talked to no one
Silence fell on them all
The Product Owner was be-puzzled
They never did call.

So with that I must state
Team and Product Owner should be as one
Collaborate more often
More stories will get “done-done”.

Keep to your retros
Use them to explore
The reasons for failure
I deeply ask, no implore!

Then enjoy your holidays
With family and good friends
Use the values and principles
As the means to the ends.

So Merry Christmas; Joyous Mawlid el-Nabi
And Happy Rohatsu and Hanukah
And any other you celebrate
Like Yule, Solstice or Kwanzaa.

Happy Holidays may it be filled with tests of green and zero of red…

Demonstrating the INVEST Criteria

I’ve been doing some rather “loftier” types of posts, so let’s return to something a bit more fundamental to (software) product development: user stories, and in particular the INVEST acronym as developed by Bill Wake (see INVEST in Good Stories, and SMART Tasks). I was helping a coworker with some good examples of stories to showcase the INVEST criteria and felt this might be a useful post for people.

Let’s start with two formats in which User Stories may be expressed; we’ll stick with the latter:

Who-What-Why

Or more commonly as

As a (role or persona)

I want to (perform some business function)

So that I can (get some business value/rationale)

Usually, user stories break down because they fail to satisfy one or more of the INVEST criteria. Let’s look at each separately along with some examples.

I = Independent

We want stories to be independent; an independent story should be a small vertical slice through most, if not all, of the software stack (UI, business logic, data persistence, etc.). Let’s start with a counter example to help demonstrate this.

As a decision-maker,

I want the data selection table menu to show the latest option results

So that I can determine which one to analyze.

Sounds OK, right? Not really; the menu is a UI item. Where is this data going to come from? Presumably a database, file, or API. It may get processed in a middle tier to do some filtering or sorting. The UI layer where the menu resides is only one layer; this story would depend on stories in other layers before it could be implemented. Usually, any story that goes into the ‘how’ becomes less independent. Let’s rewrite it to –

As a decision-maker,

I want to view the latest option results

So that I can determine which one to analyze.

Besides appearing simpler, this doesn’t specify the menu, leaving the development team to do all the tasks needed to implement the results. Tasks could be querying the table, applying a filter algorithm for outliers, sorting from highest to lowest, and displaying as a menu. It also doesn’t lock the team into the how – if the result could also come from an API or web service, they can present those as options to the product owner for selection; same with the menu, perhaps a table would be better.

N = Negotiable

Negotiable means the product owner and development team can make trade-offs on the priority of the story and/or acceptance criteria. Again let’s start with a counter example.

As a survey reviewer

I want to compare multiple respondent data sets

So that I can see if a correlation may exist.

What data sets? What data of the data sets? How is the product owner supposed to negotiate on this? Let’s add some detail –

As a survey reviewer

I want to compare age bracket data to geographic region

So that I can see if particular geographic regions contain particularly high levels of a particular age group.

This is more negotiable; why? Suppose there was a second story –

As a survey reviewer

I want to compare income bracket data to geographic region

So that I can see if particular geographic regions contain particularly high levels of a particular income.

Now the product owner can negotiate on which one is more important. They could also dig into acceptance criteria and talk about the ages or incomes that make up those brackets or what level of granularity they need for the regions. Often non-negotiable stories, ones that seem like they MUST be done and can’t be ranked against others that MUST be done, are also an indicator that they are too big; they encompass too much.

V = Valuable

Another counter example will illustrate a story that doesn’t articulate value…

As a decision-maker,

I want to view the latest results

So that I can see them in order.

Why do I want to see them in order? (It’s presumed the order desired would be acceptance criteria.) Better to specify the why; this usually indicates not only why the function is needed, but also why the particular acceptance criteria were chosen. Here is our refined story again –

As a decision-maker,

I want to view the latest results

So that I can determine which one to analyze.

Now we know why we need to do it.

E = Estimable

We don’t care so much about the estimate, which is one reason we use relative estimation based on complexity over trying to nail down an estimate in effort/length of time (hours for either). We care that some amount of certainty in the complexity can be articulated; this gives us a gauge that it is understood well enough to start. The higher the estimate, the less certainty, meaning it is more complex. At some point, this may require splitting into 2 or more stories to reduce complexity.

As an investor,

I want the latest analysis

So that I can decide what to do.

What do we mean by latest analysis? How do we estimate that? And that value statement doesn’t help; what decision are we trying to make – the business function – and why do I want to make it – the why. Here’s a story that may be estimable (providing acceptance criteria can be drawn from this)

As an investor,

I want the latest ROI graph with my minimum threshold shown

So that I can decide whether to continue making this investment.

OK, we want a graph, which we know must draw on data; if the raw data needs to go through calculations, we will need to do that. This threshold, is it entered or stored somewhere? Looks like we’ll need tests to ensure the calculations are done properly. If we need to ensure web accessibility for people with sight disabilities, we may need a textual equivalent. Regardless, even with this uncertainty, being able to see most of the tasks and think through their complexity gives me the ability to estimate. Many have found that the estimate becomes pointless once the team actually has confidence they can complete the story along with other stories in an iteration; remember, the estimate is mostly there to establish common understanding. It may take months or even years to get to that point, though.

S = Sized properly

Hand-in-hand with estimable is sizing. If the story is large, really complex, then we need to think about splitting it into smaller independent stories. A good example of a story that is probably too large is the first story that dealt with a survey reviewer. The stories that follow it, describing the data sets to compare, are smaller and clearer and probably could be successfully implemented within an iteration. Who knows if the first one could? Also, if I couldn’t, I get no partial credit for getting some of it done. If I get any small story done, then I can take credit for it.

And lastly, T = Testable

Testable stories are determined by their acceptance criteria. Let’s go to our first good story and fill in some acceptance criteria to see this clearly.

As a decision-maker,

I want to view the latest option results

So that I can determine which one to analyze.

When we turn the card over, we find the…

Acceptance Criteria:

  • Display options as menu choices
  • Display options in descending order from highest to lowest
  • Display results below my threshold in red and bold these
  • Don’t display negative results
  • Option results are calculated by applying the uncertainty index to the simulation result
  • Return the results within 0.3 seconds

These are easily testable, manually or in an automated fashion. (NOTE: there is a more sophisticated method called Given-When-Then, from Specification by Example by Gojko Adzic, that allows these tests to be more easily automated in tools such as Cucumber.)
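
As a rough illustration of how directly criteria like these translate into checks, here is a minimal Python sketch covering three of them: descending order, hiding negative results, and flagging results below the threshold. The `menu_options` function, the shape of its input, and the sample values are all hypothetical; how the results are calculated and how the red/bold styling is rendered are left out.

```python
def menu_options(results, threshold):
    """Build the menu entries from a dict of option name -> result value."""
    # Don't display negative results.
    visible = [(name, value) for name, value in results.items() if value >= 0]
    # Display options in descending order from highest to lowest.
    visible.sort(key=lambda pair: pair[1], reverse=True)
    # Flag results below the threshold so the UI can show them in red and bold.
    return [
        {"name": name, "value": value, "highlight": value < threshold}
        for name, value in visible
    ]


options = menu_options(
    {"A": 0.42, "B": -0.10, "C": 0.91, "D": 0.05},
    threshold=0.10,
)
names = [option["name"] for option in options]
flags = [option["highlight"] for option in options]
print(names, flags)  # prints: ['C', 'A', 'D'] [False, False, True]
```

Each acceptance criterion maps to one line of the function and one assertion a tester could write, which is what makes the story testable.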

Recommendations for a PM to start using an Agile Approach

NOTE: This post originally appeared on my former BoosianSpace blog on 2 Nov 2011.

I have a colleague here at EPA who was very interested in getting started with an Agile approach to help her produce a better project (or really the software application they were to build).  I’m going to repeat and expand our conversation in the hope that it may prove useful for others.  This is the set of recommendations I had for her based on her context.  I’ll start with a brief overview of that context.

Project Context

This will be a greenfield development project; i.e. there is no legacy code to worry about or legacy data to migrate. It is intended to be a public facing web application.  The infrastructure is fairly open, but will need to incorporate GIS services that deal with watersheds; if she finds it useful she may utilize a cloud service to host this.  As a public site, it needs to comply with Section 508.  There is some consideration for a mobile app as well…

The development work will be done by an offsite contractor; this contractor to her knowledge has not done any Agile development projects as of yet.  The GIS services portion will most likely be developed by a specific subcontractor on that team and another may provide UX.  Her biggest constraint will most likely be available funds; schedule and scope (hopefully) can both be in play.

Finally, she has a very interested Govt product owner and a group that is interested in participating that represents stakeholders of urban watersheds.  The target goal is to have people represent the interests and activities occurring at particular local, small watersheds and then utilize GIS services to identify the larger watersheds these are a part of so the relevant groups of interest (e.g. Chesapeake Bay Foundation) can be portfolio managers of these watersheds and provide upper level guidance.

Initiation

She was interested in getting started.  I had previously recommended some books/directions for learning and I’ll repeat some of those here as they apply.  But she was interested in some specifics.

My first recommendation was to develop a project charter; this should have the following:

  • A description of the project goals and risks
  • A relevant ranking of these goals from a simply worded measurement perspective; I recommended having 4-6 of these in the Project Success Sliders format.  This allows people to understand that trade-offs must be made.  How she gets there should be a facilitated project charter discussion.
  • If needed, a description of roles and responsibilities of the organizations contributing to/participating on the project.  I’m not sure this is necessary given she has an engaged stakeholder.

Finally, I’d suggest that the project charter have a high level prioritized set of functional areas/epics as a roadmap for what will be developed (essentially it should include a release plan).  If she is able, she can get high level time estimates for these with the potential contractor, add some management reserve, and then calculate the amount of funds to develop, plus a maintenance estimate with a presumed product life-cycle of 5 years.  The roadmap should have the items that are both highest business value and riskiest first, then simply high business value, then simply risky, and lastly other items that are desired.  This will ensure that risky items have less opportunity to hold up the project in its delivery of high business value items.  Once funded, the roadmap will have only a subset of activities that can be met; this will become the release plan.
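
The ordering rule for the roadmap can be sketched as a simple sort. This is only an illustration: the `Epic` class, the yes/no flags, and the epic names are hypothetical, and a real charter discussion would weigh value and risk with more nuance than two booleans.

```python
from dataclasses import dataclass


@dataclass
class Epic:
    name: str
    high_value: bool
    risky: bool


def roadmap_rank(epic: Epic) -> int:
    """Lower rank means earlier on the roadmap."""
    if epic.high_value and epic.risky:
        return 0   # highest business value and riskiest first
    if epic.high_value:
        return 1   # then simply high business value
    if epic.risky:
        return 2   # then simply risky
    return 3       # lastly, other desired items


# Hypothetical epics for illustration only.
epics = [
    Epic("Mobile app", high_value=False, risky=False),
    Epic("Group profile pages", high_value=True, risky=False),
    Epic("Cloud hosting trial", high_value=False, risky=True),
    Epic("Watershed GIS lookup", high_value=True, risky=True),
]

# sorted() is stable, so epics with the same rank keep their original order.
roadmap = [epic.name for epic in sorted(epics, key=roadmap_rank)]
print(roadmap)
```

Once funding is known, the release plan is simply the leading portion of this list that fits within the budget.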

Contract Considerations

I’d recommend doing this as time and materials with an award fee.  The bid should be in two parts: the initial development to release 1.0 and the long-term software maintenance of the resulting application as an option.  This perhaps could be a fixed yearly cost.  I recommended a warranty period (perhaps 60 days) to assess how well the application is doing from a quality standpoint. Depending on how good (or bad) the application turns out to be, you can execute this option.  If it is really good, execute the option if it is a good deal.  It also could potentially give you a point for renegotiating.  If it is really bad, the team that developed it is probably still the best team to maintain it, but there would need to be some incentives around improvement.

I recommended that the contract call for a dedicated team and that team’s full participation along the entire development project AND the optional maintenance component if executed.

I’d make the award two-fold, the execution of the maintenance option is one.  The bigger one though is that if the contractor delivers under budget (the contract ceiling on development) and the quality is an acceptable level; then the remainder of the funds get split in half – the contractor gets that as pure profit, the agency can deobligate the other half and use it for something else.  It’s a win-win.
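
The split described above is simple arithmetic, sketched here in Python. The function name and dollar figures are hypothetical, and the branch for unacceptable quality is my assumption (no award, with the whole remainder staying with the agency); the actual terms would be set with the Contracts Office.

```python
def award_split(ceiling, actual_cost, quality_acceptable):
    """Return (contractor_award, agency_deobligated) in dollars."""
    remainder = max(ceiling - actual_cost, 0)
    if not quality_acceptable:
        # Assumption: no award is paid and the agency keeps the remainder.
        return 0.0, float(remainder)
    half = remainder / 2
    return half, half


# Hypothetical figures: $500,000 ceiling, $440,000 spent, quality acceptable.
contractor_award, agency_share = award_split(500_000, 440_000, True)
print(contractor_award, agency_share)  # prints: 30000.0 30000.0
```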

These above recommendations need to be worked out with the Contracts Office.

Recommended Agile Approach

I recommended starting with Scrum as an Agile Project Management framework. I made this recommendation based on a few things:

  • It is lightweight and supports rolling wave planning so that detailed tasks can be articulated just-in-time
  • It looks as though there will be an engaged product owner and a set of actual users that can be tapped to provide rapid feedback
  • Given that she and the product owner will get a set amount of funding, which will then lay out which prioritized epics can be accomplished, she will need to be able to measure progress; Scrum’s velocity technique is useful for this.  As an initial start, I recommended 2 week Sprints, and if the team finds they aren’t consistently completing what they pull in, regardless of how much is pulled in, perhaps shorten the iteration time to one week.
  • Given the stakeholder audience and non-familiarity with techniques such as planning poker, I recommended the concept of inch-pebbles: all tasks/stories should be broken down in Sprint Planning to something estimated to take no longer than 2 normal workdays.
  • The initial Sprint planning session should be expected to be about 4 hours.  All remaining ones could be planned to be 2 hours.  The Sprint Plan should be the prioritized backlog of stories/tasks and also identify when subject matter expertise is needed.  This will allow an estimate of when these people need to be available to provide information as requirements.
  • Sprint Reviews should be scheduled for about 2 hours and consist of a demo of ‘done’ software; the definition of ‘done’ should be very clear and agreed upon by all parties.  I’d recommend ensuring it is deployable ready software.  It’s been coded, tested, added to the build, and had some amount of regression testing done.  Again because of lack of expertise, I’m not counting on any continuous integration or automated test suite to be activated.  Regression testing will be smoke tests in reality.
  • Retrospectives should be scheduled for about 2 hours and directly follow Sprint Review.   This needs to be sold to the product owner as how the team can improve AND possibly deliver more or deliver what can be done at lower cost while still maintaining quality.  It won’t guarantee it, but it will improve the chance it will happen. I recommend conducting the Retrospectives using the format described in Esther Derby and Diana Larsen’s book Agile Retrospectives.
  • Try to make the Retro immediately follow the Review and the next Planning session immediately follow the Retro.
  • Plan for a daily stand-up of 15 minutes max.  Investigate some form of teleconference/video conference capability for this.  Do the same with some on-line white-boarding, mind-mapping, etc.  for the Sprint Planning/Review and Retros.  The team will be distributed.  If possible, try to bring as much of the team together as possible for the Review/Retro/planning sessions.

Recommended Technical Practices

I’ll conclude with a set of technical practices I recommended.  Due to the team’s lack of experience, I didn’t recommend too many; I tried to focus on a few key items that ensure the team delivers high quality software and that whatever is delivered meets the requirements specified.  The entire scope may not be completed, but you want it working correctly and properly.

Good SOLID principles will be the foundation.

Use of Specification by Example (see the book of the same title by Gojko Adzic), whether automated with Cucumber, Lettuce, Fitnesse, or JBehave or performed manually, will ensure the software meets requirements.  It makes it easy to iterate over the requirements as well.

Develop for a scenario at a time.  Use unit testing and develop tests before coding.  Once a test has been written, check into the source code repository.  Once the code has been written and passes the test, check it into the source code repository. Iterate until the scenario test passes.  Move onto the next scenario.  Once all scenarios have passed, move to the next requirement/example set.

At least every 2-3 days, have someone execute selected sets of the specs fully for the entire app to date to see if any bugs have crept in as new features get implemented.

Use an issue tracker for the tasks/stories and also to track any bugs that show up during regression tests; I recommend Trac.  It’s OSS and works well.  It allows implementation of a pull process so that as developers bring stories into work from the backlog, they immediately get assigned as owners.  This is useful for the PM (Scrum Master) to see what is being worked at any point in time.

I hope folks find this useful for how to ease into being Agile if they have a similar context.  There is no one size fits all approach though, so consider this as just one approach.  Because there is a clear product owner, interested user representation, and it is a greenfield project, this was my set of recommendations to try initially.  Using Retrospectives, hopefully the team will adapt what I described above to meet their needs.