Driving Cost Out of Engineering, Part I

To drive cost out of engineering in a software company requires a thoughtful and thorough approach, well beyond layoffs, because engineering costs are far more than a simple labor coefficient.
- Ed Carroll, Sales Executive, Agilis Solutions

June 21, 2006

Introduction

Several financial experts have recently predicted more mergers and acquisitions and fewer initial public offerings (IPOs) in 2006 and beyond, due largely to the increased cost of public capital, driven higher as a result of the Sarbanes-Oxley Act of 2002. Companies seeking capital are expected to find private equity more attractive. On the other hand, companies in need of capital that cannot find it through a private placement might become attractive as an acquisition target.

Once a merger is completed, it is common to look for ways to leverage undervalued assets and cut unnecessary costs. In a software company, engineering can easily find itself at both ends of the stick--being expected to produce more while cutting costs. However, cost cutting in engineering can be disastrous if managed poorly, without a clear understanding of the impact of the cost reduction on the business. Managed well, driving cost out of engineering can galvanize the company.

The High Cost of Personnel

In a software company, most engineering cost is from personnel (salaries and associated burdened cost). One obvious solution to reduce cost is to lay off unnecessary personnel. Although I have focused much of my effort as an engineering executive on building highly effective teams, even I will admit in hindsight that when forced to make cuts I was usually able to find positions that did not need to be filled or could be eliminated without too much negative impact (other than the humane one). However, there is a definite point-of-no-return where further cuts would make it impossible to produce the desired products at an acceptable schedule and quality level. The trick is to always keep that magic level below the staffing level by increasing the efficiency of the overall engineering program.

A common mistake is to make too close a connection between specific work and a specific individual (whom we will call “Joe”). If work stops when Joe is sick or on vacation, then the work is too tightly coupled to the individual. If the company is forced to cut Joe, his work would stop altogether. If an old piece of Joe’s code should break, Joe must stop work he is doing on a new product to fix the problem--and you get a double whammy: impact to the existing customer and to the new-product schedule. Even worse, at some point, Joe may be so busy fixing problems in existing code that he is no longer available for new-product development.

An obvious solution is to disassociate specific work from a specific individual. What if, instead, code were so well commented that every engineer could work in the same code base with ease, every engineer were familiar with the same development tools, and every engineer were trained in group standards and procedures? Then work assignments could be handed out based on considerations closer to the business need or domain problem. Consider the alternative: the most pressing business problem is handed to the weakest member of your team, simply because he is the only one who can understand the code he wrote. If it is the most pressing problem in the business, shouldn’t the most talented engineer be assigned?

Another common mistake is when no one looks at anyone else’s code. The thinking seems to be that we will know whether Joe is a good designer/coder when the customer complains about his module. But how do you really know if you do not actually inspect Joe’s design/code? And if Joe never sees anyone else’s design/code, how is he to learn what good code should look like? Not only are some people better coders than others, but some are better analysts or designers or quality-assurance engineers. Some people are good at analysis, but not at design; some at design and code, but not QA; and so on. Since every person is unique, it does not make sense to assume that every member of the team will be equally good at every part of the software development life cycle (SDLC). SDLC methodologies, engineering standards, project management, quality assurance, change management, configuration management, and version-control procedures are all techniques to compensate for lack of abilities and individual differences in talent.

Implementing and maintaining SDLC methodologies, engineering standards, and project management, quality assurance, change management, configuration management, and version-control procedures is difficult to do well. This is one reason so many executives consider the use of offshore resources. Some executives figure that the lower cost of an offshore team can cover the cost of a dysfunctional local team--but this is a bad assumption. A highly cost-efficient engineering team, local or offshore, is one where its members' talents are used optimally. This is important because the cost of poor quality is much higher than the cost of labor.

The High Cost of Building it Wrong

Some software products seem to contain a lot of bugs, particularly in early releases. This might be because engineering teams are often forced to work toward release deadlines that were unreasonably set (for a lot of reasons). To meet the deadlines, team members cut corners, skip steps or fail to look hard for mistakes, or management releases the product with known bugs. Errors in software code cost much more than most people realize. To start with, there is the cost of identifying that an error exists, plus the cost of fixing it. If the error is identified during the requirements phase, then the cost involves the requirement-review effort, the changes to the requirements documentation and the cost of any cascading error identification and changes, plus a re-review.

If the error is identified during design, the cost involves a similar scenario with design documentation, cascading effects to other components because of the change, plus a re-review. However, there is the potential that the error could cause changes to requirements as well as design, resulting in a double cascading effect.

Errors identified in the coding phase have the potential for a triple cascading effect--requiring changes to requirements, design and code. Errors identified in implementation similarly have the potential for a quadruple cascading effect. This cascading effect of ever-increasing cost is why it is so important to develop software correctly the first time. This is also why good software-engineering processes are so important. Their objective is to ensure that every code object or line of code is written correctly the first time--from the perspective of user requirements, to design elegance, to ease of integration, and system compatibility, scalability, and performance--no matter who is performing the work involved.
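The cascading effect described above can be sketched as a small calculation. This is only an illustration: the per-phase rework hours are hypothetical numbers chosen for the example, not measured data, and the model assumes the worst case in which an error found in one phase forces rework in every earlier phase too.

```python
# SDLC phases in order, as named in the text.
PHASES = ["requirements", "design", "code", "implementation"]

# Hypothetical hours to change and re-review the artifacts of one phase.
# These values are assumptions for illustration only.
REWORK_HOURS = {"requirements": 2, "design": 4, "code": 8, "implementation": 16}


def worst_case_rework(found_in: str) -> int:
    """Worst-case rework hours when an error found in `found_in`
    cascades back through that phase and every earlier phase."""
    depth = PHASES.index(found_in) + 1
    return sum(REWORK_HOURS[phase] for phase in PHASES[:depth])


for phase in PHASES:
    print(f"{phase:>14}: up to {worst_case_rework(phase)} hours of rework")
```

Whatever the real per-phase figures are for a given team, the shape is the same: the later an error is found, the more artifacts it can touch.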

Once the product has been released, errors continue to add cost to the product. The number of errors found in the field or by the customer dictates the number and sophistication of the product-support team: the more errors found, the more support personnel required to keep the customer happy; the more complex the problems, the more expensive the support personnel required.

Errors found by customers have an added effect on revenue in addition to increased cost. Customers of buggy products will likely not volunteer to give referrals or aid more sales, and customers will be likely to replace the buggy product sooner rather than later. Product longevity is hugely influenced by the number of errors a customer experiences. It can all be boiled down to cost efficiency. Every error (found or unfound) raises the cost of the product, and when discovered by customers, reduces revenue potential.

One obvious solution is to never set unreasonable release dates, but this is unrealistic. Establishing effective (and efficient) engineering processes to optimize personnel (compensating for team-member differences) is or should be chiefly about keeping cost to a minimum. Cost is kept lowest by using resources most efficiently and by finding and eliminating errors as early as possible in the SDLC.

The Low Cost of Building it Right

The Software Engineering Institute (SEI) of Carnegie Mellon University has documented a 39 percent productivity improvement in overall operations for engineering teams with well-established continuous process improvement programs (CPI) as compared with teams that rely solely on the talent of their members. Whether one implements a CPI program according to the SEI or some other CPI program, it probably does not matter in the larger scheme of things. What is important is that a 39 percent productivity improvement across a large engineering team significantly overcompensates for the cost of implementing effective engineering processes and tools.

I have heard several knowledgeable people question the perceived cost of implementing effective engineering processes. Consider that the 39 percent improvement figure includes the cost of those tools and techniques. It is very difficult to nail down the specific cost of implementing software-development tools and techniques from one engineering team to another. This is because some of the tools and techniques would likely be implemented with or without a special focused effort, and often many members already have experience in the common tools and techniques to be deployed. And then, as learning occurs (assuming the tools and techniques are right for the job) operations improve gradually and implementation costs are offset by productivity gains. A 39 percent productivity improvement across a large engineering team will pay for a lot of tools and techniques.
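To put a rough scale on that claim, here is a minimal sketch of the arithmetic. Only the 39 percent figure comes from the text. The team size and burdened cost per engineer are hypothetical inputs, and treating the gain as equivalent headcount is a simplifying assumption.

```python
PRODUCTIVITY_GAIN = 0.39  # figure cited in the text


def equivalent_headcount(team_size: int, gain: float = PRODUCTIVITY_GAIN) -> float:
    """Headcount an unimproved team would need to match the improved team's output."""
    return team_size * (1 + gain)


def annual_value_of_gain(team_size: int, burdened_cost: float,
                         gain: float = PRODUCTIVITY_GAIN) -> float:
    """Yearly labor cost of the extra output, priced at the burdened cost per engineer."""
    return team_size * gain * burdened_cost


# Hypothetical inputs for illustration only.
team_size = 100
burdened_cost = 150_000  # assumed annual burdened cost per engineer

print(f"Equivalent headcount: {equivalent_headcount(team_size):.0f}")
print(f"Annual value of the gain: {annual_value_of_gain(team_size, burdened_cost):,.0f}")
```

Substitute your own team size and burdened cost; the point is that the value of the gain scales with both.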

Predictability: The prior example regarding unreasonable release dates can be solved with an accurate estimating process. Typical estimates at the start of a release (before requirements are fully known) can be off by as much as 400 percent, according to industry benchmarks. That means that a project estimated at six months might actually take up to two years to complete, or cost four times what was planned. These poor estimates are developed by managers who do not ask the engineers for their input into the estimate, or who base the estimate on their own anecdotal experience (“gut feel”) from supposedly similar previous work.

In contrast, accurate estimates are based on extensive documented historical data, require the use of a measuring technique that breaks work down into recognizable components (WBS elements, use cases, functions, transactions, etc.), evaluates components for complexity, considers technical and environmental impact (language, tools, object reuse, experience and motivation of the team, closeness to the domain experts, etc.), and adjusts the estimate further for overall risk factors (stability of the requirements and other unknowns). There are engineering teams that regularly estimate project cost and schedule as described here at the start of every release cycle and with an accuracy of greater than 90 percent. That would be a less than 10 percent error at project completion, as compared to a 400 percent error. Think about it: what would a 400 percent error in estimate do to the cost of goods sold (COGS) in your company? Now compare that with a 10 percent error, and decide which would be more palatable to the CFO and CEO.
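The structure of such an estimate can be sketched in a few lines. This is an illustrative outline, not a specific published method. The complexity weights, the hours-per-point figure (which would come from a team's own historical data), and the two adjustment factors are all hypothetical values.

```python
from dataclasses import dataclass

# Hypothetical complexity weights, in points per component.
COMPLEXITY_POINTS = {"simple": 5, "average": 10, "complex": 15}


@dataclass
class Component:
    name: str
    complexity: str  # one of the keys in COMPLEXITY_POINTS


def estimate_hours(components, hours_per_point,
                   environment_factor=1.0, risk_factor=1.0):
    """Size the work in points, convert to hours using historical
    productivity, then adjust for environment and overall risk."""
    points = sum(COMPLEXITY_POINTS[c.complexity] for c in components)
    return points * hours_per_point * environment_factor * risk_factor


# Hypothetical release broken down into recognizable components.
release = [
    Component("login", "simple"),
    Component("order entry", "average"),
    Component("pricing engine", "complex"),
]

hours = estimate_hours(
    release,
    hours_per_point=20,      # assumed figure from past projects
    environment_factor=1.1,  # assumed: new tools, limited object reuse
    risk_factor=1.2,         # assumed: requirements still unstable
)
print(f"Estimated effort: {hours:.0f} hours")
```

Every input here is something a team can record and refine from one release to the next, which is what separates this kind of estimate from gut feel.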

Flexibility: Building software is more like writing a book than manufacturing a laptop computer. There is a certain structure like the table of contents or book outline to a program, which can guide the development. And certain parts of software programs are often drawn from, reused from, or copied from previous efforts.

Some software engineers have lambasted the implementation of engineering processes for as long as engineering processes have been around. I have personally been accused of "limiting creativity" or of "trying to turn this place into HP(!)" as I laid down another rule for yet another process. But these complainers miss the larger picture--the cost vs. product picture, or the "let’s not reinvent the wheel" picture. In fact, this is a major difference between an engineered solution and a non-engineered solution; engineering should consider the whole system picture. The trouble is, the whole picture is often not visible at any point in time along the SDLC (sometimes, not even at the end of the project). Therefore, if the whole picture is not visible, how does one proceed? Certainly stopping or waiting until all of the requirements are defined is a proven way to kill a large project. The answer is flexibility.

The Agile Alliance describes several processes for staying flexible during software development. Key to these flexible processes are the concepts of staying in close collaboration with the domain experts (end users, clients), keeping the domain experts deeply involved throughout the SDLC, and iterating through small incremental releases that build collectively toward the public product release. If one were to never estimate the larger project, focusing only on the next two-day or one-week release, one would never know when the project was complete until it was too late and well over budget. This is one of the common misunderstandings with Agile development. If instead--as part of each iteration--both the immediate iteration and the entire release are re-estimated (based on the new information from work completed to that point in the project), then real tradeoff decisions can be made to ensure total project success.
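One simple way to re-estimate the whole release at each iteration is to project from the work completed so far. The sketch below uses average velocity, a common Agile heuristic swapped in for illustration. It is not necessarily the estimating process described in this article, and the function name and numbers are hypothetical.

```python
import math


def reforecast_iterations(total_points, completed_per_iteration):
    """Project the total number of iterations for the release from the
    average points completed per iteration so far."""
    done = sum(completed_per_iteration)
    elapsed = len(completed_per_iteration)
    if elapsed == 0 or done == 0:
        raise ValueError("need at least one iteration with completed work")
    velocity = done / elapsed
    remaining = max(total_points - done, 0)
    return elapsed + math.ceil(remaining / velocity)


# Hypothetical release: 120 points of scope, originally planned for 10 iterations.
planned_iterations = 10
projected = reforecast_iterations(120, [8, 10, 12])
print(f"Projected: {projected} iterations (planned: {planned_iterations})")
```

A gap between the projected and planned figures is the signal to make a tradeoff decision now (cut scope, add capacity, or move the date) instead of discovering the overrun at the end.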

To successfully re-estimate a project on a daily/weekly basis requires an effective estimating process based on solid and deep historical data and an automated tool that enables an estimate to be created quickly and easily (see my previous article: Estimating Software).

What does it cost to lock down requirements or freeze design on a large project? The conventional wisdom is that success demands that no changes be made to the project requirements. Like errors, changes made to requirements can have an ever-increasing cascading effect on a project the deeper into the SDLC these changes are made. The difference, however, should be that change is intentional and the cost of the cascading effect should be taken into consideration before proceeding, but errors are not intentional and the cascading effects are often hidden. The cost of making a change can be estimated, but the cost of an error is usually a surprise.

I have heard well-meaning people say, “Keep the users away until the software is complete.” They seem to be afraid that the users will want changes, and I guess they cannot handle changes. I have even heard senior engineers say, “Keep the users away so that they will not change the requirements.” But what if we taught our engineers that making changes is simply the process for “getting it right”? I love this example that Ward Cunningham once gave me in a coffee-shop conversation:

“A lot of programmers are afraid to make schema changes. What if they were to start a project with the simplest schema? Then as the project evolved, they would be forced to change the schema many times. By the end of the project, they would no longer be afraid to change schema.”

What a great example of designing with simplicity and developing with courage! Think of the cost of the alternative--building a system that users do not want, will not use, will not buy! What is the value of a software product that does not sell?

Build versus buy: It is important to note that there is a significant time investment, as well as cost, involved in implementing processes. Highly effective engineering teams are typically a product of slow evolutionary change (continuous process improvement), rather than dictated efficiency (the natural aversion of the subjugated to the actions of a dictator tends to cancel out any productivity gains). Many techniques need to be tried across a few projects before it is confirmed that they are right for that team, that set of products, and that environment. It typically takes several projects before a new process is ingrained into the whole SDLC methodology. And then there will be processes that need to be thrown out and replaced (which starts the process all over again).

SEI tracks the typical evolution of an engineering team through its assessment program as averaging over five years to achieve the highest level (the 39 percent productivity improvement comes when the team is operating at its highest level of optimization). Most engineering teams, however, never achieve that level.

When a merger contract is signed, should a company wait for five years as the engineering team is built to a high level of efficiency? Perhaps. It may be critical to company success for the company to have a highly effective engineering team. However, those who negotiate a merger are unlikely to be willing to wait five years to achieve their projected cost savings.

An alternative to building a highly effective internal engineering team is to buy one. Some companies specialize in providing software-engineering services and can assemble highly effective engineering teams in a matter of weeks, not years. The real trick is understanding that to engage another company to outsource engineering is a serious partnership, requiring significant scrutiny to ensure that the outsourcing company truly has the highly effective engineering-team capabilities it claims. Compounding the challenge is the fact that many of the companies that build such teams are located outside the United States, perhaps related to the fact that most teams assessed at the highest level by SEI--175 out of 180--are also located outside the United States.

Deciding to outsource engineering is nothing new, but the decision to do so should be carefully and systematically thought through (see my article: Why Outsource?). On the other hand, to have a team operating at 39 percent higher productivity, producing fewer (or no) errors through predictable and flexible processes, and priced at offshore rates--this can be a very compelling proposition.

Conclusion

To drive cost out of engineering in a software company requires a thoughtful and thorough approach, well beyond layoffs, because engineering costs are far more than a simple labor coefficient. The cost of errors in a product can be substantially more than the cost of labor. Certainly, with more mergers and acquisitions occurring in the future, there will be cost-cutting measures applied, but layoffs are only one technique, and not necessarily the best when compared to the productivity gains from an optimized engineering team. Processes that find and eliminate errors early in the SDLC are much more cost effective, in both the short run and the long term.

Processes that accurately predict project costs, that can be easily scheduled, and that enable quick planning in other departments and effective up-to-the-minute tradeoff decisions can help to eliminate costly overruns or irreversible (wrong) decisions. Processes that can enable flexibility in requirements and design give the team options and the end user the system he or she really wants to buy--translating into greater revenue and product longevity. Once again, it all boils down to money. And the ultimate question might be how to most cost-effectively obtain that highly effective engineering team: to build it internally (over a period of years) or to partner with a firm specializing in supplying highly effective software engineering teams to deliver high-quality software products.

About the Developer

Ed Carroll
Sales Executive, Agilis Solutions

Ed has been building software products for more than 20 years, with particular expertise in automating economic analyses, decision support and supply-chain management processes. He is currently a sales executive with Agilis Solutions and has provided strategic technology leadership in roles such as the vice president of engineering for Egghead.com, director of technology at Nike and director of software engineering at Boeing.
