The Mythical Man-Month

This classic book was written by Frederick P. Brooks, Jr. who also wrote the famous 'No Silver Bullet' article. This attempt at a summary was done by Bernard I. Ng in 1990 while in Sun's CIM Technology Group.

Table Of Contents

Chapter 1: The Tar Pit
Chapter 2: The Mythical Man-Month
Chapter 3: The Surgical Team
Chapter 4: Aristocracy, Democracy, and System Design
Chapter 5: The Second-System Effect
Chapter 6: Passing the Word
Chapter 7: Why Did the Tower of Babel Fail?
Chapter 8: Calling the Shot
Chapter 9: Ten Pounds in a Five Pound Sack
Chapter 10: The Documentary Hypothesis
Chapter 11: Plan to Throw One Away
Chapter 12: Sharp Tools
Chapter 13: The Whole and the Parts
Chapter 14: Hatching a Catastrophe
Chapter 15: The Other Face
Epilogue

Chapter 1: The Tar Pit

A ship on the beach is a lighthouse to the sea. (Dutch Proverb)

Large system programming is likened to a scene from prehistory with dinosaurs, mammoths and sabertooths struggling in tar pits, and sinking. Few systems have met goals, schedules and budgets. We must understand this problem before we can solve it.

The Programming Systems Product - A Program is the object an individual uses in estimating productivity; it is complete in itself and ready to run in its author's environment. A Programming Product can be tested, repaired, and extended by anybody and is usable in many environments for many sets of data; it must be generalized and documented, and it costs at least 3 times as much as a program of the same functionality. A Programming System is a collection of interacting programs whose I/O must conform in syntax and semantics to precisely defined interfaces; testing is extensive and grows combinatorially, which again multiplies the cost by at least 3. A Programming Systems Product is the truly useful object, but it costs at least 9 times as much as a program.


        Program                 |  Programming System
        (effort x1)             |  (effort x3)
   -----------------------------+--------------------------------
        Programming Product     |  Programming System Product
        (effort x3)             |  (effort x9)

The Joys of the Craft:

  1. The sheer joy of making things,
  2. The pleasure of making things that are useful to other people,
  3. The fascination of fashioning complex puzzle-like objects of interlocking moving parts,
  4. The joy of always learning, which springs from the nonrepeating nature of the task,
  5. The delight of working in so tractable a medium, pure thought-stuff which nevertheless moves and works.

The Woes of the Craft:

  1. One must perform perfectly; computers demand it,
  2. Other people set one's objectives, provide one's resources, and furnish one's information, and the authority rarely equals the responsibility,
  3. One depends on other people's programs, often poorly designed and documented,
  4. Designing grand concepts is fun; finding nitty little bugs is just work,
  5. Debugging converges only linearly; the last difficult bugs take longer to find than the first,
  6. The product over which one has labored so long appears obsolete upon (or before) completion.

My Conclusion: Learn from the mistakes of others, spend effort understanding the nature of the work, costs, tradeoffs etc.


Chapter 2: The Mythical Man-Month

Good cooking takes time. If you are made to wait,
it is to serve you better, and to please you.
(Menu of Restaurant Antoine, New Orleans)

Most Software Projects are late:

  1. Our techniques of estimating are poorly developed,
  2. Our estimations confuse effort with progress, hiding the wrong assumption that men and months are interchangeable,
  3. Due to uncertainty in estimates, Software Managers hesitate to be as stubborn as Antoine's chef,
  4. Schedule progress is poorly monitored, techniques used in other engineering fields are radical innovations in the software world (also, half a building is more obvious than half an operating system),
  5. When the schedule slips, the first response is to add manpower, like gasoline to a fire.

Optimism - All programmers are optimists. Perhaps the hundreds of nitty frustrations drive away all but those who habitually focus on the end goal. Perhaps it is merely that computers are young, programmers are younger, and the young are always optimists. How often do we hear, "This time it will surely run" or "I just found the last bug"? The 1st false assumption underlying the scheduling of systems programming is that 'all will go well'. Dorothy Sayers, in "The Mind of the Maker", divided creative activity into 1) the idea, 2) the implementation, and 3) the interaction. Many creative activities deal in intractable media: paint smears, wood splits, and electronic components fail. Computer programming, however, creates with an exceedingly tractable medium. A programmer builds from pure thought-stuff: concepts and very flexible representations thereof. We therefore expect few difficulties in implementation, hence the optimism. But our ideas themselves are faulty, so we have bugs, and hence our optimism is unjustified. For each task, there is a probability that all will NOT go well; for a large project, the probability that every one of its many tasks will go well is extremely small.

The Man-Month - Cost varies as the product of the number of men and the number of months. Progress does NOT. Hence the man-month is a dangerous and deceptive unit of measure for software projects. Men and months are only interchangeable when a task can be partitioned among workers who need not communicate with each other. This is true of plucking fruits but NOT even approximately true of systems programming. When a task cannot be partitioned, the application of more effort has NO effect on the schedule: the bearing of a child takes 9 months no matter how many women are assigned. Even in partitionable tasks, the costs of training and intercommunication must be added for each additional worker. Training costs, in the technology, goals, strategy, and plan, grow linearly with the added manpower. Intercommunication costs grow worse with each addition, especially if each subtask must be separately coordinated with every other subtask (n(n-1)/2 communication paths). Since software is inherently a systems effort, an exercise in complex interrelationships, the communication effort soon dominates the decrease in individual task time brought about by partitioning. Adding more men then LENGTHENS, NOT SHORTENS, the schedule.
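
A toy model makes the arithmetic vivid. Everything below is an illustrative assumption rather than data from the book: 60 man-months of perfectly partitionable work, half a man-month of training per worker, and a small cost per communication pair.

    # Toy model of Brooks's argument; every coefficient is an illustrative
    # assumption, not data from the book.
    def schedule_months(n, work=60.0, training=0.5, comm=0.5):
        """Calendar months to finish `work` man-months of work with n workers."""
        pairs = n * (n - 1) / 2                      # intercommunication paths
        effort = work + training * n + comm * pairs  # total man-months expended
        return effort / n                            # spread across n workers

    for n in (1, 2, 5, 10, 15, 25, 40):
        print(f"{n:2d} workers -> {schedule_months(n):6.2f} months")

With these made-up numbers the schedule bottoms out around 15 workers (about 8 months) and then lengthens again; past that point each extra worker adds more communication than labour, which is Brooks's Law in miniature.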

Systems Test - Due to our optimism, this is the most mis-scheduled part of programming. A rule of thumb for scheduling a software task:

 /-------------------------------------------------------\
 |  1/3  planning                                        |
 |  1/6  coding                                          |
 |  1/4  component test and early system test            |
 |  1/4  system test with all components done            |
 \-------------------------------------------------------/
Notes:

  1. The 1/3 devoted to planning is larger than normal. Even so, it is barely enough to produce a detailed, solid specification. If research or exploration of totally new techniques is required, add more time in proportion to coding.
  2. The 1/2 devoted to testing of completed code is much larger than normal, and it is ALWAYS needed.
  3. The part that is easy to estimate, coding, is given only 1/6.

Few projects have allowed 1/2 the schedule for testing; most end up spending that time anyway. Failure to plan for system testing is especially disastrous since the bad news arrives late and without warning, which is unsettling to managers and customers, and most projects appear on schedule until and except in system testing.
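
As a sketch of the rule in use (the fractions are Brooks's; the helper function and its phase names are merely illustrative):

    # Brooks's rule-of-thumb split (the fractions are his; the helper is not).
    def split_schedule(total_months):
        return {
            "planning":                  total_months / 3,
            "coding":                    total_months / 6,
            "component/early sys test":  total_months / 4,
            "system test, all parts in": total_months / 4,
        }

    # A 12-month project: 4 months planning, 2 coding, 3 + 3 testing.
    for phase, months in split_schedule(12).items():
        print(f"{phase:28s} {months:4.1f} months")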

Gutless Estimating - For the programmer, as for the chef, urgency of the patron may govern the scheduled completion of the task, but it CANNOT govern the actual completion. An omelette promised in 2 minutes affords the customer two choices, wait or eat it raw. The cook could also turn up the heat and serve a half-burnt, half-raw omelette. False scheduling to match a patron's desired date is more common in our discipline than elsewhere in engineering because it is VERY DIFFICULT to make a vigorous, plausible, job-risking defense of an estimate that is derived by no quantitative method, supported by little data, and brought about by the hunches of developers. We need to publicize productivity and bug incidence figures, estimating rules etc. And until such become commonplace, we need to stiffen our backbones and defend our estimates with the assurance that poor hunches are better than wish-derived estimates.

Regenerative Schedule Disaster - What does one do when an essential software project is behind schedule? Add manpower, naturally. This may or may NOT help. We have to demythologize the 'man-month'. Oversimplifying outrageously ...

Brooks's Law: "Adding manpower to a late software project makes it later."

The maximum number of men depends on the number of independent subtasks. The minimum number of months depends on the sequential constraints of the project. From these 2 quantities, one can derive schedules using fewer men and more months, with the only risk being product obsolescence. One cannot get workable schedules using more men and fewer months; more software projects have gone awry for lack of calendar time than for all other causes combined.

My Conclusion:
Good thing no cheap, cheap thing no good.
(anonymous Singapore Armed Forces NCO)


Chapter 3: The Surgical Team

These studies revealed large individual differences between high and low performers, often by an order of magnitude.
(a study on programmer productivity by Sackman, Erikson and Grant)

The Problem - A study was made on a group of experienced programmers, and the ratios between the best and worst performances averaged about 10:1 on productivity measurements and an amazing 5:1 on program speed and space metrics. In short, the $60,000/year programmer may well be 10 times as productive as the $30,000/year one. The converse may also be true, since the data showed no correlation whatsoever between experience and performance (although this is counter-intuitive). Young programming managers are in favor of small, sharp teams. The problem with the small, sharp team concept is that it is too slow for really big systems. The cruel dilemma is: for efficiency and conceptual integrity, one prefers a few good minds doing the design and construction; yet for large systems, one wants a way to bring considerable manpower to bear.

Mills's Proposal - Harlan Mills suggests a 'surgical team' approach: the surgeon (the chief programmer) does all the cutting, supported by a copilot, an administrator, an editor, two secretaries, a program clerk, a toolsmith, a tester, and a language lawyer.

How it works - Many professional minds are at work on the problem, but the system is a product of one mind, or at most two, acting uno animo. 1) There is conceptual integrity. 2) Inevitable differences of judgement are settled by the surgeon unilaterally. These 2 differences, the lack of problem division and the superior-subordinate relationship, make it possible for the surgical team to act uno animo. Yet the specialization of function is the key to efficiency, since there is a radically simpler communication pattern among the members.

Scaling up - The problem, however, still exists for very large systems. A 10-man team can be effective no matter how it's organized. Scaling up works because the conceptual integrity of each piece has already been radically improved within each team, so only the surgeons, one per team, need to be coordinated rather than every member. For conceptual integrity of the entire system, a sharp distinction has to be made between architecture and implementation, and the overall system architect must confine himself scrupulously to architecture.

My Conclusion: Although some of Mills's ideas are obviously dated, the essence is still true in that conceptual integrity and specialization of function contribute a lot to efficiency and success of a team.


Chapter 4: Aristocracy, Democracy, and System Design

This great church is an incomparable work of art. There is neither aridity nor confusion in the tenets it sets forth ... (comment on architectural integrity of the Reims Cathedral)

Conceptual Integrity - Many European cathedrals show differences in plan or architectural style between parts built by different generations. Even though programming systems have not taken centuries to build, they reflect conceptual disunity far worse than that of cathedrals. This arises not from a serial succession of master designers, but from the separation of design into many tasks done by many men. Conceptual integrity is THE most important consideration in system design. A consistent but deficient system is arguably 'better' than one that contains many good but independent and uncoordinated ideas.

Achieving Conceptual Integrity - The purpose of a programming system is to make a computer easy to use. Ease of use is enhanced only if the time gained in functional specification exceeds the time lost in learning, remembering, and searching manuals. The ratio of function to complexity is the ultimate test of system design. Neither function nor simplicity alone defines a good system. Function aside, simplicity is NOT enough; things should be straightforward as well. It is NOT enough to learn the elements and rules of combination; one must also learn the idiomatic usage, a whole lore of how the elements are combined in practice. Simplicity and straightforwardness proceed from conceptual integrity and unity of design, which are dictated by ease of use.

Aristocracy and Democracy - Conceptual integrity dictates that the design must proceed from one mind, or from a few agreeing resonant minds. Schedule pressures dictate that system building needs many hands. One way to achieve this fanout is to structure the teams as shown above. The second is a careful division of labour between architecture and implementation. Architecture is the complete and detailed specification of the user interface. The ARCHITECT is the user's agent, bringing professional and technical knowledge to bear in the unalloyed interest of the user.

Aristocracy vs. democracy is a deeply emotional question. Is all the creative, fun work left to the intellectual elite? Won't we get a better product by getting all the good ideas from all the team? Often the fresh concepts do come from an IMPLEMENTOR or a user. But good features and ideas that do not integrate with a system's basic concepts are best left out. If there appear many such important but incompatible ideas, one scraps the whole system and starts again on an integrated system with different basic concepts. Aristocracy for the architecture needs no apology because of the importance of conceptual integrity. But the design of an implementation, given an architecture, requires and allows as much design creativity, as many new ideas, and as much technical brilliance as the design of external specifications. The cost-performance ratio of the product will depend most heavily on the implementor, just as ease of use depends most heavily on the architect. An artist's aphorism asserts, "Form is liberating." The external provision of an architecture enhances, not cramps, the creative style of an implementation team. They focus at once on the part of the problem no one has addressed, and inventions begin to flow.

What Does the Implementor Do While Waiting - When it is proposed that a small architecture team write all the specifications for a software system, implementors raise 3 objections:

  1. The specifications will be too rich in function and will not reflect practical cost considerations,
  2. The architects will get all the creative fun and shut off the inventiveness of the implementors,
  3. The implementors will have to sit idly by while waiting for the specifications.

The 1st objection is treated in the next chapter, the 2nd is an illusion as argued above, and the last is a matter of timing and phasing. As Blaauw points out, total creative effort involves 1) architecture, 2) implementation and 3) realization, which can in fact be begun in parallel and proceed simultaneously. The implementors can devise algorithms, build tools, and set up a framework for their coming work (see my conclusion).

My Conclusions:

  1. Architecture and implementation should be clearly separated, even when it is done by the same people.
  2. Conceptual integrity is more important than immediately apparent because the ease of use and cost savings are tremendous in the long term.
  3. We are far from having a perfect software development environment, one where the creation of robust software is only as difficult as specifying its highest-level behavior. As such, implementors (as opposed to architects) should devote their time and energy to enhancing their development environment and improving the quality and quantity of their reusable software base.

Chapter 5: The Second-System Effect

Add little to little and there will be a big pile. (Ovid)

If we separate the responsibility for functional specification from the responsibility for implementing a fast, cheap product, what discipline bounds the architect's inventive enthusiasm? Continuous, careful, and sympathetic communication between architect and implementor.

Interactive Discipline for the Architect - The architect of a building works against a budget, using estimation techniques that are later confirmed or corrected by a contractor's bid. Software architects usually only have one contractor but can have early and continuous communication, which gives the architect good cost readings, and the implementor confidence in the design, without blurring the clear division of responsibilities. When an estimate seems too high, the architect can cut the design down, or challenge the estimate by suggesting cheaper alternatives. The latter is inherently emotion-generating so the architect should:

  1. Remember that the implementor has the inventive and creative responsibility for the implementation so SUGGEST, not dictate.
  2. Always be prepared to suggest a way of implementing anything specified, and be prepared to accept any other way that works as well.
  3. Deal quietly and privately in such suggestions.
  4. Be ready to forego credit for suggested improvements.
Normally the implementor will counter by suggesting changes to the architecture. Often he is right; minor features may have unexpectedly large real costs.

Self-Discipline - The Second-System Effect - An architect's 1st work is apt to be spare and clean. He knows that he doesn't know what he's doing, and proceeds carefully and with great restraint. As he designs his 1st work, frills and embellishments get stored away to be used 'next time'. He finishes the 1st system, has firm confidence and a demonstrated mastery of that class of systems, and is ready to build a second system. This second is the most DANGEROUS system a man ever designs. When he does his 3rd and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify parts of his experience that are particular and NOT generalizable. The general tendency is to over-design the second system, using all the ideas that were cautiously sidetracked on the 1st one. The result, Ovid says, is a 'big pile'. An architect obviously cannot skip the second system, but he can be conscious of its peculiar hazards and exert self-discipline to avoid functional ornamentation and to avoid extrapolating functions that are obviated by changes in assumptions and purposes. A discipline that will open an architect's eyes is to assign each little function a value in terms of development and maintenance costs. A project manager can avoid the second-system effect by insisting on a senior architect who has at least two systems under his belt, staying aware of the special temptations, and asking the right questions to ensure that the philosophical concepts and objectives are fully reflected in the detailed design.

My Conclusions:

  1. Architecture should be designed in an interactive manner between architects, implementors and users.
  2. An architect should be aware of the second-system effect and NOT overdesign functionality that costs more than it's worth.


Chapter 6: Passing the Word

He'll sit here and he'll say, "Do this! Do that!" And nothing will happen. (Harry S. Truman on Presidential Power)

How can 10 architects maintain the conceptual integrity of a system which 1000 men are building?

Written Specifications - the Manual - The manual is a necessary tool for external specification of a product and is the architect's chief product. It is used to get feedback from users and implementors showing where the design is awkward to use or build. It should be dated and versioned so that implementors can fit it into their schedule. The architecture manual must describe everything the user sees, and refrain from describing anything the user does not see. The style must be precise and fully detailed, not necessarily lively. Consistency of prose should be maintained by one or two people.

Formal Definitions - Formal definitions are precise and tend to be more complete; gaps show more conspicuously and are filled earlier, but the definitions are less comprehensible. English prose should be mixed in to show structural principles, delineate structure in stages or levels, and give examples; prose complements formal definitions by marking exceptions, emphasizing contrasts, and explaining why. Formal definitions and English prose must be used in a primary-secondary manner, with one form being the authoritative word on any matter, so as not to create confusion. Much care must be taken not to over-prescribe with formal definitions, otherwise they start to dictate implementation issues.

Direct Incorporation - A technique for disseminating and enforcing definitions is to use include files and macros to carry the declarations of passed parameters or shared storage, and to require all implementations to use them.
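
A minimal sketch of the idea in modern dress (the module and all its names are hypothetical, invented for illustration): the interface declarations live in exactly one importable place, and every implementation pulls them in rather than restating them.

    # message_spec.py -- the single authoritative declaration (hypothetical names).
    # Every component imports this module; none restates the layout locally.
    import struct

    PROTOCOL_VERSION = 3
    HEADER_FORMAT = ">HHI"   # version, message type, payload length (big-endian)
    HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

    def pack_header(msg_type: int, payload_len: int) -> bytes:
        return struct.pack(HEADER_FORMAT, PROTOCOL_VERSION, msg_type, payload_len)

    def unpack_header(raw: bytes) -> tuple:
        return struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])

A change to the header is then one edit to one file; any component that had restated the layout by hand would instead have drifted silently out of step.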

Conferences and Courts - Meetings are necessary. Hundreds of man-to-man consultations must be supplemented by larger, formal, weekly gatherings. And for extremely large projects, week-long biannual round tables might be necessary. Anyone can propose problems or changes, but proposals must be distributed in writing before the meeting. Emphasis is on creativity rather than decisions. A few solutions are passed to one or more architects for detailing into precise ECOs (engineering change orders). These come up for decisions after being circulated with pros and cons well delineated. If consensus is not reached, the chief architect decides. Keep minutes, and distribute them promptly. The process works because:

  1. The same group meets for months, so nobody has to be brought up to speed,
  2. Everyone is authorized to make binding commitments (no advisory roles),
  3. Solutions to problems are sought both within and outside the obvious boundaries,
  4. The formality of WRITTEN proposals focuses attention, forces decisions, and avoids committee-drafted inconsistencies,
  5. Clear vesting of decision-making power in the chief architect avoids compromise and delay.

Multiple Implementations - Architects, if they are a separate group, should have enough time to work, and as much political clout as implementors. With that time and influence comes both the need and the capability to build multiple implementations of the architecture. The architecture will be cleaner and its discipline tighter if at least two implementations are built initially.

The Telephone Log - As implementation proceeds, countless questions of architectural interpretation arise; such questions and the corresponding answers should be made available to everyone involved through a logging mechanism.

Product Test - A product test group is a product manager's best friend and adversary at the same time. This surrogate customer, specialized for finding flaws, will find where specifications are not met and where design decisions are not understood or accurately implemented. They are a necessary link in the chain by which the design word is passed down, and they need to operate early and simultaneously with design.

My Conclusion: The telephone log, or generally a 'Question & Answer Log', seems like a good idea. No mention of requirements gathering though.


Chapter 7: Why Did the Tower of Babel Fail?

... Then they said, "Come, let us build ourselves a city with a tower whose top shall reach the heavens, so that we may not be scattered all over the earth." ... "Come, let us go down, and there make such a babble of their language that they will not understand one another's speech." Thus the Lord dispersed them from there all over the earth, so that they had to stop building the city.
(Genesis 11:1-8)

A Management Audit of the Babel Project - By the book of Genesis, the tower of Babel was man's 2nd engineering undertaking, after Noah's Ark, and the 1st engineering fiasco. 1) It had a clear mission, though an impossible one, and the project failed long before hitting that limit. 2) Lots of manpower. 3) Lots of material. 4) No time constraints. 5) Adequate technology existed. What caused it to fail were 1) COMMUNICATION and 2) ORGANIZATION.

Communication in the Large Programming Project - The 'Babel syndrome' exists today in that groups change their assumptions without informing others. 1) There should be clear definition of intergroup dependencies, encouraging hundreds of calls to commonly interpret the written documents. 2) There should be regular project meetings with one team after another giving technical briefings. 3) A formal project workbook must be started at the beginning.

The Project Workbook - Not a separate document, it is a structure imposed on the documents that the project will be producing anyway. All documents need to be part of it: objectives, external specs, internal specs, administrative memos, etc. Technical prose is almost immortal; many sentences written to propose the product or to sketch the 1st design remain valuable to the technical writer later on. Early design of the project workbook ensures that the documentation structure is crafted, not haphazard, and it molds later writing into that structure. The workbook helps ensure that relevant information gets to the people who need it. All memoranda should be numbered so that ordered lists of titles are available, and the numbering can be tree-structured to allow maintenance by subtree. The workbook should be on-line, with dates on revisions, change summaries available, etc.

Organization in the Large Programming Project - With n workers on a project, there are (n^2-n)/2 interfaces which may require communication, and potentially almost 2^n teams to coordinate. The purpose of organization is to reduce the amount of communication and coordination through 'division of labour' and 'specialization of function'. A tree-like organization stems (bad pun?) from the fact that no man can serve two masters. Communication structure is not as restrictive, and it gives rise to staff groups, task forces, committees, and even the matrix-like organizations found in many engineering labs. Any subtree needs:

  1. a mission,
  2. a producer,
  3. a technical director or architect,
  4. a schedule,
  5. a division of labour,
  6. interface definitions among the parts.

All the above is obvious except the distinction between producer and technical director. The producer assembles the team, divides the work, and establishes the schedule. He acquires and keeps on acquiring the necessary resources. He communicates outside the team, upwards and sideways, and establishes the pattern of communication within the team. The technical director conceives the design, identifies its subparts, specifies how it will look from outside, and sketches its internals. He provides unity and conceptual integrity to the whole design and limits system complexity. The talents for the two roles are quite different, and the following arrangements can work:

  1. Producer and technical director are the same man. This is workable for 3 to 6 man teams, and only if such rare people are found.
  2. The producer may be boss, and the director his right-hand man. For this to work, the producer must proclaim the director's authority to make technical decisions, and they must see alike on fundamental technical philosophy.
  3. The director may be boss, the producer his right-hand man. This is a more suitable arrangement for larger subtrees of a really big project.
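
To put numbers on the opening formula: with 10 workers there are 45 potential communication interfaces; with 50, 1,225; with 200, 19,900. The tree organization earns its keep by pruning that quadratic growth down to a handful of defined interfaces per subtree.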

My Conclusion:


Chapter 8: Calling the Shot

Practice is the best of all instructors. (Publilius)

Experience is a dear teacher, but fools will learn at no other.
(Poor Richard's Almanac)

How does one estimate how long a system programming job will take? Even though coding is probably the easiest part to estimate, one cannot estimate the coding and then scale it by whatever ratio one feels coding bears to the whole project. Building an isolated small program is very different from building a programming systems product. Linear extrapolation of a runner's 100m sprint time would suggest he can run a mile in under 3 minutes. Effort goes as a power of size even when communication is reduced to one man and his memories. A study done by Nanus and Farr at SDC shows:

 /---------------------------------------------------------------\
 |   effort = (constant) x (number of instructions)^1.5          |
 \---------------------------------------------------------------/
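
The exponent is the whole point: taken at face value, doubling a program's size multiplies the effort by 2^1.5 (about 2.8), and a program 10 times as large costs roughly 31.6 times the effort, not 10 times.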

Portman's Data - Charles Portman of ICL found his teams missing schedules by about 1/2. The estimates were made carefully using PERT charts, and the error was mostly accounted for by the teams realizing only 50% of the working week as actual programming and debugging time. Machine downtime, higher-priority short jobs, meetings, paperwork, sickness, personal time, leave, etc. accounted for the rest.

Aron's Data - Joel Aron of IBM studied 9 large systems and found developer productivity (without system test) to be:

 /---------------------------------------------------------------\
 |  Very few interactions     10,000 instructions/man-year       |
 |  Some interactions          5,000                             |
 |  Many interactions          1,500                             |
 \---------------------------------------------------------------/

Harr's Data - John Harr of Bell Labs, working on the Electronic Switching System (ESS), found this:

 ------------------------------------------------------------------
  Job type       Prog.   Staff   Years   Man-     Program    Words/
                 units                   years    words    man-year
 ------------------------------------------------------------------
  Operational      50      83      4      101     52,000       515
  Maintenance      36      60      4       81     51,000       630
  Compiler         13       9      2.5     17     38,000      2230
  Translator       15      13      2.5     11     25,000      2270
 ------------------------------------------------------------------
Productivity is stated in words/man-year. The 1st two control programs were more complex than the translators, but did they take more time due to the complexity or due to the division into more modules and more people? One can only guess.

OS/360 Data - IBM's OS/360 experience confirms the above conclusions: the striking differences in productivity are related to the complexity and difficulty of the task itself. Brooks's data shows compilers to be 3 times as bad as batch application programs, and operating systems to be 3 times as bad as compilers.

Corbato's Data - The Multics experience, using PL/I, showed: 1) Productivity seems constant in terms of elementary statements. 2) Productivity may be increased as much as 5 times when a suitable high-level language is used.

My Conclusion:


Chapter 9: Ten Pounds in a Five Pound Sack

The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass.
(Sydney Smith, Edinburgh Review)

Program Space as Cost - The space occupied by a program is a principal cost (see conclusion). But even when programs take a lot of space, one must ask, "What does it do?" What does one get in ease-of-use and performance? Could the $ be more fruitfully used for hardware, software, personnel? Since size is a large part of the user cost, the builder must set limits and devise reduction methods. Like any cost, size itself is not bad, unnecessary size is.

Size Control - This task is partly technical and partly managerial. When size targets (or any other constraints) are given, the builders have to be educated about system-wide considerations. Size-speed tradeoffs come in huge quantum leaps. Squeezing one module unnecessarily may force another module that depends on it to slow the entire system down by orders of magnitude. Two lessons to learn are: 1) Set total budgets, so that rubbish is not simply thrown over your neighbor's fence. 2) Define exactly what a module must do when you specify any constraint on it.

Space Techniques - Making a program small requires invention and craftsmanship. More function implies more space, speed being constant. How many options should a user get, keeping in mind economies of scale? For a given function, 'the more space, the faster' holds over an amazingly large range, making it feasible to set space budgets. To make good space-time tradeoffs, teams should be trained in the use of new machines and languages, and components should be available in both quick and squeezed versions to perform the lower-level functions.

Representation Is the Essence of Programming - Very often, strategic breakthroughs will come from redoing the representation of data or tables. The programmer at wit's end for lack of space can often do best by disentangling from code, rearing back, and contemplating the data.

My Conclusion:


Chapter 10: The Documentary Hypothesis

The hypothesis: Amid a wash of paper, a small number of documents become the critical pivots around which every project's management revolves. These are the manager's chief personal tools.

Certain documents a project must prepare seem like a white tide threatening to engulf the team, but a certain small set embodies and expresses much of the management of the project. The preparation of each one serves to focus thought and crystallize decisions, and their maintenance becomes a surveillance and warning mechanism. The documents serve as a check list, status control and data base for reporting progress.

Documents for a Computer Product :- objectives; specifications; schedule; budget; organization chart; space allocation; and the estimate, forecast, and prices, which interlock.

Documents for a University Department :- objectives; course descriptions; degree requirements; research proposals; class schedule and teaching plan; budget; space allocation; assignment of staff and graduate students.

Documents for a Software Project :- the same set again: what (objectives and product specifications), when (schedule), how much (budget), where (space allocation), and who (organization chart).

Why have Formal Documents :-

  1. Writing the decisions down is essential; gaps and inconsistencies appear only when one writes them out,
  2. The documents communicate the decisions to others,
  3. They give the manager a data base and checklist; periodic review shows him where he is and what needs changing.
My Conclusion: Documents should be regarded as tools, the right ones used for a project are invaluable in communicating a plan and achieving it.


Chapter 11: Plan to Throw One Away

There is nothing in this world constant but inconstancy. (Swift)

It is common sense to take a method and try it.
If it fails, admit it frankly and try another.
But above all, try something.
(Franklin D. Roosevelt)

Pilot Plants and Scaling Up - Chemical engineers use pilot plants before scaling quantities up. Programming system builders have also been exposed to this lesson, but it seems not yet to have been learned. In most projects, the 1st system built is barely usable: too slow, too big, awkward to use, or all 3. The discard and redesign may be done piece-by-piece or in 1 shot. The management question is not whether to build a pilot system and throw it away (you WILL do that); it is whether to PLAN for it.

The Only Constancy Is Change Itself - A user's need and a user's perception of the need will change as programs are built, tested and used. The very existence of a tangible object serves to contain and quantize user demand for changes. Both the tractability and invisibility of software exposes its builders to perpetual changes in requirements. Not only are changes in objective inevitable, changes in development strategy and technique are also inevitable. The throw-one-away concept is acceptance of reality.

Plan the System for Change - Careful modularization, precise and complete definition of interfaces, extensive subroutining, and complete documentation of these, use of high-level language, self-documenting techniques so as to reduce errors, and using compile-time operations to incorporate standard declarations helps powerfully in making changes. Quantization is essential, each product should have numbered versions, each version must have its own schedule and freeze date, after which changes go into the next version.

Plan the Organization for Change - Cosgrove advocates treating all plans, milestones, and schedules as tentative to facilitate change. But the common failing of programming groups today is too little management, not too much. If the organizational structure is threatening in any way, designs will not be documented until they are defensible. Each person should be assigned jobs that broaden them, to keep the force technically flexible. 2 or 3 top programmers should be reserved as a technical cavalry to ride to the rescue where the battle is thickest. The barriers to success are sociological: Bell Labs abolishes titles, everyone is a Member of Technical Staff, and other companies, like IBM, have dual career paths. Reassignments from managerial to technical positions should be accompanied by promotions and raises, but never vice versa; overcompensating for the cultural forces is necessary. Senior people should be sent for training and organized into surgical teams as described in Chapter 3. It is relatively easy to reassign a whole surgical team to a different programming task when organizational changes are necessary, and this is the long-run answer to the problem of a flexible organization.

Two Steps Forward and One Step Back - Changes after delivery are called program maintenance, but program maintenance is fundamentally different from hardware maintenance. Most hardware maintenance involves replacing, cleaning, and ECOs, and most ECOs fix defects in implementation rather than architecture, so they are invisible to the user. Most program maintenance, by contrast, consists of changes that repair design defects, and these, much more often than for hardware, include added functions visible to the user. The total cost of maintaining a widely used program is typically 40% or more of the development cost, and more users find more bugs. The problem with program maintenance is that fixing a defect has a substantial (20-50%) chance of introducing another; hence 2 steps forward and 1 back. Even a subtle defect shows itself as some kind of local failure, yet it often has system-wide, non-obvious ramifications. Any attempt to fix it with minimum effort will repair the local and obvious, but unless the structure is pure or the documentation very fine, the far-reaching effects of the repair will be overlooked. The repairer is usually not the person who wrote the code but someone junior. Program maintenance therefore requires far more system testing per statement written than regular development; regression testing must approximate exercising the entire system and is very costly. Methods of designing programs so as to eliminate, or at least illuminate, side effects can have an immense payoff. Designs with fewer people and fewer interfaces also have fewer bugs.

One Step Forward and One Step Back - Lehman and Belady studied releases of a large OS: the total number of modules increases linearly with release number, but the number of modules affected increases exponentially. All repairs tend to destroy structure and to increase entropy and disorder. Less and less effort is spent fixing original design flaws, and more and more goes to fixing flaws introduced by earlier fixes. Sooner or later, fixing ceases to gain any ground. Although still usable, the system has worn out as a base for progress. Machine, configuration, and requirement changes eventually dictate a ground-up redesign. Systems program building is an entropy-decreasing process, hence inherently metastable. Program maintenance is an entropy-increasing process, and even its most skillful execution only delays the subsidence of the system into unfixable obsolescence.

My Conclusion:


Chapter 12: Sharp Tools

A good workman is known by his tools. (Proverb)

Even at this late date (1975!!!), many programming projects are still operated like machine shops so far as tools are concerned. It is obviously much more efficient to have common development and maintenance of general-purpose programming tools. One toolmaker per team should master all the common tools, instruct the rest of the team, and build specialized tools as needed. There is an insidious temptation to gather all tool builders to augment the common tool team for greater efficiency, but there is always a need for specialized tools.

Target Machines - Machine support is usefully divided into target machines and vehicle machines. Separate target and debugging machines with sufficient system resources should be available; if the machine or OS is new, a specialized team to schedule and run all tests on the limited resource may be necessary.

Vehicle Machines and Data Services - Logical simulators provide a debugging vehicle long before the real target exists. Equally important, they give a dependable vehicle even after the target is available: the target is more accurate, but it is susceptible to frequent change. Preproduction hardware neither works as defined nor works reliably; the shifting base is bad enough, and the hardware failures are worse. For the same reason, one wants compilers and debuggers that run on dependable vehicles, even if testing is to be done on target systems. Program libraries, carefully integrated and released, serve as formal separation and progression mechanisms to isolate faults and localize bugs. Documentation tools and performance simulators are also very useful development tools.

High-Level Language and Interactive Programming - High-level languages provide productivity and debugging speed. Interactive programming (as opposed to batch) is the only way to go since debugging is the hard and slow part of system programming, and slow turnaround is the bane of debugging.

My Conclusion: This chapter contained many outdated suggestions which were omitted, but the point remains that effective tools make an immense difference.


Chapter 13: The Whole and the Parts

I can call spirits from the vasty deep. Why so can I, or so can any man; but will they come when you do call for them? (Shakespeare, King Henry IV, Part I)

Designing the Bugs Out - The most pernicious and subtle bugs come from mismatched assumptions made by the authors of various components. Conceptual integrity addresses these problems directly, making a product easier to use, easier to build, and less subject to bugs. Many failures concern aspects that were never quite specified. Long before code exists, the specification must be handed to an outside testing group to be scrutinized for completeness and clarity; developers cannot do this, for they will happily invent their way through gaps and obscurities. Niklaus Wirth formalized top-down design in a 1971 paper. A good top-down design avoids bugs in several ways: the clarity of structure and representation makes the precise statement of requirements and functions of the modules easier; the partitioning and independence of modules avoids system bugs; the suppression of detail makes flaws in the structure more apparent; and the design can be tested at each of its refinement steps. Top-down design also reduces the temptation to salvage a bad basic design and patch it with all kinds of cosmetic relief. Dijkstra's structured programming builds on the theoretical work of Bohm and Jacopini.
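
A toy illustration of stepwise refinement (the task and every name in it are invented for the purpose): the top level states the whole program as three yet-unrefined steps, and each refinement below is small enough to inspect and test on its own.

    # Stepwise refinement on a made-up report task; all names are hypothetical.
    # Top level first: the whole program as three yet-to-be-refined steps.
    def produce_report(path):
        records = read_records(path)   # step 1, refined below
        totals = summarize(records)    # step 2, refined below
        return render(totals)          # step 3, refined below

    # Each refinement can be read, checked, or replaced without the others.
    def read_records(path):
        with open(path) as f:
            return [line.strip().split(",") for line in f if line.strip()]

    def summarize(records):
        totals = {}
        for name, amount in records:
            totals[name] = totals.get(name, 0) + int(amount)
        return totals

    def render(totals):
        return "\n".join(f"{name}: {total}" for name, total in sorted(totals.items()))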

Component Debugging - Debugging went through a great cycle from 1955 to 1975. On-machine debugging was necessary, testing sections with planned stops because of long I/O delays. Due to scarce computer resources, memory dumps were used but involved laborious desk work. As memory sizes increased, snapshots were taken of the relevant segments. Finally in 1959, Codd and his coworkers and Strachey each reported work aimed at time-shared debugging, Corbato and his MIT colleagues implemented an experimental system in 1963, which led to MULTICS, TSS and other time-sharing systems of today.

System Debugging - The unexpectedly hard part of building a programming system is system test. It will take longer than expected, and it justifies a thoroughly systematic and planned approach. The rules of system debugging are:

  1. Use debugged components; do not bolt together pieces whose bugs are merely known and documented,
  2. Build plenty of scaffolding (dummy components, miniature files, auxiliary programs), perhaps amounting to half as much code as the product itself,
  3. Control changes: somebody must be in charge, and every change to the system must be logged,
  4. Add one component at a time,
  5. Quantize updates: change the tested base in discrete, documented releases.
My Conclusion: Integration of system components is a much more difficult task than it appears. But using appropriate methodologies as described above can make it go smoother and avert disasters.


Chapter 14: Hatching a Catastrophe

None love the bearer of bad news. (Sophocles)
How does a project get to be a year late?
... One day at a time.

Software disasters are due to termites, not tornadoes; schedules slip imperceptibly, but inexorably. Major calamities are easier to handle, by responding with major force, but day-by-day slippage is harder to recognize, prevent and make up. Down-time, illness, family problems etc. all add up.

Milestones or Millstones? - Have a schedule! Milestones must be concrete, specific, measurable events, defined with knife-edge sharpness. The ridiculous but common observation from projects is that coding is '90% finished' for half of the total coding time, debugging is '99% complete' 90% of the time, and planning is '100% complete' 99% of the time. Studies of large-scale development show that:

  1. Estimates of an activity's length, made and revised carefully every two weeks before the activity starts, do not significantly change as the start time approaches,
  2. During the activity, OVERestimates of duration come steadily down as the activity proceeds,
  3. UNDERestimates do not change significantly until about three weeks before the scheduled completion.

Fuzzy milestones grind down morale and deceive the team about lost time until it is irremediable; chronic slippage is a morale-killer.

The Other Piece Is Late, Anyway - Baseball managers recognize "hustling" (running faster, moving sooner and trying harder than necessary) as an essential gift of great players and great teams. Hustle provides the cushion, the reserve capacity, that enables programming teams to cope with routine mishaps, to anticipate and forfend minor calamities. We must get excited about one-day slips because they eventually cause catastrophes. PERT charts/critical path schedules highlight which slips matter, and which will soon matter.

Under the Rug - 1st-line managers tend to hide slippages from their bosses in the hope of making up somehow. There is a conflict of interest where the 1st-line manager wants to retain authority and the boss wants to get an accurate status. The boss must reduce the role conflict, inspire sharing of status, and yank the rug back (review parts of the PERT charts regularly). Bosses must resist the temptation to give orders while getting status, otherwise the employee/1st-line manager will be less likely to give accurate status in the future. Large projects may benefit from investing in a plans & controls task force. Early warning systems identify critical delays which can still be fixed.

My Conclusion:


Chapter 15: The Other Face

What we do not understand we do not possess. (Goethe)

O give me commentators plain,
Who with no deep researches vex the brain. (Crabbe)

What Documentation Is Required? - Different levels are required for the casual user of a program, for the user who depends upon it, and for the programmer who must enhance it.

The Flow-Chart Curse - Flow charts are the most thoroughly oversold piece of documentation; most programs don't need them, and few need more than one page of them. Goldstine and von Neumann introduced them to group inscrutable machine-language statements into clusters of significance, but as Iverson recognized early, systematic high-level languages already cluster related statements. The flow chart is obsolete!

Self-Documenting Programs - Basic data processing teaches against trying to maintain independent files in synchronism, yet we attempt to maintain independent sets of machine-readable programs and human-readable documentation. A 1st notion is to use labels, declarations, and symbolic names to convey as much meaning as possible. A 2nd notion is to use space and format as much as possible. The 3rd is to insert paragraphs of prose as comments. Line-by-line comments are often overused and of little help. All this must be done when writing the programs to really minimize the total amount of work required.
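
A small sketch of the three notions working together (the program is a made-up example): the names carry the meaning, the layout shows the structure, and one paragraph of prose states the intent, with line-by-line comments kept to the single step that earns one.

    # Compute year-by-year compound balances (a made-up example program).
    #
    # Intent: given an opening balance, an annual interest rate, and a number
    # of years, return the balance at the end of each year.  The prose intent
    # lives here in one block instead of being smeared line by line below.
    def yearly_balances(opening_balance: float, annual_rate: float,
                        years: int) -> list:
        balances = []
        balance = opening_balance
        for _ in range(years):
            balance *= 1 + annual_rate   # the one step that earns a comment
            balances.append(round(balance, 2))
        return balances

    print(yearly_balances(1000.00, 0.05, 3))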

My Conclusion:


Unabridged Epilogue

The tar pit of software engineering will continue to be sticky for a long time to come. One can expect the human race to continue attempting systems just within or just beyond our reach; and software systems are perhaps the most intricate and complex of man's handiworks. The management of this complex craft will demand our best use of new languages and systems, our best adaptation of proven engineering management methods, liberal doses of common sense, and a God-given humility to recognize our fallibility and limitations.

My Conclusion: God help us!