Skip to content

Parnas (1972): On the Criteria for Decomposing Systems into Modules

This six-page paper changed how software engineers think about modules. Before Parnas, the accepted practice was to decompose a system by drawing a flowchart and making each major processing step a module. Parnas demonstrated, with a single concrete example, that this produces systems where every module depends on every other module’s internal data representation — making the system difficult to change, difficult to develop in parallel, and difficult to understand.

The alternative he proposed was deceptively simple: each module should hide a design decision. Not a processing step, not a subroutine — a decision about data representation or algorithm that is likely to change. The module’s interface exposes operations, not structure. If the hidden decision changes, only that module changes.

Four years later, Margaret Hamilton would formalize this insight into mathematics. Her Access axiom — a module can only read or write data that its position in the hierarchy grants it access to — is Parnas’s “secret” restated as a provable structural constraint. Parnas argued by example that information hiding works. Hamilton proved by axiom that it must.

Parnas chose a problem small enough to fit in a few paragraphs yet large enough to admit genuinely different decompositions: a KWIC (Key Word In Context) index production system.

Specification:

  • Input is a set of lines; each line is an ordered set of words; each word is an ordered set of characters
  • Each line can be “circularly shifted” — remove the first word and append it at the end
  • Output is all circular shifts of all lines, sorted alphabetically

For the input line “abcd efgh ijkl mnop”, the circular shifts are:

Shift
abcd efgh ijkl mnop
efgh ijkl mnop abcd
ijkl mnop abcd efgh
mnop abcd efgh ijkl

The problem is trivial to implement. The question is how to divide the implementation into modules.

The conventional approach decomposes by flowchart — each module corresponds to a step in the processing pipeline:

ModuleResponsibility
InputRead data lines, store them in memory
Circular ShiftProduce an index array listing the first character position and original line for each circular shift
AlphabetizingSort the index array alphabetically
OutputPrint the sorted circular shifts using the index array and stored lines
Master ControlSequence the above

Every module knows how lines are stored in memory. Every module knows the format of the index array. The Input module decides the storage representation, and every subsequent module depends on that decision.

Parnas’s alternative decomposes by information hiding — each module hides a design decision:

ModuleInterfaceHidden Decision
Line StorageCHAR(l,w,c) — get the ccth character of the wwth word of line ll; SETCHAR — store a character; WORDS(l) — count words in line llHow lines are stored (packed array, linked list, disk-backed, etc.)
InputReads input, calls Line StorageInput format
Circular ShifterCSCHAR(l,w,c) — get the ccth character of the wwth word of the llth circular shift; CSSetup — initializeHow circular shifts are represented (virtual index computation vs. physical copies)
AlphabetizerITH(i) — return the index of the iith shift in alphabetical order; ALSetup — sortSorting algorithm
OutputPrint the sorted shiftsOutput format
Master ControlSequence modulesControl flow

The critical difference: no module other than Line Storage knows how lines are represented in memory. Circular Shifter does not copy shifted lines — it computes shift indices on the fly through its CSCHAR function. Alphabetizer does not rearrange data — it produces an ordering. Each module’s interface is a set of operations, not a data structure.

Parnas evaluates on three dimensions:

Consider changing how lines are stored in memory — say, from storing all lines in core to packing four characters per word, or reading from disk on demand.

  • Decomposition 1: Every module must change. Input stores differently, Circular Shift reads differently, Alphabetizing compares differently, Output formats differently.
  • Decomposition 2: Only Line Storage changes. Every other module calls CHAR(l,w,c) and gets the same character regardless of the underlying representation.

Consider changing the input format:

  • Both decompositions: only the Input module changes. This is a wash — both decompositions hide this decision in the same place.

The asymmetry is the point. Decomposition 1 localizes some changes but not others. Decomposition 2 localizes every change to the module that owns the corresponding decision.

In Decomposition 1, the interface between modules is the shared data representation itself — array layouts, index formats, character packing conventions. Every team must agree on these representations before any module can be implemented.

In Decomposition 2, the interface is a set of function signatures. Teams can implement modules in parallel, substituting stubs for modules that are not yet complete. The Line Storage module could be rewritten from a linked list to a B-tree without any other module knowing.

In Decomposition 1, understanding any module requires understanding the data representation shared across all modules. In Decomposition 2, understanding a module requires understanding only its interface and the interfaces of the modules it calls. The internal representation is invisible.

Parnas’s conclusion is a single sentence that became one of the most quoted in software engineering:

“We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.”

This is the secret principle: every module has a secret — a design decision that only that module knows. The module’s interface is chosen so that the secret can change without affecting any other module. The designer’s job is to identify which decisions are likely to change and to draw module boundaries around them.

Parnas observes that Decomposition 2 naturally forms a hierarchy:

  • Level 2: Line Storage (lowest, used by everyone)
  • Level 3: Input and Circular Shifter (use Line Storage)
  • Level 4: Output and Alphabetizer (use Circular Shifter)
  • Master Control at the top

Higher levels use services of lower levels. Lower levels can be extracted and reused independently — levels 1 and 2 alone could support a different application entirely.

But Parnas is careful to separate two ideas. Hierarchical structure and information-hiding decomposition are independent properties. A conventional flowchart decomposition could also produce a hierarchy. The information-hiding criterion says what each module should contain (a design decision); the hierarchical structure says how modules relate (via levels of abstraction). The two often coincide, but they are not the same thing.

The paper closes with five specific rules for decomposition, each derived from the KWIC analysis:

  1. Data structures with their access and modification procedures belong in a single module. Do not let multiple modules manipulate the same data structure directly.

  2. The calling sequence for a routine and the routine itself belong in the same module. If calling a function requires format conversion (e.g., integer to string), that conversion code belongs with the function, not with the caller.

  3. Control block formats should be hidden within a control block module. Queue headers, buffer descriptors, and similar bookkeeping structures change frequently and should not be part of inter-module interfaces.

  4. Character codes, alphabetic orderings, and similar encoding decisions should be isolated in their own module. These are among the most variable design decisions in a system.

  5. Processing sequence should be isolated in a single module. The order in which items are processed is highly variable and should be changeable without affecting the modules that process individual items.

These five criteria are specific instances of the general principle: hide what changes behind an interface that does not.

The paper’s argument is carried entirely by one example and three evaluation dimensions. There are no theorems, no proofs, no formal notation. The KWIC problem is small enough that a single programmer could implement either decomposition in an afternoon. And yet the paper fundamentally altered how software systems are designed, because the argument generalizes: every system of any size faces the same choice between decomposing by processing steps and decomposing by design decisions, and the second choice produces better systems on exactly the dimensions that matter for long-lived, team-developed software.

The paper also demonstrates something about the relationship between empirical argument and formal proof. Parnas convinced the field by showing two implementations of the same problem and comparing them concretely. Hamilton convinced the formal methods community by proving the same insight as theorems. Neither approach supersedes the other. Parnas’s example is accessible to any programmer. Hamilton’s axioms are necessary for automated verification. The insight — that module boundaries should follow information boundaries, not processing boundaries — belongs to both.