Parnas (1972): On the Criteria for Decomposing Systems into Modules
This six-page paper changed how software engineers think about modules. Before Parnas, the accepted practice was to decompose a system by drawing a flowchart and making each major processing step a module. Parnas demonstrated, with a single concrete example, that this produces systems where every module depends on every other module’s internal data representation — making the system difficult to change, difficult to develop in parallel, and difficult to understand.
The alternative he proposed was deceptively simple: each module should hide a design decision. Not a processing step, not a subroutine — a decision about data representation or algorithm that is likely to change. The module’s interface exposes operations, not structure. If the hidden decision changes, only that module changes.
Four years later, Margaret Hamilton would formalize this insight into mathematics. Her Access axiom — a module can only read or write data that its position in the hierarchy grants it access to — is Parnas’s “secret” restated as a provable structural constraint. Parnas argued by example that information hiding works. Hamilton proved by axiom that it must.
The KWIC Index Example
Section titled “The KWIC Index Example”Parnas chose a problem small enough to fit in a few paragraphs yet large enough to admit genuinely different decompositions: a KWIC (Key Word In Context) index production system.
Specification:
- Input is a set of lines; each line is an ordered set of words; each word is an ordered set of characters
- Each line can be “circularly shifted” — remove the first word and append it at the end
- Output is all circular shifts of all lines, sorted alphabetically
For the input line “abcd efgh ijkl mnop”, the circular shifts are:
| Shift |
|---|
| abcd efgh ijkl mnop |
| efgh ijkl mnop abcd |
| ijkl mnop abcd efgh |
| mnop abcd efgh ijkl |
The problem is trivial to implement. The question is how to divide the implementation into modules.
Decomposition 1: Processing Steps
Section titled “Decomposition 1: Processing Steps”The conventional approach decomposes by flowchart — each module corresponds to a step in the processing pipeline:
| Module | Responsibility |
|---|---|
| Input | Read data lines, store them in memory |
| Circular Shift | Produce an index array listing the first character position and original line for each circular shift |
| Alphabetizing | Sort the index array alphabetically |
| Output | Print the sorted circular shifts using the index array and stored lines |
| Master Control | Sequence the above |
Every module knows how lines are stored in memory. Every module knows the format of the index array. The Input module decides the storage representation, and every subsequent module depends on that decision.
Decomposition 2: Design Decisions
Section titled “Decomposition 2: Design Decisions”Parnas’s alternative decomposes by information hiding — each module hides a design decision:
| Module | Interface | Hidden Decision |
|---|---|---|
| Line Storage | CHAR(l,w,c) — get the th character of the th word of line ; SETCHAR — store a character; WORDS(l) — count words in line | How lines are stored (packed array, linked list, disk-backed, etc.) |
| Input | Reads input, calls Line Storage | Input format |
| Circular Shifter | CSCHAR(l,w,c) — get the th character of the th word of the th circular shift; CSSetup — initialize | How circular shifts are represented (virtual index computation vs. physical copies) |
| Alphabetizer | ITH(i) — return the index of the th shift in alphabetical order; ALSetup — sort | Sorting algorithm |
| Output | Print the sorted shifts | Output format |
| Master Control | Sequence modules | Control flow |
The critical difference: no module other than Line Storage knows how lines are represented in memory. Circular Shifter does not copy shifted lines — it computes shift indices on the fly through its CSCHAR function. Alphabetizer does not rearrange data — it produces an ordering. Each module’s interface is a set of operations, not a data structure.
Why Decomposition 2 Wins
Section titled “Why Decomposition 2 Wins”Parnas evaluates on three dimensions:
Changeability
Section titled “Changeability”Consider changing how lines are stored in memory — say, from storing all lines in core to packing four characters per word, or reading from disk on demand.
- Decomposition 1: Every module must change. Input stores differently, Circular Shift reads differently, Alphabetizing compares differently, Output formats differently.
- Decomposition 2: Only Line Storage changes. Every other module calls
CHAR(l,w,c)and gets the same character regardless of the underlying representation.
Consider changing the input format:
- Both decompositions: only the Input module changes. This is a wash — both decompositions hide this decision in the same place.
The asymmetry is the point. Decomposition 1 localizes some changes but not others. Decomposition 2 localizes every change to the module that owns the corresponding decision.
Independent Development
Section titled “Independent Development”In Decomposition 1, the interface between modules is the shared data representation itself — array layouts, index formats, character packing conventions. Every team must agree on these representations before any module can be implemented.
In Decomposition 2, the interface is a set of function signatures. Teams can implement modules in parallel, substituting stubs for modules that are not yet complete. The Line Storage module could be rewritten from a linked list to a B-tree without any other module knowing.
Comprehensibility
Section titled “Comprehensibility”In Decomposition 1, understanding any module requires understanding the data representation shared across all modules. In Decomposition 2, understanding a module requires understanding only its interface and the interfaces of the modules it calls. The internal representation is invisible.
The Criterion
Section titled “The Criterion”Parnas’s conclusion is a single sentence that became one of the most quoted in software engineering:
“We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.”
This is the secret principle: every module has a secret — a design decision that only that module knows. The module’s interface is chosen so that the secret can change without affecting any other module. The designer’s job is to identify which decisions are likely to change and to draw module boundaries around them.
Hierarchical Structure
Section titled “Hierarchical Structure”Parnas observes that Decomposition 2 naturally forms a hierarchy:
- Level 2: Line Storage (lowest, used by everyone)
- Level 3: Input and Circular Shifter (use Line Storage)
- Level 4: Output and Alphabetizer (use Circular Shifter)
- Master Control at the top
Higher levels use services of lower levels. Lower levels can be extracted and reused independently — levels 1 and 2 alone could support a different application entirely.
But Parnas is careful to separate two ideas. Hierarchical structure and information-hiding decomposition are independent properties. A conventional flowchart decomposition could also produce a hierarchy. The information-hiding criterion says what each module should contain (a design decision); the hierarchical structure says how modules relate (via levels of abstraction). The two often coincide, but they are not the same thing.
The Five General Criteria
Section titled “The Five General Criteria”The paper closes with five specific rules for decomposition, each derived from the KWIC analysis:
-
Data structures with their access and modification procedures belong in a single module. Do not let multiple modules manipulate the same data structure directly.
-
The calling sequence for a routine and the routine itself belong in the same module. If calling a function requires format conversion (e.g., integer to string), that conversion code belongs with the function, not with the caller.
-
Control block formats should be hidden within a control block module. Queue headers, buffer descriptors, and similar bookkeeping structures change frequently and should not be part of inter-module interfaces.
-
Character codes, alphabetic orderings, and similar encoding decisions should be isolated in their own module. These are among the most variable design decisions in a system.
-
Processing sequence should be isolated in a single module. The order in which items are processed is highly variable and should be changeable without affecting the modules that process individual items.
These five criteria are specific instances of the general principle: hide what changes behind an interface that does not.
What This Paper Teaches
Section titled “What This Paper Teaches”The paper’s argument is carried entirely by one example and three evaluation dimensions. There are no theorems, no proofs, no formal notation. The KWIC problem is small enough that a single programmer could implement either decomposition in an afternoon. And yet the paper fundamentally altered how software systems are designed, because the argument generalizes: every system of any size faces the same choice between decomposing by processing steps and decomposing by design decisions, and the second choice produces better systems on exactly the dimensions that matter for long-lived, team-developed software.
The paper also demonstrates something about the relationship between empirical argument and formal proof. Parnas convinced the field by showing two implementations of the same problem and comparing them concretely. Hamilton convinced the formal methods community by proving the same insight as theorems. Neither approach supersedes the other. Parnas’s example is accessible to any programmer. Hamilton’s axioms are necessary for automated verification. The insight — that module boundaries should follow information boundaries, not processing boundaries — belongs to both.