Wirth's Mark Errors
1 Mark Errors
In processing an input line of text, a program we are working on discovered errors. Each error is recorded as a pair (p, e) of numbers. We wish to point to the p-th character that caused the error, and print the error number e next to it. Sometimes a single position in the input line triggers multiple errors. You are given a list of these pairs, and the input line. We need a procedure that prints this line, and the error numbers below it, properly located. Use no more lines than necessary. [This is an exercise from Niklaus Wirth's book Systematic Programming: An Introduction, 1973.]
Spec by Example 1
error list     : (4,1), (7,2), (10,2), (20,2), (23,2)
column numbers : 1234567890123456789012345
line of text   : Thos broplem is peenuds.
line of errors :    ^1 ^2 ^2        ^2 ^2
Spec by Example 2
error list     : (5,9), (13,9), (13,787), (18,94), (18,126), (25,73),
(continued)    : (30,9), (30,23), (30,26), (39,9), (40,742)
column numbers : 12345678901234567890123456789012345678901234
line of text   : if x > 0 anf y +) then 0 := x else f(x.y) ;
line of errors :     ^9      ^9   ^94,126^73  ^9,23,26 ^9
line of errors :             ^787                       ^742
[We can never specify software using examples. But, this is popular because it quickly communicates essential features. Where it typically fails is in the "edge" cases.]
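One way to read the two examples is as a greedy placement rule, sketched here in C++. This is only a sketch under assumptions that the examples suggest but do not state: the list is sorted by p, columns are 1-based, a further number is appended to a caret as ",e" only if it stays to the left of the next pending caret column, and anything that does not fit is deferred to the next line. The names Err and markLines are hypothetical, and the sketch makes no claim to always use the minimal number of lines.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

struct Err { int p; int e; };   // 1-based column p, error number e

// Hypothetical sketch: errs must be sorted by p. Returns the marker lines.
std::vector<std::string> markLines(std::vector<Err> errs) {
    std::vector<std::string> lines;
    while (!errs.empty()) {
        std::string line;
        std::vector<Err> deferred;      // errors pushed to a later line
        int lastP = 0;                  // column of the caret most recently placed on this line
        for (std::size_t i = 0; i < errs.size(); ++i) {
            const Err x = errs[i];
            assert(x.p >= 1);
            const std::string num = std::to_string(x.e);
            const int len = static_cast<int>(line.size());
            if (x.p == lastP) {
                // Same column as the caret just placed: try to append ",e",
                // but stay left of the next pending caret column.
                int limit = INT_MAX;
                for (std::size_t j = i + 1; j < errs.size(); ++j)
                    if (errs[j].p != x.p) { limit = errs[j].p; break; }
                if (len + 1 + static_cast<int>(num.size()) < limit) line += "," + num;
                else deferred.push_back(x);
            } else if (len < x.p) {
                // Caret column is free: pad with blanks, then write "^e".
                line.append(static_cast<std::size_t>(x.p - 1 - len), ' ');
                line += "^" + num;
                lastP = x.p;
            } else {
                deferred.push_back(x);  // caret column already occupied
            }
        }
        lines.push_back(line);
        errs = deferred;                // the next pass handles what was left over
    }
    return lines;
}
```

Called with the error list of Example 2, this sketch yields two lines: 787 and 742 are deferred to the second line, as in the example. Whether that rule is what the problem statement intends is exactly the kind of "edge" question the examples leave open.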
2 "Solutions"
- Several "solutions" are presented. The code given here is expected to solve a simpler version (read the comments) S1 of the above problem. Your task: Figure out which ones, if any, are correct.
- Note that this code uses cout to construct out[]. What is output cannot then be changed later.
- The problem can be further simplified if we build out[] in memory, and print it all at once.
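To illustrate the last point, here is a minimal C++ sketch of building the marker line in memory before printing anything. The helper name buildMarks is hypothetical, positions are taken to be 0-based indices into txt (as in the given code), and padding the result to the length of txt is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the given code): build the whole marker
// line in memory; the caller prints it in one go.
// Positions are 0-based indices into txt.
std::string buildMarks(const std::string& txt, const std::vector<int>& pos) {
    std::string out(txt.size(), ' ');   // start with all blanks
    for (int p : pos) {
        assert(p >= 0 && static_cast<std::size_t>(p) < txt.size());
        out[p] = '*';                   // random access: any cell can still be changed
    }
    return out;
}
```

Because out is an ordinary string, nothing is committed until the caller writes it, for example with std::cout << txt << '\n' << buildMarks(txt, pos) << '\n'.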
3 Teaching
- This problem can be used in teaching, for different purposes.
- Develop the requirements analysis, and rigorous specs for the Mark Error problem as described in Section 1.
- Develop pre- post- conditions for each solution, and loop invariants for every loop. Even incorrect programs have all these!
- What was the thinking behind these "solutions"?
- Can you figure out which ones go into an infinite loop without testing them?
- Is this a "debugging" problem?
- Having identified the "correct" solutions, develop one of them further to meet the specs of Section 1, Mark Errors. Look up: Agile Iterative Development.
- Can this be done in one-pass? Always?
4 Requirements Clarifications
- Q: Is the array pos[] as in "solutions" sorted (left to right)? Are these indices guaranteed to be those of txt? A: Yes, and yes.
- Q: Is txt[] just one line? A: Yes. The txt[] has "ordinary" characters (only); no '\t', '\n', and such.
- Q: How large are the numbers? A: Assume that the numbers are unsigned 16-bit. C++ unsigned short (or std::uint16_t) will do.
- Make (further) reasonable/ sympathetic assumptions. This is not a problem cooked up to raise your awareness of requirements analyses. It is a "develop a correct solution to what is stated" problem.
5 Simpler Version S1 of the Problem
- The following three lines are extracted from the source code given.
int pos[] = { 3, 6, 9, 19, 22 };
char txt[] = "Thes broplem is peenuds.";    // example input
char out[] = "   *  *  *         *  *";     // expected output
- S1: We should have out[y] == '*' for every y given in pos[]. All the rest of out[] should be blanks. Not rigorously stated, but pretty good.
- Like we did in the Dutch National Flag problem, use this as the post condition of the markErrorPos method, and use a weakened version of the post as a loop invariant for the code yet to be designed.
6 A Solution for S1 that is/ better-be Correct-by-Design
6.1 Make S1 Rigorous
- Assume that the starting index of arrays is a 1, not 0 (even though this is not Pascal). To keep it less confusing, we are conforming to the code given. But, the indices i, j, k used below are unrelated to those of the code.
- Weakest Pre Condition P
- Def L == (sizeof(txt)/sizeof(char))
- Def n == (sizeof(pos)/sizeof(int))
- sorted(pos), which is-defined-as pos[1] < pos[2] < … < pos[n].
- 0 < pos[i] < L+1 for all indices i of pos[].
- For ease of use, let us call the conjunction (AND-ing) of all the above as P.
- Post Condition Q
- Think of out[] as modeling the stdout produced by our markErrorPos().
- out[j] is either ' ' (a blank) or '*', for all its indices j.
- out[pos[i]] == '*' for all indices i of pos[].
- For ease of use, let us call the conjunction (AND-ing) of all the above as Q.
- To keep things simple, we will continue to assume that pos[], txt[] and out[] are global arrays. Later, we will model as follows: {out[] := markErrorPos(txt, pos);}
- The spec S1 is {P} markErrorPos() {Q}
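As a hedged illustration, P and Q can be written as executable predicates in C++. The names holdsP and holdsQ are hypothetical; positions are 1-based as assumed in this section, while the C++ containers themselves are 0-based, and L is passed in as a parameter rather than computed with sizeof.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// P: pos is strictly increasing and every entry satisfies 0 < pos[i] < L+1.
bool holdsP(const std::vector<int>& pos, int L) {
    for (std::size_t i = 0; i < pos.size(); ++i) {
        if (pos[i] <= 0 || pos[i] >= L + 1) return false;   // 0 < pos[i] < L+1
        if (i > 0 && pos[i - 1] >= pos[i]) return false;    // sorted(pos), strictly
    }
    return true;
}

// Q: out holds only blanks and stars, and has a star at every position in pos.
bool holdsQ(const std::string& out, const std::vector<int>& pos) {
    for (char c : out)
        if (c != ' ' && c != '*') return false;             // blank or '*' only
    for (int p : pos) {
        if (p < 1 || static_cast<std::size_t>(p) > out.size()) return false;
        if (out[p - 1] != '*') return false;                // 1-based out[pos[i]] == '*'
    }
    return true;
}
```

Note that holdsQ, read literally from Q, also accepts a string consisting only of stars; S1 as worded in Section 5 additionally asks that the rest of out[] be blanks.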
6.2 A Loop Invariant from S1
- Based on our "experience", we are expecting to design an iterative algorithm for markErrorPos().
- Take the Q above, and revise it as Q(k), a function of k, where k is a valid index of pos[]. Item 3 of Q above is weakened:
- out[pos[i]] == '*' for indices i of pos[], 0 < i < k
- Imagine that out[] starts out as an empty string, and grows.
- Only out[1 .. pos[k-1]] is defined (produced) so far.
- Based on this Q(k), we formulate the following.
6.3 Tentative Algorithm
out := ""; z := length-of(out); n := length-of(pos);
for (k := 1; k <= n; k++) {
    assert Q(k) && 0 < k <= n && 0 <= z == length-of(out) < pos[k];
    while (z < pos[k] - 1) {
        assert Q(k) && 0 < k <= n && 0 <= z == length-of(out) < pos[k] - 1;
        out += " "; z++;
    }
    out += "*"; z++;
}
Written in a VHLL of our own making.
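A possible C++ rendering of this append-only idea is sketched below. It assumes 1-based positions in pos and a 0-based std::string for out, so the star for pos[k] ends up at string index pos[k]-1, and the loop here visits every entry of pos. The function name markErrorPosS1 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// pos holds 1-based, strictly increasing positions (precondition P).
// out only ever grows at the end, as it would if written to cout.
std::string markErrorPosS1(const std::vector<int>& pos) {
    std::string out;                                    // out := ""
    for (std::size_t k = 0; k < pos.size(); ++k) {
        assert(pos[k] > 0);
        const std::size_t target = static_cast<std::size_t>(pos[k]);
        assert(out.size() < target);                    // follows from P: room for the next mark
        while (out.size() + 1 < target) out += ' ';     // blanks up to column pos[k]-1
        out += '*';                                     // star lands in 1-based column pos[k]
    }
    return out;
}
```

With pos = {4, 7, 10, 20, 23} the result has length 23, with stars in columns 4, 7, 10, 20 and 23 and blanks elsewhere.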
7 Full Version Spec S2
This section presents the spec S2 for the full version of the problem as described in Section 1 above.
- Read the symbol =: as "yields".
- Exercise: Define itostr. Ex: itostr(742) =: "742", no leading zeros, only digits 0 to 9, input assumed to be decimal.
- Exercise: Define toString. Ex: toString(40, 742) =: b39 + "^" + itostr(742); b stands for a blank, b39 is a string of 39 spaces, + is overloaded as string catenation.
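One possible answer to the two exercises, sketched in C++. The choice of unsigned for the parameters and the result "0" for itostr(0) are assumptions of this sketch, not part of the stated spec.

```cpp
#include <cassert>
#include <string>

// itostr: decimal digits of n, no leading zeros (itostr(0) is taken to be "0").
std::string itostr(unsigned n) {
    std::string digits;
    do {
        digits.insert(digits.begin(), static_cast<char>('0' + n % 10));  // prepend last digit
        n /= 10;
    } while (n != 0);
    return digits;
}

// toString(p, e): p-1 blanks, a caret in 1-based column p, then the error number.
std::string toString(unsigned p, unsigned e) {
    assert(p >= 1);                     // columns are 1-based
    return std::string(p - 1, ' ') + "^" + itostr(e);
}
```

For example, toString(40, 742) is a string of length 43: 39 blanks followed by "^742".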
7.1 Overlays
- We use the idea of overlaid overhead projection transparencies of yesteryear.
- overlay(s, t) =: u, where s, t and u are all strings.
- Intuitively, t is overlaid on s to yield u. (Just for now, assume that t is at least as long as s.) Only where t[i] is a blank will s[i] show through in u[i].
- length-of(u) is max (length-of(s), length-of(t))
- For the purpose of the defs below:
- s[i] is b if i > length-of(s) ; starting index is 1.
- t[i] is b if i > length-of(t) ; starting index is 1.
- u[i] is s[i] if t[i] == b
- u[i] is t[i] if t[i] != b
7.2 markErrorPos
- markErrorPos(txt, pos) =: a string
- out is overlay(s, t), where s is toString(pos[1].p, pos[1].e), and t is markErrorPos(txt, pos[2..n]).
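The overlay definition and the recursive reading of markErrorPos can be transcribed into C++ almost literally. The sketch below makes several assumptions: the error list is a vector of a hypothetical struct Err, the base case for an empty list is the empty string, std::to_string stands in for itostr, and the txt argument is dropped because the spec never consults it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Err { int p; int e; };   // 1-based column p, error number e

// overlay(s, t): t laid over s; s shows through only where t is blank.
std::string overlay(const std::string& s, const std::string& t) {
    std::string u(std::max(s.size(), t.size()), ' ');
    for (std::size_t i = 0; i < u.size(); ++i) {
        const char si = i < s.size() ? s[i] : ' ';   // s padded with blanks
        const char ti = i < t.size() ? t[i] : ' ';   // t padded with blanks
        u[i] = (ti == ' ') ? si : ti;
    }
    return u;
}

// Recursive reading of 7.2; `from` is the index of the first error still to place.
std::string markErrorPos(const std::vector<Err>& errs, std::size_t from = 0) {
    if (from == errs.size()) return "";              // assumed base case: nothing left to mark
    const Err& x = errs[from];
    assert(x.p >= 1);
    const std::string s =
        std::string(static_cast<std::size_t>(x.p - 1), ' ') + "^" + std::to_string(x.e);
    return overlay(s, markErrorPos(errs, from + 1)); // the rest is laid over the first field
}
```

For the error list of Example 1 this yields the single line of errors shown there. For (39, 9), (40, 742) the two fields collide and the later one overwrites part of the earlier one, giving "^^742" starting in column 39.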
7.3 Commentary
- This is not declarative, is it?
- This looks like a program written in functional style, no?
- Did we assume that strings end with a NUL character?
- Did we take care of multiple lines of mark errors?
- When we do need multiple lines, how to make sure we use a minimal number?
- If we were to implement this spec as-is, it would be pretty inefficient. This is a good quality of a spec!
8 References
- Niklaus Wirth, Systematic Programming: An Introduction (book), Prentice Hall, 1973. 208 pages. ISBN-10: 0138803692; ISBN-13: 978-0138803698