Software Comprehension
Abstract: What does it take to understand the source code of [large] programs that you did not help write? Do you have the source code? Do you have its RSDI-docs (requirements, specs, design, and implementation documents)?
1 Collected Quotes
- "Here is a not uncommon scenario in many workplaces. A neophyte […] programmer is assigned to maintain, debug, or enhance an application. The atmosphere is sink or swim, the system is complex, the code is sophisticated, the documentation is scant, and the programmer is bewildered. Questions slowly take shape. What, exactly, am I supposed to do? What part(s) of the application need my attention? Will a change to program X affect program Y? And, most critically, where do I start? What the poor programmer needs is a strategy for comprehending the program, then finding the sweet spots in the code as efficiently as possible." [from http://www2.sas.com/proceedings/sugi27/p068-27.pdf ]
- "Legacy systems have one or several of the following attributes: they were implemented many years ago, their technology became obsolete, their structure deteriorated, they represent a large investment, they contain business rules not recorded elsewhere, they cannot be easily replaced, or the original authors are not available. Software comprehension typically consumes more than a half of the difficult effort of maintaining legacy systems." [from Rajlich, ICSE, 1997]
- "I have never inherited a codebase I liked." Anonymous Developer.
2 Comprehension How To
- What are the (expected) givens?
- Source code files
- RSDI-docs
- Requirements Document
- Specs Document
- Design Document
- Implementation Document
- Testing Document
- Even in the best-run projects, anything beyond the source code files is a bonanza. Contribute such examples!
- Learn enough about the domain of the software:
- E.g., compilers. You cannot expect to understand a compiler project without understanding the literature on parsing, code generation, etc.
- This is true even for simple things, e.g., Tic-Tac-Toe.
- Browse the Empirical Software Engineering journal, special issue on Program Comprehension, Volume 18, Issue 2, April 2013 http://link.springer.com/journal/10664/18/2/page/1 Read the Preface.
- Generate "Design Documents"
- using whatever tools are available, provided they are "dependable" (i.e., produce usable info)
- inserting prose annotations
- progressively improving prose annotations into verifiable/executable assertions
- Recall that assertions can be written in the implementation language as executable, side-effect-free boolean methods.
- Code refactoring
- Often done to "improve" the code
- Here, do it to "understand" the code, by breaking it into pieces
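The step from prose annotation to executable assertion, described above, can be sketched as follows. This is a minimal illustration only: the binary search routine `find` and the predicate `is_sorted` are hypothetical stand-ins for code being read, not taken from any particular codebase.

```python
from typing import Sequence


def is_sorted(xs: Sequence[int]) -> bool:
    """Side-effect-free predicate: True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def find(xs: Sequence[int], target: int) -> int:
    """Binary search; returns an index of target in xs, or -1 if absent."""
    # The annotation started life as prose: "callers always pass a sorted list".
    # Here the same annotation is an executable assertion:
    assert is_sorted(xs), "precondition: xs must be sorted"
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    # A reader's hypothesis about the code, made checkable:
    assert target not in xs, "postcondition: -1 only when target is absent"
    return -1
```

If a hypothesis about the code is wrong, running the existing tests with such assertions in place fails at the exact spot where the misunderstanding lies.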
3 Browsing Tools
- Browsing == Code Inspection
- Stand alone (tiny) tools: {ctags, etags, javadoc, …}
- http://doxygen.org Open source documentation system for software written in C++, C#, Java, Python, IDL, C and more. Can generate Class relationship diagrams and file relationships.
- https://www.google.com/search?q=static-analysis-plugins-for-intellij
- https://www.google.com/search?q=static-analysis-plugins-for-eclipse
- Source Navigator can display relationships between classes, functions, members, and display call trees mapping unknown source code for enhancement or maintenance tasks. For C/C++. 2014. Development stopped? http://sourcenav.sourceforge.net/
- Linux source code browsers: (1) http://lxr.linux.no/ (2) https://elixir.bootlin.com/linux/latest/source (3) https://code.woboq.org/linux/linux/
- https://www.gnu.org/software/global/links.html 'Source code reading' related sites
- Commercial Tools: JArchitect, NDepend, … [Free Trials?][Search for links]
- P. Anderson and M. Zarins, "The CodeSurfer Software Understanding Platform", Proceedings of the 13th International Workshop on Program Comprehension (IWPC 2005), January 2005, pp. 147-148. Reference. http://www.grammatech.com/research/technologies/codesurfer [Commercial; free trial]
4 Reverse Engineering
- Reversing binary files is termed Reverse Code Engineering, or RCE. Often used in malware analysis.
- Obfuscation is used to deter both reverse engineering and re-engineering.
- Canfora, et al., see Refs. Required Reading
- Tool: IDA https://www.hex-rays.com/products/ida/ Runs on Linux, Mac OS X, or Windows. "IDA has become the de-facto standard for the analysis of hostile code, …" [Commercial; free download and trial.] [free educational licenses]
- Tool: Ghidra https://github.com/NationalSecurityAgency/ghidra The NSA's Ghidra Software Reverse Engineering Framework; FOSS, under active development as of 2020. Runs on Linux, Mac OS X, or Windows.
5 Design Extraction
- Extracting design details from source code.
- Source code is reverse-engineered back to … design … specs.
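As a rough sketch of what extracting design details can look like in practice, the following uses Python's standard `ast` module to recover a class/function outline and a name-based call map from source text. The `SOURCE` program and the helper names `extract_outline` and `extract_calls` are hypothetical, chosen only for illustration. The call map matches on names alone, so it is an approximation rather than a true call graph.

```python
import ast
from collections import defaultdict

# A hypothetical program whose design we want to recover.
SOURCE = '''
class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        log("deposit")

def log(message):
    print(message)

def main():
    acct = Account(0)
    acct.deposit(10)
    log("done")
'''


def extract_outline(source: str) -> dict:
    """Map each class to its method names, and list top-level functions."""
    tree = ast.parse(source)
    outline = {"classes": {}, "functions": []}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            outline["classes"][node.name] = [
                item.name for item in node.body
                if isinstance(item, ast.FunctionDef)
            ]
        elif isinstance(node, ast.FunctionDef):
            outline["functions"].append(node.name)
    return outline


def extract_calls(source: str) -> dict:
    """Map each function or method name to the sorted names it calls."""
    tree = ast.parse(source)
    calls = defaultdict(set)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                callee = node.func
                if isinstance(callee, ast.Name):          # plain call: f(...)
                    calls[func.name].add(callee.id)
                elif isinstance(callee, ast.Attribute):   # method call: obj.m(...)
                    calls[func.name].add(callee.attr)
    return {name: sorted(callees) for name, callees in calls.items()}


if __name__ == "__main__":
    print(extract_outline(SOURCE))
    print(extract_calls(SOURCE))
```

Tools such as doxygen do a far more thorough job of this. The point of the sketch is only that an outline and a "who calls whom" map are mechanically recoverable from the source.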
6 Operations on Source Code
- Program Slicing: What could have affected this variable’s value? A program slice with respect to a given variable, v, at a given program point is the set of statements that can influence the value of v at that point. https://en.wikipedia.org/wiki/Program_slicing
- Ripple analysis: If a given statement is modified, which other parts of the program does the change affect, and how?
- Mutation: Make small, deliberate changes to the code and check whether the tests detect them. http://sites.utexas.edu/august/files/2020/08/ASEDEMO2018.pdf SRCIROR: A Toolset for Mutation Testing of C Source Code and LLVM Intermediate Representation
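Slicing and ripple analysis can be illustrated on a toy example. The sketch below assumes a straight-line program (no branches, loops, pointers, or procedure calls) in which each statement assigns one variable. The `Stmt` record, the `PROGRAM` list, and the function names are hypothetical. Real slicers must also track control dependences, which this sketch ignores.

```python
from typing import List, NamedTuple, Set


class Stmt(NamedTuple):
    text: str        # source text, for display only
    defines: str     # the variable this statement assigns
    uses: Set[str]   # the variables this statement reads


# A hypothetical straight-line program.
PROGRAM: List[Stmt] = [
    Stmt("a = read()", "a", set()),              # 0
    Stmt("b = read()", "b", set()),              # 1
    Stmt("total = a + 1", "total", {"a"}),       # 2
    Stmt("count = b * 2", "count", {"b"}),       # 3
    Stmt("avg = total / 4", "avg", {"total"}),   # 4
]


def backward_slice(program: List[Stmt], variable: str) -> List[int]:
    """Indices of statements that can affect `variable` at the program's end."""
    relevant = {variable}
    kept = []
    for i in range(len(program) - 1, -1, -1):
        stmt = program[i]
        if stmt.defines in relevant:
            kept.append(i)
            relevant.discard(stmt.defines)  # earlier definitions are overwritten
            relevant |= stmt.uses           # but its inputs now matter
    return sorted(kept)


def ripple(program: List[Stmt], index: int) -> List[int]:
    """Indices of later statements that may be affected by editing `index`."""
    tainted = {program[index].defines}
    affected = []
    for i in range(index + 1, len(program)):
        stmt = program[i]
        if stmt.uses & tainted:
            affected.append(i)
            tainted.add(stmt.defines)
        else:
            tainted.discard(stmt.defines)   # redefined from unaffected values
    return affected


if __name__ == "__main__":
    for i in backward_slice(PROGRAM, "avg"):
        print("slice:", PROGRAM[i].text)
    for i in ripple(PROGRAM, 0):
        print("ripple:", PROGRAM[i].text)
```

The two functions are mirror images: the slice walks backward from a variable to everything that feeds it, while the ripple walks forward from a change to everything it feeds.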
7 Obfuscation
- [dictionary def] Obfuscate: tr.v. -cated, -cating, -cates.
- To render obscure.
- To darken.
- To confuse: his emotions obfuscated his judgment. [LLat. obfuscare, to darken : ob- (intensive) + Lat. fuscare, to darken < fuscus, dark.] -obfuscation n. -obfuscatory adj.
- There are companies that practice obfuscation to thwart reverse engineering.
- http://www.ioccc.org/ The International Obfuscated C Code Contest
- To write the most Obscure/Obfuscated C program within the rules.
- To show the importance of programming style, in an ironic way.
- To stress C compilers with unusual code.
- To illustrate some of the subtleties of the C language.
- To provide a safe forum for poor C code. :-)
- Source code of winning programs is included.
- The 27th contest was held in 2020: http://www.ioccc.org/2020/whowon.html
- Google search for "Java bytecode obfuscator"
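To get a feel for what obfuscation does to a reader, compare two functions with identical behavior. This is a hand-made illustration, not the output of any obfuscation tool; the names `is_leap_year` and `O0O` are hypothetical.

```python
def is_leap_year(year: int) -> bool:
    """Readable version: the Gregorian leap-year rule, stated directly."""
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0


# Obfuscated version: same results, but with meaningless names, no comments,
# and bit tricks in place of the explicit rule.
O0O = lambda O: not (O & 3 or not O % 25 and O & 15)


if __name__ == "__main__":
    for y in (1900, 2000, 2023, 2024):
        print(y, is_leap_year(y), O0O(y))
```

Recovering the first form from the second is a small comprehension exercise in itself: it requires noticing that `O & 3` tests divisibility by 4, and that `O % 25` together with `O & 15` stands in for the tests against 100 and 400.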
8 Good/Responsible Software Development
- To help future readers of your software.
- Design-by-Contract
- Standard Literate Programming tools on Linux: cweb, cweave, ctangle; noweb; …
- http://www.ssw.uni-linz.ac.at/Research/Projects/RevLitProg/ "a system which allows … selective browsing. Zoom in at interesting points or jump to other locations according to control flow or other semantic relationships. This is the approach of hypertext. … Reverse Literate Programming …"
- https://dzone.com/articles/literate-programming-life Literate Programming Life Cycle 2010
9 Books on (Well-Documented) Specific Programs
- Knuth's Computers & Typesetting, https://www-cs-faculty.stanford.edu/~knuth/abcde.html TeX: The Program, and METAFONT: The Program. The .web files of these can be legitimately downloaded. [PDF http://visualmatheditor.equatheque.net/doc/texbook.pdf legit?]
- Tanenbaum's example OS Design: Minix; Operating Systems Design and Implementation (3rd Edition), ISBN-13: 978-0131429383
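- Operating Systems: Principles and Practice, Second Edition, by Thomas Anderson and Michael Dahlin. See also the Xinu operating system, documented in Douglas Comer's Operating System Design: The Xinu Approach: https://xinu.cs.purdue.edu/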
- Jim Welsh and Atholl Hay: A Model Implementation of Standard Pascal, Prentice Hall, 1986
- Niklaus Wirth, et al. http://www.projectoberon.com/ "Project Oberon is a design for a complete desktop computer system from scratch. … Project Oberon: The Design of an Operating System, a Compiler, and a Computer – written by the designers, Niklaus Wirth and Jürg Gutknecht." https://people.inf.ethz.ch/wirth/
10 Reading List
- Many refs are embedded in the above.
- http://www.program-comprehension.org/ Recommended Visit. For awareness.
- Marouane Kessentini, W. Kessentini, H. Sahraoui, M. Boukadoum and A. Ouni, "Design Defects Detection and Correction by Example," 2011 IEEE 19th International Conference on Program Comprehension, Kingston, ON, 2011, pp. 81-90, doi: 10.1109/ICPC.2011.22. Required Reading.
- Russell Wood, "Assisted Software Comprehension", A Project Report, June 2003; {Is this a BS/ MS thesis from Imperial College London?} Reference.
- Gerardo Canfora, Massimiliano Di Penta, Luigi Cerulo, "Software Reverse Engineering: Achievements and Challenges", Communications of the ACM, Volume 54 Issue 4, Pages 142-151, 2011. Required Reading.