C Interfaces and Implementations

Preface

Programmers are inundated with information about application programming interfaces, or APIs. Yet, while most programmers use APIs and the libraries that implement them in almost every application they write, relatively few create and disseminate new, widely applicable, APIs. Indeed, programmers seem to prefer to “roll their own” instead of searching for a library that might meet their needs, perhaps because it is easier to write application-specific code than to craft well-designed APIs.

I’m as guilty as the next programmer: lcc, a compiler for ANSI/ISO C written by Chris Fraser and myself, was built from the ground up. (lcc is described in C.W. Fraser and D. R. Hanson, A Retargetable C Compiler: Design and Implementation, Addison-Wesley, 1995.) A compiler exemplifies the kind of application for which it possible to use standard interfaces and to create interfaces that are useful elsewhere. Examples include interfaces for memory management, string and symbol tables, and list manipulation. But lcc uses only a few routines from the standard C library, and almost none of its code can be used directly in other applications.

This book advocates a design methodology based on interfaces and their implementations, and it illustrates this methodology by describing 24 interfaces and their implementations in detail. These interfaces span a large part of the computing spectrum and include data structures, arithmetic, string processing, and concurrent programming. The implementations aren’t toys—they’re designed for use in production code. As described below, the source code is freely available.

There’s little support in the C programming language for the interface-based design methodology. Object-oriented languages, like C++ and Modula-3, have language features that encourage the separation of an interface from its implementation. Interface-based design is independent of any particular language, but it does require more programmer willpower and vigilance in languages like C, because it’s too easy to pollute an interface with implicit knowledge of its implementation and vice versa.

Once mastered, however, interface-based design can speed development time by building upon a foundation of general-purpose interfaces that can serve many applications. The foundation class libraries in some C++ environments are examples of this effect. Increased reuse of existing software—libraries of interface implementations—reduces initial development costs. It also reduces maintenance costs, because more of an application rests on well-tested implementations of general-purpose interfaces.

The 24 interfaces come from several sources, and all have been revised for this book. Some of the interfaces for data structures—abstract data types—originated in lcc code, and in implementations of the Icon programming language done in the late 1970s and early 1980s (see R. E. Griswold and M. T. Griswold, The Icon Programming Language, Prentice Hall, 1990). Others come from the published work of other programmers; the “Further Reading” sections at the end of each chapter give the details.

Some of the interfaces are for data structures, but this is not a data structures book, per se. The emphasis is more on algorithm engineering—packaging data structures for general use in applications—than on data-structure algorithms. Good interface design does rely on appropriate data structures and efficient algorithms, however, so this book complements traditional data structure and algorithms texts like Robert Sedgewick’s Algorithms in C (Addison-Wesley, 1998).

Most chapters describe one interface and its implementation; a few describe related interfaces. The "Interface" section in each chapter gives a concise, detailed description of the interface alone. For programmers interested only in the interfaces, these sections form a reference manual. A few chapters include “Example” sections, which illustrate the use of one or more interfaces in simple applications.

The “Implementation” section in each chapter is a detailed tour of the code that implements the chapter’s interface. In a few cases, more than one implementation for the same interface is described, which illustrates an advantage of interface-based design. These sections are most useful for those modifying or extending an interface or designing related interfaces. Many of the exercises explore design and implementation alternatives. It should not be necessary to read an “Implementation” section in order to understand how to use an interface.

The interfaces, examples, and implementations are presented as literate programs; that is, the source code is interleaved with its explanation in an order that best suits understanding the code. The code is extracted automatically from the text files for this book and assembled into the order dictated by the C programming language. Other book-length examples of literate programming in C include A Retargetable C Compiler and The Stanford GraphBase: A Platform for Combinatorial Computing by D.E. Knuth (Addison-Wesley, 1993).

Organization

The material in this book falls into the following broad categories:

Foundations: 1. Introduction
2. Interfaces and Implementations
4. Exceptions and Assertions
5. Memory Management
6. More Memory Management
Data Structures: 7. Lists
8. Tables
9. Sets
10. Arrays
11. Sequences
12. Rings
13. Bit Vectors
Strings: 3. Atoms
14. Formatting
15. Low-Level Strings
16. High-Level Strings
Arithmetic: 17. Extended-Precision Arithmetic
18. Arbitrary-Precision Arithmetic
19. Multiple-Precision Arithmetic
Threads: 20. Threads

Most readers will benefit from reading all of Chapters 1 through 4, because these chapters form the framework for the rest of the book. The remaining chapters can be read in any order, although some of the later chapters refer to their predecessors.

Chapter 1 covers literate programming and issues of programming style and efficiency. Chapter 2 motivates and describes the interface-based design methodology, defines the relevant terminology, and tours two simple interfaces and their implementations. Chapter 3 describes the prototypical Atom interface, which is the simplest production-quality interface in this book. [Download/view Chapter 3, an Adobe Acrobat PDF file (52K).] Chapter 4 introduces exceptions and assertions, which are used in every interface. Chapters 5 and 6 describe the memory management interfaces used by almost all the implementations. The rest of the chapters each describe an interface and its implementation.

Instructional Use

I assume that readers understand C at the level covered in undergraduate introductory programming courses, and have a working understanding of fundamental data structures at the level presented in texts like Algorithms in C. At Princeton, the material in this book is used in systems programming courses from the sophomore to first-year graduate levels. Many of the interfaces use advanced C programming techniques, such as opaque pointers and pointers to pointers, and thus serve as nontrivial examples of those techniques, which are useful in systems programming and data structure courses.

This book can be used for courses in several ways, the simplest being in project-oriented courses. In a compiler course, for example, students often build a compiler for a toy language. Substantial projects are common in graphics courses as well. Many of the interfaces can simplify the projects in these kinds of courses by eliminating some of the grunt programming needed to get such projects off the ground. This usage helps students realize the enormous savings that reuse can bring to a project, and it often induces them to try interface-based design for their own parts of the project. This latter effect is particularly valuable in team projects, because that’s a way of life in the “real world.”

Interfaces and implementations are the focus of Princeton’s sophomore-level systems programming course. Assignments require students to be interface clients, implementors, and designers. In one assignment, for example, I distribute Section 8.1’s Table interface, the object code for its implementation, and the specifications for Section 8.2’s word frequency program, wf. The students must implement wf using only my object code for Table. In the next assignment, they get the object code for wf, and they must implement Table. Sometimes, I reverse these assignments, but both orders are eye-openers for most students. They are unaccustomed to having only object code for major parts of their program, and these assignments are usually their first exposure to the semiformal notation used in interfaces and program specification.

Initial assignments also introduce checked runtime errors and assertions as integral parts of interface specifications. Again, it takes a few assignments before students begin to appreciate the value of these concepts. I forbid "unannounced" crashes; that is, crashes that are not announced by an assertion failure diagnostic. Programs that crash get a grade of zero. This penalty may seem unduly harsh, but it gets the students’ attention. They also gain an appreciation of the advantages of safe languages, like ML and Modula-3, in which unannounced crashes are impossible. (This grading policy is less harsh than it sounds, because in multipart assignments, only the offending part is penalized, and different assignments have different weights. I’ve given many zeros, but none has ever caused a course grade to shift by a whole point.)

Once students have a few interfaces under their belts, later assignments ask them to design new interfaces and to live with their design choices. For example, one of Andrew Appel’s favorite assignments is a primality testing program. Students work in groups to design the interfaces for the arbitrary-precision arithmetic that is needed for this assignment. The results are similar to the interfaces described in Chapters 17 through 19. Different groups design interfaces, and a postassignment comparison of these interfaces, in which the groups critique on one anothers’ work, is always quite revealing. Kai Li accomplishes similar goals with a semester-long project that builds an X-based editor using the Tcl/Tk system (J.K. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, 1994) and editor-specific interfaces designed and implemented by the students. Tk itself provides another good example of interface-based design.

In advanced courses, I usually package assignments as interfaces and give the students free rein to revise and improve on them, and even to change the goals of the assignment. Giving them a starting point reduces the time required for assignment, and allowing substantial changes encourages creative students to explore alternatives. The unsuccessful alternatives are often more educational than the successful ones. Students invariably go down the wrong road, and they pay for it with greatly increased development time. When, in hindsight, they understand their mistakes, they come to appreciate that designing good interfaces is hard, but worth the effort, and they almost always become converts to interface-based design.

Acknowledgments

I have been using some of the interfaces in this book for my own research projects and in courses at the University of Arizona and Princeton University since the late 1970s. Students in these courses have been guinea pigs for my drafts of these interfaces. Their feedback over the years has been an important contribution to both the code in this book and its explanation. The Princeton students in several offerings of COS 217 and COS 596 deserve special thanks, because they suffered unknowingly through the drafts of most of what’s in this book.

Interfaces are a way of life at Digital’s System Research Center (SRC), and my 1992 and 1993 summers at SRC working on the Modula-3 project erased any doubts I may have harbored about the efficacy of this approach. My thanks to SRC for supporting my visits, and to Bill Kalsow, Eric Muller, and Greg Nelson for many illuminating discussions.

My thanks to IDA’s Centers for Communications Research in Princeton and La Jolla for their support during the summer of 1994 and during my 1995–96 sabbatical. The CCRs provided ideal hideouts at which to plan and complete this book.

Technical interactions with colleagues and students have contributed to this book in many ways. Even seemingly unrelated discussions have provoked improvements in my code and in its explanation. Thanks to Andrew Appel, Greg Astfalk, Jack Davidson, John Ellis, Mary Fernández, Chris Fraser, Alex Gounares, Kai Li, Jacob Navia, Maylee Noah, Rob Pike, Bill Plauger, John Reppy, Anne Rogers, and Richard Stevens. Careful readings of my code and prose by Rex Jaeschke, Brian Kernighan, Taj Khattra, Richard O’Keefe, Norman Ramsey, and David Spuler made a significant contribution to the quality of both.