Programming Paradigms
There are a number of programming paradigms that have been developed over the years. We focus on three of the main ones: imperative, object oriented and functional. It should be noted that few languages are purely one of these; indeed, these paradigms are more about how one organises one's code than a feature of a particular language: you can write C in an object oriented style, in a functional style (to some degree), as well as in the imperative style that is natural to C.
Programming Styles
There are three main programming styles we want to cover: imperative, object oriented, and functional. Imperative programming emphasises dictating to the machine how a computation is to be performed, breaking things down into procedures which call each other to accomplish various tasks. The form of imperative programming we illustrate here is often called procedural, due to how everything is factored into procedures. One could also consider a non-procedural paradigm where everything is a combination of instructions and goto statements, much like how code ultimately ends up when translated into machine code. In order to illustrate the progression from low-level to higher-level languages, we will start at the machine level, then proceed to assembly language and non-procedural programs. We then continue with procedural imperative programming in the style of C, then illustrate the object oriented paradigm as applied to organising C code. After this we move on to how C++ makes support for such organisation explicit. Next we discuss the issues of mutable state and why immutable data and minimising side effects have become popular. Finally we come to purely functional programming, in the style of Haskell.
Machine Language
At the level of the CPU, everything is basically an array of (binary) numbers: instructions are binary numbers as are all data. An instruction is a collection of bits which states the operation to be performed and usually where data is to be taken from, and where it is to be put. For example, there are instructions for moving data (in chunks of 8, 16, 32 or 64 bits) from memory to the CPU’s registers. There are instructions which compare values, perform arithmetic upon them, and so on. There are instructions to manage control flow, by instructing the CPU as to which instruction to execute next. Finally there are instructions which manipulate the state of the CPU in important ways. As such, a program is a list of instructions stored as binary numbers.
Programming without Procedures
A typical CPU has little concept of procedures. Indeed, its support for procedures is usually limited to Call and Return operations. Without using procedures, our programs would look like the following pseudocode.
1. Store 0 in register A.
2. Store 42 in register B.
3. Store 10 in register C.
4. Load byte from address A into register D.
5. Store byte from register D in address B.
6. Increment A by 1.
7. Increment B by 1.
8. Decrement C by 1.
9. If C is not zero, go to step 4.
10. Halt.
This program copies 10 bytes from one area of memory (starting at address 0) to another (starting at address 42), using a programming construct known as a loop. Loops are fundamental to most forms of programming (functional programming being the exception, where recursion is used instead). In the early days of computer programming, this was how things were done, and indeed if one reads the famous series The Art of Computer Programming, by Knuth, one will still see algorithms described in these terms.
Procedures
It becomes tedious to repeat ourselves. If there are multiple places in a program where we need to copy a number of bytes from one place in memory to another, we don't want to have to write out a looping construct like the above each and every time we want to copy data. The first means of software reuse is the procedure, where we write a sequence of instructions once, as a subroutine, and call it each time we want to do a copy. At the machine level there is a little housekeeping to be done when doing this: if we want to use registers to store intermediate values, the caller of a procedure expects to find those registers as it left them. Thus we must save the values of registers somewhere, and restore them when we are done. The usual place on any modern CPU is called the stack, which is an area of memory that functions like a stack of plates: one puts information on the top of the stack, and retrieves it from the top of the stack, following the principle of Last In, First Out. Every programming language, bar assembly, does this for the programmer automatically, behind the scenes. To show how this looks at the machine level, here is another short pseudocode program.
Procedure main
1. Store 0 in register A (the copy procedure will take the source location from this register)
2. Store 42 in register B (it will take the target location from this register)
3. Store 10 in register C (this register stores the number of bytes)
4. Call procedure memory_copy
Procedure memory_copy
1. Push the value of register D onto the stack
2. Load byte from address A into register D.
3. Store byte from register D in address B.
4. Increment A by 1.
5. Increment B by 1.
6. Decrement C by 1.
7. If C is not zero, go to step 2.
8. Load value of register D from stack.
9. Return to caller
The Call instruction stores the address of the next instruction on the stack, and then jumps to the location of memory_copy. The first instruction of memory_copy writes the value of D to the area of memory pointed to by the stack pointer, and then moves the stack pointer to the next position in the stack. (Or something like this, such as moving the stack pointer first; what matters is that one is consistent in how it is done. Usually a CPU will have explicit instructions for pushing to, and popping from, a stack.)
We shall assume that registers A, B and C do not need to be saved: at a low level, one needs a convention as to which registers are to be saved by the caller, and which by the callee. We need to store the whole of register D onto the stack, no matter its size: if it is a 64-bit register, 8 bytes need to be stored; if it is a 32-bit register, 4 bytes. This housekeeping is part of what is known as a calling convention.
The Return to caller step loads the address of the next instruction from the stack and jumps there. Usually a CPU will have an explicit instruction for doing just this.
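To make the push and pop bookkeeping concrete, here is a toy simulation of the stack discipline in C++. The tiny memory array and the push and pop functions are inventions purely for illustration; a real CPU does all of this in hardware with dedicated instructions.
#include <cstdint>
#include <iostream>
std::uint8_t memory[256]; // a tiny memory for the stack to live in
int sp = 256;             // the stack pointer starts past the top and grows downwards
void push(std::uint8_t value) // move the stack pointer, then store
{
    memory[--sp] = value;
}
std::uint8_t pop() // load, then move the stack pointer back
{
    return memory[sp++];
}
int main()
{
    std::uint8_t regD = 42; // pretend register D holds a value the caller cares about
    push(regD);             // the callee saves it on entry...
    regD = 7;               // ...is then free to use the register...
    regD = pop();           // ...and restores it before returning
    std::cout << static_cast<int>(regD) << "\n"; // prints 42
}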
Since procedures are so fundamental, essentially every modern language has support for them, from the venerable gosub of old BASIC interpreters, to the functions of more modern languages such as C or Javascript.
In C, for example, the above program would be rendered as
void memory_copy(char* source, char* target, int count)
{
    for(int i=0; i<count; ++i)
    {
        *target = *source; /* copy one byte... */
        ++target;          /* ...and advance both pointers */
        ++source;
    }
}
int main()
{
    /* ... */
    memory_copy(location1, location2, 10);
}
In both cases, we are giving explicit directions as to how things are to be done.
Object Oriented Programming
When a program gets larger, the organisation of the procedures in a program becomes an issue: what to call them, for example, and how to organise data. As such, conventions such as the following tend to be introduced. This is illustrated using C.
struct Car { // the struct is a means of organising data
char make[16];
char model[16];
int production_year;
float top_speed;
};
char* Car_getMake(struct Car* car)
{
return car->make;
}
void Car_setMake(struct Car* car, const char* make, int make_length)
{
if( make_length >= 16 ) { error("make too long"); } // leave room for a terminating zero
int i;
for( i=0; i<make_length; i++ ) { // copy the data
car->make[i] = make[i];
}
for( ; i < 16; i++ ) { // zero the rest of the make field
car->make[i] = 0;
}
}
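Used together, these look something like the following sketch (assuming the definitions above are in scope, along with stdio.h for printf):
struct Car myCar;
Car_setMake(&myCar, "Ford", 4);
printf("make: %s\n", Car_getMake(&myCar)); /* prints "make: Ford" */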
As such, we are organising our data into well-defined structures, and naming our procedures using a convention where the name of the structure comes first, followed by a name describing what the procedure does to it. This is the beginning of what is known as object oriented programming. It is such a common paradigm that many languages provide explicit support for programming in this style (such as C++), if not explicitly mandating it (such as Java). For example, in C++, the above code would become something like
class Car {
    char make[16];
    char model[16];
    int production_year;
    float top_speed;
public:
    char* getMake()
    {
        return make;
    }
    void setMake(const char* newMake)
    {
        // copy data from newMake into the make field of the Car object,
        // zeroing the remainder, as in the C version above
        int i = 0;
        for( ; i < 15 && newMake[i] != 0; i++ ) {
            make[i] = newMake[i];
        }
        for( ; i < 16; i++ ) {
            make[i] = 0;
        }
    }
};
As such, this allows a program to be broken up into well defined chunks of code and data, usually each having one precisely defined job to do. The art of breaking a larger program into well-defined classes is object oriented design. A major benefit of this is that, once the code for a well-defined, general purpose object is written, it can be readily reused. In addition, this style of programming fits naturally with our everyday intuitions: you would eat an apple, but not drive an apple; and you would drive a car and not eat it! As such, we associate with an object a collection of things (called methods) that we could sensibly do to it.
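Given the class above, usage looks something like this brief sketch:
Car myCar;
myCar.setMake("Ford");
char* make = myCar.getMake(); // the object carries its methods with it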
Inheritance
A common feature of object oriented languages is known as inheritance. This is another means of organising objects and code, and permits the programmer to deal with more general and more specific operations in a natural way. A way to illustrate inheritance is that a Ford Escort is a Car, but a Car is not necessarily a Ford Escort. Thus a class FordEscort would inherit from a class Car. Anything we can do with a Car, we can do with a FordEscort, but there are perhaps things specific to a FordEscort we could do which would not make sense with other Cars. (Or perhaps information, such as make and model, which is common to all FordEscorts, but not common to all Cars.)
Inheritance in modern programming languages comes in two forms: interface inheritance, and implementation inheritance. Interface inheritance is where the list of things we can do with, for example, a FordEscort derives from some other class, such as Car. The interface specifies what you can do with an object, but not how to do it.
Implementation inheritance, on the other hand, is a matter of one class using code from a class it derives from. So, for example, the way we drive our FordEscort is the same as driving any other car, so the Drive method would be defined in class Car; on the other hand, the procedure to FixEngine would be specific to a FordEscort (even if the interface of Car demands that any car have a means to FixEngine).
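A sketch of how this might look in C++; the method bodies and the service function are invented purely for illustration:
class Car {
public:
    void Drive() { /* the same for every car: implementation inheritance */ }
    virtual void FixEngine() = 0; // interface only: each model must supply its own
    virtual ~Car() {}
};
class FordEscort : public Car {
public:
    void FixEngine() override { /* specific to the Ford Escort */ }
};
void service(Car* car) // works with any kind of Car...
{
    car->Drive();
    car->FixEngine(); // ...and calls the FixEngine of the actual model
}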
Functional Programming
Modern computing has its roots deeply entrenched in mathematics. Mathematical computations were the first job of programmable computing, and data management followed shortly thereafter. Indeed before the Second World War, the term computer would mean essentially a human mathematician (and early computers were referred to as electromechanical computers).
Features of Mathematical Functions
Now functions in mathematics have the feature that, given the same input, they produce the same output. If f(x) = 2x, then f(2) always equals 4. This is a useful feature of functions in programming: if a function's output is determined by its input, and not by where it occurs in a program, or by the state of the program at the point where we evaluate that function, it is much easier for a programmer to reason about what that function will do. So this is the first feature of functions: inputs, and inputs alone, determine outputs.
A second feature of functions in their mathematical sense is that, aside from evaluating their inputs and producing an output, they do nothing else. In computer programming parlance, functions have no side effects. Again this makes it much easier for a programmer to reason about what a function does, and more importantly, what it does not do.
Obviously this is not desirable for all functions: a good example where we do not want functional behaviour is a random number generator: each time we call Random(), we want to get something different, and independent of what we got from Random() earlier.
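To make the contrast concrete, compare the following two C++ functions (the names and the toy generator are inventions for illustration):
int square(int x)
{
    return x * x; // pure: the result depends only on x, and nothing else happens
}
unsigned seed = 1;
unsigned next_random() // each call returns something different
{
    seed = seed * 1103515245u + 12345u; // side effect: the hidden state changes
    return seed; // the result depends on that state, not on any input
}
A programmer can reason about square from its definition alone; to reason about next_random, one must know every place in the program that touches seed.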
Immutable Data
A feature of most modern functional programming languages is immutable data. That is, once an object has been defined, it is not changed: if we want to change it, we make another object. This is again important when it comes to reasoning about a program’s behaviour, and especially in the context of modern multi-threaded programs and multi-core CPUs. Essentially immutable data means that a programmer can look to where an object’s data was defined, and know that it was not modified anywhere else.
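This style can be imitated even in a language with mutable data; a minimal sketch in C++:
struct Point {
    const int x; // once a Point is constructed, its fields cannot change
    const int y;
};
Point moved(Point p, int dx, int dy)
{
    return Point{ p.x + dx, p.y + dy }; // construct a new Point rather than modifying p
}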
The notion that a piece of data will not be modified finds use well beyond the spheres of functional programming: the C language has the const keyword for precisely this. Now in C/C++ there are essentially two different meanings of const, depending on how it is used. If a variable is an integer, say
const int x = 2;
This means that, within its scope, x will always be 2. On the other hand, if a variable is a pointer to an object (that is, its value is the address of something in memory), then
const MyObject* anObject = someObject;
means that the contents of the object will not be modified through this pointer, but the variable itself may be reassigned. That is,
const MyObject* anObject = someObject;
anObject = anotherObject;
is not an error. The pointer itself can be changed, but the data it points to cannot. On the other hand,
MyObject* const anObject = someObject;
expresses the constraint that the variable anObject cannot be changed (it will always point to the same object), but the data contained in that object can be modified. In languages such as Javascript, the const keyword has this latter meaning.
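Putting the two forms side by side, using int for brevity; the commented-out lines are the ones that would not compile:
int value1 = 1, value2 = 2;
const int* p = &value1; // the data is protected
p = &value2;            // fine: the pointer itself may be reassigned
// *p = 3;              // error: cannot modify the data through p
int* const q = &value1; // the pointer is protected
*q = 3;                 // fine: the data may be modified
// q = &value2;         // error: q cannot be reassigned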
Higher Order Functions
A feature of modern functional programming languages, which also finds its place in languages such as Python and Javascript, is that of higher order functions. That is, functions that take functions as parameters. To illustrate, in this case using Javascript’s map method,
let a = [1, 2, 3, 4, 5]
let f = x => x*x // f is a function that squares its input
let b = a.map(f) // apply f to each element of a
Much of modern Javascript would be impossible without the ability to do this: callbacks and event handlers are precisely functions passed as parameters to other functions.
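The same idea appears in C++ through algorithms such as std::transform, which likewise takes a function as a parameter; a minimal sketch:
#include <algorithm>
#include <vector>
int main()
{
    std::vector<int> a = {1, 2, 3, 4, 5};
    std::vector<int> b(a.size());
    // std::transform is a higher order function: the squaring lambda is one of its parameters
    std::transform(a.begin(), a.end(), b.begin(),
                   [](int x) { return x * x; });
}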
Functional Programming Languages
So functional languages have the features that data, once defined, is immutable; that functions' inputs determine their outputs; and that functions do nothing aside from evaluating their inputs to produce their outputs. The language Haskell exemplifies this, and while it does allow things like input/output (where clearly a function that reads from a file produces output that depends upon what is stored on disk, not merely on the filename), it makes it difficult to do anything aside from program purely in this functional style.
Languages such as Haskell arose from the desire to improve the ability to reason about a program at the point where it is compiled into machine code. Haskell has a strong and expressive type system, which makes it harder for mistakes to find their way into production code: the best way to fix bugs is not to write them in the first place, and a strong type system helps ensure that bugs are detected early, rather than making their way into a final product.
Conclusion
The aim here was to provide a brief overview of the dominant paradigms of programming in modern computing. Fully describing these paradigms would naturally require several textbooks' worth of material for each, but I hope that this overview paints a useful, if broad, picture.