I've read quite a lot of student code over the years. Some
of it is quite good. A lot of it is full of predictable
badnesses. These notes attempt to enumerate some of this
badness, so that you may avoid it.
This is pretty C-centric in its current form. However, many of its lessons are multilingual. This is a living document, so don't expect it to be the same next time you look at it. In particular, the numbering may change if something gets inserted in the middle. It is revisioned automatically by Drupal, so feel free to point at a particular revision if you care.
Lacks Author Name and/or Copyright Notice
If a license applies, make sure that's there too.
Doesn't Pass '-Wall'
Until you're very old and very wise, just treat
warnings as errors. By then, you'll know enough to keep
doing it.
Broken Indentation
Just don't go there. I'm serious. Your code work products
should never contain an ASCII tab character; get your
editor to help you.
Apparently, this needs elaboration. I get a lot of code submissions from students that have severely broken indentation. About half of these are apparently students who just don't care about indentation. I've found I don't care about grading their code, either. The rest appear to be from students whose editor used different assumptions than mine about how tabs work. My editor uses the same assumptions my CP/M box did in 1977: a tab moves right to the next multiple-of-eight column.
It is certainly possible to configure your editor to match that assumption closely enough that I won't get broken-looking indentation from you. But why risk it? If you ship me a file without any tab characters, you can pretty much guarantee that I'm seeing the same indentation you did. I've found the safest way to guarantee that you will ship me a file without any tab characters, even in the face of last-second edits, is to tell your editor to always appropriately convert tabs to spaces as it saves files. Then we will both agree that you are consistently indented, and I can grade your work.
(Similar comments apply to open source work and other code, of course. "Be liberal in what you accept, and conservative in what you send." I'm not sure how liberal I can figure out how to be, so you need to be as conservative as possible to help me out.)
Copy-and-Paste Errors
Sometimes we copy and paste. This is often a bad idea; a
unit or macro abstraction would have been better. If it's
infeasible (as it is, rarely) to go there, at least
proofread the living daylights out of your copy-pasted
code, and then ask a friend to do it. This is a popular
source of stupid and intractibly difficult bugs.
Deep Nesting
Nesting depth is one of the few reliable simple
indicators of bad code. In general, if you're indented
more than three levels, probably something is wrong.
How to avoid this? Sometimes, just simply restructure
the code. For example,
if (condition1) {
if (condition2) {
if (condition3) {
if (condition4) {
/* something */
}
}
}
}
would be much better as
if (condition1 && condition2 && condition3 && condition4)
/* something */
Other times, you may have to find a way to pull some code
out into a separate unit. It is not unheard-of, for
example, to replace a deep nest of 'for' loops with a
nice simple recursion.
Way Too Much Code
There's a really strong correlation between bad code and
lots of code. In general, if you have a good architecture,
detailed design and pseudocode, things come out short as
well as clean. If you have a lot of code, you probably
fouled up one of those things too.
Way Too General Code
Quit trying to "boil the ocean". You will produce way too
much code, and it will be quite hard to debug. Worse yet,
it will be full of bugs, because it will not have been
tested properly. If you do the simplest thing that can
work in your specific application, you have a chance.
Parameters, Not Identifiers
Never
if (x == 1) then
thing1();
else if (x == 2) then
thing2();
/* continued */
but instead
thing(x);
If you have a digit in an identifier, it usually means
one of three bad things: (1) it should have been a
parameter; (2) it should have been an array index; (3)
it's a pun. If you get rid of all of these misuses of
identifier digits you will have few digits left.
Using 'if' For 'switch'
It is too bad that C-like languages generally don't allow
arbitrary expressions to be switched on. (Nickle does).
That said, if you are switching on numeric constants,
then by all means use your language's 'switch'
construct.
Using 'switch' For Something Strange
There are all-kinds of dumb uses of 'switch'. It's
supposed to be used for switching, but people use it
as a sequencing method ("The 'loop-switch' sequence")
and in even stranger ways.
for (i = 0; i < 5; i++) {
switch (i) {
case 0:
first_thing();
break;
case 1:
second_thing();
break;
/* etc */
Perhaps my least favorite use is as a replacement for
array lookups. Write
int a[3] = {4, 7, 11};
for (i = 0; i < 3; i++)
if (x == a[i])
break;
rather than
switch(x) {
case 4: i = 0; break;
case 7: i = 1; break;
case 11: i = 2; break;
default: i = 3; break;
}
Hopefully the reasons are obvious.
Never write
switch(x) {
case 2: i = 0; break;
case 3: i = 1; break;
case 4: i = 2; break;
case 5: i = 3; break;
case 6: i = 4; break;
case 7: i = 5; break;
}
instead of
i = x - 2;
It's really easy to forget the "break" in switch statements,
and sometimes you want to leave it out on purpose. In the
latter case, put a /* fall through */ comment so that
your intent is clear.
Boolean Expressions and Basic Logic
Not
!((x <= 1 || y <= 1) && x != 1 && y != 1)
but
x >= 1 && y >= 1
However, be careful and be good at basic logic. Dont let
your "simplification" change the meaning of the
program. For example, the above simplification is wrong.
Code Tangles
It is quite possible to write "spaghetti code" without
using goto. If your flow control is complicated,
it is probably wrong. Look for opportunities to factor
the code, and especially to introduce units to make
things clearer. Go back to the detailed design.
Arrays and Structures. Please
For some reason, students seem to be afraid of arrays
and structures. Especially multi-dimensional arrays.
Especially using arrays and structures together.
Never write
int ax1[10], ax2[10], ay1[10], ay2[10];
for (i = 0; i < 10; i++) {
ax1[i] = 0;
ax2[i] = 0;
ay1[i] = 0;
ay2[i] = 0;
}
Instead, say
struct {
int x, y;
} a[2][10];
for (i = 0; i < 2; i++) {
for (j = 0; j < 10; j++) {
a[i][j].x = 0;
a[i][j].y = 0;
}
}
You'll find this structure so much nicer to work with.
Weird Dead Code
When code isn't being used anymore, just remove
it. You're using an SCMS (or better be), so it isn't
lost. It's just out of the way and now no one has to read
it anymore.
I Hate C, So I'll Fix It
Yes, C has a macro preprocessor. It probably shouldn't;
it's an extremely blunt tool inserted in the days when
C compilers were terrible.
Try not to use the macro preprocessor for anything
clever. In particular, things like the eternal
#define BEGIN {
#define END }
are hateful and awful and sad.
I feel strongly about understanding that the name of the
null pointer in C is 0, and the name of the null
character is '\0'. You need no macros here: the
language is well ahead of you on this one.
"My CS 100 Teacher Told Me..."
a) "Never use 'goto'". Indeed, this one has become so
ingrained that most modern students ask "What's a 'goto'?"
Well, a "goto" is just a way of doing non-local control
flow. Here's some actual industry rules about goto, drawn
from practical experience: (1) Downward goto only, (2)
goto only within the same unit, (3) goto only when you
can't conveniently use a loop structure, (4) error
handling is the most common use of goto.
Of course, goto is also a reasonable way to implement a state
machine. If you have one of those (carefully designed), go
nuts with goto---it will actually make the state machine
more readable!
b) "Never use 'magic constants'". If I'm going to be
writing a program to analyze poker hands, 417 is a magic
constant. It should be given an identifying symbolic
name, especially if it will be used many times and
there's a good chance it might need to change everywhere.
52, on the other hand, is entirely non-magic, as are
4 and 13. These numbers readily identify themselves, and
will never have to be changed unless the code is being
massively refactored anyway. Don't be afraid.
c) "Never use 'break' or 'continue." This is just
crazy. It's highly idiomatic to write
while (1) {
/* stuff */
if (condition)
break;
/* more stuff */
}
Kernighan and Ritchie call this a "loop-and-a-half" in
The C Programming Language, and it's used all the time
in the real world.
It pays to be a little careful with break and continue,
but they should be used when appropriate.
d) "Never early 'return'." That's such terrible advice I'm
tempted to just move on. The whole point of having a unit
is to be able to encapsulate a computation. When the
computation is over, get out.
Why this "rule", then? Because you need to be
careful. Make sure you've done any cleanup that needs
doing before you return. Make sure you aren't returning
before doing something that needs doing. This is called
"programming", and I can recommend it.
The thing about early return is that done properly it can
reduce code complexity dramatically. In particular, it
decreases nesting depth.
Crazily Overlong Or Overshort Units
Empirically, there's a wide range of viable lengths for your
procedures and functions. However, if you have a function
that's much over 100 lines, or a bunch of one-liners, you
should stare hard at them and apply the principles of coupling
and cohesion; probably the code is mis-factored.