You are here

Things I see in student code

I've read quite a lot of student code over the years. Some of it is quite good. A lot of it is full of predictable badnesses. These notes attempt to enumerate some of this badness, so that you may avoid it.

This is pretty C-centric in its current form. However, many of its lessons are multilingual. This is a living document, so don't expect it to be the same next time you look at it. In particular, the numbering may change if something gets inserted in the middle. It is revisioned automatically by Drupal, so feel free to point at a particular revision if you care.

  1. Lacks Author Name and/or Copyright Notice

    If a license applies, make sure that's there too.

  2. Doesn't Pass '-Wall'

    Until you're very old and very wise, just treat warnings as errors. By then, you'll know enough to keep doing it.

  3. Broken Indentation

    Just don't go there. I'm serious. Your code work products should never contain an ASCII tab character; get your editor to help you.

    Apparently, this needs elaboration. I get a lot of code submissions from students that have severely broken indentation. About half of these are apparently students who just don't care about indentation. I've found I don't care about grading their code, either. The rest appear to be from students whose editor used different assumptions than mine about how tabs work. My editor uses the same assumptions my CP/M box did in 1977: a tab moves right to the next multiple-of-eight column.

    It is certainly possible to configure your editor to match that assumption closely enough that I won't get broken-looking indentation from you. But why risk it? If you ship me a file without any tab characters, you can pretty much guarantee that I'm seeing the same indentation you did. I've found the safest way to guarantee that you will ship me a file without any tab characters, even in the face of last-second edits, is to tell your editor to always appropriately convert tabs to spaces as it saves files. Then we will both agree that you are consistently indented, and I can grade your work.

    (Similar comments apply to open source work and other code, of course. "Be liberal in what you accept, and conservative in what you send." I'm not sure how liberal I can figure out how to be, so you need to be as conservative as possible to help me out.)

  4. Copy-and-Paste Errors

    Sometimes we copy and paste. This is often a bad idea; a unit or macro abstraction would have been better. If it's infeasible (as it is, rarely) to go there, at least proofread the living daylights out of your copy-pasted code, and then ask a friend to do it. This is a popular source of stupid and intractibly difficult bugs.

  5. Deep Nesting

    Nesting depth is one of the few reliable simple indicators of bad code. In general, if you're indented more than three levels, probably something is wrong.

    How to avoid this? Sometimes, just simply restructure the code. For example,

        if (condition1) {
           if (condition2) {
               if (condition3) {
                   if (condition4) {
                       /* something */

    would be much better as

        if (condition1 && condition2 && condition3 && condition4)
            /* something */

    Other times, you may have to find a way to pull some code out into a separate unit. It is not unheard-of, for example, to replace a deep nest of 'for' loops with a nice simple recursion.

  6. Way Too Much Code

    There's a really strong correlation between bad code and lots of code. In general, if you have a good architecture, detailed design and pseudocode, things come out short as well as clean. If you have a lot of code, you probably fouled up one of those things too.

  7. Way Too General Code

    Quit trying to "boil the ocean". You will produce way too much code, and it will be quite hard to debug. Worse yet, it will be full of bugs, because it will not have been tested properly. If you do the simplest thing that can work in your specific application, you have a chance.

  8. Parameters, Not Identifiers


        if (x == 1) then
        else if (x == 2) then
        /* continued */

    but instead


    If you have a digit in an identifier, it usually means one of three bad things: (1) it should have been a parameter; (2) it should have been an array index; (3) it's a pun. If you get rid of all of these misuses of identifier digits you will have few digits left.

  9. Using 'if' For 'switch'

    It is too bad that C-like languages generally don't allow arbitrary expressions to be switched on. (Nickle does). That said, if you are switching on numeric constants, then by all means use your language's 'switch' construct.

  10. Using 'switch' For Something Strange

    There are all-kinds of dumb uses of 'switch'. It's supposed to be used for switching, but people use it as a sequencing method ("The 'loop-switch' sequence") and in even stranger ways.

           for (i = 0; i < 5; i++) {
               switch (i) {
               case 0:
               case 1:
               /* etc */

    Perhaps my least favorite use is as a replacement for array lookups. Write

        int a[3] = {4, 7, 11};
        for (i = 0; i < 3; i++)
           if (x == a[i])

    rather than

        switch(x) {
        case 4:  i = 0; break;
        case 7:  i = 1; break;
        case 11: i = 2; break;
        default: i = 3; break;

    Hopefully the reasons are obvious.

    Never write

        switch(x) {
        case 2:  i = 0; break;
        case 3:  i = 1; break;
        case 4:  i = 2; break;
        case 5:  i = 3; break;
        case 6:  i = 4; break;
        case 7:  i = 5; break;

    instead of

        i = x - 2;

    It's really easy to forget the "break" in switch statements, and sometimes you want to leave it out on purpose. In the latter case, put a /* fall through */ comment so that your intent is clear.

  11. Boolean Expressions and Basic Logic


        !((x <= 1 || y <= 1) && x != 1 && y != 1)


        x >= 1 && y >= 1

    However, be careful and be good at basic logic. Dont let your "simplification" change the meaning of the program. For example, the above simplification is wrong.

  12. Code Tangles

    It is quite possible to write "spaghetti code" without using goto. If your flow control is complicated, it is probably wrong. Look for opportunities to factor the code, and especially to introduce units to make things clearer. Go back to the detailed design.

  13. Arrays and Structures. Please

    For some reason, students seem to be afraid of arrays and structures. Especially multi-dimensional arrays. Especially using arrays and structures together.

    Never write

        int ax1[10], ax2[10], ay1[10], ay2[10];
        for (i = 0; i < 10; i++) {
            ax1[i] = 0;
            ax2[i] = 0;
            ay1[i] = 0;
            ay2[i] = 0;

    Instead, say

        struct {
            int x, y;
        } a[2][10];
        for (i = 0; i < 2; i++) {
            for (j = 0; j < 10; j++) {
                a[i][j].x = 0;
                a[i][j].y = 0;

    You'll find this structure so much nicer to work with.

  14. Weird Dead Code

    When code isn't being used anymore, just remove it. You're using an SCMS (or better be), so it isn't lost. It's just out of the way and now no one has to read it anymore.

  15. I Hate C, So I'll Fix It

    Yes, C has a macro preprocessor. It probably shouldn't; it's an extremely blunt tool inserted in the days when C compilers were terrible.

    Try not to use the macro preprocessor for anything clever. In particular, things like the eternal

        #define BEGIN {
        #define END }

    are hateful and awful and sad.

    I feel strongly about understanding that the name of the null pointer in C is 0, and the name of the null character is '\0'. You need no macros here: the language is well ahead of you on this one.

  16. "My CS 100 Teacher Told Me..."

    a) "Never use 'goto'". Indeed, this one has become so ingrained that most modern students ask "What's a 'goto'?" Well, a "goto" is just a way of doing non-local control flow. Here's some actual industry rules about goto, drawn from practical experience: (1) Downward goto only, (2) goto only within the same unit, (3) goto only when you can't conveniently use a loop structure, (4) error handling is the most common use of goto.

    Of course, goto is also a reasonable way to implement a state machine. If you have one of those (carefully designed), go nuts with goto---it will actually make the state machine more readable!

    b) "Never use 'magic constants'". If I'm going to be writing a program to analyze poker hands, 417 is a magic constant. It should be given an identifying symbolic name, especially if it will be used many times and there's a good chance it might need to change everywhere. 52, on the other hand, is entirely non-magic, as are 4 and 13. These numbers readily identify themselves, and will never have to be changed unless the code is being massively refactored anyway. Don't be afraid.

    c) "Never use 'break' or 'continue." This is just crazy. It's highly idiomatic to write

        while (1) {
          /* stuff */
          if (condition)
          /* more stuff */

    Kernighan and Ritchie call this a "loop-and-a-half" in The C Programming Language, and it's used all the time in the real world.

    It pays to be a little careful with break and continue, but they should be used when appropriate.

    d) "Never early 'return'." That's such terrible advice I'm tempted to just move on. The whole point of having a unit is to be able to encapsulate a computation. When the computation is over, get out.

    Why this "rule", then? Because you need to be careful. Make sure you've done any cleanup that needs doing before you return. Make sure you aren't returning before doing something that needs doing. This is called "programming", and I can recommend it.

    The thing about early return is that done properly it can reduce code complexity dramatically. In particular, it decreases nesting depth.

  17. Crazily Overlong Or Overshort Units

    Empirically, there's a wide range of viable lengths for your procedures and functions. However, if you have a function that's much over 100 lines, or a bunch of one-liners, you should stare hard at them and apply the principles of coupling and cohesion; probably the code is mis-factored.

Acknowledgements: My undergraduate software engineering students contributed a lot to this document, as did the commenters at FOB. Thanks much to all.



The C while loop requires no do. Also, your recommendations 1-3 for goto seem to be informed by 4. Jumping backwards to a retry label is imo a valid use for goto, because the while (true) { ... continue; ... } idiom is just a "non-goto" way to write the same thing anyways.

Agreed on the rest.

"The C while loop requires no do." Indeed. I have no idea what I was smoking when I wrote this. The "true" is a little mysterious also. Fixed.

When there's an idiomatic way to do something and a non-idiomatic way, you should always choose the former. I agree that "never goto backward" is too strong, but the exceptions are so few that I leave the students to discover them for themselves.

Thanks much for your comments! Fob

1 is stupid and unimportant.

Uh, ok. Care to elaborate a bit? Fob

No it's not!

Great views on common programming and applies to a broad set of languages (after all, a language is just a tool and not the goal).

In the book of Kernighan and Ritchie is the c bible, but even in here are a lot of common structures and strategies which are usefull for all languages.

Clean programming is important especially for use webdeveloper who use "scriptkiddie"-languages (PHP, JavaScript and some might argue about PERL or Python).

Holy fuck you're condescending. As you yourself stated, a language is just the tool and never the end, as such, who are you to call a language a "scriptkiddie-language"? Christ.

NULL != 0

Indeed, NULL is not the same is 0. But for most purposes, 0 is better.

In the modern ANSI/ISO C library specification, NULL is "an implementation-defined null pointer constant." In my experience, NULL is most often defined to be (void *)0. Now the ANSI/ISO C language specification also requires that the integer constant 0 must always be usable as a universal null pointer constant equivalent to (void *)0. So why do we have NULL?

As best as I can determine/remember, it's primarily for a fairly obscure reason. In the old K&R days when we had no function prototypes, and subsequently for those still using K&R function declarations, the implicit conversion of a constant 0 parameter to the appropriate pointer type might (on some obscure architectures in a headwind) be performed incorrectly. As long as you are using ANSI/ISO function prototypes (and you are) this is not an issue.

It is also the case that if NULL is defined as (void *)0 (and who knows if this is true in your environment) it will give you slightly more typechecking. You will get a warning from

        extern void f(int);

in this situation. On the other hand, you can't actually rely on this warning, since NULL may just be defined to be 0. Sigh.

Now, the ANSI/ISO C library specification requires you to use NULL in certain places. You should always follow standards, so…yeah. In your own code, I recommend just using 0 where you mean NULL. It's what the language designers intended, and makes your code clearer and more reasonable.

Don't try to fix your programming language. If you don't like how C does things, I can highly recommend Haskell. Fob

Meh. NULL is used so much by so many developers that anyone writing a compiler would be insane to not make it behave as is commonly expected.

If you happen to be stuck working on an insane implementation (of which I have not encountered in 15 years of software development, including archaic embedded devices), then by all means stick to whichever works. Otherwise it really makes no difference.

My only quibble with using 0 is that my eye always sticks on it since it looks like a magic number.

Heh. In current Linux/glibc stddef.h we have

    #undef NULL
    #if defined(__cplusplus)
    #define NULL 0
    #define NULL ((void *)0)

So you can't even likely tell how NULL is defined on a single machine. (Bonus points for explaining to me why __cplusplus is magic here.)

I mean, yes, you're right, NULL always works just fine; I didn't mean to imply that it does not. Since 0 also always works fine I'd rather use the language's idea of a null pointer value than the library's.

Yeah, the problem with 20 years of training people to use NULL instead of 0 is that now most people can't easily read the language as intended.

As you say, it's not a big deal one way or the other. But hey, that's the joy of an author monologue---I can put my opinions front and center.

Thanks much for your comments. Fob

I like most of your points. I'm wondering, what are your thoughts on 16c and 16d with regard to McCabe's Cyclomatic Complexity?

Doug McCaughan @duggler

In general, I think the evidence is that nesting depth is better than Cyclomatic Complexity. Keep in mind that the Cyclomatic Complexity of a piece of code is simply the number of branches in that code (minus one, heh). Straight-line code with tests in it, like my triangle example in the comments above, isn't likely problematic even if there's a lot of tests. It's when the tests interact that there is generally trouble. Fob

What's wrong with tabs for indentation? I would say broken indentation in is a big no no, but why does the way you get there matter? If you editor isn't smart enough to work with both you need a new editor.

always, always use ascii tab character. Spaces are a huge pain in the a**. Here's why:

Different programmers use a different number of spaces (some like 3 -- some 4 -- some 2). If a bunch of programmers work on a file, they all use their own method, and you get indentation jambalaya.

If somebody uses tabs, then it will always look right to me because I can set (in my editor) how many screen lines a tab represents. So when they open up the file, they can see their 3 columns of space, and I can see my 2 columns of space per tab (or whatever).

I've seen pro-space people cite this as an advantage of spaces -- they can force me to view the code the same way they saw it. You shouldn't be making ascii art with your code -- if you have to do that, something is wrong.

Exactly. The bugaboo is the mixing of tabs with spaces. For example, some editors turn 4 spaces into a tab. Using only tabs is just fine.

That, incidentally, is the problem with Python. I don't mind enforced indentation. I rather like it in other people's code. But Python says it is ok to use (space)(space)(space)(space)(tab). Ugh! Python should reject any code which uses both tabs and spaces for indentation.

Of course, in a browser textbox (like this one) TAB goes to the next HTML element. It can also be tricky to search-and-replace in some editors. There are definitely pros and cons.

I use tabs, and I wish the rest of the world did. It standardizes indentation, unless you are programming in something archaic like pico.

BTW your captcha is makign me want to rip off my own testicles.

I'm 46 years old and have been programming since I was 12. I use editors much, much more archaic than Pico. Anyway, see my revised "tabs" section for rationale.

Sorry about the captchas. They are probably a bit more nightmarish than they need to be; I'll crank them back a bit. If you can give me more details about the troubles you've been having, it will be helpful. However, you'll probably want to wait until I've cranked back the captchas a bit.

Thanks much for your comments. Fob

As you say, you shouldn't be making ASCII art with your code. The problem comes when folks use editors that disagree what a tab means. For example,

         /*3*/   code for case 3
                 more code for case 3, that should line up

If the file containing this has a tab for just the second line, I hope it's an 8-space tab. If it has a tab after the comment for just the first line, I hope that it tabs to the next multiple-of-8 like ASCII classic tabs do. If it's both, then I hope that the tab goes to the next multiple-of-n for some n > 5 (probably 8), and that the overall block is indented to make that work.

Jamie Zawinski (among others) has a nice article covering these issues that I will let speak for me.

The issue of coders with different styles is a red herring. If you are working in someone else's code, you should always try to make your modifications invisible, as if they had always been there. This certainly extends to using the same indentation rules they are using. I've taught this in class, but it didn't seem like it was part of this document, so it's not in here.

In any case, thanks much for your comments! Fob

Actually I don't see any reasons to discuss the problem which can be fixed pressing single shortcut in your favorite source code editor. You'll have either tabs or spaces, your favorite braces style and other things just as good as you would write this code by yourself. If your editor cannot do this, maybe it was written back in 1977. Just throw it away and install something modern.

Personally I prefer tabs-only for the reasons mentioned above. Also it makes cursor navigation easier, because single pressing of left/right arrow key will move through whole tab just like I want. My current employer enforces to use spaces, but actually I don't care much: my IDE thinks for me in this question, converting tabs to spaces.

While I do agree with most points #16 and #8 left me scratching my head. Namely points c and d.

I think point c contradicts #12 a bit. Simplify/change your loop invariant and go without break and continue.

As for "early return", I try not use them. Make life a lot easier when I try to read and understand a function. Helps read the code in one pass from top to bottom.

Could you please talk more on point 8. I do not see how parameter changes things. In the example provided one has to branch on parameter value sooner or later.


There's a style of programming that I highly advocate, that goes under a name something like "successive strengthening of invariants". In this style, you try to pick pieces off the invariant and handle them, leaving a simpler invariant for the rest of the unit. For example

        int triangle_area(int a, int b, int c) {
            int t;
            if (a <= 0 || b <= 0 || c <= 0)
                return -1;
            if (a + b >= c || b + c >= a || c + a >= b)
                return -1;
            t = a + b + c;
            return floor(sqrt(t * (t - 2 * a) * (t - 2 * b) * (t - 2 * c)) / 4);

After each return statement, the invariant on the input parameters is strengthened. By the time you get to the actual computation, you can be confident it will succeed. IMHO, doing something wacky with conditionals and result variables would not improve this code. Non-toy examples work even better in my experience.

I'm not sure what the confusion is with regard to parameterization. The toy example may not be so compelling, but hopefully the idea is clear: write one unit to handle all the cases, rather than one unit per case. Generally this will result in clearer, more maintainable code---especially since the unit-per-case is typically done with copy-paste coding, whereas the single-unit may not need to break into cases at all, being equational.

Thanks for your comments! Fob

Thanks for this list, it's quite helpful!

I'm a novice programmer who really enjoys the whole process. It's clear that you have a lot of experience with brand-new folks because this is what my code looked like in the beginning! I'm always trying to learn and get better and using the "less code" mantra as the path to enlightenment.

  1. gotos. I think your advice is bang on for goto - ONLY FOR ERROR HANDLING - but if you're using C++ instead of C, then use exceptions if it's allowed at your company!

I just rewrote a bunch of my code which had a lot of error handling to use exceptions and the code shrunk by over 20% - but more important, the signature of my interface was made quite a bit easier.

  1. switch statements. I've written a huge amount of C and C++ code in the last 2+ decades, and I've noticed a surprisingly large amount of trouble regarding switch statements. Typical problems include:
  • forgetting the break/return at the end of a rarely-used case.
  • a new condition appearing which is not a simple case and you have to rewrite it all as an "if" statement.
  • the type of the variable from an integral type to something else, requiring a rewrite.
  • even major editors have difficulty with switch indentation in some cases (emacs, my great love, gets 'em wrong if you have namespaces set up not to indent - sure there's a way around it but haven't figured it out yet).
  • harder to read since switch statements are probably the second or third rarest control statement (after do/while and goto?)

My response to this is to make everything if statements as I write, and then, if needs by, convert them to switch statements at the very end.

Note that the efficiency different is minuscule.

  1. You're missing a key, key point - LONG ROUTINES are bad. If an individual function or procedure is more than one page, it's almost certainly too long and should be split up.

Great article, though, should be given to any C programmer.

Other than error handling, the other place I commonly use goto is in state machines. Makes them very readable Smile .

Exceptions are often a good thing for error handling and dynamic-unwind. There are also issues with them, of course. So…yeah.

I don't think there's any deep reason to avoid switch statements, but they certainly are error-prone. I'll add a note about deliberately-missing "break" vs the other kind, thanks!

Yes, too-long routines or too-many too-short routines are both a pack of trouble. I'll add a note to this effect. There was an interesting study a while back that found that there was no correlation between defect rates and unit size except at either extreme---contradicting years of advice to the effect that shorter routines were always better.

Thanks much for the input! Fob

This is an excellent article. The thing I like about it the most is your writing style, which is straight-forward and not too condescending.

Well done, sir.

thanks, im a student and this is helpful

In your Array and Structures examples the < symbols have been HTML encoded, and then passed into pre tag so they don't render the < and instead display the html encoded &lt ("ampersand l t", not sure if this will be encoded via the comments UI)

Yeah, I think this is a bug in...well...something. I can't figure out why the earlier examples work but this one doesn't. Anyway, kludged it by removing the blank lines in the examples; makes them slightly uglier, but solves the immediate problem. More investigation is needed. Fob

For #13, I think that

struct { int x, y; } a[2][10];

is unwieldy and unintuitive. I would never do that. Just my 2 cents.

A struct helps to add clear semantic meaning to a value. This example doesn't mean much, but how about something like this:

struct {Time open, close; } hours[7];

Then, you can easily reference hours[SUNDAY].open, for instance.

It's way better than the alternative the author cited. Your data structures should match the thing you're modeling. If the code you quote scares you, if you find it unintuitive, you're not ready to handle that kind of programming task, and should go back to school.

Despite its evilness, the preprocessor is an indispensable tool for situations requiring either lots of code generation, or idioms not natively supported by the language that would be difficult and error prone to implement otherwise.

However, such endeavours must not be undertaken by neophytes. The preprocessor is hellishly unforgiving, and writing macros without nasty side effects is a black art requiring COMPLETE understanding not only of the C spec, but also the compiler(s) you intend to support and possibly even the architecture you intend the code to run on.

Actually, this is beginning to sound a lot like the short answer "don't do it".

There are really only a few simple tricks you need to know to write clean CPP macros. The problem is more that there really isn't much of a use case for them in 2010. As you say, maybe bulk code generation is one of them; we've all written the 50-line macro that's expanded 10 different ways for performance.

I advise against creating "idioms not natively supported by the language" unless you're a highly skilled language designer yourself. I have been designing and helping to implement programming languages for more than 25 years, and I hesitate to try to "enhance" C with macros.

Mostly, inline functions plus reasonable enums and constant declarations plus optimizing compilers that kill dead code plus etc mean that in 2010 you can implement your ideas directly in C. That's a good thing. Like the ?: operator, the CPP macro is mostly an anachronism from a simpler time. Fob

Although authorship is obvious at the beginning of the code, it's often much less clear how the licensing goes for code done for university courses.

Yeah, a license is really only needed if you're going to be publishing your code. That's why I didn't put it on the mandatory list. For student assignments, at least here in Oregon, you own the copyright, but you implicitly grant the instructor certain rights by submitting your work for grading---no explicit license is necessary.

I've had students try to limit my rights to their homework with explicit licenses. Unless it's something reasonable and harmless, I just hand them their work back and say "thanks for playing". Fob

"You’re using an SCMS (or better be)", so the various authors of the code are held there. Why clutter the code with more bumpf ?

Not all languages will optimise a multiple condition statement so that the first false stops the other conditions being processed if (SimpleCalculation && ExpensiveCalculation)  /* something */ So for in that case you might want to use a nested if rather than multiple conditions. if (SimpleCalculation) {  if (ExpensiveCalculation) {   /* something */   } }

===8<=== int a[3] = {4, 7, 11}; for (i = 0; i < 3; i++) if (x == a[i]) break; ===8<=== You've just replaced constant or logarithmic complexity with linear one. Decent compiler will create hash-table of pointers for switch cases or at least sort them and do binary search, so with long switch you will have much less comparisons. This doesn't matter for given example due to its shortness, but avoid generalizations in this advice. You may also consider to create lookup table by yourself like: int a[]=(3,3,3,3,0,3,3,1,3,3,3,2); return x<0||x>11?3:a[x]; Looks messy, but maybe faster, and good comment will make things clear.

Thank you very much, this is really very helpful. Smile

Your student, Mohammad Elshiemy


I have seen this kind of coding styles several times when I was the lab assistant. Usually this case happens when the student tries to copy the homework from the other student who has already finished the assignment.

Of course, if they copy-paste it just like that, you will realise instantly that they are cheating. So, what the students do was to copy-paste the codes AND play around with the loops, branching, etc. to make it difficult to read, and (most importantly) more difficult to compare.