Tech

Random bits of technology

User trap of the week: Android SDK sqlite3 libreadline

This user trap was a nice one. I noticed a month or so ago that the "libreadline" support in my "sqlite3" executable on my home box was no longer working. For the uninitiated, this means that I couldn't use line history and line editing when working on database queries. Really annoying.

Tonight I decided I would fix it, dammit. An hour or so of careful investigation later I finally noticed that some joker had stuck a copy of sqlite3 without readline support in the Android SDK "tools" directory; my path stuff was configured to find that one first.

For the record, that's really user-hostile behavior. Don't stick spare copies of standard utilities where I will run into them. Yes, I probably should have my path in the other order, or sooner or later a hostile 'ls' is going to get me. Oh well.

Update: I've been working for probably 10 hours on optimizing a particular sqlite3 query which was inexplicably slow (a minute or so), which is what started all this. Guess what? With the current Debian sqlite3, it runs instantly! Dammit. Friend of Bart

User trap of the week: GIMP Script-Fu

When I was rebuilding my crosshatch plugin for the GIMP as Script-Fu Scheme a week or two ago, I ran into a beautiful little problem. It seemed that sometimes I needed to take the car (the "head of list") of the result of some Script-Fu functions for no apparent reason. In fact, the more I poked at it, the more I couldn't figure out when I didn't need to…

The 'Sleep Sort' meme

People keep pointing me at Sleep Sort in email and IRL; it is apparently a popular topic right now. To Sleep Sort an array of integers, you spawn one process per array element. Each process sleeps t seconds, where t is the value of its array element, and then outputs t

Proposals for Open Source Bridge

I'm submitting a talk to  Open Source Bridge - June 21–24, 2011 - Portland, OR

I have put up the first of my session proposals for the Open Source Bridge Conference. For those of you who don't know, Open Source Bridge will happen for the third year here in Portland Oregon June 21-24. [Update 2011/04/17: Corrected date as suggested by an anonymous reader. Somehow I had 2010 info on here before. Oops—hope I didn't confuse anyone.] It is a most awesome event, and I can highly recommend it to anyone and everyone.


My proposals this year are:

  1. A Tangled Tale
  2. Seven Habits Of Highly Obnoxious Trolls
  3. Build a DSL *Fast* With M4
  4. <Your Favorite Programming Language> Loses
Please feel free to mark and comment them at the OSBridge site.

Hope to see you all there! Friend of Bart


Things I see in student code

I've read quite a lot of student code over the years. Some of it is quite good. A lot of it is full of predictable badnesses. These notes attempt to enumerate some of this badness, so that you may avoid it.

This is pretty C-centric in its current form. However, many of its lessons are multilingual. This is a living document, so don't expect it to be the same next time you look at it. In particular, the numbering may change if something gets inserted in the middle. It is revisioned automatically by Drupal, so feel free to point at a particular revision if you care.

  1. Lacks Author Name and/or Copyright Notice

    If a license applies, make sure that's there too.

  2. Doesn't Pass '-Wall'

    Until you're very old and very wise, just treat warnings as errors. By then, you'll know enough to keep doing it.

  3. Broken Indentation

    Just don't go there. I'm serious. Your code work products should never contain an ASCII tab character; get your editor to help you.

    Apparently, this needs elaboration. I get a lot of code submissions from students that have severely broken indentation. About half of these are apparently students who just don't care about indentation. I've found I don't care about grading their code, either. The rest appear to be from students whose editor used different assumptions than mine about how tabs work. My editor uses the same assumptions my CP/M box did in 1977: a tab moves right to the next multiple-of-eight column.

    It is certainly possible to configure your editor to match that assumption closely enough that I won't get broken-looking indentation from you. But why risk it? If you ship me a file without any tab characters, you can pretty much guarantee that I'm seeing the same indentation you did. I've found the safest way to guarantee that you will ship me a file without any tab characters, even in the face of last-second edits, is to tell your editor to always appropriately convert tabs to spaces as it saves files. Then we will both agree that you are consistently indented, and I can grade your work.

    (Similar comments apply to open source work and other code, of course. "Be liberal in what you accept, and conservative in what you send." I'm not sure how liberal I can figure out how to be, so you need to be as conservative as possible to help me out.)

  4. Copy-and-Paste Errors

    Sometimes we copy and paste. This is often a bad idea; a unit or macro abstraction would have been better. If it's infeasible (as it is, rarely) to go there, at least proofread the living daylights out of your copy-pasted code, and then ask a friend to do it. This is a popular source of stupid and intractibly difficult bugs.

  5. Deep Nesting

    Nesting depth is one of the few reliable simple indicators of bad code. In general, if you're indented more than three levels, probably something is wrong.

    How to avoid this? Sometimes, just simply restructure the code. For example,

        if (condition1) {
           if (condition2) {
               if (condition3) {
                   if (condition4) {
                       /* something */
                   }
               }
           }
        }
    

    would be much better as

        if (condition1 && condition2 && condition3 && condition4)
            /* something */
    

    Other times, you may have to find a way to pull some code out into a separate unit. It is not unheard-of, for example, to replace a deep nest of 'for' loops with a nice simple recursion.

  6. Way Too Much Code

    There's a really strong correlation between bad code and lots of code. In general, if you have a good architecture, detailed design and pseudocode, things come out short as well as clean. If you have a lot of code, you probably fouled up one of those things too.

  7. Way Too General Code

    Quit trying to "boil the ocean". You will produce way too much code, and it will be quite hard to debug. Worse yet, it will be full of bugs, because it will not have been tested properly. If you do the simplest thing that can work in your specific application, you have a chance.

  8. Parameters, Not Identifiers

    Never

        if (x == 1) then
            thing1();
        else if (x == 2) then
            thing2();
        /* continued */
    

    but instead

        thing(x);
    

    If you have a digit in an identifier, it usually means one of three bad things: (1) it should have been a parameter; (2) it should have been an array index; (3) it's a pun. If you get rid of all of these misuses of identifier digits you will have few digits left.

  9. Using 'if' For 'switch'

    It is too bad that C-like languages generally don't allow arbitrary expressions to be switched on. (Nickle does). That said, if you are switching on numeric constants, then by all means use your language's 'switch' construct.

  10. Using 'switch' For Something Strange

    There are all-kinds of dumb uses of 'switch'. It's supposed to be used for switching, but people use it as a sequencing method ("The 'loop-switch' sequence") and in even stranger ways.

           for (i = 0; i < 5; i++) {
               switch (i) {
               case 0:
                    first_thing();
                    break;
               case 1:
                    second_thing();
                    break;
               /* etc */
    

    Perhaps my least favorite use is as a replacement for array lookups. Write

        int a[3] = {4, 7, 11};
        for (i = 0; i < 3; i++)
           if (x == a[i])
               break;
    

    rather than

        switch(x) {
        case 4:  i = 0; break;
        case 7:  i = 1; break;
        case 11: i = 2; break;
        default: i = 3; break;
        }
    

    Hopefully the reasons are obvious.

    Never write

        switch(x) {
        case 2:  i = 0; break;
        case 3:  i = 1; break;
        case 4:  i = 2; break;
        case 5:  i = 3; break;
        case 6:  i = 4; break;
        case 7:  i = 5; break;
        }
    

    instead of

        i = x - 2;
    

    It's really easy to forget the "break" in switch statements, and sometimes you want to leave it out on purpose. In the latter case, put a /* fall through */ comment so that your intent is clear.

  11. Boolean Expressions and Basic Logic

    Not

        !((x <= 1 || y <= 1) && x != 1 && y != 1)
    

    but

        x >= 1 && y >= 1
    

    However, be careful and be good at basic logic. Dont let your "simplification" change the meaning of the program. For example, the above simplification is wrong.

  12. Code Tangles

    It is quite possible to write "spaghetti code" without using goto. If your flow control is complicated, it is probably wrong. Look for opportunities to factor the code, and especially to introduce units to make things clearer. Go back to the detailed design.

  13. Arrays and Structures. Please

    For some reason, students seem to be afraid of arrays and structures. Especially multi-dimensional arrays. Especially using arrays and structures together.

    Never write

        int ax1[10], ax2[10], ay1[10], ay2[10];
        for (i = 0; i < 10; i++) {
            ax1[i] = 0;
            ax2[i] = 0;
            ay1[i] = 0;
            ay2[i] = 0;
        }
    

    Instead, say

        struct {
            int x, y;
        } a[2][10];
        for (i = 0; i < 2; i++) {
            for (j = 0; j < 10; j++) {
                a[i][j].x = 0;
                a[i][j].y = 0;
            }
        }
    

    You'll find this structure so much nicer to work with.

  14. Weird Dead Code

    When code isn't being used anymore, just remove it. You're using an SCMS (or better be), so it isn't lost. It's just out of the way and now no one has to read it anymore.

  15. I Hate C, So I'll Fix It

    Yes, C has a macro preprocessor. It probably shouldn't; it's an extremely blunt tool inserted in the days when C compilers were terrible.

    Try not to use the macro preprocessor for anything clever. In particular, things like the eternal

        #define BEGIN {
        #define END }
    

    are hateful and awful and sad.

    I feel strongly about understanding that the name of the null pointer in C is 0, and the name of the null character is '\0'. You need no macros here: the language is well ahead of you on this one.

  16. "My CS 100 Teacher Told Me..."

    a) "Never use 'goto'". Indeed, this one has become so ingrained that most modern students ask "What's a 'goto'?" Well, a "goto" is just a way of doing non-local control flow. Here's some actual industry rules about goto, drawn from practical experience: (1) Downward goto only, (2) goto only within the same unit, (3) goto only when you can't conveniently use a loop structure, (4) error handling is the most common use of goto.

    Of course, goto is also a reasonable way to implement a state machine. If you have one of those (carefully designed), go nuts with goto---it will actually make the state machine more readable!

    b) "Never use 'magic constants'". If I'm going to be writing a program to analyze poker hands, 417 is a magic constant. It should be given an identifying symbolic name, especially if it will be used many times and there's a good chance it might need to change everywhere. 52, on the other hand, is entirely non-magic, as are 4 and 13. These numbers readily identify themselves, and will never have to be changed unless the code is being massively refactored anyway. Don't be afraid.

    c) "Never use 'break' or 'continue." This is just crazy. It's highly idiomatic to write

        while (1) {
          /* stuff */
          if (condition)
            break;
          /* more stuff */
        }
    

    Kernighan and Ritchie call this a "loop-and-a-half" in The C Programming Language, and it's used all the time in the real world.

    It pays to be a little careful with break and continue, but they should be used when appropriate.

    d) "Never early 'return'." That's such terrible advice I'm tempted to just move on. The whole point of having a unit is to be able to encapsulate a computation. When the computation is over, get out.

    Why this "rule", then? Because you need to be careful. Make sure you've done any cleanup that needs doing before you return. Make sure you aren't returning before doing something that needs doing. This is called "programming", and I can recommend it.

    The thing about early return is that done properly it can reduce code complexity dramatically. In particular, it decreases nesting depth.

  17. Crazily Overlong Or Overshort Units

    Empirically, there's a wide range of viable lengths for your procedures and functions. However, if you have a function that's much over 100 lines, or a bunch of one-liners, you should stare hard at them and apply the principles of coupling and cohesion; probably the code is mis-factored.

Acknowledgements: My undergraduate software engineering students contributed a lot to this document, as did the commenters at FOB. Thanks much to all.

Why cut-copy-paste is hard

[This note is primarily a response to a recent LWN thread. I'm posting it here because it got a bit long for a discussion forum.]

Several years ago, I joined the X.Org Foundation Board of Directors. One of my campaign pledges was to organize a project to finally "fix" X cut-copy-paste (CCP).

It has long been acknowledged by anyone paying attention that the user experience around CCP on the X desktop is horrible (although it has improved a bit in the last couple of years). I'm smart, I could fix it, right?

Uh, no. Here's some reasons why CCP is hard, especially on X…

HOWTO: Rotated Halftones in Inkscape

There are a few good tutorials on the web about how to get a (mediocre) halftone screen in Inkscape (1, 2). (The GIMP, by contrast, has a special halftoning plugin that does a nice job. I wish that Inkscape had one of those.)

One thing that none of these tutorials tell you is how to rotate the halftone screen. This turns out to be important, since unrotated screens of different colors do not overlay well.

In this tutorial, I summarize the basic technique for halftoning in Inkscape, and suggest a way to do rotation…

Notes on getting Debian Gnome Bluetooth working

This is a rought-draft page that I'm making to take some notes on how I got the Debian Gnome Bluetooth support to work this time around. Bluez and its supporting utilities are an endless source of frustration and perhaps even of nightmare, so it's probably a good idea to have some notes…

Random non-repeating sequences

While doing some research for an upcoming paper, I came upon a 2008 blog post describing a special case of the following interesting problem:

Given an alphabet of n symbols, construct a uniformly-selected sequence of m of these symbols, with the sequence containing no duplicate symbols.

[Updated 2009/12/14] The difficulty arises when both m and n are large. The obvious method is a rejection method: repeatedly pick a random symbol from the alphabet and check whether you've picked it already. If not, append it to your target number. The problem is that the duplicate check seems to require log m time even if you code it cleverly, giving an asymptotic performance of O(m log m). Hash tables can help some, but ultimately you're going to waste a lot of time checking for the unlikely case that you've generated a duplicate.

There were various solutions given in the comments, but none of them were optimal. A spoiler follows…

Everything broke at once

I woke up yesterday morning in Tuscon. A lovely day…

Syndicate content