An utterly awful math test

My boy just got the practice version of the Oregon Standardized Math Test. This Sample Test is intended for Grade 3.

The test is a putrid example of how bad these standardized tests are. As near as I can tell, it's a combination of testers being proud of how well they can trick third graders, and utter ignorance of basic mathematical principles. Without further ado, I present the most obnoxious questions…


2. The distance between Portland, Oregon and Detroit, Michigan is most often measured by which unit of measurement?
A. Meters
B. Kilometers
C. Centimeters
D. Millimeters

How about "E. Miles"? All the other choices are in the noise compared to this one, as some simple Googling will indicate. This question is a great example of a false choice.

4. Bill earned 2 dollars for washing the floor. Mike earned 3 dollars for washing the floor and cleaning the windows. How are chores and money related?
A. The more chores done, the more money the person made.
B. The fewer chores done, the more money the person made.
C. No matter how many chores were done, each person earned the same.
D. Doing difficult chores did not help a person earn money.

The intended answer is, of course, A. But D is impossible to judge. Which ones were the "difficult" chores? Were they listed here? Does "help a person earn money" mean that more difficult chores have a bigger payoff? If so, is washing the floor more difficult than cleaning the windows? D should have been left off, or replaced with something better.

8. It has not rained on July 12 for 107 years. How likely is it to rain on July 12 of this year?
A. It is not likely to rain.
B. It will not rain.
C. It probably will rain.
D. It will rain.

The intended answer A is difficult to distinguish from B, since it depends on causality. Clearly, it has rained in some location every July 12 for the last 1000 years; thus, location must be the key cause of the reported infrequent rain. If we are on the moon, or at the North Pole, B would surely be the correct answer. If we're in Death Valley, answer A is correct.

10. Which of the following choices is a translation?
A.
B.
C.
D.

Completely dependent on the definition of an arcane technical term that is totally inappropriate for third graders to worry about. Unless they're doing affine geometry, I guess.

13.

How many students chose blue as their favorite color?
A. 4
B. 8
C. 10
D. 15

Apparently the testers hate the testees. Why in heck are all the "students" shown in blue? (This is literally the only color anywhere in the test!) And why would two "students" be represented by a single student icon? 10 or 20, maybe, but two? This amounts to a trick question.

14. Alija gets money each day for doing jobs at home.

Sun    Mon    Tue    Wed    Thur   Fri    Sat
0.75   1.50   2.25   3.00   3.75   4.50   ?

If this pattern continues, how much will he earn on Saturday?
A. $0.75
B. $4.75
C. $5.00
D. $5.25

(Note the multicultural homage here. Hopefully most third-graders will recognize "Alija" as a proper name even at the beginning of a sentence, where capitalization can't help them; further, they will hopefully recognize it as a male name, lest they fail to match it with the male pronoun in the latter part of the question.)

This is the worst question so far. If you answered D, give yourself the "I understand what these idiots are doing" prize, as that is the official correct answer. However, the answer to the question as worded should be A, as it seems clear from context that Alija is earning 75¢ per day for his labors, not an amount that increases by 75¢ per day. The ambiguity could have been eliminated by giving a header for the table row, which would be good practice anyhow, or by changing the question to "how much will Alija have earned for the week?"
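To make the two readings of the table concrete, here is a minimal Python sketch. It is only an illustration: the function names are mine, not anything from the test, and the amounts are kept in cents to avoid floating-point rounding.

```python
# Question 14's table row, Sunday through Friday, in cents.
ROW_CENTS = [75, 150, 225, 300, 375, 450]


def saturday_as_daily_pay(row):
    """Answer-key reading: each entry is that day's pay, growing by a fixed step."""
    step = row[1] - row[0]
    # Confirm the row really is an arithmetic progression before extending it.
    assert all(b - a == step for a, b in zip(row, row[1:]))
    return row[-1] + step


def saturday_as_running_total(row):
    """Alternative reading: each entry is the total earned so far that week."""
    daily = [row[0]] + [b - a for a, b in zip(row, row[1:])]
    # Under this reading the "pattern" is a flat daily rate.
    assert len(set(daily)) == 1
    return daily[-1]


print(saturday_as_daily_pay(ROW_CENTS) / 100)      # the key's reading (answer D)
print(saturday_as_running_total(ROW_CENTS) / 100)  # the flat-rate reading (answer A)
```

Under the running-total reading, the total for the week through Saturday is the same $5.25 that the key wants for Saturday alone, which is why a labeled row would have settled the matter.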

Truly, unforgivably awful.

22. The best unit to measure the amount of water in a swimming pool is
A. milligram
B. milliliter
C. liter
D. kiloliter

Maybe you thought you had the gestalt of these tests by now. But if you picked D, you are still missing it. The correct answer as given in the key is C.

Never mind that a typical swimming pool might have a volume of about 50 kiloliters. Never mind that the first Google hit on the query "swimming pool volume kiloliters" is a Texas school test that in answer 10 gives cubic meters (equal, if you will recall, to kiloliters) as the "correct" answer. You should pick liters, you idiot. What were you thinking?
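For anyone who wants to check the unit arithmetic, here is a small Python sketch. The 50 kiloliter figure is just my rough guess from above, and the helper names are made up for illustration.

```python
# Exact metric relationships: 1 kiloliter = 1000 liters = 1 cubic meter.
LITERS_PER_KILOLITER = 1000
LITERS_PER_CUBIC_METER = 1000


def kiloliters_to_liters(kiloliters):
    """Convert kiloliters to liters."""
    return kiloliters * LITERS_PER_KILOLITER


def kiloliters_to_cubic_meters(kiloliters):
    """Convert kiloliters to cubic meters by way of liters."""
    return kiloliters_to_liters(kiloliters) / LITERS_PER_CUBIC_METER


pool_kiloliters = 50  # assumed "typical" pool volume from the paragraph above
print(kiloliters_to_liters(pool_kiloliters), "liters")
print(kiloliters_to_cubic_meters(pool_kiloliters), "cubic meters")
```

The same pool is a two-digit number in kiloliters or cubic meters and a five-digit number in liters, which is the whole argument for answer D.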

24.

The two congruent triangles can be rearranged to form which of the following figures?
A. rectangle
B. triangle
C. parallelogram
D. all of the above

OK, it's easy to see how to get a rectangle here. The angled parallelogram is hard to see—neither my wife nor I saw it at first—but of course a rectangle is technically a parallelogram, so you wouldn't have to figure out the hard thing. Making a big triangle out of the little triangles is impossible without reflection. It took me a while to recall that reflected triangles are congruent, and I'm still not sure that "rearrange" should include reflection in addition to the usual translation and rotation. Given the triangles as shown, I think that none of the answers given are uniquely correct.


Overall, then, I would give this test a score of 17/25. It will be interesting to see what scores the students actually get on this test, and what they miss.

What could be done to avoid this kind of fiasco? First and foremost, actual practicing math and science professors with doctorates should be checking these answers. I didn't do anything special to find these problems: just read the stupid test.

Failing getting the test right, there are a couple of things that could still help. The constraint of providing four possible answers per question should be relaxed. Most of the problems probably could have been avoided by not trying to force extra answers into the space to fill out the quota. In addition, not all "wrong" answers should be weighted equally. Indeed, it would be OK to give some "wrong" answers full credit. Making the test reasonable should be the first priority, but I'm guessing that these additional measures would help to improve the accuracy of the test results in the presence of test errors.

I hope my boy gets the right questions "wrong" on the actual test next year.


Update 2008-02-28: Apparently this got linked from Reddit. The comments there, as opposed to here, are almost uniformly critical of my analysis.

The only thing I would like to answer is the repeated allegation that my son did poorly on this test. First of all, it was clearly labelled a practice test. Second, my son hadn't taken it at the time I originally posted this. Third, he got a score of "exceeded" when he did take it: the second highest score in his class.

I couldn't give a darn about the test. I'm just afraid for our ability to teach mathematical concepts. Friend of Bart


After 8 months

Coming back to this article after 8 months, I have to give the award to question 13.

  • We are presented with a table that has botched column headings, but apparently has little people to represent the number of people who prefer a given color.

  • After a moment's thought we realize that even the green people are blue, so the blueness of the little people is just a confusing detail. Fortunately, the row labeled "BLUE" is at the top of the list.

  • We then note that there are four little people in the row labeled "BLUE" and answer 4.

  • We catch ourselves at the last second, and note that the bottom of the graph says that "one picture of a little person equals two people". We scratch our heads and change the answer to 8.

  • We then try to figure out why the creator of the table chose to use one icon to represent each pair of people. Perhaps, we hypothesize, there was not room in the rows of the table for the maximum-length row of people at 1::1.

  • Our hypothesis is dashed when we realize that the author left exactly enough whitespace on the right to make using a 1::1 representation possible. This can't be a coincidence: there's room for exactly 10 little people in the green row with 5 icons in it. The author must have originally created the table at 1::1, divided each row in half, and then put the legend at the bottom post facto.

  • We wander away crying.
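The counting in the steps above can be sketched in a few lines of Python. This is only an illustration: the icon counts are the two rows mentioned in this comment (four in BLUE, five in green), and the names are invented for the example.

```python
# Legend from the bottom of the graph: one little-person icon stands for two people.
PEOPLE_PER_ICON = 2

# Icon counts for the two rows described above; the other rows are not given here.
ICONS_PER_ROW = {"BLUE": 4, "GREEN": 5}


def people_in_row(color, icons=ICONS_PER_ROW, per_icon=PEOPLE_PER_ICON):
    """Number of people a pictograph row represents."""
    return icons[color] * per_icon


print(people_in_row("BLUE"))   # the intended answer to question 13
print(people_in_row("GREEN"))  # the icons that would fit in the green row at 1::1
```

Forgetting the legend amounts to calling the function with `per_icon=1`, which is exactly the mistake the question seems designed to provoke.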

Runner-up

My runner-up, though, is question 8, for a different reason. All the other questions are somewhat uninterestingly broken. No one is likely to be particularly damaged by being taught a "wrong" answer.

Question 8, though, tries to capture important concepts involving the relationship between chance and causality, concepts that need to be extremely well understood by anyone hoping to have a career in science or engineering—and completely botches it.

Getting the "right answer" to question 8 teaches a child that one can always determine probabilities by induction rather than causality, that likelihoods and probabilities are the same, that an inductive likelihood is never zero, and thus that the "mathematically correct" answer has little to do with the real-world correct answer.

It's much, much better than whacking the kid in the face with a stick, but it's still not a very nice thing to do to a young human.

And they did not fix much

I'll pile on. Accepting that for such a widely distributed test, care should be taken for issues such as wording and cultural knowledge, I tally 11 of 25 questions which should be improved.

I'll append the email I sent to the ODE people a couple of years ago after my son started finding errors on the practice test while the Simpsons was on. This did serve to open our discussion of second-guessing a test. Don't worry about accuracy: give them the answer they are looking for. I cringe to teach my kids such stuff.

ODE did eventually make a partial fix to problem 24 with the triangles. As we saw the exam in 2005, the answers to choose from included "trapezoid" with "all of the above" being the desired answer.

================ What I sent in 2005 ===========
From: Rik Smoody Rik2@smoo.com
Date: 2005 December 13 12:02:27 AM PST
To: Cathy.Brown@state.or.us, ode.frontdesk@ode.state.or.us
Subject: Errata in sample test

I looked at the sample 3rd grade math test found at http://www.ode.state.or.us/teachlearn/testing/samples/2004_06/mathsmptestgr3.pdf with my son. We were dismayed to find several errors in the test and in the answers. Notwithstanding that the directions say to decide on the BEST answer, the test should be more carefully edited.

Problem 12: Which unit of measurement would best weigh a butterfly? The available answers are two units of MASS and two of distance. A gram is NOT a unit of weight, but of mass. A metric unit of weight is a newton. The test-writers fell for a popular mistake. To be fair, you can buy scales calibrated in either newtons or grams in Japan. I don't know about the rest of the civilized world. It should be no problem as long as people do not take those scales off this planet. I won't relay what the older kid, who is studying physics, said about this problem.

Similarly, Prob 17 does give one unit of weight, the ton, although it would not be well-scaled for weighing a single fireplace log. The sole unit of weight is not the desired answer in the answer key.

Problem 15: Both answer A and B are projections of a cylinder. It requires a kid to guess which answer a test writer would want them to answer. This error cannot be fixed by simple wording. Replace one of the diagrams with something which cannot be a cylinder, or require both A and B to be marked.

Problem 2: "The distance between Portland, Oregon and Detroit, Michigan is most often measured by which unit of measurement?" The distance between Portland and Detroit is indubitably most often measured in "miles", followed by "hours" (and we know that is not distance, but that's what people use to answer "How far to...?"). While I applaud educating towards metric measurement, the question should still have a TRUE answer. Reword to something like "Of the following units, which is most often used to measure the distance..." or "The distance between Portland, Oregon and Detroit, Michigan is most often measured by which of the following units of measurement?" and the answer would at least be valid. Observe that this is cultural knowledge, not SI unit standard (see #22 below)

Problem 22 stresses uncommon common knowledge with appropriate-scaled units. The appropriate-sized unit among those offered to use for a swimming pool is indeed a kiloliter. The only justification for preferring the desired answer of liter is that for some reason, people who measure in liters seldom use kiloliters, but state a larger, rounded number of liters, probably for reasons of inertia. Or they use cubic meters. I do not want to complain too much on this one, as I think it better to stick with SI units and let the numbers handle the magnitude.

Problem 24 has a couple of problems. If the triangles are intended to be right triangles, they should be labelled as such, and/or said to be so in the problem. Students should NOT be encouraged to just eyeball a diagram as a basis for mathematical information. ASSUMING they are right triangles, they can be joined to form

  • a rectangle by joining hypotenuses after simple rotation

  • a kite by joining hypotenuses after flipping one of the triangles

  • a triangle by joining a pair of corresponding legs such that the right angles meet, or

  • a parallelogram by flipping one of the triangles and then joining corresponding legs.

No other arrangement yields a 4-sided figure. If the triangles are not right triangles, only kites and parallelograms can be constructed as above. The only interpretation which allows a trapezoid is if you consider rectangles to be special forms of trapezoids.

The stated answer "D, all of the above" is just wrong.

Score: 6 non-trivial errors out of 25 problems: a kid with superior knowledge would likely score only 19 due to disagreeing with the wrong answers or careless wording, and thus barely exceed the standards.

Please try to do better.

metric

All the metric questions were straightforward. Miles are only used by a very small number of countries. We use liters and kilometers on a daily basis, so all our students would have no problems with those questions. I'm from Canada and I found the test pretty good. But, hey, maybe we think differently?

Swimming pool volume?

The swimming pool volume question was straightforward for you? I'm surprised to hear this. Certainly Google supports the view that kiloliters would be a reasonable answer: it and its twin sister cubic meters are well-represented on the net.

(Heh. I once lost a HS debate because the judges agreed with my opponent that I needed to cite a source for the statement that an aqueous solution of 1mg/L is the same concentration as 1 part per million.)

Thanks much for your comments!

Re: No. 24. It seems to be

Re: No. 24. It seems to be assumed that the corner that looks like 90 degrees is 90 degrees. Since it's not marked as 90 degrees, this is an assumption. If it is only very close to 90, then you can't make a rectangle, nor a triangle, but only a parallelogram. If it is 90 degrees, then you can.

True enough

I think this does fall somewhat to the level of nit-picking, though. It would have been easy to put the traditional right-angle symbol in the corner, and mark the equivalent sides, but perhaps the students haven't been taught these notations yet?

did seinfeld write this

did seinfeld write this blog? this is a blog about absolutely nothing of value. you're that smart-ass kid who wasted everyone's time in school with their asinine questions. these questions are all completely reasonable. you fail.

So many questions

OK, I admit it. I'm genuinely interested in your opinion.

I'd love it if you'd tell us a little bit about your background. Where are you in life, and what do you see in your future? What experiences have you had with and around education that inspired your comment? Why did my blog entry apparently disturb you?

As a University CS Professor, I'm much more likely to talk to people like the other commenters on this blog than people like yourself. I'm fascinated to hear more.

Horrible

The worst thing about this is that it actually penalizes the smarter students, who are able to pick up on the ambiguities or inaccuracies of the questions and become trapped by them. The students who are lucky enough to be stupid in the exact same way that the test-makers were stupid will have an advantage!

The end result is to teach not-so-smart students that they are smart, and teach smart students that school is an insane waste of time.

Yep. Let's see, with a B.S.

Yep. Let's see, with a B.S. in Mathematics and an M.S. in Computer Science, I had a hard time with those questions. They are not well-formed questions.

If I had kids I'd be sorely tempted to home-school.

PS - They're teaching transformations in grade 3? Really?

Number 4

I would have picked B for #4 since the first person is getting $2/chore where the second person is getting $1.50/chore, so on a per chore basis, the first person is paid more for doing less work.
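Here is that per-chore arithmetic as a quick Python sketch. It assumes, as I did, that Bill's floor counts as one chore and Mike's floor plus windows counts as two; the function name is just for illustration.

```python
from fractions import Fraction


def dollars_per_chore(dollars, chores):
    """Exact pay rate per chore, kept as a fraction to avoid rounding."""
    return Fraction(dollars, chores)


bill = dollars_per_chore(2, 1)  # $2 for washing the floor
mike = dollars_per_chore(3, 2)  # $3 for the floor plus the windows

print(f"Bill: ${float(bill):.2f} per chore")
print(f"Mike: ${float(mike):.2f} per chore")
print("Fewer chores, higher rate:", bill > mike)
```

Whether "more money" means the total or the rate is exactly what the question leaves unsaid.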

Reading not Math

To me the worst thing about this test is that it seems to be more reading comprehension than math, and a poorly done reading comprehension test at that. Personally I think the eroding of American education has been intentional. It's too egregious and ongoing to have been by chance.

On the other hand, I've been

On the other hand, I've been coaching my kids through this level of math, and aside from the translation question, all of the answers were straightforward given the way this material is presented to kids. Personally, I think the test reflects more the idiocy of math education rather than the test, which is the same style of question that most of the kids will have been taught with.

The pool question is particularly apt, since I would be quite surprised if even 10% of the kids answered kL (and even then, most of those would have been random guesses rather than reasoned answers), even though that is certainly the best answer. The question is there to test whether the kids have a rough understanding of which metric measures correspond to which English measures (at this grade level anyway), not whether they understand metric measures as such at all. That's for later. They expect the kids to answer in the following way: what is the English unit of measure I would use, and what is the metric measure that compares to that? So for pools, the logic is gallons -> liters. The distance question is another version of this: miles ~ kilometers.

As to having science PhD's review the test (which was of course written by people with PhD's in education), the next thing you know people will be insisting that science majors teach science in school rather than education majors.

I agree that it's as much the education as the test

My boy got a strong score on the computer version of the test yesterday. As you say, he is smart enough to echo back more or less what he is told. I just think it's sad that there isn't more emphasis on the kind of mathematical reasoning that is needed to solve these kinds of problems. I suspect that this is because the people creating the materials don't understand mathematical reasoning.

Really?

This is not that difficult. Smart kids would get them right.

You've missed the point

No one should necessarily "get them right". They're poorly-posed or trick questions.

I correctly guessed what answer the testers were looking for on five of these eight. But teaching kids that math is a kind of mind reading, like a mentalist act, seems to me to be pretty damaging to their chances of understanding math and enjoying it.

Awful Math Test

It's worse than you think. In problem 4 add the consistent sentence: Gail earned $5 for washing a window. Then the answer is clearly B. In problem 10, if D is a translation, then so are A, B, and C. The fact that they are also rotations is irrelevant. It would be nice if the heading was over the right column in 13. In problem 14 the answer is clearly zero since Saturday is a day of rest for Alija. And as for problem 22, when did Oregon start using water meters that measure liters instead of gallons?

Nice catch on 13

I can't believe I missed the bungled column headings in question 13. I was too busy concentrating on the other mistakes in that question. Can't any of the test creators make a decent table? It's not that hard. Friend of Bart

Innumerate

Let's see, back in 1976 I scored 770 on the SAT Math section and 77 on the SAT Math 2 Achievement Test, which allowed me to qualify to attend MIT.

With some similar quibbles as yours, I "failed" every question.

Question #4, with its gaping open-endedness, struck me as a possible rate problem: a dollar an hour and two hours for the floor, whereas one hour for the windows? Who can tell from the scanty description?

JJB
