The Petrie Multiplier

One of the memes currently floating around Facebook, and therefore probably the rest of the interwebs, is the Petrie Multiplier. A friend of mine pointed it out to me, and also pointed out a followup piece on a Petrie Multiplier variant simulation. It is worth pointing out that, as far as I can tell, the eponymous Karen Petrie created the model but neither named nor published it; thus it hardly seems fair to ascribe blame to her for any of its problems. And IMHO it has problems…

The Petrie Multiplier model specifically purports to at least partially explain the perceived and (to some degree) measured frequency of sexist attacks against women in tech, using a purely statistical approach that ascribes statistically identical behavior to men and women. The basic idea, as far as I can tell, is something like this:

1. Assume a population distribution high in males.
2. Assume that men and women make sexist remarks against the opposite gender at the same low-ish rate (but none against their own gender).
3. Assume that men and women enter pairwise conversations randomly without regard to gender: that is, assume that conversational pairs are chosen uniform-randomly from the space of pairs of possible participants. (The model can be extended to larger conversational groups, but it doesn't seem to change the conclusions much, and it's a pain to deal with mathematically.)
4. Note that since a given woman encounters a man in conversation much more frequently than a given man encounters a woman, a woman is much more likely than a man to hear at least some given number of sexist remarks in a fixed number of conversations, even given assumption 2.

OK, let me pause to say some things here. First, I absolutely believe that the problem of male sexism against women in tech is real, pervasive and serious. I believe this based on first-hand encounters, on first-hand stories from my female friends, and on the published accounts of many women who have been victimized. I also absolutely believe that our gender ratios are terrible: indeed I have done some work to try to improve this situation.

That said, you may have guessed that I'm not too fond of the Petrie Multiplier model as an explanation or prescription for this important problem. I have a bunch of reasons for this.

First and foremost, mathematical or simulation models are worthless until they are tied to empirical measurements of the real-world phenomena being modeled. So far as I am aware, there is no such work on Petrie Multipliers. I should just stop there.

Foolishly continuing, I think many of the assumptions of the model (assuming I understood it correctly above) are highly questionable. I doubt that either men or women enter conversations in an entirely gender-independent fashion. I find it very likely that both men and women make occasional sexist attacks against their own gender. I doubt that a uniform-weighted model in which "all men and women are equally randomly sexist" is even a vague approximation to what's going on in the real world. I could probably come up with other such problems.

Finally, upon consideration the whole premise seems kind of crazy. Of all the defenses I've ever heard of male sexism in tech, I can't recall one person saying "hey, women are sexist against us too." Maybe I just don't get around enough.

However, such abstract critiques are somewhat moot. Let's have some fun with the Petrie Multiplier model by exploring some of the math underlying it: at least that way we'll learn some math, and we will be led to an interesting conclusion that probably isn't what its creators intended. Whee!

(Sadly, there's no obvious easy way to do decent math in HTML in 2014. This is insane, but it's where we are. So bear with me as I do stupid HTML math. If your browser doesn't do subscripts and some standard HTML entities, you lose: sorry.)

OK, today we are going to be interested in some probabilities:

p_m is the probability that a given member of the population is male.

p_sm is the probability that a given male in the population says something sexist in a given conversation.

p_sf is the probability that a given female in the population says something sexist in a given conversation.

p_hm(i, n) is the probability that a given male in the population hears something sexist during exactly i out of n conversations.

p_hf(i, n) is the probability that a given female in the population hears something sexist during exactly i out of n conversations.

Our job is to find p_hm(i, n) and p_hf(i, n) as functions of p_m, p_sm and p_sf.

I am not a mathematician; thus, I suspect I'm missing some obvious possibilities. For one thing, in the limit as the number of conversations gets large, I suspect that there is some continuous approximation. For another thing, I suspect there's an easy closed-form Bayesian model for all of this. Meh. I'm a computer scientist, and I know a binomial model when I see one.

OK, let's write down the probabilities here. First, let's hit some simple cases. The probability that a female hears one sexist attack in one conversation is easily computed.

p_hf(1, 1) = p_m · p_sm

By the properties of probability this means that

p_hf(0, 1) = 1 - p_m · p_sm

The probability that someone hears more sexist attacks than conversations is zero (in this model), as is the probability that someone hears a negative number of sexist attacks.

p_hf(i, n) = 0 [i < 0 or i > n]

OK, let's take these as base cases for a nice recursive definition of p_hf. For the recursive case, note that if a female has heard i attacks in n conversations, it is because either she heard i attacks in the first n - 1 conversations and none in the nth, or she heard i - 1 attacks in the first n - 1 conversations as well as one in the nth.

p_hf(i, n) = p_hf(0, 1) · p_hf(i, n - 1) + p_hf(1, 1) · p_hf(i - 1, n - 1)

Whew. A parallel construction applies to p_hm. I won't repeat it here.

OK, rather than try to solve this function to get a closed form, I'm just going to give up and write a computer program that can evaluate it. You can find it on my Github as petrie.py. I wrote it in Python in order to make it easy to memoize it for performance (needed for large n) and because I'm teaching a Python class right now.
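For readers who want to see the shape of such a program, here is a minimal sketch of the recursion above with memoization. It is an illustration, not the contents of petrie.py: the names `make_p_heard` and `p`, and the choice of `functools.lru_cache`, are assumptions. The sketch also uses an n = 0 base case (zero conversations means zero attacks heard, with certainty) in place of the n = 1 cases above; the two give the same values.

```python
from functools import lru_cache


def make_p_heard(q):
    """Return p(i, n): the probability of hearing a sexist remark in exactly
    i of n conversations, where q is the per-conversation probability
    (q = p_m * p_sm for a female listener, (1 - p_m) * p_sf for a male)."""

    @lru_cache(maxsize=None)
    def p(i, n):
        # Impossible counts: negative, or more attacks than conversations.
        if i < 0 or i > n:
            return 0.0
        # Zero conversations: certainly zero attacks heard.
        if n == 0:
            return 1.0
        # Either no attack in the nth conversation, or exactly one in it.
        return (1.0 - q) * p(i, n - 1) + q * p(i - 1, n - 1)

    return p


p_m, p_sm, p_sf = 0.8, 0.2, 0.2
p_hf = make_p_heard(p_m * p_sm)
p_hm = make_p_heard((1.0 - p_m) * p_sf)

print(p_hf(1, 1))                            # single conversation: p_m * p_sm
print(sum(p_hf(i, 50) for i in range(51)))   # should be very close to 1
```

The recursion goes n levels deep, so for n much beyond Python's default recursion limit (around 1000) an iterative table would be the safer choice.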

Let's just run it and see what things look like. We'll start with the model from Gent's Petrie Multiplier example: p_m=0.8, p_sm=0.2 and p_sf=0.2. For comparability, we'll use n=50 as the conversation count. Here's the output. In each row, column i is the probability of i sexist attacks heard, starting with 0. Actually, that's not right: I've gone a step further and tabulated the cumulative probability, so column i is the probability of ≤i sexist remarks heard, starting with 0.

    i =     0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
    phm:  .130 .400 .677 .861 .951 .986 .996 .999 1.000
    phf:  .000 .002 .009 .031 .081 .168 .292 .441 .593 .728 .834 .907 .952 .978 .990 .996 .999

Note that we aren't directly comparable with the original, since we've used a somewhat different model than Gent/Petrie. In particular, we don't count multiple sexist attacks per conversation; we also don't have a particular number of sexist remarks, so we're calculating the more general probability.

Still, the conclusions of the Petrie Multiplier model seem to hold. The fiftieth percentile for males is somewhere between 1 and 2, while for females it is somewhere between 7 and 8. This is even worse than the quadratic behavior reported there.
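As a cross-check on the table and the percentiles, here is a small sketch that swaps the recursion for the closed-form binomial distribution, which gives the same numbers for this model. The function names (`binom_pmf`, `cumulative`, `median_count`) are illustrative assumptions, and `median_count` uses one particular convention: the smallest i whose cumulative probability reaches 0.5.

```python
from math import comb


def binom_pmf(i, n, q):
    """Probability of exactly i hits in n independent trials of probability q."""
    return comb(n, i) * q**i * (1.0 - q) ** (n - i)


def cumulative(n, q):
    """List c where c[i] is the probability of hearing <= i remarks in n conversations."""
    total, out = 0.0, []
    for i in range(n + 1):
        total += binom_pmf(i, n, q)
        out.append(total)
    return out


def median_count(n, q):
    """Smallest i whose cumulative probability reaches 0.5."""
    for i, c in enumerate(cumulative(n, q)):
        if c >= 0.5:
            return i
    return n


p_m, p_sm, p_sf, n = 0.8, 0.2, 0.2, 50
q_f = p_m * p_sm            # per-conversation probability for a female listener
q_m = (1.0 - p_m) * p_sf    # per-conversation probability for a male listener

print("phm:", " ".join(f"{c:.3f}" for c in cumulative(n, q_m)[:9]))
print("phf:", " ".join(f"{c:.3f}" for c in cumulative(n, q_f)[:17]))
print("medians:", median_count(n, q_m), median_count(n, q_f))
```

Under that convention the medians land on the upper end of each bracket quoted above, since the cumulative probability first crosses one half there.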

However (and this is where it gets interesting), let's think about what this model is telling us. Specifically, it is telling us that conversational sexism is (in this model) primarily a function of gender ratio and that the principal way to fix it is to somehow force more gender equality in conversational spaces. I don't like that conclusion much, since it comes with an obvious corollary: improving male behavior dramatically won't (in this model) have a great effect on conversational sexism. This is a Bayes' Rule sort of effect, and may not be obvious at first glance, so let's try another simulation. This time we'll cut male sexism by a factor of more than 3, to p_sm=0.06, while holding everything else constant and see what effect that has on our problem.

    i =     0    1    2    3    4    5    6    7    8
    phm:  .130 .400 .677 .861 .951 .986 .996 .999
    phf:  .085 .301 .567 .782 .909 .968 .990 .998 .999

Yes, the situation has improved quite noticeably. However, the females are still substantially worse off than the males. The cynic's message: even if we could achieve a perhaps-unattainable 3x reduction in an obnoxious and rare behavior, we wouldn't have succeeded in closing the sexism gap…so behave as you like!
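One rough way to put numbers on the remaining gap: in a binomial model the expected number of conversations containing a sexist remark is just n times the per-conversation probability. The sketch below computes that for both scenarios, along with the male rate at which the two genders would hear remarks equally often. The helper names are illustrative assumptions, and the break-even figure is a property of this toy model only.

```python
def expected_heard(n, q):
    """Mean of a binomial: expected number of conversations with a remark."""
    return n * q


def break_even_p_sm(p_m, p_sf):
    """Male rate at which both genders hear remarks equally often:
    solve p_m * p_sm == (1 - p_m) * p_sf for p_sm."""
    return (1.0 - p_m) * p_sf / p_m


p_m, p_sf, n = 0.8, 0.2, 50
for p_sm in (0.2, 0.06):
    female = expected_heard(n, p_m * p_sm)
    male = expected_heard(n, (1.0 - p_m) * p_sf)
    print(f"p_sm={p_sm}: female {female:.2f}, male {male:.2f}")

print(f"break-even p_sm: {break_even_p_sm(p_m, p_sf):.3f}")
```

With these inputs the expected counts are 8 versus 2 at p_sm=0.2 and 2.4 versus 2 at p_sm=0.06, and the break-even rate is 0.05: in this model the male rate has to drop by the full gender ratio (4x here) before the expected counts match.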

That's all I have time for tonight. (More, really.) Flame me as you will, or just provide insightful critique. (B)