Wednesday, October 18, 2006

More on the Lancet study on deaths in Iraq

When you have a set of expectations, and you check them against reality, what do you do when reality is wildly different from what you expect?

Well, you look over what you did... you make sure the measurements were rock solid. Then, if they are, you accept that your estimate is the best you can do. You discard your expectations, not reality.

As I explained in an earlier post, the survey that suggested 650,000 excess deaths in Iraq was done using good methods. Clusters of people were investigated at random, meaning that any given cluster of people was as likely to be investigated as any other cluster. It found that about 4% of them had died in the 40 months after the invasion, instead of the nearly 1.6% that would have been expected.
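The arithmetic behind that comparison is easy to sketch. This minimal Python example uses only the two rounded fractions quoted above (about 4% observed, about 1.6% expected). The population passed to the hypothetical `excess_deaths` helper is an input you would supply, not a figure from the survey.

```python
def excess_mortality(observed_fraction: float, expected_fraction: float) -> dict:
    """Compare the observed share of people who died with the share expected at pre-war rates."""
    return {
        "excess_per_1000": (observed_fraction - expected_fraction) * 1000,
        "ratio": observed_fraction / expected_fraction,
    }


def excess_deaths(population: int, observed_fraction: float, expected_fraction: float) -> float:
    """Scale the excess fraction to a population (hypothetical input, not from the survey)."""
    return population * (observed_fraction - expected_fraction)


summary = excess_mortality(0.04, 0.016)
print(f"{summary['excess_per_1000']:.1f} excess deaths per 1,000 people")
print(f"observed mortality is {summary['ratio']:.1f}x the expected level")
```

With these rounded inputs the excess works out to about 24 deaths per 1,000 people, with observed mortality roughly 2.5 times the expected level.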

There are a lot of expectations that were shattered by this, and there were a lot of questions raised, but this is the best estimate we have, the only one that could possibly give an accurate count of the number of deaths in Iraq.

There have been a lot of people who've raised a lot of objections about the report. People have pointed to Iraq Body Count's numbers and asked how they could be off by more than a factor of 12. Similarly, people have asked: if over 80% of the deaths recorded in the survey had death certificates, what happened to all of those certificates? Why doesn't the Iraqi government know about them?

Really, this ends up coming down to the same question. Iraq Body Count obtains its numbers from news sources, and checks them via surveys of hospitals and morgues, relying on certified deaths. They can reasonably be expected to know about any deaths that the Iraqi government knows about, and vice versa.

So what happened? Why doesn't the Iraqi government know about these deaths? Why can't the Iraqi government say that this survey is a solid estimate, if it is?

Well, if this were America, and we were dealing with deaths from disease, and the government were willing to let these numbers be known, then I'd be surprised if the government didn't have a decent idea of the number of deaths. But it's not America; it's Iraq. And these aren't deaths from disease, they're deaths from violence, and the number of deaths points to a huge amount of instability. Nations that have enough bean-counters to track every single death certificate also tend to have enough police to prevent this level of slaughter. If you had to run Iraq, and had to choose between security and tracking death certificates, where would you expend the most energy?

Of course, this isn't proof that the death certificate tracking is terrible in Iraq. Nevertheless, it puts us in a slightly different position.

We've looked at reality, and it's different from our expectations, which were set by death certificates counted by the government and statistics gathered by Iraq Body Count. Unless we knew, with certainty, that the death certificate tracking was rock solid, we'd have to be willing to discard those expectations. If we knew that deaths were tracked with great precision in Iraq, and the tracking agency's figures were far off from the study's, then we'd want to investigate both this study and the certificate tracking, and see which one was better.

However, in this case, we know that the study is sound, and we have damn good reason to believe that death certificate tracking isn't. Until our knowledge of how death certificates are tracked in Iraq changes, the proper position to take is to assume that the study is correct, and the other estimates are not.

However, there would be a way to challenge this. Let's suppose that Iraq Body Count has found names and locations for 60% of the people whose deaths it has tracked. If we took a random sample of 500 deaths that IBC would want to count, and IBC could account for approximately 300 of them (60% of 500), then we'd know that IBC has a pretty good tally, and we'd have to dig very deeply to figure out what's gone wrong. If, on the other hand, IBC could only account for about 30 of them, it would indicate that the study is right: IBC's estimates are off by an order of magnitude.
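The check described above boils down to comparing the number of name matches against what a complete tally would produce. Here is a minimal sketch, assuming (as the hypothetical in the text does) that IBC has names and locations for 60% of the deaths it records and that matching is done only by name and location. The helper names are hypothetical, and the interval uses a simple normal (Wald) approximation that is my choice for illustration, not anything from the study.

```python
import math


def implied_coverage(sample_size: int, named_fraction: float, matched: int) -> float:
    """Share of all deaths IBC appears to have recorded.

    Hypothetical assumption: IBC has names and locations for `named_fraction`
    of the deaths it records, so if it recorded every death, about
    sample_size * named_fraction of a random sample would match.
    """
    expected_if_complete = sample_size * named_fraction
    return matched / expected_if_complete


def coverage_interval(sample_size: int, named_fraction: float, matched: int, z: float = 1.96):
    """Rough 95% range for the coverage, via a normal (Wald) approximation to the match rate."""
    p_hat = matched / sample_size
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return (p_hat - z * se) / named_fraction, (p_hat + z * se) / named_fraction


for matched in (300, 30):
    share = implied_coverage(500, 0.6, matched)
    low, high = coverage_interval(500, 0.6, matched)
    print(f"{matched} matches: IBC recorded ~{share:.0%} of deaths "
          f"(roughly {low:.0%} to {high:.0%}), undercount factor ~{1 / share:.0f}")
```

With 30 matches the rough range stays well below 20% coverage, so sampling noise alone would not explain a tenfold gap. The sketch ignores any uncertainty in the 60% figure itself.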

Let me point something out: the only way we could find our random sample of 500 deaths would be to use the methods that were used in this study: going to random locations, and asking random households about deaths that had occurred. Any other method would not yield a truly random sampling of deaths.

Ok, I'll even begin the discussion.

Based on the Lancet methodology, would you recommend one begin the error analysis of the data based upon a Gaussian Distribution, a Binomial Distribution, a Poisson Distribution, or one that I haven't suggested, and why?
You can begin any discussion you want, but you will no longer get a response.

For those who are curious, Kevin is referring to three methods of modeling situations. The Gaussian distribution is the "bell curve" you've all heard about. The binomial distribution is the distribution that is actually occurring; each person has a particular chance of dying, or living, over a period of time, and has either lived or died. The study can be viewed as an attempt to find the chance that a particular person had died. (In fact, its base measurement was the crude death rate, not the probability of living or dying. The two are, of course, directly related.)
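One common way to connect a death rate to a probability of dying is to assume the rate stays constant over the whole period, so that the probability is 1 - exp(-rate × time). That constant-rate assumption is a simplification of mine, not necessarily how the survey did its calculations, and the 4% input is the post's rounded figure, so the output is illustrative only. The function names are hypothetical.

```python
import math


def probability_from_rate(rate_per_person_year: float, years: float) -> float:
    """Chance of dying within `years`, assuming the death rate stays constant."""
    return 1.0 - math.exp(-rate_per_person_year * years)


def rate_from_probability(probability: float, years: float) -> float:
    """Constant annual death rate that yields `probability` of dying within `years`."""
    return -math.log(1.0 - probability) / years


period_years = 40 / 12  # the 40-month window mentioned in the post
rate = rate_from_probability(0.04, period_years)
print(f"{rate * 1000:.1f} deaths per 1,000 people per year")  # about 12.2
print(f"{probability_from_rate(rate, period_years):.3f}")       # round trip: 0.040
```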

The Poisson distribution is related to both, but is often used to model rare events, with an implicit assumption that two events can't occur at exactly the same instant. It's the limit of a binomial distribution as n goes to infinity with np (number of trials times probability of success) held fixed at lambda (the parameter of the Poisson distribution).
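That limit can be checked numerically. This sketch compares Binomial(n, lambda/n) with Poisson(lambda) for growing n; lambda = 4 is an arbitrary, hypothetical choice, and the helper names are mine. It needs Python 3.8+ for `math.comb`.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials); math.comb returns 0 when k > n."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def max_gap(n: int, lam: float, k_max: int = 20) -> float:
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam) for k <= k_max."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1))


lam = 4.0  # hypothetical expected count, held fixed as n grows
for n in (10, 100, 1000, 10000):
    print(n, f"{max_gap(n, lam):.6f}")
```

The gap shrinks roughly in proportion to 1/n, which is the "np held fixed, n to infinity" limit in action.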

The binomial distribution is very close to the Gaussian distribution, to the point that it's accepted practice to substitute one for the other under certain circumstances (roughly, when n is large and p is not too close to 0 or 1; a common rule of thumb asks that both np and n(1 - p) be at least 5). A meaningful danger of doing this is that the Gaussian distribution is unbounded on both sides, and the binomial distribution is not: it is bounded by 0 (no trials 'succeeded') and n (the number of trials).
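To see the boundedness problem concretely, this sketch measures how much probability a Gaussian approximation to Binomial(n, p) places on impossible negative counts. It borrows the post's rough 4% as p purely for illustration; the sample sizes, the continuity correction, and the helper names are hypothetical choices of mine.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative probability of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def impossible_mass(n: int, p: float) -> float:
    """Probability the Gaussian approximation to Binomial(n, p) puts on counts below 0.

    Uses a continuity correction: the count 0 covers (-0.5, 0.5), so anything
    below -0.5 corresponds to a negative, impossible, count.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf(-0.5, mean, sd)


def rule_of_thumb_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Common heuristic: np and n(1 - p) should both be at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


p = 0.04  # the post's ~4% chance of death, used purely for illustration
for n in (50, 1000):
    print(n, rule_of_thumb_ok(n, p), f"{impossible_mass(n, p):.1e}")
```

With only 50 people the approximation fails the rule of thumb and puts a few percent of its probability below zero; with 1,000 people that mass becomes negligible.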

In this case, when looking at the average rates of death over 47 clusters, one expects the mean death rate to be approximately Gaussian-distributed, because of the central limit theorem: the mean of 47 independent random variables will tend towards a Gaussian distribution. We expect a very good fit, given the random variables in question and the number of samples.
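A quick simulation shows the central limit theorem doing this work. The sketch below deliberately draws skewed per-cluster death rates (exponential, mean 1, in arbitrary units) and averages 47 of them. The exponential shape, the treatment of clusters as independent and identically distributed, and the helper names are all assumptions for illustration, not a model of the actual survey data.

```python
import math
import random
import statistics


def skewness(xs: list[float]) -> float:
    """Sample skewness: average cubed deviation divided by the cubed standard deviation."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, m)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd**3)


def simulate(n_clusters: int = 47, reps: int = 20_000, seed: int = 1):
    """Hypothetical per-cluster rates drawn from a skewed (exponential, mean 1) distribution."""
    rng = random.Random(seed)
    single = [rng.expovariate(1.0) for _ in range(reps)]
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n_clusters)])
             for _ in range(reps)]
    return single, means


single, means = simulate()
half_width = 1.96 / math.sqrt(47)  # 1.96 standard errors; the exponential's SD equals its mean (1.0)
coverage = sum(abs(m - 1.0) <= half_width for m in means) / len(means)
print(f"skewness of one cluster's rate:      {skewness(single):.2f}")
print(f"skewness of the mean over 47:         {skewness(means):.2f}")
print(f"share within +/-1.96 SE of true mean: {coverage:.3f}")
```

A single cluster's rate is strongly skewed (theoretical skewness 2), while the mean of 47 is much closer to symmetric (about 2/sqrt(47), or 0.29), and close to 95% of the simulated means land within 1.96 standard errors, as a Gaussian would predict.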

Kevin is likely to complain that I haven't mentioned how I would "begin the error analysis". He's free to do so; I have more important things to do with my life than play quiz kid to someone whose respect I do not value.
Bravo, maybe you do have some college--that's the first hint of education you've shown.
