Thorns, odds, and the impossible

A few years ago I was walking on a disused path in some woods near where I live, when I noticed a small branch that seemed to have attached itself to my foot. When I looked more closely, I saw about an inch of thorn sticking up through my boot just in front of the ball of my foot.

This was a genuine, serious hiking boot, with an inch and a half to two inches of combined Vibram outsole and orthotic insole, and a Gore-Tex and nylon upper. Needless to say, I was dumbfounded. How had a thorn managed to penetrate all that? More to the point, how had it penetrated my foot without my feeling it?

I pulled out the thorn; it was easily four or five inches long and almost a quarter of an inch thick at its base. A honey locust, I figured, although I hadn’t seen one with thorns quite that big. As I tossed the branch away from the path, it occurred to me that I’d better get a look at my injury before too long. A little way further up the path I found a convenient log, sat down, and gingerly removed the shoe, fully expecting to see a slowly expanding patch of red where the thorn had come through. When I looked, I realized why I hadn’t felt anything.

The damned thing had passed precisely between my big toe and its neighbor as far back as it could without hitting flesh. When I say precisely, I mean there was no evidence of its passage whatsoever — not blood, not broken skin, not so much as a minor scratch.

What, as they say, were the odds of that happening? Well, I maintain that, since it had actually occurred, the odds must have been 100%.

You could calculate the odds as a hypothetical exercise, taking into account such variables as the average number of dead branches small enough to go unnoticed on a disused path, the percentage of those likely to have huge thorns, and the probability of such a thorn lying at the precise angle required to use the force of a footfall to penetrate a sturdy shoe. You’d also have to factor in the width of the path, the length of my stride, the size of the shoe, the total area of the sole, and so on. Then you could come up with some number, which would surely be vanishingly small.
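Just to make the point concrete, here is a back-of-the-envelope sketch of what such an exercise might look like, with every number invented out of thin air:

```python
# Back-of-the-envelope estimate of the thorn incident.
# Every number below is invented purely for illustration.

p_branch_on_path = 0.05   # a given stride lands near an unnoticed dead branch
p_huge_thorn     = 0.01   # that branch carries a thorn this large
p_right_angle    = 0.001  # the thorn sits at exactly the penetrating angle
p_under_sole     = 0.30   # the footfall actually lands on it
p_between_toes   = 0.02   # the entry point threads the gap between two toes

# Treating the factors as independent (a big simplification),
# the combined probability is just the product:
p_event = (p_branch_on_path * p_huge_thorn * p_right_angle
           * p_under_sole * p_between_toes)

print(f"Estimated probability per stride: {p_event:.1e}")  # ~3.0e-09
```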

And yet, it happened. You might be familiar with the concept of the black swan, popularized in a book of that title by Nassim Nicholas Taleb. I’m not interested in the failings of statistics which do not take all of the significant variables into account; it may be true that probability calculations can be improved by using the proper data. My point is that even when all knowable variables are taken into account, you can still end up on the wrong side of the conclusion.

Why is that? It’s simple. Statistics are descriptive, not predictive. They describe, in detail, past events in contexts similar to the one you’re interested in. In the end, any conclusion you draw is based on inductive reasoning, which by its nature is vulnerable to data gaps. When an event actually occurs, such as my adventure with the thorn, it becomes data, and statistical inference is irrelevant to it. The question, “What are the odds of that?” is pointless.

Does that mean that judging risk on the basis of probability is useless? Not at all. But it is why the severity of a negative outcome is so important in the decision process.

If I have a 10% chance of spilling wine on my shirt, that’s not going to stop me from drinking some. But if I have a 10% chance of dying if I get Covid-19, that’s a different story.
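This is the standard idea of expected loss: probability times severity. A minimal sketch, with severity measured in arbitrary made-up units:

```python
# Minimal sketch of severity-weighted risk. The severity scores are
# arbitrary illustrative units, not real measurements.

def expected_loss(probability, severity):
    # The classic risk formula: expected loss = probability x severity.
    return probability * severity

wine_on_shirt = expected_loss(0.10, severity=1)         # a ruined shirt
covid_death   = expected_loss(0.10, severity=100_000)   # catastrophic

print(wine_on_shirt)  # 0.1     -- same odds...
print(covid_death)    # 10000.0 -- ...utterly different decision
```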

Disease by the numbers

Every morning I check the Covid-19 stats for the state and county I live in. Every day the numbers get bigger and the picture grimmer, even when things are improving. How can this be?

First of all, let me dispel the notion that I want to downplay the danger. Far from it; I fully support efforts to get people to wear masks in public places, to avoid large groups, and to keep a reasonable distance apart when interacting. I support those measures being made mandatory when necessary. I hear people say that they’ve “done their time” in lockdown, and that it seemed to them that the threat turned out to be much less than the government let on.  Setting aside for the moment the question of what motivation there would be for the government to impose lockdown, except to keep the pandemic under control, these people miss the obvious fact that the measures they complain about are exactly why the direst predictions never materialized.

But those issues have been dissected and debated abundantly; there’s no reason to add my two bits beyond what I have already written. My interest here is in information and the extent to which it is useful.

Keeping a running total of infections doesn’t seem to be very useful. You might find it helpful if your motive is to keep the sense of crisis alive, but even that is questionable. There is no shortage of published articles on crisis fatigue. At a certain point there’s simply an overload, and the human alert system shuts down. Eat, drink, and be merry, as the saying goes, for tomorrow we die.

We need a way to assess how many people are actually infectious at any given time. In my county, for example, just over a thousand cases have been reported since the beginning in March. Something over sixty have died.  But I can’t easily find how many of those cases have recovered.

So, out of that thousand, you can subtract the deaths, which are statistically minuscule. Wouldn’t it be nice if you could also subtract the number of recovered, and therefore no longer infectious, cases? I know those numbers are available, but why can’t we see a figure for the current cases that pose an actual potential threat? Shouldn’t that number be front and center?
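For what it’s worth, the arithmetic is trivial; the missing ingredient is the data. A sketch using the rough figures above and a made-up recovery count:

```python
# Sketch of the "current threat" number the dashboards don't show.
# The confirmed and death figures are the rough ones quoted above;
# the recovered count is a made-up stand-in for the missing data.

confirmed = 1_000   # total reported cases since March (approximate)
deaths    = 60      # approximate deaths to date

def active_cases(confirmed, deaths, recovered):
    # Cases still potentially infectious: total minus resolved outcomes.
    return confirmed - deaths - recovered

# If, say, 850 of those cases had recovered, the headline figure
# would look very different:
print(active_cases(confirmed, deaths, recovered=850))  # -> 90, not 1,000
```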

Guns, revisited

A few days ago, I received a response to my post about guns, disagreeing with my general premise that the easy availability and sheer plethora of guns in America were responsible for a large part of the gun violence in this country. My immediate reaction was to fire off a reply reiterating my view. Then, a bit later, I thought I should add some statistics to bolster that view, and I found a website that gave me exactly what I needed: challenges to the major arguments against gun control, with statistics and citations for the studies behind them. One thing nagged at me, though: the site was Mother Jones, an openly partisan outlet for left-leaning ideas. I decided to do a little more research, just to be on the safe side.

But a curious thing happened. The deeper I dug, the less clear things became. I don’t mean I was tempted to change my views; I mean I was having trouble finding truly convincing support for either side of the argument. Don’t get me wrong; there was no shortage of sites claiming to have the definitive facts on the subject. If I wanted a page that proved beyond a doubt that guns are the problem, it was easy to find one. The problem was that it was equally easy to find a site that proved beyond a doubt that, far from being the problem, guns were the solution. There was unmitigated cherry-picking on both sides. For example, one site noted that Finland, which has the fifth-largest number of guns per capita in the world, also has an extremely low rate of gun crime. It neglected to point out that Finland also has a very rigorous system of gun registration and control. Another site repeated the often-cited statistic that there have been 181 school shootings since Columbine, but it seemed to have counted 120 events that either were not at schools or did not involve guns, leaving only 61. You might say that even 61 is unacceptable, and you’d be right, but this kind of misrepresentation only weakens the credibility of the source.

Even the seemingly unimpeachable was no help. Pro-gun sites often cited the statistic that over the last ten years, gun ownership has gone up while gun crime has gone down; anti-gun sites cited data showing that where gun ownership in the country has dramatically increased, so has gun crime. Which of these statistics is true?

Both, it turns out. But the problem with both is an old bugbear of statistics: correlation is not necessarily the same as causation. In the former instance, if you break down the ten-year span, it is difficult to line up increases in gun ownership with lower crime in local settings. In the latter, it isn’t clear which came first, the increase in violence or the rise in gun ownership.
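A toy example, using entirely invented numbers, shows how an aggregation trap of this kind (a Simpson’s-paradox pattern) lets both claims be true at once:

```python
# Toy Simpson's-paradox illustration with invented numbers: within each
# hypothetical state, gun crime rises as ownership rises, yet the pooled
# data show the opposite correlation. None of these figures are real.

from statistics import correlation  # requires Python 3.10+

# (ownership rate, crime rate) over three years, per hypothetical state
state_a = [(10, 50), (12, 53), (14, 56)]   # low ownership, high crime
state_b = [(80, 10), (82, 13), (84, 16)]   # high ownership, low crime

def corr(points):
    xs, ys = zip(*points)
    return correlation(xs, ys)

print(corr(state_a))            # +1.0: locally, more guns, more crime
print(corr(state_b))            # +1.0: same story here
print(corr(state_a + state_b))  # strongly negative: pooling flips the sign
```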

So, what to do? Is it even possible to find a source that is impartial? In the end, I did find one: FactCheck.org. This is a group of journalists dedicated to checking the statements of politicians for truth, and they spare no one, regardless of political affiliation. Of course, they only check the statements of politicians, but this issue is so politically charged that there was no shortage of relevant information. Their gun page is full of statements checked against academic studies, government statistics, and news sources. After analyzing all of the statements concerning guns and gun violence, they came to a startling conclusion:

Given currently available statistics, it is impossible to determine unequivocally what, if any, effect the number of guns in America has on gun crime.

Nobody, it turns out, keeps the kind of records needed for definitive conclusions. The FBI, the most reliable source, keeps extensive records on gun violence, but excludes justifiable incidents, which it broadly defines as incidents in which the shooter felt an immediate threat. This, most obviously, rules out almost all shootings by police, but also many by civilians, whether their perception was accurate or not, and whether they were telling the truth or not. George Zimmerman’s killing of Trayvon Martin, for instance, would not be included in this kind of statistic. Other records are incomplete and need to be correlated with each other to make sense, often with the result that studies with vastly different levels of thoroughness, even competence, are compared.

Furthermore, there is no good database of accidental shootings; those numbers must be gleaned from news sources. There is no comparison of how gun regulation affects these issues; most studies focus on simple ownership of guns. Finally, there is no clear study of how gun ownership affects the mental state of people involved in confrontations, and how often these escalate into violence.

Personally, I still lean strongly toward gun regulation; I don’t see the value of allowing just anyone to keep guns, I don’t see why guns should not be registered, and I find it hard to ignore the case of Finland. I am also aware that, in spite of all the rhetoric about crime and safety, the biggest factor in the minds of many activist gun owners is government; that’s why there’s so much emphasis on “taking our guns away,” which has never been a serious proposal by gun control advocates. Simply put, they fear that without guns, and lots of them, the government will take away their freedom. The implication is clear: they consider armed insurrection a viable option. Forgive me if I find that chilling.

Still, some good, reliable statistical information on these issues is sorely needed. In the end, it is no longer acceptable, if it ever was, to find a site we trust and just go with whatever it says. There is no site that is reliably impartial all the time, on all issues, and the data are simply not available. We, as a country, need to collect the kinds of data that can lead to better conclusions, and we need to commission better studies. The reports need to be transparent and include a full discussion of methods and sources.

“But I can show you ten sites with exactly that kind of information!” you may be saying. I know you can. And so can the guy on the other side, who you think is an idiot.

This should be disturbing to both sides of the debate.


ADDENDUM: About statistics

When you’re looking at stats, there are two rules to keep in mind. The first is GIGO: garbage in, garbage out. The reliability of a statistic is no better than the data used to generate it. The second is the old Interrogation Rule: if you torture data long enough, they will tell you what you want to hear. This second rule applies not only to conscious efforts to distort reality, but to unconscious factors like confirmation bias as well. It has long been noted that if you begin a statistical study hoping for, or even just expecting, a particular outcome, the chances of getting it are excellent.
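A quick simulation makes the point; the threshold here is a crude stand-in for a significance test, and all the data are pure noise:

```python
# Simulation of the Interrogation Rule: run enough tests on pure noise
# and some will look "significant" by chance alone. No real data here.

import random

random.seed(0)
TRIALS  = 100   # a hundred unrelated "hypotheses"
SAMPLES = 30    # observations per group

false_positives = 0
for _ in range(TRIALS):
    # Both groups are drawn from the SAME distribution: no real effect.
    a = [random.gauss(0, 1) for _ in range(SAMPLES)]
    b = [random.gauss(0, 1) for _ in range(SAMPLES)]
    diff = abs(sum(a) / SAMPLES - sum(b) / SAMPLES)
    if diff > 0.5:  # a crude stand-in for "p < 0.05"
        false_positives += 1

print(f"'Significant' findings out of {TRIALS} null tests: {false_positives}")
# A handful of 'discoveries' appear even though nothing is there.
```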

So what to do? Do poor old lay people like us just throw up our hands and despair of ever being able to evaluate statistics? Not at all; there are simple ways to do this. Unfortunately, they are not easy. One way is to read what opponents trying to debunk a study say about it. If you want a rigorous argument, your enemy is really your best friend, because they will point out the weaknesses unerringly. You also have to learn to ask questions yourself. Have any possible factors been left out? Are there gaps in the sample? If so, it doesn’t mean you have to disregard the study, but it does mean you need to find corroborating studies, preferably ones using a different data set, but at least ones that are explicit about the nature of the data and the kinds of statistical methods used.