As a scientist, what should I do when I encounter a seemingly fundamental problem that also seems strangely unfamiliar? Is it unfamiliar because I am up to something really new, or am I re-discovering something that has been around for centuries, and I have just missed it?
This is a short story about an exploration that began with such a problem, and led to this manuscript. It began one day when I was pondering the idea of probability:
Scientists often estimate the probability (P) of an event, where the event can be a disease transmission, a species extinction, a volcano eruption, a particle detection, you name it. If P is estimated, there has to be some uncertainty about the estimate -- every estimate is imperfect, otherwise it is not an estimate. I felt that the uncertainty about P has to be bounded: I couldn't imagine a high estimate of P, say P=0.99, associated with high uncertainty about it. The reason is that increased uncertainty about P means a broader probability density of P, which means that the mean (expected value) of that density shifts towards P=0.5. Conversely, as P approaches 0 or 1, the maximum possible uncertainty about P must decrease. Is there a way to calculate the bounds of uncertainty exactly? Has anybody already calculated this?
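To make the intuition concrete, here is a minimal sketch that takes variance as one possible measure of uncertainty (an assumption made for illustration; the manuscript may well use a different measure). For any distribution on [0, 1] with mean m, the squared value never exceeds the value itself, so the variance cannot exceed m(1 - m). The function names below are hypothetical.

```python
def max_variance(mean):
    """Upper bound on the variance of any distribution on [0, 1] with this mean.

    Hypothetical helper: variance is used here as a stand-in for "uncertainty".
    """
    if not 0.0 <= mean <= 1.0:
        raise ValueError("mean must lie in [0, 1]")
    return mean * (1.0 - mean)


def two_point_variance(mean):
    """Variance of the extreme case: mass `mean` at P=1 and mass 1-mean at P=0."""
    second_moment = mean * 1.0**2 + (1.0 - mean) * 0.0**2
    return second_moment - mean**2


# The bound shrinks as the mean moves away from 0.5 towards 0 or 1.
for m in (0.5, 0.9, 0.99):
    print(m, max_variance(m), max_variance(m) ** 0.5)
```

Under this assumption, the largest possible standard deviation drops from 0.5 at a mean of 0.5 to about 0.1 at a mean of 0.99, and the two-point distribution shows that the bound is actually reached.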
First, I did some research. Wikipedia: nothing. I hit the books: Jaynes's Probability Theory, Johnson et al.'s Continuous Univariate Distributions, statistical textbooks, all the usual suspects: nothing. Web of Science: a bunch of seemingly related papers, but no exact hit. ArXiv: more seemingly related, semi-legible papers by physicists and statisticians, but not exactly what I was looking for. After some time I put the search on hold.
Then one day, for a completely different reason, I was skimming through John Harte's Maximum Entropy Theory and Ecology, and it struck me. Maybe there is a MaxEnt function that can give the maximum uncertainty about P, given that we only have the value of P and nothing else. Back to the web and, finally, I found something useful by UConn mathematician Keith Conrad: a MaxEnt probability density function on a bounded interval with known mean.
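As a rough illustration of the idea (not necessarily the function used in the manuscript), the standard maximum-entropy density on [0, 1] with a prescribed mean is a truncated exponential, proportional to exp(lam * p). The sketch below, in which all names are hypothetical, finds lam by bisection and returns the resulting differential entropy, which can be read as the largest uncertainty compatible with that mean.

```python
import math


def mean_of_density(lam):
    """Mean of the density proportional to exp(lam * p) on [0, 1]."""
    if abs(lam) < 1e-4:
        return 0.5 + lam / 12.0              # series expansion near the uniform case
    if lam < 0:
        return 1.0 - mean_of_density(-lam)   # mirror symmetry around 0.5
    return 1.0 / (-math.expm1(-lam)) - 1.0 / lam


def solve_lambda(mean, lo=-1e6, hi=1e6, iterations=200):
    """Find lam such that the density has the requested mean (bisection).

    The mean increases monotonically with lam. The default bracket covers
    means between roughly 1e-6 and 1 - 1e-6.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_of_density(mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_normaliser(lam):
    """Log of the integral of exp(lam * p) over [0, 1], computed stably."""
    if abs(lam) < 1e-4:
        return lam / 2.0 + lam**2 / 24.0     # series expansion near lam = 0
    if lam > 0:
        return lam + math.log(-math.expm1(-lam)) - math.log(lam)
    return math.log(-math.expm1(lam)) - math.log(-lam)


def max_entropy(mean):
    """Differential entropy of the maximum-entropy density with this mean."""
    lam = solve_lambda(mean)
    return log_normaliser(lam) - lam * mean


for m in (0.5, 0.9, 0.99):
    print(m, max_entropy(m))
```

With a mean of 0.5 the density is uniform and the differential entropy is 0; it falls to roughly -3.6 at a mean of 0.99, which matches the intuition that a P close to 1 leaves little room for uncertainty.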
I adjusted the function for my problem and started to implement it in R, drafting some documentation on the side, and I realized that I had actually started working on a paper!
Then the doubts came. I discussed the problem with my colleagues, but I didn't learn much -- there were many well-intentioned suggestions, but no advice hit the core of the problem, which is: Is the idea any good? And anyway, as a wee ecologist with little formal mathematical education, dare I actually enter the realm of probability theory, where the demi-gods have spoken pure math for centuries?
Then I plucked up the courage and sent my draft to two sharp minds whom I admire for the depth of their formal thinking (and for their long beards): Arnost Sizling the mathematician, and Bob O'Hara the biostatistician. For a week nothing happened, then I received two quite different and stimulating opinions: a semi-sceptical review from Arnost, and a semi-optimistic email from Bob. I tried to incorporate their comments, but I am still unsure about the whole thing.
I don't have a clue whether and where such a thing should be published. It feels too long and complex for a blog post, yet somehow too trivial for a full-length research paper; it is perhaps too simple for a statistical journal, yet too technical for a biological journal.
And so I've decided to pre-print it on PeerJ for everybody to see. I guess that it could be the perfect place for it. If it is really new, then it is citable and I can claim priority. If it is all bullshit, then people will tell me (hopefully), or they will perhaps collaborate and help me to improve the paper, becoming co-authors. It is an experiment, and I am curious to see what happens next.