Thursday, February 7, 2008

Scientists and citations

A few years ago, some less than scrupulous students got caught cheating at the university I attended. It was a multiple choice test, the kind Bush and the proponents of No Child Left Behind seem to think can measure anything other than the ability to take a multiple choice test...but I digress.
The students who cheated took the test in the evening section of the class and got mostly wrong answers. The problem was that nearly all of their wrong answers were the same. A little more investigating found that they had gotten ahold of the morning test, assumed it was the same as the evening test and used those answers, exposing their cheating.
A paper I ran across recently by M.V. Simkin and V.P. Roychowdhury used similar logic to test whether scientists actually read the papers they cite in their own articles, or just read reviews of subjects and cited the articles the reviewer used as if they had read the actual literature. Their method of catching the non-reading scientists is similar to catching the test cheaters... look for the same mistake repeated. Simkin and Roychowdhury specifically looked at errors in the citation of page numbers...a fairly easy mistake to make when manually entering a citation, but one that is rarely exactly the same as someone else's.
The practice was found to be rampant enough that mistakes have been copied multiple times. Simkin and Roychowdhury found that one article, popularly cited but poorly read, was miscited a total of 78 times. Using the distribution of the repeated miscites, they created a model that predicts that the percentage of citers who actually read the original paper is as low as 20%.
There are several caveats to this hypothesis, however. Some mistakes, such as transposing digits, are more common than others. Different people could make such a mistake independently, and each repeat would still be counted as a copied mistake.
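To make the counting idea concrete, here is a rough sketch in Python. It is not the authors' model, just a back-of-the-envelope version under one loud assumption: every distinct wrong page number was introduced by one person who looked at the original, and every repeat of it is a copy. The page numbers and the function name are made up for illustration.

```python
from collections import Counter

def estimate_reader_fraction(cited_pages, correct_page):
    """Crude estimate of the fraction of miscited entries that are 'originals'.

    Assumption (mine, not the paper's): each distinct wrong page number
    was typed by one person, and every repeat of it was copied.
    """
    # Keep only the citations whose page number is wrong.
    misprints = [page for page in cited_pages if page != correct_page]
    if not misprints:
        return None  # nothing to estimate from
    distinct = Counter(misprints)
    # Distinct misprints divided by total misprints.
    return len(distinct) / len(misprints)

# Hypothetical toy data: correct first page is 1181.
toy_citations = [1181] * 20 + [1811] * 6 + [1118] * 2 + [1191] + [1187]
toy_fraction = estimate_reader_fraction(toy_citations, correct_page=1181)
print(toy_fraction)
```

In the toy data there are 10 miscitations but only 4 distinct mistakes, so the sketch reports 4/10. Because two people can transpose the same digits on their own, a count like this will tend to treat some independent mistakes as copies and so lean low.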
This paper came out in 2002. Nowadays, with the use of bibliography programs like EndNote, which automatically download the correct citations from a database, these mistakes are much less common, so it would be hard to repeat this test. In any case, read the paper for yourself and don't trust my review. Here's the (correct) citation:

M.V. Simkin and V.P. Roychowdhury, "Read before you cite!", Complex Systems, 14 (2003) 269