Lower standards and mathematical shenanigans

In the 1st Danish report, additional patients are derived by increasing inaccuracy. Of course, if I paint a larger bull’s-eye over the target, I will have more “hits.” Likewise, if I lower the standards for patient selection, I will find more “patients” … because I make more unacknowledged errors.

Both the Regeneron and Danish reports inserted inflated groups into the underlying data, which spiked prevalence results. In the Regeneron example, an anomalous concentration of genuine carriers was inserted into the results. (See their explanation of ascertainment bias and their inclusion of Lancaster, PA.) With the 1st Danish report, the larger instance of inflation consisted of false positives. The way this inflation took place in the Danish report was suspicious.

The basis for the industry’s Authoritative report was purported to be this 1st report. The authors used lowered standards of diagnosis to reach a prevalence of 1:137. This more than tripled the established prevalence estimate.[1]

It was previously held among the scientific community that FH has a prevalence of 1:500. It is also widely known that in clinical scoring systems mistakes are often made. A passing score only tells a doctor that those above a given cutoff point look like they might be carriers of a mutation. However, many of those with a passing score cannot actually be found to carry a mutation when subjected to genetic testing. 

The standard DLCN clinical scoring system categorizes results as follows:[2]

Classification of HeFH   Score
Definite                 >8
Probable                 6-8
Possible                 3-5
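
For concreteness, the categorization can be written out as a simple function. This is my own minimal sketch of the published bands above, assuming integer DLCN scores; the “unlikely” label for scores below 3 is my addition for completeness, not something discussed in this section:

```python
def dlcn_category(score: int) -> str:
    """Map an integer DLCN score to its standard HeFH category."""
    if score > 8:
        return "definite"
    if 6 <= score <= 8:
        return "probable"
    if 3 <= score <= 5:
        return "possible"
    return "unlikely"  # below 3; label assumed here, not discussed in this section
```

Written this way, the dispute that follows is easy to state: a combined “definite/probable” result should only include scores of 6 and up, so a cutoff of 5 silently pulls score-5 patients out of the “possible” band.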

Damgaard et al.[3] performed a genetic study of those who scored in the top clinical category and found that roughly 1 in 3 (37% of those categorized as “definite HeFH”) could not be found to carry a mutation. That is with a score of over 8. What happens if our cutoff is 6?

Classification of HeFH   Percent not found to have a known mutation after molecular testing
Definite                 37%
Probable                 65%
Possible                 78%

As we move down the scoring system to a cutoff of 6, loosening standards, the next category down showed 2 patients in whom no mutation was found for every 1 in whom a mutation was found. That’s 2 out of 3. As far as a pharmaceutical company would be concerned, a move to scores between 6 and 8 flips the risk-benefit ratio on its head: there are more patients without an identified mutation than there are patients with identified mutations. If this data holds, and if risk-benefit ratios are of some value, why would I choose to add in the “Probable” category? Why use a cutoff of 6 and not >8? … or at least provide some explanation for opting for lower standards?

And the next lower category, “Possible,” has a failure rate of 78%. The 1st report did not just lump the first two categories together, which, using Damgaard’s results, would average roughly 50% (the simple average of 37% and 65% is 51%); the report also took a slice from this lower category, with its 78% failure rate, and blended it in, declaring a prevalence rate of 1:137. As previously pointed out, this more than tripled the established estimate.
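
To make the blending arithmetic concrete, here is a minimal Python sketch. The mutation-negative rates are Damgaard’s figures from the table above; the category weights are hypothetical placeholders, since the report’s per-category patient counts are not reproduced here:

```python
# Mutation-negative ("failure") rates by DLCN category, from Damgaard et al.
FAILURE_RATE = {"definite": 0.37, "probable": 0.65, "possible": 0.78}

def blended_failure_rate(weights: dict) -> float:
    """Weighted mutation-negative rate for a blend of DLCN categories.

    `weights` maps category -> share of the blended cohort; shares must sum to 1.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(share * FAILURE_RATE[cat] for cat, share in weights.items())

# Equal shares are a placeholder assumption, not the report's actual counts.
print(blended_failure_rate({"definite": 0.5, "probable": 0.5}))  # 0.51, i.e. ~50%
# Adding any slice of "possible" (78%) can only push the blend higher:
print(blended_failure_rate({"definite": 0.4, "probable": 0.4, "possible": 0.2}))  # 0.564
```

Whatever the true weights, any blend that reaches down the scale can only raise the mutation-negative share above the 37% of the top category.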

  • If we lower the standards for accuracy, do we end up with more carriers of LDLR mutations, or do we disproportionately inflate the count with false positives?

But let’s look at the way that this slice was included in the results. On page 1, we read: “Main Outcome Measures: FH (definite/probable) was defined as a Dutch Lipid Clinic Network score higher than 5.”  Higher than 5 is 6, and 6 is the traditional number. This is the first reference to 6 as the floor for the “Probable” category.

6 was printed, but 5 was used off-text

On page 2 of the 1st report, the cutoff point for the “Probable” category was said to be the industry standard, 6.[4] And we see that the category for “Possible” already has a claim to 5 for its ceiling, so the next category up must begin with 6. These are the second and third references to the number 6 as the floor for “Probable.”

Supplementary Table 1 also puts the cutoff for “Probable” FH at 6. And “Possible,” again, has a ceiling of 5, leaving the floor for the next category up as 6. These are the fourth and fifth references.

Supplementary table clearly puts the cutoff for the Probable category at 6
Ambiguous listing of the cutoff point for the Probable category

But straddling pages 3 and 4, in the “Results,” this floor is given as both 5 and >5 in the same sentence. Which one was used for the report? Which one was in error? For a cutoff point, >5 would be 6, and 5 would be 5. But then again, the report combined “Definite” and “Probable” for the result of 1:137, and here the explicit floor for the “combined” categories is >5 … which is 6. This is the sixth reference to 6 as the floor, and this is the first and only reference to 5. And we note that 5 is only mentioned with the floor for the uncombined “Probable.” This means that every reference for the combined result is to 6. What would a knowledgeable professional think upon a casual reading of this text? … especially given that “6” is already the standard.

Add to this the fact that the range for the next lower category, “Possible,” is explicitly recorded as 3-5 points. And to say it again: since the ceiling for “Possible” is 5, the “Probable” category cannot also use it without double-counting patients. By process of elimination, “Probable” must begin at 6. This is the seventh reference to 6.

When I initially read the 1st report, the Corrigendum did not yet exist. I took the single mention of 5 as the typo, and not the other seven references to 6. I am confident that other readers did too. But if this cutoff of 5 was just a typo, then the results would not have actually been derived from the cutoff of 5. But they were. We later learn, after the Authoritative report cites this 1st report, that the actual cutoff was not the oft-mentioned 6 but the once-mentioned 5. Why? The standard cutoff was 6.

Here is the table from the correction:

The Corrigendum for the first Danish report clarifies what actually happened

They had printed “6” as their DLCN cutoff point, but actually used 5 on the data underlying the printed result. It wasn’t that the “6” was a typo; it’s that the greater part of their labor — crunching the data — used the wrong number: 5. 

Essentially, they put “cutoff 6 = prevalence of 1:137” on the table, while under the table they had arrived at 1:137 by using the cutoff point of 5.  

Was this an accident? Let’s review the facts. In the 1st report there were seven references to 6 as the floor for the “Probable” category. There was only one reference to 5. Why would we take that single reference as correct and the other seven as the typos? Especially when we remember that 6 is the well-known, widely published standard cutoff for “Probable.” We expect an explanation for deviation from the standard. The off-text calculation used 5 as the floor, a break from the standard of 6, and so even if the number “6” were typed or implied incorrectly seven different times, we would still expect some compensatory references in the general text regarding the novel use of 5. For example, if a mechanic built a non-standard 7-cylinder engine but accidentally painted the expected “V8,” meaning 8 cylinders, on the hood of the finished car, we would still expect some discordant language in his description of the odd 7-cylinder engine he actually built. And who builds a 7-cylinder engine and doesn’t talk about the novelty? Likewise, there was no explanation in the 1st report for having broken with the DLCN standard and for using the nonstandard, unpublished 5 under the printed text of “6.” There were four authors.

Are we to believe that the choice of a non-standard cutoff point, the one with the key leverage over the mathematical outcome, endured as a long, sustained error? Who could spend the central part of the labor with the wrong number of cylinders? It doesn’t sound right: “The ‘V8’ I painted on the hood of the car was not the mistake. I accidentally built an unusual 7-cylinder engine.” As far as FH is concerned, the difference between 6 and 5 was the difference between doubling and trebling the established prevalence rate.

This is the state of affairs when the Authoritative report cites the 1st report as the source for a doubled FH prevalence.[5] In the Authoritative report, there is no specific mention of using the number 6 in the text while using 5 in the off-text math.

And this is why the results do not match up between the Authoritative report and its purported source, this 1st report: the Authoritative report simply cites this 1st report but prints different results. The criteria used in one are not the criteria used in the other. The only remaining actual source for the Authoritative report, since the corrigendum does not yet exist, is a personal communication, in a caption to an illustration, with the lead author of the selfsame Authoritative report.[6] It is essentially a self-citation, by word of mouth. An “error” is vaguely acknowledged as a “slight overestimation.” The difference between the new denominator, 223, and the old denominator, 137, is 63%.

There is, however, a later apology for having used the cutoff of 5 off-text while predominantly using 6 in the text. The authors issued a “Corrigendum.” In the move from 5 to 6 as the off-text cutoff, they took out the slice from the lowest of the three categories used, but still blended in the other two categories.

Corrigendum and apology

So after the caption in the Authoritative report, they wrote again, one year later, in this Corrigendum: “The consequence of this misclassification is a slight overestimation” and that “this doesn’t change the main conclusion of the paper.” And it’s true: the “conclusion” is still inflated with false positives. “Misclassification” sounds as if it were as simple as hitting the wrong key on a keyboard, but the error had been with the execution of the methodology, a sustained effort, not with data entry in a final draft of a report. This is not insubstantial. It is the difference between the momentary act of reporting and the sustained and purposeful act of “building.” One could expect a casual slip with the former, not with the latter.

The differences in prevalence estimates are not slight

On the right, I’ve put together a table representing the different mathematical outcomes of the different prevalence rates: (1) the standard minimum of 1:500 estimated by a Nobel Prize winner, (2) the authors’ corrected 1:223, and (3) the 1st report’s use of the nonstandard cutoff of 5: 1:137. These differences are not “slight.” The denominator 223 is 63% higher than 137, and 1:223 nonetheless still doubles the established prevalence estimate of FH.
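
These ratios are easy to verify. A quick arithmetic check in Python, using only the three figures already given in the text:

```python
# Prevalence figures from the text, expressed as 1-in-N (smaller N = higher prevalence).
ESTABLISHED = 500  # the long-standing 1:500 estimate
CORRECTED = 223    # the authors' corrected 1:223
REPORTED = 137     # the 1st report's 1:137, derived with the cutoff of 5

print(ESTABLISHED / CORRECTED)            # ~2.24: the corrected figure still doubles 1:500
print(ESTABLISHED / REPORTED)             # ~3.65: the original figure more than triples it
print((CORRECTED - REPORTED) / REPORTED)  # ~0.63: the 63% gap cited above
```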

And what was the reason for lowering the clinical floor of the “Probable” category? Why one number and not the other? If one lowers the standard, wouldn’t one take it for granted that there will be more errors … more false positives in the result? Where is the accounting for this? In shifting from 6 to 5, did the study find new mutation carriers or simply increase the error rate beyond acceptability? But then, even after making this “correction,” don’t we still have the same problem? Without a concern for false positives, does the mere exercise of blending the lower category in with the highest actually find more mutation carriers, or have we simply inflated the result with a disproportionate number of errors? If 5 wasn’t acceptable, why settle for the nonetheless low standard of 6 and not 8 or 9? By definition, lowering a standard for accuracy increases errors.
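
One way to make this concern concrete is a positive-predictive-value calculation (Bayes’ rule). The sketch below is mine, not from any of the reports discussed here; the prevalence, sensitivities, and specificities are hypothetical placeholders chosen only to show the mechanism:

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of screen-positives who are true carriers (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers for illustration: a rare condition (1:500) and a
# lowered cutoff that trades a little specificity for extra sensitivity.
strict = positive_predictive_value(1 / 500, sensitivity=0.70, specificity=0.999)
loose = positive_predictive_value(1 / 500, sensitivity=0.90, specificity=0.990)
print(strict)  # ~0.58: most screen-positives are genuine carriers
print(loose)   # ~0.15: false positives now swamp the true carriers
```

For a rare condition, even a small drop in specificity adds false positives drawn from the huge pool of non-carriers, which is exactly the disproportionate inflation the questions above ask about.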

If I lower standards, I include more errors
  • Summary: Stepping backwards to view a tomato from 20 yards does not make it a red apple. Likewise, lowering standards for diagnosis does not increase a patient population. It inflates the count with false positives.

[1] This new estimate will be trimmed back to a “double” in the Authoritative report.

[2] Nordestgaard, et al. Familial hypercholesterolaemia is underdiagnosed and undertreated in the general population: guidance for clinicians to prevent coronary heart disease. Consensus Statement of the European Atherosclerosis Society.

[3] Damgaard, Dorte, et al. The relationship of molecular genetic to clinical diagnosis of familial hypercholesterolemia in a Danish population.

[4] https://academic.oup.com/jcem/article/97/11/3956/2836467/Familial-Hypercholesterolemia-in-the-Danish?searchresult=1

[5] Although the 1st report actually trebles the rate, the caption in the Authoritative report “corrects” the number, resulting in a double, so to speak.

[6] Regarding these reports, see “Citation Kiting” on this page.