Two new papers in NEJM try to offer alternatives to the use of race in creatinine eGFR models.

Background

See the NephJC summary for more details.

Our evolving understanding of labels related to race, ethnicity, and ancestry has resulted in a greater appreciation of the complexities and challenges of using race-related data in biomedical research (Bonham et al, JAMA 2019). Race, as a complex social construct, is a composite measure that involves more than just genetics or skin pigmentation (Sen and Wasow, Ann Rev Pol Sci 2016). It includes elements such as diet, social status, power relations, class, wealth, etc. Unfortunately, biomedical research often fails to recognize the variability inherent in data related to race. Many clinical prediction models inappropriately use race solely as a surrogate for genetics, without considering the other elements of race affected by health disparities, and so risk introducing a source of confounding bias that would go unrecognized even after internal validation. Thus, clinical prediction tools using race, ethnicity, or ancestry data need to balance fit and internal validation against unmeasured confounders that limit generalization and external validity.

In nephrology, this has led to scrutiny of the CKD-EPI creatinine model used to estimate glomerular filtration rate. This creatinine eGFR formula, published in 2009, makes use of race in addition to age and sex. The presumption was that race, as a stand-in for genetics, could directly explain variation in serum creatinine, but this unfortunately ignores other confounding elements of race (culinary traditions, occupations, socioeconomic status, etc) that can influence serum creatinine levels through effects on dietary intake and physical activity (Baxmann et al, CJASN 2008). As GFR is used for numerous clinical decision thresholds, there are appropriate concerns that the use of race may lead to biased estimates that exacerbate renal health disparities for Black people. Calls have been made to remove race from the creatinine eGFR formula to avoid discrimination, but others have raised concerns that removing race may reduce the accuracy of the equations. Two papers published in the NEJM this month attempt to address these challenges with different approaches.

New Creatinine- and Cystatin C–Based Equations to Estimate GFR without Race

This paper evaluates different eGFR equations without race, by reusing the original 2009 and 2012 development datasets used to model the CKD-EPI equations and validating the resulting equations against a newer dataset created in 2021.

The names and abbreviations for the various models can get confusing, so it is worth doing a brief review. From the supplement of the paper, we first look at the development and validation of the three original, or “current”, CKD-EPI formulas. The current creatinine equation (eGFRcr) was developed and validated in 2009, followed by the cystatin C equation (eGFRcys) and the creatinine-cystatin C equation (eGFRcr-cys) in 2012. The eGFRcys was developed and validated using only age and sex (AS), whereas the eGFRcr uses age, sex, and race (ASR). The combined eGFRcr-cys also uses age, sex, and race (ASR).

Although the current formulas were developed and validated using datasets from 2009 and 2012 respectively, with the advantage of time the paper now has access to a larger 2021 dataset. The goal of the paper was to compare various approaches attempting to remove race from the CKD-EPI equations, by measuring their accuracy (internal validity) using the 2021 dataset for validation.

The authors did so by proposing two different methods to remove race, resulting in several “new” equations (Table S2). The first method uses the existing (or current) equations as developed with age, sex, and race, but applies only the non-Black (NB) eGFR estimates to all patients regardless of race. This might seem odd (why develop a model with a race variable and then ignore it afterwards?), but it is intended to reflect the reality of many current-day clinical practices, where eGFR is still estimated with the original/current eGFRcr equation using age, sex, and race (ASR), but clinicians then choose to avoid race by using the non-Black eGFR for all patients (ASR-NB). The second method was to redevelop the creatinine models with only age and sex (AS), and validate them for all patients.
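To make the first method concrete, here is a minimal Python sketch of the 2009 CKD-EPI creatinine equation (ASR) next to its ASR-NB variant. The coefficients are the commonly cited 2009 values; they are not reproduced in the text above, so treat them as an assumption and verify them against the original publication before any use. The function names are ours and purely illustrative, and the refit AS equations from the new paper are not shown.

```python
def ckd_epi_2009_cr(scr_mg_dl, age, female, black):
    """2009 CKD-EPI creatinine eGFR (ASR), in mL/min/1.73 m^2.

    Coefficients are the commonly cited 2009 values (an assumption here;
    verify against the original paper before relying on them).
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine knot
    alpha = -0.329 if female else -0.411    # exponent below the knot
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159                       # the race coefficient under scrutiny
    return egfr


def ckd_epi_2009_cr_asr_nb(scr_mg_dl, age, female):
    """ASR-NB: same fitted equation, but the non-Black estimate for everyone."""
    return ckd_epi_2009_cr(scr_mg_dl, age, female, black=False)


if __name__ == "__main__":
    # Hypothetical patient: 50-year-old man with serum creatinine 1.0 mg/dL.
    print(round(ckd_epi_2009_cr(1.0, 50, female=False, black=True), 1))
    print(round(ckd_epi_2009_cr_asr_nb(1.0, 50, female=False), 1))
```

Note that ASR-NB changes nothing about how the model was fitted. It simply drops a fixed multiplier at the point of use, which is why its estimates shift only for patients who would otherwise have been classified as Black.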

Both methods were applied to the eGFRcr and the eGFRcr-cys equations. Of note, the original/current eGFRcys equation never used race to begin with, so “new” versions of this equation are not needed.

Comparing the accuracy of the new eGFR equations without race

Given the new equations, the next step requires evaluating the fit and internal validation of these various equations against the new 2021 validation dataset. Using directly measured GFR (mGFR) with iothalamate as the standard, the study provided three benchmarks to evaluate how well eGFR matched mGFR within and between each subgroup of Black and non-Black participants for each equation (Table 3).

The three benchmarks used to evaluate how well eGFR matched measured GFR:

BIAS: Median difference between measured GFR and eGFR
P30: Agreement within 30% of measured GFR
Classification: Percent agreement between measured GFR and eGFR categories
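
As a rough illustration of how these three benchmarks are computed, here is a minimal Python sketch. It makes several assumptions that are not taken from the paper: bias is taken as measured minus estimated GFR, P30 as the percentage of estimates within 30% of the measured value, and the categories as the usual GFR stages with cutoffs at 90, 60, 45, 30, and 15. The function names and the sample values are made up for illustration.

```python
from statistics import median


def gfr_category(gfr):
    """Map a GFR value to an assumed GFR stage (cutoffs 90/60/45/30/15)."""
    cutoffs = [(90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")]
    for lower, label in cutoffs:
        if gfr >= lower:
            return label
    return "G5"


def benchmarks(mgfr, egfr):
    """Return (bias, P30, classification agreement) for paired GFR values."""
    pairs = list(zip(mgfr, egfr))
    # Bias: median of (measured - estimated); the sign convention is assumed.
    bias = median(m - e for m, e in pairs)
    # P30: percentage of estimates within 30% of the measured value.
    p30 = 100 * sum(abs(e - m) <= 0.30 * m for m, e in pairs) / len(pairs)
    # Classification: percentage of pairs falling in the same GFR category.
    agreement = 100 * sum(gfr_category(m) == gfr_category(e) for m, e in pairs) / len(pairs)
    return bias, p30, agreement


if __name__ == "__main__":
    measured = [100, 60, 40, 20]    # made-up mGFR values
    estimated = [90, 65, 55, 20]    # made-up eGFR values
    print(benchmarks(measured, estimated))
```

All three are measures of agreement within the validation dataset. None of them can tell us whether that agreement would hold in a population with a different mix of unmeasured confounders.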

Of note, both new versions of the creatinine (eGFRcr) equation performed rather poorly. The Cystatin C (eGFRcys) equation fit best for the first benchmark (BIAS), whereas the combined Creatinine-Cystatin C (eGFRcr-cys) equation fit slightly better against the second (P30) and third (Correct Classification) benchmarks. The study then does an additional analysis to estimate population prevalence of CKD based on new equations (Table 4, not included). Unsurprisingly, equations that matched poorly with benchmarks for Classification resulted in larger variations for estimating population level CKD.

Limitations

For an article attempting to evaluate the use of race as a variable, there is a perplexing absence of any attempt to define race within the paper. The methods treat race as a data grouping with binary options of Black or non-Black. The discussion section admits that the use of binary race categories “does not adequately represent the diversity within and among racial groups,” and that the datasets were not “representative populations.” But the paper does not explore this further. Defining race is not just a teleologic exercise; it is an essential step in addressing the myriad potential confounders that link elements of race to serum creatinine measures.

This failure also results in the paper ignoring one of the biggest advantages of cystatin C based eGFR models. Namely, Cystatin C is a consistent measure unaffected by diet or activity. Where creatinine has a direct causal pathway for confounding bias by elements of race, cystatin C does not share the same pathways or magnitude of risk. It therefore offers much greater external validity and generalizability.

The authors note at the start of their discussion that “National kidney disease organizations recommend replacement of current eGFR equations by equations that do not use race and that are accurate, inclusive, and standardized in every laboratory in the United States.” Their suggested solution is to use the new combined eGFR equation with creatinine and cystatin C, because it had the best benchmarks during internal validation. They offer a convoluted argument that alternatives such as the cystatin C equation have less internal validity and may lead to erroneous CKD diagnosis on a population level, which is really more a criticism of how we define CKD (binary categorization of continuous data) than an indictment of any of the models at hand.

The authors also ignore issues with overfitting. Black participants made up 31.5% of the 2009 development data set, 39.7% of the 2012 development data set, and 14.3% of the 2021 validation data set. The authors note that “We had a smaller number of Black participants than non-Black participants in the validation set, so the estimates of accuracy may be less precise in Black persons, and we had an insufficient representation of racial and ethnic groups other than Black and White.” This is problematic. The entire argument in favor of the combined creatinine and cystatin C equations is their better fit, but if that fit was achieved on an unrepresentative sample with potential confounders related to one of the predictors (creatinine), then in effect we are admitting that the equation may not be a good fit for a truly representative sample.

Summary

Ultimately, the authors are trying to do their best with what is available. Unfortunately, the existing datasets have too many unmeasured confounders to salvage any use of creatinine as a predictor for eGFR. No amount of advanced statistics or internal modeling can solve a problem with external validity that requires better and more comprehensive data. Fortunately, there is an alternative to creatinine. The study shows that Cystatin C provides a robust estimation of GFR. The authors note that it is within the accepted 30% margin of error for clinical use, and the population estimates of CKD burden do not differ dramatically from expected outcomes.

Race, Genetic Ancestry, and Estimating Kidney Function in CKD

The second paper attempts to compare creatinine eGFR equations against cystatin C eGFR equations using the most recently available data from the Chronic Renal Insufficiency Cohort (CRIC) study. The goal was to try to accurately estimate GFR without the use of race. In addition to serum creatinine, serum cystatin C, and 24-hour urinary creatinine levels, the methods also included data on body-composition metrics (BMI, bioelectrical impedance, etc) along with reported and calculated daily dietary protein intake. The dataset also includes self-reported race and genetic ancestry markers. The conceptual framework for the predictors in relation to GFR is outlined in Figure S2.

Defining Genetic Ancestry

The authors provide a new predictor in genetic ancestry. The genetic ancestry marker provides an estimate for a person's geographic origin by modeling genotype markers against a reference standard (Table S1):

“Genotyping was conducted using the Illumina HumanOmni1-Quad v1.0 microarray. A general admixture model was derived using individuals from the 1000 Genomes Project as the reference data. A cluster size of five was selected based on previous studies and verified by comparing the log-likelihood of candidate models to the CRIC data. The five clusters correspond to the five super-populations in the reference data that include African, American, European, East Asian and South Asian.”

Results of the genetic ancestry distribution (Figure S3) highlight the mix of geographic origins for self-identified Black and non-Black individuals in the study dataset.

Ancestry and Creatinine

Ancestry data is reported incrementally per 10% change. Table S6 highlights differences by Black/non-Black race versus African ancestry. An important caveat is required when interpreting ancestry data: we cannot assume geographic ancestry equals genetics, or assume outcomes associated with geographic ancestry are causal. For example, participants of self-reported Black race (Table S6) on average weighed 9.32 kg more than those of non-Black race. This difference may be due to sampling error, or it may reflect true differences at the population level. Any such population-level difference would be considered a health disparity, because we understand that differences are likely due to unmeasured social and physical hazards that negatively affect minorities, rather than any intrinsic differences due to race. Similarly, when we see that weight increases 1.15 kg per 10% higher African ancestry, we should keep in mind this is an observed association and not a causal analysis. Geographic ancestry is subject to the same confounders of health disparities as race, and population-level differences should not be assumed to be causal. For renal measurements, we should be cautious about population-level assumptions that the 13% higher creatinine in self-reported Black participants is caused by genetic elements of race, or that the 1.6% higher creatinine per 10% higher African ancestry is caused by genetic elements of geographic origin.
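
For readers unfamiliar with “per 10%” coefficients, the short Python sketch below shows how such a figure scales across the ancestry range. It assumes the association compounds multiplicatively, which is our assumption for illustration and not something stated above, and the function name is hypothetical. The output describes an observed association only and says nothing about cause.

```python
def relative_creatinine(pct_african_ancestry, pct_per_step=1.6, step=10.0):
    """Creatinine relative to 0% African ancestry under a per-step association.

    Assumes the reported 1.6% difference per 10% of ancestry compounds
    multiplicatively (an illustrative assumption, not the paper's model).
    Purely descriptive: this encodes an association, not a causal effect.
    """
    return (1 + pct_per_step / 100) ** (pct_african_ancestry / step)


if __name__ == "__main__":
    for pct in (0, 10, 50, 80, 100):
        print(pct, round(relative_creatinine(pct), 3))
```

Under this assumption, a high proportion of African ancestry corresponds to a difference of roughly the same size as the one reported for self-reported Black race. That is what we would expect if the two variables track the same underlying population differences, confounders included.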

Some might find the avoidance of causal conclusions for genetic ancestry contradictory. After all, shouldn't ancestry improve genetic associations related to geographic origins? The answer depends in part on the outcome being predicted. We would expect genetic ancestry to predict sickle cell or G6PD better than race does, since both are associated with malaria found in specific geographic regions, and ancestry offers less admixture of geographic origin than race. On the other hand, if we looked at predicting lead exposure, we would not expect any difference in prediction between race and ancestry, since the majority of hazards are related to health disparities rather than genetics.

Evaluating the performance of creatinine eGFR models with and without race and ancestry

The paper compares the use of race or genetic ancestry in creatinine eGFR models, and finds that they perform about equally well (Table 2). The median difference between directly measured iothalamate GFR (iGFR) and estimated GFR (eGFR) was greatest when race was removed from the model. Adding race reduced this difference. Using ancestry instead of race reduced it by a similar amount (the confidence intervals roughly overlap), though the point estimates using race are slightly better than those for ancestry.

Next, the paper evaluates whether the creatinine eGFR model could replace race or ancestry with various body composition metrics. Unfortunately, although the metrics were able to improve eGFR prediction, they were not fully able to account for the differences in creatinine noted by race (12.8% to 8.7%) or ancestry (1.6% to 1.1% per 10% higher African ancestry).

Evaluating the performance of Cystatin C eGFR models with and without race and ancestry

Cystatin C eGFR models performed best with age and sex. There was no meaningful improvement with the use of race or ancestry (Table 4). Neither race nor African ancestry was independently associated with cystatin C levels. The eGFR models using cystatin C without race had approximately the same accuracy (internal validity) as those using creatinine with race (Table 2).

Main Conclusion: Use cystatin C

The discussion section of this paper has a mix of some valid and some problematic points, but the authors' final conclusion here is direct and fully supported by their data: “the use of serum cystatin C rather than serum creatinine for GFR estimation produced estimates of similar validity while eliminating the negative consequences of race-based approaches.”

Limitations

For the second time, we have an article evaluating the use of race data that fails to offer any definition of race. And, again, this failure leads to a complete absence of any discussion of the many elements of race, or any exploration of potential confounders related to elements of race and serum creatinine measures. For example, in Figure S2 the paper outlines a causal diagram of how race, mediated by body composition and diet, affects serum creatinine levels. But if we define elements of race to include socioeconomic factors such as occupation, income, education, etc, then we immediately appreciate the need to consider these elements as confounders that could affect our measures of serum creatinine. Given that the CRIC dataset includes information on educational levels, income, smoking status, etc, it seems incredible that the authors wasted an opportunity to adjust for these factors and evaluate any potential influence on serum creatinine levels. Instead, Table 3 is limited to adjusting for purely “biologic” metrics.

More importantly, an appreciation of these confounders would give us serious pause about the external validity of eGFR based on serum creatinine. Instead we have conclusions such as the one highlighted below:

“Although estimation of GFR based on the serum creatinine level can be imprecise at an individual patient level, our data do not support removing the race coefficient from serum creatinine–based GFR estimating equations because this would add systematic misclassification and further degrade the accuracy of GFR estimates, in particular among persons who identify as Black.”

In fairness, the authors are correct that use of race does improve accuracy, or internal validity, of the creatinine eGFR model. But without discussing risks of confounders to external validity, it is unfathomable that the paper would make any conclusions on predictive accuracy for populations or individuals outside the dataset. To make matters worse, this problem is further compounded in the discussion section when comparing ancestry data to race:

“We found that when the serum creatinine level was used to estimate the GFR, incorporation of genetic ancestry provided estimates of GFR similar to those based on race as reported by the participants. One advantage of using genetic ancestry information is avoidance of highlighting of race-based categorization that may exacerbate systemic discrimination in health care. Furthermore, it rids GFR estimation of categorical characterizations of race (“Black” and “non-Black”) that do not reflect ancestry admixture.”

Without any discussion of why race-based categorization may exacerbate systemic discrimination, the paper makes the bold claim that genetic ancestry data is somehow exempt from this problem. Presumably, as noted in the next sentence, race-based categorization is bad because it does not reflect ancestry admixture. This is a very unfortunate series of statements, and erroneous for numerous reasons. First, an association between ancestry and serum creatinine (or any marker) cannot be assumed causal, as ancestry can be subject to the same confounders that affect many elements of race. Second, the paper itself showed that genetic ancestry was not a better predictor of eGFR than race (it's literally the first statement in the highlighted paragraph), so it remains unclear why the paper suggests that being “rid” of ancestry admixture offers any value. Third, if genetic ancestry did capture some genetic causal effect on creatinine, we would expect some improvement in prediction when using this information. But as ancestry data offers no improvement in prediction over race, the paper's own data suggest that there is unlikely to be any genetic explanation for variability in creatinine levels.

The paper does backtrack somewhat in the following paragraphs, agreeing that “replacing race with genetic ancestry data … may arouse concerns related to cost, privacy, and perpetuation of the incorrect notion that race reflects a specific biologic construct.” The failure to address any of the criticisms above does not take away from the paper's final conclusion that cystatin C remains the best option for eGFR, but it does miss an opportunity to further highlight the limitations of using creatinine. Furthermore, even though this is unaddressed in the paper, the results provided are possibly the strongest evidence yet against the notion that variability in creatinine is due to genetics.

Final Thoughts
Both of these studies offer excellent and transparent methods, and the authors deserve credit for their work and analysis. Evaluating models without race is important for providing clinicians with tools that can be used confidently without concern about exacerbating discrimination or health disparities. Ultimately, nephrology is fortunate that it can sidestep the myriad problems with creatinine measures, and instead use an alternative in cystatin C. But addressing data related to race, ancestry, and ethnicity should require a higher standard of editorial and peer review. It seems unfathomable that two papers could both fail to define race and discuss the elements of race affected by health disparities, resulting in muddled and uninformed inferences despite some excellent methods.

So, to recap, creatinine-based models are problematic because they are:

Developed and validated from unrepresentative populations,
Not appropriately adjusted for confounding elements of race (culinary traditions, occupations, socioeconomic status, etc), and
Lacking sufficient external validity for generalization.

Use Cystatin C and avoid Creatinine eGFR models.

Commentary by Raj Mehta, MD, who is board certified in family medicine and clinical informatics. He earned his medical degree from the University of Florida College of Medicine. He is committed to using technology to improve how healthcare is delivered, and is a dedicated clinician and educator.