- Research
- Open access
LI-RADS-based hepatocellular carcinoma risk mapping using contrast-enhanced MRI and self-configuring deep learning
Cancer Imaging volume 25, Article number: 36 (2025)
Abstract
Background
Hepatocellular carcinoma (HCC) is often diagnosed using gadoxetate disodium-enhanced magnetic resonance imaging (EOB-MRI). Standardized reporting according to the Liver Imaging Reporting and Data System (LI-RADS) can improve EOB-MRI interpretation but is rather complex and time-consuming. These limitations could potentially be alleviated using recent deep learning-based segmentation and classification methods such as nnU-Net. This study aims to create and evaluate an automatic segmentation model for HCC risk assessment according to LI-RADS v2018 using nnU-Net.
Methods
For this single-center retrospective study, 602 patients at risk for HCC were included, who had dynamic EOB-MRI examinations between 05/2005 and 09/2022 containing ≥ LR-3 lesion(s). Manual lesion segmentations, encoded in semantic segmentation masks as LR-3, LR-4, LR-5, or LR-M, served as ground truth. A set of U-Net models with 14 input channels was trained using the nnU-Net framework for automatic segmentation. Lesion detection, LI-RADS classification, and instance segmentation metrics were calculated by post-processing the semantic segmentation outputs of the final model ensemble. For the external evaluation, a modified version of the LiverHccSeg dataset was used.
Results
The final training/internal test/external test cohorts included 383/219/16 patients. In the three cohorts, LI-RADS lesions (≥ LR-3 and LR-M) ≥ 10 mm were detected with sensitivities of 0.41–0.85/0.40–0.90/0.83 (LR-5: 0.85/0.90/0.83) and positive predictive values of 0.70–0.94/0.67–0.88/0.90 (LR-5: 0.94/0.88/0.90). F1 scores for LI-RADS classification of detected lesions ranged between 0.48–0.69/0.47–0.74/0.84 (LR-5: 0.69/0.74/0.84). Median per-lesion Sørensen–Dice coefficients were between 0.61–0.74/0.52–0.77/0.84 (LR-5: 0.74/0.77/0.84).
Conclusion
Deep learning-based HCC risk assessment according to LI-RADS can be implemented as automatically generated tumor risk maps using out-of-the-box image segmentation tools with high detection performance for LR-5 lesions. Before translation into clinical practice, further improvements in automatic LI-RADS classification, for example through large multi-center studies, would be desirable.
Background
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide and early detection is pivotal. According to recent guidelines [1,2,3,4], the characteristic appearance of HCC with radiological imaging is sufficient for its diagnosis without the need for biopsy in patients who are at high risk for HCC and when there is curative intent. The characteristic vascular pattern of HCC with marked enhancement in the arterial phase and washout appearance in the later phases can be observed in dynamic contrast-enhanced (DCE) imaging studies, among which magnetic resonance imaging (MRI) has the highest sensitivity and specificity [5]. The use of hepatocyte-specific agents, such as gadoxetate disodium, further increases the per-lesion sensitivity of DCE-MRI, particularly for small HCCs [5], and may be useful in the prediction of histopathological features such as microvascular invasion [6].
Non-standardized imaging protocols, image interpretation, and reporting can lead to inadequate assessment of liver lesions and inaccurate communication of HCC risk [7]. To reduce inconsistencies, standardized guidelines have been proposed [8]. The most popular system is the Liver Imaging Reporting and Data System (LI-RADS) [8].
Although standardized HCC imaging and diagnostics according to LI-RADS is now widely implemented in academic centers [8], its adoption by non-academic radiologists is lagging, partly due to its complexity [9]. Novice users and users in high-volume private practice may struggle with its use [10].
Deep learning (DL) methods could provide a solution in the form of automated tools for lesion detection, segmentation, and characterization [11]. Out-of-the-box DL frameworks, such as nnU-Net, reduce the model development workload while providing state-of-the-art segmentation results [12]. This enables further development of tools for automated segmentation of anatomical structures and pathologies [13], which can be used for HCC diagnostics [11]. However, most previous segmentation-based studies have either evaluated only histologically confirmed HCC cases or proposed complex multistep pipelines, both of which hinder the utility of these methods.
The current study aims to evaluate a simple, yet realistic approach, where the available scans of DCE-MRI examinations are automatically converted via nnU-Net into tumor risk maps, which can be used as an assistance tool for reporting, disease burden quantification, large-scale data annotation, and analysis, or as an always-available standardized reference of reporting quality in the clinical routine.
Methods
Patients
This study was approved by the Institutional Review Board and was performed in accordance with the 1964 Helsinki Declaration and its later amendments; informed consent was waived.
In the current retrospective single-center study, patients were identified via semi-automatic report search and filtering within the clinic’s Radiology Information System. The search included MRI examinations performed on patients at risk of developing HCC (reports mentioning cirrhosis, hepatitis B infection, or current or prior HCC) between February 1994 and September 2022 (Fig. 1). The resulting examinations were filtered further to include DCE-MRI examinations performed with gadoxetate disodium (EOB-MRI, Primovist®, Bayer Vital GmbH). Of these filtered examinations, one examination per patient (≥ 18 years old) containing the highest number of lesions potentially categorizable as LR-3 or above according to LI-RADS v2018 was included. Patient exclusion criteria were cirrhosis due to congenital hepatic fibrosis or cirrhosis due to vascular causes.
Flowchart of study participants, as well as inclusion and exclusion criteria of the study. HCC – hepatocellular carcinoma; LI-RADS – Liver Imaging Reporting and Data System; MRI – magnetic resonance imaging, NIfTI – Neuroimaging Informatics Technology Initiative file format; NCE – pre-contrast T1-weighted image; AP – late arterial phase; PVP – portal venous phase; HBP – hepatobiliary phase T1-weighted image; LR-3 and LR-M – LI-RADS categories
MRI exclusion criteria were examinations not containing lesions ≥ LR-3 or LR-M; diffuse or multifocal HCC if approximate assessment of tumor margins was not possible; unavailability of any of the late arterial (AP), portal venous (PVP), hepatobiliary (HBP), or pre-contrast T1-weighted (NCE) phases; and severe artifacts on the AP. Cases with missing or noisy images of any other MRI sequence type described in the LI-RADS imaging protocol were not excluded.
MRI examinations
MRI examinations were performed using five different MRI scanners. Examinations used for training were acquired using two scanners. The patients in the test cohort were scanned with three different scanners (Table 1). MRI parameters are listed in Table 2.
Manual image segmentation
The filtered examinations were pseudonymized and exported from the Picture Archiving and Communication System via ADIT (https://github.com/openradx/adit). Exported examinations were converted to NIfTI format and co-registered to NCE scans using the 3D Slicer Elastix module [14]. 3D Slicer v5.1.0 [15] was used for manual image segmentation. Manual segmentation was performed by a radiology trainee (514 cases) with 3 years of experience in liver MRI analysis and a board-certified junior radiologist (88 cases) with 5 years of experience in abdominal imaging. Segmentations were proofread by a board-certified radiologist with 11 years of experience in abdominal imaging. All clinical information was available to the observers.
Segmentation was performed based on the co-registered images for each examination by marking lesions in a single semantic segmentation mask according to LI-RADS v2018. Lesions were manually classified as LR-3, LR-4, LR-5, or LR-M based on the co-registered contrast-enhanced scans, also considering ancillary LI-RADS features in all MRI scans. Subtraction images were available for lesion classification, as they have been reported to improve detection of AP hyperenhancement in EOB-MRI [16]. The major feature threshold growth was not used for categorization. Contrary to the original LI-RADS recommendations, pathologically proven tumors were also classified solely according to their MRI appearance. Observers were instructed to perform the manual segmentation of the lesions on the NCE or any of the DCE images (while also taking into account lesion appearance on other MRI sequences), depending on which phase showed the clearest and most accurate lesion margins and the least anatomic distortion [17]. As recommended in the LI-RADS manual for size measurements, AP images were only used for segmentation if the lesion margins were not clearly visible on any other phase, to avoid size overestimation due to corona enhancement or perilesional enhancement [17]. Lesions with enhancing capsules were commonly segmented on the portal venous phase, which has been reported to be the most accurate phase for detection of a capsule in EOB-MRI [18]. LR-3 lesions were often segmented on the AP (non-rim AP hyperenhancement is the only major LI-RADS feature with a prevalence ≥ 50% in LR-3 lesions) or the HBP phase (HBP hypointensity occurs in ~ 20% of LR-3 lesions) [19].
A publicly available liver segmentation model [20] was used to create whole liver segmentations on the co-registered AP images to identify erroneous segmentations of lesions outside the liver boundaries and improve segmentation quality. The segmentations from the publicly available model were used as ground truth for the liver class during training.
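The idea of discarding lesion predictions outside the liver boundaries can be sketched in a few lines. This is an illustrative reconstruction, not the study's code; the function name and the connected-component approach are assumptions.

```python
import numpy as np
from scipy import ndimage

def remove_extrahepatic(lesion_mask, liver_mask):
    """Zero out predicted lesion components with no overlap with the liver mask.

    lesion_mask: integer semantic mask (0 = background, >0 = lesion category).
    liver_mask: boolean whole-liver segmentation, same shape.
    """
    labels, n = ndimage.label(lesion_mask > 0)  # connected lesion components
    out = lesion_mask.copy()
    for i in range(1, n + 1):
        comp = labels == i
        if not liver_mask[comp].any():          # no voxel inside the liver
            out[comp] = 0                       # drop the extrahepatic component
    return out
```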
The etiology of the chronic liver disease (e.g. alcohol, chronic virus hepatitis) is known to influence liver size, shape, and texture [21]. To generate a widely applicable model, cases with different etiologies of the chronic liver disease were pooled.
Model development
The final cohort was split into two datasets according to the subdepartment where the scans were acquired. The larger dataset was used for training with nnU-Net, while the smaller dataset was used for internal testing. Images within each examination were split among 14 groups, each assigned to a U-Net input channel (Table 2). Missing images within an examination were replaced by images consisting of only zero values. Model creation, planning, training, and data preprocessing, including augmentation, were handled by the nnU-Net pipeline without modification. Configurations along with learning curves are available as supplementary materials (Additional files 1–7).
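As a rough sketch of this input layout: the 14 channel names below are taken from the figure captions, while the function signature and dictionary-based layout are assumptions for illustration, not the actual nnU-Net data interface.

```python
import numpy as np

# Channel order as listed in the figure captions of the paper.
CHANNELS = ["NCE", "AP", "PVP", "TRA", "HBP", "IP", "OOP",
            "T2H", "T2B", "T2LTE", "DWI-L", "DWI-M", "DWI-H", "ADC"]

def assemble_input(images, shape):
    """Assemble the 14-channel input for one examination.

    images: dict mapping channel name -> co-registered 3-D numpy array.
    Missing channels are replaced by all-zero volumes, as in the paper.
    """
    stack = np.zeros((len(CHANNELS), *shape), dtype=np.float32)
    for i, name in enumerate(CHANNELS):
        if name in images:
            stack[i] = images[name]
    return stack
```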
External validation
An external evaluation was performed on the LiverHccSeg dataset [22]. Delayed phase (DEL) images were used instead of transitional phase (TRA) and HBP images when an extracellular contrast agent was used. Tumors were re-categorized according to LI-RADS v2018 based on the available NIfTI files, and all DICOM series fitting to an input channel were included. Examinations with inadequate image quality were excluded.
Statistical evaluation
To allow for a more accurate interpretation of the results, model performance was evaluated separately for segmentation (semantic and instance segmentation), lesion detection, and LI-RADS classification of detected lesions. A simplified flowchart of how each of these tasks is evaluated based on the semantic segmentation masks that the U-Net creates is shown in Fig. 2.
Flowchart illustrating the calculation of each evaluation metric. AP: arterial phase T1-weighted; BG: background; FN: false negative; FP: false positive; GT: ground truth; LR-3, -4, -5, -M: Liver Imaging Reporting and Data System categories; PPV: positive predictive value; TN: true negative; TP: true positive
Segmentation
Sørensen–Dice coefficients (DSC) and concordance correlation coefficients (CCC) were calculated to measure segmentation quality and volume agreement. DSC was calculated at the examination and lesion level. Examination-level DSC measured the spatial overlap between predicted and ground truth segmentations in one examination, calculated for cases in which a manually marked lesion of the given LI-RADS category was present. Lesion-level DSC measured the spatial overlap between predicted and ground truth segmentations for ground truth lesions of a given LI-RADS category, without taking into account the LI-RADS category of the predicted lesions. DSCs are reported as median (lower, upper quartile), CCCs as CCC value (lower, upper bound of the 95% confidence interval).
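Both metrics are standard and can be sketched with NumPy as follows; this is a minimal illustration, not the evaluation code used in the study.

```python
import numpy as np

def dice(pred, gt):
    """Sørensen–Dice coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient between paired volumes."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```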
Confusion matrices
LI-RADS categories of ground truth lesions and predicted segmentations were automatically compared. The predicted LI-RADS category was determined using the following rules. If predicted segmentations with more than one LI-RADS category overlapped with one ground truth lesion, the predicted segmentation showing the largest overlap with the ground truth lesion determined the predicted LI-RADS category. A predicted cluster of voxels that did not overlap with any ground truth segmentation was considered a false-positive finding. If such a cluster contained voxels of more than one LI-RADS category, the category assigned to the largest portion of voxels determined the predicted LI-RADS category for this false-positive finding.
Based on these results, confusion matrices were created (Fig. 2).
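The largest-overlap rule above can be illustrated as follows, assuming integer-coded semantic masks (0 = background, higher values = LI-RADS categories); function names and the data layout are hypothetical.

```python
import numpy as np
from scipy import ndimage

def label_lesions(mask):
    """Connected-component labeling of a semantic mask (0 = background)."""
    labels, n = ndimage.label(mask > 0)
    return labels, n

def predicted_category(pred_mask, gt_lesion_labels, lesion_id):
    """Category of the predicted voxels with the largest overlap with one
    ground truth lesion; None if the lesion was not detected at all."""
    overlap = pred_mask[gt_lesion_labels == lesion_id]
    overlap = overlap[overlap > 0]
    if overlap.size == 0:
        return None                          # undetected lesion
    return int(np.bincount(overlap).argmax())  # largest-overlap category
```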
Detection
Sensitivity in the context of lesion detection refers to the proportion of ground truth lesions with a certain LI-RADS category that overlapped with predicted lesions of any LI-RADS category, relative to all ground truth lesions with this certain LI-RADS category.
Positive predictive value (PPV) in the context of lesion detection refers to the proportion of predicted lesions with a certain LI-RADS category that overlapped with any ground truth lesion, irrespective of the ground truth LI-RADS category, relative to all predicted lesions with this certain LI-RADS category.
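These two definitions can be sketched as follows; connected components stand in for lesions, and the function is illustrative only.

```python
import numpy as np
from scipy import ndimage

def detection_metrics(gt_mask, pred_mask):
    """Detection sensitivity and PPV over all categories combined.

    A ground truth lesion counts as detected if any predicted voxel overlaps
    it; a predicted lesion counts as a true positive if it overlaps any
    ground truth lesion.
    """
    gt_lab, n_gt = ndimage.label(gt_mask > 0)
    pr_lab, n_pr = ndimage.label(pred_mask > 0)
    detected = sum(1 for i in range(1, n_gt + 1)
                   if (pred_mask[gt_lab == i] > 0).any())
    true_pos = sum(1 for j in range(1, n_pr + 1)
                   if (gt_mask[pr_lab == j] > 0).any())
    sens = detected / n_gt if n_gt else float("nan")
    ppv = true_pos / n_pr if n_pr else float("nan")
    return sens, ppv
```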
Classification
Classification metrics are calculated for ground truth lesions that were segmented by nnU-Net (predicted lesions). Sensitivity, specificity, negative and positive predictive values (NPV), F1 score, and Cohen’s kappa values are derived from the created confusion matrices (Fig. 2) along with bootstrapped confidence intervals (lower, upper bound of 95% confidence interval).
To assess the contribution of each input channel, the same evaluation process is repeated for each input by replacing the respective image with an image containing only zero values.
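This occlusion procedure can be sketched generically; the `evaluate` callable below is a placeholder for the full inference-and-scoring pipeline, not part of the study's code.

```python
import numpy as np

def occlusion_sensitivity(stack, evaluate):
    """Percentage change of a metric when each channel is zeroed in turn.

    stack: (C, ...) multi-channel input array.
    evaluate: callable mapping an input array to a scalar metric
              (assumed nonzero at baseline).
    """
    baseline = evaluate(stack)
    changes = {}
    for c in range(stack.shape[0]):
        occluded = stack.copy()
        occluded[c] = 0.0                # replace the channel with zeros
        changes[c] = 100.0 * (evaluate(occluded) - baseline) / baseline
    return changes
```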
All lesion level metrics are calculated for lesions ≥ 10 mm. For additional information see the supplementary materials (Additional file 1.docx) [23,24,25].
Results
Study population
Out of 4275 patients identified, 602 were included in the analysis. Included examinations were performed between May 2005 and September 2022. The flowchart of inclusion and exclusion steps is shown in Fig. 1.
Patient, scanner, and lesion characteristics are described in Table 1. In total, 1657 and 874 marked areas were automatically identified from the manual semantic segmentations in the training and test datasets, of which 416 and 220, respectively, were marked as LR-5. Stratification of lesions based on their largest axial diameter is shown in Figs. 3 and 4. Most patients had less severe (CHILD-A) cirrhosis, while all Child–Pugh score categories, as well as patients without cirrhosis, were present in both groups. A summary of the MRI parameters of the scans used is available in Table 2.
Confusion matrices from the two internal datasets with example lesions from the internal test dataset. a Confusion matrices comparing U-Net predicted and manually drawn ground truth lesions, split into subplots based on the largest axial diameter of the ground truth segmentation (or of the predicted segmentation for lesions not marked in the ground truth). b-d True positive (segmented in the correct category) examples. e False negative (undetected) examples. f False positive (manually not marked) examples. Numbers on the left in the given row indicate the largest axial diameter range from which the lesion is sampled. Input channels in order: NCE – non-contrast-enhanced, pre-contrast T1; AP, PVP, TRA – arterial, portal venous, transitional phase contrast-enhanced T1; HBP – hepatobiliary phase contrast-enhanced T1; IP, OOP – in- and out-of-phase T1-weighted sequences; T2H – T2-weighted HASTE; T2B – T2-weighted BLADE; T2LTE – multiple types of T2-weighted images with longer time to echo; DWI-L, -M, -H – diffusion-weighted imaging with three increasing b-value ranges; ADC – apparent diffusion coefficient maps. BG: background; mm: millimeter; GT: ground truth; n: number of patients; LR-3, LR-4, LR-5, LR-M: LI-RADS categories
External test results. a Confusion matrices of classification results. b-e Input images per U-Net input channel overlapped with the ground truth segmentations and the nnU-Net segmentations (b true positives, c misclassified lesions, d not detected, e false positive detection). Input channels in order from left to right: NCE – non-contrast-enhanced, pre-contrast T1; AP, PVP, DEL/TRA—arterial, portal venous, delayed/transitional phase contrast-enhanced T1; HBP – hepatobiliary phase contrast-enhanced T1; IP, OOP – in- and out-of-phase T1-weighted sequences; T2H – T2-weighted HASTE; T2B – T2-weighted BLADE; T2LTE – multiple types of T2-weighted images with longer time to echo; DWI-L, -M, -H – diffusion-weighted imaging with three increasing b-value ranges; ADC – apparent diffusion coefficient maps. BG – background, LR-3, -4, -5, -M – included LI-RADS categories; mm – millimeter
Semantic segmentation
For liver segmentation, in the internal test dataset, median DSC of 0.96 (0.92, 0.97) of the predicted segmentations compared to the segmentations from the public model and of 0.99 (0.98, 1.00) compared to the manually corrected outputs were calculated. In the training dataset, the median DSC between the predicted segmentations and the segmentations from the public model was 0.97 (0.95, 0.97).
For liver volume estimation, in the internal test dataset, a CCC of 0.73 (0.51, 0.85) was calculated between model predictions compared to liver segmentations acquired from the public model, and a CCC of 0.98 (0.96, 0.99) was achieved compared to manually corrected segmentations. In the training dataset, CCC for liver volume estimation was 0.85 (0.76, 0.90). The liver segmentations were not corrected manually due to the large number of cases in both datasets.
For liver lesion semantic segmentation, the highest overlap between ground truth and predicted segmentations was for LR-5 in the training and internal test cohorts (DSCtraining = 0.72, DSCtest = 0.76) with CCC values of 0.86 and 0.94. In both cohorts, DSC and CCC values were markedly lower in the LR-3, LR-4, and LR-M categories (DSC ≤ 0.07, CCC ≤ 0.35).
Segmentation and volumetry metrics are presented in detail in Table 3 and Fig. 5.
Evaluation of predicted segmentations. aa, ab: Sørensen–Dice coefficient (DSC) per LI-RADS category (category present in the ground truth segmentations) in the training (aa) and internal test (ab) cohorts. Box and scatter plot of DSCs of the test dataset liver segmentations (ac). Plots ad and bd: Internal test dataset output segmentation before (ad) and after manual correction (bd). Plots aa-ac: vertical axes show segmentation classes and the ratio of marked cases compared to all cases where the class was present. ba, bb: Liver volume calculated from predicted segmentations versus from ground truth segmentations for training (ba) and test (bb) datasets. bc: Liver volumes calculated from the segmentations of our model compared to volumes calculated from the manually corrected segmentations of our model in the test dataset. Plots ca-fd: compare the predicted segmentations from our model and manually drawn ground truth segmentations per LI-RADS category. Plots ca, cb, da, db, ea, eb, fa, fb compare whole segmentation volumes in the training (ca, da, ea, fa) and test (cb, db, eb, fb) datasets. Plots cc, cd, dc, dd, ec, ed, fc, fd compare lesion volumes of the manually marked ground truth lesions to the volume of any overlapping predicted lesion in the training (cc, dc, ec, fc) and test (cd, dd, ed, fd) datasets. CCC: concordance correlation coefficient; ml: milliliter; A, L, R, P (ad, bd): anterior, left, right, and posterior directions; MV: volumes calculated from manual segmentations; RV: reference volumes; PV: volumes calculated from the segmentations of our model
Instance segmentation
Lesion-level median DSCs ranged between 0.61–0.74 in the training and 0.52–0.77 in the internal test cohort. CCCs between the predicted and ground truth lesion volumes ranged between 0.28–0.91 for lesions detected in the training cohort and 0.05–0.93 in the internal test cohort. In both cohorts, DSC and CCC were highest for LR-5. Segmentation and volumetry results are presented in detail in Table 3 and Fig. 5.
Lesion detection
The sensitivity in detection was highest for lesions manually segmented as LR-5 in the training and internal test datasets (sensitivitytraining = 0.85, sensitivitytest = 0.90) and lowest for LR-3 (sensitivitytraining = 0.41, sensitivitytest = 0.40). PPV was highest among lesions segmented by nnU-Net as LR-5 (PPVtraining = 0.94, PPVtest = 0.88) and lowest among LR-3 (PPVtraining = 0.70) and LR-M (PPVtest = 0.67). Ground truth lesions below 10 mm were almost never predicted by nnU-Net. Lesion detection metrics are listed in Table 4.
LI-RADS classification of detected lesions
When comparing the LI-RADS category of the manually segmented ground truth lesions ≥ 10 mm and corresponding predicted lesions from the nnU-Net, sensitivity and F1 values were highest for LR-5 lesions (sensitivitytraining = 0.75, sensitivitytest = 0.80, F1training = 0.69, F1test = 0.74), while for other LI-RADS categories, the values ranged between 0.50–0.66 in the two cohorts. Specificity and NPV were high for all LI-RADS categories (specificity ≥ 0.78, NPV ≥ 0.76) and highest for LR-M lesions (specificitytraining = 0.97, specificitytest = 0.97, NPVtraining = 0.95, NPVtest = 0.96). Kappa values were highest for LR-M and LR-5 lesions (κtraining = 0.62, κtest = 0.56). Larger LR-5 lesions were more often categorized accurately, and mislabeled predicted lesions were most frequently misclassified as the neighboring LI-RADS category (see Fig. 3). Classification metrics are listed in Table 4. Confusion matrices of the training and internal test datasets with example lesions from the internal test dataset and corresponding segmentations are shown in Fig. 3.
Occlusion sensitivity analysis
In the occlusion sensitivity analysis, we evaluated the contribution of each input channel by replacing each channel, one at a time, with an image of all zeros (Fig. 6). We then calculated the percentage change in each metric for that given channel. The three most important inputs contributing to lesion detection (% change of sensitivity) were HBP (-85.3%), AP, and NCE for LR-3; AP (-77.4%), HBP, and NCE for LR-4; AP (-42.1%), HBP, and PVP for LR-5; and AP (-48.5%), HBP, and PVP for LR-M.
Example case illustration of the effects of input image removal on the output segmentations. Black images indicate input image replacement with an image containing only zero values. Input channels in order: NCE—non-contrast T1; AP, PVP, TRA—arterial, portal venous, transitional phase contrast-enhanced T1; HBP – hepatobiliary phase contrast-enhanced T1; IP, OOP – in- and out-of-phase T1-weighted sequences; T2H – T2-weighted HASTE; T2B – T2-weighted BLADE; T2LTE – Multiple types of T2-weighted images with longer time to echo; DWI-L, DWI-M, DWI-H – diffusion-weighted imaging with three increasing b-value ranges; ADC – apparent diffusion coefficient maps. GT: ground truth; LR-3, LR-4, LR-5: LI-RADS categories
Lesion segmentation quality (instance segmentation DSC) was reduced by omission of HBP (-37.5%), AP, and NCE for LR-3; AP (-35.3%), PVP, and HBP for LR-4; AP (-16.8%), HBP (-16.4%), and NCE for LR-5; and NCE, AP, T2H, HBP, and PVP (ranging between -7.6% and -4.7%) for LR-M (the removal of the majority of the image groups increased DSC for LR-M lesions).
Based on the percentage change of F1 scores, HBP (-60.5%, -55.9%) and AP (-20.3%, -25.4%) had the highest impact on LR-3 and LR-4 lesion classification, followed by NCE for LR-3 and TRA for LR-4. The most influential group for LR-5 classification was AP (-14.1%), other groups showed minor contributions or increased the F1 score, which is possibly due to the reduction in the detection of LR-3 and LR-4 lesions. For LR-M, the most impactful groups were AP (-49.8%), PVP and NCE. The ranked changes for each metric are shown in Fig. 7. An example case is shown in Fig. 6.
Ranked changes after replacing each input image group with an image containing only zero values. Horizontal axes: percentage of change in the given metric (titles) compared to the same set of cases where the given input image type was present. Vertical axes: U-Net input channel ordered from most negative to most positive change per metric from top to bottom. Original values of each metric are noted in parentheses after the abbreviation of the channel. LR-3, LR-4, LR-5, LR-M: included LI-RADS categories; DSC: Sørensen–Dice coefficient; FP: false positives; PPV: positive predictive value. Input channels in order: NCE—pre-contrast T1; AP, PVP, TRA—arterial, portal venous, transitional phase contrast-enhanced T1; HBP – hepatobiliary phase contrast-enhanced T1; IP, OOP – in- and out-of-phase T1-weighted sequences; T2H – T2-weighted HASTE; T2B – T2-weighted BLADE; T2LTE – multiple types of T2-weighted images with longer time to echo; DWI-L, DWI-M, DWI-H – diffusion-weighted imaging with three increasing b-value ranges; ADC – apparent diffusion coefficient maps
External validation
One examination from the external test dataset was excluded due to inadequate image quality. Almost all lesions in the external cohort were categorized as LR-5. Sensitivity and PPV in lesion detection were 0.83 and 0.90, respectively. The F1 score for LI-RADS classification of predicted LR-5 lesions was 0.84. Per lesion, the median DSC was 0.84 (0.65, 0.87). Detailed results are shown in Tables 3 and 4 and Fig. 4.
Discussion
In the present study, an automatic DCE-MRI segmentation model for hepatocellular carcinoma (HCC) risk assessment was developed using nnU-Net. The model showed moderate agreement in the classification of LR-5 lesions compared to a gold standard expert read and excellent agreement in LR-5 volume prediction. Whole liver segmentation allowed for the exclusion of erroneously segmented lesions outside the liver boundaries. For this, the initial segmentations of a pre-trained liver segmentation model could be improved by further training with nnU-Net by including more images per examination. Co-registration of images made segmentations transferable to all included MRI sequences. By occluding the images, the contribution of each image group to the final lesion segmentation and classification was measured. The results from our segmentation model were validated using an external dataset composed of MRIs with extracellular and hepatocyte-specific contrast agents.
DL-based algorithms such as the one from the present study could potentially alleviate some of the limitations of LI-RADS [11]. Although LI-RADS reduced HCC reporting variability compared to non-standardized reporting, it did not eliminate it [7]. Interreader inconsistency is common, can have a strong impact on patient management, and can partly be attributed to the complexity of LI-RADS [26]. Standardized LI-RADS assessment can be more time-consuming than narrative reporting [10]. The comparatively good performance of our segmentation model in the detection and segmentation of LR-5 lesions shows that DL-based algorithms could assist in lesion classification, especially for inexperienced radiologists, in cases with widespread disease, or in high-volume reporting [10]. The kappa value of LR-5 lesions from our model versus expert opinion (0.56) was almost equal to the reported kappa of twenty untrained radiologists versus expert opinion (0.57), but lower than the kappa of the same twenty radiologists after a special LI-RADS training (0.77) [27]. In the present study, LR-3, LR-4, and LR-M lesions were more often discordant in detection and classification, which is in line with discordances in the assignment of these categories by radiologists in previous studies [26, 27]. Notably, the performance of untrained radiologists in the assignment of LR-4 and LR-M lesions was within the same range as our model’s performance but improved after a special LI-RADS training [27]. The greater variability of the LR-3 and LR-4 categories can be explained by the larger number of possible imaging feature combinations that can lead to LR-3/LR-4 assignments, especially when ancillary features are also considered [28]. In the case of ambiguity in LI-RADS features, tie-breaking rules often lead to the categorization of equivocal lesions as LR-3/LR-4 [28].
Moreover, processes in the background liver parenchyma such as perfusion alterations that are often detected by MRI can be mistakenly diagnosed as LR-3, instead of LR-2 [28]. Disagreement regarding LR-M lesions is partly explainable by the various differential-diagnostic possibilities such as intrahepatic cholangiocarcinoma, hepatocholangiocarcinoma, atypical HCC, metastasis, lymphoma, and multiple benign entities [29].
The satisfactory performance of our model in LI-RADS category assignment coupled with high sensitivity and PPV for lesion detection suggests several potential use cases. It could be used for automated secondary analysis of MRI cases where lesion assessment in the original report was not according to (the newest version of) LI-RADS. The automation of the segmentation enables large-scale analyses for local or multicenter research projects and clinical investigations. Precise measurements of tumor volume facilitate intra- and interindividual comparisons of tumor burden for response assessment. Also, the extraction of radiomics features of liver lesions for the prediction of histopathological features and prognostication is made possible by our segmentation model [30].
Multiple research groups have published machine learning (ML)- and DL-based studies for automated liver lesion segmentation and/or classification in patients at risk for HCC. Several semi-automatic and automatic segmentation and (LI-RADS) classification approaches for MRI have been reported. However, these approaches are limited by either the need for human annotation for segmentation [31], for classification [32, 33], or they were only tested on a small number of unequally distributed lesions per LI-RADS category, with a disproportionate prevalence of LR-5 lesions [34]. Our approach for automated LI-RADS segmentation/detection/classification differs from the above-mentioned studies. Our model is a fully automated end-to-end semantic segmentation model without a separate assessment of individual imaging features in an interim step. This approach, to our knowledge, is unique in the RADS literature. Our approach allows for a separate evaluation of the effect of individual imaging features on LI-RADS category assignment, although not by analysis of the features themselves but by modification of the images that may contain them. Our segmentation model was trained and tested on a large well-characterized radiological dataset, consisting of heterogeneously acquired MRI scans, comprising lesions with differences in size and texture within the same LI-RADS category. We also show that the nnU-Net pipeline scales well to MRI-based tasks that are more complex than most previously reported use cases. As a byproduct of our analyses, we have shown that nnU-Net improves liver segmentation quality when less accurate liver segmentations are provided as ground truth along with additional input images in multiple input channels.
Limitations of our study include the determination of the gold-standard segmentations and LI-RADS classifications by only one expert radiologist, the lack of a separate evaluation of distinct LI-RADS features, the incomplete implementation of LI-RADS categories (only LR-3 to LR-5 and LR-M were marked), the lack of correlation with histopathological diagnosis, the lack of correlation of the (automated) classification results with the etiology of the underlying liver disease (e.g., alcohol, chronic viral hepatitis), and the use of a single type of hepatocyte-specific contrast agent in the internal datasets. Future studies addressing these limitations would be beneficial.
Conclusion
In conclusion, we proposed and evaluated a simplified approach for the DL-based automation of LI-RADS v2018. We showed that self-configuring semantic segmentation pipelines such as nnU-Net can detect LR-5 lesions with high sensitivity and PPV and directly yield LI-RADS classification results that show moderate agreement, PPV, and specificity compared with expert classification. Such models have a wide range of downstream use cases, from research (e.g., data exploration, as demonstrated on an external cohort) to clinical decision support and quality assurance systems.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- ADC: Apparent diffusion coefficient maps
- ADIT: Automated DICOM Transfer
- AP: Arterial phase contrast-enhanced T1-weighted
- BG: Background
- CCC: Concordance correlation coefficient
- CHILD-A, -B, -C: Child-Turcotte-Pugh scores
- DCE-MRI: Dynamic contrast-enhanced magnetic resonance imaging
- DL: Deep learning
- DS: Dataset
- DSC: Sørensen–Dice coefficient
- DWI: Diffusion-weighted imaging
- EOB-MRI: Gadoxetate disodium-enhanced magnetic resonance imaging
- Ex: External test dataset
- FN: False negative
- FP: False positive
- GT: Ground truth
- HBP: Hepatobiliary phase contrast-enhanced T1-weighted
- HBV: Hepatitis B virus
- HCC: Hepatocellular carcinoma
- HCV: Hepatitis C virus
- IP: In-phase T1-weighted
- LI-RADS: Liver Imaging Reporting and Data System
- ML: Machine learning
- MV: Volume calculated from manual segmentations
- NASH: Nonalcoholic steatohepatitis
- NCE: Non-contrast T1-weighted
- NIfTI: Neuroimaging Informatics Technology Initiative
- NPV: Negative predictive value
- OOP: Out-of-phase T1-weighted
- PPV: Positive predictive value
- PV: Volume calculated from U-Net predicted segmentations
- PVP: Portal venous phase contrast-enhanced T1-weighted
- RV: Reference volume
- T2B: T2-weighted periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER/BLADE®, Siemens Healthcare)
- T2H: T2-weighted half-Fourier acquisition single-shot turbo spin echo (HASTE)
- T2LTE: T2-weighted image with longer time to echo
- TN: True negative
- TP: True positive
- Tr: Training dataset
- TRA: Transitional phase contrast-enhanced T1-weighted
- Ts: Internal test dataset
References
Singal AG, Llovet JM, Yarchoan M, Mehta N, Heimbach JK, Dawson LA, et al. AASLD Practice Guidance on prevention, diagnosis, and treatment of hepatocellular carcinoma. Hepatology. 2023;78(6):1922–65.
Omata M, Cheng A-L, Kokudo N, Kudo M, Lee JM, Jia J, et al. Asia-Pacific clinical practice guidelines on the management of hepatocellular carcinoma: a 2017 update. Hepatol Int. 2017;11(4):317–70.
Ducreux M, Abou-Alfa GK, Bekaii-Saab T, Berlin J, Cervantes A, de Baere T, et al. The management of hepatocellular carcinoma. Current expert opinion and recommendations derived from the 24th ESMO/World Congress on Gastrointestinal Cancer, Barcelona, 2022. ESMO Open. 2023;8(3):101567.
Sabrina V, Michael B, Jörg A, Peter B, Wolf B, Susanne B, et al. S3-Leitlinie: Diagnostik und Therapie des hepatozellulären Karzinoms. Z Gastroenterol. 2022;60(01):e56–130.
Lee YJ, Lee JM, Lee JS, Lee HY, Park BH, Kim YH, et al. Hepatocellular Carcinoma: Diagnostic Performance of Multidetector CT and MR Imaging—A Systematic Review and Meta-Analysis. Radiology. 2015;275(1):97–109.
Lee S, Kim SH, Lee JE, Sinn DH, Park CK. Preoperative gadoxetic acid–enhanced MRI for predicting microvascular invasion in patients with single hepatocellular carcinoma. J Hepatol. 2017;67(3):526–34.
Corwin MT, Lee AY, Fananapazir G, Loehfelm TW, Sarkar S, Sirlin CB. Nonstandardized Terminology to Describe Focal Liver Lesions in Patients at Risk for Hepatocellular Carcinoma: Implications Regarding Clinical Communication. AJR Am J Roentgenol. 2018;210(1):85–90.
Marks RM, Masch WR, Chernyak V. LI-RADS: Past, Present, and Future, From the AJR Special Series on Radiology Reporting and Data Systems. AJR Am J Roentgenol. 2021;216(2):295–304.
Marks RM, Fung A, Cruite I, Blevins K, Lalani T, Horvat N, et al. The adoption of LI-RADS: a survey of non-academic radiologists. Abdom Radiol (NY). 2023;48(8):2514–24.
Yano M. Invited Commentary: Contextualization of LI-RADS Reporting. Radiographics. 2021;41(5):E151–2.
Laino ME, Viganò L, Ammirabile A, Lofino L, Generali E, Francone M, et al. The added value of artificial intelligence to LI-RADS categorization: A systematic review. Eur J Radiol. 2022;150:110251.
Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–11.
Wasserthal J, Breit H-C, Meyer MT, Pradella M, Hinck D, Sauter AW, et al. TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiol Artif Intell. 2023;5(5):e230024.
Klein S, Staring M, Murphy K, Viergever MA, Pluim JPW. elastix: A Toolbox for Intensity-Based Medical Image Registration. IEEE Trans Med Imaging. 2010;29(1):196–205.
Fedorov A, Beichel R, Kalpathy-Cramer J, Finet J, Fillion-Robin J-C, Pujol S, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging. 2012;30(9):1323–41.
Choi SH, Kim SY, Lee SS, Shim JH, Byun JH, Baek S, et al. Subtraction Images of Gadoxetic Acid-Enhanced MRI: Effect on the Diagnostic Performance for Focal Hepatic Lesions in Patients at Risk for Hepatocellular Carcinoma. AJR Am J Roentgenol. 2017;209(3):584–91.
American College of Radiology Committee on LI-RADS®. LI-RADS CT/MRI Manual 2018. Available at https://www.acr.org/Clinical-Resources/Clinical-Tools-and-Reference/Reporting-and-Data-Systems. Accessed 3 Mar 2025.
Kim B, Lee JH, Kim JK, Kim HJ, Kim YB, Lee D. The capsule appearance of hepatocellular carcinoma in gadoxetic acid-enhanced MR imaging. Medicine. 2018;97(25):e11142.
Zhang Z, Xv H, Du Y, Lv Z, Yang Z. Optimizing LI-RADS: ancillary features screened from LR-3/4 categories can improve the diagnosis of HCC on MRI. BMC Gastroenterol. 2024;24(1):117.
Gross M, Spektor M, Jaffe A, Kucukkaya AS, Iseke S, Haider SP, et al. Improved performance and consistency of deep learning 3D liver segmentation with heterogeneous cancer stages in magnetic resonance imaging. PLoS ONE. 2021;16(12):e0260630.
Okazaki H, Ito K, Fujita T, Koike S, Takano K, Matsunaga N. Discrimination of Alcoholic from Virus-Induced Cirrhosis on MR Imaging. AJR Am J Roentgenol. 2000;175(6):1677–81.
Gross M, Arora S, Huber S, Kücükkaya AS, Onofrey JA. LiverHccSeg: A publicly available multiphasic MRI dataset with liver and HCC tumor segmentations and inter-rater agreement analysis. Data Brief. 2023;51:109662.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12(85):2825–30.
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72.
Lowekamp BC, Chen DT, Ibáñez L, Blezek D. The Design of SimpleITK. Front Neuroinform. 2013;7:45.
Yokoo T, Singal AG, Diaz de Leon A, Ananthakrishnan L, Fetzer DT, Pedrosa I, Khatri G. Prevalence and clinical significance of discordant LI-RADS(®) observations on multiphase contrast-enhanced MRI in patients with cirrhosis. Abdom Radiol (NY). 2020;45(1):177–87.
Zhang N, Xu H, Ren AH, Zhang Q, Yang DW, Ba T, et al. Does Training in LI-RADS Version 2018 Improve Readers’ Agreement with the Expert Consensus and Inter-reader Agreement in MRI Interpretation? J Magn Reson Imaging. 2021;54(6):1922–34.
Chernyak V, Fowler KJ, Kamaya A, Kielar AZ, Elsayes KM, Bashir MR, et al. Liver Imaging Reporting and Data System (LI-RADS) Version 2018: Imaging of Hepatocellular Carcinoma in At-Risk Patients. Radiology. 2018;289(3):816–30.
Ganesan K, Jalkote S, Nellore S. The Gray Zone: LR3, LR-M, and LR-TIV. J Gastrointestinal Abdominal Radiol. 2023;6(3):185–201.
Nam D, Chapiro J, Paradis V, Seraphin TP, Kather JN. Artificial intelligence in liver diseases: Improving diagnostics, prognostics and response prediction. JHEP Reports. 2022;4(4):100443.
Kim Y, Furlan A, Borhani AA, Bae KT. Computer-aided diagnosis program for classifying the risk of hepatocellular carcinoma on MR images following liver imaging reporting and data system (LI-RADS). J Magn Reson Imaging. 2018;47(3):710–22.
Hamm CA, Wang CJ, Savic LJ, Ferrante M, Schobert I, Schlachter T, et al. Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI. Eur Radiol. 2019;29(7):3338–47.
Wu Y, White GM, Cornelius T, Gowdar I, Ansari MH, Supanich MP, Deng J. Deep learning LI-RADS grading system based on contrast enhanced multiphase MRI for differentiation between LR-3 and LR-4/LR-5 liver tumors. Ann Transl Med. 2020;8(11):701.
Wang K, Liu Y, Chen H, Yu W, Zhou J, Wang X. Fully automating LI-RADS on MRI with deep learning-guided lesion segmentation, feature characterization, and score inference. Front Oncol. 2023;13:1153241.
Acknowledgements
This work was supported by the “Kerpel-Fronius Ödön Tehetséggondozó Tanács” of Semmelweis University and the EFOP-3.6.3-VEKOP-16-2017-00009 project, titled „Az orvos-, egészségtudományi- és gyógyszerészképzés tudományos műhelyeinek fejlesztése” (Development of the scientific workshops of medical, health science and pharmacy education). We thank Carley Stewart, Ph.D. from the Clinic for Diagnostic and Interventional Radiology (DIR), Heidelberg University Hospital, for proofreading the English text of this manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. R.S. received funding for the current study from the “Kerpel-Fronius Ödön Tehetséggondozó Tanács” of Semmelweis University and the EFOP-3.6.3-VEKOP-16-2017-00009 project, titled „Az orvos-, egészségtudományi- és gyógyszerészképzés tudományos műhelyeinek fejlesztése” (Development of the scientific workshops of medical, health science and pharmacy education).
Author information
Authors and Affiliations
Contributions
RS, SG, and PM conceptualized, designed the study and performed image segmentation; RS collected imaging and clinical data, wrote the software code for the study, and analyzed data; RS, CMH and PM interpreted data, designed and created the figures; RS and PM wrote the manuscript; SG, CMH, KS, PNK, OvS, MK and HUK revised the manuscript; all authors read and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The study was approved by the Institute Review Board of Heidelberg University Hospital (S-309/2016). Patient consent was waived due to the retrospective nature of the study. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Stollmayer, R., Güven, S., Heidt, C.M. et al. LI-RADS-based hepatocellular carcinoma risk mapping using contrast-enhanced MRI and self-configuring deep learning. Cancer Imaging 25, 36 (2025). https://doi.org/10.1186/s40644-025-00844-6
DOI: https://doi.org/10.1186/s40644-025-00844-6