Table 3 Comparison of diagnostic performance between PTs-HDM and 6 radiologists

From: Hierarchical diagnosis of breast phyllodes tumors enabled by deep learning of ultrasound images: a retrospective multi-center study

| Model / reader | AUC-micro | AUC-macro | Accuracy-macro (%) | Recall-macro (%) | Precision-macro (%) | F1-macro (%) |
|---|---|---|---|---|---|---|
| PTs-HDM | 0.856 (0.809, 0.900) | 0.842 (0.787, 0.893) | 80.9 (75.1, 86.7) | 77.8 (69.8, 85.2) | 78.0 (70.9, 84.2) | 76.6 (69.1, 83.6) |
| Senior 1 | 0.827 (0.779, 0.874) | 0.709 (0.668, 0.746) | 76.8 (70.5, 83.2) | 55.8 (50.7, 60.9) | 46.7 (41.7, 51.3) | 50.7 (45.9, 54.9) |
| Senior 1+ | 0.827 (0.779, 0.874) | 0.680 (0.637, 0.723) | 76.8 (70.5, 83.2) | 56.2 (50.8, 62.0) ↑ | 61.4 (42.3, 82.7) ↑ | 52.1 (46.0, 59.2) ↑ |
| Senior 2 | 0.749 (0.692, 0.801) | 0.631 (0.578, 0.681) | 66.5 (59.0, 72.8) | 48.2 (41.5, 54.5) | 67.5 (36.0, 76.8) | 46.2 (39.1, 54.8) |
| Senior 2+ | 0.810 (0.757, 0.857) ↑ | 0.720 (0.663, 0.777) ↑ | 74.5 (67.6, 80.3) ↑ | 59.7 (51.7, 68.4) ↑ | 65.3 (55.3, 74.9) | 61.3 (52.2, 70.2) ↑ |
| Senior Mean | 0.788 (0.736, 0.838) | 0.670 (0.623, 0.714) | 71.7 (64.8, 78.0) | 52.0 (46.1, 57.7) | 57.1 (38.9, 64.1) | 48.5 (42.5, 54.9) |
| Senior Mean+ | 0.819 (0.768, 0.866) ↑ | 0.700 (0.650, 0.750) ↑ | 74.3 (68.8, 80.2) ↑ | 56.0 (49.2, 62.8) ↑ | 54.5 (47.1, 62.0) | 54.4 (47.3, 61.3) ↑ |
| Attending 1 | 0.636 (0.584, 0.692) | 0.518 (0.468, 0.573) | 51.4 (43.4, 58.4) | 36.4 (29.7, 43.1) | 36.0 (28.0, 45.0) | 34.8 (27.5, 42.6) |
| Attending 1+ | 0.725 (0.671, 0.775) ↑ | 0.635 (0.575, 0.694) ↑ | 63.1 (55.5, 70.0) ↑ | 52.0 (44.1, 59.7) ↑ | 53.2 (44.2, 62.3) ↑ | 52.3 (43.5, 60.3) ↑ |
| Attending 2 | 0.780 (0.727, 0.831) | 0.720 (0.657, 0.781) | 70.5 (63.6, 77.5) | 61.3 (52.5, 69.8) | 61.6 (52.8, 70.2) | 61.2 (53.1, 68.8) |
| Attending 2+ | 0.809 (0.762, 0.853) ↑ | 0.737 (0.679, 0.793) ↑ | 74.6 (67.6, 80.9) ↑ | 61.9 (53.0, 70.5) ↑ | 63.3 (54.3, 71.9) ↑ | 62.3 (53.7, 71.2) ↑ |
| Attending Mean | 0.708 (0.656, 0.762) | 0.619 (0.566, 0.675) | 61.0 (53.5, 68.0) | 48.9 (41.1, 56.5) | 48.8 (40.4, 57.6) | 48.0 (40.3, 55.7) |
| Attending Mean+ | 0.767 (0.717, 0.814) ↑ | 0.686 (0.627, 0.744) ↑ | 68.9 (61.6, 80.9) ↑ | 61.9 (53.0, 70.5) ↑ | 63.3 (54.3, 71.9) ↑ | 62.3 (53.7, 71.2) ↑ |
| Resident 1 | 0.715 (0.662, 0.766) | 0.630 (0.575, 0.690) | 61.8 (54.3, 69.4) | 47.5 (39.5, 55.6) | 48.2 (40.3, 56.5) | 47.3 (39.6, 54.8) |
| Resident 1+ | 0.837 (0.792, 0.883) ↑ | 0.789 (0.732, 0.842) ↑ | 78.1 (72.3, 83.8) ↑ | 69.5 (61.2, 78.2) ↑ | 72.5 (63.6, 80.6) ↑ | 69.8 (61.3, 77.8) ↑ |
| Resident 2 | 0.566 (0.514, 0.623) | 0.544 (0.483, 0.601) | 42.4 (35.3, 49.1) | 39.9 (31.9, 47.2) | 37.3 (29.4, 46.1) | 34.0 (27.4, 41.2) |
| Resident 2+ | 0.770 (0.714, 0.818) ↑ | 0.753 (0.694, 0.810) ↑ | 69.5 (63.0, 76.3) ↑ | 66.3 (57.9, 74.6) ↑ | 63.7 (55.3, 71.5) ↑ | 64.0 (55.6, 72.0) ↑ |
| Resident Mean | 0.641 (0.588, 0.695) | 0.587 (0.529, 0.646) | 52.1 (44.8, 59.3) | 43.7 (35.7, 51.4) | 42.8 (34.9, 51.3) | 40.7 (33.5, 48.0) |
| Resident Mean+ | 0.804 (0.753, 0.851) ↑ | 0.771 (0.713, 0.826) ↑ | 73.8 (67.7, 80.1) ↑ | 67.9 (59.6, 76.4) ↑ | 68.1 (59.5, 76.1) ↑ | 66.9 (58.5, 74.9) ↑ |

1. Data in parentheses are 95% confidence intervals. "+" denotes reading with PTs-HDM assistance. An upward arrow (↑) marks a metric that improved with AI (PTs-HDM) assistance.
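The table mixes micro- and macro-averaged metrics for a multi-class task. The sketch below shows one common way to compute them, assuming one-vs-rest binarisation, class indices 0, 1, 2 as hypothetical placeholders (the actual diagnostic classes are not listed in this table), and F1-macro defined as the mean of per-class F1 scores. That last assumption is at least consistent with the PTs-HDM row, where F1-macro (76.6) falls below both recall-macro (77.8) and precision-macro (78.0), whereas the harmonic mean of those two would be about 77.9. All function names here are illustrative, not the authors' code.

```python
from typing import List, Sequence, Tuple


def auc_binary(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_micro_macro(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int
) -> Tuple[float, float]:
    """One-vs-rest AUCs. Macro: unweighted mean of the per-class AUCs.
    Micro: a single AUC over all (case, class) pairs pooled together."""
    per_class: List[float] = []
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]  # class k vs the rest
        scores = [p[k] for p in probs]
        per_class.append(auc_binary(labels, scores))
        pooled_labels.extend(labels)
        pooled_scores.extend(scores)
    micro = auc_binary(pooled_labels, pooled_scores)
    macro = sum(per_class) / n_classes
    return micro, macro


def macro_prf(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> Tuple[float, float, float]:
    """Per-class precision, recall and F1 from one-vs-rest counts, then the
    unweighted mean over classes (F1-macro = mean of per-class F1)."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        tp = sum(1 for t, p in pairs if t == k and p == k)
        fp = sum(1 for t, p in pairs if t != k and p == k)
        fn = sum(1 for t, p in pairs if t == k and p != k)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return (
        sum(precisions) / n_classes,
        sum(recalls) / n_classes,
        sum(f1s) / n_classes,
    )


# Hypothetical three-class example (class indices are placeholders).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision_macro, recall_macro, f1_macro = macro_prf(y_true, y_pred, n_classes=3)
print(f"precision={precision_macro:.3f} recall={recall_macro:.3f} f1={f1_macro:.3f}")
# → precision=0.722 recall=0.667 f1=0.656
```

Because micro-averaging pools every (case, class) pair into one ranking, it can diverge from the macro average when classes differ in size or difficulty; that may be part of why some readers show large gaps between the two AUCs (for example, Senior 1 at 0.827 versus 0.709), though this table alone cannot confirm it.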
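The footnote reports 95% confidence intervals, but this table does not say how they were obtained. A percentile bootstrap over cases is one common choice, and it would be compatible with asymmetric intervals such as Senior 2's precision-macro of 67.5 (36.0, 76.8). The sketch below assumes that method; `bootstrap_ci95`, `accuracy`, the resample count, and the data are all hypothetical, not the study's procedure.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci95(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Metric,
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate plus a 95% percentile-bootstrap interval.

    Cases are resampled with replacement, the metric is recomputed on each
    resample, and the 2.5th and 97.5th percentiles of those values form
    the interval (an assumed method, not necessarily the authors')."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    # n=40 yields 39 cut points; the first and last are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(stats, n=40, method="inclusive")
    return metric(y_true, y_pred), cuts[0], cuts[-1]


# Hypothetical data: 100 lesions, 80 classified correctly.
y_true = [0] * 100
y_pred = [0] * 80 + [1] * 20
point, lo, hi = bootstrap_ci95(y_true, y_pred, accuracy)
# Printed in the table's "estimate (lower, upper)" percentage style.
print(f"{100 * point:.1f} ({100 * lo:.1f}, {100 * hi:.1f})")
```

The same function accepts any metric with the `metric(y_true, y_pred)` shape, so a macro-averaged metric could be passed in instead, provided it handles resamples in which a rare class is absent.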