注:同时分别提供了有关乳腺癌预防、乳腺癌治疗(成人)、男性乳腺癌治疗以及妊娠期乳腺癌治疗的PDQ总结。
乳腺X线摄影是目前应用最广泛的乳腺癌筛查方法。有证据表明它能降低50-69岁女性的乳腺癌死亡率,但同时伴有潜在危害,如检出无临床意义的不威胁生命的癌症(过度诊断)。对于40至49岁的女性,乳腺X线摄影检查的获益尚不确定。
印度、伊朗和埃及开展的随机试验研究了临床乳房触诊(CBE)在筛查中的作用。其中一些研究显示晚期癌比例有所降低,然而,仍没有充分的证据得出死亡率获益的结论。
现已确认乳房自检不能降低死亡率,而临床乳房触诊(CBE)对发病率和死亡率的影响尚无报道。
超声、磁共振成像和分子乳腺成像等技术正处于评估阶段,这些技术通常作为乳腺X线摄影的辅助手段,而不是一般人群的主要筛查工具。
医疗决策知情正越来越多地应用于癌症筛查受检者。现已对多种不同类型和形式的决策辅助手段进行了研究。(更多相关信息请参阅PDQ总结的“癌症筛查概述”部分。)
50年前启动的随机对照试验(RCT)证明,乳腺X线摄影筛查可降低60-69岁女性(确凿证据)和50-59岁女性(一般证据)的乳腺癌特异性死亡率。最新完成的基于人群的研究就更长时期内接受筛查的人群获益情况提出几点问题。
效应强度:根据对随机对照试验的荟萃分析得出,每预防1例乳腺癌死亡需要邀请接受筛查的女性人数取决于其年龄:对于39-49岁的女性,需要筛查1904人(95%可信区间(CI),929–6378);50-59岁的女性,需要筛查1339人(95%CI,322–7455);60-69岁的女性,需要筛查377人(95%CI,230-1050)。
随机对照试验的荟萃分析显示死亡率的获益,但其效度受到试验完成以来几十年间医学影像和治疗改善情况的限制。加拿大国家乳腺筛查研究(CNBSS)经过25年的随访,
于2014年完成,未显示与乳腺X线检查相关的死亡率获益。
有确凿证据显示,乳腺X线摄影筛查可能导致以下危害:
对于所有这些关于乳腺X线摄影筛查的潜在危害的结论,其内部效度、一致性和外部效度都很高。
CNBSS试验没有开展针对临床乳房触诊(CBE)与不筛查的效果比较。两项在印度和一项在埃及开展的随机对照试验正在评估CBE筛查的效果,但尚未报告死亡率数据。
因此,目前无法评估CBE筛查的效果。
使用CBE筛查可能会导致以下危害:
BSE与未进行筛查相比,在降低乳腺癌死亡率方面无任何获益。
有确凿的证据表明,正规指导并鼓励BSE检查会导致更多的乳腺活检,以及更多良性乳腺病变的诊断。
Note: Separate PDQ summaries on Breast Cancer Prevention, Breast Cancer Treatment (Adult), Male Breast Cancer Treatment, and Breast Cancer Treatment During Pregnancy are also available.
Mammography is the most widely used screening modality for the detection of breast cancer. There is evidence that it decreases breast cancer mortality in women aged 50 to 69 years and that it is associated with harms, including the detection of clinically insignificant cancers that pose no threat to life (overdiagnosis). The benefit of mammography for women aged 40 to 49 years is uncertain.
There are randomized trials in India, Iran, and Egypt that have studied the use of clinical breast examination (CBE) as a screening test. Some of these studies have suggested a shift in late-stage disease; however, there is still insufficient evidence to conclude a mortality benefit.
Breast self-exam has been shown to have no mortality benefit. No results have been published on the outcomes of incidence or mortality for CBE.
Technologies such as ultrasound, magnetic resonance imaging, and molecular breast imaging are being evaluated, usually as adjuncts to mammography, and are not primary screening tools in the average population.
Informed medical decision making is increasingly recommended for individuals who are considering cancer screening. Many different types and formats of decision aids have been studied. (Refer to the PDQ summary on Cancer Screening Overview for more information.)
Randomized controlled trials (RCTs) initiated 50 years ago provide evidence that screening mammography reduces breast cancer–specific mortality for women aged 60 to 69 years (solid evidence) and women aged 50 to 59 years (fair evidence). Population-based studies done more recently raise questions as to the benefits to screened populations who participate in screening for longer time periods.
Magnitude of Effect: Based on a meta-analysis of RCTs, the number of women needed to invite for screening to prevent one breast cancer death depends on the woman’s age: for women aged 39 to 49 years, 1,904 women needed (95% confidence interval [CI], 929–6,378); for women aged 50 to 59 years, 1,339 women needed (95% CI, 322–7,455); and for women aged 60 to 69 years, 377 women needed (95% CI, 230–1,050).
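As a rough illustration of what these numbers needed to invite (NNI) imply, the sketch below converts each point estimate into breast cancer deaths prevented per 10,000 women invited. It assumes only that the NNI is the reciprocal of the absolute risk reduction; the function and variable names are illustrative, and the confidence intervals quoted above are ignored.

```python
# Point estimates of the number needed to invite (NNI), taken from the text.
NNI_BY_AGE = {"39-49": 1904, "50-59": 1339, "60-69": 377}


def deaths_prevented_per_10000(nni: float) -> float:
    """Deaths prevented per 10,000 women invited, assuming NNI = 1 / absolute risk reduction."""
    return 10_000 / nni


if __name__ == "__main__":
    for age_band, nni in NNI_BY_AGE.items():
        prevented = deaths_prevented_per_10000(nni)
        print(f"Ages {age_band}: about {prevented:.1f} deaths prevented per 10,000 invited")
```

Because the confidence intervals around each NNI are wide, these per-10,000 figures are best read as orders of magnitude rather than precise estimates.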
The validity of meta-analyses of RCTs demonstrating a mortality benefit is limited by improvements in medical imaging and treatment in the decades since their completion. The 25-year follow-up from the Canadian National Breast Screening Study (CNBSS), completed in 2014, showed no mortality benefit associated with screening mammograms.
Based on solid evidence, screening mammography may lead to the following harms:
For all of these conclusions regarding potential harms from screening mammography, internal validity, consistency, and external validity are good.
The CNBSS trial did not study the efficacy of CBE versus no screening. Ongoing randomized trials, two in India and one in Egypt, are designed to assess the efficacy of screening CBE but have not reported mortality data.
Thus, the efficacy of screening CBE cannot be assessed yet.
Screening by CBE may lead to the following harms:
BSE has been compared with no screening and has been shown to have no benefit in reducing breast cancer mortality.
There is solid evidence that formal instruction and encouragement to perform BSE leads to more breast biopsies and more diagnoses of benign breast lesions.
乳腺癌是美国女性最常见的非皮肤癌症,预计2019年全美将有268600例乳腺浸润性癌、62930例原位癌和41760例乳腺癌死亡。乳腺癌也是中国女性最常见的癌症,2015年据估计,中国有30.4万例乳腺癌新发病例、7.0万例乳腺癌死亡。
有遗传风险的女性(包括BRCA1和BRCA2基因突变携带者)约占乳腺癌病例的5%-10%。
男性占乳腺癌发病和死亡病例的1%。
乳腺癌的最大危险因素是女性,其次是高龄。其他危险因素包括激素方面(如初潮较早、绝经较晚、未生育、首次妊娠较晚和绝经后接受激素治疗)、饮酒和暴露于电离辐射等。
白人女性的乳腺癌发病率高于黑人女性,但是黑人女性在确诊后各分期的生存率也较低。这可能反映了筛查行为和获得医疗保健的机会的差异。拉美裔和亚太岛民的发病率和死亡率低于白人或黑人。
乳腺癌的发病率取决于生殖因素(如早孕和晚孕、多产和母乳喂养情况)、参与筛查情况和绝经后激素的使用情况。乳腺X线检查方法在美国和英国被广泛应用后,乳腺癌(特别是导管原位癌)的发病率急剧上升。
绝经后激素治疗的广泛使用与乳腺癌发病率急剧增加有关,这种趋势在其使用减少时逆转。
任何人群在进行筛查后,晚期癌症的发病率都不会下降。
相对于在无症状女性中进行的筛查性乳腺X线检查,有乳房症状的女性接受的是诊断性乳腺X线检查。在一项为期10年的针对引起医学关注的乳腺症状研究中,有10.7%的病例是因发现乳腺肿块而被诊断为乳腺癌的,而由于疼痛而被诊断的仅占1.8%。
对活检过程中切除的乳腺组织细胞进行显微镜检查可以诊断乳腺癌。乳腺组织的取样可通过影像学检查或触诊定位。乳腺活检可使用一根细针附在注射器上(细针抽吸)、一根粗针(空芯针活检)或通过切除(切除活检)进行。成像指导可提高精确性。对大到足以进行诊断的异常区域进行针头活检取样。切除活检的目的是摘除整个异常区域。
导管原位癌(DCIS)为非浸润性癌,可与浸润性癌相关或可发展为浸润性癌,其发生率及病程多样。
一些研究者将DCIS纳入浸润性乳腺癌的统计数据中,而其他研究者认为应类比宫颈及前列腺的癌前病变命名,将DCIS重新命名为导管上皮内瘤变会更好,并且应考虑在统计乳腺癌时除外这些DCIS病例。
DCIS最常见的诊断方法是乳腺X线摄影。在美国开展乳腺X线摄影筛查前,1983年全美仅有4,900例女性被诊断为DCIS,而在2019年,预计这一数值将达到62,930人。
加拿大国家乳腺筛查研究Ⅱ对50-59岁的女性进行了评估,发现通过临床乳房触诊(CBE)和乳腺X线摄影筛查的女性DCIS病例比仅通过CBE筛查的女性增加了4倍,乳腺癌死亡率无差别。
(更多信息,请参阅PDQ总结“乳腺癌治疗(成人)”部分。)
我们对DCIS的自然史了解甚少,因为几乎所有的DCIS病例都是通过筛查发现的,并且几乎所有病例都得到了治疗。DCIS治疗后是否发展为乳腺癌取决于病变的病理特征和治疗方法。在一项随机试验中,其DCIS通过肿块切除术进行去除的患者有13.4%在90个月内发展为同侧浸润性乳腺癌,而相比之下,接受肿瘤切除术和放疗的患者只有3.9%发生进展。
在诊断为DCIS并接受治疗的女性中,死于乳腺癌的比例低于与其年龄匹配的一般人群的比例。
这种有利的结果可能反映出疾病的良性性质、治疗的获益或志愿者效应(即接受乳腺癌筛查的女性通常比不接受筛查的女性更健康)。
非典型性是乳腺癌的一个危险因素,发现于4%-10%的乳腺活检中。
非典型性是一种诊断分类,不同病理学家对其认识存在很大差异。
病理学家对乳腺组织的诊断范围包括除外非典型性的良性乳腺疾病、非典型增生、DCIS型乳腺癌和浸润性乳腺癌。在过去的三十年里,由于乳腺X线摄影筛查的广泛应用,非典型增生和DCIS乳腺病变的发病率有所增加,尽管非典型增生通常在乳腺X线摄影成像中不易发现。
对乳腺病变的错误分类可能导致对病变的过度治疗或治疗不足,尤其是在非典型增生和DCIS诊断方面不确定性很大。
这方面最大的研究是B-Path研究,美国115名病理学家每人读取每个病例的一张乳腺活检切片,并将他们的诊断与专家们达成一致的权威诊断进行了比较。
病理学家们的诊断和专家权威诊断在浸润性癌中的总体一致性最高,但在DCIS和非典型增生病例中的一致性明显较低。
由于B-Path研究纳入病例中非典型增生和DCIS的比例高于临床实践,作者应用贝叶斯定理外推,从接受乳腺活检的50-59岁美国女性的角度估计诊断的不定性如何影响准确性。
从美国群体水平估计,92.3%(CI,91.4%–93.1%)的乳腺活检诊断与专家的权威诊断一致,约4.6%(CI,3.9%–5.3%)的初始乳腺活检被过度诊断,3.2%(CI,2.7%–3.6%)被低估。图1显示了每100例乳腺活检的预测结果,包括总结果和各诊断类别结果。
为了解决乳腺组织诊断中的高异质性问题,实验室普遍建立了复核制度。一项对252名参与B-Path研究的乳腺病理学家进行的全国性调查发现,65%的受访者报告称他们有一项实验室政策,要求对最初诊断为浸润性疾病的所有病例进行复核。此外,56%的被调查者还报告了对DCIS的初步诊断需要复核的政策,而36%的被调查者报告了对最初诊断为非典型导管增生的病例的强制性复核政策。
在同一项调查中,病理学家一致认为复核提高了诊断的准确性(96%)。
一项使用B-Path数据的模拟研究评估了12种旨在改善乳腺组织病理学解读的复核策略。
除仅用于浸润性癌症病例的策略外,所有复核策略的准确性都有显著提高。无论病理学家对诊断的信心或经验水平如何,准确性都会提高。虽然复核提高了准确性,但并没有完全消除诊断的不确定性,特别是对具有挑战性的乳腺非典型病例。
携带BRCA1或BRCA2基因突变的女性乳腺癌风险增加,她们可能从筛查中获益。(有关更多信息,请参阅PDQ总结“乳腺癌和妇科癌症遗传学”部分。)
接受伞状野放疗的霍奇金淋巴瘤和非霍奇金淋巴瘤患者从完成治疗后10年开始患乳腺癌的风险增加,并持续终生。因此,尽管乳腺X线检查是提倡的,只是检查的起始年龄可能偏早。
乳腺X线筛查的潜在获益通常在检查后多年才得以显现,而其危害是立竿见影的。因此,预期寿命有限的女性和有基础疾病的患者接受筛查后可能无获益。尽管如此,这些女性中的许多人还是接受了乳腺X线检查。
在一项研究中,约有9%的晚期癌症女性接受了癌症筛查。
对66-79岁女性,乳腺X线筛查的癌症检出率约为1%,但这些检出的癌症大多风险较低。
对老年女性局限性乳腺癌的诊断和治疗是否有益尚无定论。
尚无证据表明40岁以下的一般风险女性进行乳腺X线筛查有益。
约有1%的乳腺癌发生在男性身上。
大多数病例是在评估可感知病变时确诊的,这些病变通常很容易被发现。治疗方法包括手术、放疗、系统性辅助激素治疗或化疗。(有关更多信息,请参阅PDQ总结“男性乳腺癌治疗”部分。)筛查不太可能获益。
Breast cancer is the most common noncutaneous cancer in U.S. women, with an estimated 268,600 cases of invasive disease, 62,930 cases of in situ disease, and 41,760 deaths expected in 2019.
Women with inherited risk, including BRCA1 and BRCA2 gene carriers, comprise approximately 5% to 10% of breast cancer cases.
Males account for 1% of breast cancer cases and breast cancer deaths.
The biggest risk factor for breast cancer is being female, followed by advancing age. Other risk factors include hormonal aspects (such as early menarche, late menopause, nulliparity, late first pregnancy, and postmenopausal hormone therapy), alcohol consumption, and exposure to ionizing radiation.
Breast cancer incidence is higher in white women than in black women, who also have a lower survival rate at every stage of diagnosis. This may reflect differences in screening behavior and access to healthcare. Hispanic women and Asian/Pacific Islander women have lower incidence and mortality than white or black women.
Breast cancer incidence depends on reproductive issues (such as early vs. late pregnancy, multiparity, and breastfeeding), participation in screening, and postmenopausal hormone usage. The incidence of breast cancer (especially ductal carcinoma in situ [DCIS]) increased dramatically after mammography was widely adopted in the United States and the United Kingdom.
Widespread use of postmenopausal hormone therapy was associated with a dramatic increase in breast cancer incidence, a trend that reversed when its use decreased.
In any population, the adoption of screening is not followed by a decline in the incidence of advanced-stage cancer.
Women with breast symptoms undergo diagnostic mammography as opposed to screening mammography, which is done in asymptomatic women. In a 10-year study of breast symptoms prompting medical attention, a breast mass led to a cancer diagnosis in 10.7% of cases, whereas pain was associated with cancer in only 1.8% of cases.
Breast cancer can be diagnosed when breast tissue cells removed during a biopsy are studied microscopically. The breast tissue to be sampled can be identified by an abnormality on an imaging study or because it is palpable. Breast biopsies can be performed with a thin needle attached to a syringe (fine-needle aspirate), a larger needle (core biopsy), or by excision (excisional biopsy). Image guidance can improve accuracy. Needle biopsies sample an abnormal area large enough to make a diagnosis. Excisional biopsies aim to remove the entire region of abnormality.
DCIS is a noninvasive condition that can be associated with, or evolve into, invasive cancer, with variable frequency and time course.
Some authors include DCIS with invasive breast cancer statistics, but others argue that it would be better if the term were replaced with ductal intraepithelial neoplasia, similar to the terminology used for cervical and prostate precursor lesions, and that excluding DCIS from breast cancer statistics should be considered.
DCIS is most often diagnosed by mammography. In the United States, only 4,900 women were diagnosed with DCIS in 1983 before the adoption of mammography screening, compared with approximately 62,930 women who are expected to be diagnosed in 2019.
The Canadian National Breast Screening Study-2, which evaluated women aged 50 to 59 years, found a fourfold increase in DCIS cases in women screened by clinical breast examination (CBE) plus mammography compared with those screened by CBE alone, with no difference in breast cancer mortality.
(Refer to the PDQ summary on Breast Cancer Treatment (Adult) for more information.)
The natural history of DCIS is poorly understood because nearly all DCIS cases are detected by screening and nearly all are treated. Development of breast cancer after treatment of DCIS depends on the pathologic characteristics of the lesion and on the treatment. In a randomized trial, 13.4% of women whose DCIS was excised by lumpectomy developed ipsilateral invasive breast cancer within 90 months, compared with 3.9% of those treated by both lumpectomy and radiation.
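To make the size of this difference concrete, the following sketch computes the absolute risk reduction and the corresponding number needed to treat from the two 90-month percentages quoted above. This is simple arithmetic on the point estimates, not an analysis reported by the trial; the function names are illustrative.

```python
def absolute_risk_reduction(risk_without: float, risk_with: float) -> float:
    """Difference in event risk between the two arms (as proportions)."""
    return risk_without - risk_with


def number_needed_to_treat(arr: float) -> float:
    """Patients who must receive the added treatment to prevent one event."""
    return 1.0 / arr


# Ipsilateral invasive breast cancer within 90 months, from the text.
RISK_LUMPECTOMY_ALONE = 0.134
RISK_LUMPECTOMY_PLUS_RADIATION = 0.039

arr = absolute_risk_reduction(RISK_LUMPECTOMY_ALONE, RISK_LUMPECTOMY_PLUS_RADIATION)
nnt = number_needed_to_treat(arr)

if __name__ == "__main__":
    print(f"Absolute risk reduction: {arr * 100:.1f} percentage points")
    print(f"Number needed to treat: about {nnt:.1f}")
```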
Among women diagnosed and treated for DCIS, the percentage of women who died of breast cancer is lower than that for the age-matched population at large.
This favorable outcome may reflect the benign nature of the condition, the benefits of treatment, or the volunteer effect (i.e., women who undergo breast cancer screening are generally healthier than those who do not do so).
Atypia, which is a risk factor for breast cancer, is found in 4% to 10% of breast biopsies.
Atypia is a diagnostic classification with considerable variation among practicing pathologists.
The range of pathologists' diagnoses of breast tissue includes benign without atypia, atypia, DCIS, and invasive breast cancer. The incidence of atypia and DCIS breast lesions has increased over the past three decades as a result of widespread mammography screening, although atypia is generally mammographically occult.
Misclassification of breast lesions may contribute to either overtreatment or undertreatment of lesions—with variability especially in the diagnoses of atypia and DCIS.
The largest study on this topic, the B-Path study, involved 115 practicing U.S. pathologists who each interpreted a single breast biopsy slide per case; their interpretations were compared with an expert consensus-derived reference diagnosis.
While the overall agreement between the individual pathologists’ interpretations and the expert reference diagnoses was highest for invasive carcinoma, there were markedly lower levels of agreement for DCIS and atypia.
As the B-Path study included higher proportions of cases of atypia and DCIS than typically seen in clinical practice, the authors expanded their work by applying Bayes’ theorem to estimate how diagnostic variability affects accuracy from the perspective of a U.S. woman aged 50 to 59 years having a breast biopsy.
At the U.S. population level, it is estimated that 92.3% (confidence interval [CI], 91.4%–93.1%) of breast biopsy diagnoses would be verified by an expert reference consensus diagnosis, with 4.6% (CI, 3.9%–5.3%) of initial breast biopsies estimated to be overinterpreted and 3.2% (CI, 2.7%–3.6%) underinterpreted. Figure 1 shows the predicted outcomes per 100 breast biopsies, overall and by diagnostic category.
To address the high rates of discordance in breast tissue diagnosis, laboratory policies that require second opinions are becoming more common. A national survey of 252 breast pathologists participating in the B-Path study found that 65% of respondents reported having a laboratory policy that requires second opinions for all cases initially diagnosed as invasive disease. Additionally, 56% of respondents reported policies that require second opinions for initial diagnoses of DCIS, while 36% of respondents reported mandatory second opinion policies for cases initially diagnosed as atypical ductal hyperplasia.
In this same survey, pathologists overwhelmingly agreed that second opinions improved diagnostic accuracy (96%).
A simulation study that used B-Path study data evaluated 12 strategies for obtaining second opinions to improve interpretation of breast histopathology.
Accuracy improved significantly with all second-opinion strategies, except for the strategy limiting second opinions only to cases of invasive cancer. Accuracy improved regardless of the pathologists’ confidence in their diagnosis or their level of experience. While the second opinions improved accuracy, they did not completely eliminate diagnostic variability, especially in the challenging case of breast atypia.
Women with an increased risk of breast cancer caused by a BRCA1 or BRCA2 genetic mutation might benefit from increased screening. (Refer to the PDQ summary on Genetics of Breast and Gynecologic Cancers for more information.)
Women with Hodgkin and non-Hodgkin lymphoma who were treated with mantle irradiation have an increased risk of breast cancer, starting 10 years after completing therapy and continuing life-long. Therefore, screening mammography has been advocated, even though it may begin at a relatively young age.
The potential benefits of screening mammography occur well after the examination, often many years later, whereas the harms occur immediately. Therefore, women with limited life expectancy and comorbidities who suffer harms may do so without benefit. Nonetheless, many of these women undergo screening mammography.
In one study, approximately 9% of women with advanced cancer underwent cancer screening tests.
Screening mammography may yield cancer diagnoses in approximately 1% of women aged 66 to 79 years, but most of these cancers are low risk.
The question remains whether the diagnosis and treatment of localized breast cancer in elderly women is beneficial.
There is no evidence of benefit in performing screening mammography in average-risk women younger than 40 years.
Approximately 1% of all breast cancers occur in men.
Most cases are diagnosed during the evaluation of palpable lesions, which are generally easy to detect. Treatment consists of surgery, radiation, and systemic adjuvant hormone therapy or chemotherapy. (Refer to the PDQ summary on Male Breast Cancer Treatment for more information.) Screening is unlikely to be beneficial.
乳腺X线检查是利用电离辐射使乳腺组织成像。通过将乳房紧紧地压在两块夹板之间来完成检查,这样可以展开重叠的组织,减少成像所需的辐射量。在美国进行常规筛查时,在内外侧斜位和头尾位处都进行投影成像检查。
这两种体位都包括从乳头到胸肌的乳腺组织。每项标准的双体位筛查的辐射暴露量为4-24mSv。双体位检查的召回率低于单体位检查,因为它们减少了对正常乳房结构重叠引起异常影像的干扰。
双体位检查比单体位检查的间期癌发生率低。
根据1992年美国国会颁布的乳腺X线摄影质量标准法案(MQSA),美国所有开展乳腺X线摄影的机构必须通过美国食品和药品管理局(FDA)认证,以确保技术人员经过标准化的培训,并保证采用低放射剂量的标准化乳腺X线成像技术。
(具体信息参考FDA有关基于MQSA的乳腺X线摄影机构调查、乳腺X线设备评估和放射医师资格要求的网页)。根据1998年MQSA修正案的要求,患者将会获得一份乳腺X线成像结果的书面报告,该报告使用通俗语言书写。
以下乳腺影像报告和数据系统(BI-RADS)分类用于报告乳腺X线摄影结果:
大多数乳腺X线筛查片被解读为阴性或良性(分别为BI-RADS 1类或2类);美国约有10%的女性被召回进行额外的检查评估。
被召回接受额外检查评估的女性所占百分比不仅因每名女性的固有特征而异,还因乳腺X线摄影设备和放射医师而异。
数字乳腺X线摄影的检查费用高于乳腺屏片检查(SFM),但在数据储存及分享方面更为先进。一些试验直接比较了SFM和数字乳腺X线摄影在癌症检出率、灵敏度、特异度和阳性预测值(PPV)方面的表现,在大多数患者组两者的结果接近。
数字乳腺X线摄影筛查试验(DMIST)比较了33个美国中心42760名女性的数字和胶片乳腺影像结果。尽管数字乳腺X线摄影在50岁以下的女性中检测到更多的癌症(数字乳腺X线摄影曲线下面积[AUC]为0.84±0.03;胶片下AUC为0.69±0.05;P=.002),但两者乳腺癌总体检出率没有差异。
DMIST的第二份报告发现,65岁及以上的女性中,胶片X线摄影倾向于比数字X线摄影有更高的AUC。
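Another large U.S. cohort study also found slightly higher sensitivity for screen-film mammography in women younger than 50 years, with similar specificity.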
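A Dutch study compared the findings of 1.5 million digital and 4.5 million screen-film screening mammograms performed between 2004 and 2010. Digital screening mammography had higher recall and cancer detection rates.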
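A meta-analysis of 10 studies, including DMIST and the U.S. cohort study, compared digital mammography and screen-film mammography in 82,573 women who underwent both types of examination. In a random-effects model, there was no statistically significant difference in cancer detection between the two types of mammography (AUC of 0.92 for film and 0.91 for digital). For women younger than 50 years, all studies found that sensitivity was higher for digital mammography, but specificity was either the same or higher for film mammography.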
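Computer-aided detection (CAD) systems highlight suspicious regions, such as clustered microcalcifications and masses, generally increasing sensitivity, decreasing specificity, and increasing detection of ductal carcinoma in situ (DCIS).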
多个CAD系统正在使用中。一项基于人群的大规模研究对比了CAD系统引入前后的召回率和乳腺癌检出率,发现两者都没有变化。
另一项大型研究表明,召回率和DCIS检出率有所提高,但浸润性癌症检出率没有提高。
另一项对40-89岁女性的研究,应用大型数据库和数字乳腺X线摄影技术发现,CAD并没有提高灵敏度、特异度或间期癌的检出率,但确实检出了更多的DCIS。
根据监测、流行病学和最终结果(SEER)与医疗保险关联数据库,对2001-2002年和2008-2009年这两个时间段,27万多名65岁及以上女性新型乳腺X线检查方法的使用进行了比较。数字乳腺X线摄影从2%增加到30%,计算机辅助诊断从3%增加到33%,费用从6.6亿美元增加到9.62亿美元。2008年,医保支付的乳腺X线筛查中74%使用了CAD,几乎是2004年的两倍。早期(DCIS、I期)和晚期(IV期)肿瘤的检出率无差异。
断层摄影或三维(3-D)乳腺摄影,如标准2-D乳腺摄影一样,压迫乳房并使用X线创建图像。在不同投影角度获得多个短曝光X射线。使用这种方法比用乳腺X线摄影或超声更容易检出某些癌症。辐射剂量是二维乳腺摄影的两倍。
根据2012-2016年间佛蒙特州8家筛查机构的观察性数据,对86379例数字乳腺断层摄影(DBT)和97378例全野数字化X线摄影(FFDM)筛查检查的结果进行了比较。研究纳入既往没有乳腺癌或隆胸病史,且没有选择退出临床研究项目的女性。通过问卷调查获得人口学和危险因素信息,通过佛蒙特州乳腺癌监测系统获得所有活检的病理学信息。DBT的召回率低于FFDM(7.9% vs. 10.9%;95%CI,0.77-0.85),但活检率和良恶性疾病的检出率没有差异。
无论分期、淋巴结状态和肿瘤大小如何,筛查检出的癌症比其它方式检出的癌症都有更好的预后。
这表明其生物学上的致死性较低(可能增殖较慢,不太可能局部浸润并转移)。这与筛查相关的病程偏倚效应是一致的。也就是说,筛查更容易发现惰性(即生长缓慢的)乳腺癌,而进展更快的癌症会在筛查间期被检出。
芬兰的一项对1983名浸润性乳腺癌患者的10年随访研究表明,肿瘤检出方式是患者预后的独立影响因素。在调整了年龄、淋巴结情况、肿瘤大小等变量后,分析结果显示筛查检出的肿瘤复发风险更小,总体生存率更好。对于非筛查检出的肿瘤患者,尽管他们接受更多的辅助性系统治疗,其死亡风险比仍高达1.90(95%置信区间[CI]为1.15-3.11)。
同样,对三项随机筛查试验(健康保险计划、国家乳腺筛查研究[NBSS]-1和NBSS-2)的分析中,在调整肿瘤分期、淋巴结状态和肿瘤大小等因素后,通过筛查发现的癌症患者预后更好。与筛查出的癌症相比,间期癌和偶发癌的死亡相对危险度(RR)为1.53(95% CI,1.17-2.00);与筛查出的癌症相比,对照组癌症的死亡相对危险度(RR)为1.36(95% CI,1.10-1.68)。
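A third study compared the outcomes of 5,604 British women with screen-detected breast cancer and women whose breast cancer was detected because of symptoms between 1998 and 2003. After adjusting for tumor size, nodal status, grade, and patient age, the researchers found a survival benefit for women with screen-detected cancers. The hazard ratio for survival among women with symptomatic cancers was 0.79 (95% CI, 0.63–0.99), indicating poorer survival than among women with screen-detected cancers.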
这些研究结果也支持了筛查出一些过度诊断的低危肿瘤的证据。
多项非对照试验和回顾性研究系列显示了乳腺X线摄影诊断早期小乳腺癌的能力,此类乳腺癌有良好的临床进程。
即使筛查并没有延长生命,通过筛查发现的癌症患者也要比非筛查发现的癌症患者生存率高。这一概念可用下面四种统计偏倚解释:
这些偏倚的影响程度不得而知。我们需要一项新的随机对照试验(RCT),以病因特异性死亡率为终点来确定生存获益和过度诊断、领先时间、病程长短和健康志愿者偏倚的影响。这是不可能实现的;将患者随机分为接受筛查组和非筛查组不符合伦理。并且该研究至少需要30年的随访,在此期间,治疗和成像技术的变化将使研究结果无效。因此,决策必须基于现有的随机对照试验,或基于对照组充分且调整混杂因素的生态学研究或队列研究,尽管这些研究都有局限性。(有关更多信息,请参阅PDQ总结“癌症筛查概述”部分。)
美国乳腺癌监测联合会(BCSC)网站上介绍了乳腺X线摄影筛查的性能基准。(有关更多信息,请参阅PDQ总结“癌症筛查概述”部分。)
乳腺X线摄影的灵敏度是指通过乳腺X线摄影筛查检出乳腺癌占所有乳腺癌患者的百分比。灵敏度取决于肿瘤大小、可触及性、激素敏感性、乳腺组织密度、患者年龄、所处的月经周期、图像质量以及放射科医生读片能力。乳腺X线摄影的总体灵敏度约79%,但在年轻女性和乳腺组织致密的女性中,灵敏度较低(请参见BCSC网站)。
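Sensitivity is not the same as benefit, because some women with possible disease are harmed by overdiagnosis. Delayed diagnosis and misdiagnosis of breast cancer are common causes of medical malpractice claims, according to the Physician Insurers Association of America (PIAA). PIAA data from 2002 through 2011 indicate that, among breast cancer claims, the largest total indemnity payments were for errors in diagnosis, with an average payment of $444,557.
The specificity of mammography is the percentage of women without breast cancer whose mammogram is negative. The false-positive rate is the likelihood of a positive mammogram in a woman without breast cancer. Low specificity and a high false-positive rate lead to unnecessary follow-up examinations and procedures. Because the denominator for specificity includes all women without cancer, even a small false-positive rate yields a large number of false-positive results. Screening therefore requires high specificity; even 95% specificity is quite low for a screening test.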
间期癌是指在正常筛查和下一次筛查之间诊断出的癌症。研究发现,间期癌多发于50岁以下女性,其组织学为黏液性或小叶性,病理分级高,增殖活性高,乳腺X线特征为相对良性且无钙化。相反,筛查发现的肿瘤通常为管状癌、体积小、分期较早、激素敏感性以及具有DCIS的主要成分。
Overall, interval cancers tend to grow rapidly, are diagnosed at an advanced stage, and carry a poor prognosis.
新斯科舍乳腺筛查项目将漏检定义为上一次筛查结果为假阴性的肿瘤,发生率不到1/1000。该研究结论是,间期癌在40-49岁的女性中发生率约1/1000;50-59岁的女性中,发生率约3/1000。
相反,一项规模更大的试验发现间期癌在40-49岁的女性中更为普遍。在乳腺X线摄影阴性后12个月内出现的间期癌通常是由于乳腺密度所致。在乳腺X线摄影阴性后24个月内出现的间期癌与乳腺致密导致的乳腺X线摄影敏感性低或肿瘤快速生长都有关系。
乳腺X线摄影的准确度受到患者特征的影响,如女性的年龄、乳腺密度、是否初次行乳腺X线摄影以及距离上次乳腺X线摄影检查的时间。年轻女性的灵敏度较低且假阳性率较高。
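The Million Women Study in the United Kingdom found lower sensitivity and specificity in women aged 50 to 64 years who used postmenopausal hormone therapy, had prior breast surgery, or had a body mass index below 25.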
距离上一次乳腺筛查的时间间隔越长,则灵敏度、召回率和癌症检出率越高,但特异度降低。
在月经开始后或激素治疗中断期间安排检查可提高灵敏度。
肥胖可使乳腺X线摄影检查的假阳性率增加20%以上,但灵敏度不变。
致密型乳腺可能会干扰乳腺X线摄影对小肿块的检测,从而降低乳腺X线摄影的灵敏度。
对于所有年龄段的女性,乳腺的高致密度会使检查的灵敏度降低10-29%。
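High breast density is an inherent trait that can be familial, but it is also influenced by age; by endogenous and exogenous hormones; by selective estrogen receptor modulators, such as tamoxifen; and by diet.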
激素治疗与乳腺密度增加、乳腺X线摄影灵敏度降低和间期癌发病率增加有关。
对于致密型乳房,数字化乳腺X线摄影比乳腺屏片摄影更准确。
美国多数州颁布法律要求乳腺X线摄影机构报告乳腺密度,但指南的不一致性造成了患者和护理人员的困惑和焦虑。
致密型乳腺是正常的乳腺特征。乳腺密度是指乳腺X线摄影成像中致密组织与脂肪组织的比例。
美国放射学会的对乳腺密度的BI-RADS分级如下:
后两类归为致密型乳腺,43%的40-74岁的女性为这一类型。
放射科医生对乳腺密度的分类具有主观性。乳腺密度可能会随着时间的推移而变化。
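Although increased breast density is associated with an increased risk of breast cancer, density is only a moderate risk factor and does not imply a higher risk of breast cancer death. Compared with women in density category a, women in category d have a fourfold increase in breast cancer risk.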
对于乳腺致密女性的筛查,尽管某些研究组建议采用超声或乳腺核磁共振作为补充检查,但没有证据显示这种方法能够降低乳腺癌死亡率。增加这些补充检查的潜在危害是可能产生更多的假阳性,导致额外的影像学检查和乳腺活检,从而使人忧虑并增加成本。
补充筛查也可能增加乳腺癌的过度诊断和过度治疗。
乳腺X线摄影更易发现粘液性癌和小叶癌。有时快速生长的肿瘤会被误认为正常乳腺组织(例如,髓样癌是一种罕见的浸润性导管乳腺癌,常与BRCA1突变和侵袭性相关,但疗效相对较好)。
其他可能被漏诊的肿瘤包括与BRCA1/2突变相关的表现为惰性的肿瘤。
放射科医生的技术水平不一,受经验和阅片数量影响。
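Academic radiologists tend to have a higher positive predictive value (PPV) for their biopsy recommendations than community radiologists.
Fellowship training in breast imaging may also increase breast cancer detection.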
不同检诊机构的水平不一。仅提供筛查的机构的准确性高于同时进行诊断检查的机构。拥有专门的乳腺影像科医师,采用单人阅片而非双人阅片,每年至少2次专业审核的机构具有更高的准确度。
在对医疗事故关注度较高的机构以及服务于弱势妇女群体(如少数族裔、教育程度不高、家庭收入有限或居住在农村的妇女)的机构假阳性率更高。
这些人群可能有较高的癌症患病率并难以随访。
对不同国家的乳腺X线摄影筛查进行对比研究发现,具有高度集中的筛查系统和国家级质量保证项目的国家,其筛查的特异性更高。
美国的筛查召回率是英国的两倍,而二者的癌症检出率没有差别。
初次(第一次)筛查时癌症确诊概率最高,每1000次筛查中可检出9-26例,这一数字因年龄而异。而随访筛查检出乳腺癌的可能性则有所下降,每1000次筛查中可检出1-3例。
目前尚不确定乳腺X线筛查的最佳间隔;尽管筛查方案与筛查间隔存在差异,但筛查试验整体差别不大。英国的一项前瞻性试验纳入年龄分布于50到62岁间女性患者,随机分为两组分别以一年和三年为间隔进行筛查。两组人群的检出结果在肿瘤分期和淋巴结状态方面类似,但在间隔为一年的筛查组检出的肿瘤更小。
一项大型观察性研究发现,对于40多岁的女性,相对于每年一次筛查,每2年一次乳腺癌筛查时诊断为晚期乳腺癌的风险略高(28% vs. 21%;OR,1.35;95% CI,1.01-1.81)。但对于50多岁和60多岁的女性无此差异。
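A Finnish study of 14,765 women aged 40 to 49 years randomly assigned women to annual or triennial screening. There were 18 breast cancer deaths in 100,738 person-years of follow-up in the triennial screening group and 18 breast cancer deaths in 88,780 person-years in the annual screening group (HR, 0.88; 95% CI, 0.59–1.27).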
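The death counts and person-years reported for the Finnish study can be turned into crude mortality rates with a few lines of arithmetic. The sketch below is illustrative only: the function name is an assumption, and a crude rate ratio is not the same quantity as the hazard ratio reported by the study, which comes from a time-to-event model, so the two need not agree exactly.

```python
def rate_per_100k(deaths: int, person_years: float) -> float:
    """Crude mortality rate per 100,000 person-years."""
    return deaths / person_years * 100_000


# Figures quoted in the text for the two screening intervals.
triennial_rate = rate_per_100k(18, 100_738)
annual_rate = rate_per_100k(18, 88_780)

# Crude ratio of triennial to annual breast cancer mortality rates.
crude_rate_ratio = triennial_rate / annual_rate

if __name__ == "__main__":
    print(f"Triennial: {triennial_rate:.2f} deaths per 100,000 person-years")
    print(f"Annual:    {annual_rate:.2f} deaths per 100,000 person-years")
    print(f"Crude rate ratio (triennial / annual): {crude_rate_ratio:.2f}")
```

With only 18 deaths in each group, the wide confidence interval reported for the hazard ratio is the more informative summary of the uncertainty.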
1963-2015年间全球进行了多项乳腺X线筛查对乳腺癌死亡率影响的随机对照试验,筛查对象包括来自四个国家的50多万名女性。其中一项是加拿大的NBSS-2,它将乳腺X线摄影联合临床乳房触诊(CBE)与单独的CBE进行了比较;另一项是将乳腺X线摄影联合或不联合CBE与常规检查进行了比较。有关试验的详细说明,请参阅本摘要“随机对照试验附录”部分。
这些试验在研究设计、参与者招募、干预措施(包括筛查方法和治疗方法)、对照组管理、对筛查组和对照组分配的依从性以及结果分析方面有所不同。某些试验采用个体随机化,而另一些则采用整群随机化,在整群随机化中确定队列,然后进行筛查;某些试验利用出生日期分组而非随机分配。整群随机化有时会导致干预组和对照组之间出现不平衡。有些试验发现年龄在组间不匹配,尽管年龄这一因素对结果影响不大。
在“爱丁堡试验”中,研究对象的社会经济状况在干预组和对照组之间存在显著差异,而这一因素又与乳腺癌死亡率相关,这就导致研究结果不具解释性。
乳腺癌死亡率是试验的主要结果指标,因此需要仔细区分患者的死因。有些项目采用了盲法监测委员会(纽约)或匹配到与项目无关的其他数据,如全国死亡登记处(瑞典试验),但这样也不能确保筛查组或对照组人群死因的准确性。有人认为,在双郡试验中,乳腺癌死亡可能被错误分类,从而使结果有利于筛查组。
这些试验在方法学上也存在差异。瑞典开展的五项试验中,有四项研究的对照组采用单轮乳腺X线摄影,检查时间与筛查结束时间一致。上述研究采用了评估分析的方法作为初步分析,只统计了在最后一次乳腺X线摄影时或之前发现的乳腺癌死亡数。在一些试验中,由于对照组在研究结束时接受乳腺X线检查的延迟效应,从而导致对照组有更多的时间发生或诊断为乳腺癌。此外,还有试验使用的是随访分析,即不论确诊时间,而将所有死于乳腺癌患者纳入分析。这类分析用于对五个瑞典试验中的四个进行荟萃分析,以便回应评估分析的缺陷。
对数据的国际间审核和验证的可行性不同。只有加拿大的临床试验采取了正式的审核流程。其他试验则采用了更为宽松的审核流程。
上述所有研究的目的是研究乳腺癌死亡率而非全因死亡率。任何特定人群中,乳腺癌死亡只占总死亡的一小部分。当回顾性分析这些试验中的全因死亡率时,只有“爱丁堡试验”显示出由于研究对象的社会经济差异而造成的死亡率不同,瑞典四项试验的汇总分析(随访方法)也显示全因死亡率有所改善。
乳腺癌死亡率的相对降低,约15-20%可归因于筛查,但对于个体的获益相对较少。乳腺癌筛查带来的潜在获益可以认为是由于早期检出乳腺癌而延长了寿命。
RCT反映的是在特定时期内接受定期检查的结果,但实际上,女性一生中要接受筛查的时间可长达20-30年。
使用这些50年前进行的随机对照试验来评估筛查降低乳腺癌死亡率的益处,可能存在许多问题,包括:
因此,筛查降低乳腺癌死亡率的效果评估是基于多方面的研究,包括高质量的队列研究、生态学研究以及随机对照试验。
筛查效果的评估可通过基于筛查人群与未筛查人群的非随机对照研究、基于真实社区的病例-对照研究以及分析筛查对大型人群影响的模型研究。这些研究的设计必须尽可能降低或排除影响乳腺癌死亡率变化的因素(如治疗方法的改进和社区居民对乳腺癌认知的提高)。
瑞典的三项基于人群的观察性研究对比了乳腺X线摄影筛查实施前后,乳腺癌死亡率的差别。其中一项研究在瑞典25个县中的7个开展,主要对比了两个相邻的时间段内上述数值的差别。结果显示实施筛查项目后乳腺癌死亡率较无筛查项目的对照组下降了18%-32%,具有显著的统计学上意义。
这项研究中最重要的偏倚是,在实施筛查的时间段内,这几个县的乳腺癌辅助治疗效果得到了显著提高,但研究作者却没能提到这个因素的变化。第二项研究比较了进行筛查的7个县和未进行筛查的6个县,为期11年的筛查效果。
研究结果倾向于筛查获益,但作者仍然没有考虑辅助治疗的影响或地理位置的差异(城市与农村)可能影响治疗方法。
第三项研究试图详细分析各县的数据来说明治疗的效果。研究发现,筛查影响不大。研究设计和分析中的缺陷弱化了结论。
1975年,在荷兰奈梅亨市开展了一项基于人群的筛查项目。通过病例-队列研究发现,与未接受筛查女性相比,筛查女性的死亡率发生了降低(OR,0.48)。
然而,随后的一项研究将奈梅亨市的乳腺癌死亡率与邻近的没有进行过筛查的荷兰阿纳姆市进行了比较,结果显示两者在乳腺癌死亡率上没有差异。
1983-1998年,一项利用美国医疗系统开展的基于社区的病例-对照研究发现,既往接受过筛查与乳腺癌死亡率降低无关,但该项目的乳腺X线摄影筛查率普遍较低。
一项高质量的生态学研究比较了三对在医疗体系和人口结构上相似的欧洲临近城市,其中一个城市开启的国家性筛查项目早于其它城市。研究者发现每个城市都出现了乳腺癌死亡率的下降,但筛查不能解释配对的组间差异。作者认为,与筛查相比,乳腺癌治疗方式和/或医疗护理机构的发展更有可能是死亡率降低的原因。
2011年3月发表的一篇系统综述总结了多项生态学研究和大型队列研究关于筛查对于50-69岁的女性乳腺癌死亡率的影响。这些研究开展筛查的时期不同。共有 17项研究符合纳入标准,但所有研究均存在方法学问题,包括对照组差异、对不同地区在乳腺癌风险及治疗方面的差异未经调整,以及对比地区对于乳腺癌死亡率的测量方法问题。这些研究的结果差异很大,其中有四项研究发现乳腺癌死亡率相对降低了33%以上(具有较宽的置信区间),而且有五项研究发现乳腺癌死亡率没有降低。由于乳腺癌死亡率总体下降仅有小部分可归因于筛查,因此该综述结论认为:因筛查而导致的乳腺癌死亡率的相对下降可能不超过10%。
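An ecologic analysis conducted in the United States from 1976 to 2008 examined the incidence of early-stage and late-stage breast cancer in women aged 40 years and older. To assess the effect of screening, the authors compared the size of the increase in early-stage cancers with the expected decrease in late-stage cancers. Over the study period, the absolute increase in early-stage cancers was 122 per 100,000 women, whereas the absolute decrease in late-stage cancers was 8 per 100,000 women. After adjusting for changes in incidence caused by hormone therapy and other undefined causes, the authors concluded that (1) the benefit of screening for breast cancer mortality was small, (2) 22% to 31% of diagnosed breast cancers represented overdiagnosis, and (3) the observed decline in breast cancer mortality was probably attributable largely to improved treatment rather than to screening.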
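An analytic approach has been used to approximate the relative contributions of screening and treatment to the reduction in breast cancer mortality and to estimate the magnitude of overdiagnosis.
This approach used SEER data to analyze the shift in the tumor-size distribution of breast cancers among U.S. women aged 40 years and older, from before the introduction of screening mammography through 2012 (after screening began), under the assumption that the underlying rate of clinically meaningful breast cancer remained stable over this period. The authors found that the incidence of large tumors (≥2 cm) decreased and that size-specific case fatality also declined. The lower mortality from large tumors was attributed to improved treatment, and two-thirds of the decline in size-specific case fatality was attributed to improved treatment.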
美国的一项基于社区的前瞻性队列研究发现,对于非致密性乳腺的50-74岁女性进行筛查,与每两年一次相比,每年一次的筛查并未降低恶性乳腺癌的检出率。但对于40-49岁的拥有致密型乳腺的女性通过年度筛查,可降低2厘米及以上肿瘤的检出率(OR,2.39; 95%CI,1.34-4.18)。
在加拿大的12项筛查项目中,有7项对40至74岁的女性进行了观察性研究,比较了1990至2009年至少筛查一次(占所有参与者的85%)与从未筛查的参与者(占所有参与者的15%)的乳腺癌死亡率。该摘要报告筛查参与者的乳腺癌平均死亡率为40%。但是,根据讨论部分的语言表述,作者可能想表达的是乳腺癌死亡率降低了40%。
该研究的局限性包括:缺乏全因死亡数据、筛查的强度、研究外接受筛查、研究前接受筛查、用于计算预期死亡率的方法,未参加筛查者的基线乳腺癌死亡参考值、未参加筛查者的生存率、省级人口差异,以及数据库的限制在多大程度上妨碍了年龄和参与者之间其它差异的校正、单个省(不列颠哥伦比亚省)的子研究数据的外推性,以及选择偏倚的潜在影响。总体而言,该研究缺乏以上重要数据,并且在方法学和数据分析方面存在局限性。
建模人员给出了最佳筛查间隔。尽管模型的假设可能不正确,但是当模型的总体结论与随机临床试验结果大体一致时,以及使用该模型用于内推或外推时,该模型的可信度更高。例如,如果模型的输出结果与用于年度筛查的RCT结果相符,该模型被用于比较两年一次和每年一次筛查组的相对有效性时,其可信度也较高。
2000年,美国国家癌症研究所组建了一个建模小组联盟(癌症干预与检测建模网络[CISNET]),用来探讨筛查和辅助治疗分别对美国的乳腺癌死亡率下降所作出的相对贡献。
这些模型预测的乳腺癌死亡率的下降与RCT研究所得出的结果相似;但相比于RCT,建模研究还考虑到改进的辅助性治疗方案。2009年,CISNET的建模人员对与乳腺X线筛查利弊相关的一些问题进行了研究,包括比较每年1次与每两年1次筛查方案的区别。
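Pooling the results of six modeling groups, for women aged 50 to 74 years, changing from annual to biennial screening retained 72% to 95% (median, 80%) of the reduction in breast cancer mortality achieved with annual screening.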
目前有限的数据难以确定筛查成像技术的进步和治疗效果的提高对于1990年后死亡率下降的归因比。在一项CISNET对六个模拟模型的研究中,2012年乳腺癌死亡率下降约三分之一的原因是筛查,其余则归因于治疗。
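In this CISNET study, the mean estimated reduction in the overall breast cancer mortality rate in 2012 was 49% (model range, 39%–58%), relative to the estimated baseline rate in the absence of screening and treatment. Of this reduction, 37% (model range, 26%–51%) was associated with screening and 63% (model range, 49%–74%) with treatment.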
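The two shares quoted above are fractions of the 49% total reduction, not of baseline mortality. A minimal sketch of that conversion follows, using only the mean values from the text; the names are illustrative and the model ranges are ignored.

```python
def attributable_reduction(total_reduction: float, share: float) -> float:
    """Portion of the total mortality reduction attributed to one factor."""
    return total_reduction * share


TOTAL_REDUCTION = 0.49   # mean reduction versus no screening and no treatment
SCREENING_SHARE = 0.37   # share of that reduction associated with screening
TREATMENT_SHARE = 0.63   # share of that reduction associated with treatment

screening_points = attributable_reduction(TOTAL_REDUCTION, SCREENING_SHARE)
treatment_points = attributable_reduction(TOTAL_REDUCTION, TREATMENT_SHARE)

if __name__ == "__main__":
    print(f"Screening: about {screening_points * 100:.1f} percentage points")
    print(f"Treatment: about {treatment_points * 100:.1f} percentage points")
```

Read this way, screening accounts for roughly 18 of the 49 percentage points and treatment for roughly 31.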
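The negative effects of screening mammography include overdiagnosis (true positives that will not become clinically significant), false positives (related to the specificity of the test), false negatives (related to the sensitivity of the test), discomfort associated with the test, radiation risk, psychological harm, financial stress, and opportunity costs.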
表1概述了对10,000例女性开展乳腺X线筛查,每年一次连续10年的预估利弊。
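Age, y | No. of breast cancer deaths averted with mammography screening over the next 15 y | No. (95% CI) with ≥1 false-positive result during the 10 y | No. (95% CI) with ≥1 false-positive result leading to a biopsy during the 10 y | No. of breast cancers or DCIS diagnosed during the 10 y that would never become clinically important (overdiagnosis)
---|---|---|---|---
40 | 1–16 | 6,130 (5,940–6,310) | 700 (610–780) | ?–104
50 | 3–32 | 6,130 (5,800–6,470) | 940 (740–1,150) | 30–137
60 | 5–49 | 4,970 (4,780–5,150) | 980 (840–1,130) | 64–194

No. = number; CI = confidence interval; DCIS = ductal carcinoma in situ.
a Adapted from Pace and Keating.
b Numbers of deaths averted are from Welch and Passow. The lower bound represents the reduction in breast cancer mortality if the relative risk of breast cancer death is 0.95 (based on minimal benefit from the Canadian trials), and the upper bound represents the reduction if the relative risk is 0.64 (based on the Swedish Two-County Trial).
c False-positive and biopsy estimates and 95% confidence intervals are 10-year cumulative risks reported by Hubbard et al. and Braithwaite et al.
d The number of overdiagnosed cases was calculated by Welch and Passow. The lower bound represents overdiagnosis based on the Malmö trial, whereas the upper bound represents the estimate from Bleyer and Welch.
e The lower-bound estimate of overdiagnosis reported by Welch and Passow came from the Malmö study, which did not enroll women younger than 50 years.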
当筛查检测到的癌症在未筛查的情况下永远不会在临床上变为明显的癌症时,就会发生过度诊断。 过度诊断的严重程度一直存在争议,特别是乳腺导管原位癌,它是一种自然史不明确的癌前病变。由于无法可靠地预测诊断时的肿瘤行为,因此浸润性癌和DCIS的标准治疗可能会导致过度治疗。 相关的危害包括与治疗相关的副作用以及一系列与癌症诊断相关的危害,这些是立竿见影的。 相反,对于降低死亡率的收益,其出现时间未知。
理解过度诊断的一种方法是研究因非癌症死亡女性中隐匿性癌症的患病率。在对七项尸检研究的概述中,隐匿性浸润性乳腺癌的中位患病率为1.3%(区间为0%–1.8%),而乳腺导管原位癌的中位患病率为8.9%(区间为0%–14.7%)。
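Overdiagnosis can be measured indirectly by comparing breast cancer incidence in screened and unscreened populations. Such comparisons can be confounded by differences between the populations, such as time period, geography, health behaviors, and hormone use. Estimates of overdiagnosis also vary according to how investigators adjust for these confounders and for lead-time bias.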
对29项研究的一项概述发现,过度诊断率为0%–54%,而随机化研究为11%至22%。
在丹麦同时存在筛查和未筛查人群,采用两种不同的方法计算,得出浸润癌的过度诊断率分别为14%和39%。如果包括原位癌,则过度诊断率分别为24%和48%。第二种方法考虑到低于筛查年龄人群的地区差异,可能为更准确。
理论上,在人群中如果发现更多的早期乳腺癌,将导致晚期癌症的发生率下降。但目前为止,尚未在任何研究人群中发现这种情况。因此,更多早期癌症的检出可能意味着过度诊断。在荷兰进行的一项基于人群的研究表明,筛查出的乳腺癌(包括原位癌)中约有一半为过度诊断,与其它研究结果一致,都表明筛查相关的过度诊断率很高。
挪威的一项队列比较了符合纳入筛查条件(年龄和居住地)的女性和不符合纳入条件的年轻女性的癌症发生率。符合纳入条件的女性发生局部癌的风险增加60%(RR,1.60; 95% CI,1.42-1.79),而两组晚期癌症的发病风险相似(RR,1.08; 95% CI,0.86-1.35)。
美国的一项人群研究比较了进行筛查的多个县,研究显示乳腺X线筛查率越高则乳腺癌诊断率越高,但乳腺癌的10年死亡率没有相应降低。
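Strengths of this study include its very large size (16 million women) and the consistency of results across counties. Limitations include self-reported mammography use, the use of a 2-year period to estimate screening prevalence, and the period analyzed, during which menopausal hormone use was widespread.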
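The Canadian NBSS, a randomized clinical trial, provided an estimate of the extent of overdiagnosis. At the end of the five screening rounds, 142 more invasive breast cancers had been diagnosed in the mammography arm than in the control arm.
At 15 years, the excess number of cancers in the mammography arm was 106, representing an overdiagnosis rate of 22% among the 484 screen-detected invasive cancers.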
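The 22% figure follows directly from the two counts given for the 15-year follow-up: the excess cancers in the mammography arm divided by the screen-detected invasive cancers. The short sketch below reproduces that division; the function name is illustrative, and the calculation assumes the excess-incidence definition of overdiagnosis implied by the text.

```python
def overdiagnosis_fraction(excess_cancers: int, screen_detected_cancers: int) -> float:
    """Excess cancers in the screened arm as a fraction of screen-detected cancers."""
    return excess_cancers / screen_detected_cancers


# Counts quoted in the text for the 15-year follow-up.
fraction_at_15_years = overdiagnosis_fraction(106, 484)

if __name__ == "__main__":
    print(f"Estimated overdiagnosis: {fraction_at_15_years * 100:.1f}%")
```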
乳腺X线筛查的结果发现了更多的惰性乳腺癌,这就可能会导致过度治疗。在他莫昔芬与无系统治疗的早期乳腺癌患者的随机试验的二次分析中,作者采用MammaPrint检测了70种基因,确定了15%的病例为超低风险患者,其20年疾病特异性生存期在他莫昔芬组为97%、在对照组为94%。因此,这些患者仅凭手术就可能有非常好的结果。在筛查人群中此类超低风险癌症的发生率可能约为25%。将来可能利用70基因MammaPrint检测等工具来识别这些低风险癌症,从而降低过度治疗的风险。但是,需要进一步的研究来确认这些研究结果。
2016年,加拿大NBSS项目(一项为期25年的随访随机筛查试验)按年龄分组,重新评估了乳腺X线筛查对乳腺癌的过度诊断,并得出结论:在40-49岁的女性中筛查出的浸润癌约30%为过度诊断,而50-59岁组至多为20%。当包括原位癌时,40至49岁女性过度诊断风险为40%,50-59岁该风险为30%。过度诊断的计算方法是筛查组相对于对照组超额检出且长期存活的乳腺癌例数除以筛查发现的总病例数(超标发病率方法)。 使用这种方法对过度诊断进行充分估计的要求包括:
CNBSS试验基本满足上述条件。因为加拿大在CNBSS结束后的2年,甚至某些地区在筛查后5-10年,才开始进行基于人群的筛查(因此,允许在筛查期后停止筛查,同时相比多数估计的领先时间,可以随访更长时间);同时因为有记录的沾染很小;而且由于良好的个体随机化,两组在44个人口学因素和风险因素的分布几乎完全相同。
1988年筛查试验结束后,筛查质量、强度、受邀年龄范围和活检阈值的差异降低了这些结果的外推性。上述因素以及改善的成像技术/质量和降低的活检阈值,可能导致了对原位癌过度诊断的低估。
上表1显示了对10,000名女性进行10年筛查的结果,估计了乳腺癌患者或DCIS患者不会发展成具有临床意义的肿瘤的数量。健康保险计划研究中可能没有过度诊断,该研究使用了以前的乳腺X线摄影方法和CBE。在乳腺X射线摄影技术改进的时代,过度诊断变得更加突出。但是,尚无证据显示改良技术能够进一步降低死亡率。总之,乳腺癌的过度诊断是一个复杂的话题。不同方法的研究的估计区间较广,目前尚无办法评估新的癌症病例是否被过度诊断或对患者有真正的危害。
因为筛查出的乳腺癌不到5/1000,因此即使乳腺X线摄影的特异性达90%(即所有未患乳腺癌的女性中有90%的乳腺X线摄影结果为阴性),大多数异常的X线摄影报告都是假阳性。
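The high false-positive rate of mammography is underappreciated and can seem counterintuitive because of a statistical cognitive bias known as the base rate fallacy. Because the baseline incidence of breast cancer is low (5 per 1,000), false-positive results greatly outnumber true-positive results, even with a very accurate test.
The true-positive rate of mammography is approximately 90%, meaning that about 90% of women with breast cancer will have a positive result. A true-negative rate of 90% means that 90% of women without breast cancer will have a negative result. A false-positive rate of 10% means approximately 100 false positives per 1,000 women screened. If 5 of every 1,000 women have breast cancer, 4.5 of them will have a positive result. In other words, there are approximately 100 false positives for every 4.5 true positives.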
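The arithmetic above can be checked with a small calculation. The sketch below takes the prevalence (5 per 1,000), sensitivity (90%), and specificity (90%) from the text and derives the expected true positives, false positives, and positive predictive value. The function name is illustrative. Note that the text rounds the false positives to 100 by applying the 10% rate to all 1,000 women, whereas the sketch applies it only to the 995 women without cancer.

```python
def screening_outcomes(n_screened: int, prevalence: float,
                       sensitivity: float, specificity: float):
    """Return expected (true positives, false positives, positive predictive value)."""
    with_cancer = n_screened * prevalence
    without_cancer = n_screened - with_cancer
    true_positives = with_cancer * sensitivity
    false_positives = without_cancer * (1.0 - specificity)
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


tp, fp, ppv = screening_outcomes(1_000, prevalence=0.005,
                                 sensitivity=0.90, specificity=0.90)

if __name__ == "__main__":
    print(f"True positives:  {tp:.1f}")
    print(f"False positives: {fp:.1f}")
    print(f"Positive predictive value: {ppv * 100:.1f}%")
```

Under these assumptions, fewer than 1 in 20 positive mammograms corresponds to an actual cancer, which is the base rate effect the paragraph describes.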
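In addition, abnormal results from screening mammograms prompt additional tests and procedures, such as mammographic views of the region of concern, ultrasound, magnetic resonance imaging, and tissue sampling (by fine-needle aspiration, core biopsy, or excisional biopsy). Overall, the benefit of early detection must be weighed against the harm of unnecessary testing and treatment.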
一项乳腺癌筛查研究纳入了健康管理机构的2,400例女性,在10年间诊断出88例乳腺癌;其中乳腺X线检出58例。在研究期间,1/3乳腺X线摄影结果异常的女性做了进一步的检查,包括539例乳腺X线复查、186例超声检查及188例活检。乳腺X线摄影的累积活检率(真阳性率)约为1/4(23.6%)。该人群中,40-49岁女性的乳腺X线摄影阳性预测值(PPV)为6.3%,50-59岁为6.6%,60-69岁为7.8%。
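A subsequent analysis and modeling of data from the same cohort of women estimated that the risk of having at least one false-positive result was 7.4% (95% confidence interval [CI], 6.4%–8.5%) at the first mammogram, 26.0% (95% CI, 24.0%–28.2%) by the fifth mammogram, and 43.1% (95% CI, 36.6%–53.6%) by the ninth mammogram.
The cumulative risk of at least one false-positive result depended on four patient variables (younger age, a higher number of previous breast biopsies, a family history of breast cancer, and current estrogen use) and three radiologic variables (a longer interval between screenings, lack of comparison with previous mammograms, and a radiologist's tendency to call mammograms abnormal). Overall, the most important factor behind false-positive mammogram results was the radiologist's tendency to call mammograms abnormal.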
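For intuition about how a modest per-examination risk accumulates, the sketch below computes the cumulative risk of at least one false positive under a deliberately simple assumption that is not taken from the study: every examination carries the same 7.4% risk, independently of the others. The function name is illustrative.

```python
def cumulative_risk_independent(per_exam_risk: float, n_exams: int) -> float:
    """Risk of at least one false positive in n exams, assuming a constant,
    independent per-exam risk (a simplification, not the study's model)."""
    return 1.0 - (1.0 - per_exam_risk) ** n_exams


FIRST_EXAM_RISK = 0.074  # risk at the first mammogram, from the text

risk_by_fifth = cumulative_risk_independent(FIRST_EXAM_RISK, 5)
risk_by_ninth = cumulative_risk_independent(FIRST_EXAM_RISK, 9)

if __name__ == "__main__":
    print(f"By the fifth exam: {risk_by_fifth * 100:.1f}% (text reports 26.0%)")
    print(f"By the ninth exam: {risk_by_ninth * 100:.1f}% (text reports 43.1%)")
```

This simple model overshoots the reported cumulative risks, which suggests that the per-examination risk is not constant. One plausible contributor, consistent with the radiologic factors listed above, is that later examinations can be compared with earlier mammograms.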
一项基于社区筛查的前瞻性队列研究发现,无论乳腺密度如何,与每两年筛查一次相比,年度筛查的妇女在10年后至少有一次假阳性筛查的比例更高。对于具有散在纤维腺体密度特征的40多岁女性,每年筛查的假阳性率为68.9%,而每两年筛查为46.3%。对于50-74岁的同类型乳腺密度女性,每年筛查假阳性率为49.8%,而每两年筛查时为30.7%。
如表1所示,每1万名接受每年一次乳腺X线筛查中的女性中,10年至少出现一次假阳性结果人数:40至50岁的为6,130人,60岁为4,970。假阳性导致活检的数量,根据年龄不同,人数估计为700-980。
乳腺X线摄影的灵敏度范围为70-90%,具体取决于放射科医师的技术(经验水平)和受检者特征(年龄、乳腺密度、激素状态和饮食)。假设平均灵敏度为80%,则乳腺X线筛查时会漏诊约20%乳腺癌(假阴性)。这些漏诊的肿瘤许多具有高风险以及不良的生物学特性。如果乳腺X线摄影的阴性结果导致受检者或医生不愿意进一步检查或延迟检查,那么可能导致患者的不利结局。因此,乳腺X线摄影的阴性结果绝对不应该导致受检者或医生不愿意进一步检查。
受检者的体位和乳房受压可减少身体活动引起的伪影并提高影像质量。据报道,90%的女性在接受乳腺X线摄影时感到疼痛和/或不适,其中12%的女性认为这种感觉强烈或无法忍受。
一项系统性评价总结了22项乳腺X线摄影相关的疼痛和不适,结果差异很大,其中一些与月经周期、焦虑和乳腺X线摄影前的预期疼痛有关。
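The major risk factors for radiation-associated breast cancer are young age at exposure and radiation dose. However, for the rare woman with an inherited susceptibility to ionizing radiation damage, radiation exposure should be avoided at any age.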
40岁以上的女性,进行乳腺X线筛查的利大于弊。
标准的双体位乳腺X线摄影对乳腺的平均辐射剂量为4 mSv,对整个身体的辐射剂量为0.29 mSv。
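Thus, among women who undergo annual mammography from age 40 to 80 years, at most 1 radiation-induced breast cancer is estimated to occur per 1,000 women. Women with large breasts require a higher radiation dose, and women with breast augmentation require additional views; radiation risk is multiplied in both groups. The risk of radiation-induced breast cancer is reduced fivefold for women who begin biennial screening at age 50 years rather than annual screening at age 40 years.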
一项对308例女性的电话调查发现,这些被调查者均在3个月前接受过乳腺X线筛查,被召回接受额外检查的68位女性中,约有1/4女性即使检查已排除了癌症,仍表现出担心并影响情绪或功能。
关于假阳性结果的心理影响是否长期存在的研究,得出了各种各样的结果。 2002年在西班牙进行的一项队列研究发现,乳腺X线摄影的假阳性结果会对受检者产生即时心理影响,但这种影响在几个月内消失了。
2013年在丹麦进行的一项队列研究测量了假阳性结果对受检者的心理影响,发现存在长期的负面心理后果。
多项研究表明,对假阳性结果的焦虑感会增加未来筛查检查的参与度。
筛查的这些潜在危害尚未得到充分研究,但是很明显它们确实存在。
Mammography utilizes ionizing radiation to image breast tissue. The examination is performed by compressing the breast firmly between two plates, which spreads out overlapping tissues and reduces the amount of radiation needed for the image. For routine screening in the United States, examinations are taken in both mediolateral oblique and craniocaudal projections.
Both views will include breast tissue from the nipple to the pectoral muscle. Radiation exposure is 4 to 24 mSv per standard two-view screening examination. Two-view examinations have a lower recall rate than single-view examinations because they reduce concern about abnormalities caused by superimposition of normal breast structures.
Two-view exams have lower interval cancer rates than single-view exams.
Under the Mammography Quality Standards Act (MQSA) enacted by Congress in 1992, all U.S. facilities that perform mammography must be certified by the U.S. Food and Drug Administration (FDA) to ensure the use of standardized training for personnel and a standardized mammography technique utilizing a low radiation dose.
(Refer to the FDA's web page on Mammography Facility Surveys, Mammography Equipment Evaluations, and Medical Physicist Qualification Requirement under MQSA.) The 1998 MQSA Reauthorization Act requires that patients receive a written lay-language summary of mammography results.
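The Breast Imaging Reporting and Data System (BI-RADS) categories used for reporting mammographic results are 0 (incomplete; additional imaging evaluation needed), 1 (negative), 2 (benign), 3 (probably benign), 4 (suspicious), 5 (highly suggestive of malignancy), and 6 (known biopsy-proven malignancy).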
Most screening mammograms are interpreted as negative or benign (BI-RADS 1 or 2, respectively); about 10% of women in the United States are asked to return for additional evaluation.
The percentage of women asked to return for additional evaluation varies not only by the inherent characteristics of each woman but also by the mammography facility and radiologist.
Digital mammography is more expensive than screen-film mammography (SFM) but is more amenable to data storage and sharing. Performance of both SFM and digital mammography for cancer detection rate, sensitivity, specificity, and positive predictive value (PPV) has been compared directly in several trials, with similar results in most patient groups.
The Digital Mammographic Imaging Screening Trial (DMIST) compared the findings of digital and film mammograms in 42,760 women at 33 U.S. centers. Although digital mammography detected more cancers in women younger than 50 years (area under the curve [AUC] of 0.84 +/- 0.03 for digital; AUC of 0.69 +/- 0.05 for film; P = .002), there was no difference in breast cancer detection overall.
A second DMIST report found a trend toward higher AUC for film mammography than for digital mammography in women aged 65 years and older.
Another large U.S. cohort study also found slightly better sensitivity for film mammography for women younger than 50 years with similar specificity.
A Dutch study compared the findings of 1.5 million digital versus 4.5 million screen-film screening mammograms performed between 2004 and 2010. Higher recall and cancer detection rates were observed for the digital screens.
A meta-analysis of 10 studies, including the DMIST and the U.S. cohort study, compared digital mammography and film mammography in 82,573 women who underwent both types of the exam. In a random-effects model, there was no statistically significant difference in cancer detection between the two types of mammography (AUC of 0.92 for film and AUC of 0.91 for digital). For women younger than 50 years, all studies found that sensitivity was higher for digital mammography, but specificity was either the same or higher for film mammography.
Computer-aided detection (CAD) systems highlight suspicious regions, such as clustered microcalcifications and masses, generally increasing sensitivity, decreasing specificity, and increasing detection of ductal carcinoma in situ (DCIS).
Several CAD systems are in use. One large population-based study that compared recall rates and breast cancer detection rates before and after the introduction of CAD systems found no change in either rate.
Another large study noted an increase in recall rate and increased DCIS detection but no improvement in invasive cancer detection rate.
Another study, using a large database and digital mammography in women aged 40 to 89 years, found that CAD did not improve sensitivity, specificity, or detection of interval cancers, but it did detect more DCIS.
The use of new screening mammography modalities by more than 270,000 women aged 65 years and older in two time periods, 2001 to 2002 and 2008 to 2009, was examined, relying on a Surveillance, Epidemiology, and End Results (SEER)–Medicare-linked database. Digital mammography increased from 2% to 30%, CAD increased from 3% to 33%, and spending increased from $660 million to $962 million. CAD was used in 74% of screening mammograms paid for by Medicare in 2008, almost twice as many screening mammograms as in 2004. There was no difference in detection rates of early-stage (DCIS or stage I) or late-stage (stage IV) tumors.
Tomosynthesis, or 3-dimensional (3-D) mammography, like standard 2-D mammography, compresses the breast and uses x-rays to create the image. Multiple short-exposure x-rays are obtained at different angles. Some cancers are better seen with this method than on mammography or ultrasound. The radiation dose is double that of 2-D mammography.
Observational data from eight screening facilities in Vermont allowed the comparison of findings from 86,379 digital breast tomosynthesis (DBT) and 97,378 full-field digital mammography (FFDM) screening examinations performed between 2012 and 2016. Women were included if they had no history of breast cancer or breast implants and if they had not chosen to opt out of clinical research projects. Demographic and risk factor information was obtained by questionnaire, and pathology for all biopsies was obtained through the Vermont Breast Cancer Surveillance System. Recall rate was lower with DBT than with FFDM (7.9% vs. 10.9%; 95% confidence interval [CI], 0.77–0.85), but there was no difference in the rates of biopsy or the detection of benign or malignant disease.
Regardless of stage, nodal status, and tumor size, screen-detected cancers have a better prognosis than those diagnosed outside of screening.
This suggests that they are biologically less lethal (perhaps slower growing and less likely to invade locally and metastasize). This is consistent with the length bias effect associated with screening. That is, screening is more likely to detect indolent (i.e., slow-growing) breast cancers, while the more aggressive cancers are detected in the intervals between screening sessions.
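The length bias effect can be illustrated with a deliberately simple toy model in Python. The sojourn times, the screening interval, and the function name below are hypothetical values chosen for illustration and are not drawn from any study cited here; the model also assumes that tumor onset is uniformly spread across the screening interval and that screening never misses a detectable tumor.

```python
def screen_detection_probability(sojourn_years: float, interval_years: float) -> float:
    """Chance that at least one screen falls inside a tumor's detectable
    preclinical window, assuming uniformly timed onset and perfect sensitivity."""
    return min(1.0, sojourn_years / interval_years)


# Hypothetical inputs, not taken from the summary.
interval = 2.0        # biennial screening
slow_sojourn = 4.0    # indolent tumor: detectable for 4 years before symptoms
fast_sojourn = 0.5    # aggressive tumor: detectable for 6 months before symptoms

p_slow = screen_detection_probability(slow_sojourn, interval)
p_fast = screen_detection_probability(fast_sojourn, interval)

# If slow and fast tumors arise in equal numbers, this is the share of
# screen-detected cancers that are slow growing.
slow_share_screen_detected = p_slow / (p_slow + p_fast)

# Share of interval (non-screen-detected) cancers that are fast growing.
fast_share_interval = (1 - p_fast) / ((1 - p_slow) + (1 - p_fast))

print(f"slow-growing share of screen-detected cancers: {slow_share_screen_detected:.0%}")
print(f"fast-growing share of interval cancers:        {fast_share_interval:.0%}")
```

Even though the two tumor types are equally common in this sketch, the screen-detected group is dominated by slow-growing tumors, so its survival looks better whether or not screening changes any outcome.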
A 10-year follow-up study of 1,983 Finnish women with invasive breast cancer demonstrated that the method of cancer detection is an independent prognostic variable. When controlled for age, nodal status, and tumor size, screen-detected cancers had a lower risk of relapse and better overall survival. For women whose cancers were detected outside of screening, the hazard ratio (HR) for death was 1.90 (95% CI, 1.15–3.11), even though they were more likely to receive adjuvant systemic therapy.
Similarly, an examination of the breast cancers found in three randomized screening trials (Health Insurance Plan, National Breast Screening Study [NBSS]-1, and NBSS-2) accounted for stage, nodal status, and tumor size and determined that patients whose cancer was found via screening had a more favorable prognosis. The relative risks (RR) for death were 1.53 (95% CI, 1.17–2.00) for interval and incident cancers, compared with screen-detected cancers; and 1.36 (95% CI, 1.10–1.68) for cancers in the control group, compared with screen-detected cancers.
A third study compared the outcomes of 5,604 English women with screen-detected cancers to those with symptomatic breast cancers diagnosed between 1998 and 2003. After controlling for tumor size, nodal status, grade, and patient age, researchers found that the women with screen-detected cancers fared better. The HR for survival of the symptomatic women was 0.79 (95% CI, 0.63–0.99).
The findings of these studies are also consistent with the evidence that some screen-detected cancers are low risk and represent overdiagnosis.
Numerous uncontrolled trials and retrospective series have documented the ability of mammography to diagnose small, early-stage breast cancers, which have a favorable clinical course.
Individuals whose cancer is detected by screening show a higher survival rate than those whose cancers are not detected by screening, even when screening has not prolonged any lives. This concept is explained by four types of statistical bias: lead-time bias, length bias, overdiagnosis bias, and healthy volunteer bias.
The impact of these biases is not known. A new randomized controlled trial (RCT) with cause-specific mortality as the endpoint is needed to determine both survival benefit and impact of overdiagnosis, lead time, length time, and healthy volunteer biases. This is not achievable; randomizing patients to screen and nonscreen groups would be unethical, and at least three decades of follow-up would be needed, during which time changes in treatment and imaging technology would invalidate the results. Decisions must therefore be based on available RCTs, despite their limitations, and on ecologic or cohort studies with adequate control groups and adjustment for confounding. (Refer to the PDQ summary on Cancer Screening Overview for more information.)
Performance benchmarks for screening mammography in the United States are described on the Breast Cancer Surveillance Consortium (BCSC) website. (Refer to the PDQ summary on Cancer Screening Overview for more information.)
The sensitivity of mammography is the percentage of women with breast cancers detected by mammographic screening. Sensitivity depends on tumor size, conspicuity, hormone sensitivity, breast tissue density, patient age, timing within the menstrual cycle, overall image quality, and interpretive skill of the radiologist. Overall sensitivity is approximately 79% but is lower in younger women and in those with dense breast tissue (see the BCSC website).
Sensitivity is not the same as benefit because some women with possible breast cancer are harmed by overdiagnosis. According to the Physician Insurers Association of America (PIAA), delay in diagnosis of breast cancer and errors in diagnosis are common causes of medical malpractice litigation. PIAA data from 2002 through 2011 note that the largest total indemnity payments for breast cancer claims are for errors in diagnosis, with an average indemnity payment of $444,557.
The specificity of mammography is the percentage of all women without breast cancer whose mammograms are negative. The false-positive rate is the likelihood of a positive test in women without breast cancer. Low specificity and high rate of false positives result in unnecessary follow-up examinations and procedures. Because specificity includes all women without cancer in the denominator, even a small percentage of false positives turns out to be a large number in absolute terms. Thus—in screening—a good specificity must be very high. Even 95% specificity is quite low for a screening test.
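A short worked example in Python shows why 95% specificity is low for a screening test. The cohort size of 10,000 is arbitrary, the prevalence of 3 cancers per 1,000 screens is taken from the upper end of the follow-up examination range quoted elsewhere in this summary, and the sensitivity of 79% is the overall figure quoted above; combining them this way is an illustrative assumption, not a reported result.

```python
def screening_counts(n_screened: int, prevalence: float,
                     sensitivity: float, specificity: float) -> dict:
    """Expected true and false positives for one round of screening."""
    with_cancer = n_screened * prevalence
    without_cancer = n_screened - with_cancer
    true_positives = with_cancer * sensitivity
    false_positives = without_cancer * (1.0 - specificity)
    ppv = true_positives / (true_positives + false_positives)
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "ppv": ppv,
    }


# Illustrative inputs: 10,000 women, 3 cancers per 1,000 screens,
# 79% sensitivity, 95% specificity.
result = screening_counts(10_000, prevalence=0.003,
                          sensitivity=0.79, specificity=0.95)

print(f"true positives:  {result['true_positives']:.1f}")
print(f"false positives: {result['false_positives']:.1f}")
print(f"positive predictive value: {result['ppv']:.1%}")
```

Because nearly all screened women are cancer free, the false positives outnumber the true positives roughly twentyfold under these inputs, even though only 5% of women without cancer test positive.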
Interval cancers are cancers that are diagnosed in the interval between a normal screening examination and the anticipated date of the next screening mammogram. One study found that interval cancers occurred more often in women younger than 50 years and more often had mucinous or lobular histology, high histologic grade, high proliferative activity with relatively benign mammographic features, and no calcifications. Conversely, screen-detected cancers often had tubular histology, small size, low stage, hormone sensitivity, and a major component of DCIS.
Overall, interval cancers have characteristics of rapid growth, are diagnosed at an advanced stage, and carry a poor prognosis.
The Nova Scotia Breast Screening Program defined missed cancers as those that were false negatives on the previous screening exam, occurring less often than 1 per 1,000 women. It concluded that interval cancers occurred in approximately 1 per 1,000 women aged 40 to 49 years, and 3 per 1,000 women aged 50 to 59 years.
Conversely, a larger trial found that interval cancers were more prevalent in women aged 40 to 49 years. Those appearing within 12 months of a negative screening mammogram were usually attributable to greater breast density. Those appearing within a 24-month interval were related to decreased mammographic sensitivity caused by greater breast density or to rapid tumor growth.
The accuracy of mammography has been noted to vary with patient characteristics, such as a woman's age, breast density, whether it is her first or subsequent exam, and the time since her last mammogram. Younger women have lower sensitivity and higher false-positive rates than do older women.
The Million Women Study in the United Kingdom found decreased sensitivity and specificity in women aged 50 to 64 years if they used postmenopausal hormone therapy, had prior breast surgery, or had a body mass index below 25.
Increased time since the last mammogram increases sensitivity, recall rate, and cancer-detection rate and decreases specificity.
Sensitivity may be improved by scheduling the exam after the initiation of menses or during an interruption from hormone therapy.
Obese women have more than a 20% increased risk of having false-positive mammography, although sensitivity is unchanged.
Dense breasts may obscure the detection of small masses on mammography, thereby reducing the sensitivity of mammography.
For women of all ages, high breast density is associated with 10% to 29% lower sensitivity.
High breast density is an inherent trait, which can be inherited or affected by age; endogenous and exogenous hormones; selective estrogen receptor modulators, such as tamoxifen; and diet.
Hormone therapy is associated with increased breast density, lower mammographic sensitivity, and an increased rate of interval cancers.
Digital mammography is more accurate than film mammography in examining dense breasts.
Most U.S. states have enacted laws mandating that mammography facilities report breast density, but inconsistent guidelines have generated confusion and anxiety among patients and health care providers.
Dense breast tissue is not abnormal. Breast density is a description of the proportion of dense versus fatty tissue in a mammographic image.
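The American College of Radiology's BI-RADS classifies breast density into four categories: (a) almost entirely fatty, (b) scattered areas of fibroglandular density, (c) heterogeneously dense, and (d) extremely dense.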
The latter two categories are considered dense breast tissue, a description affecting 43% of women aged 40 to 74 years.
A radiologist's assignment of breast density is subjective, and in any woman, it may vary over time.
While breast density is associated with an increased risk of breast cancer, density is only a modest risk factor for breast cancer and does not confer a higher risk for breast cancer death. The fourfold elevated risk for breast cancer incidence according to breast density is a comparison of density category d versus density category a.
Supplemental imaging with ultrasonography or breast magnetic resonance imaging (MRI) has been suggested by some groups for screening women with dense breasts, but there are no data showing that this strategy results in lower breast cancer mortality. The potential harm of adding these supplemental screening tests is the likelihood of producing more false positives, leading to additional imaging and breast biopsies, with resultant anxiety and cost.
Supplemental screening may also increase overdiagnosis of breast cancer with resultant overtreatment.
Mucinous and lobular cancers are more easily detected by mammography. Rapidly growing cancers can sometimes be mistaken for normal breast tissue (e.g., medullary carcinomas, an uncommon type of invasive ductal breast cancer that is often associated with the BRCA1 mutation and aggressive characteristics, but that may demonstrate comparatively favorable responses to treatment).
Some other cancers associated with BRCA1/2 mutations, which may appear indolent, can also be missed.
Radiologists’ performance is variable, affected by levels of experience and the volume of mammograms they interpret.
Biopsy recommendations of radiologists in academic settings have a higher PPV than do those of community radiologists.
Fellowship training in breast imaging may improve detection.
Performance also varies by facility. Mammographic screening accuracy was higher at facilities offering only screening examinations than at those also performing diagnostic tests. Accuracy was also better at facilities with a breast imaging specialist on staff, performing single rather than double readings, and reviewing performance audits two or more times each year.
False-positive rates are higher at facilities where concern about malpractice is high and at facilities serving vulnerable women (racial or ethnic minorities and women with less education, limited household income, or rural residence).
These populations may have a higher cancer prevalence and a lack of follow-up.
International comparisons of screening mammography have found higher specificity in countries with more highly centralized screening systems and national quality assurance programs.
The recall rate in the United States is twice that of the United Kingdom, with no difference in the rate of cancer detection.
The likelihood of diagnosing cancer is highest with the prevalent (first) screening examination, ranging from 9 to 26 cancers per 1,000 screens, depending on the woman’s age. The likelihood decreases for follow-up examinations, ranging from 1 to 3 cancers per 1,000 screens.
The optimal interval between screening mammograms is unknown; there is little variability across the trials despite differences in protocols and screening intervals. A prospective U.K. trial randomly assigned women aged 50 to 62 years to receive mammograms annually or triennially. Although tumor grade and nodal status were similar in the two groups, more cancers of slightly smaller size were detected in the annual screening group than in the triennial screening group.
A large observational study found a slightly increased risk of late-stage disease at diagnosis for women in their 40s who were adhering to a 2-year versus a 1-year schedule (28% vs. 21%; odds ratio [OR], 1.35; 95% CI, 1.01–1.81), but no difference was seen for women in their 50s or 60s based on schedule difference.
A Finnish study of 14,765 women aged 40 to 49 years randomly assigned women to receive either annual screens or triennial screens. There were 18 deaths from breast cancer in 100,738 life-years in the triennial screening group and 18 deaths from breast cancer in 88,780 life-years in the annual screening group (HR, 0.88; 95% CI, 0.59–1.27).
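The direction of the reported hazard ratio can be checked against the crude event rates. The Python sketch below uses only the deaths and life-years quoted above; the crude rate ratio is a rough check and is not the method the study used to estimate its HR.

```python
def rate_per_100k(events: int, life_years: int) -> float:
    """Crude event rate per 100,000 life-years."""
    return events / life_years * 100_000


triennial_rate = rate_per_100k(18, 100_738)
annual_rate = rate_per_100k(18, 88_780)

# Crude rate ratio, triennial relative to annual screening.
crude_ratio = triennial_rate / annual_rate

print(f"triennial: {triennial_rate:.1f} deaths per 100,000 life-years")
print(f"annual:    {annual_rate:.1f} deaths per 100,000 life-years")
print(f"crude ratio (triennial vs. annual): {crude_ratio:.2f}")
```

The crude ratio rounds to 0.88, matching the reported HR when the triennial group is compared with the annual group. The confidence interval spans 1, so the data do not show a mortality difference between the two schedules.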
RCTs that studied the effect of screening mammography on breast cancer mortality were performed between 1963 and 2015, with participation by over half-a-million women in four countries. One trial, the Canadian NBSS-2, compared mammography plus clinical breast examination (CBE) to CBE alone; the other trials compared screening mammography with or without CBE to usual care. Refer to the Appendix of Randomized Controlled Trials section of this summary for a detailed description of the trials.
The trials differed in design, recruitment of participants, interventions (both screening and treatment), management of the control group, compliance with assignment to screening and control groups, and analysis of outcomes. Some trials used individual randomization, while others used cluster randomization in which cohorts were identified and then offered screening; one trial used nonrandomized allocation by day of birth in any given month. Cluster randomization sometimes led to imbalances between the intervention and control groups. Age differences have been identified in several trials, although the differences had no major effect on the trial outcome.
In the Edinburgh Trial, socioeconomic status, which correlates with the risk of breast cancer mortality, differed markedly between the intervention and control groups, rendering the results uninterpretable.
Breast cancer mortality was the major outcome parameter for each of these trials, so the attribution of cause of death required scrupulous attention. The use of a blinded monitoring committee (New York) and a linkage to independent data sources, such as national mortality registries (Swedish trials), were incorporated but could not ensure impartial attributions of cancer death for women in the screening or control arms. Possible misclassification of breast cancer deaths in the Two-County Trial biasing the results in favor of screening has been suggested.
There were also differences in the methodology used to analyze the results of these trials. Four of the five Swedish trials were designed to include a single screening mammogram in the control group and were timed to correspond with the end of the series of screening mammograms in the study group. The initial analysis of these trials used an evaluation analysis, tallying only the breast cancer deaths that occurred in women whose cancer was discovered at or before the last study mammogram. In some of the trials, a delay occurred in the performance of the end-of-study mammogram, resulting in more time for members of the control group to develop or be diagnosed with breast cancer. Other trials used a follow-up analysis, which counts all deaths attributed to breast cancer, regardless of the time of diagnosis. This type of analysis was used in a meta-analysis of four of the five Swedish trials as a response to concerns about the evaluation analyses.
The accessibility of the data for international audits and verification also varied, with a formal audit having been undertaken only in the Canadian trials. Other trials have been audited to varying degrees, but with less rigor.
All of these studies were designed to study breast cancer mortality rather than all-cause mortality because breast cancer deaths contribute only a small proportion of total mortality in any given population. When all-cause mortality in these trials was examined retrospectively, only the Edinburgh Trial showed a difference attributable to the previously noted socioeconomic differences in the study groups. The meta-analysis (follow-up methods) of the four Swedish trials also showed a small improvement in all-cause mortality.
The relative improvement in breast cancer mortality attributable to screening is approximately 15% to 20%, and the absolute improvement at the individual level is much less. The potential benefit of breast cancer screening can be expressed as the number of lives extended because of early breast cancer detection.
The RCT results represent experiences in a defined period of regular examinations, but in practice, women undergo 20 to 30 years of screening throughout their lifetimes.
There are several problems with using these RCTs that were performed up to 50 years ago to estimate the current benefits of screening on breast cancer mortality. These problems include the following:
For these reasons, estimates of the breast cancer mortality reduction resulting from current screening are based on well-conducted cohort and ecologic studies in addition to the RCTs.
An estimate of screening effectiveness can be obtained from nonrandomized controlled studies of screened versus nonscreened populations, case-control studies of screening in real communities, and modeling studies that examine the impact of screening on large populations. These studies must be designed to minimize or exclude the effects of unrelated trends influencing breast cancer mortality such as improved treatment and heightened awareness of breast cancer in the community.
Three population-based, observational studies from Sweden compared breast cancer mortality in the presence and absence of screening mammography programs. One study compared two adjacent time periods in 7 of the 25 counties in Sweden and found a statistically significant breast cancer mortality reduction of 18% to 32% attributable to screening.
The most important bias in this study is that the advent of screening in these counties occurred over a period during which dramatic improvements in the effectiveness of adjuvant breast cancer therapy were being made, changes that were not addressed by the study authors. The second study considered an 11-year period comparing seven counties with screening programs with five counties without them.
There was a trend in favor of screening, but again, the authors did not consider the effect of adjuvant therapy or differences in geography (urban vs. rural) that might affect treatment practices.
The third study attempted to account for the effects of treatment by using a detailed analysis by county. It found screening had little impact, a conclusion weakened by several flaws in design and analysis.
In Nijmegen, the Netherlands, where a population-based screening program was undertaken in 1975, a case-cohort study found that screened women had decreased mortality compared with unscreened women (OR, 0.48).
However, a subsequent study comparing Nijmegen breast cancer mortality rates with neighboring Arnhem in the Netherlands, which had no screening program, showed no difference in breast cancer mortality.
A community-based case-control study of screening in high-quality U.S. health care systems between 1983 and 1998 found no association between previous screening and reduced breast cancer mortality, but the mammography screening rates were generally low.
A well-conducted ecologic study compared three pairs of neighboring European countries that were matched on similarity in health care systems and population structure, one of which had started a national screening program some years earlier than the others. The investigators found that each country had experienced a reduction in breast cancer mortality, with no difference between matched pairs that could be attributed to screening. The authors suggested that improvements in breast cancer treatment and/or health care organizations were more likely responsible for the reduction in mortality than was screening.
A systematic review of ecologic and large cohort studies published through March 2011 compared breast cancer mortality in large populations of women, aged 50 to 69 years, who started breast cancer screening at different times. Seventeen studies met inclusion criteria, but all studies had methodological problems, including control group dissimilarities, insufficient adjustment for differences between areas in breast cancer risk and breast cancer treatment, and problems with similarity of measurement of breast cancer mortality between compared areas. There was great variation in results among the studies, with four studies finding a relative reduction in breast cancer mortality of 33% or more (with wide CIs) and five studies finding no reduction in breast cancer mortality. Because only a part of the overall reduction in breast cancer mortality could possibly be attributed to screening, the review concluded that any relative reduction in breast cancer mortality resulting from screening would likely be no more than 10%.
A U.S. ecologic analysis conducted between 1976 and 2008 examined the incidence of early-stage versus late-stage breast cancer for women aged 40 years and older. To assess a screening effect, the authors compared the magnitude of increase in early-stage cancer with the magnitude of an expected decrease in late-stage cancer. Over the study period, the absolute increase in the incidence of early-stage cancer was 122 cancers per 100,000 women, while the absolute decrease in late-stage cancers was 8 cases per 100,000 women. After adjusting for changes in incidence resulting from hormone therapy and other undefined causes, the authors concluded (1) the benefit of screening on breast cancer mortality was small, (2) between 22% and 31% of diagnosed breast cancers represented overdiagnosis, and (3) the observed improvement in breast cancer mortality was probably attributable to improved treatment rather than screening.
An analytic approach was used to approximate the contributions of screening versus treatment to breast cancer mortality reduction and the magnitude of overdiagnosis.
The shift in the size distribution of breast cancers in the United States from the period before the introduction of mammography to 2012 (after its widespread dissemination) was investigated using SEER data in women aged 40 years and older. The rate of clinically meaningful breast cancer was assumed to be stable during this time. The authors documented a lower incidence of larger (≥2 cm) tumors as well as a reduction in breast cancer case fatality. The lower mortality for women with larger tumors was attributed to improvements in therapy. Two-thirds of the decline in size-specific case fatality was ascribed to improved treatment.
A prospective cohort study of community-based screening programs in the United States found that annual compared with biennial screening mammography did not reduce the proportion of unfavorable breast cancers detected in women aged 50 to 74 years or in women aged 40 to 49 years without extremely dense breasts. Women aged 40 to 49 years with extremely dense breasts did have a reduction in cancers larger than 2.0 cm with annual screening (OR, 2.39; 95% CI, 1.37–4.18).
An observational study of women aged 40 to 74 years conducted in 7 of 12 Canadian screening programs compared breast cancer mortality in those participants screened at least once between 1990 and 2009 (85% of the population) with those not screened (15% of the population). The abstract reported a 40% average breast cancer mortality among participants; however, it was likely intended to report a 40% reduction in breast cancer mortality on the basis of language utilized in the Discussion section.
Limitations of this study included the lack of all-cause mortality data, the extent of screening, screening outside of the study, screening prior to the study, the method used for calculating expected mortality and the referent rates of nonparticipants, nonparticipant survival, province-specific population differences, the extent to which limitations of the database prevented correcting for age and other differences between participants, the generalizability of the substudy data of a single province (British Columbia), and the potentially large impact of selection bias. Overall, the study lacked important data and had limitations in methodology and data analysis.
The optimal screening interval has been addressed by modelers. Modeling makes assumptions that may not be correct; however, the credibility of modeling is greater when the model produces overall results that are consistent with randomized trials and when the model is used to interpolate or extrapolate. For example, if a model’s output agrees with RCT outcomes for annual screening, it has greater credibility to compare the relative effectiveness of biennial versus annual screening.
In 2000, the National Cancer Institute formed a consortium of modeling groups (Cancer Intervention and Surveillance Modeling Network [CISNET]) to address the relative contribution of screening and adjuvant therapy to the observed decline in breast cancer mortality in the United States.
These models predicted reductions in breast cancer mortality similar to those expected in the circumstances of the RCTs but updated to the use of modern adjuvant therapy. In 2009, CISNET modelers addressed several questions related to the harms and benefits of mammography, including comparing annual versus biennial screening.
Women aged 50 to 74 years received most of the mortality benefit of annual screening by having a mammogram every 2 years. Across the six models, the proportion of the reduction in breast cancer deaths that was maintained after the move from annual to biennial screening ranged from 72% to 95%, with a median of 80%.
Data are limited as to how much of the reduction in mortality, seen over time from 1990 onward, is attributable to advances in imaging techniques for screening and as to how much is the result of the improved effectiveness of therapy. In one CISNET study of six simulation models, about one-third of the decrease in breast cancer mortality in 2012 was attributable to screening, with the balance attributed to treatment.
In this CISNET study, the mean estimated reduction in overall breast cancer mortality rate was 49% (model range, 39%–58%), relative to the estimated baseline rate in 2012 if there was no screening or treatment; 37% (model range, 26%–51%) of this reduction was associated with screening, and 63% (model range, 49%–74%) of this reduction was associated with treatment.
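The split between screening and treatment can be expressed in percentage points of the baseline mortality rate. The Python sketch below applies the mean shares quoted above to the mean overall reduction; multiplying the means in this way is an illustrative simplification, and the model-specific ranges are not propagated.

```python
overall_reduction = 0.49   # mean reduction relative to the no-screening, no-treatment baseline
screening_share = 0.37     # mean share of that reduction associated with screening
treatment_share = 0.63     # mean share of that reduction associated with treatment

# Portion of the baseline mortality rate removed by each component.
screening_points = overall_reduction * screening_share
treatment_points = overall_reduction * treatment_share

print(f"screening: {screening_points * 100:.1f} percentage points of baseline mortality")
print(f"treatment: {treatment_points * 100:.1f} percentage points of baseline mortality")
```

Read this way, screening accounts for roughly 18 of the 49 percentage points by which mortality fell relative to the modeled baseline, and treatment accounts for roughly 31.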
The negative effects of screening mammography are overdiagnosis (true positives that will not become clinically significant), false positives (related to the specificity of the test), false negatives (related to the sensitivity of the test), discomfort associated with the test, radiation risk, psychological harm, financial stress, and opportunity costs.
Table 1 provides an overview of the estimated benefits and harms of screening mammography for 10,000 women who underwent annual screening mammography over a 10-year period.
Age, y | No. of Breast Cancer Deaths Averted With Mammography Screening During the Next 15 y [b] | No. (95% CI) With ≥1 False-Positive Result During the 10 y [c] | No. (95% CI) With ≥1 False Positive Resulting in a Biopsy During the 10 y [c] | No. of Breast Cancers or DCIS Diagnosed During the 10 y That Would Never Become Clinically Important (Overdiagnosis) [d]
---|---|---|---|---
40 | 1–16 | 6,130 (5,940–6,310) | 700 (610–780) | ?–104 [e]
50 | 3–32 | 6,130 (5,800–6,470) | 940 (740–1,150) | 30–137
60 | 5–49 | 4,970 (4,780–5,150) | 980 (840–1,130) | 64–194

No. = number; CI = confidence interval; DCIS = ductal carcinoma in situ.
[a] Adapted from Pace and Keating.
[b] Number of deaths averted are from Welch and Passow. The lower bound represents breast cancer mortality reduction if the breast cancer mortality relative risk were 0.95 (based on minimal benefit from the Canadian trials), and the upper bound represents the breast cancer mortality reduction if the relative risk were 0.64 (based on the Swedish 2-County Trial).
[c] False positive and biopsy estimates and 95% confidence intervals are 10-year cumulative risks reported in Hubbard et al. and Braithwaite et al.
[d] The number of overdiagnosed cases are calculated by Welch and Passow. The lower bound represents overdiagnosis based on results from the Malmö trial, whereas the upper bound represents the estimate from Bleyer and Welch.
[e] The lower-bound estimate for overdiagnosis reported by Welch and Passow came from the Malmö study. The study did not enroll women younger than 50 years.
Overdiagnosis occurs when screening procedures detect cancers that would never become clinically apparent in the absence of screening. The magnitude of overdiagnosis is debated, particularly regarding DCIS, a cancer precursor whose natural history is unknown. Because tumor behavior cannot be confidently predicted at the time of diagnosis, standard treatment for invasive cancers and DCIS can result in overtreatment. The related harms, which include treatment-related side effects and the harms associated with a cancer diagnosis, are immediate. Conversely, a mortality benefit would occur at an uncertain point in the future.
One approach to understanding overdiagnosis is to examine the prevalence of occult cancer in women who died of noncancer causes. In an overview of seven autopsy studies, the median prevalence of occult invasive breast cancer was 1.3% (range, 0%–1.8%) and of DCIS was 8.9% (range, 0%–14.7%).
Overdiagnosis can be indirectly measured by comparing breast cancer incidence in screened versus unscreened populations. These comparisons can be confounded by differences in the populations, such as time, geography, health behaviors, and hormone usage. The calculations of overdiagnosis can vary in their adjustment for lead-time bias.
An overview of 29 studies found calculated rates of overdiagnosis to be 0%–54%, with rates from randomized studies between 11% and 22%.
In Denmark, where screened and unscreened populations existed concurrently, the rate of overdiagnosis of invasive cancer was calculated to be 14% and 39%, using two different methodologies. If DCIS cases were included, the overdiagnosis rates were 24% and 48%. The second methodology accounts for regional differences in women younger than the screening age and is likely more accurate.
Theoretically, in a given population, the detection of more breast cancers at an early stage would result in a subsequent reduction in the incidence of advanced-stage cancers. This has not occurred in any of the populations studied to date. Thus, the detection of more early-stage cancers likely represents overdiagnosis. A population-based study in the Netherlands showed that about one-half of all screen-detected breast cancers, including DCIS, would represent overdiagnosis, a finding consistent with other studies that showed substantial rates of overdiagnosis associated with screening.
A cohort study in Norway compared the increase in cancer incidence in women who were eligible for screening with the cancer incidence in younger women who were not eligible for screening; eligibility was based on age and residence. Eligible women experienced a 60% increase in incidence of localized cancers (RR, 1.60; 95% CI, 1.42–1.79), while the incidence of advanced cancers remained similar in the two groups (RR, 1.08; 95% CI, 0.86–1.35).
A population study that compared different counties in the United States showed that higher rates of screening mammography use were associated with higher rates of breast cancer diagnoses, yet there was no corresponding decrease in 10-year breast cancer mortality.
The strengths of this study include its very large size (16 million women) and the strength and consistency of correlation observed across counties. The limitations of this study include the self-reporting of mammograms, the use of a 2-year window to estimate screening prevalence, and the period of analysis (when menopausal hormone use was present).
The extent of overdiagnosis has been estimated in the Canadian NBSS, a randomized clinical trial. At the end of the five screening rounds, 142 more invasive breast cancer cases were diagnosed in the mammography arm, compared with the control arm.
At 15 years, the excess number of cancer cases in the mammography arm versus the control arm was 106, representing an overdiagnosis rate of 22% for the 484 screen-detected invasive cancers.
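As a worked check of the 22% figure, the sketch below divides the persistent excess of cases in the mammography arm by the number of screen-detected invasive cancers, using the counts reported above. The function name `overdiagnosis_rate` is hypothetical, and the calculation is only the simple ratio; it makes no adjustment for lead time or contamination.

```python
def overdiagnosis_rate(excess_cases, screen_detected_cases):
    """Excess cases in the screened arm as a fraction of screen-detected cases."""
    return excess_cases / screen_detected_cases


# Counts reported for the Canadian NBSS at 15 years (from the text above).
rate = overdiagnosis_rate(excess_cases=106, screen_detected_cases=484)
print(f"Estimated overdiagnosis rate: {rate:.1%}")
```

The ratio 106/484 rounds to the 22% quoted in the text.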
As a consequence of screening mammography, greater numbers of breast cancers with indolent behavior are now identified, resulting in potential overtreatment. In a secondary analysis of a randomized trial of tamoxifen versus no systemic therapy in patients with early breast cancer, the authors utilized the 70-gene MammaPrint assay and identified 15% of patients at ultra-low risk, with 20-year disease-specific survival rates of 97% in the tamoxifen group and 94% in the control group. Thus, these patients would likely have extremely good outcomes with surgery alone. The frequency of such ultra-low risk cancers in the screened population is likely around 25%. Tools such as the 70-gene MammaPrint assay might be utilized in the future to identify these cancers, and thereby, reduce the risk of overtreatment. However, additional studies are needed to confirm these findings.
In 2016, the Canadian NBSS, a randomized screening trial with 25-year follow-up, re-estimated overdiagnosis of breast cancer from mammography screening by age group and concluded that approximately 30% of invasive screen-detected cancers in women aged 40 to 49 years and up to 20% of those detected in women aged 50 to 59 years were overdiagnosed. When in situ cancers are included, the estimated risks of overdiagnosis are 40% in women aged 40 to 49 years and 30% in women aged 50 to 59 years. Overdiagnosis was calculated as the persistent excess incidence in the screened arm versus the control arm divided by the number of screen-detected cases (excess incidence method). Requirements for adequate estimation of overdiagnosis utilizing this method included the following:
These conditions were largely met in the CNBSS because population-based screening did not become available throughout Canada until a minimum of 2 years later and in most instances 5 to 10 years later (thereby, allowing for cessation of screening after the trial screening period and follow-up longer than most estimates of lead time), because contamination is documented to have been minimal, and because individual randomization resulted in 44 almost identically distributed demographic factors and risk factors between the two trial arms.
Since the conclusion of the trial screening period in 1988, differences in screening quality, intensity, invited age range, and biopsy thresholds decrease the generalizability of these results. These factors, together with improved imaging technique and quality and a low threshold for biopsy, likely contribute to lower estimates of overdiagnosis of in situ cancer than of invasive cancer.
Table 1, above, shows results from a 10-year period of screening 10,000 women, estimating the number of women with breast cancer or DCIS that would never become clinically important (overdiagnosis). There was likely no overdiagnosis in the Health Insurance Plan study, which used old-technology mammography and CBE. Overdiagnosis has become more prominent in the era of improved-technology mammography. The improved technology has not, however, been shown to make further reductions in mortality than the original technology. In summary, breast cancer overdiagnosis is a complex topic. Studies that used many different methods reported a wide range of estimates, and there is currently no way to assess whether new cancer cases are overdiagnosed or are of real harm to patients.
Because fewer than 5 per 1,000 women screened have breast cancer, most abnormal mammograms are false positives, even given the 90% specificity of mammography (i.e., 90% of all women without breast cancer will have a negative mammogram).
This high false-positive rate of mammography is often underestimated and can seem counterintuitive because of a statistically based cognitive bias known as the base rate fallacy. Because the base rate of breast cancer is low (5 per 1,000), false positives vastly outnumber true positives, even when a very accurate test is used.
Mammography’s true-positive rate of approximately 90% means that, of women with breast cancer, approximately 90% will test positive. The true-negative rate of 90% means that, of women without breast cancer, 90% will test negative. A 10% false-positive rate means that about 100 of every 1,000 women screened will have a false-positive result. If 5 in 1,000 women have breast cancer, then 4.5 women with breast cancer will have a positive test. In other words, there will be approximately 100 false positives for every 4.5 true positives.
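The arithmetic in this example can be reproduced with a short sketch. It assumes the illustrative figures used in the text (a prevalence of 5 per 1,000 and sensitivity and specificity of 90% each); the function name `screening_counts` is hypothetical. Applying the false-positive rate only to the 995 women without cancer gives 99.5 expected false positives, which the text rounds to 100.

```python
def screening_counts(n_screened, prevalence, sensitivity, specificity):
    """Return expected true positives, false positives, and PPV."""
    with_cancer = n_screened * prevalence
    without_cancer = n_screened - with_cancer
    true_positives = with_cancer * sensitivity
    false_positives = without_cancer * (1 - specificity)
    # PPV: share of all positive tests that are true positives.
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


# Illustrative inputs taken from the text, not measured values.
tp, fp, ppv = screening_counts(
    n_screened=1000, prevalence=0.005, sensitivity=0.90, specificity=0.90
)
print(f"true positives: {tp:.1f}, false positives: {fp:.1f}, PPV: {ppv:.1%}")
```

Under these assumptions, only about 4% of positive screening results correspond to a cancer, which is the base rate effect described above.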
Further, abnormal results from screening mammograms prompt additional tests and procedures, such as mammographic views of the region of concern, ultrasound, MRI, and tissue sampling (by fine-needle aspiration, core biopsy, or excisional biopsy). Overall, the harm from unnecessary tests and treatments must be weighed against the benefit of early detection.
A study of breast cancer screening in 2,400 women enrolled in a health maintenance organization found that over a decade, 88 cancers were diagnosed, 58 of which were identified by mammography. One-third of the women had an abnormal mammogram result that required additional testing: 539 additional mammograms, 186 ultrasound examinations, and 188 biopsies. The cumulative biopsy rate (the rate of true positives) resulting from mammographic findings was approximately 1 in 4 (23.6%). The PPV of an abnormal screening mammogram in this population was 6.3% for women aged 40 to 49 years, 6.6% for women aged 50 to 59 years, and 7.8% for women aged 60 to 69 years.
A subsequent analysis and modeling of data from the same cohort of women estimated that the risk of having at least one false-positive mammogram was 7.4% (95% CI, 6.4%–8.5%) at the first mammogram, 26.0% (95% CI, 24.0%–28.2%) by the fifth mammogram, and 43.1% (95% CI, 36.6%–53.6%) by the ninth mammogram.
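For intuition about how a modest per-examination risk accumulates, the sketch below computes the cumulative risk that would result if every mammogram carried the same 7.4% false-positive risk, independently of earlier results. This constant, independent risk is a simplifying assumption made here for illustration and is not the model used in the study; the function name `naive_cumulative_risk` is hypothetical.

```python
def naive_cumulative_risk(per_screen_risk, n_screens):
    """Risk of at least one false positive, assuming independent, equal-risk screens."""
    return 1 - (1 - per_screen_risk) ** n_screens


first_screen_risk = 0.074  # first-mammogram estimate reported in the text
for n in (1, 5, 9):
    risk = naive_cumulative_risk(first_screen_risk, n)
    print(f"after {n} screens: {risk:.1%}")
```

The naive calculation gives roughly 32% by the fifth mammogram and 50% by the ninth, somewhat higher than the modeled estimates of 26.0% and 43.1%. One plausible reading is that false-positive risk is not constant or independent across successive screens, but the text does not state the reason.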
Cumulative risk of at least one false-positive result depended on four patient variables (younger age, higher number of previous breast biopsies, family history of breast cancer, and current estrogen use) and three radiologic variables (longer time between screenings, failure to compare the current and previous mammograms, and the individual radiologist’s tendency to interpret mammograms as abnormal). Overall, the factor most responsible for a false-positive mammogram was the individual radiologist’s tendency to read mammograms as abnormal.
A prospective cohort study of community-based screening found that a greater proportion of women undergoing annual screening had at least one false-positive screen after 10 years than did women undergoing biennial screening, regardless of breast density. For women with scattered fibroglandular densities, the difference was 68.9% (annual) versus 46.3% (biennial) for women in their 40s. For women aged 50 to 74 years, the difference for this density group was 49.8% (annual) versus 30.7% (biennial).
As shown in Table 1, the estimated number of women out of 10,000 who underwent annual screening mammography during a 10-year period with at least one false-positive test result is 6,130 for women aged 40 to 50 years and 4,970 for women aged 60 years. The number of women with a false-positive test that results in a biopsy is estimated to range from 700 to 980, depending on age.
The sensitivity of mammography ranges from 70% to 90%, depending on characteristics of the interpreting radiologist (level of experience) and characteristics of the woman (age, breast density, hormone status, and diet). Assuming an average sensitivity of 80%, mammograms will miss approximately 20% of the breast cancers that are present at the time of screening (false negatives). Many of these missed cancers are high risk, with adverse biologic characteristics. If a normal mammogram dissuades a woman or her doctor from evaluating breast symptoms, or leads them to postpone that evaluation, she may suffer adverse consequences. Thus, a negative mammogram should never dissuade a woman or her physician from additional evaluation of breast symptoms.
Positioning of the woman and breast compression reduce motion artifact and improve mammogram image quality. Pain and/or discomfort was reported by 90% of women undergoing mammography, with 12% of women rating the sensation as intense or intolerable.
A systematic review of 22 studies investigating mammography-associated pain and discomfort found wide variations, some of which were associated with menstrual cycle stage, anxiety, and premammography anticipation of pain.
The major risk factors for radiation-associated breast cancer are young age at exposure and dose; however, rarely there are women with an inherited susceptibility to radiation-induced damage who must avoid radiation exposure at any age.
For many women older than 40 years, the likely benefits of screening mammography outweigh the risks.
Standard two-view screening mammography exposes the breasts to a mean dose of 4 mSv, and the whole body to 0.29 mSv.
Thus, up to one breast cancer may be induced per 1,000 women undergoing annual mammograms from ages 40 to 80 years. Such risk is doubled in women with large breasts who require increased radiation doses and in women with breast augmentation who require additional views. Radiation-induced breast cancers may be reduced fivefold for women who begin biennial screening at age 50 years rather than annually at age 40 years.
A telephone survey of 308 women performed 3 months after screening mammography revealed that about one-fourth of the 68 women recalled for additional testing were still experiencing worry that affected their mood or functioning, even though that testing had ruled out cancer.
Research into whether the psychological impact of a false-positive test is long-standing yields mixed results. A cohort study in Spain in 2002 found immediate psychological impact to a woman after receiving a false-positive mammogram, but these results dissipated within a few months.
A cohort study in Denmark in 2013 that measured the psychological effects of a false-positive test result several years after the event found long-term negative psychological consequences.
Several studies have shown that the anxiety after evaluation of a false-positive test leads to increased participation in future screening examinations.
These potential harms of screening have not been well researched, but it is clear that they exist.
Ultrasound is used for the diagnostic evaluation of palpable or mammographically identified masses, rather than serving as a primary screening modality. A review of the literature and expert opinion by the European Group for Breast Cancer Screening concluded that “there is little evidence to support the use of ultrasound in population breast cancer screening at any age.”
The Japan Strategic Anti-cancer Randomized Trial (J-START) is a screening trial that randomly assigned women aged 40 to 49 years to either mammography and ultrasound screening (intervention group) or mammography screening alone (control group). The initial results of this trial indicated that the addition of screening ultrasound to mammography substantially increases breast cancer detection rates, but the impact on breast cancer mortality has not yet been evaluated.
Breast MRI is used in women for diagnostic evaluation, including evaluating the integrity of silicone breast implants, assessing palpable masses after surgery or radiation therapy, detecting mammographically and sonographically occult breast cancer in patients with axillary nodal metastasis, and preoperative planning for some patients with known breast cancer. There is no ionizing radiation exposure with this procedure. MRI has been promoted as a screening test for breast cancer among women at elevated risk of breast cancer, including BRCA1/2 mutation carriers, women with a strong family history of breast cancer, and women with certain genetic syndromes, such as Li-Fraumeni syndrome or Cowden disease.
Breast MRI is more sensitive but less specific than screening mammography and is up to 35 times as expensive.
Using infrared imaging techniques, thermography of the breast identifies temperature changes in the skin as a possible indicator of an underlying tumor, displaying these changes in color patterns. Thermographic devices have been approved by the U.S. Food and Drug Administration under the 510(k) process, but no randomized trials have compared thermography to other screening modalities. Small cohort studies do not suggest any additional benefit for the use of thermography as an adjunct modality.
The effect of screening clinical breast examination (CBE) on breast cancer mortality has not been fully established. The Canadian National Breast Screening Study (CNBSS) compared high-quality CBE plus mammography with CBE alone in women aged 50 to 59 years. CBE, lasting 5 to 10 minutes per breast, was conducted by trained health professionals, with periodic evaluations of performance quality. The frequency of cancer diagnosis, stage, interval cancers, and breast cancer mortality were similar in the two groups and similar to outcomes with mammography alone.
With a mean follow-up of 13 years, breast cancer mortality was similar in the two groups (mortality rate ratio, 1.02 [95% confidence interval [CI], 0.78–1.33]).
The investigators estimated the operating characteristics for CBE alone; for 19,965 women aged 50 to 59 years, sensitivity was 83%, 71%, 57%, 83%, and 77% for years 1, 2, 3, 4, and 5 of the trial, respectively; specificity ranged between 88% and 96%. Positive predictive value (PPV), which is the proportion of cancers detected per abnormal examination, was estimated to be 3% to 4%. For 25,620 women aged 40 to 49 years who were examined only at entry, the estimated sensitivity was 71%, specificity was 84%, and PPV was 1.5%.
In clinical trials involving community clinicians, CBE-type screening had higher specificity (97%–99%) and lower sensitivity (22%–36%) than that achieved by experienced examiners.
A study of screening in women with a positive family history of breast cancer showed that, after a normal initial evaluation, the patient herself, or her clinician performing a CBE, identified more cancers than did mammography.
Another study examined the usefulness of adding CBE to screening mammography; among 61,688 women older than 40 years and screened by mammography and CBE, sensitivity for mammography was 78%, and combined mammography-CBE sensitivity was 82%. Specificity was lower for women undergoing both screening modalities than it was for women undergoing mammography alone (97% vs. 99%).
Other international trials of CBE are under way, two in India and one in Egypt.
Monthly BSE has been promoted, but there is no evidence that it reduces breast cancer mortality.
The only large, randomized clinical trial of BSE assigned 266,064 female Shanghai factory workers to either BSE instruction with reinforcement and encouragement, or instruction on the prevention of lower back pain. Neither group underwent any other breast cancer screening. After 10 to 11 years of follow-up, 135 breast cancer deaths occurred in the instruction group, and 131 cancer deaths occurred in the control group (relative risk [RR], 1.04; 95% CI, 0.82–1.33). Although the number of invasive breast cancers diagnosed in the two groups was about the same, women in the instruction group had more breast biopsies and more benign lesions diagnosed than did women in the control group.
Other research results on BSE come from three trials. First, more than 100,000 Leningrad women were assigned to BSE training or control by cluster randomization; the BSE training group had more breast biopsies without improved breast cancer mortality.
Second, in the United Kingdom Trial of Early Detection of Breast Cancer, more than 63,500 women aged 45 to 64 years were invited to educational sessions about BSE. After 10 years of follow-up, breast cancer mortality rates were similar to the rates in centers without organized BSE education (RR, 1.07; 95% CI, 0.93–1.22).
Third, in contrast, a case-control study nested within the CNBSS compared self-reported BSE frequency before enrollment with breast cancer mortality. Women who examined their breasts visually, used their finger pads for palpation, and used their three middle fingers had a lower breast cancer mortality rate.
Various methods to analyze breast tissue for malignancy have been proposed to screen for breast cancer, but none have been shown to be associated with mortality reduction.
The study design and conduct make these results difficult to assess or combine with the results of other trials.
The reduction in breast cancer mortality at a median follow-up of 17.7 years corresponds to an absolute risk reduction of 0.1 of 1,000 (or 1 of 10,000) fewer deaths.
The evidence is inadequate to support the conclusion of a clinically significant breast cancer mortality reduction attributable to initiation of screening mammography among women aged 39 to 49 years. The reported mortality reduction is a very small, transient reduction in breast cancer mortality based on a nonstandard imaging schedule, nonstandard imaging protocol, and nonstandard threshold for biopsy; therefore, it is of uncertain relevance to the general population. In absolute terms, it corresponds to an absolute risk reduction of 0.1 of 1,000 (or 1 of 10,000) fewer deaths. Additionally, the mortality reduction is based on a re-analysis of the original data set, which was not statistically significant, and the recalculation of breast cancer mortality in a subgroup restricted to 10 years of follow-up. At 20 years of follow-up, there was no statistically significant decrease in risk of breast cancer or all-cause mortality.
The evidence is inadequate to make a clear determination of the magnitude of overdiagnosis. Because the evidence is based on subgroup analysis and nonstandard imaging schedule, nonstandard imaging protocol, and a nonstandard threshold for biopsy with uncertain relevance to the general population, it does not support the investigators' conclusion of “at worst a small amount of overdiagnosis.”
The PDQ cancer information summaries are reviewed regularly and updated as new information becomes available. This section describes the latest changes made to this summary as of the date above.
Revised text to state that women with inherited risk, including BRCA1 and BRCA2 gene carriers, comprise approximately 5% to 10% of breast cancer cases.
This summary is written and maintained by the PDQ Screening and Prevention Editorial Board, which is editorially independent of NCI. The summary reflects an independent review of the literature and does not represent a policy statement of NCI or NIH. More information about summary policies and the role of the PDQ Editorial Boards in maintaining the PDQ summaries can be found on the About This PDQ Summary and PDQ® - NCI's Comprehensive Cancer Database pages.
This PDQ cancer information summary for health professionals provides comprehensive, peer-reviewed, evidence-based information about breast cancer screening. It is intended as a resource to inform and assist clinicians who care for cancer patients. It does not provide formal guidelines or recommendations for making health care decisions.
This summary is reviewed regularly and updated as necessary by the PDQ Screening and Prevention Editorial Board, which is editorially independent of the National Cancer Institute (NCI). The summary reflects an independent review of the literature and does not represent a policy statement of NCI or the National Institutes of Health (NIH).
Board members review recently published articles each month to determine whether an article should:
Changes to the summaries are made through a consensus process in which Board members evaluate the strength of the evidence in the published articles and determine how the article should be included in the summary.
Any comments or questions about the summary content should be submitted to Cancer.gov through the NCI website's Email Us. Do not contact the individual Board Members with questions or comments about the summaries. Board members will not respond to individual inquiries.
Some of the reference citations in this summary are accompanied by a level-of-evidence designation. These designations are intended to help readers assess the strength of the evidence supporting the use of specific interventions or approaches. The PDQ Screening and Prevention Editorial Board uses a formal evidence ranking system in developing its level-of-evidence designations.
PDQ is a registered trademark. Although the content of PDQ documents can be used freely as text, it cannot be identified as an NCI PDQ cancer information summary unless it is presented in its entirety and is regularly updated. However, an author would be permitted to write a sentence such as “NCI’s PDQ cancer information summary about breast cancer prevention states the risks succinctly: [include excerpt from the summary].”
The preferred citation for this PDQ summary is:
PDQ® Screening and Prevention Editorial Board. PDQ Breast Cancer Screening. Bethesda, MD: National Cancer Institute. Updated
Images in this summary are used with permission of the author(s), artist, and/or publisher for use within the PDQ summaries only. Permission to use images outside the context of PDQ information must be obtained from the owner(s) and cannot be granted by the National Cancer Institute. Information about using the illustrations in this summary, along with many other cancer-related images, is available in Visuals Online, a collection of over 2,000 scientific images.
The information in these summaries should not be used as a basis for insurance reimbursement determinations. More information on insurance coverage is available on Cancer.gov on the Managing Cancer Care page.
More information about contacting us or receiving help with the Cancer.gov website can be found on our Contact Us for Help page. Questions can also be submitted to Cancer.gov through the website’s Email Us.