Applying Evidence to Practice
Reading the Medical Literature is designed as a resource for Fellows of the American College of Obstetricians and Gynecologists (ACOG) and others to offer a better understanding of evidence-based medicine, particularly as it relates to the development of ACOG's clinical practice guidelines. As evidence-based medicine continues to develop and to be integrated into clinical practice, an understanding of its basic elements is critical in translating the medical literature into appropriate clinical practice. The emphasis on evidence-based medicine has taken on new and greater importance as the environment of clinical practice grows more diverse, with increased access to more information by both physicians and patients and the changing allocation of resources. Practice guidelines are a formal synthesis of evidence, developed according to a rigorous research and review process. This document provides an overview of ACOG's guideline development process, including elements of study design that are linked to the strength of the evidence. Reading the Medical Literature is not intended to serve as a comprehensive overview of the scientific methods of epidemiology and study design. Rather, it is provided to serve as a readily available introduction to and overview of the topic.
In 1995, ACOG began developing scientifically based practice guidelines, formerly known as Practice Patterns and subsequently as Practice Bulletins. The guidelines are derived from the best available evidence of clinical efficacy and consideration of costs, with recommendations explicitly linked to the evidence. These evidence-based practice guidelines are intended to be a means of improving the quality of health care, decreasing its cost, and diminishing professional liability. They are prescriptive in nature and, therefore, directive in their approach.
This document describes how ACOG Practice Committees identify, evaluate, and synthesize evidence from the medical literature to produce practice guidelines. In particular, this document briefly describes various study designs evaluated in the production of evidence-based guidelines and the decision-making steps used to construct evidence-based recommendations on clinical issues. Also highlighted are potential major flaws in study design that affect the validity and applicability of study results, as well as the strength of the evidence. This document includes a glossary of commonly encountered epidemiologic and biostatistic terms found in reports of scientific evidence, as well as suggestions for further reading.
Selection of Topics
Topics developed into evidence-based practice guidelines are selected based on clinical issues in obstetrics and gynecology with unexplained variations in practice or because there are differences between what is known scientifically and what is practiced. Once a topic has been identified, objectives of the guideline are developed and research questions are formulated to guide the literature search. The research questions highlight the most important aspects of a particular clinical issue, focusing on areas relevant to practice and useful in patient management.
Searching the Literature
In the ACOG Resource Center, medical librarians with extensive subject expertise perform a literature search based on the clinical questions and objectives. The search includes a review of the MEDLINE database, the Cochrane Library, ACOG's internal resources and documents, and other databases as appropriate. In addition, ACOG librarians review more than 200 journals. This process locates relevant articles from journals not indexed in MEDLINE and those not yet indexed.
The search is limited to documents published in English, and a specific strategy may be used to refine the search further. This filter strategy restricts the search by study design or publication type and is similar to the process used by the Cochrane Library. No further screening or elimination of records is done by the librarians. Updated searches are conducted as the topic is developed or further revised.
After results of the literature search are compiled, the study abstracts are reviewed to assess the relevance of each study or report. Those articles appropriate for further critical appraisal are obtained and subdivided according to the research question they address. The bibliographies of these articles are also reviewed to identify additional studies that may not have been identified in the initial literature search.
The data in the literature are evaluated to provide answers to the clinical questions. The articles obtained for review are organized by study design to ascertain the possible strengths and weaknesses inherent in each study, as well as the quality of evidence they may provide. Certain aspects of a clinical issue may not be addressed in research studies, and expert opinion may be used and identified as such.
The levels of evidence used are based on the method used by the U.S. Preventive Services Task Force. The U.S. Preventive Services Task Force was a 20-member panel of scientific and medical experts charged with developing recommendations on appropriate use of clinical interventions. Their recommendations were based on a systematic review of the evidence of clinical effectiveness.
Types of Study Designs
Level I Evidence
Commonly referred to as clinical trials, intervention studies are characterized by the investigators' roles in assigning subjects to exposed and nonexposed groups and following the subjects to assess outcome. Intervention studies may involve the use of a comparison group, which may include subjects under another treatment, drug, test, or placebo.
Randomized controlled trials are characterized by the prospective assignment of subjects, through a random method, into an experimental group and a control (placebo) group. The experimental group receives the drug or treatment to be evaluated, while the control group receives a placebo, no treatment, or the standard of care. Both groups are followed for the outcome(s) of interest. Randomization is the most reliable method to ensure that the participants in both groups are as similar as possible with respect to all known or unknown factors that might affect the outcome.
Postmenopausal women are identified from a population and randomly assigned either to a study group that will be prescribed hormone replacement therapy or to a control group that will be prescribed a placebo. Both groups of women are observed prospectively to determine who in each group subsequently develops endometrial cancer. The rate at which women prescribed hormone replacement therapy develop endometrial cancer is compared to that of women in the control group.
Major Study Flaws
- Randomization was not valid and resulted in a differential assignment of treatment that affected the outcomes.
- The sample size was too small to detect a clinically important difference.
- Poor compliance or loss to follow-up was significant enough to affect the outcomes.
Level II-1 Evidence
Controlled trials without randomization are intervention studies in which allocation to either the experimental or control group is not based on randomization, making assignment subject to biases that may influence study results. Conclusions drawn from these types of studies are considered to be less reliable than those from randomized controlled trials.
Postmenopausal women are identified from a population and assigned in a nonrandomized manner either to a study group that will be prescribed hormone replacement therapy or to a control group that will be prescribed a placebo. Both groups of women are observed prospectively to determine who subsequently develops endometrial cancer. The rate at which women prescribed hormone replacement therapy develop endometrial cancer is compared to that of women in the control group.
Major Study Flaws
- Nonrandom group assignment resulted in unequal distribution of known and unknown factors that may influence the outcome.
- Other potential flaws are the same as those for randomized controlled trials (Level I).
Level II-2 Evidence
There are two types of observational studies in this category: cohort and case-control. In these studies, the investigator has no role in assignment of study exposures but, rather, observes the natural course of events of exposure and outcome.
The starting point for a cohort study is exposure status. Subjects are classified on the basis of the presence or absence of exposure to a risk factor, a treatment, or an intervention and then followed for a specified period to determine the presence or absence of disease. Cohort studies can be of two different types determined by the timing of initiation of the study: retrospective (nonconcurrent) or prospective (concurrent) studies. In a prospective cohort study, the groups of exposed and unexposed subjects have been identified and the investigator must conduct follow-up for an adequate period to ascertain the outcome of interest. In a retrospective cohort study, both the exposure and outcomes of interest already have occurred by the initiation of the study. The rate of disease in the exposed group is divided by the rate of disease in the unexposed group, yielding a rate ratio or relative risk.
A group of postmenopausal women who have been prescribed hormone replacement therapy is identified (study group), as is an otherwise similar group of postmenopausal women who have not been prescribed hormone replacement therapy (control group). The study and control groups are observed to determine who subsequently develops endometrial cancer. The rate at which women using hormone replacement therapy develop endometrial cancer is compared with the rate among women not using hormone replacement therapy.
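The rate ratio described for cohort studies can be sketched in a few lines of Python. This is a minimal illustration only: the function name `relative_risk` and all of the counts are hypothetical and do not come from any study.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease in the exposed group divided by risk in the unexposed group."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        raise ZeroDivisionError("no cases in the unexposed group; ratio is undefined")
    return risk_exposed / risk_unexposed


# Hypothetical cohort (invented numbers): 30 of 1,000 exposed women and
# 10 of 1,000 unexposed women develop the disease during follow-up.
rr = relative_risk(30, 1000, 10, 1000)
print(f"Relative risk: {rr:.2f}")
```

With these invented counts the exposed group has about three times the risk of the unexposed group; a value near 1.0 would suggest no association.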
Major Study Flaws
- Criteria for determining exposure status were inadequately defined.
- The assessments of the outcome for the exposed and nonexposed groups differed in a biased manner.
- The nonexposed comparison group was inappropriate.
A case-control study is a retrospective study in which a group of subjects with a specified outcome (cases) and a group without that same outcome (controls) are identified. Thus, the starting point for a case-control study is disease status. Investigators then compare the extent to which each subject was previously exposed to the variable of interest such as a risk factor, a treatment, or an intervention. A disadvantage of this study type is that assessment of exposure may have been influenced by disease status, including the possibility that cases recalled their exposure differently than controls. The odds of exposure in the case group compared with the odds of exposure in the control group provide the measure of association between the disease and exposure (odds ratio).
Researchers conduct a case-control study to assess the relationship between hormone replacement therapy and endometrial cancer. A group of women who have recently developed endometrial cancer (cases) and a group of women with similar characteristics who did not develop endometrial cancer (controls) are identified. The use of hormone replacement therapy for each woman in the case group and the control group is determined to assess exposure history. The odds that women who developed endometrial cancer had used hormone replacement therapy are compared with the odds that women who did not develop endometrial cancer had used hormone replacement therapy. These odds are calculated to determine any association of hormone replacement therapy to endometrial cancer.
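The odds comparison in the example above can be sketched as follows. The cell labels (a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls) are an assumption about the 2 × 2 layout, and the counts are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """Exposure odds ratio: (a / b) / (c / d), which equals (a * d) / (b * c).

    Assumed layout:
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    return (a * d) / (b * c)


# Hypothetical data: 40 of 100 cases and 20 of 100 controls had used the therapy.
or_estimate = odds_ratio(40, 60, 20, 80)
print(f"Odds ratio: {or_estimate:.2f}")
```

The simple estimate breaks down when any cell is zero, which is why the sketch rejects such tables rather than returning a misleading value.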
Major Study Flaws
- The case or control group preferentially included or excluded subjects with a particular exposure history.
- Cases or controls were selectively more likely to recall or admit to a particular exposure.
- The possibility that known or unknown factors were related to both exposure status and outcome was not adequately considered and assessed.
Level II-3 Evidence
Cross-sectional studies are observational studies that assess the status of individuals with respect to the presence or absence of both exposure and outcome at a particular time. In this type of study, one is unlikely to be able to discern the temporal relationship between an exposure and outcome. Results from cross-sectional studies can yield correlations and disease prevalence. Prevalence is defined as the proportion of individuals with a disease at a specific time; in contrast, incidence is the number of new cases occurring over a specified period.
Uncontrolled investigational studies report the results of treatment or interventions in a particular group, but lack a control group for comparison. They may demonstrate impressive results, but in the absence of a control group the results may be attributable to factors other than the intervention or treatment.
Of all observational studies, cross-sectional and uncontrolled investigational studies provide the least evidence of causation.
Postmenopausal women are identified from a population and surveyed at a particular time about their current intake of calcium. Bone densitometry is evaluated in these women at the same time to identify signs of osteoporosis. In this cross-sectional study, a measure of calcium intake in women with and without signs of osteoporosis is compared.
Major Study Flaws
- It is usually not possible to determine the temporal relationship between disease and exposure.
- Other factors that may contribute to the disease, particularly past exposure to factors other than the factor under study, are not taken into consideration.
Level III Evidence
These studies provide limited information about the relationship between exposure and the outcome of interest. This category includes descriptive studies, such as case reports and case series, and expert opinion, which is often based on clinical experience.
A case study describes clinical characteristics or other interesting features of a single patient or a series of patients. The latter is referred to as a case series.
Expert opinion often is used to make recommendations. This type of evidence includes findings from expert panels and committees and the opinions of respected experts in a particular field.
Other Study Designs
A meta-analysis is a systematic, structured process, not merely a literature review. It combines results from more than one investigation to obtain a weighted average of the effect of a variable or intervention on a defined outcome. This approach can increase the precision of the estimated effect of the exposure on the outcome, although the validity of the conclusions of a meta-analysis depends largely on the quality of its component studies. Results are usually presented in a graph that illustrates the measure of association for each study and the overall summary association (Fig. 1).
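One way to form such a weighted average is fixed-effect, inverse-variance pooling of log relative risks. The source does not say which weighting scheme is used, so the method here is an assumption, and the study values and the function name `pooled_estimate` are hypothetical.

```python
import math


def pooled_estimate(log_effects, standard_errors):
    """Inverse-variance weighted average of per-study log effect estimates.

    Returns the pooled log effect and its standard error.
    """
    if len(log_effects) != len(standard_errors) or not log_effects:
        raise ValueError("need one standard error per study and at least one study")
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical relative risks from three studies, with invented standard
# errors of their logarithms.
relative_risks = [1.8, 2.4, 1.5]
standard_errors = [0.30, 0.50, 0.20]

log_rr, se = pooled_estimate([math.log(rr) for rr in relative_risks], standard_errors)
summary_rr = math.exp(log_rr)
ci_low = math.exp(log_rr - 1.96 * se)
ci_high = math.exp(log_rr + 1.96 * se)
print(f"Summary relative risk: {summary_rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The pooled standard error is smaller than that of any single study, which is the gain in precision the text describes. Pooling does nothing to correct flaws in the component studies.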
A decision analysis is a type of study that uses mathematical models of the sequences of multiple strategies to determine which are optimal. The basic framework is the decision tree in which branches of the tree represent key probabilities or decisions. Decision analysis is driven by key assumptions. Ideally the assumptions are based on data that may include findings from meta-analyses. Often a decision analysis is undertaken when there are inadequate data to perform a meta-analysis (Fig. 2).
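A decision tree of this kind can be evaluated by computing the expected value of each strategy. The sketch below is a toy, single-level tree; the strategy names, probabilities, and utility values are all invented assumptions, which mirrors the point that a decision analysis is only as good as its assumptions.

```python
def expected_value(branches):
    """Expected value of one chance node, given (probability, value) branches."""
    total_probability = sum(p for p, _ in branches)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * v for p, v in branches)


# Hypothetical strategies, each followed by one chance node.
# Values are utilities on a 0-to-1 scale (1 = best outcome); all are invented.
strategies = {
    "treat": [(0.90, 0.95), (0.10, 0.40)],
    "observe": [(0.70, 1.00), (0.30, 0.30)],
}

expected = {name: expected_value(branches) for name, branches in strategies.items()}
best = max(expected, key=expected.get)
print(expected, "preferred strategy:", best)
```

Changing a single probability and rerunning the calculation is a simple form of sensitivity analysis: if the preferred strategy flips, the conclusion depends heavily on that assumption.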
Cost-benefit analysis and cost-effectiveness analysis are related analytic methods that compare health care practices or techniques in terms of their relative economic efficiencies in providing health benefits. In a cost-effectiveness analysis, the net monetary costs of a health care intervention are compared with some measure of clinical outcome or effectiveness. In a cost-benefit analysis, the net monetary costs of a health care intervention typically are compared with the net monetary costs of the clinical outcome or effectiveness. Therefore, a cost-benefit analysis compares costs associated with an intervention with monetary benefits from the use of that intervention. The advantage of a cost-benefit analysis is the ability to use dollars for comparison across interventions. The disadvantage is the difficulty in assigning a monetary value to health status or quality of life.
Developing Evidence-Based Recommendations
Once the clinical question has been stated and the literature has been assembled and graded using the levels just outlined, recommendations are formulated according to the quality and quantity of evidence. Based on the highest available level of evidence, recommendations are provided and graded according to the following categories:
- A: There is good evidence to support the recommendation.
- B: There is fair evidence to support the recommendation.
- C: There is insufficient evidence to support the recommendation; however, the recommendation may be made on other grounds.
- D: There is fair evidence against the recommendation.
- E: There is good evidence against the recommendation.
This method explicitly links recommendations to the evidence. Determination of the quality of the evidence and the strength of recommendations is based on good, fair, or insufficient evidence. These descriptors address the levels of evidence and also provide a qualitative review of the evidence in terms of its methodologic strengths and weaknesses. A prerequisite for inclusion of each study in the analysis is that it provides overall evidence of "good quality."
It is important to note that an exact correlation does not exist between the strength of the recommendation and the level of evidence (ie, an "A" grade does not necessarily require Level I evidence, nor does Level I evidence necessarily lead to an "A" grade). For example, for some clinical issues a randomized trial is not possible for medical or ethical reasons, and recommendations must be based on evidence from other types of studies (Level II-2, II-3). In other cases, high-quality studies have produced conflicting results, or evidence of significant benefit is offset by evidence of important harm from the intervention. Although these studies may be randomized controlled trials (Level I), insufficient or conflicting evidence would result in a "C" recommendation.
Implications for Practice
Medicine will continue to face the rapid introduction of new technologies, rationing of health resources, and increasing attention to the quality and outcomes of medical care. Physicians will have to acquire the skills necessary to review the medical literature critically to identify the best evidence in managing patients. This process for developing practice guidelines identifies available evidence and constructs recommendations based on the best evidence so that obstetrician–gynecologists can continue to provide the highest quality of care.
Glossary
Accuracy: The degree to which a measurement or an estimate based on measurements represents the true value of the attribute that is being measured.
Bias: Deviation of results or inferences from the truth, or processes leading to such deviation; it is any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth. Three frequently occurring types of bias include selection bias, information bias, and confounding. Selection bias is error due to systematic differences in characteristics between those who are selected for study and those who are not. Information bias, also called observational bias, is a flaw in measuring exposure or outcome data that results in different quality (accuracy) of information between comparative groups. Recall bias is an example of information bias. The third example of bias, confounding, describes a situation in which the effects of two processes are not separated; it is the distortion of the apparent effect of an exposure on risk brought about by the association with other factors that can influence the outcome.
Confidence interval: An indication of the variability of a point estimate, such as an odds ratio or relative risk. In general, the wider the confidence interval, the less precise the point estimate. The 95% confidence interval is often used. As an example, if the 95% confidence interval for an odds ratio or relative risk does not include 1.0, then one would reject the null hypothesis of no association.
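As a rough sketch, a 95% confidence interval for an odds ratio can be approximated on the log scale using the standard error sqrt(1/a + 1/b + 1/c + 1/d). This is a common large-sample approximation, not a method named in this document; the cell labels and counts below are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (log method).

    Assumed layout: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls. z = 1.96 gives about 95%.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high


# Same odds ratio, two invented sample sizes: the smaller study gives a wider interval.
large_or, large_low, large_high = odds_ratio_ci(40, 60, 20, 80)
small_or, small_low, small_high = odds_ratio_ci(4, 6, 2, 8)

print(f"Larger study:  OR {large_or:.2f}, 95% CI {large_low:.2f} to {large_high:.2f}")
print(f"Smaller study: OR {small_or:.2f}, 95% CI {small_low:.2f} to {small_high:.2f}")
```

In this invented example the larger study's interval lies entirely above 1.0, while the smaller study's interval includes 1.0 even though the point estimate is the same.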
Confounding variable (syn: confounder): A variable that can cause or prevent the outcome of interest, is not an intermediate variable, and is associated with the factor under investigation. Unless it is possible to adjust for confounding variables, their effects cannot be distinguished from those of the factor(s) being studied. Bias can occur when adjustment is made for any factor that is caused in part by the exposure and is also correlated with the outcome.
Incidence: The number of instances of illness commencing, or persons falling ill, during a given period in a specified population. More generally, the number of new events (eg, new cases of a disease in a defined population) within a specified period.
Null hypothesis (test hypothesis): The statistical hypothesis that one variable has no association with another variable or set of variables, or that two or more population distributions do not differ from one another. In simplest terms, the null hypothesis states that the results observed in a study, experiment, or test are no different from what might have occurred as a result of the operation of chance alone.
Odds ratio (syn: cross product ratio, relative odds): The ratio of two odds. The exposure odds ratio for a set of case-control data is the ratio of the odds in favor of exposure among the cases (a/b) to the odds in favor of exposure among noncases (c/d). A 2 × 2 table (Table 1) can be used to illustrate this calculation of odds ratios.
P-value: The probability that a test statistic would be as extreme or more extreme than observed if the null hypothesis were true. The letter P, followed by the abbreviation n.s. (not significant) or by the symbol < (less than) and a decimal notation such as 0.01 or 0.05, is a statement of the probability that the difference observed could have occurred by chance if the groups were really alike (ie, under the null hypothesis). Investigators may arbitrarily set their own significance levels, but in most biomedical and epidemiologic work, a study result whose probability value is less than 5% (P <0.05) or 1% (P <0.01) is considered sufficiently unlikely to have occurred by chance and would justify the designation "statistically significant." By convention, most investigators choose P <0.05 as statistically significant.
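For two groups compared on a yes/no outcome, a two-sided P value can be approximated with a pooled two-proportion z-test. This test is one common choice and is an assumption here, not a method specified in this document; the counts are invented.

```python
import math


def two_proportion_p_value(events_1, n_1, events_2, n_2):
    """Two-sided P value from a pooled two-proportion z-test (normal approximation)."""
    if n_1 <= 0 or n_2 <= 0:
        raise ValueError("group sizes must be positive")
    p_1 = events_1 / n_1
    p_2 = events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_1 - p_2) / se
    # Probability of a standard normal value at least as extreme as |z|, both tails.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical trial: 30 of 1,000 events in one group, 10 of 1,000 in the other.
p_value = two_proportion_p_value(30, 1000, 10, 1000)
print(f"P = {p_value:.4f}; below the conventional 0.05 threshold: {p_value < 0.05}")
```

The normal approximation is unreliable when event counts are small, in which case an exact test would be the usual alternative.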
Power (statistical power): The ability of a study to demonstrate an association if one exists. The power of the study is determined by several factors, including the frequency of the condition under study, the magnitude of the effect, the study design, and sample size.
Prevalence: The number of events (eg, instances of a given disease or other condition) in a given population at a designated time; sometimes used to mean prevalence rate. When used without qualification, the term usually refers to the situation at a specified time (point prevalence).
Relative risk: The ratio of the risk of disease or death among the exposed to the risk among the unexposed; this usage is synonymous with risk ratio. If the relative risk is above 1.0, then there is a positive association between the exposure and the disease; if it is less than 1.0, then there is a negative association.
Sensitivity and specificity: Sensitivity is the proportion of truly diseased persons in the screened population who are identified as diseased by the screening test. Specificity is the proportion of truly nondiseased persons who are so identified by the screening test. Table 2 illustrates these quantities.
In screening and diagnostic tests, the probability that a person with a positive test is a true positive (ie, does have the condition) is referred to as the predictive value of a positive test. The predictive value of a negative test is the probability that a person with a negative test does not have the condition. The predictive value of a screening test is determined by the sensitivity and specificity of the test and by the prevalence of the condition for which the test is being used.
a = Diseased individuals detected by the test (true positives)
b = Nondiseased individuals positive by the test (false positives)
c = Diseased individuals not detected by the test (false negatives)
d = Nondiseased individuals negative by the test (true negatives)
Sensitivity = a/(a + c); specificity = d/(b + d).
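Using the cell definitions a through d given above, the four screening quantities can be computed as in the sketch below. The function name `screening_metrics` and the counts are hypothetical; the two examples hold sensitivity and specificity at 90% and change only the prevalence.

```python
def screening_metrics(a, b, c, d):
    """Screening test measures from a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    if a + c == 0 or b + d == 0 or a + b == 0 or c + d == 0:
        raise ValueError("each row and column of the table must be nonempty")
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


# Invented population of 1,000 with 10% prevalence (100 diseased, 900 not).
common = screening_metrics(a=90, b=90, c=10, d=810)
# Invented population of 1,000 with 1% prevalence (10 diseased, 990 not).
rare = screening_metrics(a=9, b=99, c=1, d=891)

for label, result in (("10% prevalence", common), ("1% prevalence", rare)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

With identical sensitivity and specificity, the predictive value of a positive test falls from 50% to about 8% as the condition becomes rarer, which illustrates the dependence on prevalence noted in the text.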
Type I error: The error of rejecting a true null hypothesis (ie, declaring that a difference exists when it does not).
Type II error: The error of failing to reject a false null hypothesis (ie, declaring that a difference does not exist when in fact it does).
Suggestions for Further Reading
Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature. Checklist of information for inclusion in reports of clinical trials. Ann Intern Med 1996;124:741–743
Chalmers TC, Smith H Jr, Blackburn B, Silverman B, Schroeder B, Reitman D, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials 1981;2:31–49
DuRant RH. Checklist for the evaluation of research articles. J Adolesc Health 1994;15:4–8
Grisso JA. Making comparisons. Lancet 1993;342:157–160
Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 1993;270:2598–2601
Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA 1994;271:59–63
Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ. Users' guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA 1995;274:1800–1804
Hadorn DC, Baker D, Hodges JS, Hicks N. Rating the quality of evidence for clinical practice guidelines. J Clin Epidemiol 1996;49:749–754
Hayward RS, Wilson MC, Tunis SR, Bass EB, Guyatt G. Users' guides to the medical literature. VIII. How to use clinical practice guidelines. A. Are the recommendations valid? The Evidence-Based Medicine Working Group. JAMA 1995;274:570–574
Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA 1994;271:703–707
Laupacis A, Wells G, Richardson WS, Tugwell P. Users' guides to the medical literature. V. How to use an article about prognosis. Evidence-Based Medicine Working Group. JAMA 1994;272:234–237
Naylor CD, Guyatt GH. Users' guides to the medical literature. X. How to use an article reporting variations in the outcomes of health services. The Evidence-Based Medicine Working Group. JAMA 1996;275:554–558
Naylor CD, Guyatt GH. Users' guides to the medical literature. XI. How to use an article about a clinical utilization review. Evidence-Based Medicine Working Group. JAMA 1996;275:1435–1439
Oxman AD. Checklists for review articles. BMJ 1994;309:648–651
Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature. VI. How to use an overview. Evidence-Based Medicine Working Group. JAMA 1994;272:1367–1371
Oxman AD, Sackett DL, Guyatt GH. Users' guides to the medical literature. I. How to get started. The Evidence-Based Medicine Working Group. JAMA 1993;270:2093–2095
Peipert JF, Gifford DS, Boardman LA. Research design and methods of quantitative synthesis of medical evidence. Obstet Gynecol 1997;90:473–478
Richardson WS, Detsky AS. Users' guides to the medical literature. VII. How to use a clinical decision analysis. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 1995;273:1292–1295
Wilson MC, Hayward RS, Tunis SR, Bass EB, Guyatt G. Users' guides to the medical literature. VIII. How to use clinical practice guidelines. B. What are the recommendations and will they help you in caring for your patients? The Evidence-Based Medicine Working Group. JAMA 1995;274:1630–1632
Developed under the direction of the ACOG Committee on Practice Patterns:
James R. Scott, MD, Chair
Daniel W. Cramer, MD, ScD
Herbert B. Peterson, MD
Benjamin P. Sachs, MD
Mary L. Segars Dolan, MD, MPH
Stanley Zinberg, MD
Director of Practice Activities
Nancy E. O'Reilly, MHS
Manager, Practice Activities
Peter J. Sebeny