
4. Evaluating the Evidence

4. Appraise that evidence for its validity (closeness to the truth) and applicability (usefulness in clinical practice)

Step 1: Evaluating the Validity of a Therapy Study

We have now identified current information which can answer our clinical question. The next step is to read the article and evaluate the study.

There are three basic questions that need to be answered for every type of study:

  • Are the results of the study valid?
  • What are the results?
  • Will the results help in caring for my patient?

This tutorial will focus on the first question: are the results of the study valid? The issue of validity speaks to the “truthfulness” of the information. The validity criteria should be applied before an extensive analysis of the study data. If the study is not valid, the data may not be useful.

The evidence that supports the validity or truthfulness of the information is found primarily in the study methodology. Here is where the investigators address the issue of bias, both conscious and unconscious. Study methodologies such as randomization, blinding and accounting for all patients help ensure that the study results are not overly influenced by the investigators or the patients.

Evaluating the medical literature is a complex undertaking. This session will provide you with some basic criteria and information to consider when trying to decide if the study methodology is sound. You will find that the answers to the questions of validity may not always be clearly stated in the article and that clinicians will have to make their own judgments about the importance of each question.

Once you have determined that the study methodology is valid, you must examine the results and their applicability to the patient. Clinicians may have additional concerns such as whether the study represented patients similar to their own patients, whether the study covered the aspect of the problem that is most important to the patient, or whether the study suggested a clear and useful plan of action.

Note: The questions that we used to test the validity of the evidence are adapted from work done at McMaster University. See the References/Glossary unit: ‘Users’ Guides to the Medical Literature.’

Read the following article to determine if the article meets the criteria for validity.

Massie BM, et al. Irbesartan in patients with heart failure and preserved ejection fraction. N Engl J Med. 2008 Dec 4;359(23):2456-67. PubMed PMID: 19001508. You can view a copy of the article that is marked to show you where the validity information is found.

Are the results valid?

These questions address the issues of validity and the methodology of the study:

Did intervention and control groups start with the same prognosis?

1. Were patients randomized?

The assignment of patients to either group (treatment or control) must be done by a random allocation. This might include a coin toss (heads to treatment/tails to control) or use of randomization tables, often computer generated.

Research has shown that random allocation comes closest to ensuring the creation of groups of patients who will be similar in their risk of the events you hope to prevent. Randomization balances the groups for known prognostic factors (such as age, weight, gender, etc.) and unknown prognostic factors (such as compliance, genetics, socioeconomics, etc.). This reduces the chance of over-representation of any one characteristic within the study groups.
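As a rough illustration of what a computer-generated allocation schedule can look like, here is a minimal Python sketch. It shows a simple "coin toss" allocation and a permuted-block allocation that keeps a 1:1 ratio. The function names, block size and seed are assumptions made for this example; the Massie article does not describe how its schedule was generated.

```python
import random


def simple_randomization(n_patients, seed=None):
    """Assign each patient by an independent 'coin toss'."""
    rng = random.Random(seed)
    return [rng.choice(["treatment", "control"]) for _ in range(n_patients)]


def block_randomization(n_patients, block_size=4, seed=None):
    """Permuted blocks: each block holds equal numbers of both arms, shuffled."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        block = ["treatment", "control"] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule[:n_patients]


# Hypothetical example: 12 patients in blocks of 4 always gives 6 per arm.
schedule = block_randomization(12, block_size=4, seed=2008)
print(schedule.count("treatment"), schedule.count("control"))
```

A coin toss can leave the groups unequal in size by chance, especially in small trials; permuted blocks are one common way to keep the group sizes close throughout enrollment.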

More information: Treatment allocation in controlled trials: why randomise? Douglas G Altman & J Martin Bland BMJ 1999;318:1209-1209 (1 May)

Massie article: Study procedures: Eligible patients were treated with single-blind placebo for 1 to 2 weeks before randomization; those who successfully completed this run-in phase and whose condition remained clinically stable were randomly assigned in a 1:1 ratio to receive irbesartan or matching placebo. The randomization schedule was implemented with the use of an interactive voice-response system. [page 2457]

2. Was group allocation concealed?

The randomization sequence should also be concealed from the clinicians and researchers of the study to further eliminate conscious or unconscious selection bias. Concealment (part of the enrollment process) ensures that the researchers cannot predict or change the assignments of patients to treatment groups. If allocation is not concealed it may be possible to influence the outcome (consciously or unconsciously) by changing the enrollment order or the order of treatment which has been randomly assigned. Concealed allocation can be done by using a remote call center for enrolling patients or the use of opaque envelopes with assignments.  This is different from blinding which happens AFTER randomization.

More information: Concealing treatment allocation in randomised trials Douglas G Altman & Kenneth F Schulz BMJ 2001;323:446-447 (25 August)

Massie article: Study procedures: The randomization schedule was implemented with the use of an interactive voice-response system. [Page 2457] This methodology would conceal the randomized allocation scheme.

3. Were patients in the study groups similar with respect to known prognostic variables?

The treatment and the control group should be similar for all prognostic characteristics except whether or not they received the experimental treatment. This information is usually displayed in Table 1, which outlines the baseline characteristics of both groups.  This is a good way to verify that randomization resulted in similar groups.

Massie article: Table 1: The study groups did not differ significantly in baseline characteristics. [Page 2460]

Was prognostic balance maintained as the study progressed?

4. To what extent was the study blinded?

Blinding means that the people involved in the study do not know which treatments were given to which patients. Patients, researchers, data collectors and others involved in the study should not know which treatment is being administered. This reduces bias and the influence of preconceived notions as to how the treatments should be working. When it is difficult or even unethical to blind patients to a treatment, especially a surgical treatment, then a “blinded” clinician or researcher is needed to interpret the results.

More information: Blinding in clinical trials and other studies Simon J Day & Douglas G Altman BMJ 2000;321:504 (19 August)

Massie article: Patients were blinded using a matched placebo. All investigators and committee members who were involved in the conduct of the study (except for members of the data and safety monitoring board) were unaware of study-group assignments. Data analysts were blinded. The sponsors or a contract research organization collected the trial data, which were then analyzed at the Statistical Data Analysis Center at the University of Wisconsin, Madison, independently of the sponsors and according to a predefined statistical analysis plan. Adjudicators were blinded. Deaths and hospitalizations were adjudicated by members of an independent end-point committee who were unaware of study-group assignments and used prespecified criteria. [Page 2458]

Were the groups prognostically balanced at the study’s completion?

5. Was follow-up complete?

The study should begin and end with the same number of patients in each group. Patients lost to the study must be accounted for or risk making the conclusions invalid. Patients may drop out because of the adverse effects of the therapy being tested. If not accounted for, this can lead to conclusions that may be overly confident in the efficacy of the therapy.  Good studies will have better than 80% follow-up for their patients. When there is a large loss to follow-up, the lost patients should be assigned to the “worst-case” outcomes and the results recalculated. If these results still support the original conclusion of the study then the loss may be acceptable.
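The “worst-case” recalculation described above can be sketched in a few lines of Python. All of the numbers below are hypothetical and chosen only to show the arithmetic; they are not taken from the Massie article. The sketch assumes the outcome is a bad event, so the worst case for the therapy is that every lost patient in the treatment group had the event and every lost patient in the control group did not.

```python
def risk(events, total):
    """Proportion of patients with the bad outcome."""
    return events / total


def worst_case_arr(treat_events, treat_followed, treat_lost,
                   ctrl_events, ctrl_followed, ctrl_lost):
    """Return (observed ARR, worst-case ARR) for a bad outcome.

    Worst case for the therapy: all lost treatment patients had the event,
    and no lost control patient had it.
    """
    observed = risk(ctrl_events, ctrl_followed) - risk(treat_events, treat_followed)
    worst = (risk(ctrl_events, ctrl_followed + ctrl_lost)
             - risk(treat_events + treat_lost, treat_followed + treat_lost))
    return observed, worst


# Hypothetical trial: 1000 patients randomized to each group.
observed, worst = worst_case_arr(
    treat_events=100, treat_followed=950, treat_lost=50,
    ctrl_events=200, ctrl_followed=960, ctrl_lost=40,
)
print(f"observed ARR {observed:.3f}, worst-case ARR {worst:.3f}")
```

In this made-up example the treatment still looks beneficial under the worst-case assumption, so the loss to follow-up would not overturn the conclusion. If the worst-case result changed sign, the loss would be a serious threat to validity.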

Massie Article: The mean follow-up time was 49.5 months, and the trial included 16,798 patient-years of follow-up. At the end of the study, vital-status data were not available for 29 patients (1%) in the irbesartan group and 44 patients (2%) in the placebo group. [Page 2459]

6. Were patients analyzed in the groups to which they were first allocated?

Anything that happens after randomization can affect the chances that a patient in a study has an event. Patients who forget or refuse their treatment should not be eliminated from the study results or allowed to “change groups”. Excluding noncompliant patients from a study group may leave only those that may be more likely to have a positive outcome, thus compromising the unbiased comparison that we got from the process of randomization. Therefore all patients must be analyzed within their assigned group. Randomization must be preserved.  This is called “intention to treat” analysis.
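To see why excluding noncompliant patients can mislead, here is a small Python sketch that compares an intention-to-treat analysis with a “per-protocol” analysis that drops patients who did not take their assigned treatment. The patient data are invented for illustration, and the function names are assumptions made for this example.

```python
# Each row: (assigned group, took assigned treatment?, had bad event?)
# Hypothetical data: noncompliant treatment patients tend to do worse.
patients = (
    [("treatment", True, False)] * 70
    + [("treatment", True, True)] * 10
    + [("treatment", False, True)] * 15
    + [("treatment", False, False)] * 5
    + [("control", True, True)] * 25
    + [("control", True, False)] * 75
)


def event_rate(rows):
    """Fraction of rows in which the bad event occurred."""
    return sum(1 for _, _, event in rows if event) / len(rows)


def intention_to_treat(rows, group):
    """Analyze everyone in the group they were randomized to."""
    return event_rate([r for r in rows if r[0] == group])


def per_protocol(rows, group):
    """Analyze only patients who took their assigned treatment."""
    return event_rate([r for r in rows if r[0] == group and r[1]])


print("ITT treatment:", intention_to_treat(patients, "treatment"))
print("ITT control:", intention_to_treat(patients, "control"))
print("Per-protocol treatment:", per_protocol(patients, "treatment"))
```

In this invented data set the intention-to-treat event rates are identical in the two groups, while dropping the noncompliant patients makes the treatment look as if it halves the event rate. The apparent benefit comes entirely from who was excluded.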

More information: The effects of excluding patients from the analysis in randomised controlled trials: meta-epidemiological study. Nüesch E. et al. BMJ. 2009 Sep 7;339:b3244

Massie article: Statistical analysis: Data from all patients who underwent randomization were analyzed according to the intention-to-treat principle. [Page 2458] In addition, Table 2 shows results for primary outcomes that include all patients in the trial. [Page 2463]

7. Was the trial stopped early?

Stopping a trial early may provide an incomplete picture of the real effect of an intervention. Trials ended early may compromise randomization if they stop at a “random high” when prognostic factors may temporarily favor the intervention group. When study size and the number of events are small, stopping early may overestimate the treatment effect.

Massie article:  The study was not stopped early.

Are the results of this study valid?

Yes. This study methodology appears to be sound and the results should be valid.

Key validity issues for studies of Therapy:

  • randomization
  • concealed allocation
  • baseline similarities
  • blinding
  • follow-up complete
  • intention-to-treat

Guyatt G, Rennie D, Meade MO, Cook DJ. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 2nd Edition. 2008.

Step 2: What are the results?

How large was the treatment effect?
What was the relative risk reduction?
What was the absolute risk reduction?

How precise was the estimate of the treatment effect?
What were the confidence intervals?

From the abstract: During a mean follow-up of 49.5 months, the primary outcome occurred in 742 patients in the irbesartan group and 763 in the placebo group. Primary event rates in the irbesartan and placebo groups were 100.4 and 105.4 per 1000 patient-years, respectively (hazard ratio, 0.95; 95% confidence interval [CI], 0.86 to 1.05; P = 0.35). Overall rates of death were 52.6 and 52.3 per 1000 patient-years, respectively (hazard ratio, 1.00; 95% CI, 0.88 to 1.14; P = 0.98). Rates of hospitalization for cardiovascular causes that contributed to the primary outcome were 70.6 and 74.3 per 1000 patient-years, respectively (hazard ratio, 0.95; 95% CI, 0.85 to 1.08; P = 0.44). There were no significant differences in the other prespecified outcomes. [Page 2456]

The irbesartan and placebo groups did not differ for the primary composite endpoint, its individual components or the secondary composite endpoints. Irbesartan did not improve the outcomes of patients with heart failure and a preserved left ventricular ejection fraction.

Primary Outcome with Component Events

2 x 2 Table for Calculating the Effect Size for CV Hospitalizations

                    Hospitalization for CV    Not Hospitalized for CV    Totals
Irbesartan Group    521                       1546                       2067
Placebo Group       537                       1524                       2061

Experimental Event Rate (EER) = 521 / 2067 = 25.2%
outcome present / total in experimental group

Control Event Rate (CER) = 537 / 2061 = 26.1%
outcome present / total in control group

Absolute Risk Reduction (ARR) = 26.1% – 25.2% = 0.9%
is the arithmetic difference between the rates of events in the experimental and control group. An Absolute Risk Reduction (ARR) refers to the decrease of a bad event as a result of the intervention. An Absolute Benefit Increase (ABI) refers to the increase of a good event as the result of the intervention. [ARR = CER – EER]

Relative Risk Reduction (RRR) = 0.9% / 26.1% = about 3%
is the proportional reduction in risk between the rates of events in the control group and the experimental group. Relative Risk Reduction is often a larger number than the ARR and therefore may tend to exaggerate the difference. [RRR = (CER – EER) / CER]

Number Needed to Treat (NNT) = not reported for this study, because the difference between groups was not statistically significant
is the number of patients who need to be treated to prevent one bad outcome or produce one good outcome. In other words, it is the number of patients that a clinician would have to treat with the experimental treatment to achieve one additional patient with a favorable outcome. [NNT = 1/ARR]
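The definitions above can be checked with a short Python sketch using the CV hospitalization counts quoted in this section (521 of 2067 in the irbesartan group, 537 of 2061 in the placebo group). The function name and the dictionary keys are assumptions made for this example. The sketch returns an NNT whenever the ARR is positive, but an NNT is only meaningful when the difference between groups is statistically significant, which was not the case in this study.

```python
def effect_measures(exp_events, exp_total, ctrl_events, ctrl_total):
    """Compute EER, CER, ARR, RRR and NNT for a bad outcome."""
    eer = exp_events / exp_total      # experimental event rate
    cer = ctrl_events / ctrl_total    # control event rate
    arr = cer - eer                   # absolute risk reduction
    rrr = arr / cer                   # relative risk reduction
    # NNT = 1 / ARR; undefined when there is no reduction in risk.
    nnt = 1 / arr if arr > 0 else None
    return {"eer": eer, "cer": cer, "arr": arr, "rrr": rrr, "nnt": nnt}


m = effect_measures(521, 2067, 537, 2061)
print(f"EER {m['eer']:.1%}  CER {m['cer']:.1%}  "
      f"ARR {m['arr']:.2%}  RRR {m['rrr']:.1%}")
```

Note how the relative figure is several times larger than the absolute one even though both describe the same small difference.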

Confidence Intervals
are a measure of the precision of the results of a study. For example, in “36 [95% CI 27-51]”, the 95% CI means that if you were to repeat the same clinical trial a hundred times and calculate an interval each time, about 95 of those intervals would be expected to contain the true value; 27-51 is the interval calculated from this one trial. Wider intervals indicate lower precision; narrow intervals show greater precision.
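As a rough illustration, a 95% confidence interval for the absolute risk reduction can be approximated from the raw counts. The Python sketch below uses the simple normal (Wald) approximation for a difference of two proportions, applied to the CV hospitalization counts quoted in this section. This is an assumption made for the example; the Massie article reports hazard ratios from a time-to-event analysis, so its published intervals are not calculated this way.

```python
import math


def risk_difference_ci(exp_events, exp_total, ctrl_events, ctrl_total, z=1.96):
    """Return (ARR, lower, upper) using the normal approximation.

    z = 1.96 corresponds to a 95% confidence interval.
    """
    eer = exp_events / exp_total
    cer = ctrl_events / ctrl_total
    arr = cer - eer
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(eer * (1 - eer) / exp_total + cer * (1 - cer) / ctrl_total)
    return arr, arr - z * se, arr + z * se


arr, lower, upper = risk_difference_ci(521, 2067, 537, 2061)
print(f"ARR {arr:.2%} (95% CI {lower:.2%} to {upper:.2%})")
```

The interval spans zero, which means the data are compatible with a small benefit, no effect, or a small harm. That is consistent with the article's finding of no significant difference.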

P value
refers to the probability of seeing a result at least as extreme as the one observed if chance alone were at work, that is, if the intervention truly had no effect. The smaller the P value, the less likely it is that the result arose by chance alone and the stronger the evidence for an effect of the intervention. Standard scientific practice usually deems a P value of less than 1 in 20 (expressed as P < .05) as “statistically significant”. The smaller the P value, the higher the significance.
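To make the idea concrete, here is a minimal Python sketch of a two-sided P value for a difference between two proportions, using a pooled z-test on the CV hospitalization counts quoted in this section. The choice of test is an assumption made for illustration. The article's own P values come from a time-to-event analysis, so the number produced here will not match the published P = 0.44.

```python
import math


def two_proportion_p_value(events_a, total_a, events_b, total_b):
    """Return (z, two-sided P value) for a pooled two-proportion z-test."""
    rate_a = events_a / total_a
    rate_b = events_b / total_b
    pooled = (events_a + events_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_p_value(521, 2067, 537, 2061)
print(f"z = {z:.2f}, P = {p_value:.2f}")
```

The P value is well above .05, so a difference of this size could easily arise by chance alone.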

Clinical versus Statistical Significance
“Although it is tempting to equate statistical significance with clinical importance, critical readers should avoid this temptation. To be clinically important requires a substantial change in an outcome that matters. Statistically significant changes, however, can be observed with trivial outcomes. And because statistical significance is powerfully influenced by the number of observations, statistically significant changes can be observed with trivial (small) changes in important outcomes. Large studies can be significant without being clinically important and small studies may be important without being significant.” [http://www.acponline.org/clinical_information/journals_publications/ecp/julaug01/primer.htm]

Clinical significance has little to do with statistics and is a matter of judgment. Clinical significance often depends on the magnitude of the effect being studied. It answers the question “Is the difference between groups large enough to be worth achieving?” Studies can be statistically significant yet clinically insignificant.

For example, a large study might find that a new antihypertensive drug lowered BP, on average, 1 mm Hg more than conventional treatments. The results were statistically significant with a P value of less than .05 because the study was large enough to detect a very small difference. However, most clinicians would not find the 1 mm Hg difference in blood pressure large enough to justify changing to a new drug. This would be a case where the results were statistically significant (P value less than .05) but clinically insignificant.
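The effect of study size on statistical significance can be sketched in Python. The example below takes the same 1 mm Hg difference and computes a two-sided P value for two different study sizes, using a z-test for a difference in means. The standard deviation of 15 mm Hg and the group sizes are hypothetical values chosen for illustration.

```python
import math


def p_value_for_mean_difference(difference, sd, n_per_group):
    """Two-sided P value for a difference in means between two equal groups.

    Assumes both groups share the same standard deviation (sd) and are
    large enough for the normal approximation.
    """
    se = sd * math.sqrt(2 / n_per_group)
    z = difference / se
    return math.erfc(abs(z) / math.sqrt(2))


# Same 1 mm Hg difference, hypothetical SD of 15 mm Hg.
p_small = p_value_for_mean_difference(1.0, sd=15.0, n_per_group=200)
p_large = p_value_for_mean_difference(1.0, sd=15.0, n_per_group=20000)
print(f"200 per group:   P = {p_small:.3f}")
print(f"20000 per group: P = {p_large:.2g}")
```

The difference is 1 mm Hg in both cases. Only the very large study makes it statistically significant, and in neither case does the size of the study make the difference clinically important.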

Guyatt G, Rennie D, Meade MO, Cook DJ. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 2nd Edition. 2008.

Step 3: How can I apply the results to patient care?

Were the study patients similar to my population of interest? 
Does your population match the study inclusion criteria?
If not, are there compelling reasons why the results should not apply to your population?

Were all clinically important outcomes considered? 
What were the primary and secondary endpoints studied?
Were surrogate endpoints used?

Are the likely treatment benefits worth the potential harm and costs?
What is the number needed to treat (NNT) to prevent one adverse outcome or produce one positive outcome?
Is the reduction of clinical endpoints worth the increase of cost and risk of harm?

It appears from our brief analysis that this article meets the criteria for validity. To complete the analysis you would need to review the results and determine if they are applicable to Pauline.

Our patient is a 73-year-old female with heart failure and a left ventricular ejection fraction of 40%. She has no other remarkable past medical history and is living on her own. She meets the inclusion criteria for this study. The results show that irbesartan may not be effective for our patient. There are also drug interactions to consider. The next step is to talk with the patient.

Take a moment to reflect on how well you were able to conduct the steps in the EBM Process.

Did you ask a relevant, well focused question? Do you have fast and reliable access to the necessary resources? Do you know how to use them efficiently? Did you find a pre-appraised article? If not, was it difficult to critically evaluate the article?

Guyatt G, Rennie D, Meade MO, Cook DJ. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 2nd Edition. 2008.

Note: For validity criteria for other types of studies, see the following supplements: Diagnosis | Prognosis | Etiology/Harm | Systematic Review

Also in this series:

1.  What is EBM

2. The Clinical Question

3. The Literature Search

5. Testing Your Knowledge


Source: UNC


