The first is the issue of feasibility. Randomized trials cannot be done
for many, perhaps most, of the questions that arise in clinical practice. Randomized
clinical trials would be frustrating, infeasible, or unethical for evaluating the following
situations. One of them is "unstable" therapy, where potent new agents appear
while the trial is in progress. For example, suppose we do a trial to test either surgery
or a medical therapy versus balloon angioplasty for coronary disease. While that trial
is in progress, people may develop stents or other new ways of reaming out the coronary
artery that may be better than balloons. By the time the trial is done, whatever we found
may be obsolete because the stent may have replaced the balloon.
A second problem is comparing a new agent against multiple active treatments. For example,
suppose I have produced a new treatment for hypertension: Alvanol, my drug - it is
wonderful! It lowers blood pressure gently, but firmly. It does not affect any metabolism.
And whatever needs to be done to the libido it does. If the libido needs to be raised, it
raises it. If it needs to be lowered, it lowers it. It is an absolutely wonderful drug for
the treatment of hypertension. What do I compare it against? There are 85 different new
drugs out there for hypertension. And by the time you finish reading, there may be 86. How
do I choose which one? And if I compare it against placebo you may say I am unethical.
What about new physical changes in pharmaceutical or other agents? If I bring out a new
suture material: must I do randomized trials in people to show that it works, or will you
be satisfied with various in vitro or animal studies?
The long-term adverse effects of treatment are too complicated and take too long to occur
for us to study all of them with randomized trials.
Most forms of new diagnostic technology cannot be studied with randomized trials, because
the outcomes of the diagnostic technology are not just diagnoses. They include the subsequent
therapeutic process, which itself would need another randomized trial.
And then, finally, the suspected "noxious" agents of etiology have not been
tested with randomized trials and usually cannot be. Whenever we suspect that things are
noxious, such as cigarette smoking or the use of any particular agents that you want to
call a risk factor, it is already beyond the time when randomized trials could be done.
Another problem we have with randomized trials is that there are different kinds of
clinical therapy: remedial, primary prophylaxis, and secondary prophylaxis (Table 2).
Remedial therapy is the classical kind of clinical treatment, where the baseline state is
a manifestation of disease such as a headache or other pain, and the outcome event is the relief
or removal of that manifestation. Aspirin treatment for pain is a classical example of
this process.
Table 2: Types of clinical therapy

| Types of therapy | Baseline state | Outcome event | Example |
|---|---|---|---|
| Remedial | Manifestation of disease | Relief or removal | Aspirin for pain |
| Prophylactic (1: contrapathic) | Disease absent | Disease present | Vaccination against polio |
| Prophylactic (2: contratrophic) | Disease present | Adverse progression | β-blocker after myocardial infarction |

Primary prevention: The baseline state is that "the disease is absent". The
outcome event is the presence of disease. Vaccination against poliomyelitis is a classical
example of primary prophylaxis.
Secondary prevention: A disease is already present, such as diabetes mellitus or myocardial
infarction. Our goal is to prevent the adverse progression of that disease. Giving
β-blockers after myocardial infarction to prevent arrhythmias, recurrence of myocardial
infarction, or death is that kind of secondary prevention.
A distinguished American statistician, some years ago, did not understand these
distinctions. When I told him that the treatment of cancer was prophylaxis whenever we use
death as the endpoint, he simply did not understand it. However, if you recognize the
distinction between remedial trials and prophylactic trials, you will realize that
randomized trials have been extremely successful in remedial therapy. You have a
homogeneous group who all have the same target, such as a headache or other pain to be remedied.
The target can be directly observed for change. The change does not take long to occur,
and huge numbers are not required for "statistical significance". If you think
back to the first trial that is usually regarded as the opening of the modern era of
randomized trials - the study of streptomycin in the U.K. in the middle of the 1940s -
it was a remedial trial. The patients had advanced tuberculosis, demonstrated on chest
x-rays. One can follow what happened on the chest x-rays. It was easy to do.
The trials for which we have major problems are those of prophylactic therapy, where we
have a heterogeneous spectrum of people at baseline. The trials take a long time to do,
because the outcome events are not there at the beginning (we are trying to keep them from
happening). We need large numbers of patients, often requiring collaboration of multiple
institutions. During the long duration of follow-up, there can be many intervening
changes. After the onset of therapy we have the problem that many different outcome events
can occur, and we have to choose which ones we are going to look at. There are cumbersome
logistics and high costs. And, if you think about the controversies over randomized trials
during the past few decades, they have all been over prophylactic studies. Dating back
almost 30 years, to the University Group Diabetes Program, the MRFIT trial, and the
trials of anticoagulant therapy, they have all been prophylactic trials, and they were
difficult to do for all of these reasons. That is the problem of feasibility.
With respect to generalizability, the problem is that the results in the selected groups, agents,
and outcomes may not always apply to other clinical situations. The agents that have been
tested may not be the agents that doctors want to know about. The particular group
who are entered into the trial, or the events used as outcomes in the trial, may not be
the things that are desired for clinical practice.
Beyond all these problems, there are conflicting policies in the goals and analysis of
trials. In the fastidious policy - which some people call "explanatory" - the
goal of the trials is to get unbiased, "reliable" results. And the analysis is
done with the so-called unbiased "intention-to-treat" analysis. In the pragmatic
policy, one wants to get clinically pertinent results and to study the reality of what
happened. This is not the place for me to go into the conflict between these two policies.
They are both correct, and the essence of tragedy occurs when you have the destructive
collision of two opponents, both of whom are right. In the case of these two policies, if
you design a trial with one policy it will be unsatisfactory for the other.
With respect to the pertinence of the trials: what the randomized trial gives us is
results for an "average" randomized patient that may not pertain to distinctive
clinical subgroups in the spectrum of the "disease". What are some of those
subgroups?
The distinctive clinical subgroups depend on the anticipated prognosis and on the
tolerance or acceptability of the proposed treatment. That is how clinicians choose which
patients are going to be treated. They have to have a satisfactory prognosis, and they
have to be able to tolerate the therapy. The prognosis that is commonly contemplated by a
thoughtful clinician depends on the severity of the ailment, but "severity" is
usually poorly defined and inadequately delineated by the clinician. Unfortunately most
clinicians have not learned how to articulate what they know. We as statisticians do not
get them to articulate it, so we ignore it. But clinical severity is usually
indicated by prognostic staging systems, like: the TNM stages (tumor, nodes and
metastasis) for the extensiveness of cancer; the Killip classes for myocardial infarction;
or the Glasgow Coma Scale for patients with stroke and other neurological deficits. These
and many other indexes are constantly used in prognostic decisions,
and yet they are overlooked in our designs and subgroupings of randomized trials.
The formation of distinctive clinical subgroups is not done well with customary
statistical approaches, because the subgroups are formed as "biologic unions", in the sense of a
Boolean union of different attributes. If I tell you that a patient has cancer in the
liver, cancer in bone, and cancer in the lungs, the computer will never know, unless we tell it,
that these can all be catalogued in the union of metastatic cancer. If I tell you that a
patient has dyspnea, distended neck veins, an enlarged liver, and palpable edema, you
may want to enter each entity into the computer as a separate binary variable, but only a
knowledgeable person can tell you that they fit in the biologic union of congestive heart
failure. The computer may find intersections or interactions, but it does not find unions.
The knowledgeable clinical biologist has to define them.
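The gap between intersections and unions can be shown with a few hypothetical records. In this minimal Python sketch, each metastatic site is a separate binary variable. A product term (which is what an interaction in a statistical model amounts to) flags only patients in whom every site is involved, whereas the union "metastatic cancer" exists only because someone who knows the biology writes the rule. The field names and records are illustrative assumptions.

```python
# Hypothetical patients, each site of cancer recorded as a separate binary variable.
patients = [
    {"id": 1, "liver": 1, "bone": 0, "lung": 0},
    {"id": 2, "liver": 0, "bone": 1, "lung": 1},
    {"id": 3, "liver": 0, "bone": 0, "lung": 0},
]

def interaction(p):
    # A product (interaction) term is 1 only when ALL sites are involved:
    # a Boolean intersection.
    return p["liver"] * p["bone"] * p["lung"]

def metastatic(p):
    # A clinician-defined "biologic union": involvement of ANY of these
    # sites places the patient in the category "metastatic cancer".
    return int(p["liver"] or p["bone"] or p["lung"])

for p in patients:
    print(p["id"], "intersection:", interaction(p), "union:", metastatic(p))
```

No patient satisfies the intersection, yet two of the three belong to the union; a model fed only the three binary variables has no way to form that category by itself.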
Furthermore, important prognostic variables may be absent or inadequately classified. They
are absent, because they are not articulated, or they are not gathered, or they are
perhaps collected as individual manifestations, but not adequately classified into
pathophysiological unions, such as congestive heart failure.
We now come to the fourth problem: suitability. The results in the "hard data"
describing baseline and outcome conditions may be unsuitable for the distinctions of
"soft data" needed in clinical practice. What are some of those "soft
data" distinctions?
- The pattern and severity of symptoms in the illness: is the patient symptomatic or
asymptomatic? Are the symptoms mild or severe?
- The auxometry or rate of progression of the illness: does the cancer or the coronary
disease appear to be stable and static, or is it rapidly progressing?
- The co-morbidity of the associated diseases that may be present beyond the main disease.
It is remarkable how many randomized trials have been done as though human beings exist in
a vacuum with one disease only, and yet most of the people in modern hospitals cannot
escape having more than one disease. We often pay no attention to all the other diseases.
- The responses to previous therapy. One of the best ways of telling how people are going
to progress is to see what happens in the first 24 hours of treatment for things like
congestive heart failure or stroke or pneumonia. These factors are also neglected in our
prognostic classifications.
When we look at the outcome results of treatment, we want to know about the relief of
symptoms, adverse events, changes in functional status, impact on the family, and impact
on "quality of life". We are, after all, treating people, not dogs, not rats,
and not molecules. In the assessment of what treatment accomplishes for people, we have to
recognize that they are people and that the whole purpose of treatment is to deal with
human needs and human aspirations, and not just disease.
Quality of life, in particular, is dealt with in an extremely unsatisfactory way. Ever
since it was discovered that the quality of life required attention, people have used
various psychometric principles to appraise "health status" and "quality of
life". This is usually done by collecting information on multiple components or items
that are then aggregated in weighted combinations. The components and weights are chosen
by experts and by statistical "models". Patients are not asked directly what
they feel and want. The results often have high coefficients for statistical
"reliability" and "validity", but they may lack the face validity of
common sense.
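To illustrate how a weighted aggregate can hide the differences that matter to patients, here is a minimal Python sketch of a hypothetical three-item "instrument". The items, weights, and scores are invented for illustration and do not come from any published scale.

```python
# Hypothetical expert-chosen integer weights for a three-item "instrument".
# Each item is scored 0 (worst) to 10 (best); for "pain", 10 means no pain.
weights = {"mobility": 5, "pain": 3, "mood": 2}

def composite(scores):
    """Weighted sum of the item scores."""
    return sum(weights[item] * value for item, value in scores.items())

# Two hypothetical patients in very different situations.
patient_a = {"mobility": 2, "pain": 10, "mood": 10}  # bedbound, but pain-free
patient_b = {"mobility": 8, "pain": 0, "mood": 10}   # mobile, but in severe pain

print(composite(patient_a), composite(patient_b))
```

Both patients receive the same composite score of 60, although one is bedbound and the other is in severe pain; only asking each of them directly would reveal which problem matters to that person.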
The measurements of quality of life are usually inadequate because the emphasis is on
functional status, but "quality of life" is not a functional status; it is an
individual person's reaction to functional status and to other, non-medical, aspects of
life. And it has to be determined by asking each person directly, not by mathematical
aggregation of multiple items in an "instrument".
It is only if we ask patients directly, "How are you? What would you like to have
done?", that we can find out how they feel and what they want. These are some of the
difficulties that exist, and once we recognize them, we can see the challenges
that we have to deal with. To meet these challenges, we have to construct satisfactory methods
for getting data and satisfactory methods for getting unbiased analyses for the many
circumstances that cannot be studied with randomized trials.
We have to recognize that for cause-effect relationships, whether between an etiological agent
that produces or promotes disease or a therapeutic agent that remedies or prevents
disease, we can use a single model for cause-effect reasoning without having to
construct two separate models: one for epidemiologic studies of cause, and the second for
clinical studies of therapy.
The model that we want is a quite simple one that would be used in any laboratory
throughout the world for contemplating cause-effect relationships. We have baseline
states that are exposed to a principal maneuver and a comparative maneuver, and we look at
the outcomes. That is a very simple, straightforward, ordinary model, which any sensible
person would use if his or her mind had not been impaired either by medical education or
by mathematical modeling.
Once we contemplate that model, we realize that we have a baseline state of our total
group that is divided into the active and comparison groups, receiving the active and
comparison maneuvers, being followed to the outcome events. When we want to compare them,
we have to do it "ceteris paribus", i.e., all other things being equal.
What are the things that we want to be equal? When we get from the outside world into the
baseline state, we want an assembly that will give us suitable people for what we are
looking at. When the baseline state is divided into the active and comparison groups, we
want them to have similar susceptibilities to the outcome events. When the maneuvers are
performed, we want the performance to be done in an equitable manner. When these outcome
events are detected, we want them properly detected, and finally, we want the transfer of
these outcome events into the data to be done in a suitable manner.
Table 3: Sources of biased comparisons

| Source of bias | Problem and example |
|---|---|
| Baseline susceptibility | Compared groups have unequal prognoses for outcome, e.g., "operable" vs. "inoperable" |
| Proficiency of maneuvers | Inequalities in performance of maneuvers, e.g., skill of surgeon, compliance with regimen |
| Detection of outcomes | Unequal methods of surveillance and identification, e.g., observers affected by knowledge of Rx |
| Transfer of cohort | Inequalities in accounting for all "exposed", e.g., omitting post-operative deaths from "survival" denominator |

With respect to biased comparisons: there will be bias if the compared groups differ in
baseline susceptibility, that is, if they have unequal prognoses for the outcome. This
bias has been created for decades in the world of surgery where the surgeons pick and
choose the best patients for surgical treatment of cancer. They then say that the people
with metastatic disease or with bad co-morbidity are inoperable. These patients are sent
to have radiotherapy or chemotherapy. The surgeons then quite inappropriately compare
the surgical results in the operable group with the results of radiotherapy or chemotherapy
in the inoperable group. That is the kind of biased comparison that randomization is
intended to avoid.
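A small simulation makes the point. In this Python sketch, which rests entirely on assumed numbers, operability alone determines 5-year survival (60% if operable, 10% if not) and neither treatment has any effect. Comparing surgery in the operable with radiotherapy in the inoperable still produces a large apparent advantage for surgery, while allocating the eligible (operable) patients by chance between the two treatments removes it.

```python
import random

rng = random.Random(42)

# Hypothetical cohort: operability is the ONLY determinant of 5-year survival.
# By construction, neither surgery nor radiotherapy has any effect.
patients = []
for _ in range(20_000):
    is_operable = rng.random() < 0.5
    survived = rng.random() < (0.6 if is_operable else 0.1)
    patients.append({"operable": is_operable, "survived": survived})

def survival_rate(group):
    return sum(p["survived"] for p in group) / len(group)

# Biased comparison: surgery for the operable, radiotherapy for the "inoperable".
operable = [p for p in patients if p["operable"]]
inoperable = [p for p in patients if not p["operable"]]
selected_gap = survival_rate(operable) - survival_rate(inoperable)

# Randomized comparison: the eligible (operable) patients are allocated by chance.
allocation = operable[:]
rng.shuffle(allocation)
half = len(allocation) // 2
randomized_gap = survival_rate(allocation[:half]) - survival_rate(allocation[half:])

print(f"selected comparison:   surgery minus radiotherapy = {selected_gap:+.2f}")
print(f"randomized comparison: surgery minus radiotherapy = {randomized_gap:+.2f}")
```

The selected comparison shows a gap of about 0.50 in favor of surgery even though, by construction, surgery does nothing; the randomized comparison shows a gap close to zero.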
When we look at proficiency of the maneuvers, we have to contemplate inequalities in the
performance of the maneuver, such as the skill of the surgeon. If we want to consider
radical mastectomy versus simple mastectomy, it is important that the surgeons have equal
skill: the simple mastectomy should not be done, say, by a first-year medical student while
the radical mastectomy is done by a renowned clinical professor.
The regimen has to receive suitable compliance in order to work. It is remarkable how many
doctors and statisticians discovered compliance only some years ago. They used to ignore the
fact that many patients don't take their medicine, particularly in the field of
hypertension, where quality of life can be markedly improved if you just stop taking the
drug.
In detection of outcomes, we may have unequal methods of surveillance and identification.
In epidemiology, when detecting disease is the outcome, we have hordes of studies that
seem to pay no attention to the role of mammography in identifying breast cancer, to the
role of MRIs and CAT scans of the head in identifying various kinds of cerebral cancer, or
to the role of abdominal ultrasound in identifying pancreatic cancer.
Furthermore, we do not want a biased transfer in selecting the exposed group. I remember
some surgeons in my institution many years ago reporting a 90% success rate for a
particular operation. Those of us who are internists could not believe it. It finally
turned out that the 90% success rate was in those patients who left the hospital alive.
All of those who died in the hospital were omitted from the calculation of survival rates.
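The arithmetic behind such a figure is simple. In this sketch the counts are hypothetical, not the ones from that report: 100 patients are operated on, 30 die in the hospital, and 63 of the 70 who leave alive are counted as successes.

```python
operated = 100           # all patients exposed to the operation (hypothetical)
in_hospital_deaths = 30  # hypothetical
discharged_alive = operated - in_hospital_deaths
successes = 63           # hypothetical successes among those discharged alive

# Biased transfer: in-hospital deaths vanish from the denominator.
reported_rate = successes / discharged_alive
# Correct accounting: every patient who received the operation stays in the denominator.
full_cohort_rate = successes / operated

print(f"reported {reported_rate:.0%}, full cohort {full_cohort_rate:.0%}")
```

The same 63 successes yield a reported rate of 90% or a full-cohort rate of 63%, depending only on who is kept in the denominator.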
If we contemplate what we want - admission criteria, baseline state, imposed maneuvers,
outcome events and transfers - the clinical pertinence for individual patients rests on
suitability. We want a suitable delineation of the eligible population for extrapolation
of results; we want a suitable identification of the baseline state, proficiency in the
maneuvers, and suitable selection of the outcome events. In addition, we need to recognize the problems
that may arise when extrapolating the results to real-life situations. But we also want
similarity in order to get scientific validity in the comparison: equal susceptibility in
the baseline state, equal performance of the maneuvers, equal detection of the outcome
events and equal distribution of the transfers. If these are the things that we want, what
does randomization do for us?
Table 4: Issues addressed by randomisation

| Issue | Suitability | Similarity |
|---|---|---|
| Baseline states | No | Yes |
| Outcome events | No | No |
| Active maneuver | No | No |
| Comparison maneuver | No | No |

Does randomization address suitability in the baseline states, outcome events, active
maneuvers, and comparison maneuvers? Not at all! That is part of the
overall planning of the research.
Does the randomization address similarity in the baseline states? Yes! (When the
randomization, of course, works and when there are no screw-ups due simply to the luck of
the draw in chance.)
Does the randomization address the detection of outcome events? No! That role is for
double-blinding!
Does the randomization address equality in the performance of the maneuvers? No, that is
the way we plan to give the maneuvers.
Randomization is wonderful for achieving similarity in baseline susceptibility. But for
all the other things, randomization does not work, and we still have to think, even in a
randomized trial, about what all these other things are going to be.
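How reliably randomization achieves similarity depends on the size of the trial, which is where the luck of the draw comes in. This Python sketch rests on assumed numbers: 30% of patients have a poor prognosis, and a difference of more than 15 percentage points between arms is counted as a bad imbalance. It repeatedly randomizes a fixed group into two equal arms and records how often the arms differ badly in their share of poor-prognosis patients.

```python
import random

def chance_imbalance(n_patients, p_poor, rng):
    """Randomize a fixed group into two equal arms; return the absolute
    difference between arms in the proportion of poor-prognosis patients."""
    n_poor = round(n_patients * p_poor)
    group = [1] * n_poor + [0] * (n_patients - n_poor)  # 1 = poor prognosis
    rng.shuffle(group)                                   # random allocation
    half = n_patients // 2
    active, comparison = group[:half], group[half:]
    return abs(sum(active) / len(active) - sum(comparison) / len(comparison))

def share_badly_imbalanced(n_patients, p_poor=0.3, threshold=0.15, trials=1000, seed=1):
    """Fraction of repeated randomizations whose imbalance exceeds the threshold."""
    rng = random.Random(seed)
    bad = sum(chance_imbalance(n_patients, p_poor, rng) > threshold for _ in range(trials))
    return bad / trials

for n in (20, 200, 1000):
    print(n, share_badly_imbalanced(n))
```

With 20 patients a sizeable imbalance occurs in most allocations; with 1000 patients it practically never does.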
Though the randomized trials have the great virtue that randomized allocation prevents or
reduces susceptibility bias, it is the double-blinding in a clinical trial, not the
randomization, that prevents or reduces detection bias. The randomization helps create the
experimental environment to which we can apply the double-blinding. But the randomization
is not the requirement. The life-table and intention-to-treat analyses may prevent
transfer bias. But once again, that is not a function of randomization. And then, in
exchange for all of these advantages, we accept the limitations imposed by our
admission criteria, the selected therapy, the performance of therapy, the selected
outcomes, and feasibility.
So, if we are going to have to do other kinds of studies in order to get the answers that
we want, let us contemplate very specifically what we want. What we want is an issue in
scientific clinical identification; it is not supplied to us by statistical criteria. One
actually has to think, and we are going to have to get the doctors and force them to think
and to express what is necessary.
We have to recognize that clinical data that are currently absent or uncollected may contain
crucial clinical factors, which when suitably identified and classified, can be used for
adjustments that remove or reduce the biases that occur when observational data are used
for therapeutic comparisons.
We have to recognize that with suitable adjustments for clinical factors, observational
data can yield results similar to randomized trials. There have now been several studies
published that have shown that the results of observational studies done in parallel
with a randomized trial are the same as those obtained from the randomized study. However,
everyone tends to ignore these studies, because, I suppose, it is a Martin Luther heresy
to conduct an observational study when one could do the sanctified randomized trial.
For example, the Beta-Blocker Heart Attack Trial was a randomized trial. It was replicated
with a suitable, "refined" observational cohort study done by Ralph Horwitz and
his colleagues at Yale-New Haven Hospital. When they applied, to the regular patients
being treated at that hospital, the same admission criteria that had been used in the
Beta-Blocker Heart Attack Trial, applied a prognostic stratification to deal with the
absence of randomization, and looked at the same outcome events, they obtained almost
identical results.
In another study, reported in the prestigious New England Journal of Medicine, the
analysis of insurance claims data led to the conclusion that open prostatectomy had a
lower 5-year mortality rate than transurethral resection for prostatic hyperplasia. The
message spread throughout the country that transurethral resection was dangerous
and that people should have the more radical open prostatectomy, with its much higher rate
of incontinence and impotence. John Concato, one of my colleagues, and I looked at this
and said: "This is nonsense! They have neglected co-morbidity." We went into the
medical records and used a better classification of co-morbidity, and we found that with
this better classification, the mortality results of the two operations are the same. It is
just that patients with more co-morbidity are selected preferentially to receive the simpler, easier
transurethral resection. The co-morbidity will then lead to a higher mortality rate; and
if you have not properly acknowledged and classified co-morbidity, you can then reach the
nonsensical conclusion that the simple operation is dangerous.
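The mechanism can be reproduced with invented numbers. In this Python sketch the counts are hypothetical, not those of the claims-data study or of our re-analysis: within each co-morbidity stratum the two operations have identical mortality, but transurethral resection (labeled TURP here) is given preferentially to patients with high co-morbidity.

```python
# Hypothetical counts: (operation, co-morbidity) -> (patients, deaths within 5 years)
data = {
    ("TURP", "low"): (200, 20),
    ("TURP", "high"): (300, 120),
    ("open", "low"): (400, 40),
    ("open", "high"): (100, 40),
}

def crude_rate(operation):
    """Mortality ignoring co-morbidity, as in an analysis that neglects it."""
    patients = sum(n for (op, _), (n, _) in data.items() if op == operation)
    deaths = sum(d for (op, _), (_, d) in data.items() if op == operation)
    return deaths / patients

def stratum_rate(operation, comorbidity):
    """Mortality within one co-morbidity stratum."""
    patients, deaths = data[(operation, comorbidity)]
    return deaths / patients

for op in ("TURP", "open"):
    print(op, f"crude {crude_rate(op):.0%}",
          f"low {stratum_rate(op, 'low'):.0%}",
          f"high {stratum_rate(op, 'high'):.0%}")
```

The crude rates (28% versus 16%) make transurethral resection look dangerous, yet within each stratum the rates are identical (10% for low and 40% for high co-morbidity).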
Those are the kinds of things that people will have to pay attention to as we go into the
future.
Table 5: Improvements in observational studies

| No. | Improvement |
|---|---|
| 1 | Study "refined" cohorts, assembled with the same criteria used for admission to randomized clinical trials of the same comparison |
| 2 | Develop improved clinimetric indexes or rating scales for important "soft" data regarding baseline state and outcomes |
| 3 | Develop appropriate prognostic staging systems to avoid or reduce bias by comparing treatments only for patients in similar stages |
| 4 | Note subsequent changes and reasons for changes in treatment |
| 5 | Reasons may indicate important but otherwise neglected outcome events or prognostic alterations |
| 6 | Develop better analytical methods to incorporate entire clinical course and outcomes, not just state "at randomisation" |
| 7 | Develop better statistical methods, e.g., conjunctive consolidation, for multivariable analysis |

What are some of the improvements that are needed in the observational studies? We need
to study refined cohorts. A "refined" cohort is assembled with the same criteria
used for admission to a randomized trial of the same comparison. You will find, if you
simply use the same admission criteria, that much of the bias of the observational studies
will vanish because use of the same admission criteria will exclude from the refined
observational cohort some of the people who have either the best or the worst results.
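Assembling a refined cohort is essentially a filter applied to routine patients. In this Python sketch the admission criteria (an age range and two exclusions) are hypothetical stand-ins for whatever criteria the corresponding randomized trial actually used.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    severe_comorbidity: bool
    contraindication_to_treatment: bool

# Hypothetical admission criteria standing in for those of the corresponding trial.
MIN_AGE, MAX_AGE = 30, 74

def meets_admission_criteria(p: Patient) -> bool:
    return (MIN_AGE <= p.age <= MAX_AGE
            and not p.severe_comorbidity
            and not p.contraindication_to_treatment)

def refined_cohort(patients):
    """Keep only routine patients who would have been eligible for the trial."""
    return [p for p in patients if meets_admission_criteria(p)]

routine_patients = [
    Patient(55, False, False),
    Patient(82, True, False),   # too old and severely ill: excluded
    Patient(47, False, True),   # contraindication: excluded
    Patient(63, False, False),
]
cohort = refined_cohort(routine_patients)
print(len(cohort), "of", len(routine_patients), "patients enter the refined cohort")
```

The excluded patients are the ones most likely to produce the extreme results that distort an unrefined observational comparison.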
We need to develop improved clinimetric indexes or rating scales for important
"soft" data, regarding baseline state and outcomes. And these improvements have
to be done by people who are knowledgeable about the subject matter. They will not come
simply by preparing 200-item questionnaires and hoping that valuable results will somehow
emerge.
We need to develop appropriate prognostic staging systems and we can then avoid or reduce
bias by comparing treatments only for patients in similar stages.
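Comparing treatments only within stages can be sketched as follows. The stages, the treatments labeled A and B, and the records are hypothetical; a stage in which only one treatment was used yields no comparison at all, instead of contributing to a misleading overall rate.

```python
from collections import defaultdict

# Hypothetical records: (prognostic stage, treatment, died: 1 = yes, 0 = no)
records = [
    ("I", "A", 0), ("I", "A", 0), ("I", "B", 0), ("I", "B", 1),
    ("II", "A", 1), ("II", "A", 0), ("II", "B", 1), ("II", "B", 1),
    ("III", "B", 1), ("III", "B", 1),
]

def compare_within_stages(records):
    """Return {stage: (mortality with A, mortality with B)}, or None for a
    stage in which one of the treatments was never used."""
    counts = defaultdict(lambda: [0, 0])  # (stage, treatment) -> [patients, deaths]
    for stage, treatment, died in records:
        counts[(stage, treatment)][0] += 1
        counts[(stage, treatment)][1] += died
    results = {}
    for stage in sorted({s for s, _, _ in records}):
        a, b = counts.get((stage, "A")), counts.get((stage, "B"))
        if a is None or b is None:
            results[stage] = None  # no comparison possible in this stage
        else:
            results[stage] = (a[1] / a[0], b[1] / b[0])
    return results

print(compare_within_stages(records))
```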
We can note subsequent changes, and the reasons for changes in treatment, in randomized
or non-randomized trials. Intention-to-treat analysis is splendid if you
want to ignore everything that goes on after treatment is started. But for most patients
and most doctors, what happens "after the treatment is started" is very
important. And when treatments are changed or stopped or not complied with, there
are reasons for that. Those reasons may not have been included in the randomization
process. It gave you perhaps equality of groups before treatment was started, but it does
not at all explain what has happened afterwards. The reasons for changes in treatment may
indicate important but otherwise neglected outcome events or prognostic alterations. For
example, I am the person who "discovered" (if that is the right word)
co-morbidity. How did I discover co-morbidity? It has always been there, of course. But I
discovered it because I was looking at the charts of patients who should have had surgery
and who didn't! Why didn't they have surgery? It then became obvious that the
inoperability was because of the co-morbidity. If you look at the reasons why things are
done, you will discover important variables that are being ignored.
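The contrast between an intention-to-treat summary and the reasons behind changes in treatment can be sketched with hypothetical trial records. In this Python sketch the arms, reasons, and outcomes are invented: the intention-to-treat event rates are identical in the two arms, while a tally of the recorded reasons for stopping points to an outcome (new heart failure) that the summary does not show.

```python
from collections import Counter

# Hypothetical trial records: assigned arm, whether the patient stopped the
# assigned treatment, the recorded reason, and whether the outcome event occurred.
records = [
    {"arm": "drug", "stopped": False, "reason": None, "event": False},
    {"arm": "drug", "stopped": True, "reason": "fatigue", "event": True},
    {"arm": "drug", "stopped": True, "reason": "new heart failure", "event": True},
    {"arm": "drug", "stopped": False, "reason": None, "event": False},
    {"arm": "placebo", "stopped": False, "reason": None, "event": True},
    {"arm": "placebo", "stopped": True, "reason": "new heart failure", "event": True},
    {"arm": "placebo", "stopped": False, "reason": None, "event": False},
    {"arm": "placebo", "stopped": False, "reason": None, "event": False},
]

def event_rate(rows):
    return sum(r["event"] for r in rows) / len(rows)

# Intention-to-treat: everyone analyzed by assigned arm, ignoring later changes.
itt = {arm: event_rate([r for r in records if r["arm"] == arm])
       for arm in ("drug", "placebo")}

# Reasons for stopping: information that the intention-to-treat summary discards.
reasons = Counter(r["reason"] for r in records if r["stopped"])

print("intention-to-treat event rates:", itt)
print("reasons for stopping:", reasons.most_common())
```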
We need to develop better analytic methods to incorporate the entire clinical course and
outcomes, not just the state at randomization. And we need to develop better statistical
methods, such as what I have called conjunctive consolidation, for doing things like
multivariable analysis, instead of using a constant linear model. The linear analyses are
wonderful if you want to make a mathematical model feel good, but not if you really
want to understand what is going on in the data.