Editors:
U. Abel,
A. Koch

Published by
symposion logo

Nonrandomized Comparative Clinical Studies

Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997

The mythology of randomization

U. Abel, A. Koch

Abstract

In biostatistics and medicine one sometimes encounters an extremely negative view or even a categorical rejection of nonrandomized studies. This attitude may be comprehensible from a historical, pragmatic, or educational viewpoint but it is not well-founded on epistemological grounds. In addition, it is potentially harmful.

Usually, randomization is credited with advantages that it does not possess or confer, and the criticism of nonrandomized studies is based on a catalogue of admittedly unhappy examples from the medical literature. Neither these examples nor the existing empirical investigations into the differences between published randomized and nonrandomized studies tell us whether well-designed and carefully analyzed nonrandomized studies would have yielded results that are distinctly or even qualitatively different from those of the randomized trials.

Dogmatic beliefs about randomization

In this contribution we are going to criticize widespread opinions about randomization. We do not mean to criticize randomization itself. Of course, randomization is a reasonable methodological principle, and, to express it in a slogan-like manner, you should randomize whenever you can.

Since its introduction into therapeutic research in 1948, randomization has been a great success story. According to Horwitz [26] the randomized trial may rightly be called a scientific paradigm today. Olkin [34] estimated that currently some 9000 randomized clinical trials are performed every year, although it remains unclear whether this was meant to be the annual incidence of new studies or the prevalence of the studies being conducted. Many authors believe that randomization ultimately owes its attractiveness to the fact that it introduces the method of scientific experimentation into therapeutic research [11, 17, 42, 53].

Nevertheless, there has been a long-standing and often bitter controversy about randomized trials that continues to this day. Virtually all imaginable aspects are at issue: the feasibility of randomized studies, their adequacy, their ethical justification, and their conclusiveness. Roughly speaking, the front lines of the dispute are as follows: critics of randomization are mainly found in unconventional therapies and in surgery, whereas its supporters - or rather emphatic critics of nonrandomized studies - are particularly frequent in internal medicine, among biostatisticians, and among the regulatory agencies.

It is no exaggeration to say that the controversy about the necessity and the value of randomization sometimes resembles a religious dispute. In 1978 Rimm and Bortin [41] remarked that the randomized trial is not only a ritual but has all the elements of a religion - they called it TRIALISM - with gods, devils and 10 commandments, the first of which is: Thou shall randomize.

One should admit that a lack of sober and rational consideration of randomization is common among biostatisticians as well [18]. This shows up in two phenomena. On the one hand, some scientists seem to regard randomization as a sort of quality mark without which the products of clinical research, that is to say, the study results, are valueless from the start. Thus, in 1981 Sackett gave the following advice to doctors about how to read clinical journals: "....discard at once all articles on therapy that are not about randomized trials" [14]. An even more extreme view was held by Cowan who wrote "With some exceptions participation of any group of patients in a nonrandomized trial is wholly unjustified and unethical since nothing can be learned from it" (quoted from Royall, [43]). And finally, Sir Richard Doll stated in 1994: "Biases...make outcome research ...as inadequate a means for assessing the value of a specific form of treatment as the outdated technique of comparing the results in a current series of patients with those obtained on other patients in the past" [15].

On the other hand, randomization is often credited with a number of almost mystical properties.

It is said

  • to be the basis for statistical significance tests,
  • to be the basis for causal inferences on treatment effects,
  • to be an opportunity for blinding,
  • to lead to balanced groups (in the German literature the even more stringent term "structural equality" is widely used)

[e.g., 3, 5, 20, 29, 46, 47, 48]. All these properties are believed to be theoretical advantages of randomization, and they seem to tell us why a nonrandomized study is of little value when compared to a randomized one. There is only one problem with these claims. Except for the last one, which is vague to the point of being meaningless, they are incorrect. Since they are stubbornly repeated in the literature on clinical studies, we will briefly discuss them one by one.

Without additional assumptions, such as identical observation in both groups, randomization does not "guarantee the validity of statistical tests" or whatever similar statement one can read in the literature. This is simply because if there are observational differences between the groups, the distributions of the outcome variable will be different even in the absence of any treatment effect.
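The point about observational differences can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the event rate, the arm size, and the "detection probabilities" `detect_a` and `detect_b` (the chance that a true event is actually recorded in each arm) are hypothetical values chosen for illustration, and the pooled two-proportion z-test stands in for whatever test a real trial would use. Both arms have the same true event rate, so any excess of significant results comes from unequal observation alone.

```python
import math
import random


def two_proportion_p_value(events_1, n_1, events_2, n_2):
    """Two-sided p-value of the pooled z-test for equal proportions."""
    pooled = (events_1 + events_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if std_err == 0:
        return 1.0
    z = (events_1 / n_1 - events_2 / n_2) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(detect_a, detect_b, n_per_arm=200, true_rate=0.3,
                   n_trials=2000, alpha=0.05, seed=1):
    """Share of simulated trials with p < alpha although the treatments are equivalent."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both arms share the same true event rate (no treatment effect);
        # an event is recorded only if it occurs AND is detected.
        recorded_a = sum(rng.random() < true_rate * detect_a for _ in range(n_per_arm))
        recorded_b = sum(rng.random() < true_rate * detect_b for _ in range(n_per_arm))
        if two_proportion_p_value(recorded_a, n_per_arm, recorded_b, n_per_arm) < alpha:
            rejections += 1
    return rejections / n_trials


# Hypothetical scenarios: identical observation vs. less complete follow-up in arm B.
equal_observation = rejection_rate(detect_a=1.0, detect_b=1.0)
unequal_observation = rejection_rate(detect_a=1.0, detect_b=0.7)
print(equal_observation, unequal_observation)
```

With identical observation the rejection rate should stay near the nominal level, whereas with unequal observation it should rise well above it, even though allocation is random in both scenarios.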

Randomization is also not necessary for testing. This follows not only from the fact that hypothesis testing is a mathematical theory that does not rely on any physical property or activity. It also becomes obvious when one looks at real-world research which does not use randomization, such as epidemiology, or the evaluation of prognostic factors, or diagnostic tests. As Feinstein [17] put it: "If randomization was really required for stochastic decisions on statistical significance, a massive amount of scientific literature would have to be expunged of all the p-values". In reality, p-values are conditional probabilities whose condition is a theoretical assumption about probability distributions that cannot be guaranteed in practice. In both randomized and nonrandomized studies a significant p-value has essentially the same interpretation, namely, it indicates that the results are not due to chance alone [23].

As for the second claim, to say that randomization is the basis for causal inferences is both vague and, whatever the interpretation may be, erroneous. The claim lacks a precise statement of what causal inference means, although precise statements do exist [25]. It also reflects an overestimation of the implications and meaning of randomization, which Royall [42] has mockingly called "the closurization principle": "You randomize and then you close your eyes". In truth, randomization is not a panacea and is by no means sufficient for causal inferences. Apart from chance imbalance, randomized trials can also suffer from many sorts of severe systematic bias and mistakes, and there are many examples of incredibly poor randomized trials in the literature. Doctors themselves are well aware of the limits of randomized studies and are not necessarily convinced of a treatment effect even if this effect is shown in a series of successive randomized studies.

Take, for example, magnesium infusion in suspected acute myocardial infarction. A meta-analysis of seven randomized studies involving more than 1300 patients indicated a highly significant advantage for magnesium compared to no treatment in terms of 1-month mortality. In fact, the overall death rate was only half that in the control group. Nevertheless the value of magnesium remained controversial, so that further studies were implemented. The positive finding was then overturned by the mega-trial ISIS-4 including about 58,000 patients, which gave a nonsignificantly adverse result for magnesium [38]. Incidentally, if one assumes that ISIS-4 finally hit the truth regarding magnesium in acute MI (again, this is still controversial!), then this example is reason for serious concern because one may wonder how often such an enormous trial can be done to correct false-positive results from a series of medium-sized randomized studies.

That randomization is not necessary for causal inferences is quite clear. It follows not only from the history of epidemiology but also from early medical breakthroughs like penicillin or insulin, which were introduced without randomized studies and for which there has hardly been any doubt about causal effects.

The third claim concerning blinding is somewhat amusing. It exploits the fact that the expressions "make possible", "are the basis for", or "are an opportunity" are ambiguous, both in English and in German. They all leave open whether they designate a necessary condition, a sufficient condition or both. Clearly randomization is not sufficient for blinding. And although it is a reasonable requirement for blinding, it is not a necessary one in the logical sense. Even if it were necessary, this could hardly be sold as an advantage of randomization, just as it is not an advantage of beer to be a necessary requirement of the Oktoberfest. Rather vice versa: it is an advantage of double-blind studies that they are randomized.

The last point, which is about balance, is the most important one. Obviously randomization has something to do with the concept of comparability of treatment groups. Although most biostatisticians constantly talk about comparability in clinical trials, the vast majority are unable to give a really precise definition in statistical terms of what comparable means. There seems to be no definition in the literature, either. However, one can derive one from the excellent paper by Holland published in JASA in 1986 [25], which is based on earlier work by Rubin [44]. Holland precisely explains what a causal treatment effect is and under what conditions a comparison of two groups yields an unbiased estimate of the causal effect. By analyzing Holland’s argument, it becomes clear what a precise and reasonable definition of comparability should look like.

It is simple but not obvious. Comparability of two groups is a context-dependent term. The context consists of the two treatments to be compared and the outcome variable of interest. Within a given context, one can define two groups as comparable if the distribution of the outcome variable conditional on the choice of treatment T1 or T2 does not depend on the treatment group. This definition does not make use of words like "factors" or "structure" or something else that one does not quite understand. It is nice in another respect: Following Holland’s ideas one can show that if groups are comparable in this sense then one obtains an unbiased estimate of the causal treatment effect of T1 relative to T2. This is a property that the concept of comparability should indeed have.
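The verbal definition can be written down formally. The notation below is an assumption introduced here for illustration, in the spirit of the Rubin/Holland framework, and is not taken from the original text: $Y_t$ denotes the outcome a patient would show under treatment $t \in \{T_1, T_2\}$, and $G \in \{1, 2\}$ denotes the group to which the patient belongs, with group 1 receiving $T_1$ and group 2 receiving $T_2$.

```latex
% Comparability: the outcome distribution under each treatment
% does not depend on the group.
\[
  P(Y_t \le y \mid G = 1) \;=\; P(Y_t \le y \mid G = 2)
  \qquad \text{for all } y \text{ and } t \in \{T_1, T_2\}.
\]
% Consequence: the observed group difference is unbiased for the
% average causal effect of T_1 relative to T_2.
\[
  E[\,Y_{T_1} \mid G = 1\,] - E[\,Y_{T_2} \mid G = 2\,]
  \;=\; E[\,Y_{T_1}\,] - E[\,Y_{T_2}\,].
\]
```

The second line follows from the first because, under comparability, conditioning on the group changes neither $E[Y_{T_1}]$ nor $E[Y_{T_2}]$.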

Intuitively it is obvious that comparability as we have defined it can be violated if, apart from the therapy, the groups have further differences. These aspects have been analyzed and illustrated in a large number of publications [e.g., 1, 6, 16, 18, 21, 22, 27, 28, 33, 40, 45, 46, 51, 54]. Table 1 gives a fairly comprehensive list of relevant causes for treatment group differences in an outcome variable. The most important ones can be classified under the catchwords: "differences in structure, in observation, and in experimental environment".

The achievements of randomization

Now, let us see what randomization does achieve.

It guarantees a control of imbalance in the sense that for all patient variables measurable at the time of randomization, the probability distributions are the same in all treatment arms of the study.

This implies that randomization enables one to make probability statements on differences between the groups regarding these variables.

Randomization by itself does not guarantee balance with respect to any other aspect listed in Table 1. In particular, it does not guarantee comparability in the strict sense defined above.
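The difference between "equal probability distributions" and "balance in a given trial" can be made concrete with a short simulation. This is a minimal sketch under assumptions of our own choosing: a hypothetical trial of 40 patients, a single baseline variable (age, drawn from a normal distribution with mean 60 and standard deviation 10), and an arbitrary threshold of 5 years for calling a difference "large".

```python
import random
import statistics


def baseline_differences(n_patients=40, n_randomizations=5000, seed=7):
    """Difference in mean age between two arms over repeated random allocations."""
    rng = random.Random(seed)
    # One fixed, hypothetical patient population with a baseline variable (age).
    ages = [rng.gauss(60, 10) for _ in range(n_patients)]
    half = n_patients // 2
    diffs = []
    for _ in range(n_randomizations):
        shuffled = ages[:]
        rng.shuffle(shuffled)  # random allocation into two equally sized arms
        diffs.append(statistics.fmean(shuffled[:half]) - statistics.fmean(shuffled[half:]))
    return diffs


diffs = baseline_differences()
mean_diff = statistics.fmean(diffs)                           # averages out across allocations
share_large = sum(abs(d) > 5 for d in diffs) / len(diffs)     # imbalance in single allocations
print(round(mean_diff, 3), round(share_large, 3))
```

Across all allocations the mean difference is close to zero, which is the sense in which the probability distributions are the same in both arms. A noticeable share of individual allocations nevertheless shows an age difference of more than 5 years, and randomization lets one quantify that share, not rule it out.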

Table 1: Causes for observing an "effect" (a difference between two treatment groups regarding an outcome variable) in a comparative clinical study
  1. treatment effects (differences in efficacy)
  2. random errors
  3. differences in basic patient characteristics
    • definition of disease
    • diagnostic procedures (e.g. stage migration, zero-time shift)
    • patient selection by the doctor (e.g., exclusion of cases with poor prognosis)
    • selection due to the patients (preferences for certain therapies, compliance)
    • origin and referral of patients to the institution
  4. differences in quality of treatment and doctors’ commitment
  5. differences in the patients’ motivation
  6. differences in general patient care and experimental environment
    • accompanying treatment and ancillary care
    • life-style
    • background and environment (family, job, etc.)
  7. differences in observation
    • definition of outcome
    • measurement of outcome
    • quality of data collection and follow-up

One should emphasize, however, that balance with respect to prognostic variables is indeed a point of eminent importance. Lack of balance in these variables is usually the main objection raised against nonrandomized studies, for in these studies an adjustment is not possible for unknown prognostic variables. This is particularly disturbing if the treatment effects to be investigated are small [37]. Several investigations have shown that these unknown variables may be of considerable importance [e.g., 49, 21].

But even an adjustment for known variables may fail if the ways to measure these variables change. In oncology, for example, it is often overlooked that if stage migration [19] has occurred in the past then matching with respect to stage does not only fail to reduce imbalance but may even be counterproductive because it results in comparisons of nominally matched pairs that in reality are not matched at all.
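A small worked example shows why stage-matched comparisons can mislead after stage migration. The survival times below are hypothetical numbers invented for illustration. The same seven patients are staged twice: once with the old diagnostics and once with new diagnostics that reclassify the stage I patient with the poorest prognosis as stage II.

```python
from statistics import fmean

# Hypothetical survival times in months for the SAME seven patients.
old_stage_1 = [50, 40, 30, 20]
old_stage_2 = [10, 8, 6]

# New diagnostics detect more advanced disease in the patient surviving 20 months,
# who therefore migrates from stage I to stage II. No survival time changes.
new_stage_1 = [50, 40, 30]
new_stage_2 = [20, 10, 8, 6]

gain_stage_1 = fmean(new_stage_1) - fmean(old_stage_1)   # 40 - 35
gain_stage_2 = fmean(new_stage_2) - fmean(old_stage_2)   # 11 - 8
overall_old = fmean(old_stage_1 + old_stage_2)
overall_new = fmean(new_stage_1 + new_stage_2)

print(gain_stage_1, gain_stage_2, overall_new - overall_old)
```

Mean survival rises within both stages although no patient lives a day longer and overall survival is unchanged. A historical comparison matched on nominal stage would therefore credit the "new" series with an improvement that is purely an artifact of reclassification.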

It is not surprising, therefore, that some authors come to a very negative and pessimistic judgment on nonrandomized studies. Thus Sacks et al. [46] wrote: "...biases in patient selection may irretrievably weight the outcome of the HCT" (HCT=historically controlled trial). And "Can the accuracy of HCT’s be increased? We fear there is little room for improvement in this area."

However, very often, randomized studies are credited with nice properties that are not actively induced by the act of random allocation itself. As it were, they are "inherited" properties, following from the fact that randomized studies are part of a larger category of high-quality studies, namely prospective parallel comparisons with a written protocol, specifying important aspects of patient enrollment, treatment, observation, analysis, and other procedures. One can also put it like this: The advantages of randomized studies are not identical to the advantages conferred by randomization.

Whatever the reasons, randomized trials have a good reputation, they are well accepted by the scientific community and have a relatively high impact in medicine. Therefore, needlessly refraining from randomization is absolutely unwise if one wants to convince other scientists [5].

Note, however, that the consequences of not randomizing depend on the situation. While in the planning phase of a study, "human" aspects like the impact of the results have to be taken into account, the only aspect that matters to the reader of the finished and published study is an epistemological one, namely, the potential bias due to imbalance.

Let us summarize the arguments developed so far. Although randomization clearly adds to the conclusiveness and credibility of studies, especially when it comes to small treatment effects, we have shown that there is no fundamental difference in the conclusiveness of randomized and nonrandomized studies. This is simply because there is only a loose, vague link between the balance induced by randomization and the comparability of the groups.

Therefore, a rejection of nonrandomized studies is unjustified on theoretical grounds! This is important to note because there are situations in which randomizing is impossible or inappropriate [4], or in which it is possible but no randomized study exists (an example of this is high-dose chemotherapy of many carcinomas), or in which there are both randomized and nonrandomized studies of the same question.

In surgery, randomization seems to be especially problematic and relatively infrequent [30, 31, 35, 45, 50, 52]. This shows that there is a definite demand for nonrandomized treatment evaluations. So when biostatisticians are reluctant to deal with nonrandomized studies and neglect their further methodological development, they are not only somewhat unrealistic but also partly responsible if in these situations studies are worse than they could be.

Randomized vs nonrandomized studies: empirical comparisons

So far, we have analyzed theoretical differences between randomized and nonrandomized studies and have tried to unveil some unfounded dogmas about the value of randomization. Let us now see whether there is good empirical evidence against nonrandomized studies.

The published material on systematic bias in nonrandomized studies is of the following three types:

1. Horror stories.
The term "horror story" [53] shall designate anecdotal examples of a therapy which, at some time, was deemed efficacious, based on observational studies, but which later turned out much less efficacious, valueless, or even detrimental in a randomized study.

2. Systematic investigations into the literature of how observed treatment effects depend on the study design.
In particular, this includes the question of whether in historically controlled studies the observed treatment effects are typically larger than in randomized trials.

3. Investigations of the "history effect" or "chronology bias" [22].
The question asked here is: To what extent are study results influenced by secular changes in possibly unknown factors?

Horror stories are extremely popular not only with doctors but even more so with biostatisticians. The reason for this is above all an educational one, because horror stories are wonderfully suited for making clear the dangers that loom when one strays from the virtuous path of randomization. They are an instrument for threatening unruly clinicians who refuse to randomize.

Table 2: Some "Horror Stories"
  1. Steroid therapy in severe viral hepatitis
  2. Gastric freezing for peptic ulcer
  3. Portocaval shunt procedures for hepatic cirrhosis with esophageal varices
  4. Diethylstilbestrol for habitual abortion
  5. Internal mammary artery ligation in the treatment of angina pectoris
  6. Hyperalimentation regimens for small cell lung cancer
  7. Clofibrate in coronary heart disease
  8. Thoracic radiotherapy for locally advanced, unresectable non-small cell lung cancer
  9. Extracranial-intracranial arterial bypass for preventing ischemic stroke
  10. Vitamin C in the treatment of advanced cancer
  11. 5-FU adjuvant therapy for colon cancer
  12. Estrogen therapy of prostate cancer
  13. Chemotherapy of advanced breast cancer (outcome: survival)
  14. Active-specific immunotherapy of cancer

Table 2 gives a catalogue of more or less well-known horror stories. Some of the therapies on the list are still being discussed or are even in widespread use today.

For illustration let us look at one rather dramatic example. In the mid-1970s Cameron and Pauling [7] published a nonrandomized study of high-dose vitamin C in terminally ill cancer patients compared to no further treatment. The study included 100 patients in the vitamin-C group and 1000 controls, 10 per case from the same hospital, matched for age, sex, tumor site, and histology. It was found that the vitamin-C group had a highly significant survival advantage over the controls, with a mean survival time that was more than 3 times longer. Some years later, a Mayo Clinic group did two randomized double-blind trials using the same inclusion criteria as Cameron and Pauling [13, 32]. Both studies gave a perfect null result, and in the second study, survival of the vitamin-C group was even nonsignificantly shorter than that of the control group.

In the light of such an example, any claim as to treatment efficacy based on a nonrandomized trial must necessarily appear as unfounded and almost a violation of critical science.

However, this may not be the whole truth. No doubt, horror stories are impressive and easily remembered. But there are at least three reasons why they do not tell us much about the importance of randomization for the conclusiveness of studies.

Firstly, one may safely assume that the published horror stories are the result of a biased selection. If, on the contrary, a randomized trial is done that confirms the positive result of a previous observational study, then this is not very exciting from a methodological point of view and will hardly become known among biostatisticians.

The second point is publication bias for the single studies constituting the basis of the horror stories. Poorly controlled studies with a null result are more easily withheld from publication than randomized studies with a null result. This contributes to a tendency that published nonrandomized studies more often show positive results than the published randomized studies.
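The mechanism can be made explicit with a back-of-the-envelope calculation. This is a deliberately simplified sketch: it assumes that no treatment has any true effect, that every "positive" study is a false positive occurring at rate `alpha`, that all positive studies are published, and that null results are published with probability `null_publication_prob`. The function name and both publication probabilities are hypothetical and chosen only for illustration.

```python
def positive_share_among_published(alpha, null_publication_prob):
    """Share of published studies that are 'positive' when no treatment works.

    Assumes positives (false positives at rate alpha) are always published and
    null results are published with probability null_publication_prob.
    """
    published_positive = alpha
    published_null = (1 - alpha) * null_publication_prob
    return published_positive / (published_positive + published_null)


# Hypothetical publication probabilities for null results.
randomized_share = positive_share_among_published(alpha=0.05, null_publication_prob=0.8)
nonrandomized_share = positive_share_among_published(alpha=0.05, null_publication_prob=0.1)

print(round(randomized_share, 3), round(nonrandomized_share, 3))
```

Under these assumptions about 6% of the published randomized studies but about 34% of the published nonrandomized studies are positive, although the underlying rate of false positives is identical. The gap is produced by selective publication alone, not by any bias in the studies themselves.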

The third objection is that in general the published horror stories are comparisons of randomized studies with historically controlled studies, many of them of poor quality by today’s standards. Often the fact that randomization is needlessly refrained from is in itself a strong indicator for poor science and for extensive methodological defects in the studies. So horror stories do not tell us anything about the role and impact of bias in well-designed nonrandomized studies, that is, in studies which are planned and conducted with the same methodological care as a randomized trial.

One must emphasize that it is entirely possible, and has in fact occurred, that results obtained in nonrandomized studies are convincing.

To give just one example: in 1991 Cassileth et al. [8] published a prospective parallel-group matched pair study of unconventional treatment compared to conventional treatment of advanced cancer. The study gave a perfect null result which is probably not much less convincing than a null result from a randomized study of the same size.

In this context, an often overlooked asymmetry of conclusiveness is important. There are several reasons why null results from a large study, whether randomized or not, are more convincing than positive results. This also has an implication for the motivation to initiate studies. Since null results undoubtedly contain information, it can be justified to do a nonrandomized study (rather than no study at all), even if it is clear from the start that only a null result will be accepted by the scientific community without further investigations.

Note also that there are some new proposals for designing and evaluating nonrandomized studies in such a way that the results can be convincing, whether positive or negative. One powerful instrument is the evaluation of the total impact of a new therapy in an institution, just as in the paired availability design [2], combined, in addition, with an analysis of the immediate and sudden intervention effect caused by the introduction of the new treatment [24].

In the light of such developments, the sweeping negative opinion about nonrandomized studies, which is probably determined by a simplistic idea of historical two-group comparisons, is too pessimistic. It is perhaps understandable given the impression from horror stories, but it is not really justified. But is this negative view supported by systematic investigations? The association between the study design and the observed treatment effect has been examined in at least four papers.

Chalmers et al. [9] analyzed 145 published parallel-group studies of treatment of acute myocardial infarction. Short-term mortality was chosen as the variable of interest. The percentages of positive results, that is, of studies with a significant advantage of the therapy group over the control group, were as follows:

  • 8.8% for studies with blinded randomization,
  • 24% for studies with unblinded randomization, and
  • 58.1% in case of nonrandomized studies.

According to Chalmers et al., this result indicates that in all studies where the treatment allocation is not concealed from the clinician, a distinct selection bias must be reckoned with.

Colditz, Miller, and Mosteller analyzed published reports of 113 studies of medical therapies [12], and - in another paper - 221 studies of surgical therapies [31]. In these studies, an innovation was compared to a standard therapy. It was found that the standardized treatment effect did depend on the study design. As one may expect, the largest mean effect was found for studies with external controls. Surprisingly, however, the mean effect found for observational studies using retrospective record reviews was smaller than that found for randomized studies. Finally, Ottenbacher [36] analyzed 30 randomized and 30 nonrandomized parallel comparisons of a therapy group with a no-treatment control group that had been published in 1989 or earlier in JAMA or the New England Journal. He did not find any influence of randomization on the mean observed treatment effect.

The results of these investigations are difficult to interpret, not only because they are somewhat contradictory. At best, they can give a vague idea of the order of magnitude of bias in nonrandomized studies. However, the investigations themselves are biased in three respects.

Firstly, the therapies were not identical in the randomized and nonrandomized studies. One cannot exclude that the true treatment effect depends on the type of therapy, so that the differences in observed effects possibly just reflect the reality.

Secondly, true effects in randomized trials may be smaller because randomization is done only if, a priori, clinicians are not convinced that any one of the treatments to be compared is superior to the others. Since their prior judgement, based on experience, is often correct, positive results in randomized studies are rare [10].

And finally, publication bias probably contributes to the differences between randomized and nonrandomized studies.

Chronology bias was investigated by Pocock [39]. Pocock identified 19 instances where co-operative groups used the same entry criteria for two successive randomized cancer chemotherapy trials which both included the same control treatment. When comparing the identical treatment arms in these pairs of trials Pocock found that the differences in annual death rates ranged from -46% to +24%. Four comparisons yielded differences that were significant at the 2% level; the smallest p-value was 0.0001 [37]. One should note, however, that these comparisons were not adjusted for explanatory information or for secular trends.

In summary, the existing investigations do not give any information about the value of carefully designed and conducted nonrandomized studies. In particular, they do not tell us to what extent, if any, possible imbalance in these studies might influence the judgement on the therapies.

One way to address this question systematically would be to embed synthetic nonrandomized studies in randomized trials and to compare the results obtained with the different designs.

Synthetic parallel-group studies, for example, can be carried out in multicenter trials by comparing the results of treatment groups obtained in different institutions. If apparent imbalance occurs then of course it should be adjusted for. Likewise synthetic historically controlled studies can be implemented within long-term randomized trials by partitioning the period of patient entry into different intervals and comparing the results obtained in these intervals.
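The idea of a synthetic parallel-group study can be sketched in a few lines. Everything in this example is hypothetical: the two centers, the arms "A" and "B", the outcome values, and the helper function `effect` are invented for illustration, and no adjustment for imbalance is attempted. The randomized estimate uses both arms from all centers, whereas the synthetic nonrandomized estimate compares arm A from one center with arm B from another.

```python
from statistics import fmean

# Hypothetical multicenter randomized trial: one record per patient.
records = [
    {"center": 1, "arm": "A", "outcome": 12.0},
    {"center": 1, "arm": "A", "outcome": 14.0},
    {"center": 1, "arm": "B", "outcome": 10.0},
    {"center": 1, "arm": "B", "outcome": 12.0},
    {"center": 2, "arm": "A", "outcome": 9.0},
    {"center": 2, "arm": "A", "outcome": 11.0},
    {"center": 2, "arm": "B", "outcome": 7.0},
    {"center": 2, "arm": "B", "outcome": 9.0},
]


def effect(records, arm_a_centers, arm_b_centers):
    """Mean outcome of arm A (from arm_a_centers) minus arm B (from arm_b_centers)."""
    a = [r["outcome"] for r in records if r["arm"] == "A" and r["center"] in arm_a_centers]
    b = [r["outcome"] for r in records if r["arm"] == "B" and r["center"] in arm_b_centers]
    return fmean(a) - fmean(b)


all_centers = {1, 2}
randomized_estimate = effect(records, all_centers, all_centers)   # within-trial comparison
synthetic_a1_b2 = effect(records, {1}, {2})                       # A from center 1 vs B from center 2
synthetic_a2_b1 = effect(records, {2}, {1})                       # A from center 2 vs B from center 1

print(randomized_estimate, synthetic_a1_b2, synthetic_a2_b1)
```

In this toy data set the randomized estimate is 2.0, while the two synthetic comparisons give 5.0 and -1.0 because the centers differ in their general outcome level. In a real investigation one would then examine how much of this discrepancy remains after adjusting for the apparent imbalance between centers.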

References

[1]
Abel U. Chemotherapie fortgeschrittener Karzinome. 2nd ed. Stuttgart: Hippokrates; 1995
[2]
Baker S, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Stat Med 1994; 13: 2269-2278
[3]
Biefang S, Köpcke W, Schreiber MA. In: Koller S, Reichertz PL, Überla K, Eds. Manual für die Durchführung von Therapiestudien. Medizinische Informatik und Statistik. Vol. 13. Berlin: Springer-Verlag; 1979
[4]
Black N. Why we need observational studies to evaluate the effectiveness of health care. Br Med J 1996; 312: 1215-1218
[5]
Byar DP, Simon RM, Friedewald WT, Schlesselman JJ, DeMets DL, Ellenberg JH, Gail MH, Ware JH. Randomized clinical trials. Perspectives on some recent ideas. N Engl J Med 1976; 295: 74-80
[6]
Byar DP. Why data bases should not replace randomized clinical trials. Biometrics 1980; 36: 337-342
[7]
Cameron E, Pauling L. Supplemental ascorbate in the supportive treatment of cancer: Prolongation of survival times in terminal human cancer. Proc Nat Acad Sci 1976; 73: 3685-3689
[8]
Cassileth BR, Lusk EJ, Guerry DP, Blake AD, Walsh WP, Kascius L, Schultz DJ. Survival and quality of life among patients receiving unproven as compared with conventional cancer therapy. N Engl J Med 1991; 324: 1180-1185
[9]
Chalmers T, Celano P, Sacks HS, Smith H. Bias in treatment assignment in controlled clinical trials. N Engl J Med 1983; 309: 1358-1361
[10]
Ciampi A, Till JE. Null results in clinical trials: The need for a decision-theoretic approach. Br J Cancer 1980; 41: 618-629
[11]
Cochrane AL. Effectiveness and efficiency. Abingdon: The Nuffield Provincial Hospital Trust; 1971
[12]
Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparison of therapy. I: Medical. Statist Med 1989; 8: 441-454
[13]
Creagan ET, Moertel CG, O’Fallon JR, Schutt AJ, O’Connell MJ, Rubin J, Frytak S. Failure of high-dose vitamin C (ascorbic acid) therapy to benefit patients with advanced cancer. N Engl J Med 1979; 301: 687-690
[14]
Department of Clinical Epidemiology and Biostatistics, McMaster Univ. Health Science Centre. How to read clinical journals. V: To distinguish useful from useless or even harmful therapy. Can Med Assoc J 1981; 124: 1156-1162
[15]
Doll R. Summation of the conference. N Y Acad Sci 1994; 703: 310-313
[16]
Dupont WD. Randomized vs. historical clinical trials. Am J Epidemiol 1985; 122: 940-946
[17]
Feinstein AR. An additional basic science for clinical medicine: II. The challenges of comparison and measurement. Ann Intern Med 1983; 99: 705-712
[18]
Feinstein AR. Clinical biostatistics. XXIV. The role of randomization in sampling, testing, allocation, and credulous idolatry (conclusion). Clin Pharmacol Therapeutics 1983; 14: 1035-1051
[19]
Feinstein AR, Sosin DM, Wells CK. The Will Rogers Phenomenon. Stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer. N Engl J Med 1985; 312: 1604-1608
[20]
Green SB. Patient heterogeneity and the need for randomized clinical trials. Controlled Clin Trials 1982; 3: 189-198
[21]
Green SB, Byar DP. Using observational data from registries to compare treatments: the fallacy of omnimetrics. Statist Med 1984; 3: 361-370
[22]
Haines SJ. Randomized clinical trials in the evaluation of surgical innovation. J Neurosurg 1979; 51: 5-11
[23]
Hlatky MA, Lee KL, Harrell FE, Califf RM, Pryor DB, Mark DB, Rosati RA. Tying clinical research to patient care by use of an observational database. Statist Med 1984; 3: 375-384
[24]
Heuer C, Abel U. The analysis of intervention effects using observational databases. Proceedings of a workshop on nonrandomized comparative studies. Heidelberg, April 10/11, 1997; in press
[25]
Holland PW. Statistics and causal inference. JASA 1986; 81: 945-970
[26]
Horwitz RI. The experimental paradigm and observational studies of cause-effect relationships in clinical medicine. J Chron Dis 1987; 40: 91-99
[27]
Johnson FN, Johnson S. Organisation of clinical trials: The pretrial period. In: Johnson FN, Johnson S. Eds. Clinical Trials. Oxford: Blackwell Scientific Publ.; 1977; 36-82
[28]
Laupacis A, Rorabeck CH, Bourne RB, Feeny D, Tugwell P, Sim DA. Randomized trials in orthopaedics: why, how, and when? J Bone Joint Surg 1989; 71-A: 535-543
[29]
Lorenz W, Ohmann C, Immich H, Schreiber HL, Scheibe O, Herfarth C, Feifel G, Deutsch E, Beger HG. Patientenzuteilung bei kontrollierten klinischen Studien. Chirurg 1982; 53: 514-519
[30]
Love JW. Drugs and operations: some important differences. JAMA 1975; 232: 37-38
[31]
Miller JN, Colditz GA, Mosteller F. How study design affects outcomes in comparisons of therapy. II: Surgical. Statist Med 1989; 8: 455-466
[32]
Moertel CG, Fleming TR, Creagan ET, Rubin J, O’Connell MJ, Ames MM. High-dose vitamin C versus placebo in the treatment of patients with advanced cancer who have had no prior chemotherapy. N Engl J Med 1985; 312: 137-141
[33]
Morgan PP. Clinical trials on trial: I. Must we always do a randomized trial? Can Med Assoc J 1981; 125: 1309-1311
[34]
Olkin I. Statistical and theoretical considerations in meta-analysis. J Clin Epidemiol 1995; 48: 133-146
[35]
Oettinger W, Beger HG. Commentary on: The rise and fall of the random controlled trial in surgery. Theor Surg 1989; 4: 170
[36]
Ottenbacher K. Impact of random assignment on study outcome: An empirical examination. Controlled Clin Trials 1992; 13: 50-61
[37]
Peto R. Clinical trial methodology. Biomedicine Special Issue 1978; 28: 24-36
[38]
Peto R, Collins R, Gray R. Large-scale randomized evidence: large simple trials and overviews of trials. J Clin Epidemiol 1995; 48: 23-40
[39]
Pocock SJ. Randomized clinical trials. Letter to the Editor. Br Med J 1977; i: 1661
[40]
Pocock SJ. Clinical trials. A practical approach. Chichester: Wiley; 1983
[41]
Rimm AA, Bortin M. TRIALISM: the belief in the Holy Trinity clinician - patient - biostatistician. Biomedicine Special Issue 1978; 28: 60-63
[42]
Royall RM. Current advances in sampling theory: implications for human observational studies. Am J Epidemiol 1976; 104: 463-474
[43]
Royall RM. Ethics and statistics in randomized clinical trials. Statist Science 1991; 6: 52-88
[44]
Rubin DB. Estimating causal effects of treatment in randomized and nonrandomized studies. J Educ Psychol 1974; 66: 688-701
[45]
Rudicel S, Esdaile J. The randomized clinical trial in orthopaedics: obligation or option? J Bone Joint Surg 1985; 67-A: 1284-1293
[46]
Sacks H, Chalmers TC, Smith H. Randomized versus historical controls for clinical trials. Am J Med 1982; 72: 233-240
[47]
Schäfer H. Methodik kontrollierter klinischer Therapiestudien. Schriftenreihe des Inst. f. Med. Biometrie u. Med. Informatik, Univ. Heidelberg. Vol. 19; 1993
[48]
Schumacher M, Schulgen G. Planung und Auswertung klinischer Studien. Schriftenreihe des Inst. f. Med. Biometrie und Med. Informatik, Univ. Freiburg, Vol. 1, Version 2.0; 1991
[49]
Simon R. Randomized clinical trials and research strategy. Cancer Treatm Rep 1982; 66: 1083-1087
[50]
Solomon MJ, McLeod RS. Clinical studies in surgical journals - have we improved? Dis Colon Rectum 1993; 36: 43-48
[51]
Tannock IF. Some problems related to the design and analysis of clinical trials. Int J Radiation Oncol Biol Phys 1992; 22: 881-885
[52]
van der Linden W. Pitfalls in randomized surgical trials. Surgery 1980; 87: 258-262
[53]
Weinstein MC. Allocation of subjects in medical experiments. N Engl J Med 1974; 291: 1278-1285
[54]
Zelen M. The role of statistics in the design and evaluation of trials in cancer medicine. In: Veronesi U, Bonadonna G. Eds. Clinical trials in cancer medicine. Orlando: Academic Press; 1985; 561-568