Editors:
U. Abel,
A. Koch

Published by symposion

Nonrandomized Comparative Clinical Studies -

Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997

Experimental Study versus Non-Experimental Study:
The Non-Experimental (Non-Randomized) Study as a Methodological Compromise

K. Dannehl
(Translated by Christoph Trautner)

Abstract

Most methodologists agree that the experimental study is the best method not only for physics, chemistry, and biology, but also for medical research. Often, however, one has to settle for non-experimental, i.e., less than optimal, designs for gaining knowledge. This is due to organisational and economic as well as legal and ethical limits, which we often meet when we conduct experiments on humans and which we cannot, may not, or do not want to go beyond.

This contribution, however, argues that the experimental (randomized) study is not merely the best method but the only real testing method available. If this claim is correct, the non-experimental (non-randomized) study has to be considered a methodological compromise in every particular case of testing a therapy. There is no alternative to the experimental (randomized) study - there are only shades of methodological compromise!

However, a compromise cannot be generalised. It can only be worked out, evaluated, and supported in a concrete, specific situation. For every concrete clinical study, one has to work out from scratch by which methodological compromise "on a small scale" it can (still) be carried out as an experimental study (experiment as "testing method"). If this proves not to be feasible, one has to work out - oriented toward the experimental study design (!) - by which types of methodological compromise "on a large scale" (performing a non-experimental study) the specific situation of conflict may be sufficiently alleviated (experiment as "guiding method").

1. Introduction

"The controlled (=randomized; K.D.) clinical trial is the via regia of making decisions about the effectiveness of drugs." [6, p.45] Indeed, methodologists largely agree that the experimental study is the best method not only for physics, chemistry, and biology, but also for medical research. However, non-experimental designs, i.e. second and third methods of choice, are also in use.

Personally, I tend to go even a step further. I take the view that the experimental (randomized) study is not the best method, but the only real testing method. If my claim is correct, this means that in the particular case of a clinical trial, the non-experimental (non-randomized) study has to be considered a methodological compromise, with the experimental study serving as the guiding method. I will try to give reasons for my view in the following text.

2. Internal validity of empirical studies

2.1 Experimental study

Is the experimental study in fact the only real testing method, i.e. not just the best among several methods? To discuss this question, we need to consider the following

Theorem: If one wants to test the effect of a possible factor of influence on a target parameter, one has to vary (manipulate) this "target" factor of influence and, at the same time, control all other - known or unknown (!) - factors of influence that are not the targets of the investigation but that might interfere with it (called confounders).

This theorem has two important characteristics:

  1. The statement of the theorem is obviously true.
  2. The statement of the theorem is invariant under any variation of the empirical scientific context.

This means, the statement of the theorem is true, independent of whether the question is physical, chemical, biological, psychological, sociological or medical. Any concrete study design that achieves just this, i.e. complies with the stated theorem, is an "experimental" study. This is the same in all empirical sciences, from physics to medical research.

The control of confounders is carried out using so-called techniques of control:

  • Holding constant
  • Running the compared groups at the same time (forming time-blocks of probands)
  • Forming blocks of characteristics of probands
  • Randomized assignment of probands to the target levels of the factor of influence (comparison treatments) within the blocks of probands (randomization) and
  • Blinding of probands and/or investigators.

In this process, the technique of control "randomized assignment" is always necessary! The other techniques of control, however, are only useful, not necessary. They are only useful if the respective confounding factors are present. Strictly speaking, they are not techniques of control, but techniques of precision. The only exception here is the blinding technique. Like "randomized assignment", it is necessary, not only useful. However, the blinding technique is only necessary if the target parameters are subjective variables, and/or if the relationships that are being investigated are phenomena with, at least in part, psychological causes.
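The combination of "forming blocks of characteristics" and "randomized assignment within the blocks" can be sketched in a few lines. This is only an illustration under stated assumptions: the proband list, the blocking characteristic (sex), the treatment labels "A" and "B", and the function name `blocked_randomization` are all hypothetical and not taken from any study discussed here.

```python
import random
from collections import defaultdict

def blocked_randomization(probands, block_of, treatments=("A", "B"), seed=None):
    """Assign probands to treatments at random, separately within each block.

    probands   -- list of proband records (hypothetical data)
    block_of   -- function mapping a proband to its block label, e.g. sex
    treatments -- the compared treatments (levels of the target factor)
    seed       -- optional seed so that the assignment can be reproduced
    """
    rng = random.Random(seed)

    # Step 1: form blocks on the basis of a proband characteristic.
    blocks = defaultdict(list)
    for p in probands:
        blocks[block_of(p)].append(p)

    # Step 2: within each block, let chance alone decide the assignment.
    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)
        for i, p in enumerate(members):
            # Dealing treatments in turn splits each block as evenly as possible.
            assignment[p] = treatments[i % len(treatments)]
    return assignment

# Hypothetical probands: (id, sex)
probands = [(i, "f" if i % 2 else "m") for i in range(12)]
assignment = blocked_randomization(probands, block_of=lambda p: p[1], seed=1)
for p in probands:
    print(p, "->", assignment[p])
```

Blocking fixes the split for the one characteristic that was named in advance; the shuffle is what deals with every other characteristic, named or not.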

2.2 Randomization

Obviously, only the two real techniques of control "randomized assignment of patients to the compared treatments" and "blinding of patients and / or physicians" pose actual problems in medical research. It is by them that the ethical and legal problems are sparked off. Why randomized assignment of probands to treatments (randomization)? Why not systematic assignment of probands to treatments by expert opinion?

To investigate this part of the question, it is very informative to put oneself into the position of an empirical scientist who did not have access to randomization as a technique of control. This means going back to the time before 1935 [2]. In performing his study, our scientist could keep manipulable external factors of influence constant, in order to control varying confounders. This method was known from physics. Our scientist could observe the probands in the study at about the same time (form time blocks) to control non-manipulable external factors of influence. This was not so well known from physics, but it suggested itself. Finally, our scientist could form blocks of investigation on the basis of the probands’ characteristics to control factors of influence that manifest themselves in the variability among probands. The term "block" was still largely unknown. However, the method commonly used then, e.g. running an investigation separately in men and women, but still in a parallel way, in principle meant the same.

However, in a study, the technique of forming blocks on the basis of the probands’ characteristics can be applied to only a few characteristics, perhaps 3 to 5 at most. For practical reasons, many confounders must remain unaccounted for. Therefore, our scientist had to ask the following question:

How is it possible to control the confounders that manifest themselves in the variability between probands, and that cannot be accounted for by forming blocks? The simple and at the same time brilliant idea was:

The probands should be assigned to the compared treatments in a way that the variability among probands is equally distributed between the treatment groups formed in this way. Then all confounders are controlled. Errors are excluded!

Several procedures have been proposed as a technique of control that we would like to call "neutralisation". Disregarding details, they may be classified in two groups:

  1. Systematic assignment of probands to treatments by expert opinion (expert method)
  2. Randomized assignment of probands to treatments (randomization method).

The expert method had only supporters among empirical scientists. It was consistent with their thinking, their actions, and, more importantly, it was consistent with their entire conviction of how one had to carry out research. The randomization method did not have any supporters. It was not consistent with any thoughts, actions, or convictions. On the contrary: it was an affront.

Imagine: somebody came along [2] who seriously suggested to educated, even scientifically thinking people that they should set aside their entire accumulated expert knowledge, leave it outside at the cloakroom, so to speak, at a specific point in their research activity. Instead they were supposed to "throw the dice". Only a madman could have thought up such a thing! It was obvious that in throwing the dice it might well happen that, for example, all female probands would end up in one treatment group, and all male probands in the other group. This would be the exact opposite of the intended solution of the problem, i.e. the equal distribution of the probands’ variability. Such a thing could never happen using the expert method. Consequently, everybody was able to see that the expert method was much more precise than the dice method.

Perhaps the expert method is really more precise than the randomization method. But this is exactly where the problem lay. The method was, if anything, only more precise, not precise. Gradually it turned out that there is no method of assignment (and there will never be one) that succeeds in equally distributing the probands’ variability among the compared treatments without any error. In other words: the equal distribution of the probands’ variability among the treatment groups is desirable but not at all possible. Any method of assignment necessarily leads to its own inherent mistakes in assignment that are typical for this method.

This made it clear that the absolutely equal distribution of the probands’ variability could no longer be the aim, although the original idea of control was still correct. Instead, the following goal had to be pursued: select, from the methods of assignment that exist (or that may be developed in the future), one for which the mathematics is available to calculate the error inherent in the method of assignment with respect to the target variable.

Among all methods of assignment proposed to date, randomization is the only one for which a mathematics of error (i.e. probability theory and mathematical statistics) is available. This makes it possible to calculate the error of assignment inherent in the method with respect to the target variable. For this simple reason, "throwing the dice" (and with it, the method of assignment that is possibly the least precise of all) has finally invaded, as it were, empirical and, therefore, also medical research. It brought with it the associated mathematics of error, called the "statistical test", with the misleading name of "significance" test.
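A small worked example may show what it means that the error of random assignment is calculable. The extreme imbalance from the dice objection (all female probands in one group, all male probands in the other) has a definite probability under simple randomization into two equal groups. The numbers used here (10 women, 10 men) and the function name are assumptions chosen for illustration only.

```python
from math import comb

def prob_all_women_in_one_group(n_women, n_men):
    """Probability that randomizing into two equal-sized groups puts
    every woman into the same group (whichever group that is)."""
    if n_women < 1:
        raise ValueError("sketch assumes at least one woman")
    total = n_women + n_men
    if total % 2:
        raise ValueError("sketch assumes an even number of probands")
    size = total // 2              # size of each treatment group
    if n_women > size:
        return 0.0                 # the women cannot fit into one group
    # Ways to fill group 1 so that it holds all women, or no women at all.
    favourable = comb(n_men, size - n_women) + comb(n_men, size)
    return favourable / comb(total, size)

p = prob_all_women_in_one_group(10, 10)
print(f"probability = {p:.3g}")    # 2 / 184756, roughly 1 in 92,000
```

The point of the sketch is not that the imbalance is rare in this particular setting, but that its probability can be stated at all. No comparable calculation exists for assignment by expert opinion.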

It should be added for further clarification: any alternative procedure proposed in the future that is supposed to be better than randomization (which could only mean being more precise than randomization) would have to include its own mathematics for calculating its specific error of assignment. Only then would the procedure merit discussion as a serious competitor of randomization. An alternative procedure must fulfil an additional requirement: it must be able to explain how it deals with the unknown confounding factors. The randomization procedure does not have any problems with the unknown confounding factors because it deals with them in exactly the same way as it deals with the known ones. If both conditions are considered, the necessary specific mathematics of error and the crucial question about the unknown confounding factors, it becomes apparent that the prospect of success of alternative assignment procedures is gloomy, or, expressed in subjective probabilities, approaches zero.

Here it becomes apparent that the technique of controlling the probands’ variability does not consist of randomization alone, but of randomization (at the very beginning of the study) plus statistical test (at the very end of the study). Consequently, the "significance" test, equipped with such an impressive name, is only an appendix to randomization! In fact, we do not randomize to have a basis for our statistical tests, as is often said [e.g. 3, p. 50]. The opposite is true. At the end of the studies, we carry out statistical tests to complete the technique of control, called "neutralization", that consists of randomization plus error mathematics. With respect to the logic of research, this is a fundamental difference.
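The pairing "randomization at the beginning, statistical test at the end" can be made concrete with an exact randomization test. The test simply asks how often the random assignment alone could have produced a difference at least as large as the one observed. The sketch below assumes a difference in group means as the test statistic and uses invented outcome values; both are illustrative choices, and the enumeration is only practical for small studies.

```python
from itertools import combinations

def randomization_test(outcomes, treated_idx):
    """Exact two-sided randomization test for a difference in group means.

    outcomes    -- target-variable values, one per proband (hypothetical)
    treated_idx -- indices of the probands that chance put on treatment
    Returns the share of all possible assignments whose absolute mean
    difference is at least as large as the observed one.
    """
    n, k = len(outcomes), len(treated_idx)
    total = sum(outcomes)

    def mean_diff(idx):
        s = sum(outcomes[i] for i in idx)
        return s / k - (total - s) / (n - k)

    observed = abs(mean_diff(treated_idx))
    count = extreme = 0
    # Enumerate every assignment the randomization could have produced.
    for idx in combinations(range(n), k):
        count += 1
        if abs(mean_diff(idx)) >= observed - 1e-12:  # tolerance for rounding
            extreme += 1
    return extreme / count

# Hypothetical study: 8 probands, the last 4 were randomized to treatment.
outcomes = [1, 2, 3, 4, 10, 11, 12, 13]
p_value = randomization_test(outcomes, treated_idx=(4, 5, 6, 7))
print(f"p = {p_value:.4f}")  # 2 of the 70 possible assignments: 0.0286
```

The reference distribution is generated by the assignment mechanism itself, which is why the calculation is only justified if the probands really were assigned by chance.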

Conclusion: there is no alternative to the randomized assignment of patients to the compared treatments! In general: for testing possible factors of influence with respect to their effect on target parameters, there is no alternative to the experimental study. An indispensable part of it is the technique of control, called "neutralization", that consists of randomization plus statistical test.

2.3 Non-experimental study

Let us now have a look at the non-experimental, i.e. non-randomized, studies (see figure 1). There are the following classes of non-experimental studies: quasi-experimental studies, observational studies, prospective ex-post-facto studies, retrospective ex-post-facto studies (case-referent studies), and finally the very simple correlational studies. For all that distinguishes these types of non-randomized studies from one another, their differences are only gradual in nature. The real, great leap of quality lies between "experimental" and "non-experimental", between "randomized" and "non-randomized". They are worlds apart!

Figure 1: Types of studies

The experimental, and only the experimental, study tests the factor of influence under investigation with respect to its effect on the target variable, together with a probability of error in determining an effect in the respective experiment. This probability of error, say α = 5%, is defined ex ante by the researcher. On the other hand, the non-experimental study types can only - if at all - test differences as such between compared groups with respect to the target variable. Here, too, the probability of error, say α = 5%, in determining a difference as such in the respective study is defined ex ante by the researcher. Necessarily, it remains totally open where the observed (statistically significant) difference stems from, whether it actually stems from the target factor under investigation, or whether it is due to other, so-called third factors. This weakness of non-experimental studies is due to their construction. To counteract this problem in non-experimental studies, in addition to the medical hypothesis under investigation, alternative explanations for the possibly observed difference have to be formulated ex ante (!). Then one has to try and refute one alternative explanation after the other. This procedure of so-called eliminative induction is logically endless!
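A small simulation may illustrate why a difference found in a non-randomized comparison leaves its origin open. In the sketch below the treatment has no effect at all; the outcome depends only on a third factor, here called severity. The model, the threshold of 0.5, and all names are assumptions made purely for illustration.

```python
import random
from statistics import mean

def simulate(n=2000, seed=0):
    """Compare a non-randomized and a randomized assignment when the
    treatment is ineffective and a third factor drives the outcome."""
    rng = random.Random(seed)
    severity = [rng.random() for _ in range(n)]
    # The outcome depends on severity plus noise, NOT on the treatment.
    outcome = [10 * s + rng.gauss(0, 1) for s in severity]

    def diff(treated):
        t = [y for y, flag in zip(outcome, treated) if flag]
        c = [y for y, flag in zip(outcome, treated) if not flag]
        return mean(t) - mean(c)

    # Non-randomized: milder cases tend to receive the new therapy.
    by_judgement = [s < 0.5 for s in severity]

    # Randomized: chance alone decides who receives it.
    order = list(range(n))
    rng.shuffle(order)
    chosen = set(order[: n // 2])
    by_chance = [i in chosen for i in range(n)]

    return diff(by_judgement), diff(by_chance)

nonrandomized_diff, randomized_diff = simulate()
print(f"non-randomized difference: {nonrandomized_diff:.2f}")
print(f"randomized difference:     {randomized_diff:.2f}")
```

In the non-randomized comparison a large difference appears although the therapy does nothing; it stems entirely from the third factor. In a real study the third factor would not be known in advance, which is what makes the list of alternative explanations open-ended.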

Furthermore, in the case that no (statistically significant) difference between the compared groups is observed, alternative explanations of a possible compensation have to be formulated ex ante. Also in this case, one has to try and refute one alternative explanation for compensation after the other, following the procedure of so-called eliminative induction. As stated, this procedure is logically endless!

A non-experimental study in which this procedure of eliminative induction is missing - and it is almost always missing - must be considered incomplete. The reason is that this procedure, already prepared in the planning phase, is an indispensable part of the non-experimental study - in contrast to the experimental study. Since the procedure of eliminative induction is logically endless, this leads to the following conclusion: the whole class of non-experimental studies consists of research methods that are suitable to form hypotheses about therapies, but are not suitable per se to test hypotheses about therapies.

Both the derivation of the experimental study and the remarks about the non-experimental study show that the experimental study is in fact the only method that makes it possible to test the effects and effectiveness of therapies. If the diverse non-experimental studies are nevertheless used for testing, they can at best be methodological compromises. There is no alternative to the controlled (randomized) clinical trial, but only methodological compromises.

3. Antinomy

In the practice of medical research, the experimental study comes up against many limiting factors. Technical, organisational, and economic, but mainly legal and ethical reasons may be obstacles to a controlled clinical trial. In this context, one always has to be aware of the difference between

treating patients by means of therapies

on the one hand, and

testing therapies by means of patients

on the other. The relationship between the two is that of an antinomy. The physician in the situation of scientific testing has to fulfil totally different requirements (e.g. techniques of control, randomization, and possibly necessary blinding) than the physician in the situation of medical treatment (e.g. the usual doctor-patient relationship, legally described by the individual treatment contract between the patient and the physician). The conflict inherent in this situation may perhaps be concealed, but not resolved, if we simply try to mix the situations of treating and testing, and if we pretend that we need only treat well in order to research well, or vice versa: that if we just carried out good research we would also provide good treatment. The truth is that when one wants to carry out optimal research in patients, one is not able to treat these patients in an optimal way, and vice versa! Because clinical research has to rely on the "treating physician and his patients", the conflict mentioned above is, more or less, always present.

4. Methodological compromise

From the discussion of "experimental and non-experimental studies", and of the "antinomy between scientific testing and medical treating", it follows that for every concrete clinical trial it has to be worked out anew by which methodological compromises "on a small scale" it can (still) be carried out as an "experimental" study (experiment as "testing method"). If this turns out to be infeasible, it has to be worked out - oriented toward the experimental study design (!) - by which methodological compromises "on a large scale" the conflict can be sufficiently defused (experiment as "guiding method"). This must be done in every single case with a still acceptable relativization (dilution) of the expected testing results. All this is the original task in planning the study in the particular case of a clinical trial.

Among the methodological compromises "on a small scale", I count all measures (concessions) that in the particular case of planning a study are suitable for preserving the experimental status of the study. The so-called "Zelen plans" have to be seen in this context [7, 8]. This is true when they are applied to the situation before the beginning of the study (Zelen). This is also true if some patients demand the alternative therapy after the study has already begun. Strictly speaking, all implemented experimental studies are methodological compromises "on a small scale"; because implemented studies without any compromise do not exist! Above all, the analysis of randomized studies according to the "intention to treat" principle in the case of violations of the study protocol by the patients has to be seen in this context.

The whole discussion about this principle that has been going on for years literally vanishes into thin air in the context of the concept of compromise, with the "experiment as testing and guiding method", proposed here. The data of the study patients are analyzed despite violations of the protocol, i.e. according to "intention to treat" (as randomized), in order to save randomization, and thereby the experimental status of the study. At this point, it becomes clear that the question whether one analyzes "as randomized" or "according to protocol" has nothing at all to do with the question whether the study is a "pragmatic" or an "explanatory" design in the sense of Schwartz and Lellouch [5]. The latter question boils down to how the problem is defined, so that the limits are fluid. In addition, it becomes obvious that it does not make any sense to carry out double analyses as a standard procedure: one "according to protocol", and, in parallel, one according to "intention to treat". Either one analyzes according to "intention to treat" (as randomized), or one analyzes "according to protocol". Which of the two procedures is used in a particular case depends on the extent of the methodological compromise. Of course, in the case of violations of the study protocol by the patients, a methodological compromise "on a small scale", i.e. analysis "as randomized", is only useful as far as interpretable testing results can still be obtained by this method. Randomization, i.e. the experimental status of the study, is always just a means to an end, never the end in itself.
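The difference between the two analysis sets can be made concrete with a few invented patient records. In the sketch below each record notes the treatment a patient was randomized to, the treatment actually received, and the outcome; the data, the arm labels, and the function name `group_means` are hypothetical. The sketch only shows what each choice does to the compared groups; it is not meant as a recommendation to run both analyses.

```python
from statistics import mean

# Hypothetical records: (randomized_to, actually_received, outcome)
patients = [
    ("A", "A", 1.0),
    ("A", "A", 2.0),
    ("A", "B", 6.0),   # protocol violation: switched to B
    ("B", "B", 5.0),
    ("B", "B", 6.0),
    ("B", "A", 1.0),   # protocol violation: switched to A
]

def group_means(records, analysis):
    """Mean outcome per arm under one of two analysis principles.

    "as_randomized" -- intention to treat: every patient is counted in
                       the arm that chance assigned, whatever happened later
    "per_protocol"  -- only patients who received the assigned treatment
    """
    groups = {}
    for randomized_to, received, outcome in records:
        if analysis == "as_randomized":
            groups.setdefault(randomized_to, []).append(outcome)
        elif analysis == "per_protocol":
            if randomized_to == received:
                groups.setdefault(randomized_to, []).append(outcome)
        else:
            raise ValueError(f"unknown analysis: {analysis}")
    return {arm: mean(values) for arm, values in groups.items()}

itt = group_means(patients, "as_randomized")
pp = group_means(patients, "per_protocol")
print("as randomized:", itt)
print("per protocol: ", pp)
```

Under "as randomized" the compared groups are exactly those formed by chance, so the experimental status is preserved. Under "per protocol" the groups are formed partly by the patients' own behaviour, and the comparison is no longer a randomized one.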

I count among the methodological compromises "on a large scale" those that force us, in the particular case of planning a study, to switch to non-experimental, i.e. non-randomized, study designs (see figure 1). This is also the case if one decides to analyze the data "according to protocol" (if the protocol is violated on a large scale by the patients in randomized studies). As a rule, this can only lead to an observational study.

Working out a solution of methodological compromise is always a task in planning a study. A clinical study that has been carried out experimentally, i.e. in a randomized fashion, is not a methodological compromise, but a faulty (irreparably defective) experimental study, if one realizes only afterwards that, in addition, it should have been blind or double-blind. To say it clearly: a non-experimental study that is the result of a methodological compromise worked out in the planning phase, is, in principle, scientifically acceptable. On the other hand, a flawed experimental study, including all belated rescue attempts, is, in principle, scientifically not acceptable.

5. Final remarks

In the German law relating to the manufacture and distribution of medicines [1], more precisely, in the guideline based upon it [4, paragraph 2.2], the controlled (randomized) clinical trial is, in principle, required for the clinical evaluation of manufactured drugs. "In principle" means, even if it does not have to be added explicitly: it is possible to depart from this rule. However, this has to be justified. This principle allowing departures from the rule, together with the obligation to justify the departure, is a logical consequence of the widespread conviction that the experimental (randomized) study is the best among several methods. By contrast, the concept of compromise outlined here is based on the fact - proved above - that the experimental (randomized) study is not the best method, but the only real testing method. In view of this fact, it is methodologically not acceptable merely to give reasons for departing from the experimental study design. The final form of a study that departs from the indicated experimental study design must be the result of a methodological compromise that has to be intelligible as such. As a rule, a methodological compromise becomes necessary in a study. In the planning phase of a study, one has to go through this methodological compromise with all its implications. Only then will one be able to adequately interpret the biostatistical result of such a "compromise" study.

On the basis of the concept of compromise, with the "experiment as testing and guiding method", the diverse classes of non-experimental study designs actually represent some sort of generalized compromise designs. However, compromise cannot be generalized. It can only be worked out, evaluated, and actually justified in the particular case of a study being planned (if a concrete research question is already formulated!). Emphasizing this point is not at all meant to belittle the efforts of the many authors who propose alternative approaches to randomized studies. It is just meant to highlight the goal that we, medical researchers and biometricians, still have to reach: the development of a culture of methodological compromise, oriented toward the experimental study design.

References

[1]
AMG - Gesetz über den Verkehr mit Arzneimitteln, in der Fassung des Gesetzes zur Neuverordnung des Arzneimittelrechts vom 24. August 1976 (BGBl. I, S. 2445-2448), zuletzt geändert durch das Fünfte Gesetz zur Änderung des Arzneimittelgesetzes vom 9. August 1994 (BGBl. I, S. 2071-2087).
[2]
Fisher, R.A. (1935): The Design of Experiments. Oliver and Boyd, Edinburgh. (1st ed.).
[3]
Pocock, S.J. (1983): Clinical Trials, A Practical Approach. John Wiley & Sons, Chichester, New York, Brisbane, Toronto, Singapore.
[4]
Prüfrichtlinie (1987): Bundesminister der Justiz (Hrsg.) Grundsätze für die ordnungsgemäße Durchführung der klinischen Prüfung von Arzneimitteln. Vom 9. Dezember 1987. Bundesanzeiger 243 (Jahrg. 39), S. 1617.
[5]
Schwartz, D.; Lellouch, J. (1967): Explanatory and Pragmatic Attitudes in Therapeutical Trials. Journal of Chronic Diseases 20, pp. 637-648.
[6]
Überla, K. (1980): Methoden der Urteilsbildung: Statistische Verfahren. In: Bock, K.D. (Hrsg.) (1980): Arzneimittelprüfung am Menschen. Vieweg & Sohn, Braunschweig/Wiesbaden, S. 41-47.
[7]
Zelen, M. (1979): A New Design for Randomized Clinical Trials. The New England Journal of Medicine 300, pp. 1242-1245.
[8]
Zelen, M. (1990): Randomized Consent Designs for Clinical Trials: an Update. Statistics in Medicine 9, pp. 645-656.