Editors:
U. Abel,
A. Koch

Published by Symposion

Nonrandomized Comparative Clinical Studies

Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997

Problems of Randomized Controlled Trials (RCT) in Surgery

R. Lefering, E. Neugebauer

Abstract

Randomized controlled trials (RCT) are widely accepted as the gold standard for comparing different therapeutic modalities. The random allocation of patients avoids selection bias and provides for an equal distribution of conscious as well as unconscious prognostic factors among the study groups, provided the number of patients included is large enough. The credibility of study results is further enhanced by measures such as independent investigators, blinding, and homogenisation of patients and therapy.

Problems with RCT in Surgery: The basic principles of trial methodology hold true for surgical studies as well, and there are several areas of research in surgery where RCTs are usually applied (e.g. double-blind placebo-controlled drug testing, comparison of suture techniques, etc.). Three out of four published RCTs in surgery are comparisons of medical therapies. For other areas of research, like the evaluation of diagnostic tests or the determination of prognostic factors, RCTs are not suitable. The crucial point in surgery is that there are some situations which require an RCT but where several factors complicate or even prevent its conduct. These factors are:

Patients’ preferences: Patients often strongly favour one type of operation (e.g. laparoscopy). This may lead to a substantial selection bias since enrollment of eligible patients is less than 75%.

Preferences of the surgeon: A new surgical procedure needs a training phase before being tested in a trial. Sometimes, after this initial phase, there is already a firm opinion on the value of that procedure. The heterogeneity of preferences among surgeons participating in a randomized trial is a further source of bias.

Blinding: The surgeon, and in most cases the patient as well, is not blinded to the procedure performed. This is true especially in endoscopic surgery (trocar placement vs. incision), in colon surgery (colostomy), or when amputations are performed. Sometimes, special wound dressings have been used in order to keep at least the ward personnel blinded. This problem is even greater if a surgical treatment is compared with a non-surgical therapy (e.g. extracorporeal shock wave lithotripsy vs. cholecystectomy).

"Placebo": Today, it is not possible to perform sham operations. But former trials clearly show a placebo effect of all surgical procedures.

Alternatives to RCT: The above-mentioned problems are further illustrated by a trial which we performed to compare open vs. laparoscopic cholecystectomy in 1990. After an initial learning phase of seven months (104 patients), a randomized trial was planned. It soon became obvious, however, that randomisation was impossible since most of the patients refused to enter the study. Historical controls were discussed but dismissed, and a nearby hospital was selected and convinced to follow the protocol and to include patients with an open operation. However, within a few weeks, this hospital also started with endoscopic surgery. Thus, we were not able to establish a (comparable) control group for the new laparoscopic technique.

In situations where randomisation is very difficult or even impossible, patient documentation should always contain sufficient information to generate subgroups with a defined profile or to enable matching with other similar cases. Scores and scales are very helpful for this purpose.

Introduction

The term "randomisation" or "random allocation" was first used by Fisher [11, 23] for the design of agricultural field trials. The first clinical trial that used randomisation for patient allocation was the trial of streptomycin for the treatment of pulmonary tuberculosis performed in 1946. The introduction of this innovative design was largely due to Sir Austin Bradford Hill who was the statistician on the trial committee [19, 23]. The first randomised trial in surgery was performed by Goligher [13] who compared three different surgical strategies for the elective treatment of duodenal ulcer, namely vagotomy with gastroenterostomy, vagotomy with antrectomy, and subtotal gastrectomy.

Since that time, randomized controlled trials (RCT) have been widely accepted as the gold standard for comparing different therapeutic modalities. Random allocation of patients avoids selection bias and provides for an equal distribution of conscious as well as unconscious prognostic factors. But in spite of its well-known theoretical and methodological advantages, it is sometimes hard or even impossible to perform an RCT, especially in surgery. The number of RCTs published in the British Journal of Surgery (figure 1) seems to show a declining trend in the 1990s. For the interpretation of these numbers it is important to know that the total number of articles published in this journal ranges between 200 and 300 per year.

This unsatisfactory situation requires a closer look at the problems involved in the conduct of an RCT in surgery and the possible alternatives.

Figure 1: Number of randomised controlled trials (RCTs) published in the British Journal of Surgery in selected years. The total number of articles published in this journal ranges between 200 and 300 per year (updated from [24])

Problems with RCT in surgery

In a detailed analysis, Solomon et al. [26] identified 202 RCTs in surgery published in 1990. It is interesting that 76% of these trials compared medical therapies in surgical patients, and only 18% compared surgical procedures. Furthermore, only 11 of these trials (6%) compared a medical with a surgical treatment. There are, of course, some general drawbacks of RCTs, such as the long duration of the trial, the high costs, the unsuitability for rare diseases, or the limited generalizability if the inclusion criteria are rather strict [18]. But these limitations hold true for any RCT. In surgery, there are some specific aspects that might explain the scarce use of this methodology, at least to some degree. These aspects are the specific preferences of patients as well as surgeons, the impossibility of blinding a surgical procedure, and the placebo effect of surgery. The following paragraphs discuss these aspects and give examples from the literature.

Patients’ preferences

Usually, a patient with a disease consults a physician of his or her choice and follows the therapy suggested. If several therapies are available and there is uncertainty about which is the most effective (one of the prerequisites for randomisation), the patient may choose to follow his or her own preferences. Such preferences exist especially where the possible therapies differ substantially, e.g. surgical versus medical treatment, or where one of the therapies has gained a good standing in public opinion. A good example of the latter situation is the introduction of laparoscopic techniques for the removal of the gallbladder at the beginning of the 1990s. Initial reports and articles in the lay press suggested that this new method might be far superior to the conventional open procedure. Many patients were biased by these reports and strongly opted for the laparoscopic operation. Randomised trials set up to test the claimed advantages of the new operation had great difficulty recruiting enough patients. Barkun et al., for example, published an RCT of laparoscopic versus mini-cholecystectomy in 1992 [1] and reported a high rate of withdrawal after randomisation, especially in the group allocated to the open procedure. They stated that "the trial was stopped because patient recruitment had become difficult". Kunz, a German surgeon, published an RCT on the same topic in 1992 [15]. He was able to enroll only 24% of all eligible patients. As the difference between the therapeutic alternatives increases, patients’ preferences become stronger as well. Plaisier et al. published a randomised comparison of open cholecystectomy with a non-surgical therapy (extracorporeal shock wave lithotripsy) [22]. They were able to include only 8.4% of all eligible patients and named patients’ preferences, biased information in the lay press, and the introduction of the laparoscopic technique as the most important factors.

Surgeons’ preferences

Not only patients but also physicians have preferences regarding different therapeutic alternatives. This is even more important in surgery since the surgeon is part of the "drug". If surgical techniques are compared, strategies have to be followed in order to achieve a minimum standard of surgery [18], including agreement on technical aspects of the procedure, teaching sessions, and documentation of details of the procedure performed. To avoid bias induced by different levels of education and skill among surgeons, the ideal method would be to let each surgeon perform both operations. In practice, however, a design with two groups of surgeons with different preferences is more likely to be accepted. The reasons why surgeons are sometimes reluctant to randomise patients are further illuminated by an investigation by Taylor et al. [27]. They found that two thirds of the participating physicians did not include all eligible patients in a trial of surgery for breast cancer. A questionnaire sent to these physicians gave the following responses: 73% were concerned about the doctor-patient relationship; 38% had trouble with the informed consent; 23% disliked an open discussion about uncertainty; etc. In summary, participating in a clinical trial, and especially performing a random allocation of therapy, may compromise the authority of the physician and call his or her expert knowledge into question, at least in the view of some physicians [27].

Blinding

Although a blinded assessment of outcome is not necessarily a prerequisite for randomisation, the practical impossibility of blinding sometimes prevents a trial from being conducted at all. Systematic reviews show that where no double-blind conditions were applied, the estimated treatment effect is on average about 17% larger [25]. A lack of blinding matters less for endpoints like mortality but becomes more important for subjective outcomes assessed by the patient (e.g. pain, fatigue, satisfaction, or return to work) or by the physician (e.g. classification of success or hospital stay). Only one third of all surgical trials performed have had adequate blinding of patients or physicians [26]. Especially if surgical procedures are compared, at least the surgeon who performs the operation is not blinded. Thus, the observer in the postoperative period should be a different person. In some trials, even unusual techniques were applied in order to keep the outcome assessment blinded. Majeed et al., for example, compared (again) laparoscopic and open cholecystectomy, and they used identical opaque dressings stained with iodine solution and bloodstained fluid in order to keep the ward nurses blinded to the procedure performed [16]. But even in this situation the patient may easily guess his or her group by simply pressing on the dressings.

Placebo effect of surgery

The observed effect of a treatment is well known to be composed of several components [12]. Besides the net effect of the therapy, there is an effect due to the natural course of disease (e.g. the postoperative relief of pain), the so-called Hawthorne effect, which summarizes the changes under study conditions as compared to normal routine care, and a placebo effect. Whenever a subjective endpoint like pain or quality of life is chosen, the possibility that the results are influenced by a placebo effect has to be discussed [28]. Some authors postulate that surgery by itself has a placebo effect that influences outcome [2, 14]. This has to be taken into account when a surgical and a non-surgical therapy are compared. A well-known and frequently cited example of this effect was published as early as 1958 [7, 9]. In order to disprove the effectiveness of internal mammary artery ligation for the treatment of angina pectoris, a sham operation was performed in a randomised design. The surgeon was informed only just before the skin incision was made whether to proceed with ligation or not. In the end, all patients (with and without ligation) reported the same marked improvement.

Alternatives to RCT, example 1: laparoscopic cholecystectomy

The introduction of the laparoscopic technique for cholecystectomy at the beginning of the 1990s is a recent, well-documented example of the discrepancy between reality and what was desired from the methodological point of view. In 1989, Cuschieri wrote an editorial about the "laparoscopic revolution" [8], and he called for randomized controlled trials to define the indications for this new approach. Chalmers is well known for his principle that the first patient has to be randomized if a new technology is introduced [5]. He strongly argued that a pilot trial may prevent the conduct of a subsequent controlled comparison.

But like any new drug applied to a patient, new surgical procedures need a careful introduction as well. Information about the safety and feasibility of a procedure is a prerequisite for its application, and some kind of standardization of technique is required. The effect of a learning curve on the duration of an operation and the incidence of complications is well known. Therefore, after the first laparoscopic cholecystectomy had been done in Cologne-Merheim in October 1989, an initial observational trial of 100 cases was performed, which was to be followed by an RCT. However, after the initial learning phase of 7 months (104 patients), randomization became impossible since most of the patients refused to enter the study [21]. The surgeons performing laparoscopy also became strongly convinced that this technique was superior. As an alternative, historical controls were discussed but dismissed, since most of the important endpoints could not be evaluated retrospectively (pain, fatigue, stress response). Consequently, a hospital nearby was selected and convinced to follow our protocol and to include patients with an open operation. But within a few weeks, this hospital started with endoscopic surgery, too. Thus, we were not able to establish a comparable control group and could only establish a well-documented prospective cohort trial of 500 cases including follow-up results for future comparisons.

In the situations described above, where randomised trials are scarce and the existing ones are of poor quality, consensus conferences might be the best way to draw at least some preliminary conclusions, provided the methodological guidelines for performing such conferences have been followed [17]. In the case of the laparoscopic technique, this approach has been followed by our European society [10].

Alternatives to RCT, example 2: prehospital trauma care

The evaluation of prehospital care is a further example where randomisation is extremely difficult. In Cologne, for example, 72 physicians and about 1400 paramedics work on 1 helicopter, 4 physician-staffed emergency cars, and 23 ground ambulances. Initial therapy has to be administered without delay. A careful evaluation of inclusion and exclusion criteria with subsequent randomisation is nearly impossible. Very few controlled trials are found in the literature. A recent example of a pseudo-randomised trial was published by Bickell et al. in 1994 [3]. They compared immediate versus delayed fluid resuscitation for hypotensive patients with penetrating injuries. All patients injured on even-numbered days of the month were enrolled in the immediate resuscitation group, whereas those injured on odd-numbered days received delayed resuscitation.
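The allocation rule described above is easy to write down, which also shows why it is only pseudo-random: the group follows directly from the calendar date and is therefore known in advance to everyone involved. The following Python sketch is purely illustrative; the function names are hypothetical and are not taken from the trial report [3].

```python
import random
from datetime import date


def allocate_by_day(injury_date: date) -> str:
    """Pseudo-random rule: the group is determined by the day of the month."""
    # Even-numbered days -> immediate, odd-numbered days -> delayed resuscitation.
    return "immediate" if injury_date.day % 2 == 0 else "delayed"


def allocate_at_random(rng: random.Random) -> str:
    """Simple randomisation, for contrast: not predictable from the date."""
    return rng.choice(["immediate", "delayed"])


if __name__ == "__main__":
    print(allocate_by_day(date(1994, 10, 27)))  # odd-numbered day
    print(allocate_by_day(date(1994, 10, 28)))  # even-numbered day
    print(allocate_at_random(random.Random()))
```

Because the date-based rule is predictable, it offers no allocation concealment, which is one reason such designs are classed as pseudo-randomised.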

For the evaluation of initial trauma care, it is important to know whether detailed documentation of what happened can help to find answers. In 1993, the German Society for Trauma Surgery (DGU) set up a documentation sheet covering prehospital and initial hospital care as well as outcome assessment. By June 1996, more than 1000 trauma cases had been prospectively documented with this instrument and entered in a database. The question was whether conclusions can be drawn about the effectiveness of aggressive preclinical treatment of severe trauma, defined as early intubation and fluid administration of at least 1000 ml.

Among the 1037 patients documented in the database, 466 had severe trauma (Injury Severity Score ≥ 16; transfers were excluded). Of these, 329 patients had complete prehospital documentation. Among these, 159 patients received "aggressive" therapy as described above, whereas 170 patients did not. Of course, these two groups were not expected to be comparable since the decision for aggressive preclinical therapy was influenced largely by the severity of injuries. In fact, the "aggressive" group had a higher ISS (30 vs. 25), a lower blood pressure (107 vs. 122) and more head-injured patients (Glasgow Coma Scale 8.5 vs. 12.4). Accordingly, the mortality in the "aggressive" group was 30.2% (95% confidence interval [23.1-37.3]) as compared to 15.3% (95% CI [9.9-20.7]) in the other group.
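The confidence intervals quoted above are consistent with the usual normal-approximation (Wald) interval for a proportion. The Python sketch below reproduces them under that assumption. The death counts (48 and 26) are not stated in the text; they are back-calculated here from the reported percentages and group sizes and should be read as assumptions.

```python
import math
from typing import Tuple


def wald_ci(deaths: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mortality, lower, upper) in percent, normal approximation."""
    p = deaths / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


# Assumed death counts, back-calculated from 30.2% of 159 and 15.3% of 170.
groups = {"aggressive": (48, 159), "other": (26, 170)}

for name, (deaths, n) in groups.items():
    p, lower, upper = wald_ci(deaths, n)
    print(f"{name}: {p:.1f}% (95% CI {lower:.1f}-{upper:.1f})")
```

With these assumed counts the sketch yields 30.2% (23.1-37.3) and 15.3% (9.9-20.7), matching the figures in the text. Other interval methods (e.g. Wilson) would give slightly different limits.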

The application of trauma-specific score systems may be one attempt to compensate for these differences. A score system, usually developed with data from a large population, may serve as an external standard for one's own group of patients. In the example above, the TRISS methodology [6], which combines the anatomic severity of injuries (Injury Severity Score), the physiologic reaction of the patient (Revised Trauma Score), and age, constitutes a well-validated tool for the estimation of prognosis. The application of this score provides a comparison to standard norms of outcome with respect to the severity of trauma (before intervention). In the above-mentioned example, the "aggressive" group had an expected mortality of 32.0% (according to the TRISS score), whereas the other group had an expected mortality of 13.5%. In both cases, the predicted mortality fits quite well with the actually observed mortality. Thus there is no evidence that the current practice of intubation and resuscitation is detrimental.
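One simple way to make the comparison of observed and TRISS-predicted mortality explicit is to compute an observed/expected ratio and to check whether the predicted mortality lies within the 95% confidence interval of the observed mortality. The Python sketch below does this; it is an illustration, not the analysis used by the authors. The death counts (48 and 26) are assumptions back-calculated from the reported percentages, and the function name is hypothetical.

```python
import math


def observed_vs_expected(deaths: int, n: int, expected_pct: float, z: float = 1.96) -> dict:
    """Compare observed mortality with a score-based expected mortality."""
    observed = deaths / n
    expected = expected_pct / 100
    # Normal-approximation interval around the observed mortality.
    half_width = z * math.sqrt(observed * (1 - observed) / n)
    return {
        "observed_pct": 100 * observed,
        "expected_deaths": expected * n,
        "o_e_ratio": observed / expected,
        "expected_within_ci": observed - half_width <= expected <= observed + half_width,
    }


# Assumed death counts; expected mortality (TRISS) as reported in the text.
aggressive = observed_vs_expected(deaths=48, n=159, expected_pct=32.0)
other = observed_vs_expected(deaths=26, n=170, expected_pct=13.5)

for name, result in (("aggressive", aggressive), ("other", other)):
    print(name, {key: round(val, 2) if isinstance(val, float) else val
                 for key, val in result.items()})
```

A ratio close to 1 means that about as many patients died as the score predicted. This check treats the TRISS prediction as a fixed value and ignores its own uncertainty, so it should be read as a rough plausibility check only.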

The application of score systems may substitute for a control group in situations where a randomised or contemporary design is impossible. However, every score needs careful validation [4], and its methodological limitations should be clearly considered [20].

References

[1]
Barkun JS, Barkun AN, Sampalis JS, Fried G, Taylor B, Wexler MJ, Goresky CA, Meakins JL (1992) Randomised controlled trial of laparoscopic versus mini cholecystectomy. Lancet 340: 1116-1119
[2]
Beecher HK (1961) Surgery as a placebo. JAMA 176: 1102-1107
[3]
Bickell WH, Wall MJ, Pepe PE, Martin RR, Ginger VF, Allen MK, Mattox KL (1994) Immediate versus delayed fluid resuscitation for hypotensive patients with penetrating torso injuries. N Engl J Med 331: 1105-1109
[4]
Bouillon B, Lefering R, Vorweg M, Tiling T, Neugebauer E, Troidl H (1997) Trauma score systems - the Cologne validation study. J Trauma (in press)
[5]
Chalmers TC (1975) Randomisation of the first patient. Med Clin North Am 59: 1035-1038
[6]
Champion HR, Copes WS, Sacco WJ, Lawnick MM, Keast SL, Bain LW, Flanagan ME, Frey CF (1990) The Major Trauma Outcome Study: establishing national norms for trauma care. J Trauma 30: 1356-1365
[7]
Cobb LA (1959) Evaluation of internal mammary artery ligation by double-blind technique. N Engl J Med 260: 1115-1118
[8]
Cuschieri A (1989) The laparoscopic revolution - walk carefully before we run. J Royal Coll Surg Edinb 34: 295
[9]
Dimond EG, Kittle CF, Crockett JE (1958) Evaluation of internal mammary artery ligation and sham procedure in angina pectoris. Circulation 18: 712-713
[10]
Educational Committee of the European Association for Endoscopic Surgery (1995) The E.A.E.S. Consensus Development Conference on laparoscopic cholecystectomy, appendectomy, and hernia repair. Theor Surg 9: 550-563
[11]
Fisher RA (1926) The arrangement of field experiments. J Minist Agric 33: 503-513
[12]
Fletcher RH, Fletcher SW, Wagner EH (1988) Clinical epidemiology - the essentials. Second edition. Williams and Wilkins, Baltimore
[13]
Goligher JC, Pulvertaft CN, Watkinson G (1964) Controlled trial of vagotomy and gastroenterostomy, vagotomy and antrectomy, and subtotal gastrectomy in elective treatment of duodenal ulcer. Interim report. Br Med J 1: 455-460
[14]
Johnson AG (1994) Surgery as a placebo. Lancet 344: 1140-1142
[15]
Kunz R, Orth K, Vogel J, Steinacker JM, Meitinger A, Brüchner U, Beger HG (1992) Laparoskopische Cholezystektomie versus Mini-Lap-Cholezystektomie. Ergebnisse einer prospektiven, randomisierten Studie. Chirurg 63: 291-295
[16]
Majeed AW, Troy G, Nicholl JP, Smythe A, Reed MWR, Stoddard CJ, Peacock J, Johnson AG (1996) Randomised, prospective, single-blind comparison of laparoscopic versus small-incision cholecystectomy. Lancet 347: 989-994
[17]
McGlynn EA, Kosecoff J, Brook RH (1990) Format and conduct of consensus development conferences. Int J Technol Assess Health Care 6: 450-469
[18]
McLeod RS, Wright JG, Solomon MJ, Hu X, Walters BC, Lossing A (1995) Randomized controlled trials in surgery: issues and problems. Surgery 119: 483-486
[19]
Medical Research Council (1948) Streptomycin treatment of pulmonary tuberculosis. Br Med J 2: 769-782
[20]
Neugebauer E, Bouillon B (1994) Was können Scoresysteme leisten? Unfallchirurg 97: 172-176
[21]
Neugebauer E, Troidl H, Spangenberger W, Dietrich A, Lefering R, and the Cholecystectomy Study Group (1991) Conventional versus laparoscopic cholecystectomy and the randomized controlled trial. Br J Surg 78: 150-154
[22]
Plaisier PW, Berger MY, van der Hul RL, Nijs HGT, den Toom R, Terpstra OT, Bruining HA (1994) Unexpected difficulties in randomizing patients in a surgical trial: a prospective study comparing extracorporeal shock wave lithotripsy with open cholecystectomy. World J Surg 18: 769-773
[23]
Pollock AV (1989) The rise and fall of the random controlled trial in surgery. Theor Surg 4: 163-170
[24]
Pollock AV (1993) Surgical evaluation at the crossroads. Br J Surg 80: 964-966
[25]
Schulz KF, Chalmers I, Hayes RJ, Altman DG (1995) Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273: 408-412
[26]
Solomon MJ, Laxamana A, Devore L, McLeod RS (1994) Randomized controlled trials in surgery. Surgery 115: 707-712
[27]
Taylor KM, Margolese RG, Soskolne CL (1984) Physicians’ reasons for not entering eligible patients in a randomized clinical trial of surgery for breast cancer. N Engl J Med 310: 1363-1367
[28]
Turner JA, Deyo RA, Loeser JD, von Korff M, Fordyce WE (1994) The importance of placebo effects in pain treatment and research. JAMA 271: 1609-1614