A Nonparametric Test for Evaluating Coherent Alternatives in Nonrandomised Studies

O. Gefeller, L. Pralle

When considering the effect of treatments or exposures on some outcome variable in a nonrandomised study, the presence of coherence provides supporting evidence that an observed relationship between the factors of interest might reflect a causal treatment or exposure effect. In our understanding, coherence means that we have a specific and detailed description of what an actual treatment or exposure effect would look like. The concept of coherence can then be used to formulate a "coherent pattern" of expected results, indicative of a real effect of the treatment or exposure under study, that can be tested using the observed data. In the paper, we review a simple nonparametric rank test, developed by Rosenbaum, for testing the null hypothesis of no treatment/exposure effect against arbitrarily complicated coherent alternatives. In addition, we introduce a new measure of coherence to summarise quantitatively the coherence present in the data. Two empirical examples, one epidemiological investigation and one nonrandomised clinical trial, illustrate the application of the methodology.
1. Introduction

When the results of nonrandomised experiments or observational studies have to be interpreted, the dilemma often arises whether the apparent difference between the groups to be compared can be causally attributed to the characteristic defining membership in the group or not. No statistical testing procedure, however sophisticated, that compares nonrandomised samples with respect to the distribution of some primary endpoint variable(s) can remedy the lack of randomisation at the design stage of the study. Any nonrandomised comparison is potentially subject to (overt and hidden) biases that can distort the picture of results. However, supporting evidence for the assertion that the observed difference might reflect a real treatment or exposure effect can be drawn from the presence of "coherence" in the pattern of results. Coherence means that we have a specific and detailed description of what an actual treatment or exposure effect would look like. Many epidemiological textbooks [1-4] and historical papers about epidemiological inference [5,6] discuss coherence as one of the criteria for judging causality when interpreting empirical findings from observational studies on the relationship between some exposure factor and an outcome variable. In all references, the concept of coherence is illustrated through examples rather than defined formally. In this paper, we do not attempt to give a formal description of coherence either; we leave it to the two practical examples in section 4 to delineate what a "coherent pattern of results" means in the framework of the two corresponding studies. Instead, we discuss a simple nonparametric approach, developed by Rosenbaum [7], to test for coherence and introduce a new measure of coherence that provides a quantitative description of the degree of coherence present in the observed data.
The rest of the paper is organised as follows. In the next section we derive the so-called poset statistic, which can be viewed as a generalisation of standard two-sample rank tests, for testing the null hypothesis of no treatment/exposure effect against arbitrarily complicated coherent alternatives. In section 3, we introduce a simple procedure for estimating the level of coherence that yields a quantitative summary measure of the compatibility of the data with the pattern of results specified in the definition of the coherent alternative. Section 4 consists of two empirical examples illustrating the application of the methodology. One example is drawn from an observational study in occupational epidemiology [8], the other uses data from a nonrandomised clinical trial in neonatology [9]. The final section gives additional discussion of the concept of coherence and points to further topics that can be addressed in this framework.

2. Derivation of the Poset Statistic

Consider the typical structure of the two-sample layout: there are N units of observation, numbered i=1,...,N. The observations consist of K-dimensional vectors [...]

Now consider the random variables Uij, i,j=1,...,N, defined by

[...]

which can be viewed as indicator-like variables providing the information whether yi and yj can be ordered using "<c" and, if so, in which direction. Note that Uii=0 by asymmetry and Uij=-Uji by definition. For a fixed i the sum [...]

These rank scores can now be used to test the null hypothesis of no treatment/exposure effect against a coherent alternative. Consider the poset test statistic [...]

The poset test was proposed by Rosenbaum in 1991 [7] and has been discussed further by the same author in [10] and [11]. His original proposal uses a different test statistic which is, however, algebraically equivalent to our version here, as can easily be seen using the device of Mantel [12]. The poset approach generalises several familiar nonparametric tests. If the outcome is one-dimensional and "<c" is defined as the ordinary inequality "<" [...]

In the preceding section we formulated a rank score statistic T to test the hypothesis P(yi <c yj) = P(yj <c yi), for i=1,...,N1, j=N1+1,...,N. A straightforward way to measure deviation from this null hypothesis is to consider the quantity
[...]

Both situations reflect a poor coherence of the expected result pattern and the observed data with respect to the given partial ordering "<c". The coherence coefficient [...]

The idea of this coherence coefficient resembles that of the correlation coefficients which describe the degree of joint variation of two variables. In fact, the definition of [...]

We can give an unbiased estimate of [...]

This is an unbiased estimator of [...] The variance of [...]

This quantity might be used to give an upper bound of the non-null variance, similar to known expressions for Kendall's τ, of the form

[...]

where k is some positive constant. Such a result can be used to compute conservative approximations of confidence intervals for [...]

4.1 An Example from a Clinical Trial

As a first example explaining and illustrating the concept of coherence in practical data analysis, we use data from a small nonrandomised clinical trial comparing two regimens for treating 26 premature neonates suffering from severe respiratory distress syndrome. Both treatment regimens involved the application of a natural porcine surfactant (Curosurf) as a surfactant replacement therapy. One regimen consisted of administering the surfactant dose early (i.e. within 15 hours after birth), whereas members of the other treatment group received their surfactant dose later (i.e. between 15 and 48 hours after birth). The intention of the trial was to analyse whether the severely diseased neonates can benefit from an early start of treatment. The primary endpoint for the evaluation was defined as survival of the patients up to 28 days after birth. However, important additional outcome variables were the total time on supplemental oxygen (O2) during these first four weeks and the acute effect of the therapy on the neonates' respiratory situation (measured by the fraction of inspired oxygen (FIO2) value at 24 hours after starting the surfactant therapy). Obviously, the information on the survival status dominates any information that can be drawn from the other two outcome variables, i.e. only among surviving patients do the duration on O2 and the FIO2 values at 24 hours provide meaningful pieces of information for analysing treatment effects. Given the three-dimensional outcome data, a coherent pattern of responses indicative of a beneficial effect of starting the treatment early would thus show a reduced proportion of deaths in this group combined with a shorter duration on O2 and lower FIO2 values at 24 hours among the survivors in this treatment group, always compared to patients of the late-treatment group.

Formally, X1i denotes the survival status (0 = alive, 1 = dead), X2i the total time on O2 (continuous variable, potential range: 0 - 672 hours) and X3i the FIO2 value at 24 hours (continuous variable, potential range: 0.21 - 1.0) of the i-th child. Then the outcome vectors [...]

where at least one of the inequalities [...]

The application of the poset test to the data of this small clinical trial yields a value of 84 for the test statistic T and an estimate of 1787.2 for its variance under H0.
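The standardisation reported next can be reproduced from these two numbers alone. The following is a minimal Python sketch, assuming that the standardised value is T divided by the square root of its null variance and that the two-sided p-value is taken from the standard normal distribution; the function name `standardise` is ours and purely illustrative.

```python
from math import erfc, sqrt


def standardise(t, var0):
    """Return the standardised statistic and its two-sided normal p-value.

    Assumes T has mean zero under H0 and is approximately normal.
    """
    z = t / sqrt(var0)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = erfc(abs(z) / sqrt(2))
    return z, p


# Values reported for the clinical trial: T = 84, null variance 1787.2.
z, p = standardise(84, 1787.2)
print(round(z, 3), round(p, 3))  # prints "1.987 0.047"
```
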
This gives a standardised T-value of 1.987, which leads to an asymptotic (two-sided) p-value of 0.047. Thus, there is a significant difference in the response pattern of outcomes between the two treatment groups in the direction of the coherent alternative, demonstrating that those in the early-treatment group benefit from this therapeutic regimen. The estimated coefficient of coherence takes a value of [...]

The results of this poset approach are now compared to a separate analysis of the three outcome variables. The proportion of deaths in the late-treatment group (1/7 = 14.3%) is slightly higher than in the early-treatment group (2/19 = 10.5%); however, due to the small numbers this difference is far from statistical significance. The analysis of the duration on O2 treats the values of dead subjects as censored observations. A comparison of the drastically different distributions in the two groups by an exact version of the logrank test yields a p-value of 0.01. The samples of FIO2 measurements at 24 hours show only minor differences, resulting in a p-value of 0.48 obtained by a Wilcoxon-Mann-Whitney test restricted to the subgroup of survivors. Thus, the univariate analysis revealed a significant benefit for the early-treatment group in only one of the three outcome variables. The strength of the poset approach is that it not only combines the evidence of the three outcome variables in a formal way but also gives the opportunity to look for a coherent pattern of results with respect to all three outcome factors. Given that our understanding of what a beneficial treatment effect has to look like is correct, the result of the poset analysis in our clinical trial strengthens and justifies the claim that the early-treatment regimen has beneficial effects when treating neonates with severe respiratory distress syndrome by surfactant replacement therapy. This finding, based on our small nonrandomised clinical trial, has since been confirmed in larger randomised trials [15]. It is now the accepted standard treatment regimen to start the surfactant replacement therapy in this clinical situation as soon as possible.

4.2 An Epidemiological Example

The second example illustrating the application of the techniques introduced in sections 2 and 3 deals with data from an observational study in occupational epidemiology.
Briefly, Morton et al. [8] examined the distribution of lead in the blood of children whose parents were employees in a factory that used lead in the production of batteries. Only children of employees were enrolled in the study; no sample of controls was available. The aim of the investigation was to analyse a potential relationship between the level of lead in the children's blood and the intensity of lead exposure at the parents' workplace. An important confounder of this relation, which was measured in the study, is the parents' individual hygiene practices, which can reduce the lead contamination of the children's home environment. Information on the parents' lead exposure was dichotomised into two groups of high and low exposure, respectively, whereas data on hygiene practices were available on an ordinal three-point scale. A coherent pattern in the data indicative of an effect of the parental lead contamination on the children's level of lead in the blood would look as follows: children in the group of highly exposed parents should have higher levels of blood lead than those in the low-exposure group, and simultaneously the lead values from children with parents showing poor hygiene practices should be higher than those in the medium hygiene category, which themselves should be higher than those in the good hygiene stratum. Thus, if yi = (X1i, X2i) denotes the pair of observations on the i-th child in the low-exposure group, consisting of the lead level (X1, continuous variable) and the parents' hygiene practices (X2, 0=poor, 1=medium, 2=good), and yj = (X1j, X2j) denotes
the data of child j in the high-exposure group, then if [...]

Given this definition of coherence for the data under study, the poset test statistic T comparing the high-exposure group (n=19) with the low-exposure group (n=15) attains a value of 120 and an estimated variance of 2745.5. Hence, the standardised value of T is 2.29, yielding an asymptotic (two-sided) p-value of 0.022. The estimate of [...]

Despite the small samples there is compelling evidence in this epidemiological study that the parents' lead exposure at the workplace affects their children's blood level of lead. The same data analysed with a simple Wilcoxon-Mann-Whitney test (ignoring the confounding influence of the parents' hygiene practices, which seems to be only modest) yield a similar p-value of 0.016. Thus, in this case, the application of the poset idea does not materially change the conclusion that can be drawn from the data. Separate two-sample comparisons in the strata defined by the confounder are not possible due to the limited sample sizes.

5. Discussion

In this paper, we have described a simple nonparametric approach to test for coherent alternatives. This methodology has its special domain in the analysis of nonrandomised studies, since coherence is of specific importance in nonrandomised investigations to support the claim that the observed effect is attributable to the treatment/exposure under study. The idea can, however, equally be applied to randomised experiments, offering the opportunity to look for coherent patterns of treatment effects in vector-valued outcome structures. In doing so, the poset test provides a simple alternative to conventional multivariate statistical techniques. In a randomised trial the result of the poset test can then be directly interpreted as reflecting the strength of evidence for a discrepancy in the response patterns between the groups attributable to the treatment. Contrary to the nonrandomised case, no further attempts to clarify the sensitivity of the results to hidden biases have to be made (given that the randomisation has been performed properly). In our opinion, the general idea of this approach has a high potential of practical applicability in clinical and epidemiological trials.
On the one hand, the mathematical and computational complexity of the methodology is very low; for moderate sample sizes the test statistics can even be computed using a pocket calculator. On the other hand, the interpretation of the test results is straightforward and gives useful application-oriented information on the topic of interest. Of course, the validity of the whole procedure with respect to providing supporting evidence for the presence of treatment/exposure effects depends critically on the correct specification of the coherent alternative. In other words, if our understanding of how some treatment/exposure affects the outcome variables is wrong and consequently the specified alternative does not adequately describe the pattern in outcome factors indicative of a causal treatment/exposure effect, the results of the poset test can be misleading. Furthermore, the concept of coherence is by no means the solution to all problems connected with nonrandomised studies. The problem of (overt and hidden) biases applies to the interpretation of poset test results as well, especially in the case of strongly correlated outcome variables, where the presence of a specific source of bias affecting one outcome factor is automatically carried over to the other ones. Rosenbaum [10] addressed the problem of hidden bias by extensive sensitivity analyses. He argued that in most cases there is a substantial gain in insensitivity to bias for the results of a test against a coherent alternative when compared to the sensitivity analyses for the individual outcomes used in formulating the coherent alternative.

The original poset test idea has been accompanied here by a straightforward suggestion to measure the degree of coherence present in the data, which is interpretable as the difference of two probabilities. Moreover, the proportion of decidable comparisons can give useful information on the appropriateness of the partial order relation. For practical use of the coherence measure [...]

Further topics to be addressed in future work on this issue are the extension to multiple group comparisons and the construction of an exact poset test for small sample sizes by deriving the finite-sample distribution of the test statistic T under the null hypothesis. Both areas seem to pose no serious difficulties, so that first results on the generalisation of the poset approach to these problems should be available soon.
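
As an illustration of the low computational burden noted in the discussion, the quantities of sections 2 and 3 can be sketched in a few lines of Python. The sketch is ours and rests on assumptions that the text above does not fix: the sign convention (Uij = +1 when yj <c yi), T taken as the sum of the rank scores over the first group, the null variance taken as the usual permutation variance of a sum of scores drawn without replacement, and the cross-group mean of Uij used as a simple stand-in for a difference of two probabilities. The componentwise order `dominates`, all function names and the four data points are hypothetical; the sketch is not claimed to reproduce the estimators or the values reported in section 4.

```python
def dominates(a, b):
    """Hypothetical example of "<c": a <= b in every component, strictly in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def u_matrix(ys, less_c=dominates):
    """U[i][j] = +1 if y_j <c y_i, -1 if y_i <c y_j, 0 if undecidable.

    The sign convention is an assumption; less_c must be asymmetric.
    """
    n = len(ys)
    u = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if less_c(ys[j], ys[i]):
                u[i][j] = 1
            elif less_c(ys[i], ys[j]):
                u[i][j] = -1
    return u


def poset_summary(ys, n1, less_c=dominates):
    """Rank scores, T, permutation null variance and cross-group summaries.

    The first n1 entries of ys form group 1, the remaining ones group 2.
    """
    n = len(ys)
    n2 = n - n1
    u = u_matrix(ys, less_c)
    q = [sum(row) for row in u]          # rank score of each unit
    t = sum(q[:n1])                      # assumed form of T
    # Variance of a sum of n1 scores drawn without replacement from q;
    # the scores sum to zero because U is antisymmetric.
    var0 = n1 * n2 / (n * (n - 1)) * sum(s * s for s in q)
    cross = [u[i][j] for i in range(n1) for j in range(n1, n)]
    return {
        "scores": q,
        "T": t,
        "var0": var0,
        "mean_u": sum(cross) / (n1 * n2),                     # difference of two proportions
        "decidable": sum(v != 0 for v in cross) / (n1 * n2),  # share of ordered pairs
    }


# Four made-up bivariate outcomes; the middle two cannot be ordered.
result = poset_summary([(0, 0), (1, 2), (2, 1), (3, 3)], n1=2)
print(result)
```

Because U is antisymmetric, the within-group terms cancel in this construction, so T equals the sum of Uij over all cross-group pairs. This is why the cross-group mean and the proportion of decidable comparisons come at no extra computational cost.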