RESEARCH FORUM--Making Inferences in
Research
Michael B. Raney, PHD, CO
ABSTRACT
The concept of making inferences from research data is discussed in this article. Three hypothetical examples of research projects in O&P are developed. In each example, the
emphasis is on drawing inferences from research data rather
than the statistical methods used.
The first research example compares patient ratings of two
spinal orthoses using the t-test for independent means. A hypothetical data set is used to illustrate common statistical inferences made from the analysis. Practical inferences also are
examined.
The second research example uses a one-way analysis of
variance (ANOVA) and post-test comparisons to evaluate
four lengths of a wrist-hand orthosis for carpal tunnel syndrome. A hypothetical data set is analyzed, and common statistical and practical inferences are illustrated.
The third research example uses the method of multiple
regression analysis. This example introduces the concept of
significant explainer variables, percentage of variance explained and prediction models. In each case, appropriate statistical and practical inferences are made from the hypothetical data.
The three research examples distinguish between the important concepts of statistical inference and inferences of
practical significance. These two key concepts are developed
and discussed in some detail.
Introduction
O&P practitioners encounter many important questions
during the course of performing their normal duties. For
example, which custom knee orthosis provides the best rotational control? Is a custom TLSO body jacket better than
a prefabricated one for postoperative care? An understanding of the methods of research design and statistical
analysis provides tools for finding answers to these important questions in an unbiased and scientifically accepted
manner.
During the research process, many inferences, or conclusions, are drawn from the data, giving the data meaning
and practicality. Data have little value without these inferences. Knowledge of statistics and research methods establishes guidelines for making statistical inferences while clinical knowledge provides the basis for making practical inferences from the data and analysis.
The following three research examples illustrate some of
the ways researchers infer from the data and show how
these inferences help answer research questions. The data
used in these examples are hypothetical and were generated for instructional purposes only.
t-Test for Comparing Two Means
A physician calls his local orthotist and asks, "Which is the
better orthosis for mild lumbar compression fractures: the
cruciform-style hyperextension orthosis or the lumbosacral
(LS) corset?" The orthotist notes that in the past physicians
have prescribed the cruciform-style hyperextension orthosis and LS corset in nearly equal proportions for this type
of injury. Since there is no clear answer, the orthotist and
the physician decide to research the subject.
First, they design a questionnaire to help the patients explore their ratings of the orthosis with respect to comfort,
ease of use, pain reduction and other factors. A straight line
marked with assigned values ranging from 1 to 10 is constructed, with 1 representing extremely dissatisfied and 10
representing extremely satisfied. The patients are asked to
mark the point on the line that best summarizes their responses on the survey. The researchers measure the length
of the line as a quantified measure of satisfaction/dissatisfaction to be used in the analysis.
The next 100 patients with mild to moderate lumbar
compression fractures are selected as subjects for the study
and are randomly assigned to either the cruciform-style hyperextension orthosis group or the LS corset group (50 per
group). After wearing the orthosis for six weeks, each patient is asked to complete the satisfaction survey. His or her
rating score of 1 to 10 is recorded along with his or her
group identification (cruciform-style=1; LS corset=2).
Next, descriptive statistics are calculated for the two
groups (see Table A).
The mean satisfaction ratings for the two groups indicate
a higher average rating for the cruciform-style orthosis
(cruciform-style=6.15, corset=4.57). However, analysis of
the data using a t-test for independent means is necessary
to determine if that difference is statistically significant or
due to chance (1-3). In addition, the standard deviation
(variation) is larger for the corset group, meaning their responses were more varied (cruciform-style=1.56,
corset=1.97). Similarly, a real difference in the variation of
the two groups cannot be inferred without first performing
the appropriate test of statistical significance (1,4).
The t-test assumes equal variances (standard deviation
squared) for both groups. This assumption is tested using
Levene's test for equality of variances (5,6). This test yields
an F-ratio of 2.847, which is not statistically significant at the
p=.05 level. An F-ratio is a ratio of two variances used to determine if the difference between the two variances is statistically significant (1). Because this test gives no evidence that
the variances for the cruciform-style and corset groups differ, the variances can be treated as
equal, and the researchers can proceed with the t-test to determine if the mean satisfaction ratings differ significantly.
The t-test yields a t-value of 4.458, which is statistically
significant well beyond the p=.05 level. From this result, it
can be inferred that the difference between the means for
the cruciform-style and corset groups is unlikely to be a chance difference. Because the patients were randomly assigned to the two groups, the statistically significant difference can be attributed to design differences in the two orthoses. The patients in the cruciform-style group rate their satisfaction significantly higher than those in the corset group.
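The t-value can be approximately reproduced from the summary statistics quoted above. The following Python sketch is illustrative only: the function name pooled_t is hypothetical, and because the means and standard deviations are rounded to two decimals, the result (about 4.45) differs slightly from the reported 4.458.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for independent means, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df

# Rounded summary statistics from the text (cruciform-style vs. LS corset).
t_value, df = pooled_t(6.15, 1.56, 50, 4.57, 1.97, 50)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the critical value of the t distribution for the chosen significance level, which statistical software reports as a p-value.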
Once it has been established that the difference in means
is statistically significant, the more important question of
practical significance must be addressed (7). As long as the true means differ at all, a statistically
significant difference almost always can be achieved by making the
sample size large enough. There is even an equation to estimate how large a sample is necessary to assure a statistically significant difference (7). (A statistically significant
difference simply indicates the difference in means is
greater than would be expected due to chance alone.)
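The dependence of statistical significance on sample size can be sketched with a common normal-approximation formula for comparing two means. This is not necessarily the equation given in reference 7; the function name n_per_group, the 80-percent power target and the standard deviation of 1.78 (roughly the pooled value for the two groups) are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a mean difference
    `delta` with a two-sided test (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) * sd / delta) ** 2)

# Assumed SD of 1.78; 1.58 is the observed difference in the example.
n_observed_diff = n_per_group(delta=1.58, sd=1.78)
# A much smaller (hypothetical) difference needs a far larger sample.
n_small_diff = n_per_group(delta=0.50, sd=1.78)
print(n_observed_diff, n_small_diff)
```

Under these assumptions, a difference as large as the one observed needs only a modest sample, while a small difference requires roughly ten times as many patients, which is why statistical significance alone says little about clinical value.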
Practical significance, on the other hand, is a judgment by
the researcher that the difference in means is large enough
to be of some practical or clinical value. The mean satisfaction scores of 6.15 for the cruciform-style group and 4.57
for the corset group show a difference of 1.58 on a scale of
1 to 10, enough to lift the rating from a slightly negative
rating to a positive rating. An increase of 1.58 points equals
15.8 percent of the 10-point scale maximum. The
researcher probably would conclude the difference in
means is large enough to be of some practical significance
in the life of the patient wearing the orthosis.
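The size of the difference can be expressed in more than one way, and the choice affects how large it looks. The short Python sketch below reproduces the 15.8-percent figure and, as an assumption not taken from the article, also shows the same difference relative to the nine-point range of the scale and relative to the corset group's mean.

```python
cruciform_mean = 6.15
corset_mean = 4.57
scale_min, scale_max = 1, 10

difference = cruciform_mean - corset_mean

# Figure used in the text: difference as a share of the scale maximum.
pct_of_scale_max = 100.0 * difference / scale_max
# Alternative (assumption): share of the usable range, 1 to 10 = 9 points.
pct_of_scale_range = 100.0 * difference / (scale_max - scale_min)
# Alternative (assumption): relative to the lower-rated group's mean.
pct_relative_to_corset = 100.0 * difference / corset_mean

print(f"{pct_of_scale_max:.1f}% of scale maximum")
print(f"{pct_of_scale_range:.1f}% of scale range")
print(f"{pct_relative_to_corset:.1f}% relative to corset mean")
```

Whichever expression is used, the judgment of practical significance still rests on clinical knowledge rather than on the arithmetic itself.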
Based on this limited study, the physician and orthotist
might conclude the cruciform-style hyperextension orthosis is a better orthosis than the LS corset as judged by patients who wear it. However, from the relatively mediocre
mean ratings of each group it could be inferred ample room
exists for improvement in orthotic design for mild to moderate lumbar compression fractures.
One-way Analysis of Variance (ANOVA)
A major manufacturer of prefabricated orthoses has noticed sales of prefabricated static wrist-hand orthoses
(WHO) for carpal tunnel syndrome (CTS) have risen dramatically during the past five years. However, this manufacturer's sales have not kept pace with this growth. The
manufacturer has received considerable feedback indicating its WHO, which is six inches in length, is too short. The
manufacturer, in conjunction with local physicians and orthotists, decides to fund a study to establish the optimum
length for a WHO for CTS.
Optimum length is defined in terms of a significant decrease in symptoms in correlation with an increase in
length of the WHO. The four lengths to be used in this
study are six, seven, eight and nine inches.
The next 200 adult patients with CTS are randomly assigned to wear one of the four lengths of WHO, resulting in
four groups of 50 patients each. After four weeks of wearing the WHO, each patient is asked to rate his or her reduction in symptoms on a scale of 1 to 10, with 1=no symptom relief and 10=total symptom relief. Symptoms include
pain, burning, tingling and numbness.
A straight line marked with values ranging from 1 to 10
is constructed. Each patient is asked to mark the point on
the line that best represents his or her rating for symptom
relief as described above. The researcher measures the
length of the line and records the rating for that patient.
One-way ANOVA generally is used to determine if two
or more means differ significantly from one another (1,8,9).
In this study, the four means of the four WHO length
groups will be compared. With one-way ANOVA, a significant F-ratio indicates two or more of the means differ significantly from each other, but it does not specifically identify which means differ from which. A post-test comparison
method called Tukey's honestly significant difference
(HSD) can be used to compare each mean to every other
mean to determine where the significant differences occur
(2,3,5).
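The F-ratio in a one-way ANOVA is the ratio of the variance between group means to the variance within groups. The Python sketch below shows that calculation; the function name one_way_anova_f is hypothetical, and the three small groups are toy numbers invented for illustration, not the study data in Table B.

```python
def one_way_anova_f(groups):
    """Return (F-ratio, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of ratings around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Toy ratings only (NOT the hypothetical study data).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio, df_between, df_within = one_way_anova_f(toy_groups)
print(f_ratio, df_between, df_within)
```

A large F-ratio means the group means are spread further apart than the variation within groups would predict; post-test comparisons such as Tukey's HSD then locate the specific differences.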
After all data were collected for the study, a one-way
ANOVA was performed, and descriptive statistics were
calculated (see Table B). The means for symptom relief
show some differences, but those differences cannot be interpreted until the appropriate statistical test of significance has been performed (in this case, one-way ANOVA).
Similarly, the standard deviations show some differences;
so the variances must be tested for statistically significant
differences using the Levene statistic (5,6). The Levene statistic tests for homogeneity of variance. A statistically significant difference in variances requires an adjustment to
the F-ratio in the ANOVA calculations.
One-way ANOVA assumes homogeneity of variances
for the WHO length groups being analyzed. In testing for
homogeneity, a Levene statistic of .802 was obtained with a
level of significance of p=.494. This indicates the variances
(standard deviation squared) for the four groups are homogeneous. Thus, the one-way ANOVA now can be performed.
The one-way ANOVA with four WHO length groups indicates an F-ratio of 69.7, which is statistically significant
well beyond the p=.05 level. This significant F-ratio indicates one or more statistically significant differences exist
among the four group means. To identify which means differ significantly from which, post-test comparisons were
performed using Tukey's HSD (see Table C).
Table C presents a comparison of each mean with every
other mean and shows the differences. The last column is
the significance column (calculated p-value) and shows only one difference that is not statistically significant: the difference between groups 3 and 4 (p=.809). This result indicates the increases in WHO length from six to seven inches
and from seven to eight inches yield statistically significant
increases in symptom relief. The increase in length from
eight to nine inches did not yield a statistically significant
increase in symptom relief. Since the lack of statistical significance means the difference between groups 3 and 4 could be
due to chance rather than a real difference, it can be inferred that an increase in WHO length from eight to nine
inches is not warranted with respect to symptom relief.
Before any inferences are made about an increase from
six to seven inches or seven to eight inches in length, the issue of practical significance must be examined (7). Table B shows a mean of 3.14 for the six-inch group and a mean of
4.99 for the seven-inch group. The difference between
means is 1.85. On a rating scale of 1 to 10, this equals
18.5 percent of the scale maximum. The inference
could be made that this level of symptom relief has practical significance to the patient and that an increase in WHO
length to seven inches is justified.
Table B shows a mean of 6.84 for the eight-inch group.
The difference between the seven-inch group and the
eight-inch group also is 1.85, or 18.5 percent of the 10-point
scale maximum. As before, this increase in symptom relief would
be judged of practical significance to the patient. Overall,
the inference could be made that an increase in WHO
length from six to eight inches is warranted. However, a further increase to nine inches in length would not be justified
since the increase from eight to nine inches yields no statistically significant increase in symptom relief.
Multiple Regression with Two Independent Variables
A team of certified orthotists seeks information about
which variables are important in understanding or explaining the initial acceptance process of patients required to
wear orthoses. Some patients easily adjust to wearing orthoses, and others never do.
The variables the team is trying to understand or explain
are called dependent variables. In this study, the dependent
variable is the acceptance rating, which is a measure of
how well each patient has accepted an orthosis during the
first month of use. Each patient is asked to rate how well he
or she has accepted the orthosis on a scale of 1 to 5, with
1="great difficulty accepting the orthosis" and 5="accepted
it very easily." A straight line marked with values ranging
from 1 to 5 is constructed, and the patient is asked to mark
the point on the line that best describes his or her acceptance rating as described above. The researcher measures
the length of the line to get a rating for that patient.
The independent variables are those factors used to explain or account for the variance in the dependent variable.
The independent variables also can be used to predict how
well a patient will accept the orthosis. In this study, the team
decides to examine age and gender as the two independent
variables.
Over the next year, 1,327 adult patients wearing a variety
of qualifying orthoses for at least one month were included
in the study. Multiple regression analysis was performed on
the data with acceptance rating as the dependent variable
and age and gender as the two independent variables. The
multiple regression analysis (10,11) was used to:
- determine which independent variables (age and gender) are statistically significant in explaining the variance in
the dependent variable,
- determine what percentage of variance in the dependent variable (acceptance rating) is explained by each of
the independent variables, individually and cumulatively,
and
- set up a model (equation) that can be used to predict
how well a patient will accept an orthosis based on his or
her age and gender.
The results of the multiple regression analysis are found
in Table D. The first independent variable entered into the
regression model is age. The F-ratio of 458.9 for age is statistically significant well beyond the p=.05 level. From this
value, it can be inferred that age explains a statistically significant amount of the variance in the dependent variable,
acceptance rating. The R-square column in Table D indicates the cumulative proportion of variance of the dependent variable explained as each independent variable is entered into the regression model (1,4). The R-square value of
.258 for age in Table D indicates 25.8 percent of the variance in acceptance rating is explained by age. The statistically significant F-ratio also indicates that the variable age
should be included in the prediction model.
The second independent variable, gender, also is statistically significant (F=391.7) well beyond the p=.05 level.
From this figure, it can be inferred that gender also is a statistically significant variable in explaining the variance in
the dependent variable, acceptance rating. The R-square
value of .372 for gender shows that when age and gender
are combined in the regression model, a total of 37.2 percent of the variance in acceptance rating is explained by
these two variables together. The increase in R-square
from .258 to .372 shows the gender variable explains 11.4
percent of the variance above the 25.8 percent previously
explained by the variable age. The significant F-value for
the gender variable means it also should be included in the
prediction model.
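The R-square arithmetic above can be checked with a few lines of Python. This is an illustrative sketch: the function name model_f_ratio is hypothetical, and the formula is the standard overall F-ratio for a regression with k independent variables, which is assumed (not confirmed by the article) to be how the F-ratios in Table D were produced. Because the R-square values are rounded to three decimals, the results only approximate the reported 458.9 and 391.7.

```python
N_PATIENTS = 1327        # sample size reported in the text
R2_AGE = 0.258           # R-square after age is entered
R2_AGE_GENDER = 0.372    # R-square after age and gender are entered

def model_f_ratio(r_square, n, k):
    """Overall F-ratio for a regression with k independent variables."""
    explained_per_df = r_square / k
    unexplained_per_df = (1.0 - r_square) / (n - k - 1)
    return explained_per_df / unexplained_per_df

# Additional proportion of variance explained when gender is added.
gender_increment = R2_AGE_GENDER - R2_AGE

f_age = model_f_ratio(R2_AGE, N_PATIENTS, k=1)
f_age_gender = model_f_ratio(R2_AGE_GENDER, N_PATIENTS, k=2)

print(f"gender adds {100 * gender_increment:.1f} percent of variance")
print(f"F (age only) is about {f_age:.1f}")
print(f"F (age and gender) is about {f_age_gender:.1f}")
```

The close agreement with the reported values suggests, under the stated assumption, that each F-ratio in Table D tests the model as a whole at that step.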
Both age and gender were found to be statistically significant in explaining a portion of the variance in acceptance
rating. Having passed the hurdle of statistical significance,
the more important question now can be asked: What is the
practical or clinical significance of age and gender as explainers of the dependent variable, acceptance rating?
Since age explains or accounts for 25.8 percent of the
variance, it can be considered an important variable in explaining acceptance rating and would have practical significance. The gender variable accounts for only an additional 11.4 percent
of the variance in acceptance rating when combined with
age. This would be considered less important than age but
probably would still be considered of practical significance
or importance in explaining the variance in acceptance rating. The team of certified orthotists probably would infer
both age and gender are important variables as explanations of the variance in the dependent variable, acceptance
rating. If other independent variables were found to explain higher percentages of variance in the dependent variable, they might replace age and/or gender.
While the age and gender variables have explained 37.2
percent of the variance in the variable acceptance rating,
62.8 percent of the variance remains unexplained. It can be
inferred that other independent variables are needed in the
model. It is the responsibility of the astute researcher/orthotist to determine what these other variables might be
and identify them through further research.
The result of the regression analysis provides the regression coefficients for the prediction model. Prediction in this
context refers to the process of estimating the value of the
dependent variable from an equation that includes the value of each independent variable. In the prediction equation, each independent variable is multiplied by its respective regression coefficient, and a constant term also is
added in. The prediction equation for the regression example above is taken from regression analysis output not
shown here. The prediction equation is:
acceptance rating = (-.059)(age) + (-1.338)(gender) + 7.536
This equation would allow substitution of values for age
and gender (1=male, 2=female) for a new patient and prediction of an acceptance rating for the patient before he or
she ever tried the orthosis. However, the prediction is subject to error. The more variance in the dependent variable
explained by the independent variables, the more accurate
the prediction model. Also, the more variance explained,
the better the understanding of the dependent variable and
its relationship to the independent variables.
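The prediction equation can be applied to a new patient as in the Python sketch below. The coefficients and the gender coding are taken from the equation above; the function name predict_acceptance and the 40-year-old example patients are hypothetical.

```python
AGE_COEFFICIENT = -0.059
GENDER_COEFFICIENT = -1.338
CONSTANT = 7.536

def predict_acceptance(age, gender_code):
    """Predicted acceptance rating; gender_code is 1 for male, 2 for female."""
    if gender_code not in (1, 2):
        raise ValueError("gender_code must be 1 (male) or 2 (female)")
    return AGE_COEFFICIENT * age + GENDER_COEFFICIENT * gender_code + CONSTANT

# Hypothetical new patients, both 40 years old.
male_rating = predict_acceptance(40, 1)
female_rating = predict_acceptance(40, 2)
print(f"male: {male_rating:.3f}, female: {female_rating:.3f}")
```

For these two hypothetical patients the equation predicts ratings of about 3.84 and 2.50. Since 62.8 percent of the variance remains unexplained, such predictions should be read as rough estimates.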
Summary and Conclusions:
This brief introduction to making inferences in research
shows examples of two types of inferences. The first is a statistical inference, which generally involves interpreting the
outcome of a test of statistical significance. With statistical
inference, a judgment is made regarding a difference between means, a difference between variances, the magnitude of a regression coefficient or one of many other possible statistical judgments.
With the second type of inference, a judgment is made as
to the practical or clinical significance associated with a statistically significant outcome.
Statistical significance is a prerequisite to inferences
about practical significance. However, practical significance
is the more important area for making inferences from the
research.
Special knowledge of statistics and research is necessary
when calculating results for statistical inferences. However,
personal computers and relatively inexpensive statistical
software packages have greatly simplified this process. In
addition, when making inferences of practical significance,
the orthotist and healthcare provider must have essential
clinical knowledge and experience.
MICHAEL E. RANEY PHD, CO, practices at Conner Brace Co.,
3829 Medical Pkwy., Austin, TX 78756. Prior to entering the O&P
field, he worked as a research analyst and computer programmer
for 12 years.
References:
1. McNemar Q. Psychological statistics, 4th ed. New York: John
Wiley and Sons Inc., 1969.
2. Daniel WW. Biostatistics: a foundation for analysis in the
health sciences, 5th ed. New York: John Wiley and Sons Inc., 1991.
3. Snedecor GW, Cochran WG. Statistical methods, 6th ed.
Ames, Iowa: The Iowa State University Press, 1967.
4. Hays WL. Statistics, 4th ed. San Francisco: Holt, Rinehart
and Winston Inc., 1988.
5. Winer BJ. Statistical principles in experimental design, 2nd
ed. New York: McGraw-Hill Book Co., 1971.
6. SPSS Inc. SPSS base 7.0 for Windows: users' guide. Chicago:
SPSS Inc., 1996.
7. Borg WR, Gall MD. Educational research: an introduction,
3rd ed. New York: Longman Inc., 1979:409.
8. Zolman JF. Biostatistics: experimental design and statistical
inference. New York: Oxford University Press, 1993.
9. Bland M. An introduction to medical statistics. New York:
Oxford University Press, 1987.
10. Draper NR, Smith H. Applied regression analysis, 2nd ed.
New York: John Wiley and Sons Inc., 1981.
11. Kerlinger FN, Pedhazur EJ. Multiple regression in behavioral research. New York: Holt, Rinehart and Winston Inc., 1973.