RESEARCH FORUM -- Simple Regression: A Statistical Technique in the Investigation of a Relationship Between Two Variables
C. Don Rossi, MS
ABSTRACT
Simple regression models sometimes are employed by researchers to analyze the relationship between a dependent
variable and an independent variable.
The most common relationship analyzed is a linear association expressed by the following equation:
y = b0 + b1x
where y represents the dependent variable, x represents the
independent variable, b0 represents a constant, and b1 represents the slope of the linear fit.
This article demonstrates the use of simple regression applied to gait analysis. The relationship of energy consumption in milliliters of oxygen per kilogram per minute (ml/kg/min) as a function of gait velocity in meters per minute (m/min) is studied.
Introduction
In the field of orthotics and prosthetics, investigators may
research the relationship between energy consumption of a
subject and speed in level walking. For example, researchers may want to study a sample of subjects to examine the relationship between energy consumption and
speed within three age groups: children, teens and adults.
Regression is a statistical technique for the investigation of
such a relationship.
Regression in its purest form pertains to the situation
in which the investigator chooses the values of the independent variable. A classic example is in the laboratory when the experimenter predetermines the doses of an agent to be administered to laboratory animals. He administers these doses and observes some
measures of response. His chief interest is with regression analysis for determination of a dose-response relationship, where the dose is the independent variable, and the response the dependent variable. The assay of many pharmacological and biological substances depends on just such a procedure and
the use of regression analysis for the quantification of
the dose-response relationship (1).
Application
To illustrate the technique, Table A
presents data on energy consumption and speed of walking for three age groups
(19 subjects in each age group). The objective with these
data is to examine the relationship between two variables
to determine in particular what happens to energy consumption as speed increases for each of the three age
groups.
Regression studies the change in the mean value of one
variable (the dependent variable) as the other variable (the
independent variable) changes. We begin by finding the
straight line that best fits the data so it closely approximates
the true relationship between the speed, x, and energy consumption, y, within a specific age group. A two-dimensional plot known as a scatter diagram shows the (x,y) pairs of
observations for each of the three groups (see Figures 1a,
1b and 1c). Following are four statistical assumptions required to determine the best-fitting line:
- independence;
- the straight-line assumption;
- the variance of y is the same for any fixed value of x;
- for any fixed value of x, y has a normal distribution.
Method
The method of least squares, a mathematical process of estimating parameters in an equation, provides the best-fitting straight line represented by the following equation:
$$y = b_0 + b_1 x$$
where b0 is a constant called the y intercept, and b1 is a constant called the slope of the regression line and denotes the
change in y brought about by a unit change in x. The principle of least squares involves minimizing the sum of
squares for error, resulting in a set of normal equations. The
solutions to these equations yield the unique value for the
slope b1 and the intercept b0 for the regression line associated with a set of bivariate data.
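The term to be minimized is the sum of squares for error
(SSE) given by the following equation:
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
where n is the number of (x, y) pairs.
The normal equations that result from the minimization
process are as follows:
$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$$
Solving these two equations for b0 and b1, we get
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of the sets of x and y values.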
These estimates, b0 (the y intercept) and b1 (the slope), generate the straight lines (linear fits, see Figures 2a, 2b and 2c) overlaid on the plotted points of the scatter diagrams to exhibit the theoretical relationship between the speed and energy for each age group.
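The least-squares estimates described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (fit_line, sse) are hypothetical, and the speed and energy values are made up for the example; they are not the data from Table A.

```python
from statistics import mean


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x about its mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Sum of cross-products of deviations of x and y about their means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y intercept
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squares for error of the line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustration data (NOT taken from Table A).
speed = [40.0, 60.0, 80.0, 100.0, 120.0]   # m/min
energy = [9.0, 12.5, 16.0, 20.5, 24.0]     # ml/kg/min

b0, b1 = fit_line(speed, energy)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse(speed, energy, b0, b1):.3f}")
```

For these made-up points the fit comes out at roughly b0 = 1.2 and b1 = 0.19. Moving either estimate away from the fitted value increases the SSE, which is what "best-fitting" means here.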
Results
What can be inferred from these linear models? Note b1,
the coefficient of the x term, represents the rate of change
in y per unit change in x; in other words, it is the change in energy consumption (ml/kg/min) for each increase of 1 m/min in speed.
We see the children's rate is 0.188; the teens' rate is 0.191;
and the adults' rate is 0.162.
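To make the interpretation of the slopes concrete, the short sketch below converts each reported rate into the change in energy consumption implied by a given change in speed. The three slopes are the values reported above; the 10 m/min speed increase and the function name energy_change are assumptions chosen only for illustration.

```python
# Slopes (b1) reported in this article, in (ml/kg/min) per (m/min).
slopes = {"children": 0.188, "teens": 0.191, "adults": 0.162}


def energy_change(slope, speed_change):
    """Change in energy consumption implied by a linear fit for a change in speed."""
    return slope * speed_change


delta_speed = 10.0  # hypothetical increase in walking speed, m/min

for group, b1 in slopes.items():
    print(f"{group}: +{energy_change(b1, delta_speed):.2f} ml/kg/min")
```

Under the linear model, a 10 m/min increase in speed corresponds to about 1.88, 1.91 and 1.62 ml/kg/min more energy consumption for children, teens and adults, respectively.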
The linear model assumption provides rates that are constant over the range of x. Nonlinear models would be more
appropriate for teens and adults; y increases more rapidly
where x exceeds 100 m/min (see Figures 2b and 2c). Nevertheless,
the simple linear regression allows, for example, a pre-/post-investigation
of the effect of different kinds of brace intervention on rates of
energy consumption with respect to gait velocity within age
groups.
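One informal way to see when a straight line is inadequate is to inspect the residuals (observed y minus fitted y). The sketch below is not part of the original analysis; it assumes made-up data in which energy rises faster at higher speeds, and the helper names are hypothetical. When the true relationship curves upward, the residuals from a linear fit tend to be positive at both ends of the speed range and negative in the middle.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data (NOT from Table A): energy rises faster at higher speeds.
speed = [40.0, 60.0, 80.0, 100.0, 120.0, 140.0]   # m/min
energy = [10.0, 12.0, 15.0, 19.0, 25.0, 33.0]     # ml/kg/min

res = residuals(speed, energy)
for xi, ri in zip(speed, res):
    print(f"speed {xi:5.0f}: residual {ri:+.2f}")
```

A systematic sign pattern like this one (positive, negative in the middle, positive again) suggests curvature that a straight line cannot capture, whereas residuals scattered without pattern around zero are consistent with the straight-line assumption.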
Summary
Regression is a mathematical technique of establishing a
relationship between variables. The variables generally are
considered to be dependent and independent, i.e., one is
dependent on the other. The regression identifies this dependency in terms of a mathematical relationship. Moreover, regression can indicate a "quasicausal" relationship
between the variables.
The simplest form of regression is a "first-order regression," where a straight line can be drawn through the points
on graphs where the variables have been plotted. This type
of regression is said to be linear. More complicated relationships require nonlinear regressions, which appear as
curved-line plots of the variables.
This article demonstrates the use of simple (linear) regression to relate the energy expended (ml of oxygen) during gait to walking velocity. This technique is
of value to the practitioner because, given the correlation
between energy expenditure and velocity, it is possible to
clinically measure velocity and estimate the associated
energy expenditure without the cost of an expensive laboratory to measure oxygen consumption.
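The clinical use described above amounts to plugging a measured velocity into the fitted line. The sketch below assumes the adult slope reported in the Results (0.162); the intercept and the valid speed range are placeholders invented for illustration, because this article does not report them, and the function name predict_energy is hypothetical.

```python
def predict_energy(velocity, b0, b1, valid_range):
    """Estimate energy consumption (ml/kg/min) from gait velocity (m/min).

    Refuses to extrapolate outside the range of speeds used to fit the line.
    """
    low, high = valid_range
    if not low <= velocity <= high:
        raise ValueError(f"velocity {velocity} is outside the fitted range {valid_range}")
    return b0 + b1 * velocity


ADULT_SLOPE = 0.162                  # reported in this article
HYPOTHETICAL_INTERCEPT = 2.0         # placeholder, NOT reported in this article
HYPOTHETICAL_RANGE = (40.0, 120.0)   # placeholder speed range, m/min

estimate = predict_energy(80.0, HYPOTHETICAL_INTERCEPT, ADULT_SLOPE, HYPOTHETICAL_RANGE)
print(f"estimated energy consumption: {estimate:.2f} ml/kg/min")
```

The range check reflects a general caution with fitted lines: the estimate is only supported for velocities within the range observed when the line was fitted.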
With this type of data analysis it is possible to clinically
assess the energy expenditure of an orthosis or prosthesis.
Optimally, the practitioner could make design changes, reassess the velocity and fine-tune the intervention.
C. DON ROSSI, MS, is assistant professor of biostatistics in the department of physical medicine and rehabilitation at Baylor College of Medicine, Houston, TX 77030.
References:
1. Colton, T. Statistics in medicine. Boston: Little, Brown and Co., 1974.