RESEARCH FORUM: Methodology
Measurements, Part I:
Principles and Theory
Thomas M. Gavin, CO
ABSTRACT
Measurement refers to the assessment, estimation, observation, evaluation, appraisal or judgment of an event. Measurement in research (also considered a dependent variable)
is the process of assigning numerals to objects to represent
quantities of characteristics according to certain rules. Those
involved in conducting a study must choose the most appropriate measurement scale.
A study should contain one or more scales of measurement that meet the logical requirements of measurement. The
least powerful measurement scale is the nominal scale, which
consists of descriptive variables in no particular order. By
contrast, ordinal scales have all of the requirements of nominal scales but also have the property of order. Nominal and
ordinal scales usually are subject to the less powerful statistical tests such as the chi-square or the Mann-Whitney U tests.
Interval and ratio scales are much more powerful than are
nominal and ordinal scales because the variables provide
more information about the phenomenon of interest. Parametric tests such as the t-test or ANOVA can be used for interval and ratio scales.
When conducting an experiment, researchers must take
steps to ensure the statistical test is reliable and valid, minimize error and biases, and use measures that are precise and
accurate. Accomplishing such tasks usually requires a multidisciplinary research team but can be done by the single researcher. Orthotic and prosthetic practitioners should become more involved in this type of research, using a multidisciplinary approach, since physicians are currently less active in O&P research than they were in the past.
Introduction
Research is the making of observations in a systematic
fashion. Measurement refers to the assessment, estimation,
observation, evaluation, appraisal or judgment of an event.
Measurement in research (also considered a dependent
variable) is the process of assigning numerals to objects to
represent quantities of characteristics according to certain
rules (1). To scientifically analyze the effect of an orthosis
on low back pain or the effect of a new foot on a transtibial amputee's gait, meaningful measurements that assign
numerals to the levels of back pain or the events of gait
must be made so they may be analyzed statistically.
In the day-to-day clinical practices of orthotics and prosthetics, practitioners are exposed to a vast amount of information regarding the effect of O&P intervention on the
quality of life of their patients. Orthotists and prosthetists
make subjective or judgmental measurements to assess and
compare gait patterns, back pain outcomes, seating stability, patient satisfaction, etc. Practitioners usually record in
their clinical records data that may have meaning to them
as clinicians but may not have any meaning to researchers.
For example, a practitioner's chart may read, "Fitted
KAFO for genu varum; patient's knee alignment appears
to be neutral on the coronal plane." These data indicate the
KAFO is a clinical success; however, it is unknown how the
alignment "neutral" compares to the contralateral limb or
the affected limb before applying the KAFO. If a practitioner uses a hand-held goniometer, measures the alignment and writes on his or her chart, "Fitted KAFO for genu
varum; patient's affected limb corrects from 10-degree
genu varum to 5-degree genu valgum, and the contralateral knee measures at 5 degrees of genu valgum," then the chart
compares coronal knee alignment before and
after orthotic intervention and between the affected and unaffected limbs. A measurement instrument was used to gain
a numerical value that can be useful to both a clinician and,
with some validation, to a researcher.
To go beyond charting vital measurements that assess
the effectiveness of an orthosis or prosthesis, the researcher
may collect data in a more meaningful fashion by developing and validating an instrument so the data vital to the
research question or problem may be collected from the
target and control groups in an organized, reliable and
valid fashion. The most common instrument for clinical research is the questionnaire or survey.
The questionnaire (or survey) usually is developed by the
research team, which should be composed of the orthotist-prosthetist, a statistician or psychometrician, and a research
methodologist. Technical specialists such as engineers, kinesiologists, physiologists, psychologists, etc. are helpful if specialized measurements are required.
While the orthotist-prosthetist alone may know what measurements are necessary to record an event, the questionnaire instrument should be developed objectively by someone in the research group who will not be overly optimistic
and biased or ask misleading questions that may suggest the
"right" answer to the subject. This approach will allow for
objective data collection that will account for most, if not all,
of the variables that may influence the outcome, thus providing an appropriate basis for statistical analysis (2).
Measurements
Researchers conducting a study must choose the most appropriate measurement scale. The test should contain one
or more scales of measurement that meet the logical requirements of measurement. Figure 1
depicts the four most
common scales of measurement from the least to the most
powerful. The chosen measurement scale must contain precise and accurate values, and the instrument should be
practical, valid and reliable (3).
Scales
Figure 2
is a typical design of a study showing classification
of measurements and the resulting information. The type of
variable produced by the measurement determines the
type of scale used. Some types of scales are quite simple but
may not be appropriate for more informative and powerful statistics. Other scales may provide more information and power than the study needs,
which can unnecessarily increase the amount of time the research
project requires.
Nominal Scale
The lowest level of measurement is the nominal scale, also
referred to as the classificatory scale (4). This scale consists
of mutually exclusive categories, meaning once something is
placed in one category it cannot be put into any other category.
The categories of a nominal scale can have either verbal
or numeric labels. However, if numeric labels are used for
any categories, the researchers cannot sensibly apply their
knowledge of numbers to the categories except to say that
different numbers represent different measures.
One example of a nominal variable is the number on a football
player's jersey, which is used to identify the player. No
two players on the same team have the same number; so one
can say each number is different. However, if one player's
number is 81, and another player's number is 55, subtracting
or averaging the numerical values of these numbers is possible but would have no valuable meaning in this context. The
names on the backs of these same football jerseys also are an
example of nominal variables. Each name is different and can
be used only to identify the player. Other examples of nominal variables are side of amputation (left or right) or type of
scoliosis (idiopathic, neuromuscular or congenital).
The nominal scale contains named or labeled categories
that are not ordered or ranked; within these categories, the
frequency of observations may be tallied. This scale permits
the two most basic measurement operations: classifying observations along a dimension and counting the frequency of
the observations within each category.
The nominal scale is used to measure qualitative variables and yields frequency data that can be subjected to
nonparametric statistical tests such as the chi-square test.
The nominal scale is the weakest scale since it yields data
that are not ordered and cannot be added or subtracted
sensibly and therefore cannot be analyzed with the more
powerful statistical procedures.
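The two operations a nominal scale permits, classifying and counting, can be sketched in a few lines of Python. The patient sample and the equal-frequency null hypothesis below are assumptions made only for illustration; the function computes the standard Pearson chi-square statistic from the tallied frequencies.

```python
from collections import Counter

# Hypothetical sample: side of amputation recorded for 20 patients.
sides = ["left"] * 13 + ["right"] * 7

# Classify the observations and count the frequency in each category.
counts = Counter(sides)

def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumed null hypothesis for this sketch: left and right are equally likely.
observed = [counts["left"], counts["right"]]
expected = [len(sides) / 2] * 2
statistic = chi_square_gof(observed, expected)

print(dict(counts))
print(statistic)
```

With one degree of freedom, the statistic would have to exceed about 3.84 to be significant at the 0.05 level, so this hypothetical sample would not show a difference in frequency between sides.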
Ordinal Scale
Ordinal scales have all of the requirements of nominal
scales but also include the property of order. If the categories of a scale are ordered, they constitute an ordinal
scale. For categories that use numbers, the numbers must
correspond to the order of the categories. Although ordinal
scales contain named or labeled categories that are ordered
or ranked, they do not represent quantities. Nothing is
known about the size of the interval between any two numerals. The ordinal scale permits the classification of observations along the dimension of interest and the counting
of the frequency of observations within each category. This
scale is used to measure qualitative variables that can be
subjected to nonparametric statistical tests.
An example of an ordinal scale is the ranking of a race.
The winner comes in first, "1," and the next one to finish is
second, "2," etc. The order of numbers represents the order
in which the contestants finished the race. In the race example, the smaller number represents the best finish.
In an alternate example of an ordinal scale, four restaurants can be ranked from most-liked to least-liked. They
can be assigned the numbers 112, 88, 4 and 1, with 112 representing the most-liked and 1 representing the least-liked.
As long as "the higher the number, the more liked" remains
true, they are ranked on an ordinal scale.
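A short Python sketch can make the point concrete: any set of labels that preserves the order yields the same ranking, while the gaps between labels change with the labeling and so carry no information. The helper name ranks and the second set of labels are illustrative assumptions.

```python
def ranks(scores):
    """Rank 1 for the highest score, 2 for the next, and so on (no ties assumed)."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

restaurant_labels = [112, 88, 4, 1]  # labels used in the text
relabeled = [4, 3, 2, 1]             # a different labeling with the same order

# Both labelings produce the identical ranking.
same_ranking = ranks(restaurant_labels) == ranks(relabeled)

# The gaps between adjacent labels depend entirely on the labeling chosen.
gaps_original = [a - b for a, b in zip(restaurant_labels, restaurant_labels[1:])]
gaps_relabeled = [a - b for a, b in zip(relabeled, relabeled[1:])]

print(same_ranking)
print(gaps_original, gaps_relabeled)
```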
Another example of an ordinal scale is the numerical
ranking of three different types of prosthetic feet by a prosthetist or patient. He or she is using an ordinal scale if a
"1" is assigned to foot A because it is the least-liked, a "2"
is assigned to foot B because it is liked more than foot A
but less than foot C, and a "3" is assigned to foot C because it is liked the best. The patient or practitioner may
like foot B much more than foot A but only slightly prefer foot C to foot B. An ordinal scale has still been used
since the components were assigned an ordered numeric value corresponding to how much each foot is liked;
however, the interval or distance between the amount
each foot is liked cannot be assumed to be equal and is an
unknown quantity.
The ordinal scale is a stronger form of measurement than
the nominal scale because variables are ordered and ranked
in the former. Without known quantities between each variable, however, many of the more powerful statistical tests
cannot be used with this scale. Ordinal data can be subjected to nonparametric tests such as the chi-square test. For
more information on nonparametric tests see References 4
and 5.
Interval Scale
Interval scales have all the requirements of ordinal scales
with the added benefit of a known, fixed quantity as the distance between items. An interval scale has a constant unit
that makes the distance between values meaningful. The interval or distance between any two adjacent units on the
scale is assumed to be equal to the interval between any
other two adjacent units on the same scale. This scale has
no fixed zero that represents a zero quantity of the dimension of interest.
The Fahrenheit temperature scale is the most common
example of an interval scale since the difference between 20
and 21 degrees can be assumed to be equal to the distance
between 100 and 101 degrees, and zero does not represent
the absence of the dimension of interest, which is temperature.
More powerful statistical procedures can be used with interval scales because data obtained on this scale may be subjected to all mathematical operations except the forming of
ratios. This scale allows the data to be subjected to more
powerful parametric tests such as the t-test and analysis of
variance (ANOVA) since it yields quantitative data. For
more information on parametric tests see Reference 5.
Ratio Scale
A ratio scale has all of the properties of an interval scale plus
a true zero point. The value of zero on a ratio scale indicates
the complete absence of the characteristic of interest.
The importance of having a true zero point is that ratios
of values on the scale can be meaningfully constructed. On
the ratio scale, 10 truly is twice as much of the characteristic as 5.
For example, a 40-degree scoliotic curve is twice as large
as a 20-degree curve, and an object that weighs 200 lbs is
twice as heavy as an object that weighs 100 lbs. A ratio scale
in these two examples requires the following to be true: 1) a
zero-degree Cobb angle represents the absence of scoliosis;
2) zero lbs represents the absence of weight; and 3) each increment on the scale is equal. These statements are true in
both instances since there are true zeros, and the increments
are intervals that can be assumed to be of equal values.
However, it is not true that 20 degrees Fahrenheit is twice
as warm as 10 degrees Fahrenheit even though the intervals
are assumed to be equal. This is because zero degrees
Fahrenheit does not represent the absence of temperature.
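The Fahrenheit example can be checked numerically. The sketch below converts the readings to kelvin, a temperature scale with a true zero; kelvin is not mentioned in the text and is brought in here only as an illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin, a scale with a true zero."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15

# Equal intervals: differences are meaningful on the Fahrenheit scale.
equal_intervals = (21 - 20) == (101 - 100)

# Ratios are not: the Fahrenheit numbers suggest "twice as warm,"
# but on a true-zero scale the two temperatures differ only slightly.
naive_ratio = 20 / 10
kelvin_ratio = fahrenheit_to_kelvin(20) / fahrenheit_to_kelvin(10)

print(equal_intervals, naive_ratio, round(kelvin_ratio, 3))
```

The ratio on the true-zero scale comes out at roughly 1.02 rather than 2, which is why ratios should not be formed from interval data.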
Other examples of ratio scales are length, time, force, volume and area. Obtained ratio data may be subjected to all
mathematical operations including ratio forming and
therefore may be subjected to parametric tests such as the
t-test or ANOVA since it yields quantitative data.
Quantitative data such as interval or ratio data can be
converted to categories on a qualitative scale. Such data
can be characterized as above and below the mean or median (upper and lower half); within upper, middle or lower
thirds; etc. However, categories on a qualitative scale cannot be converted to quantitative categories. The conversion
from quantitative to qualitative measurement is useful
when determining the relationship between measures on a
quantitative scale and measures on a qualitative scale.
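A minimal sketch of this one-way conversion, using hypothetical Cobb angles, is a median split. Once the values are collapsed into categories, the original degree values cannot be recovered from the labels.

```python
from statistics import median

# Hypothetical Cobb angles in degrees (ratio-scale data).
cobb_angles = [12, 18, 25, 27, 33, 40, 46]

cut = median(cobb_angles)

# Collapse each quantitative value into one of two ordered categories.
categories = [
    "above median" if angle > cut else "at or below median"
    for angle in cobb_angles
]

print(cut)
print(categories)
```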
Error
In all measurements there is error. For example, if a patient
with idiopathic scoliosis returns to a clinic to learn her
curve measures 27 degrees (whereas it measured 25 degrees three months earlier), the curve may be considered
unchanged since the random error of the measurement is 5
degrees. However, if the curve measured 21 degrees three
months prior to the 25-degree visit, the measurement has
changed more than 5 degrees in a six-month period, meaning the curve is considered progressive since the measured
difference is outside of the error range. Knowing the error
of measurement is important in determining the meaning
of a change and is important to determine measurement
precision.
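The decision rule in the scoliosis example can be written out as a small Python sketch. The function name is an illustrative assumption; the 5-degree error and the curve values come from the example above.

```python
MEASUREMENT_ERROR_DEG = 5  # random error of the measurement, as stated in the text

def exceeds_error(earlier_deg, later_deg, error_deg=MEASUREMENT_ERROR_DEG):
    """Return True when the measured change is larger than the measurement error."""
    return abs(later_deg - earlier_deg) > error_deg

three_month_change = exceeds_error(25, 27)  # 2-degree change, within the error
six_month_change = exceeds_error(21, 27)    # 6-degree change, outside the error

print(three_month_change, six_month_change)
```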
The overall degree to which measurements in a sample
represent the phenomenon or event of interest in the population is a function of two sources of error: sampling error
(6) and measurement error (7). Both of these errors have
random (wrong result due to chance) and systematic
(wrong result due to bias) components. Minimizing measurement error will enhance the validity of drawing inferences from a study by rendering that study relatively free of
random error (precise). The greater the random error, the
less precise the measurement.
Precision
A measurement that has nearly the same value each time it
is taken is a precise measurement. Precision refers to the
exactness of a measure. A ruler can give a precise measurement of length whereas an interview designed to measure
quality of life is much more likely to produce values that
vary from one occasion to the next.
The sensitivity of a scale increases with the number of possible values that can be reliably used. A questionnaire that
uses a scale containing two reliable variables such as "like"
and "dislike" is less sensitive than a scale that contains four
reliable variables such as "strongly like," "somewhat like,"
"somewhat dislike" and "strongly dislike." If the scale could
have 100 values of "like" that were reliable (which is unlikely), it would be even more sensitive and precise.
Precision has a major influence on the power of a study.
The more precise the measurement, the greater the statistical power at a given sample size to estimate mean values
and to test the hypotheses. The greater the error, the less
precise the measurement.
For example, a scoliotic curve measured at 40 degrees is
more precise than a curve measured at 40.35 degrees. Although using a scale that contains decimal units yields a
scale with a greater number of variables, the value of 40.35
degrees would be more precise if it could be reproduced
with minimum variability. This is unlikely since there already are 5 degrees of variability when using integer units
for the Cobb measurement method.
Three main sources of error reduce precision in measurements:
- Observer variability refers to variability in measurement caused by the observer and includes the choice of
words, body language, and voice inflections of the observer
during the interview or the eye-hand coordination in using
a mechanical instrument.
- Subject variability refers to the intrinsic biological variability in the study subjects due to fluctuations in mood,
barometric pressures, psychological stress and length of
time since last medication. These factors are relevant considerations for a study involving measurements of the effect of an orthosis on low back pain. Frequently, depression
that results from the pain itself will affect the precision of
the measurement.
- Instrument variability refers to variability in measurement due to fluctuations in the environment such as season
change, temperature change and background noise. The
precision of a variable is described as the standard deviation of repeated measurements.
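Describing precision as the standard deviation of repeated measurements can be sketched as follows. The two sets of readings are hypothetical; the smaller standard deviation indicates the more precise set.

```python
from statistics import stdev

# Hypothetical repeated readings of the same knee angle, in degrees.
readings_steady = [30, 31, 29, 30, 30, 31, 29]
readings_variable = [30, 35, 26, 31, 28, 34, 26]

# Sample standard deviation of each set of repeated measurements.
sd_steady = stdev(readings_steady)
sd_variable = stdev(readings_variable)

print(round(sd_steady, 2), round(sd_variable, 2))
print(sd_steady < sd_variable)
```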
Accuracy
Being relatively free of systematic error (accurate) also will
enhance the validity of drawing inferences from a study.
The accuracy of a variable is the degree to which it actually represents what it is intended to represent. This has a major impact on the internal and external validity of a study
(the degree to which observed findings lead to correct inferences about the phenomenon taking place in the study
sample and in the universe).
Accuracy is different from precision; precision has to do
with measuring a variable several times and coming up with
nearly the same value each time, whereas accuracy involves
determining how well a variable measurement compares
with another accepted measurement.
If a hand-held goniometer were used to measure the
knee-flexion angle in a knee-ankle-foot orthosis and after
20 measurements the goniometric reading was nearly equal
on every measurement, this would be considered precise
but not necessarily accurate. However, if the goniometric
measurement were compared to a more sophisticated measurement using a Watsmart camera system measuring infrared sensors placed along the lateral aspect of the leg and
thigh and both measurements were nearly equal, the goniometric measurement could then be considered accurate
but not necessarily precise.
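The distinction can be illustrated with a short Python sketch in which accuracy is summarized as the mean offset from an accepted reference value and precision as the spread of the readings. All numbers and names below are hypothetical.

```python
from statistics import mean, stdev

reference_deg = 30.0  # hypothetical value from the accepted reference system

# Hypothetical goniometer readings of the same knee-flexion angle, in degrees.
tight_but_offset = [34.8, 35.1, 35.0, 34.9, 35.2]        # precise, not accurate
scattered_but_centered = [27.0, 33.5, 29.0, 31.5, 29.0]  # accurate, not precise

def bias(readings, reference):
    """Systematic error: mean reading minus the reference value."""
    return mean(readings) - reference

for label, readings in [("tight", tight_but_offset),
                        ("scattered", scattered_but_centered)]:
    print(label, round(bias(readings, reference_deg), 2), round(stdev(readings), 2))
```

The first set has a small spread but sits about 5 degrees from the reference; the second is centered on the reference but varies by several degrees from reading to reading.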
Accuracy is a function of systematic error or bias. The
greater the error, the less accurate the variable. The three
main classes of measurement error have counterparts for
accuracy: observer bias, subject bias and instrument bias.
- Observer bias refers to consistent distortion, conscious
or unconscious, in the perception or reporting of the measurement by the observer.
For example, at the end of a long day an orthotist may
measure the dorsiflexion attitude of 10 leg casts and note
they all seem to be perpendicular. The next morning repetition of the same measurements may indicate some were
93 degrees, some were 95 degrees, and very few were actually perpendicular.
Since the desired outcome was a 90-degree dorsiflexion
angle and the orthotist did not want to continue modifying
the cast, an unconscious error was made in the direction of
the orthotist's desire (or bias) to have an outcome of 90 degrees. This is an example of error due to observer bias; the
orthotist had a goal of achieving a perpendicular outcome.
- Subject bias refers to consistent distortion of the measurement by the study subject.
For example, a patient may fight a manual muscle test and
bias the result by blocking the true range of dorsiflexion he
or she is capable of attaining. This will yield an inaccurate
measurement of range of motion of the ankle. Subject bias
also may occur when only the perception of the phenomenon of interest has changed, as in the selective recall or reporting of an event.
- Instrument bias can result from faulty function of a mechanical or electromechanical device used for measurement. A force-sensitive resistor used to measure forces in
orthoses may creep after prolonged use and drift downward (record less than actual force) as the relaxation of the
resistor between tests no longer matches the creep. This will
yield faulty data until the resistor is recalibrated.
Instrument bias also can result from misleading questions
on a questionnaire. For example, a question could be stated,
"Does the energy-storing foot called the Carbon Copy II
improve your gait more than the SACH foot?" This question implies one foot is better than the other since more description (energy-storing) is given for the Carbon Copy II,
suggesting it is the "right" answer. A more appropriate way
to ask the question is to label the feet "A" and "B" and ask,
"Which foot do you feel gives you the best gait?"
Conclusion
When deciding to conduct research using a questionnaire
or survey, it's important to use a multidisciplinary team approach to develop unbiased questions. O&P professionals
usually are biased toward one outcome over another. Practitioners usually want the patient to achieve deviation-free
gait, contracture control, pain reduction, fracture stabilization, etc.
While the practitioner should help formulate the types of
questions to be asked, the methodologist or statistician
should actually word the questions to avoid bias. The less
the methodologist knows about O&P, the more objective
the questionnaire.
Once the instrument has been formulated and edited by
the methodologist, it must be validated. Unfortunately, not
many valid questionnaires are available for O&P applications; so once a new questionnaire is validated, it should be
made available to other practitioners with the same interest. The methodologist and practitioner will perform the
validation, yielding data that will allow for adjustment of
the method to minimize error.
A power analysis is subsequently made to determine the
sample size (number of subjects needed in the study). The
sample size, type of study and type of variable will then determine the appropriate statistical test to be done.
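As a rough illustration of what a power analysis involves, the sketch below uses the common normal-approximation formula for comparing two group means, n = 2((z_alpha/2 + z_beta) * SD / difference)^2 per group. The formula choice, the function name and the example values are assumptions for illustration only; a statistician would select the method appropriate to the actual study design.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # value corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return ceil(n)

# Hypothetical measurement with a 5-degree standard deviation.
n_large_effect = sample_size_per_group(sd=5, difference=5)
n_small_effect = sample_size_per_group(sd=5, difference=2.5)

print(n_large_effect, n_small_effect)
```

Halving the difference to be detected roughly quadruples the required sample, and the same holds for doubling the standard deviation, which shows how a less precise measurement drives up the sample size needed for the same power.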
While it is possible to conduct these types of studies
without the help of a methodologist, any study is best done
by a multidisciplinary group. This will ensure error is reduced by minimizing biases and using proper validation
techniques and statistical methods.
The results of a well-planned multidisciplinary study are
difficult to refute and are well-received by O&P colleagues
as well as professionals in other medical disciplines.
THOMAS M. GAVIN, CO, is an ABC-certified orthotist at the
Orthotic-Prosthetic Center of BioConcepts Inc. in Burr Ridge, Ill.
He also works in the department of orthopaedic surgery at Loyola
University Chicago in Maywood, Ill., and in the orthopaedic biomechanics laboratory at the Rehabilitation Research and Development Center of the VA Hospital in Hines, Ill.
References:
1. Stevens SS. Mathematics, measurement and psychophysics. In: Stevens SS [ed.]. Handbook of experimental psychology. New York: Wiley, 1951.
2. Gavin TM, Patwardhan AG. Getting started in prosthetic-orthotic research. JPO 1993;5:4:39-41.
3. Hulley SB, Cummings SR. Planning the measurements: precision and accuracy. In: Hulley SB, Cummings SR [eds]. Designing clinical research. Baltimore: Williams & Wilkins, 1988.
4. Siegel S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.
5. Hulley SB, Cummings SR. Planning for data management and analysis. In: Hulley SB, Cummings SR [eds]. Designing clinical research. Baltimore: Williams & Wilkins, 1988.
6. Lunsford TR, Lunsford BR. The research sample, part II: sample size. JPO 1995;7:4:137-41.
7. Nolinske T. Survey research and measurement error. JPO 1995;7:2:68-78.