American Academy of Orthotists & Prosthetists - Providing Better Care Through Knowledge
Home > JPO > 1996 Vol. 8, Num. 2 > pp. 45-49

RESEARCH FORUM--Methodology- Measurements, Part I: Principles and Theory

Thomas M. Gavin, CO

ABSTRACT

Measurement refers to the assessment, estimation, observation, evaluation, appraisal or judgment of an event. Measurement in research (also considered a dependent variable) is the process of assigning numerals to objects to represent quantities of characteristics according to certain rules. Those involved in conducting a study must choose the most appropriate measurement scale.

A study should contain one or more scales of measurement that meet the logical requirements of measurement. The least powerful measurement scale is the nominal scale, which consists of descriptive variables in no particular order. By contrast, ordinal scales have all of the requirements of nominal scales but also have the property of order. Nominal and ordinal scales usually are subject to the less powerful statistical tests such as the chi-square or the Mann-Whitney U tests. Interval and ratio scales are much more powerful than are nominal and ordinal scales because the variables provide more information about the phenomenon of interest. Parametric tests such as the t-test or ANOVA can be used for interval and ratio scales.

When conducting an experiment, researchers must take steps to ensure the statistical test is reliable and valid, minimize error and biases, and use measures that are precise and accurate. Accomplishing such tasks usually requires a multidisciplinary research team but can be done by the single researcher. Orthotic and prosthetic practitioners should become more involved in this type of research, using a multidisciplinary approach, since physicians are currently less active in O&P research than they were in the past.

Introduction

Research is the making of observations in a systematic fashion. Measurement refers to the assessment, estimation, observation, evaluation, appraisal or judgment of an event. Measurement in research (also considered a dependent variable) is the process of assigning numerals to objects to represent quantities of characteristics according to certain rules (1). To scientifically analyze the effect of an orthosis on low back pain or the effect of a new foot on a transtibial amputee's gait, meaningful measurements that assign numerals to the levels of back pain or the events of gait must be made so they may be analyzed statistically.

In the day-to-day clinical practices of orthotics and prosthetics, practitioners are exposed to a vast amount of information regarding the effect of O&P intervention on the quality of life of their patients. Orthotists and prosthetists make subjective or judgmental measurements to assess and compare gait patterns, back pain outcomes, seating stability, patient satisfaction, etc. Practitioners usually record in their clinical records data that may have meaning to them as clinicians but may not have any meaning to researchers.

For example, a practitioner's chart may read, "Fitted KAFO for genu varum; patient's knee alignment appears to be neutral on the coronal plane." These data indicate the KAFO is a clinical success; however, it is unknown how the alignment "neutral" compares to the contralateral limb or the affected limb before applying the KAFO. If a practitioner uses a hand-held goniometer, measures the alignment and writes on his or her chart, "Fitted KAFO for genu varum; patient's affected limb corrects from 10-degree genu varum to 5-degree genu valgum, and the contralateral knee measures at 5 degrees of genu valgum," then there is a comparison to the coronal knee alignment before and after orthotic intervention from the affected limb to the unaffected limb. A measurement instrument was used to gain a numerical value that can be useful to both a clinician and, with some validation, to a researcher.

To go beyond charting vital measurements that assess the effectiveness of an orthosis or prosthesis, the researcher may collect data in a more meaningful fashion by developing and validating an instrument so the data vital to the research question or problem may be collected from the target and control groups in an organized, reliable and valid fashion. The most common instrument for clinical research is the questionnaire or survey.

The questionnaire (or survey) usually is developed by the research team, which should be composed of the orthotist-prosthetist, a statistician or psychometrician, and a research methodologist. Technical specialists such as engineers, kinesiologists, physiologists, psychologists, etc. are helpful if specialized measurements are required.

While the orthotist-prosthetist alone may know what measurements are necessary to record an event, the questionnaire instrument should be developed objectively by someone in the research group who will not be overly optimistic and biased or ask misleading questions that may suggest the "right" answer to the subject. This approach will allow for objective data collection that will account for most, if not all, of the variables that may influence the outcome, thus providing an appropriate basis for statistical analysis (2).

Measurements

Researchers conducting a study must choose the most appropriate measurement scale. The test should contain one or more scales of measurement that meet the logical requirements of measurement. Figure 1 depicts the four most common scales of measurement from the least to the most powerful. The chosen measurement scale must contain precise and accurate values, and the instrument should be practical, valid and reliable (3).

  • Practicality: The instrument to be administered should be practical, easy to administer and as economical as possible. Scoring, recording, retrieving, reading and interpreting the test should be well-planned, easy and cost-effective.
  • Validity: Both external and internal validity are important. External validity refers to how well the variables designed for the study represent the phenomenon of interest (the event). How well does a fasting blood sugar level represent control of diabetes? How well does a circumducting gait represent dropfoot? How well does spasticity represent upper motor neuron syndromes?

    Internal validity refers to how well the actual measurements represent these variables. How well does the observed blood sugar level represent the true level? How well does the Cobb measurement represent scoliosis?

  • Reliability: Reliability refers to the stability or consistency of the obtained values. It is the degree to which a test yields the same results when given on two different occasions (intra-rater reliability) or by two different examiners to the same group of individuals (inter-rater reliability). Correlational studies commonly are used to determine test reliability. The logical requirements of measurements help assure reliability. Reliability is a necessary but not sufficient condition to prove the validity of a test. If a test is not reliable it cannot be valid. However, a test that is reliable is not necessarily valid.
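The correlational approach to reliability mentioned above can be sketched in a few lines. This is a minimal illustration, not a method taken from the article: the function name pearson_r and the two sets of rater readings are hypothetical, and the Pearson coefficient is only one of several statistics a research team might choose.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical Cobb angles (degrees) read by two examiners
# from the same five radiographs (inter-rater comparison).
rater_a = [22, 31, 40, 18, 27]
rater_b = [24, 30, 43, 17, 29]
print("inter-rater r:", round(pearson_r(rater_a, rater_b), 3))

# A rater who reads every curve exactly 10 degrees too high still
# correlates perfectly with the first rater.
rater_offset = [angle + 10 for angle in rater_a]
print("r with constant offset:", round(pearson_r(rater_a, rater_offset), 3))
```

The second comparison illustrates why reliability alone does not establish validity: a correlation coefficient is unaffected by a constant offset, so two sets of readings can agree in rank and spacing while one of them is consistently wrong.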

Scales

Figure 2 is a typical design of a study showing classification of measurements and the resulting information. The type of variable produced by the measurement determines the type of scale used. Some types of scales are quite simple but may not be appropriate for more informative and powerful statistics. Other scales may exceed the amount of information needed as well as exceed necessary power. Thus, unnecessary increases in the amount of time the research project requires also may result.

Nominal Scale

The lowest level of measurement is the nominal scale, also referred to as the classificatory scale (4). This scale consists of mutually exclusive categories, meaning once something is placed in one category it cannot be put into any other category.

The categories of a nominal scale can have either verbal or numeric labels. However, if numeric labels are used for any categories, the researchers cannot sensibly apply their knowledge of numbers to the categories except to say that different numbers represent different measures.

Examples of nominal variables are the numbers on football players' jerseys, which are used to identify each player. No two players on the same team have the same number; so one can say each number is different. However, if one player's number is 81, and another player's number is 55, subtracting or averaging the numerical values of these numbers is possible but would have no valuable meaning in this context. The names on the backs of these same football jerseys also are an example of nominal variables. Each name is different and can be used only to identify the player. Other examples of nominal variables are side of amputation (left or right) or type of scoliosis (idiopathic, neuromuscular or congenital).

The nominal scale contains named or labeled categories that are not ordered or ranked; within these categories, the frequency of observations may be tallied. This scale permits the two most basic measurement operations: classifying observations along a dimension and counting the frequency of the observations within each category.

The nominal scale is used to measure qualitative variables and yields frequency data that can be subjected to nonparametric statistical tests such as the chi-square test. The nominal scale is the weakest scale since it yields data that are not ordered and cannot be added or subtracted sensibly and therefore cannot be analyzed with the more powerful statistical procedures.
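The two operations a nominal scale permits, classifying and counting, and the chi-square statistic computed from the resulting frequencies, can be sketched as follows. The list of amputation sides is invented for illustration, the helper name chi_square_equal_expected is hypothetical, and the test assumes equal expected frequencies in every category, which a real study may not.

```python
from collections import Counter

def chi_square_equal_expected(observed_counts):
    """Chi-square statistic against equal expected frequencies per category."""
    total = sum(observed_counts)
    expected = total / len(observed_counts)
    return sum((obs - expected) ** 2 / expected for obs in observed_counts)

# Hypothetical nominal data: side of amputation for eight patients.
sides = ["left", "right", "left", "left", "right", "left", "left", "right"]

# Classify and count: the only arithmetic the labels themselves support.
counts = Counter(sides)
print(dict(counts))

# Frequencies, unlike the labels, can be analyzed numerically.
statistic = chi_square_equal_expected(list(counts.values()))
print("chi-square statistic:", statistic)
```

With two categories the statistic has one degree of freedom, so it would be compared against the tabulated critical value of 3.841 at the 0.05 level.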

Ordinal Scale

Ordinal scales have all of the requirements of nominal scales but also include the property of order. If the categories of a scale are ordered, they constitute an ordinal scale. For categories that use numbers, the numbers must correspond to the order of the categories. Although ordinal scales contain named or labeled categories that are ordered or ranked, they do not represent quantities. Nothing is known about the size of the interval between any two numerals. The ordinal scale permits the classification of observations along the dimension of interest and the counting of the frequency of observations within each category. This scale is used to measure qualitative variables that can be subjected to nonparametric statistical tests.

An example of an ordinal scale is the ranking of a race. The winner comes in first, "1," and the next one to finish is second, "2," etc. The order of numbers represents the order in which the contestants finished the race. In the race example, the smaller number represents the best finish.

In an alternate example of an ordinal scale, four restaurants can be ranked from most-liked to least-liked. They can be assigned the numbers 112, 88, 4 and 1, with 112 representing the most-liked and 1 representing the least-liked. As long as "the higher the number, the more liked" remains true, they are ranked on an ordinal scale.

Another example of an ordinal scale is the numerical ranking of three different types of prosthetic feet by a prosthetist or patient. He or she is using an ordinal scale if a "1" is assigned to foot A because it is the least-liked, a "2" is assigned to foot B because it is liked more than foot A but less than foot C, and a "3" is assigned to foot C because it is liked the best. The patient or practitioner may like foot B much more than foot A but only slightly prefer foot C to foot B. An ordinal scale has still been used since the components were assigned an ordered numeric value corresponding to how much each foot is liked; however, the interval or distance between the amount each foot is liked cannot be assumed to be equal and is an unknown quantity.

The ordinal scale is a stronger form of measurement than the nominal scale because variables are ordered and ranked in the former. Without known quantities between each variable, however, many of the more powerful statistical tests cannot be used with this scale. Ordinal data can be subjected to nonparametric tests such as the chi-square test. For more information on nonparametric tests see References 4 and 5.
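The restaurant example can be used to show why only order carries information on an ordinal scale. In this minimal sketch the function name ranks is hypothetical, and the second set of labels (4, 3, 2, 1) is an assumed alternative labeling of the same four restaurants.

```python
from statistics import mean

def ranks(scores):
    """Rank of each score, 1 = lowest; assumes there are no tied scores."""
    ordered = sorted(scores)
    return [ordered.index(score) + 1 for score in scores]

# The four restaurants labeled as in the text, then relabeled with
# consecutive integers that preserve the same order of preference.
original_labels = [112, 88, 4, 1]
relabeled = [4, 3, 2, 1]

# The ordering, which is all an ordinal scale conveys, is identical.
print(ranks(original_labels) == ranks(relabeled))

# The arithmetic means differ completely, so averaging the labels
# says nothing about how much the restaurants are liked.
print(mean(original_labels), mean(relabeled))
```

Because any order-preserving relabeling is equally legitimate, statistics that depend on the size of the numbers are not meaningful here, whereas rank-based nonparametric tests give the same answer under either labeling.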

Interval Scale

Interval scales have all the requirements of ordinal scales with the added benefit of a known, fixed quantity as the distance between items. An interval scale has a constant unit that makes the distance between values meaningful. The interval or distance between any two adjacent units on the scale is assumed to be equal to the interval between any other two adjacent units on the same scale. This scale has no fixed zero that represents a zero quantity of the dimension of interest.

The Fahrenheit temperature scale is the most common example of an interval scale since the difference between 20 and 21 degrees can be assumed to be equal to the difference between 100 and 101 degrees, and zero does not represent the absence of the dimension of interest, which is temperature.

More powerful statistical procedures can be used with interval scales because data obtained on this scale may be subjected to all mathematical operations except the forming of ratios. This scale allows the data to be subjected to more powerful parametric tests such as the t-test and analysis of variance (ANOVA) since it yields quantitative data. For more information on parametric tests see Reference 5.

Ratio Scale

A ratio scale has all of the properties of an interval scale plus a true zero point. The value of zero on a ratio scale indicates the complete absence of the characteristic of interest.

The importance of having a true zero point is that ratios of values on the scale can be meaningfully constructed. On the ratio scale, 10 truly is twice as much of the characteristic as 5.

For example, a 40-degree scoliotic curve is twice as large as a 20-degree curve, and an object that weighs 200 lbs is twice as heavy as an object that weighs 100 lbs. A ratio scale in these two examples requires the following to be true: 1) a zero-degree Cobb angle represents the absence of scoliosis; 2) zero lbs represents the absence of weight; and 3) each increment on the scale is equal. These statements are true in both instances since there are true zeros, and the increments are intervals that can be assumed to be of equal values.

However, it is not true that 20 degrees Fahrenheit is twice as warm as 10 degrees Fahrenheit even though the intervals are assumed to be equal. This is because zero degrees Fahrenheit does not represent the absence of temperature.
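One way to check whether a ratio is meaningful is to change the unit of measurement and see whether the ratio survives. The sketch below applies that check to the weight and temperature examples; the conversion to kilograms and to Celsius is an added illustration and the function names are hypothetical.

```python
from math import isclose

def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    """Convert pounds to kilograms."""
    return lb * 0.45359237

# Ratio scale: 200 lbs versus 100 lbs, before and after a unit change.
weight_ratio_lb = 200 / 100
weight_ratio_kg = pounds_to_kilograms(200) / pounds_to_kilograms(100)

# Interval scale: 20 degrees F versus 10 degrees F, before and after.
temp_ratio_f = 20 / 10
temp_ratio_c = fahrenheit_to_celsius(20) / fahrenheit_to_celsius(10)

# Equal intervals stay equal after conversion, even though ratios do not.
step_low = fahrenheit_to_celsius(21) - fahrenheit_to_celsius(20)
step_high = fahrenheit_to_celsius(101) - fahrenheit_to_celsius(100)

print("weight ratio preserved:", isclose(weight_ratio_lb, weight_ratio_kg))
print("temperature ratio preserved:", isclose(temp_ratio_f, temp_ratio_c))
print("temperature intervals equal:", isclose(step_low, step_high))
```

The weight ratio is unchanged because both units share the same true zero. The temperature ratio changes because the zero point of each temperature scale is arbitrary, while the equality of intervals is retained.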

Other examples of ratio scales are length, time, force, volume and area. Obtained ratio data may be subjected to all mathematical operations including ratio forming and therefore may be subjected to parametric tests such as the t-test or ANOVA since they are quantitative data.

Quantitative data such as interval or ratio data can be converted to categories on a qualitative scale. Such data can be characterized as above and below the mean or median (upper and lower half); within upper, middle or lower thirds; etc. However, categories on a qualitative scale cannot be converted to quantitative categories. The conversion from quantitative to qualitative measurement is useful when determining the relationship between measures on a quantitative scale and measures on a qualitative scale.
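The conversion from a quantitative to a qualitative scale described above can be sketched as a median split. The Cobb angles are hypothetical, the function name median_split is an assumption, and placing values equal to the median in the lower category is an arbitrary choice made for this illustration.

```python
from statistics import median

def median_split(values):
    """Label each value 'upper' or 'lower' relative to the sample median.

    Values equal to the median are labeled 'lower' (an arbitrary convention).
    """
    cut = median(values)
    return ["upper" if value > cut else "lower" for value in values]

# Hypothetical Cobb angles in degrees (ratio data).
cobb_angles = [18, 22, 27, 31, 40, 45]
categories = median_split(cobb_angles)
print(list(zip(cobb_angles, categories)))
```

The conversion works in one direction only: from the labels alone there is no way to recover whether an "upper" curve measured 31 or 45 degrees.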

Error

In all measurements there is error. For example, if a patient with idiopathic scoliosis returns to a clinic to learn her curve measures 27 degrees (whereas it measured 25 degrees three months earlier), the curve may be considered unchanged since the random error of the measurement is 5 degrees. However, if the curve measured 21 degrees three months prior to the 25-degree visit, the measurement has changed more than 5 degrees in a six-month period, meaning the curve is considered progressive since the measured difference is outside of the error range. Knowing the error of measurement is important in determining the meaning of a change and is important to determine measurement precision.
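The reasoning in the scoliosis example reduces to comparing an observed change with the known error of the measurement. The sketch below encodes that comparison; the function name is hypothetical and the default of 5 degrees is taken from the example above rather than from any general standard.

```python
def exceeds_measurement_error(earlier, later, error=5.0):
    """Return True when the change between two readings is larger than the error."""
    return abs(later - earlier) > error

# 25 degrees three months ago, 27 degrees today: within the error range.
print(exceeds_measurement_error(25, 27))

# 21 degrees six months ago, 27 degrees today: outside the error range.
print(exceeds_measurement_error(21, 27))
```

A change that does not exceed the error cannot be distinguished from measurement noise, which is why the first curve is considered unchanged and the second progressive.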

The overall degree to which measurements in a sample represent the phenomenon or event of interest in the population is a function of two sources of error: sampling error (6) and measurement error (7). Both of these errors have random (wrong result due to chance) and systematic (wrong result due to bias) components. Minimizing measurement error will enhance the validity of drawing inferences from a study by rendering that study relatively free of random error (precise). The greater the random error, the less precise the measurement.

Precision

A measurement that has nearly the same value each time it is taken is a precise measurement. Precision refers to the exactness of a measure. A ruler can give a precise measurement of length whereas an interview designed to measure quality of life is much more likely to produce values that vary from one occasion to the next.

The sensitivity of a scale increases with the number of possible values that can be reliably used. A questionnaire that uses a scale containing two reliable variables such as "like" and "dislike" is less sensitive than a scale that contains four reliable variables such as "strongly like," "somewhat like," "somewhat dislike" and "strongly dislike." If the scale could have 100 values of "like" that were reliable (which is unlikely), it would be even more sensitive and precise.

Precision has a major influence on the power of a study. The more precise the measurement, the greater the statistical power at a given sample size to estimate mean values and to test the hypotheses. The greater the error, the less precise the measurement.

For example, a scoliotic curve measured at 40 degrees is more precise than a curve measured at 40.35 degrees. Although using a scale that contains decimal units yields a scale with a greater number of variables, the value of 40.35 degrees would be more precise only if it could be reproduced with minimum variability. This is unlikely since there already are 5 degrees of variability when using integer units for the Cobb measurement method.

Three main sources of error reduce precision in measurements:

  • Observer variability refers to variability in measurement caused by the observer and includes the choice of words, body language, and voice inflections of the observer during the interview or the eye-hand coordination in using a mechanical instrument.
  • Subject variability refers to the intrinsic biological variability in the study subjects due to fluctuations in mood, barometric pressures, psychological stress and length of time since last medication. These factors are relevant considerations for a study involving measurements of the effect of an orthosis on low back pain. Frequently, depression that results from the pain itself will affect the precision of the measurement.
  • Instrument variability refers to variability in measurement due to fluctuations in the environment such as season change, temperature change and background noise. The precision of a variable is described as the standard deviation of repeated measurements.
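Describing precision as the standard deviation of repeated measurements can be illustrated with a short calculation. Both sets of readings below are invented for the purpose, and the function name precision_sd is hypothetical.

```python
from statistics import stdev

def precision_sd(repeated_readings):
    """Sample standard deviation of repeated measurements of the same quantity."""
    return stdev(repeated_readings)

# Hypothetical repeated Cobb-angle readings of one radiograph (degrees).
cobb_readings = [40, 42, 38, 41, 39]

# Hypothetical repeated ruler readings of one limb segment (centimeters).
ruler_readings = [40.1, 40.0, 40.1, 39.9, 40.0]

print("Cobb angle SD:", round(precision_sd(cobb_readings), 2))
print("ruler SD:", round(precision_sd(ruler_readings), 2))
```

The smaller the standard deviation, the more precise the measurement. The two values are in different units, so they describe each instrument separately rather than ranking one against the other.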

Accuracy

Being relatively free of systematic error (accurate) also will enhance the validity of drawing inferences from a study. The accuracy of a variable is the degree to which it actually represents what it is intended to represent. This has a major impact on the internal and external validity of a study (the degree to which observed findings lead to correct inferences about the phenomenon taking place in the study sample and in the universe).

Accuracy is different from precision; precision has to do with measuring a variable several times and coming up with nearly the same value each time, whereas accuracy involves determining how well a variable measurement compares with another accepted measurement.

If a hand-held goniometer were used to measure the knee-flexion angle in a knee-ankle-foot orthosis and after 20 measurements the goniometric reading was nearly equal on every measurement, this would be considered precise but not necessarily accurate. However, if the goniometric measurement were compared to a more sophisticated measurement using a Watsmart camera system measuring infrared sensors placed along the lateral aspect of the leg and thigh and both measurements were nearly equal, the goniometric measurement could then be considered accurate but not necessarily precise.
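The distinction between the two properties can be made concrete by computing them separately: spread among repeated readings for precision, and mean offset from an accepted reference for accuracy. The readings, the reference value of 30 degrees and the function name bias_and_spread are all hypothetical.

```python
from statistics import mean, stdev

def bias_and_spread(readings, reference):
    """Return (mean offset from the reference, standard deviation of readings)."""
    return mean(readings) - reference, stdev(readings)

# Hypothetical goniometer readings of knee flexion (degrees), with an
# assumed reference value of 30 degrees from a camera-based system.
goniometer_readings = [33, 34, 33, 34, 33]
reference_angle = 30

bias, spread = bias_and_spread(goniometer_readings, reference_angle)
print("bias (degrees):", round(bias, 2))
print("spread (degrees):", round(spread, 2))
```

In this invented data set the readings cluster tightly, so the measurement is precise, yet they sit several degrees above the reference, so it is not accurate. Repeating the measurement more often would not remove the offset; only comparison with the accepted measurement reveals it.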

Accuracy is a function of systematic error or bias. The greater the error, the less accurate the variable. The three main classes of measurement error have counterparts for accuracy: observer bias, subject bias and instrument bias.

  • Observer bias refers to consistent distortion, conscious or unconscious, in the perception or reporting of the measurement by the observer.

    For example, at the end of a long day an orthotist may measure the dorsiflexion attitude of 10 leg casts and note they all seem to be perpendicular. The next morning repetition of the same measurements may indicate some were 93 degrees, some were 95 degrees, and very few were actually perpendicular.

    Since the desired outcome was a 90-degree dorsiflexion angle and the orthotist did not want to continue modifying the cast, an unconscious error was made in the direction of the orthotist's desire (or bias) to have an outcome of 90 degrees. This is an example of error due to observer bias; the orthotist had a goal of achieving a perpendicular outcome.

  • Subject bias refers to consistent distortion of the measurement by the study subject.

    For example, a patient may fight a manual muscle test and bias the result by blocking the true range of dorsiflexion he or she is capable of attaining. This will yield an inaccurate measurement of range of motion of the ankle. Subject bias also may occur when only the perception of the phenomenon of interest has changed, as in the selective recall or reporting of an event.

  • Instrument bias can result from faulty function of a mechanical or electromechanical device used for measurement. A force-sensitive resistor used to measure forces in orthoses may creep after a prolonged usage and drift downward (record less than actual force) as the relaxation of the resistor between tests no longer matches the creep. This will yield faulty data until the resistor is recalibrated.

    Instrument bias also can result from misleading questions on a questionnaire. For example, a question could be stated, "Does the energy-storing foot called the Carbon Copy II improve your gait more than the SACH foot?" This question implies one foot is better than the other since more description (energy-storing) is given for the Carbon Copy II, suggesting it is the "right" answer. A more appropriate way to ask the question is to label the feet "A" and "B" and ask, "Which foot do you feel gives you the best gait?"

Conclusion

When deciding to conduct research using a questionnaire or survey, it's important to use a multidisciplinary team approach to develop unbiased questions. O&P professionals usually are biased toward one outcome over another. Practitioners usually want the patient to achieve deviation-free gait, contracture control, pain reduction, fracture stabilization, etc.

While the practitioner should help formulate the types of questions to be asked, the methodologist or statistician should actually word the questions to avoid bias. The less the methodologist knows about O&P, the more objective the questionnaire.

Once the instrument has been formulated and edited by the methodologist, it must be validated. Unfortunately, not many valid questionnaires are available for O&P applications; so once a new questionnaire is validated, it should be made available to other practitioners with the same interest. The methodologist and practitioner will perform the validation, yielding data that will allow for adjustment of the method to minimize error.

A power analysis is subsequently made to determine the sample size (number of subjects needed in the study). The sample size, type of study and type of variable will then determine the appropriate statistical test to be done.
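As a rough illustration of how a power analysis links measurement variability to sample size, the sketch below uses the common normal-approximation formula for comparing the means of two equal groups. The formula choice, the function name, and the default significance level (0.05, two-sided) and power (0.80) are assumptions for this example, not values given in the article; a statistician would select the method appropriate to the actual study design.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a difference in means.

    Normal approximation for two equal groups with a common standard
    deviation `sd` and a two-sided significance level `alpha`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Hypothetical study: detect a 5-degree difference in mean Cobb angle.
print(sample_size_per_group(sd=5, difference=5))

# The same difference measured with twice the variability.
print(sample_size_per_group(sd=10, difference=5))
```

Because the required sample size grows with the square of the standard deviation, doubling the variability of the measurement roughly quadruples the number of subjects needed. This is the sense in which a more precise measurement gives greater statistical power at a given sample size.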

While it is possible to conduct these types of studies without the help of a methodologist, any study is best done by a multidisciplinary group. This will ensure error is reduced by minimizing biases and using proper validation techniques and statistical methods.

The results of a well-planned multidisciplinary study are difficult to refute and are well-received by O&P colleagues as well as professionals in other medical disciplines.


THOMAS M. GAVIN, CO, is an ABC-certified orthotist at the Orthotic-Prosthetic Center of BioConcepts Inc. in Burr Ridge, Ill. He also works in the department of orthopaedic surgery at Loyola University Chicago in Maywood, Ill., and in the orthopaedic biomechanics laboratory at the Rehabilitation Research and Development Center of the VA Hospital in Hines, Ill.

References:

  1. Stevens SS. Mathematics, measurement and psychophysics. In: Stevens SS [ed.]. Handbook of experimental psychology. New York: Wiley, 1951.
  2. Gavin TM, Patwardhan AG. Getting started in prosthetic-orthotic research. JPO 1993;5:4:39-41.
  3. Hulley SB, Cummings SR. Planning the measurements: precision and accuracy. In: Hulley SB, Cummings SR [eds]. Designing clinical research. Baltimore: Williams & Wilkins, 1988.
  4. Siegel S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.
  5. Hulley SB, Cummings SR. Planning for data management and analysis. In: Hulley SB, Cummings SR [eds]. Designing clinical research. Baltimore: Williams & Wilkins, 1988.
  6. Lunsford TR, Lunsford BR. The research sample, part II: sample size. JPO 1995;7:4:137-41.
  7. Nolinske T. Survey research and measurement error. JPO 1995;7:2:68-78.


 


 

Copyright © American Academy of Orthotists & Prosthetists (AAOP)
All rights reserved. See disclaimer
