The health care industry does not compete on the basis of accepted, valid measures of clinical quality. Experts have argued that quality improvement should be undertaken by health care facilities on behalf of their patients and communities, not solely as an accreditation exercise. Health care quality instruments are rarely developed with the care and precision taken for granted in other industries; thus, we do not know what we are measuring. Without this knowledge, we cannot manage effectively. Even if some providers establish the reliability and construct validity of their quality measures, they are not using a common unit, making it difficult or impossible to know if we are managing the same construct or if we are doing so with the same degree of effectiveness in different settings. The orthotics and prosthetics (O&P) service delivery system urgently needs a means of measuring and enhancing health care quality given the fragmented nature of the industry. No industry-wide instrument is accepted for quality improvement. This article describes the development of the Orthotics and Prosthetics Users' Survey (OPUS), an instrument that evaluates the typical goals of O&P providers: reducing activity limitations, enhancing quality of life, and assuring patient satisfaction with services and devices. Widespread adoption of an instrument such as OPUS would help make quality measurement "Job Zero" in the improvement of care processes and outcomes for orthotic and prosthetic users.
Outcome measurement has been the subject of several articles in O&P trade publications just as it has been throughout health care.1–3 A recurring concern is the industry's need to quantify outcomes as a means of justifying the cost of services to payers and of responding to growing pressure from consumer groups. The industry would benefit from a set of instruments that can accurately and conveniently measure important and relevant outcomes. Such instruments could provide many benefits: assisting the field in developing evidence-based practice and clinical pathways, assuring client satisfaction, supplementing earnings reports, enhancing payer relations, and providing a means of implementing program accreditation.
One of the few industry-sponsored efforts to evaluate program outcomes was conducted by Focus on Therapeutic Outcomes (FOTO). Dennis Hart was engaged by the Orthotics and Prosthetics National Office in collaboration with ABC to develop an outcome tool to assess health status, client satisfaction, and prosthetists' perception of function for lower extremity amputees.4 The Orthotics and Prosthetics National Office Outcomes Tool (OPOT) was built around the Medical Outcome Study—Short Form 36, a generic health-related quality-of-life instrument.5 The tool also included 13 satisfaction questions and prosthetists' report of clients' ambulation. Although psychometric analyses were conducted as part of its development, the cross-sectional nature of the study did not allow assessment of the instrument's sensitivity to change or its ability to detect subtle changes in lower extremity function. This one-time survey generated considerable provider interest but used paper forms and created an unacceptable level of patient and provider burden for routine clinical use.
During the past several years we engaged O&P patients and providers in developing and refining a set of instruments to assess functional status, quality of life, and satisfaction with devices and services.6 With funding from the National Institute on Disability and Rehabilitation Research through a Rehabilitation Engineering Research Center on Prosthetics and Orthotics, we completed a comprehensive literature search using MEDLINE, CINAHL, and Recal to identify generic and O&P-specific outcome instruments. This search yielded several dozen instruments, but an informal survey revealed little use of any outcome measure by clinicians. After input from an advisory committee that included clients, orthotists, prosthetists, physical therapists, occupational therapists, and physiatrists, we decided to focus the instruments on the following constructs: upper extremity and lower extremity functional status, health-related quality of life, and client satisfaction.
Selecting items from a variety of existing instruments, we developed and revised four instruments that differentiate patients with varying levels of lower extremity function, quality of life, and satisfaction. We conducted two field tests of what we came to call the Orthotics and Prosthetics Users' Survey (OPUS). The first field test of the instruments consisted of telephone interviews with a sample of past recipients of O&P services at the Rehabilitation Institute of Chicago (RIC). The sample of 66 respondents included 52 adults and 14 parents answering on behalf of their child. Pediatric clients receiving outpatient services at Shriners Hospital for Children in Chicago, adult clients receiving O&P services at RIC, and past recipients of RIC O&P services formed the second field test sample. The combined sample consisted of 164 subjects, including 80 adults and 84 children. In the adult group, 43 were orthotics users and 37 were prosthetics users; in the child group, 36 were orthotics users and 48 were prosthetics users. Table 1 summarizes the psychometric properties of each of the four components of OPUS.7
Calibration of the lower extremity functional status instrument produced desirable internal consistency statistics. The easiest items are "get on and off toilet," "get up from a chair," and "walk indoors." Items of average difficulty include "pick up an object from the floor while standing," "get on and off an escalator," and "walk outdoors on uneven ground." The most difficult items are "walk up to two hours" and "run one block." The items are well targeted to the sample, as indicated by the proximity of the average measure (0.58 logits) to the center of the item scale (0.00 logits) and its position within the overall item calibration range (−1.50 to 2.50).
Calibration of the health-related quality of life responses also yields desirable measurement statistics. The easiest items are "how often during the past week have you been happy," "how often during the past week have you felt calm and peaceful," and "how often during the past week did you have a lot of energy." Items of average difficulty include "how often during the past week have you felt downhearted and depressed," "how much does pain interfere with your activities (including both work outside the home and household duties)," and "how much does your physical condition restrict your ability to do chores." The most difficult items are "how often during the past week did you feel worn out" and "how often during the past week did you feel tired." The items are reasonably targeted to the sample, as indicated by the position of the average measure (1.06 logits) within the overall item calibration range (−1.50 to 1.50).
Calibration of the satisfaction with device responses yields acceptable measurement statistics. The easiest items to endorse are "the weight of my prosthesis (or orthosis) is manageable" and "my prosthesis (or orthosis) is durable." Items of average difficulty are "it is easy to put on my prosthesis (or orthosis)" and "my clothes are free of wear and tear from my prosthesis (or orthosis)." The most difficult items to endorse are "my skin is free of abrasions and irritations" and "my prosthesis (or orthosis) is pain free to wear." The items are less well targeted to the sample, as indicated by the average measure of 1.01 logits and its position relative to the overall item calibration range (−1.00 to 1.00).
Calibration of the satisfaction with services responses also yields acceptable measurement statistics. The items easiest to endorse are "I was shown the proper level of courtesy and respect by the staff" and "I received an appointment with a prosthetist/orthotist within a reasonable amount of time." Items of average difficulty are "I am satisfied with the training I received in the use and maintenance of my prosthesis/orthosis" and "the prosthetist/orthotist gave me the opportunity to express my concerns regarding my equipment." The items most difficult to endorse are "I was a partner in decision-making with clinic staff regarding my care and equipment" and "the prosthetist (orthotist) discussed problems I might encounter with my equipment."
None of the items misfit the construct. As is common with many other satisfaction instruments, the items are mistargeted to the sample (mean measure of 2.91 logits), revealing a high level of satisfaction with services. Given the rarity of disagreeable responses, it is likely that the rating scale is not optimally configured, in the sense of establishing an ordered sequence progressing across measurement ranges in which each rating scale option in turn is the most likely response.8,9 Should the rating scale be optimized so as to realize such an ordered sequence, so far as this proves possible, the targeting of the instrument is likely to improve.
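The effect of targeting described above can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model, which is a simplification: the OPUS instruments use rating scales, and the function name `endorsement_probability` is hypothetical. The sketch shows how likely a person at the reported mean measure would be to endorse an item located at the center of the item scale (0.00 logits).

```python
import math

def endorsement_probability(person_measure: float, item_calibration: float) -> float:
    """Dichotomous Rasch model: probability of endorsing an item (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(person_measure - item_calibration)))

# Mean person measures reported in the text, compared with an item at the scale center.
reported_means = {
    "lower extremity functional status": 0.58,
    "satisfaction with services": 2.91,
}
for instrument, mean_measure in reported_means.items():
    p = endorsement_probability(mean_measure, 0.0)
    print(f"{instrument}: P(endorse an average item) = {p:.2f}")
```

Under this simplified model, the further the mean measure sits above the item scale, the closer the endorsement probability is to 1 and the less each response tells us about the person, which is one way to read the mistargeting of the services instrument.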
The psychometric properties of OPUS are promising as the instruments demonstrate the ability to detect a wide range of function, quality of life, and satisfaction, and possess good internal consistency and construct validity. The instruments' relevance to quality improvement could be improved by refocusing the satisfaction construct toward patient activation, and by increasing the reliability of the measures. Reliability has a "cash value" in the substantive leverage over the construct that it affords. Unfortunately, this value is rarely understood and even more rarely capitalized on.
Our ability to produce and measure meaningful quality improvements follows directly from the precision and sensitivity of our instruments. Reliability can be improved by adding items and/or by better targeting the instruments at the patient populations. The latter is the more efficient strategy and involves lowering measurement error by centering the measures relative to the item scale. Opportunities for better centering are indicated in Table 1 by the average person measures, which would be 0.00 if perfectly centered. Future studies will focus in part on determining how reliable the measures have to be to support quality improvement efforts and then on calibrating the instruments to the needed degree of precision. In other words, into how many statistically distinct groups do the instruments have to consistently divide the measures to support improved quality?
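The link between reliability and the number of statistically distinct groups can be sketched with standard formulas from the Rasch measurement literature: the separation index G = sqrt(R / (1 − R)), the strata estimate H = (4G + 1) / 3, and the Spearman-Brown projection for a lengthened instrument. The reliability values below are illustrative assumptions, not OPUS results, and the function names are hypothetical.

```python
import math

def separation(reliability: float) -> float:
    """Separation index G: ratio of true spread to average measurement error."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability: float) -> float:
    """Approximate number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability if the instrument is lengthened with comparable items."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)

# Illustrative reliabilities only (assumed values).
for r in (0.70, 0.80, 0.90):
    print(f"reliability {r:.2f}: separation {separation(r):.2f}, strata {strata(r):.1f}")
print(f"doubling a 0.80 instrument: projected reliability {spearman_brown(0.80, 2.0):.2f}")
```

Read this way, a target such as "three distinct groups" translates into a required reliability, which can then be pursued by adding items or by better targeting.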
Beyond improvements to the instruments, the next steps are to implement data collection in routine clinical practice, provide outcomes information to facilities, and evaluate ways in which clinical practice is improved with routine outcomes reporting. Of vital importance is the integration of the assessment system with the diagnostic and treatment routines of clinical practice. Patient-reported measures are not typically presented to clinicians in a manner that maximizes their utility for outcomes-based quality improvement. Clinical measures, such as temperature, body weight, or heart rate, provide information that is directly acted upon and managed, whether as a symptom of an underlying condition or as a problem in itself. In contrast, patient-reported measures usually are obtained more out of a sense of ethical obligation or as a means of securing impressionistic, qualitative information about patients as individuals. The next section describes how contemporary measurement approaches can help implement routine data collection practice, provide outcomes information, and evaluate clinical practice.
An item bank comprises carefully calibrated questions that develop, define, and quantify a common theme and thus provide an operational definition of a trait.10,11 With a calibrated item bank in place, we can easily create full-length test instruments, short forms, and computerized adaptive tests (CATs). A CAT is scored in real time, and results may be presented in graphic or written reports immediately to the clinical researcher, physician, or patient, enabling immediate use of the information to inform research and clinical decision making.12–15 Overall, CATs offer several advantages, including speed of assessment; fewer items for the same level of precision; a mechanism for completing routine assessments; immediate data entry; easy scoring and interpretation; and immediate presentation of results.
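The logic of a CAT can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not the platform described in this article: it assumes a dichotomous Rasch model, a hypothetical nine-item bank, a simulated respondent, a fixed one-logit step while all responses are identical, and an arbitrary stopping rule (standard error of 0.5 logits or 10 items).

```python
import math
from typing import Callable, Dict, List, Tuple

def p_endorse(theta: float, b: float) -> float:
    """Rasch probability that a person at theta endorses an item calibrated at b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def information(theta: float, items: List[float]) -> float:
    """Test information at theta: sum of p * (1 - p) over the administered items."""
    return sum(p_endorse(theta, b) * (1.0 - p_endorse(theta, b)) for b in items)

def estimate(theta: float, items: List[float], responses: List[int]) -> Tuple[float, float]:
    """Update theta from the responses so far; return (theta, standard error)."""
    score = sum(responses)
    if score in (0, len(responses)):
        # All-same responses have no finite estimate; step 1 logit (an assumed rule).
        theta += 1.0 if score else -1.0
    else:
        for _ in range(25):  # Newton-Raphson on the Rasch likelihood
            expected = sum(p_endorse(theta, b) for b in items)
            step = (score - expected) / information(theta, items)
            theta += max(-1.0, min(1.0, step))
            if abs(step) < 1e-4:
                break
    return theta, 1.0 / math.sqrt(information(theta, items))

def run_cat(bank: Dict[str, float], answer: Callable[[str], int],
            target_se: float = 0.5, max_items: int = 10) -> Tuple[float, float, List[str]]:
    """Administer items until the standard error target or the item limit is reached."""
    theta, se = 0.0, math.inf
    remaining = dict(bank)
    asked: List[str] = []
    calibrations: List[float] = []
    responses: List[int] = []
    while remaining and len(asked) < max_items and se > target_se:
        # Under the Rasch model the most informative item is the one closest to theta.
        item = min(remaining, key=lambda name: abs(remaining[name] - theta))
        calibrations.append(remaining.pop(item))
        asked.append(item)
        responses.append(answer(item))
        theta, se = estimate(theta, calibrations, responses)
    return theta, se, asked

# Hypothetical nine-item bank with calibrations from -2.0 to 2.0 logits.
bank = {f"item{i}": -2.0 + 0.5 * i for i in range(9)}
# Simulated respondent who endorses every item calibrated below 0.75 logits.
theta, se, asked = run_cat(bank, lambda name: int(bank[name] < 0.75))
print(f"estimate {theta:.2f} logits, SE {se:.2f}, after {len(asked)} items")
```

With so small a bank the precision target cannot be reached and the loop simply exhausts the items; a larger, well-targeted bank is what allows a CAT to stop early at the same precision as a full-length form.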
CATs using a precalibrated item bank produce individual person scores that are comparable to full-length instruments. For example, working with oncology, HIV, stroke, multiple sclerosis, and general population samples, researchers have developed numerous item banks for the Functional Assessment of Chronic Illness Therapy (FACIT) Measurement System.16 The FACIT is a collection of health-related quality of life questionnaires targeted to the management of chronic illness, many of which are translated into as many as 50 languages, enabling cross-cultural comparisons. To facilitate the clinical utility of the FACIT system, researchers developed laptop, touch screen, and web-based methods for administration, scoring, and display of data.
Chang and Cella17 demonstrated comparability and, in some cases, equivalence of CATs to existing instruments. This has allowed pooling of item sets to create draft item banks from available data.17–19 Table 2 illustrates a systematic framework that guides item bank development. Literature review and input from clinicians and patients are used to select domains (step 1). Next, one surveys the availability of relevant existing data sets (step 2). Common items are identified, which serve as the linking items across data sets (step 3). In cases in which there are available data sets with sufficient sample size (minimum 200, preferably 500), a common item-linking strategy expedites the process and informs subsequent bank development. It is critical to examine unidimensionality, that is, the extent to which all items measure a single construct, such as lower extremity functional status. This procedure allows one to determine if there are separate subconstructs that should be distinguished, perhaps mobility and activities of daily living. Evaluation of item fit also helps evaluate construct dimensionality. Given a single dimension, examining item fit allows test developers to identify items that do not "fit" that dimension. Next, items are calibrated on the continuum to be measured (step 4). The relative hierarchical order of items is then examined, providing information regarding potential statistical deficiency, i.e., gaps in the distribution (step 5). Construct deficiency is also identified. New items are written or acquired from existing questionnaires to eliminate construct deficiencies (step 6). For banks that do not have existing data sets, step 6 is a more intensive process. Before field testing, content validity issues are addressed by obtaining clinical input to examine content relevance and representativeness (step 7).
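The common item-linking strategy in step 3 can be illustrated with a minimal sketch. It assumes that both data sets were calibrated separately under a Rasch model, so that the two scales differ only by a constant shift, and it uses the mean difference across the common items as that shift. The item names and calibration values are hypothetical.

```python
from statistics import mean
from typing import Dict

def linking_constant(reference: Dict[str, float], new: Dict[str, float]) -> float:
    """Mean shift (in logits) that places `new` calibrations on the `reference` scale."""
    common = sorted(set(reference) & set(new))
    if not common:
        raise ValueError("no common items to link on")
    return mean(reference[item] - new[item] for item in common)

def link(reference: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Return the `new` calibrations re-expressed on the `reference` scale."""
    shift = linking_constant(reference, new)
    return {item: b + shift for item, b in new.items()}

# Hypothetical calibrations (logits) from two data sets sharing two items.
data_set_a = {"get up from a chair": -1.0, "walk indoors": 0.0, "run one block": 1.0}
data_set_b = {"walk indoors": -0.5, "run one block": 0.5, "walk up to two hours": 1.5}

print(linking_constant(data_set_a, data_set_b))
print(link(data_set_a, data_set_b))
```

In practice one would also check that the common items keep the same relative order in both data sets before trusting the shift; items whose calibrations move markedly are usually dropped from the link.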
For field testing (step 8), items are programmed for computer-based testing (CBT) and administered in clinical settings. The data are analyzed (step 9) to determine unidimensionality, item fit, and item locations on the continuum. Item parameter equivalence (step 10) is evaluated by examining the existence of differential item functioning (DIF) across subgroups (e.g., orthosis and prosthesis users). The analysts' recommendations are then reviewed and clinical input sought to establish the operational item bank (step 11). The item bank is now ready to be used for a CAT implementation (step 12). Test-level parameters are proposed (e.g., item selection rules and stopping conditions) and simulated for trait levels across the relevant range of clinical interest. Finally, short forms can be created to cover the entire continuum or to target specific criterion reference points, in this case, literacy levels (step 13).
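The item parameter equivalence check in step 10 can be sketched as a comparison of item calibrations estimated separately in two subgroups. The sketch below is one simple screening approach, not necessarily the procedure used in this work. It assumes each calibration comes with a standard error, and it flags an item when the contrast is at least 0.5 logits and at least twice its joint standard error; both thresholds are assumed rules of thumb, and all values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def dif_contrast(b1: float, se1: float, b2: float, se2: float) -> Tuple[float, float]:
    """Return (contrast in logits, t-like statistic) for one item in two subgroups."""
    contrast = b1 - b2
    return contrast, contrast / math.sqrt(se1 ** 2 + se2 ** 2)

def flag_dif(items: Dict[str, Tuple[float, float, float, float]],
             min_logits: float = 0.5, min_t: float = 2.0) -> List[str]:
    """List items whose contrast is both large and statistically distinguishable."""
    flagged = []
    for name, (b1, se1, b2, se2) in items.items():
        contrast, t = dif_contrast(b1, se1, b2, se2)
        if abs(contrast) >= min_logits and abs(t) >= min_t:
            flagged.append(name)
    return flagged

# Hypothetical values: (orthosis calibration, SE, prosthesis calibration, SE).
calibrations = {
    "walk up to two hours": (1.9, 0.2, 1.1, 0.2),
    "get up from a chair": (-1.2, 0.2, -1.1, 0.2),
}
print(flag_dif(calibrations))
```

Flagged items would then go to the clinical review in step 11, since a statistical contrast alone does not show whether an item should be dropped, revised, or calibrated separately by subgroup.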
Davis and colleagues20 successfully piloted a provisional CAT platform for fatigue assessment in an oncology clinic. They elicited feedback from patients and providers about in-clinic CAT assessment and assessed the utility and understandability of computer-generated graphic reports of scores. They also monitored the ability of these reports to promote discussions between patients and providers regarding treatment planning and options for ongoing care. Both patients (n = 157) and providers (n = 22) reported that graphs depicting fatigue scores were understandable and could be useful in clinical practice.20,21 CAT provides an unprecedented mechanism for brief, yet precise, routine symptom monitoring, and also may be useful for the O&P industry.
The National Institutes of Health (NIH), as part of its Roadmap for Medical Research, awarded approximately $6 million in 2004 to six primary research sites and a statistical coordinating center for a Patient-Reported Outcomes Measurement Information System (PROMIS) network.22 This trans-NIH initiative is managed by the National Institute of Arthritis and Musculoskeletal and Skin Diseases and seeks to develop ways to measure patient-reported symptoms such as pain and fatigue and aspects of health-related quality of life across a wide variety of chronic diseases and conditions. Advances in measurement using item response theory (IRT) and advances in computer technology make it possible to develop, maintain, and improve item banks that can advance health status measurement. PROMIS focuses on several symptoms and health status domains that have relevance across a wide variety of chronic diseases. Item banks enable item comparison and selection as well as CAT tools for tailored individual assessment without loss of scale precision or content validity. Valid, generalizable item banks and CAT tools can stimulate and standardize clinical research across NIH programs and grants dealing with patient-reported outcomes. CAT may also assist individual clinical practitioners to assess patient response to interventions and modify treatment plans.
Our collaborators at Evanston Northwestern Healthcare's Center for Research, Outcomes and Education were funded to manage the Statistical Coordinating Center (SCC). The SCC provides computerized systems to administer, collect, and report patient-reported outcome data; serves as a data repository for all primary research site collaborators in the cooperative group network; and conducts and advances state-of-the-science psychometric and statistical analyses. They will make available a range of assessment technologies that are useful in clinical settings and are developing additional standardized automated systems to ensure high-quality, efficient collection and reporting of data that will allow analysis of pooled data. They have developed an accessible, proven, user-friendly CAT platform with plans for evaluation and dissemination across the PROMIS network. This level of sophisticated data collection is needed to support outcomes reporting in the O&P industry.
The O&P industry would benefit from the routine and widespread adoption of a common measure to manage the same outcomes and to learn if we are doing so with the same degree of effectiveness in different settings. The Orthotics and Prosthetics Users' Survey (OPUS) measures the kinds of outcomes that typically are evaluated by O&P providers: reduction of activity limitations, enhancement of quality of life, and assurance of patient satisfaction with services and devices. Developing a measure of patient activation (defined as the belief in the importance of taking an active role in health care, confidence in one's ability to take action, active participation, and persistence in these actions under stress) would fill a critical gap. Widespread adoption of an instrument such as OPUS would help make quality measurement "Job Zero" in the improvement of care processes and outcomes for orthotic and prosthetic users.
Funding for the development of OPUS and development of this article was provided by the National Institute on Disability and Rehabilitation Research through a Rehabilitation Engineering Research Center on Prosthetics and Orthotics awarded to Northwestern University (H133E980023 and H133E030030), Feinberg School of Medicine. Dr. Dudley Childress is the Principal Investigator. Material for this article was drawn from a National Institutes of Health grant application developed by William Fisher (principal investigator), Richard Gershon, Jin-Shei Lai, Jack Stenner, Michael Bass, and Allen Heinemann (coinvestigators) titled "Deploying the Orthotics and Prosthetics Users' Survey."
Correspondence to: Allen W. Heinemann, PhD, Rehabilitation Institute of Chicago, 345 East Superior Street, Chicago, IL 60611.
ALLEN W. HEINEMANN, PhD, is affiliated with the Feinberg School of Medicine, Northwestern University, and the Rehabilitation Institute of Chicago, Chicago, Illinois.
RICHARD GERSHON, PhD, is affiliated with Evanston Northwestern Healthcare, Evanston, Illinois.
WILLIAM P. FISHER, JR., PhD, formerly affiliated with Metametrics Inc., Durham, North Carolina, is affiliated with Avatar International, Sanford, Florida.