Outcome measures in facial prosthesis research: A systematic review

Statement of problem. Facial prosthesis research uses a wide variety of outcome measures, which results in challenges when comparing the effectiveness of interventions among studies. Consensus is lacking regarding the most appropriate and meaningful outcome measures to use in facial prosthesis research to capture important perspectives.
Purpose. The purpose of the systematic review was to identify and synthesize outcome measures used in facial prosthesis research.
Material and methods. Electronic searches were performed in 11 databases (including nonpeer-reviewed literature). The citations were searched, and expert societies were contacted to identify additional studies. Inclusion criteria comprised studies of participants with facial defects who required or had received prosthetic rehabilitation with an external facial prosthesis. Exclusion criteria comprised participants with ocular prostheses, case reports, case series with fewer than 5 participants, laboratory-based studies, and studies published before 1980. Study selection was performed independently by 2 reviewers. Discrepancies were resolved through discussion or by a third reviewer. Outcome measures were synthesized with a categorization approach based on the perspective, theme, and subtheme of the outcome measures. Quality assessment was performed with an appraisal tool that enabled evaluation of studies with diverse designs.
Results. Database searching identified 13 058 records, and 7406 remained after duplicates were removed. After initial screening, 189 potentially relevant records remained, and 186 full texts were located (98% retrieval rate). After full-text screening, 124 records were excluded. Citation searches and contact with expert societies identified 4 further records. In total, 69 articles (grouped into 65 studies) were included. Studies were categorized according to the perspective of their outcome measures, with the following findings: patient-reported (74% of studies), clinical indicators (34%), clinician-reported (8%), multiple viewpoints (6%), and independent observer-reported (3%). Patient-reported outcome measures included tools to assess satisfaction, quality of life, and psychologic health. Variability in the choice of outcome measures was evident among the studies, with many self-designed, unvalidated, condition-specific questionnaires reported. A greater number of outcome measure themes emerged over time; themes such as service delivery and health state utility have recently been evaluated.
Conclusions. Over the past 40 years, facial prosthesis research has focused on patient-reported outcome measures. Outcome measures relating to other perspectives have been used less frequently, although new themes appear to be emerging in the literature. Future research should use outcome measures with appropriate measurement properties for use with facial prosthetics.

Facial defects may result from congenital or acquired conditions 1 and can lead to multiple psychosocial and functional impairments. 2 The 2 main approaches to rehabilitating patients with facial defects are surgical reconstruction and prosthetic rehabilitation. 1 Surgical reconstruction can provide a long-term solution to replacing the missing tissue. However, it may be unsuitable depending on the extent of tissue loss, the availability of donor tissue, the patient's psychophysical condition, and technical challenges. 1,3 Removable facial prostheses can provide an esthetic and functional outcome without the associated risks of reconstructive surgery. 1 Studies have evaluated the impact of facial prostheses on quality of life (QoL), [4][5][6][7][8][9] psychologic health, 4 and satisfaction. 2,10 From a service delivery perspective, the conventional manufacture of facial prostheses is regarded as time consuming, labor intensive, and technically challenging. 11,12 The ongoing impact on patients and healthcare services is evident in the need for regular maintenance and replacement. 12 A variety of innovations in facial prosthesis rehabilitation have occurred in recent decades. In the late 1970s, osseointegrated implants were introduced to overcome some of the limitations of conventional retention methods. 13 From the late 1990s, digital technology has been introduced to supplement or replace steps in conventional manufacturing, 14 as summarized in a recent systematic review. 15 Clinical management of patients with facial defects should adopt an evidence-based approach. Facial prosthesis research uses a wide variety of outcome measures, which results in challenges when comparing the effectiveness of interventions among studies. In addition, a consensus is lacking regarding the most appropriate and meaningful outcome measures to use in facial prosthesis research to capture important perspectives and outcomes.
The purpose of this systematic review was to identify and synthesize outcome measures used in facial prosthesis research. The scope of the review was purposefully broad to map the outcome measures used over time. Quality assessment was planned to provide a holistic overview of the quality of studies and to identify broad areas where reporting was lacking. Anticipating a heterogeneous group of studies, the Quality Assessment Tool for Studies of Diverse Designs (QATSDD) was selected. 16 To the best of the authors' knowledge, a similar systematic review had not been undertaken previously or registered on prospective databases.

MATERIAL AND METHODS
The systematic review was based on established guidance. 17 Recently published systematic reviews that synthesized outcome measures or outcomes from the dental literature were also consulted. 18,19 The protocol was registered in an international prospective register of systematic reviews. 20 The review question was "What outcome measures are used to capture the outcomes of facial prosthesis provision in patients with facial defects requiring prosthetic rehabilitation?" Table 1 summarizes the eligibility criteria. The population of interest was participants with facial defects who required or had received an external facial prosthesis. Studies of ocular prostheses were excluded because of anticipated differences in treatment delivery and evaluation. There were no age restrictions, and facial defects of any underlying etiology, extent, and recency were included. Differences in these factors were considered as sources of clinical diversity and potential reasons for variability in the outcome measures used. Studies published over the last 40 years (January 1980 to 2020) were included as a comprehensive overview. This time period might also identify trends commensurate with changes in retention and manufacturing methods. 13,15 Electronic searches were performed in EMBASE, MEDLINE, PsycINFO, Web of Science Core Collection, Cochrane Library, and CINAHL from inception to the present day. Nonpeer-reviewed literature databases were searched to minimize publication bias by using the International Clinical Trials Registry Platform, ClinicalTrials.gov, OpenGrey, ProQuest Dissertation and Theses A&I, and Networked Digital Library of Theses and Dissertations. Reference lists of included articles were manually searched, and citations were searched in Scopus. In addition, 2 societies (American Academy of Maxillofacial Prosthetics and Institute of Maxillofacial Prosthetists and Technologists) were contacted through e-mail to identify missing or unpublished studies.

Clinical Implications
The wide variety of outcome measures used in facial prosthesis research highlights the need for validated, standardized outcome measures that capture a range of perspectives. Evidence-based approaches that use validated, condition-specific, patient-reported outcome measures allow for systematic comparison and comprehensive evaluation of facial prosthetic rehabilitation and its benefit to patients. More systematic protocols of assessment are required to capture outcomes from the perspective of the clinician, independent observer, or multiple viewpoints.

The search strategy was developed and tailored to each database with support from an information specialist. The searches were performed in November 2019 and comprised a combination of Medical Subject Headings and free text keywords. One main concept was searched relating to the intervention. No further concepts were used as a population concept would overlap with the intervention concept. Furthermore, a concept relating to the outcomes was not used, as outcomes are often not well described in abstracts or well indexed with controlled vocabulary terms. 21 There were no language or time restrictions. It was anticipated that this would result in a highly sensitive but less precise search. Where possible, limits and filters were applied to exclude letters and in vitro studies. All searches were documented in a search log, and the search strategy for EMBASE is included in Table 2.
The studies were imported into a reference management software program (EndNote X8; Clarivate Analytics). Duplicates were removed with the software program, and a sample of studies was checked manually to ensure the process was reliable. Screening of titles and abstracts was undertaken independently by 2 reviewers (R.J., B.V.), and the full texts of any potentially relevant reports were retrieved. Two reviewers (R.J., B.V.) independently screened the full-text articles for compliance with the inclusion and exclusion criteria. The criteria were initially piloted on sample reports to ensure they could be applied consistently. Any discrepancies were resolved through consensus or by consulting an additional reviewer (C.B., S.P., B.N.). All potentially relevant articles excluded from the review were listed in a table of the characteristics of excluded studies. A tailored data extraction form was created, piloted, and developed based on available checklists. 22 Data were extracted from included studies by 1 reviewer (R.J.) and checked for accuracy by a second reviewer (B.V.). Multiple reports of the same study were linked, and data were collected on a single form. The following items were extracted: author details, publication year, country, design, participant characteristics, participant numbers, intervention, comparator, adverse outcomes, and outcome measures. A descriptive approach was used to categorize study design, as some studies did not fit discretely with explicit study design definitions and the quality of reporting was variable. Outcome measures were not extracted if they related to concepts other than the facial prosthesis itself (such as those relating to bone-anchored hearing aids or peri-implantitis). Attempts were made to contact study authors to obtain missing data.
A diverse range of study designs was anticipated, and therefore, 2 appraisal approaches were possible. First, the different study designs could be separated and evaluated with multiple appraisal tools specific to each study type. 16 Second, all study types could be appraised with a standardized, pragmatic approach with a generic quality assessment tool such as the QATSDD. 16 The second approach was in keeping with the purpose of the systematic review.
Preliminary assessments of the Quality Assessment Tool for Studies of Diverse Designs (QATSDD) indicate its usefulness to standardize quality assessment approaches when dealing with diverse study designs. 16 It enabled a pragmatic, holistic evaluation of the overall body of evidence and allowed broad quality comparisons to be drawn among different study types. 16 The main limitations related to the broad nature of the tool, which may not be appropriate for all types of research. 16 It was also not designed to replace quality assessment tools for specific approaches (for example, systematic reviews based entirely on randomized controlled trials). 16 The tool has been assessed in the disciplines of psychology, sociology, and nursing 16 and has recently been used to assess dental studies. 23
Figure 1. PRISMA flow diagram of study selection process. PRISMA, preferred reporting items for systematic reviews and meta-analyses.
The QATSDD tool has a total of 16 criteria, of which 14 apply to qualitative studies, 14 apply to quantitative studies, and all 16 apply to mixed methods research. 16 During quality assessment, each study was awarded a score on a scale of 0 to 3 for all relevant criteria. 16 A score of 3 was awarded when a criterion was completely met. Some criteria lacked clarity, which led to some statements being interpreted differently by the reviewers. 26 Therefore, 2 reviewers (R.J., B.V.) agreed on what would be expected of studies for each statement to ensure consistency of application. Any disagreements were resolved through an iterative process. 16 Each study was given an overall quality score, expressed as a percentage of the maximum possible score. 16 While the tool is useful to direct dialog and provide a general overview of study quality, overall quality scores should be interpreted with caution because of the equal weighting of all criteria. 24,26
A list of outcome measures was compiled and synthesized based on a categorization approach. 18 Category names were agreed based on the perspective of the evaluator. 18 Five categories were developed, including patient-reported outcome measures (PROMs), clinician-reported outcome measures, independent observer-reported outcome measures, outcome measures encompassing multiple perspectives, and clinical indicators. Themes and subthemes were used to subcategorize outcome measures based on the concepts evaluated. For example, the PROM category was subdivided into themes relating to satisfaction, QoL, psychologic health, and other concepts. The QoL theme was then divided into subthemes such as condition specific (relating to facial prostheses), condition specific (not relating to facial prostheses), and generic tools.
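The QATSDD scoring rule described in this section (0 to 3 per applicable criterion, reported as a percentage of the maximum possible score) can be illustrated with a short Python sketch. This is a minimal illustration rather than the reviewers' actual procedure: the function name `qatsdd_percentage` and the example scores are hypothetical, and only the scale, the criterion counts, and the percentage rule are taken from the text.

```python
# Number of applicable QATSDD criteria per study type, as stated in the text.
APPLICABLE_CRITERIA = {"qualitative": 14, "quantitative": 14, "mixed": 16}
MAX_SCORE_PER_CRITERION = 3


def qatsdd_percentage(scores, study_type):
    """Return the overall quality score as a percentage of the maximum possible."""
    expected = APPLICABLE_CRITERIA[study_type]
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
    if any(s not in (0, 1, 2, 3) for s in scores):
        raise ValueError("each criterion is scored 0, 1, 2, or 3")
    return 100 * sum(scores) / (expected * MAX_SCORE_PER_CRITERION)


# Hypothetical quantitative study: these scores are invented for illustration.
example_scores = [3, 2, 2, 1, 3, 0, 2, 1, 2, 3, 1, 0, 2, 2]
print(round(qatsdd_percentage(example_scores, "quantitative"), 1))
```

Because every criterion contributes equally to the sum, two studies with the same percentage may fall short on entirely different criteria, which is one reason the overall scores warrant cautious interpretation.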

RESULTS
From the database searches, 13 058 records were identified, and 7406 records remained after the removal of duplicates. After screening titles and abstracts, 189 potentially relevant records remained, and 186 full texts were located (98% retrieval rate). After screening full texts, 124 records were excluded principally because of lack of an explicit outcome measure related to facial prostheses (n=50) or lack of availability of a full-text English manuscript (n=45). Citation searches and contact with expert societies identified 4 further records. In total, 69 full-text articles were included, which were grouped into 65 studies (Fig. 1).
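The record counts in this paragraph can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers reported in the text; the variable names are illustrative, and the reading that the 124 exclusions are counted against all 189 potentially relevant records (so that records without a retrievable full text fall among the excluded) is an assumption that happens to reconcile the totals.

```python
# Counts reported in the Results text.
records_identified = 13058
after_deduplication = 7406
potentially_relevant = 189
full_texts_located = 186
excluded_at_full_text = 124
added_from_other_sources = 4

duplicates_removed = records_identified - after_deduplication
retrieval_rate = round(100 * full_texts_located / potentially_relevant)

# Assumption: the 124 exclusions are counted against all 189 records,
# so records without a retrievable full text are among the excluded.
articles_included = (
    potentially_relevant - excluded_at_full_text + added_from_other_sources
)

print(duplicates_removed, retrieval_rate, articles_included)
```

Under that assumption, the computed values agree with the reported 98% retrieval rate and the 69 included articles.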
Characteristics of included studies are outlined in Supplemental Tables 1-5. The quality of included studies was assessed with the QATSDD (Supplemental Table 6 [available online]). 16 Average quality scores were calculated for the study design groups (Table 3). These comprised experimental studies (61.9%), cross-sectional studies (45.2%), prospective longitudinal observational studies (39.5%), retrospective longitudinal observational studies (41.6%), and mixed-methods studies (37.5%). The broad range of quality scores for the cross-sectional and longitudinal observational studies highlights variability in their quality. Some criteria had consistently low scores among the different groups, including evidence that sample size was considered in terms of analysis, statistical assessment of reliability and validity of measurement tools, and evidence of user involvement in design.
A total of 117 outcome measures that related to facial prostheses were identified from the 65 studies. Studies were categorized based on perspective, theme, and subtheme of the outcome measures (Table 4). PROMs were the most frequently used category, identified in 48 studies (74%). Within this category, 31 studies evaluated satisfaction, 14 studies evaluated QoL, 6 studies evaluated psychologic health, and 6 studies evaluated other patient-reported outcomes. Table 5 lists the outcome measures that fall within each category and theme. Satisfaction was frequently captured with self-designed condition-specific questionnaires. QoL was assessed through generic and condition-specific tools, including those usually used in other contexts such as plastic surgery or otolaryngology. A broad range of tools to capture psychologic health was also identified.
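The proportion quoted for PROMs follows directly from the study counts. The Python sketch below reproduces that calculation and applies the same rule to the theme counts; only the 74% figure is reported in the text, so the theme-level percentages are derived here for illustration, and the names `prom_counts` and `share_of_studies` are hypothetical.

```python
TOTAL_STUDIES = 65

# Study counts reported in the text for the PROM category and its themes.
prom_counts = {
    "PROMs (any theme)": 48,
    "satisfaction": 31,
    "quality of life": 14,
    "psychologic health": 6,
    "other patient-reported outcomes": 6,
}


def share_of_studies(count, total=TOTAL_STUDIES):
    """Percentage of included studies, rounded to the nearest whole number."""
    return round(100 * count / total)


for label, count in prom_counts.items():
    print(f"{label}: {count}/{TOTAL_STUDIES} = {share_of_studies(count)}%")

# Sum of the four theme counts, excluding the category total.
theme_total = sum(v for k, v in prom_counts.items() if k != "PROMs (any theme)")
print(theme_total)
```

The theme counts sum to more than 48, presumably because some studies evaluated more than one theme.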

DISCUSSION
Over the past 40 years, facial prosthesis research has focused on PROMs. Clinical indicators formed the second most common category, which is in keeping with the lifelong maintenance and replacement of facial prostheses. New themes have emerged in the literature (such as health state utility and service delivery), which may become increasingly important in the future given the focus on delivering clinically effective and cost-effective services. The increasing thematic variety identified in this systematic review may be due to an increase in the number of studies over time and the clinical and methodologic diversity of the studies.
One key difference between this systematic review and similar published reviews involved the use of quality assessment. 18,19 While quality assessment might not be necessary as the review did not synthesize efficacy data, 18 it was deemed important to provide a holistic overview of the quality of studies in facial prosthesis research. Two of the QATSDD criteria are related to outcome measures: rationale for the choice of data collection tool and statistical assessment of the reliability and validity of the measurement tools. 16 Neither of these criteria rated highly among the study designs. This suggests a need for better consideration or reporting of these concepts in the facial prosthesis literature.
A limitation of this systematic review arose from the exclusion of potentially relevant manuscripts where a full-text English-language version was unavailable. This could limit the generalizability of the results if there are variabilities in the choice of outcome measure as a result of language differences. The 40-year inclusion period may have influenced the quality of included studies, as earlier studies may not be subject to recent rigorous reporting criteria. The inclusion period also resulted in challenges when acquiring missing information from earlier publications.
The outcome measure classification system was based on previous reviews. 18,19 Selecting the most appropriate categories was a challenge, as some outcome measures were not explicitly defined or related to more than 1 theme. For example, some questionnaires evaluated satisfaction, QoL, self-confidence, and social aspects in a single tool. In addition, concepts such as complications and prosthetic aftercare could overlap. In such situations, categorization was based on the predominant theme and resolved by consensus. The focus of the results may change by undertaking an alternative approach.
Consensus is lacking regarding the most appropriate and meaningful outcome measures to use in facial prosthesis research. For example, patient satisfaction was evaluated by 31 studies, of which 20 created self-designed condition-specific questionnaires, 7 used other authors' questionnaires (in the original or a modified form), 3 used single-item scales, and 2 collected data from case note review. Condition-specific questionnaires developed and validated for other conditions were found to be used with patients with facial prostheses, and this may not provide meaningful data. Guidance is available which highlights the ideal features of outcome measures. [88][89][90] Reliability is the ability to distinguish among individuals despite measurement error, 88 validity relates to whether the tool measures what it is intended to measure, 90 and responsiveness refers to whether the tool can distinguish among patients who remain the same, improve, or deteriorate over the course of the study. 88 The authors recommend that future research uses outcome measures with appropriate measurement properties for use with facial prostheses. Evaluation of measurement properties is beyond the scope of this systematic review 91 ; however, condition-specific outcome measures such as the Toronto Outcome Measure for Craniofacial Prosthetics were identified that appeared to be validated in this context. 92 A standardized set of outcomes may be beneficial to indicate what should be measured and reported in facial prosthesis research.
Table excerpt (theme; subtheme; outcome measure with citing references). The label of the first row was not recovered.
(label missing) 33,39,57,73,[79][80][81][82][83]
Reasons for replacement 33,39,73,79,80
Prosthesis failure; Number of failures (prostheses that are not retained by implants) 84
Aftercare; Not applicable; Self-designed data collection 8,33,44,79,80
Complications; Not applicable; Biological complications 47,58,70,72,79,85
Complications; Not applicable; Technical complications 65,72,73,81,85
Service delivery; Costs to the hospital; Cost of the prosthesis, operating room, inpatient hospital stay, and miscellaneous costs 86
Service delivery; Procedural characteristics; Number of surgical procedures, length of stay within hospital 86
Other objective tools; Symmetry; Direct measurements of distances between insertion points of normal and artificial ears and facial mid-plane 85
Other objective tools; Symmetry; Asymmetry index: mean distance between the original and mirrored cloud divided by the diagonal of the bounding box of the face 87
Other objective tools; Symmetry; Linear distances between fixed anthropometric landmarks (eye fissure length and height) from a standardized photograph with Adobe Photoshop software 54
Other objective tools; Function; Acoustic change: real ear testing with a Real-Ear analyzer 69

CONCLUSIONS
Based on the findings of this systematic review, the following conclusions were drawn: