Surveys designed for use in audiological practice have been available for many years. Possibly beginning with the Hearing Handicap Scale (High, 1964), surveys have been developed for varied applications, such as quantifying handicap or disability and documenting outcomes of rehabilitation. The modern era of surveys for audiological practice arguably began in the early 1980s. There followed a proliferation of surveys for use in both pre- and post-hearing aid fitting applications. Dillon (2012) devoted an entire chapter of his text, Hearing Aids II, to this topic, which included more than 35 surveys available at that time. Other articles are available to help audiologists select a survey appropriate to their clinical needs (Bentler and Kramer, 2000; Humes, 2004).

There are many reasons to use outcome measures in clinical practice (Dillon, 2012). For example, a practice may wish to document the benefit of their hearing aid fitting program, to determine perceived benefit of some circuit option, to provide efficacy data to some third-party payer or perhaps to follow best practice guidelines. Kochkin, et al (2010) found that as more elements of best practices were incorporated into the hearing aid fitting process, including outcome measures, benefit significantly increased. Outcome measures are now considered a component of hearing aid related best practices by two major organizations representing audiologists (Valente, 2006; ASHA, 1998). However, two surveys of clinical practice indicate that only 30–40 percent of audiologists make use of outcome measures (Lindley, 2006; Brook, 2013). 

Several years ago, we committed our practice to include outcome measures for evaluating results of our hearing aid fittings. Quite simply, we wanted to determine if our hearing aid fittings were successful from the patient perspective. Cox, et al (2016) stated, “In the long run, it is the performance in daily living in the circumstances of the particular listener that determines the usefulness of a hearing aid fitting.” Dillon (2012) stated, “Outcome measures keep us grounded as to what we are, and are not, really achieving, from the perspective of the client.” We had no evidence that we were actually doing as good a job as we thought. Experience has taught us that in a small number of cases, problems become apparent over a period of time. Some patients will not ask for help, either believing nothing can be done or not wanting to be a bother. Using an outcome measure makes us proactive.

As mentioned before, there were many outcome measures available for us to consider. These measures differ in their focus (e.g., use, benefit, satisfaction, etc.), length, complexity, and ease-of-use. It became apparent from the outset that we needed to clarify both our clinical goals and the requirements of our practice for routine use. Human nature being what it is, any change that is not easy to implement will quickly fall into disuse. 

We wanted a survey that included domains of use (e.g., hours per day, ease of handling, comfort of fit), benefit (e.g., hearing conversation in quiet situations, noisy situations, TV) and overall improvement in quality of life. The survey had to be brief with simple wording suitable for self-administration in a paper and pencil format. Also, the survey had to allow efficient review by audiologists without computer input. Despite the wide variety of excellent questionnaires considered (e.g., Cox and Alexander, 1995; Cox and Alexander, 1999; Cox, Alexander, and Xu, 2014; Dillon and Ginnis, 1997; Giolas, Owens, Lamb and Schubert, 1979; Walden, Demorest and Hepler, 1984; Ventry and Weinstein, 1983) we did not find one that met all of our needs. Perhaps we suffered from the “Goldilocks syndrome,” finding the porridge too hot or too cold, but we ignored the cautions of Bentler and Kramer (2000) and set out to develop a scale based on the above objectives. This project has been approved by the Human Subjects Institutional Review Board of Western Michigan University.

Development of the Hearing Aid Follow-Up Survey

We drew on the extensive body of literature of previously developed surveys and our own years of clinical experience to begin constructing the survey. For many years, we have used a checklist to structure our in-office post-fitting visits. It addresses essential use factors (e.g., hours of daily hearing aid use; ability to change the battery; comfort of the ear mold) and common hearing situations (e.g., ability to converse in a small group; ability to hear in an auditorium or church). Much of the content of this checklist was adapted from the HAPI (Walden et al, 1984) and it has served us well. With this as our starting point, we added items that reflected the most common difficulties expressed by our patients while being mindful that aspects of use, benefit/satisfaction, and quality of life were included. It is a minimalistic survey in that it does not contain multiple items for a given situation. Revisions were made until we felt the survey met our goals.

The final survey includes 18 items. An initial item inquires about frequency of hearing aid use and offers four responses ranging from “I wear my hearing aids most of the day” to “I rarely wear my hearing aids,” similar to the item used by Nabelek et al (2006) in research on the Acceptable Noise Level Test. Each subsequent item requests a response on a seven-point alphabetic scale (A through G) anchored by “strongly agree” and “strongly disagree,” similar to the APHAB. We did provide a “not applicable” (N/A) response option, as some patients simply do not experience the full range of situations addressed. There are no reversals of response direction, so that strongly agree is always a positive response and strongly disagree is always a negative response. The final version of the Hearing Aid Follow-Up Survey (HAFUS) is shown in Appendix A.


We established a protocol for incorporating this survey into our hearing aid fitting and follow-up service. The survey was mailed to patients two to three months post fitting, a sufficient time to allow for possible acclimatization effects (Cox and Alexander, 1992; Humes, 1996) and to give ample time for patients to experience a diverse array of listening conditions. Once each month, a designated member of our staff used our practice management software to retrieve the demographic information on patients who had purchased hearing aids within this time frame. The information was merged into a letter sent to each patient along with a copy of the HAFUS. The letter stated our goal that they be satisfied with their hearing aids. It requested that they complete the enclosed survey addressing common concerns with hearing aids so that we could provide assistance, if needed. A stamped and addressed envelope was also enclosed. We did not send the survey to patients believed to be unable to respond independently (e.g., dementia). Also, in cases where the fitting process was still in progress, the mailing of the survey was delayed one month. If a response was not received after about one month, a prompt was sent with a second copy of the survey.

Each response was logged into our database by a member of our staff. All responses were reviewed by the audiologist/case manager. If the responses indicated problems or seemed inconsistent with the degree of hearing loss, the patient was contacted by phone for clarification and additional appointments were scheduled as appropriate. As we gained confidence in the survey and our protocol, we began compiling the results, except as our clinical or administrative responsibilities precluded these activities. 

Over a period of about two-and-a-half to three years, we sent surveys to 473 consecutive patients during time periods when resources were available to analyze results. During this timeframe, there were 47 consecutive patients who completed surveys that were not analyzed for the reasons noted above. We obtained a 74.2 percent response rate (351/473) considering both the initial and second mailing.

Our initial review of the responses was very positive. In fact, we were concerned about ceiling effects on some items and began looking for explanations. Several patterns became apparent. There were some patients who chose the same response for all items. There were some patients who responded more positively to the benefit of the hearing aids in difficult listening conditions than in easy conditions, clearly an invalid pattern. There were some cases where so many items were left unanswered that it was impossible to judge outcomes. Also, we felt that item 9, hearing one other person in quiet, was a universal item that applied to all and its completion was required. Surveys reflecting any of the above conditions were not considered in the subsequent analysis. Consequently, we disqualified 48 surveys from further consideration. It was our belief that eliminating these patients would give a more valid indication of hearing aid outcomes. A comparison of the entire group and the “scrubbed” group showed nearly identical mean responses with the largest changes being 0.14 in mean and 0.06 in standard deviation. The remaining 303 surveys (303/473 equals 64 percent) were analyzed. 
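For a practice that logs responses electronically, the screening rules described above could be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (a dictionary of item number to score), the function name, the MAX_MISSING threshold, and the use of subscale means to detect the "noise better than quiet" pattern are all hypothetical, since the text does not specify how each rule was operationalized.

```python
from typing import Optional

# Hypothetical layout: one survey is a dict mapping item number (2..18)
# to a score 1..7 (A..G), with None for a blank or N/A response.
QUIET_ITEMS = (9, 11)            # quiet communication items named in the article
NOISE_ITEMS = (10, 12, 15, 16)   # noise/reverberation items named in the article
MAX_MISSING = 8                  # assumed threshold; the article does not state one


def mean(values):
    return sum(values) / len(values)


def exclusion_reason(survey: dict) -> Optional[str]:
    """Return a reason to set the survey aside, or None if it passes screening."""
    answered = [v for v in survey.values() if v is not None]
    if survey.get(9) is None:
        return "item 9 unanswered"
    if len(survey) - len(answered) > MAX_MISSING:
        return "too many unanswered items"
    if len(set(answered)) == 1:
        return "same response on every item"
    quiet = [survey[i] for i in QUIET_ITEMS if survey.get(i) is not None]
    noise = [survey[i] for i in NOISE_ITEMS if survey.get(i) is not None]
    # Lower scores are better, so a lower noise mean is the implausible pattern.
    if quiet and noise and mean(noise) < mean(quiet):
        return "noise rated better than quiet"
    return None
```

A survey returning None would proceed to analysis; any other return value names the rule that set it aside, which makes the reason for each exclusion easy to audit.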

The test-retest reliability of the HAFUS was also determined. For a group of 50 consecutive patients who returned the scale, it was sent to them again about one month later. Responses were obtained from 33 patients. Three were eliminated based on the rationale described above. This left 30 surveys (60 percent) available for analysis.


For scoring, alpha responses for individual items were converted to numbers 1 through 7 with A equals 1; B equals 2, etc. A lower score on the HAFUS indicates a better outcome. In routine clinical use, we generally do not average the item scores and simply review the entire form looking for responses greater than B and reading patient comments. As problems are identified, we contact the patient to attempt to resolve problems. 
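The scoring rule and the routine scan for responses greater than B can be sketched in a few lines. This is an illustration only; the function names and the dictionary layout are assumptions, and item 1 (wear time), which uses a different four-choice response set, is left out.

```python
# A=1 ... G=7, as described in the text; lower scores indicate better outcomes.
LETTER_SCORES = {letter: n for n, letter in enumerate("ABCDEFG", start=1)}


def score_response(response):
    """Return 1..7 for A..G, or None for N/A, blank, or unrecognized entries."""
    if response is None:
        return None
    return LETTER_SCORES.get(response.strip().upper())


def flag_items(responses, cutoff="B"):
    """Return item numbers whose response is poorer (higher) than the cutoff."""
    limit = LETTER_SCORES[cutoff]
    flagged = []
    for item, response in sorted(responses.items()):
        score = score_response(response)
        if score is not None and score > limit:
            flagged.append(item)
    return flagged
```

For example, flag_items({2: "A", 9: "B", 10: "D", 12: "N/A", 15: "c"}) picks out items 10 and 15 for follow-up, while the N/A response is ignored.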


FIGURE 1. Mean audiometric thresholds and standard deviations for 303 patients (O=right ear; X=left ear).

Our analysis included 303 patients fit with new hearing aids over a period extending slightly longer than 2.5 years, from November 2010 to June 2013. There were 159 females and 144 males. Age ranged from 18 to 100 with a mean of 76 years (SD 14.4). All patients had a complete hearing evaluation prior to hearing aid selection and fitting. The average hearing of our patients was a gradually sloping moderate to severe bilaterally symmetrical sensorineural loss (FIGURE 1). 

As this was inclusive of our patient load, there were a small number of patients with conductive and mixed hearing loss. The four-frequency average (0.5, 1, 2, 4 kHz) was 53 dB HL for both the right and left ears. Nearly all (over 99 percent) of the fittings were verified using real ear measures with the Audioscan Verifit Speech Map module referenced to National Acoustic Laboratories NAL-NL1 or NAL-NL2 targets.

There were 217 patients obtaining replacement hearing aids and 86 obtaining their first hearing aids. Binaural aids were purchased by 171 patients. Of the remaining patients, 86 were monaural wearers and obtained one new hearing aid (43 left; 43 right) while an additional 46 patients obtained one new aid but had an aid on the opposite ear. Considering both new and old hearing aids, there were 217 (72 percent) binaural wearers and 86 (28 percent) monaural wearers. All styles of behind the ear and in the ear hearing aids were represented, including open fit models. 

Insurance contributed at least some part of the purchase price for 148 (49 percent) patients while 155 (51 percent) were entirely self-pay. There were 12 patients who purchased hearing aids on more than one occasion during the analysis time interval. As these patients completed the HAFUS for each post-fitting occasion, they were treated as separate cases.

All patients were fit with digital WDRC hearing aids with a wide variety of features that were available between 2010 and 2013. The aids ranged from basic (four band; four channel; omnidirectional or manually selectable directional microphone) to high end (20 band; 20 channel; auto switching from omnidirectional through several levels of adaptive directionality and digital noise reduction).


Data were analyzed using Excel 2010 and IBM SPSS Statistics 24. Although the survey responses are arguably ordinal in nature, they were analyzed with parametric methods, as in other recent studies (Cox et al, 2016; Smith et al, 2013). The overall pattern of results indicates that the average patient in our practice wears his or her hearing aids most of the day with little difficulty. Wear time was reported as nearly all day or whenever needed by 99 percent of respondents (291/293). Hearing is improved in nearly all situations, although benefit is not as good in situations with groups or background noise as in quiet. It is of interest to note that improved quality of life was reported by 97 percent of those responding (285/294), despite acknowledging continued difficulty in several situations. Descriptive statistics are reported in TABLE 1.

Responses tended toward the extreme positive end of the scale on items that did not include background noise. This is consistent with our clinical impressions with the results of hearing aid fittings in our office. As a research tool, ceiling effects are not desirable. However, as a clinical outcome measure, most audiologists would expect their patients to be very successful except in more challenging situations.

It was our hope that routine review of patient surveys could be accomplished with an eyeball method, that is, a quick scan down the survey looking for items that were beyond some cutoff point. A frequency distribution of responses to each item is shown in TABLE 2. For simplicity, we suggest using a response poorer than “C” for any item as a reason to look more closely at the case. These points roughly approximate the ninetieth percentile, suggesting that only about 10 percent of cases respond more poorly. Realistically, one should anticipate responses of “A” or “B” on most items with scores of “D” on the more difficult items (10, 12, 14, 15, 16).

Better precision may result from combining items. Based on item content, it seemed reasonable to create subscales for quiet communication situations (items 9 and 11) and noise/reverberation situations (items 10, 12, 15, 16). While we do not routinely calculate subscales clinically, they can be useful in comparing conditions or in relating results to previous research. The mean score for the quiet subscale was 1.47 (SD 0.63) while the mean for the noise/reverberation subscale was 2.71 (SD 1.18). The corresponding 90th percentiles are “B” and “D,” respectively. The use of these subscales was supported by principal component analysis. After varimax rotation, the loadings for the quiet subscale were 0.71 and 0.65 for items 9 and 11 while the loadings for the noise subscale were 0.81, 0.89, 0.76, and 0.86 for items 10, 12, 15, and 16, respectively.
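A subscale score for one patient can be computed as the mean of the answered items in each group. The sketch below is illustrative: the item groupings come from the text, but the function name, the dictionary layout, and the choice to average over answered items only (skipping N/A responses) are assumptions.

```python
# Item groupings as given in the text; scores are 1..7 (A..G), None for N/A.
SUBSCALES = {
    "quiet": (9, 11),
    "noise_reverberation": (10, 12, 15, 16),
}


def subscale_scores(scores):
    """Return the mean of answered items for each subscale (None if none answered)."""
    result = {}
    for name, items in SUBSCALES.items():
        answered = [scores[i] for i in items if scores.get(i) is not None]
        result[name] = sum(answered) / len(answered) if answered else None
    return result
```

For a patient scoring 1 and 2 on items 9 and 11, and 3, N/A, 2, and 4 on items 10, 12, 15, and 16, this gives a quiet score of 1.5 and a noise/reverberation score of 3.0.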


The HAFUS has satisfactory reliability. Internal reliability as measured by Cronbach’s alpha was 0.89. Alpha did not drop below 0.88 when any individual scale item was deleted. Corrected item-total correlations ranged from 0.42 to 0.71, except for item 1 (wear time), which was -0.01.
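For readers who want to check internal consistency on their own data, Cronbach's alpha can be computed from the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is a generic illustration, not the SPSS procedure behind the values reported here; the function name is hypothetical, and it assumes complete cases (every respondent answered every item).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: list of respondents, each a list of k numeric item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

How N/A responses and the differently scaled wear-time item are handled will affect the result, so any local calculation should state those choices.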

Test-retest reliability as indicated by individual scale item Pearson product moment correlations ranged from 0.39 to 1.0, except for item 11 (hearing conversation in a small group in quiet), which was 0.24. However, the low correlations may be attributed to the relatively limited range of variability among patients (Hyde, 2000). As an alternative measure of reliability, the intraclass correlation for the test-retest data has been used (Smith et al, 2013). The overall intraclass correlation was 0.93.

From a more clinically relevant perspective, the audiologist should want to know how many scale units a response must change on any given item to conclude that a true difference has occurred. It was found that 90 percent of the absolute test-retest differences were no more than one scale point. Critical differences were calculated to establish the change required to conclude that a true difference has occurred between two scores for the same individual (TABLE 3). For the 90 percent critical difference, these values ranged from 0 to 0.69, in agreement with the above analysis of the distribution. The 95 percent critical differences ranged from 0 to 0.84. We feel it is reasonable to conclude that an item score change of more than plus or minus one scale point represents a true change.
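Both test-retest summaries can be illustrated for a single item. The text does not give the formula used for the critical differences, so the sketch below assumes one common approach: the standard deviation of the test-retest differences multiplied by a z value (1.645 for 90 percent, 1.96 for 95 percent). The function names and data layout are hypothetical, and the values in TABLE 3 may rest on a different calculation.

```python
from statistics import stdev


def share_within(test, retest, tolerance=1):
    """Proportion of patients whose test and retest scores differ by <= tolerance."""
    differences = [abs(b - a) for a, b in zip(test, retest)]
    return sum(d <= tolerance for d in differences) / len(differences)


def critical_difference(test, retest, z=1.645):
    """Assumed formula: z times the SD of the test-retest differences."""
    differences = [b - a for a, b in zip(test, retest)]
    return z * stdev(differences)
```

With test scores of 1, 2, 3, 2 and retest scores of 1, 3, 3, 4 on one item, three of the four differences fall within one scale point, so share_within returns 0.75. An item on which every patient gives the same response both times yields a critical difference of 0.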


We believe the HAFUS has good face validity. The average pattern of response was consistent with our expectation from patients in our office. It was our impression that our patients wore their hearing aids most of the time with little difficulty in handling the devices, in comfort of fit, or in hearing-in-quiet circumstances. We fully expected patients to report more difficulty hearing in challenging listening conditions. Despite this limitation, improved quality of life is reported by the vast majority of our patients. This is the pattern of results that is demonstrated by the mean item scores shown in TABLE 1.

Results from the HAFUS were compared to the revised APHAB norms (Johnson, 2010). Both scales use seven-point response scales. However, the APHAB assigns percentages to each point (e.g., A equals 99 percent) whereas the HAFUS provided only anchors of “strongly agree” and “strongly disagree” at the extremes. It seemed likely that patients may respond differently to these descriptors so only a qualitative comparison was made.

We compared the APHAB subscales for quiet, noise/reverberation (combined), and aversiveness to the HAFUS subscales for quiet, noise and the single item (5) dealing with aversiveness. Our patients reported better performance on both subscales but the relation between hearing in quiet and noise/reverberation situations was very similar. Each scale found that performance with noise/reverberation was reported as about 1.5 scale units worse than performance in quiet situations. Considering sound aversiveness, the HAFUS response averages one scale unit less than the quiet subscale while the APHAB aversiveness subscale is about two scale units less than the quiet subscale. We feel that these similarities support the validity of the HAFUS.


This article describes the integration of an outcome measure into the hearing aid fitting program in our office. Procedures were developed for support staff and audiologists to participate in the process. Minimal time is required for the audiologist to review completed surveys. We feel that use of the HAFUS has had a very positive effect on our practice. 

We now have objective data to support our subjective impressions of the success of our hearing aid program. We can respond with confidence to many questions asked by new patients. For example, nearly everyone seems to have a family member or friend who does not wear their hearing aids. We can honestly state that 99 percent of our patients report wearing their hearing aids all of the time or whenever needed. We are often questioned about whether the hearing aids will help in background noise. We can respond that generally there is improvement, yet not as much as in quiet situations. We have data demonstrating that nearly all of our patients (97 percent) feel we have helped improve the quality of their lives. This can be very valuable information as independent practices are now forced to compete with big box stores, the internet, or “value added” programs incorporated into insurance benefits.

We chose to develop a new outcome measure rather than use one of the many tools currently available. The HAFUS touches on areas of use, benefit, and quality-of-life that we consider important for a successful outcome. It is not intended to probe deeply into any one area. Rather, it is a more general measure that may indicate problems that require further attention. We look forward to receiving the completed surveys from our patients. They let us know that we are really accomplishing our objectives and, at times, that we are failing. In the latter case, we have the opportunity to contact the patient to address, and hopefully correct, the problem before it festers. In many cases, our patients write heartwarming comments at the end of the survey that make us happy to be audiologists!

The normative data that we presented for the HAFUS is specific to our practice. We believe that our patients are typical of the cases that would be seen in other private practices of a similar nature. However, the outcomes are likely to vary according to the procedures that are used for hearing aid selection, fitting, and follow-up care. For example, nearly 100 percent of our fittings are verified with real ear probe tube measures, while in general this is true for only about 30 percent of dispensers (Mueller, 2014). Practices that do not use similar techniques may have different outcomes (Kochkin et al, 2010). It would be desirable for a practice to generate local norms that are based on their own procedures. If this is not practical, one of the other scales with normative data gathered from many practices should be considered.

Part Two of this article discusses the relation of some of the patient and hearing aid variables to HAFUS scores, and some of the ways that we have expanded the use of this survey.