By Amanda Demas and Amanda Ortmann
Introduction to STT
Speech-to-text transcription (STT) is the process of converting spoken words into text. STT does not refer to the captions themselves, but rather to the process of creating them. STT serves as an umbrella term that does not indicate how speech is transcribed, only that speech is being transcribed in some way.
The transcription process can be achieved either by human efforts or by artificial intelligence (AI). AI is defined as “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages” (Oxford Languages, 2021).
STT is important for individuals who are d/Deaf or hard of hearing (U.S. Equal Employment Opportunity Commission, 2014). Additionally, STT is beneficial for the “normal” hearing population in environments with a poor signal-to-noise ratio (SNR), as well as in educational environments and for those who are second-language learners (Cucchiarini et al, 2000; Liao et al, 2020; Li, 2016).
Individuals in any of these situations are considered learners because they are often expected to listen, take notes, integrate information, and ask questions. If proper content access is not provided for these individuals and situations, learning may be compromised.
STT can be used in classrooms, meetings, conferences, movies, television shows, phone calls, and even small-group or one-on-one conversations. Regardless of the situation, STT has the ability to provide access to speech through real-time transcription.
Many audiologists are aware of good communication strategies and practice them every day. However, these strategies can be easy to forget when presenting or when working in an academic environment. Small things, such as the rate of speech and body language, can have a big impact on audibility for listeners and learners.
Practicing Good Communication Strategies
Good communication strategies include the following:
- When presenting, avoid turning your back to the class or the audience. This is difficult, for example, when writing on a whiteboard or a blackboard. If you need to turn to write something down, pause your presentation until you can turn back and face your audience.
- When presenting virtually or in any situation where you do not need to wear a mask, avoid covering your mouth or obstructing the view of your mouth in any way. Looking down, looking to the side, or holding a hand or an object such as a coffee cup over your mouth or on your chin can be distracting and can obstruct lipreading.
- Repeat questions and comments from the audience. If a question comes from someone who is sitting in the back, it can be difficult for those up front to hear or know who is talking. If you are in a virtual meeting, announce yourself before asking a question or commenting. For example, start your interjection with “This is Amanda. My question is…”
- Offer a copy of the course or meeting content beforehand, so the class or audience does not have to focus on taking notes, listening/reading, and understanding. These simultaneous tasks may be difficult, especially for someone who is d/Deaf or hard of hearing. Having the information beforehand helps provide context clues for spoken language.
Many communication strategies help to provide visual cues by avoiding distracting activities that may obstruct lipreading. STT complements those strategies by displaying the speaker’s words as text in real time, as they are spoken.
Integrating STT into Your Practice
- Consider using or purchasing a lavalier microphone that has a direct connection to speakers in the room or a computer that can broadcast free STT captions from your presentation.
- Make closed captioning available and turn it on. There are dozens of technology options that are free, Health Insurance Portability and Accountability Act (HIPAA) compliant, and easy to implement. As a presenter, opt to provide captions whenever you can. You never know who might benefit from them.
- When recommending STT for a patient, do not specify communication access real-time translation (CART). This limits the individual to one service, which can be denied due to high cost and limited availability of human stenographers. Instead, recommend STT as an umbrella term that embodies both CART and AI options.
Transcription Options: Human or AI
STT is universally important, as demonstrated by its use to create subtitles or closed captioning for movies, television shows, and live events. STT has made its way into modern society and is a permanent presence, due to laws such as the Americans with Disabilities Act (ADA) of 1990 and Section 1474 in Part C of the Individuals with Disabilities Education Act (IDEA).
These laws do not provide mandates on the process of providing STT or accuracy and latency requirements for STT, but do require that reasonable accommodations be offered. There are two broad forms of STT: transcription by a human and transcription by a computer/artificial intelligence.
Transcription by a Human Stenographer
Transcription by a human is achieved by a stenographer through CART, which is a commonly used accommodation for individuals who are d/Deaf or hard of hearing. CART providers begin their training with courtroom reporting, then build their skills in preparation for the Certified CART Provider (CCP) Examination (Knight, 2020). Similar to the confidentiality requirement for courtroom reporting, CART is recognized as a HIPAA-compliant service.
Using a steno-writing machine, CART providers must consistently supply transcriptions that are at least 98 percent accurate in order to achieve certification (NCRA Broadcast and CART Captioning Committee, 2016). CART stenographers are expected to transcribe a minimum of 180 words per minute (wpm) on single-speaker dialogues and a minimum of 225 wpm on two-speaker dialogues (Miller, 2021).
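To put these figures in perspective, the short Python sketch below estimates how many word errors per minute remain at the 98 percent threshold. It is illustrative arithmetic only: it assumes accuracy is counted per word, and the function name `expected_errors_per_minute` is a hypothetical helper, not part of any certification standard.

```python
# Illustrative arithmetic only. Assumes accuracy is measured per word,
# which is a simplification of how certification exams are scored.
def expected_errors_per_minute(words_per_minute: float, accuracy: float) -> float:
    """Return the expected number of incorrect words per minute."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words_per_minute * (1.0 - accuracy)


# Speeds and accuracy threshold quoted in the paragraph above.
for label, wpm in [("single speaker", 180), ("two speakers", 225)]:
    errors = expected_errors_per_minute(wpm, 0.98)
    print(f"{label}: {wpm} wpm at 98% accuracy -> about {errors:.1f} errors per minute")
```

Under these assumptions, a provider working at the minimum speeds would make roughly 3.6 word errors per minute on single-speaker dialogue and 4.5 on two-speaker dialogue.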
CART is presumed to be highly accurate in its transcription and, until the development of speech-recognition technology, was the only transcription option available. However, accessibility to CART is often significantly hindered by financial burden and a shortage of trained professionals.
The Collaborative for Communication Access via Captioning (CCAC) found that CART services cost an average of $75 to $200 per hour (CaptionMatch, 2019). Extensive training, skill, and certification requirements have led to a shortage of providers.
The COVID-19 pandemic brought collective standards of social distancing and limited-occupancy allowances, which have made it difficult for CART providers to find their place in a physical setting. Remote CART options can be considered in these instances, but connection requirements often create equipment barriers and increased latency of transcription.
Introducing Human-Stenographer-Based STT in Your Practice
- Research local CART providers by using online resources such as stenoresearch.com and ncra.org.
- Decide on remote CART or in-person CART and think about where an in-person provider would fit into your space.
- Establish good communication between your practice and your CART provider. You will need to communicate service hours, share unique terminology that may be used during the session so that it can be entered into the steno machine, and agree upon a place to share the transcript after the session.
Transcription by AI
Over the last decade, automatic-speech-recognition (ASR) systems have emerged as a free or low-cost, widely accessible medium for STT. ASR technology uses AI to develop its own lexicon and improve in accuracy over time.
Continuous speech-recognition and transcription capabilities are not limited to STT and are also seen in devices such as Apple’s Siri and Amazon’s Alexa products. The integration of speech-recognition and transcription to household devices contributes to one of the advantages of AI-based forms of STT: universal accessibility. The ease of use and integration into “smart” devices has contributed to increased awareness and demand for AI.
STT offers inclusivity by providing real-time access to speech for individuals of any hearing ability. The growing number of AI options, their versatility, and their ability to transcribe hundreds of languages make STT universally accessible.
Introducing AI-Based STT in Your Practice
It is useful to evaluate the accuracy and latency of free or low-cost, readily available STT platforms based on machine learning or AI. These platforms can provide an alternative to CART without the limitations of cost, provider availability, and human fatigue.
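One way to carry out such an evaluation is sketched below in Python. It is a minimal example, not a validated protocol: it assumes accuracy is scored with word error rate (WER), a common speech-recognition metric that this article does not itself prescribe, and that latency is the delay between when a phrase is spoken and when its caption appears. The function names, sample sentences, and timestamps are all hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Simplification: words are lowercased and split on whitespace;
    punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from caption
                            curr[j - 1] + 1,      # extra word in caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def mean_latency(spoken_times, caption_times):
    """Average delay in seconds between speech and caption display."""
    if not spoken_times or len(spoken_times) != len(caption_times):
        raise ValueError("provide matching, non-empty lists of timestamps")
    delays = [shown - spoken for spoken, shown in zip(spoken_times, caption_times)]
    return sum(delays) / len(delays)


# Hypothetical example: what was said versus what the captions showed.
said = "the patient reported ringing in both ears"
captioned = "the patient reported ringing in both years"
wer = word_error_rate(said, captioned)
print(f"WER: {wer:.3f}, word accuracy: {(1 - wer) * 100:.1f}%")

# Hypothetical timestamps in seconds for three phrases.
print(f"Mean latency: {mean_latency([0.0, 2.0, 4.0], [1.5, 3.0, 6.0]):.1f} s")
```

In this example, one substituted word out of seven gives a WER of about 0.143 (85.7 percent word accuracy), and the mean caption delay is 1.5 seconds. Running the same script on a recording transcribed by several platforms gives a simple side-by-side comparison.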
Companies such as Google, IBM, and Microsoft are making STT technology available:
- Ava captioning technology was designed for accessibility in the workplace, specifically for individuals who are d/Deaf or hard of hearing. Ava offers a range of services from AI alone to a combination of AI and a human scribe to correct AI captions in real time (Ava Scribe). The company claims a 90 to 99 percent accuracy rate. Ava is not HIPAA compliant (Ava, 2021). A subscription starts at $9.99 per month for one person and ranges up to $149 per month for a business. Product information is available here.
- Google Live Transcribe is a machine-learning platform that can transcribe speech in 125 languages in real time. Google owns YouTube; the speech-recognition algorithm is used for video transcription from user-uploaded videos across the world. Google transcription is available on all Google platforms. Google Slides, Meet, and Docs are all HIPAA compliant if a business associate agreement (BAA) is signed (Google, 2021).
- Microsoft offers cognitive STT services in 85 languages through its Azure software platform. Free to low-cost STT services using artificial intelligence technology are available. The platform uses a customizable acoustic model to aid in accurate detection of speech in atypical environments. The technology can adopt frequently used terms for better recognition. Microsoft STT is HIPAA compliant (Microsoft, 2021). Product information is available here.
- Otter.ai, an automatic-transcription technology, is gaining attention because of a partnership with Zoom. A purchased Otter account can be linked to Zoom and will appear in a sidebar on Zoom calls. In the future, Otter is expected to be available as an integrated system. While Otter is not specifically designed for accessibility, it offers features including the ability to add photos for speakers, save and edit transcripts, and more. Use is free for individuals for up to 10 hours per month and ranges from $8 to $20 per month for unlimited individual or business use. Otter is not HIPAA compliant (Otter.ai, 2021). Product information is available here.
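The cost difference between human and AI transcription can be made concrete with a rough calculation. The Python sketch below uses only the price ranges quoted in this article ($75 to $200 per hour for CART, and $8 to $20 per month for a paid Otter plan); the four hours of monthly use and the helper name `cart_monthly_cost` are assumptions chosen for illustration.

```python
# Price ranges quoted in this article, in U.S. dollars.
CART_HOURLY_RATE = (75.0, 200.0)   # per hour (CaptionMatch, 2019)
OTTER_MONTHLY_FEE = (8.0, 20.0)    # per month, paid plans (Otter.ai, 2021)


def cart_monthly_cost(hours_per_month: float) -> tuple:
    """Return the (low, high) monthly cost of CART for the given hours."""
    low, high = CART_HOURLY_RATE
    return (low * hours_per_month, high * hours_per_month)


hours = 4  # hypothetical usage: one captioned hour per week
cart_low, cart_high = cart_monthly_cost(hours)
otter_low, otter_high = OTTER_MONTHLY_FEE
print(f"CART for {hours} hours per month: ${cart_low:.0f} to ${cart_high:.0f}")
print(f"Paid Otter plan per month: ${otter_low:.0f} to ${otter_high:.0f}")
```

At four hours per month, CART would cost between $300 and $800, compared with $8 to $20 for the subscription. Cost is only one factor; accuracy, latency, and HIPAA compliance still need to be weighed for each setting.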
Ready, Set, STT!
Implementing STT and strong communication strategies in your practice is easy. Options are available at a range of price points, including low-cost or even free possibilities. Providing this service ensures that patients have good access to spoken information and helps perpetuate strong communication strategies in your practice.
Additional information on STT applications, connectivity information, and basics to share with patients can be found among the resources compiled by Tina Childress, AuD, available online.
DISCLAIMER: Due to the fluid nature of the technology industry, the applications, features, pricing, and technology specifications mentioned here may have changed since this article went to press.
This article is a part of the September/October 2021 Audiology Today issue.
References
Association of National Advertisers. (2010) The Benefits of Closed Captioning Commercials. ANA Production Management Committee. https://ecfsapi.fcc.gov/file/7521097468.pdf (accessed July 8, 2021).
CaptionMatch. (2019) General Information about Captioning and CART. captionmatch.com/general-info-captioning-cart/ (accessed October 18, 2019).
Childress T. (2021) Speech-to-Text Options. See Hear Communication Matters. docs.google.com/spreadsheets/d/18g0yT08qPmboMX5WUvacOOvN8M90tvr0_WtIhUxTd0k/edit#gid=376066376 (accessed April 1, 2021).
Cucchiarini C, Strik H, Boves L. (2000) Quantitative assessment of second language learners’ fluency by means of automatic speech recognition technology. J Acoust Soc Am 107(2):989–999.
Google. (2021) HIPAA Compliance with Google Workspace and Cloud Identity. Google Workspace Admin Help. support.google.com/a/answer/3407054?hl=en (accessed April 1, 2021).
Google Cloud. (2021) Cloud Speech-To-Text. cloud.google.com/speech-to-text/ (accessed March 29, 2021).
IBM. (2021) What Is Artificial Intelligence (AI)? IBM Cloud Education. ibm.com/cloud/learn/what-is-artificial-intelligence (accessed March 4, 2021).
Knight M. (2020) Is CART Easier Than Court Reporting? StenoKnight CART Services. stenoknight.com/StudentCART.html (accessed March 20, 2021).
Li M. (2016) Investigation into the Differential Effects of Subtitles (First Language, Second Language, and Bilingual) on Second Language Vocabulary Acquisition (Thesis). The University of Edinburgh. https://era.ed.ac.uk/handle/1842/22013 (accessed March 31, 2021).
Liao S, Kruger JL, Doherty S. (2020) The impact of monolingual and bilingual subtitles on visual attention, cognitive load, and comprehension. J Spec Trans (33):70–98.
Microsoft. (2021) Empower healthcare professionals with Microsoft Teams. microsoft.com/en-us/microsoft-teams/healthcare-solutions#:~:text=Built%20on%20the%20secure%20and,(GDPR)%2C%20and%20more (accessed April 1, 2021).
Miller C. (2021) Becoming a CART Provider FAQs. Florida Court Reporters Association. www.fcraonline.org/becoming-a-cart-provider-faqs (accessed February 17, 2021).
National Court Reporters Association. (2016) NCRA Broadcast and CART Captioning Committee. Guidelines for CART Captioners. www.ncra.org/docs/default-source/uploadedfiles/governmentrelations/guidelines-for-cart-captioners.pdf?sfvrsn=584f6a61_10 (accessed March 18, 2021).
Otter.ai. (2021) Otter is where conversations live. https://otter.ai/ (accessed March 20, 2021).
Oxford Languages. (2021) The Oxford English Dictionary. https://languages.oup.com/google-dictionary-en/ (accessed July 8, 2021).
United States Department of Education. (2021) Individuals with Disabilities Education Act. Section 1474. https://sites.ed.gov/idea/statute-chapter-33/subchapter-iv/part-c/1474 (accessed March 20, 2021).
United States Department of Justice, Civil Rights Division. (2021) The Americans with Disabilities Act of 1990 and Revised ADA Regulations Implementing Title II and Title III. www.ada.gov/2010_regs.htm (accessed April 20, 2021).
United States Equal Employment Opportunity Commission. (2014) Deafness and Hearing Impairments in the Workplace and the Americans with Disabilities Act. www.eeoc.gov/laws/guidance/deafness-and-hearing-impairments-workplace-and-americans-disabilities-act (accessed June 13, 2021).