Speech — Apple framework reference

Speech performs speech recognition on recorded audio files and live audio, with support for custom language models. You configure an SFSpeechRecognizer for a locale and submit either an SFSpeechURLRecognitionRequest (for a recorded file) or an SFSpeechAudioBufferRecognitionRequest (for live audio). Results arrive through an SFSpeechRecognitionTask: each SFSpeechRecognitionResult carries an SFTranscription composed of SFTranscriptionSegment values. Custom recognition is driven through SFSpeechLanguageModel and SFCustomLanguageModelData. Before starting recognition, check the app's SFSpeechRecognizerAuthorizationStatus. The framework also provides a SpeechAnalyzer actor and modular SpeechTranscriber, DictationTranscriber, and SpeechDetector components, which build on the SpeechModule protocol and consume AnalyzerInput.

Speech Recognition Essentials 3

Create a recognizer for a locale and start transcribing audio.

Cl
SFSpeechRecognizer macOS 10.15+
An object you use to check for the availability of the speech recognition service, and to initiate the speech recognition process.
Pr
SFSpeechRecognizerDelegate macOS 10.15+
A protocol that you adopt in your objects to track the availability of a speech recognizer.
En
SFSpeechRecognizerAuthorizationStatus macOS 10.15+
The app's authorization to perform speech recognition.
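
The authorization check and recognizer setup described in this group might be sketched as follows. The helper names `describe` and `makeRecognizer` and the default "en-US" locale are assumptions made for illustration; they are not part of the framework. The sketch also assumes the app declares NSSpeechRecognitionUsageDescription in its Info.plist.

```swift
import Speech

/// Maps an authorization status to a short, human-readable explanation.
func describe(_ status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized: return "authorized"
    case .denied: return "denied by the user"
    case .restricted: return "restricted on this device"
    case .notDetermined: return "not yet requested"
    @unknown default: return "unknown"
    }
}

/// Requests authorization, then hands back a recognizer for the locale,
/// or nil when access is not granted or the recognizer is unavailable.
/// (Hypothetical helper; the locale default is an assumption.)
func makeRecognizer(localeIdentifier: String = "en-US",
                    completion: @escaping (SFSpeechRecognizer?) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)),
              recognizer.isAvailable else {
            print("Speech recognition unavailable (authorization: \(describe(status)))")
            completion(nil)
            return
        }
        completion(recognizer)
    }
}
```

The authorization handler is not guaranteed to run on the main thread, so a caller that updates UI from `completion` would need to hop to the main queue first.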

Recognition Requests 3

Describe the audio to transcribe, whether a recorded file or a live audio stream.

Cl
SFSpeechRecognitionRequest macOS 10.15+
An abstract class that represents a request to recognize speech from an audio source.
Cl
SFSpeechURLRecognitionRequest macOS 10.15+
A request to recognize speech in a recorded audio file.
Cl
SFSpeechAudioBufferRecognitionRequest macOS 10.15+
A request to recognize speech from captured audio content, such as audio from the device's microphone.
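
As a rough illustration of the two concrete request types, the sketch below builds one request for a recorded file and one for live audio. The function names `makeFileRequest` and `makeLiveRequest` are hypothetical, and the option values are only one plausible configuration.

```swift
import Speech

/// Builds a request for a recorded audio file at a caller-supplied URL.
func makeFileRequest(for fileURL: URL) -> SFSpeechURLRecognitionRequest {
    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    request.shouldReportPartialResults = false  // deliver only the final result
    request.taskHint = .dictation               // hint at the kind of speech expected
    return request
}

/// Builds a request for live audio; the caller appends buffers as they arrive.
func makeLiveRequest() -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true   // stream interim hypotheses
    return request
}
```

For the live case, captured AVAudioPCMBuffer values are passed to the request's `append(_:)` method, and `endAudio()` signals that no more audio will follow.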

Recognition Tasks and Results 6

Track an in-progress recognition and read back its transcribed output.

Cl
SFSpeechRecognitionTask macOS 10.15+
A task object for monitoring the speech recognition progress.
Pr
SFSpeechRecognitionTaskDelegate macOS 10.15+
A protocol with methods for managing multi-utterance speech recognition requests.
En
SFSpeechRecognitionTaskState macOS 10.15+
The state of the task associated with the recognition request.
En
SFSpeechRecognitionTaskHint macOS 10.15+
The type of task for which you are using speech recognition.
Cl
SFSpeechRecognitionResult macOS 10.15+
An object that contains the partial or final results of a speech recognition request.
Cl
SFSpeechRecognitionMetadata macOS 11.3+
The metadata of speech in the audio of a speech recognition request.
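
One way to start a task and read back its output is the closure-based `recognitionTask(with:resultHandler:)` method on SFSpeechRecognizer. The wrapper below is a minimal sketch: the name `transcribe` and the `onFinish` callback are hypothetical, and partial results are deliberately ignored.

```swift
import Speech

/// Starts recognition and reports either the final transcript text or an error.
/// Returns the task so the caller can inspect its state or cancel it.
@discardableResult
func transcribe(_ request: SFSpeechRecognitionRequest,
                with recognizer: SFSpeechRecognizer,
                onFinish: @escaping (Result<String, Error>) -> Void) -> SFSpeechRecognitionTask {
    recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            onFinish(.failure(error))
        } else if let result = result, result.isFinal {
            // bestTranscription is the hypothesis the recognizer ranks highest.
            onFinish(.success(result.bestTranscription.formattedString))
        }
        // Partial results (isFinal == false) are skipped in this sketch.
    }
}
```

Keeping a reference to the returned SFSpeechRecognitionTask lets the caller call `cancel()` or `finish()` on it later.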

Transcriptions 4

Inspect the recognized text and its constituent segments.

Cl
SFTranscription macOS 10.15+
A textual representation of the specified speech in its entirety, as recognized by the speech recognizer.
Cl
SFTranscriptionSegment macOS 10.15+
A discrete part of an entire transcription, as identified by the speech recognizer.
Cl
SFAcousticFeature macOS 10.15+
The value of a voice analysis metric.
Cl
SFVoiceAnalytics macOS 10.15+
A collection of vocal analysis metrics.
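
To show how a transcription breaks down into segments, the sketch below formats each SFTranscriptionSegment as one line of text. The function name `segmentLines` and the output layout are assumptions chosen for illustration.

```swift
import Speech

/// Produces one descriptive line per segment of a transcription:
/// the recognized text, where it starts in the audio, how long it lasts,
/// and the recognizer's confidence in it.
func segmentLines(for transcription: SFTranscription) -> [String] {
    transcription.segments.map { segment in
        "\(segment.substring) @ \(segment.timestamp)s for \(segment.duration)s, " +
        "confidence \(segment.confidence)"
    }
}
```

A caller would typically pass `result.bestTranscription` from an SFSpeechRecognitionResult and print or log the returned lines.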

Custom Language Models 4

Bias recognition toward domain-specific vocabulary using custom language model data.

Cl
SFSpeechLanguageModel macOS 14+
A language model built from custom training data.
Cl
SFCustomLanguageModelData
An object that generates and exports custom language model training data.
Pr
DataInsertable
A protocol supporting the custom language model training data result builder.
Pr
TemplateInsertable
A protocol supporting the custom language model training data result builder.

Speech Analysis 8

Analyze audio through a modular pipeline of transcription and detection components.

Ac
SpeechAnalyzer
An actor that coordinates a pipeline of modules to analyze an audio stream.
Cl
SpeechTranscriber
A speech-to-text transcription module that's appropriate for normal conversation and general purposes.
Cl
DictationTranscriber
A speech-to-text transcription module that's similar to system dictation features and compatible with older devices.
Cl
SpeechDetector
A module that performs a voice activity detection (VAD) analysis.
Pr
SpeechModule
Protocol that all analyzer modules conform to.
Pr
LocaleDependentSpeechModule
A module that requires locale-specific assets.
Pr
SpeechModuleResult
Protocol that all module results conform to.
En
SpeechModels
Namespace for methods related to model management.

Analyzer Input 5

Supply and convert audio input that feeds the speech analysis pipeline.

St
AnalyzerInput
Time-coded audio data.
Cl
AnalyzerInputConverter
Converts audio buffers to a format suitable for analysis by a speech analyzer.
Cl
AnalysisContext
Contextual information that may be shared among analyzers.
Cl
AssetInputSequenceProvider
Reads from an audio file or asset, providing its audio in a format suitable for analysis by a speech analyzer.
Cl
CaptureInputSequenceProvider
Reads from an AV capture device such as a microphone, providing the captured audio in a format suitable for analysis by a speech analyzer.

Asset Management 1

Manage the on-device models and assets required for analysis.

Cl
AssetInventory
Manages the assets that are necessary for transcription or other analyses.

Errors 1

Errors reported by speech recognition and analysis.

St
SFSpeechError macOS 14+
A structure describing errors that occur during speech recognition or analysis.

Classes 1

Cl
AssetInstallationRequest
An object that describes, downloads, and installs a selection of assets.
