We’re looking into using https://github.com/k2-fsa/sherpa-onnx to provide built-in text-to-speech and speech-to-text, which would greatly improve the out-of-the-box accessibility of GrapheneOS for blind users. We already have a screen reader included via our fork of the open source variant of TalkBack.
To have text-to-speech functioning out-of-the-box, we can choose one of the models with open source training code and data as the default and include it within the OS. We wouldn’t need to ship anything that’s not truly open source. sherpa-onnx is the only reasonable option we’ve found for this.
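As a rough sketch of what the text-to-speech side of the integration looks like, here’s offline synthesis via sherpa-onnx’s Python bindings. A VITS-style model is assumed, and the model/tokens/data paths are placeholders for whichever fully open source model ends up being bundled.

```python
# Minimal sherpa-onnx offline TTS sketch (Python bindings).
# A VITS-style model is assumed; all file paths are placeholders
# for whichever fully open source model gets bundled.
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="model.onnx",          # placeholder path
            tokens="tokens.txt",         # placeholder path
            data_dir="espeak-ng-data",   # phonemization data; model-dependent
        ),
        num_threads=2,
    ),
)

tts = sherpa_onnx.OfflineTts(config)
audio = tts.generate("Screen reader test sentence.", sid=0, speed=1.0)

# audio.samples holds float samples at audio.sample_rate Hz.
sf.write("out.wav", audio.samples, samplerate=audio.sample_rate, subtype="PCM_16")
```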
There are over 100 models for 40 languages. Some research is going to be required to figure out which of the English ones are fully open source (open training data and code) and then which of those work best for basic text-to-speech to have as the default bundled in the OS.
If we had speech-to-text support included in GrapheneOS, we could also provide an automatic captions feature.
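For the captions side, this is a sketch of streaming recognition with the sherpa-onnx Python bindings, which is the piece an automatic captions feature would build on. A streaming transducer model is assumed, and the paths and the chunked-audio feed are placeholders.

```python
# Sketch of streaming speech-to-text with sherpa-onnx.
# A streaming transducer model is assumed; file paths are placeholders.
import numpy as np
import sherpa_onnx

recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    tokens="tokens.txt",      # placeholder paths
    encoder="encoder.onnx",
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    sample_rate=16000,
    feature_dim=80,
)

stream = recognizer.create_stream()

def feed(samples: np.ndarray) -> str:
    """Feed a chunk of 16 kHz mono float32 audio, return the partial transcript."""
    stream.accept_waveform(16000, samples)
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    return recognizer.get_result(stream)

# e.g. feed ~100 ms chunks captured from the device's audio output
print(feed(np.zeros(1600, dtype=np.float32)))
```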
We’ll need to do a basic review of the text-to-speech code, the speech-to-text code, the shared code and any other parts we decide to use. We’ll need at least a minor fork of the project.
We want to stick to a model with open source training code/data for what we bundle, so we’re likely not going to be able to use one of the best options by default. Having a tolerable open source model by default with the option to switch to great “open” models seems good enough.
We could use help narrowing down which of the available English models with open training data would be best (least bad) for basic text-to-speech usage, including for TalkBack. We could also collect feedback somewhere on which ones people think are best overall across languages.