Meta Platforms released its multimodal AI model SeamlessM4T on August 22. The neural network can reportedly process both audio and text, and the company says it can translate and transcribe speech in about a hundred languages.
If it lives up to the hype, SeamlessM4T could put Meta at the forefront of real-time communication across languages.
Facebook’s parent company explained that the model supports text-to-speech, speech-to-text, speech-to-speech, and text-to-text translation in almost 100 languages. Full speech-to-speech translation is reportedly available in 35 of them, including Western Persian, Urdu, and Modern Standard Arabic.
The tech company built on its previous work, combining technology that had been available only as separate models. One was the No Language Left Behind model; the other was the Universal Speech Translator.
Meta CEO Mark Zuckerberg said he sees these tools as a way to ease interactions, especially between users in the metaverse, a series of interconnected virtual worlds. Zuckerberg has already made this controversial platform a hill to die on. SeamlessM4T could make communication there easier, no matter which parts of the world the users come from.
The company’s blog post also said that Meta will make the model available to the public, but only for non-commercial use.
The social media company has already released a wave of AI models this year, most of them free of charge. Its language model Llama is reportedly a serious challenge to what Google and OpenAI offer.
Zuckerberg said an open AI ecosystem benefits Meta. He explained that crowd-sourcing consumer-facing tools gives the company an advantage and serves its social platforms better than charging users for access to the models.
It’s not all smooth sailing, though. Like the rest of the industry, Meta faces legal questions about the data used to train its models.
Popular comedian Sarah Silverman and two other authors filed copyright infringement lawsuits against Meta and OpenAI, claiming that their books were used as training data without permission.
In a research paper, Meta researchers said they gathered the audio training data for SeamlessM4T from 4 million hours of “raw audio” drawn from a publicly available repository of crawled web data. The company did not say which repository it used.