Sunday, July 7, 2024

Microsoft’s new AI model to synthesize voice

Microsoft’s new AI model, VALL-E, can clone your voice from a three-second audio clip

VALL-E AI Model

Microsoft recently added a new AI model to its technology portfolio. VALL-E can synthesize and recreate a voice from a very short clip, whereas text-to-speech models usually require significantly longer training samples.

The release is Microsoft's latest move in artificial intelligence (AI). VALL-E is a transformer-based text-to-speech model that can recreate any voice from a three-second sample clip.

According to cybersecurity experts, the model calls for stronger safeguards because it could be used in phishing. It could also be misused to collect information.

Model Overview

VALL-E creates a more natural-sounding synthetic voice than other models by preserving the pitch, charisma, and style of the original voice. These qualities can then be directed as needed when writing the text-to-speech script. Moreover, the model requires less training time to generate a new voice.

VALL-E needs only a three-second recording of a voice, whether taken from a phone call, an in-person conversation, or a podcast. The model can then synthesize a sentence in the recorded voice.

Performance has improved over previous synthetic voice models to such a point that it would be difficult to tell whether you were hearing a real or fake voice, Microsoft says.

VALL-E Can Be Used for Gaming

The code for VALL-E is not currently available to the public; only sample audio files produced with the tool have been published. It also isn’t clear when or if Microsoft plans to make VALL-E available as a publicly accessible or commercial tool.

Joshua Kaiser, CEO of AI company Tovie.ai, told Tech Monitor that the model is designed to let users do a lot more with a lot less data, which is crucial for organisations that want to build speech synthesis but lack enough data for good performance. “We think this will benefit a lot of industries – from retail to fintech to gaming – that are already embracing voice interfaces, by making the whole process more accessible,” he says.

The biggest benefit from VALL-E is its potential scale, says Arun Chandrasekaran, distinguished VP analyst at Gartner. It can be effective in “zero-shot” or “few-shot” scenarios where little domain-specific training data is available. “In addition, if these models can be delivered as a cloud service, they can reduce time/effort required to get the models up and running in contrast to classic approaches,” Chandrasekaran says.

Risk Factors of VALL-E

Spoofing could allow a cybercriminal to gain access to banks or secure systems that use a voice print as a password. However, many banks and high-security systems have mechanisms in place to detect whether a voice is live or recorded.

The model could also be used in a phishing scam. An attacker could take a short sample of a voice from a phone call, then use that sample to create a new voice model. Spoofing a finance manager at a company, for example, could make it easier to convince someone to part with a password.

Notably, Microsoft has yet to issue a final statement weighing the pros and cons of the model.
