OpenAI is refining its voice cloning technology amid the rise of deepfakes, stressing that the tool must be used responsibly.
OpenAI has unveiled Voice Engine, an expansion of its existing text-to-speech API. Voice Engine lets users upload a 15-second voice sample and generate a synthetic clone of that voice. However, OpenAI has not said when the tool will be publicly available, because it wants to make sure the technology is deployed responsibly.
Jeff Harris from OpenAI emphasized the importance of understanding the potential risks associated with the technology and implementing safeguards to mitigate them.
Model Training Process
The AI model behind Voice Engine has quietly been in use for some time. It already powers the voice features in ChatGPT, and Spotify has used it to dub podcasts into other languages.
Harris said the model was trained on a mix of licensed and publicly available data. Training data for models like this is typically drawn from a wide range of public datasets, although companies often keep the specifics closely guarded for competitive and legal reasons.
OpenAI has faced lawsuits alleging that it used copyrighted material without permission or credit. The company has licensing agreements with some content providers and allows artists to opt out of having their work included in training sets.
Voice Synthesis Process
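Notably, Voice Engine is not trained or fine-tuned on individual users' data. This is partly due to the ephemeral way the model, which combines a diffusion process with a transformer, generates speech: it works from the short voice sample and the input text at request time rather than building a separately trained model for each voice.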
Although other companies offer similar voice cloning products, OpenAI claims its approach produces higher-quality speech. OpenAI says Voice Engine will be competitively priced, with different tiers depending on the output quality required.
Voice Engine offers no controls for adjusting voice characteristics such as tone, pitch, or cadence, but the expressiveness of the original sample carries over into the speech it generates.
Impact on Voice Talent
By making synthetic voices cheap to produce, OpenAI's tool could disrupt the voice acting industry. Voice actors are already under pressure as some clients turn to AI-generated voices in place of human performers.
Some platforms are trying to strike a balance by working with unions and paying the original speakers when their voices are used. OpenAI currently has no such arrangements, although it does require explicit consent from the person whose voice is cloned.
Ethical Considerations and Misuse Prevention
Voice cloning raises ethical concerns, and the technology has already been misused for malicious purposes, such as impersonating people without their consent. To prevent misuse, OpenAI is limiting access to Voice Engine during its early stages and watermarking the generated audio so it can be identified as synthetic.
OpenAI also plans to draw on its network of outside experts to assess the risks of Voice Engine being misused. Some experts argue for more comprehensive safeguards, but OpenAI says its priority is deploying the technology safely.
Future Release Plans
Depending on feedback from the preview, OpenAI may widen access to Voice Engine. The company is also exploring additional safeguards, such as verifying users by having them read a text prompt aloud, to help ensure the tool is used responsibly.
Ultimately, OpenAI says it wants synthetic voices to be reliably distinguishable from human ones, so that people can tell when they are hearing AI-generated speech.