WellSaid - Creating New Voices With AI
Their recent Series A funding round, led by FUSE, was oversubscribed thanks to revenue growth over the past couple of years and increasing customer demand
Generating human-like voices purely by computer is a task many have attempted in recent years, and making those voices more lifelike has proved a real challenge. The start-up WellSaid Labs began in 2018 as a research project at the Allen Institute for Artificial Intelligence, the institute founded by Microsoft cofounder Paul Allen. The company's goal is to generate lifelike voices for use across different industries, and CTO Michael Petrochuk led the research and development team that built the AI needed to make this a reality.
The company's recent Series A round was led by FUSE, with participation from others including previous investor Voyager. Thanks to revenue growth over the past couple of years and increasing customer demand, the round was oversubscribed. WellSaid Labs raised $10 million and plans to use it to support research and development and to expand its team, says CEO Matt Hocking. This brings the company's total funding to $12 million. Across the sector, startups creating artificial people and lifelike virtual entities have raised around $320 million in venture capital to date.
“Plain and simple, WellSaid is the future of content creation for voice,” said Cameron Borumand, General Partner at FUSE. “This is why thousands of customers love using the product daily with off-the-charts bottom-up adoption. Matt and Michael have assembled a world-class team and we couldn’t be more thrilled to be a part of the WellSaid journey.”
WellSaid Labs is working to become the leading company in AI-powered human-like voices, with an offering comparable to similar projects from DeepDub, Resemble AI and Synthesia. One worry about projects like these is misuse: deepfakes have already appeared, and scammers and others could use the technology for criminal activities. Matt Hocking has said the company doesn’t create imitation voices without consulting the voice actors beforehand, and that it subscribes to the “Hippocratic Oath for AI” proposed by Microsoft executives Brad Smith and Harry Shum.
Text-to-speech has been around for decades, but with AI, product developers and creators can personalise their brand’s experiences and get more out of their client communications and adverts. Users can choose from a variety of accents and languages, have a voice read from a specific script, and pick different genders and production types. WellSaid lets users share projects with their team members so everyone can see how the work is going, and lets them create voiceovers or other audio in just a few hours. The ability to pause, edit, and try out different voices and styles helps companies find what is right for them, and they can control how words are pronounced, including unique spellings and slang.
“We’ve added AI Voice to the toolkit of thousands of content creators and their teams,” said Matt Hocking, CEO of WellSaid Labs. “Our human-parity AI voice can be produced faster than real-time and updated on-demand. Opening up new and exciting opportunities to ‘add voice’ where never before perceived possible. AI voice easily ensures every production can be created and updated efficiently at scale.”
The company’s mission is to create the most intelligent and lifelike TTS service available for brands and businesses, and it is well on its way to making that a reality. Over the past two years, WellSaid has been improving the naturalness of its voices and is aiming for “human parity,” says Hocking. In a 2019 study, participants listened to randomly selected recordings from WellSaid and from human voice actors and rated each for quality on a scale of 1 to 5, with 5 the highest. The voice actors averaged 4.5, and WellSaid’s synthesized voices came in close behind at around 4.2. The gap of roughly 0.3 points suggests the company’s voices are approaching the quality of actual human voices.
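To put the two study averages side by side, here is a minimal Python sketch of the arithmetic. It uses only the two figures reported above; the variable names are illustrative, and since the synthesized score is given as "around 4.2", the results are approximate.

```python
# Average quality ratings reported from the 2019 listening study (scale of 1 to 5).
human_avg = 4.5       # human voice actors
synthetic_avg = 4.2   # WellSaid synthesized voices (reported as "around 4.2")

# Absolute gap between the two averages, in rating points.
gap = round(human_avg - synthetic_avg, 2)

# Synthesized score expressed as a fraction of the human score.
ratio = round(synthetic_avg / human_avg, 3)

print(f"Gap: {gap} points")
print(f"Synthesized voices scored {ratio:.1%} of the human average")
```

On these figures the synthesized voices trail by about 0.3 points, or roughly 93% of the human average.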
Based in Seattle, Washington, the company currently has just 12 employees and plans to expand the team with the new funding. The team is working to improve how the platform handles texts of various lengths and styles, and to generate voices faster. At present it can create a 10-second audio file in around 4 seconds. Enterprises already use the software for corporate content and employee training, and it is expected to spread to more platforms, such as radio and advertising.
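The "faster than real-time" claim can be checked with a quick calculation. The sketch below uses the two figures quoted above (about 4 seconds to produce 10 seconds of audio); the function name `real_time_factor` is an illustrative helper, not part of any WellSaid API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures quoted in the article: roughly 4 seconds to generate a 10-second file.
synthesis_seconds = 4.0
audio_seconds = 10.0

rtf = real_time_factor(synthesis_seconds, audio_seconds)
speedup = audio_seconds / synthesis_seconds

print(f"Real-time factor: {rtf}")
print(f"Speed relative to playback: {speedup}x")
```

A real-time factor of about 0.4 means the audio is generated roughly 2.5 times faster than it takes to play back.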