Voice Technology and Conversational AI

Transcript

Marsh Naidoo (00:14):

Hi guys. Welcome to the Raising Kellan podcast. My name is Marsh, and I blog at raisingkellan.org, where we curate resources for parents raising children with developmental delays and/or disabilities. It is a brand new year, and in 2022 we remain focused on providing you more information on technology and digital accessibility. In today's episode, we chat with Bradley Metrock, CEO of Project Voice, who is going to tell us more about voice technology and conversational AI. Guys, as always, remember that the information provided on this podcast is educational. So grab your cup of coffee, relax, and get ready for some awesome conversation.

Marsh Naidoo (01:16):

Bradley, can you tell us what exactly Project Voice is and how it ties in with voice technology and conversational AI?

Bradley Metrock (01:27):

Sure. And yeah, thank you for having me. I'm honored to be a guest on the Raising Kellan podcast; I appreciate that. My name is Bradley Metrock. I'm CEO of Project Voice, and a lot of the work we do serves to accelerate the growth and adoption of voice tech. The future involves talking to technology. It involves using our voice. That doesn't mean the QWERTY keyboard's gonna go away, although somebody just made that up. If it went away, it wouldn't be that bad, right? Our voice is what makes us human, and being able to access technology through our voice makes the world more accessible. We'll talk all about that. Our company has a media and content arm with some podcasts, a well-read Substack newsletter, and an event series that's both virtual and physical. We have a consulting arm that works with companies born native to voice and AI to help them grow, as well as with mid-sized to large companies that are looking to integrate voice and AI somehow, to help them do that in a more efficient and cost-effective manner. And the most recent component of what I do is a $20 million venture capital fund, on which I'm one of two general partners, that invests in the space. So, a lot. We have our hands in a lot of different areas involving people talking to computers and technology, with what we call voice and AI.

Marsh Naidoo (03:04):

Bradley, for a person like me who might not know much about voice technology, can you tell me about some of the applications that voice technology might have in my everyday life?

Bradley Metrock (03:20):

Really, ever since Alexa came on the scene (there was stuff before that, but Alexa's arrival was really in 2016, heading into 2017), a lot of doors have opened. Now if you have a question, if you're looking for a fact, for information about a business, or for any number of things, you're able to speak to a computer that's sitting there, as opposed to typing it in or tapping and swiping on your phone. Just from an immediate point of view, people who are vision impaired or blind were able to take advantage of that rather quickly, as well as a lot of other folks. And that's just a mainstream voice assistant, Amazon's Alexa. As a very specific example, there are some other uses that involve Alexa: you can go up to an Echo Show, the device with a front-facing camera, and you can say, "Alexa, what am I holding?"

(04:42):

If you're holding an item with a barcode, like a water bottle, it'll say, "Show me the barcode," and it'll scan it. If it's something Amazon carries, you can reorder it if that's what you're trying to do, but either way it'll tell you what you're holding. If it doesn't have a barcode, it will try to recognize it, machine learning style, and tell you what you're holding even without a barcode. Again, for people with vision issues, that's monumental, because they might be holding something they can't read, either because they're blind or because their vision is impaired. For these folks, and for people with accessibility-related challenges of different types, having voice integrated into our technology really, really enables them to be more empowered. And I've only scratched the surface. There are all types of applications and implementations in every industry, every vertical: hotels, travel, banking, publishing, healthcare, automotive, gaming, I could go on.

(06:05):

Layering the ability to speak into how we interact with technology in that context changes many things. It starts with core accessibility use cases, and it extends outward to rapidly help a lot of other people too. A good example of that: I'm a lifelong gamer. My dad introduced me to arcade games at an early age, and the rest is history. Fortunately or unfortunately, I'm not sure which, I don't have that much time for it these days, just a little bit. But when I play a game, I have subtitles on for dialogue. That's very helpful to me. My vision's okay and my hearing's okay, but it just helps me, and I like it. But nobody invented that for me. It was invented for people who really needed it.

Marsh Naidoo (07:13):

That's the beauty of universal design, isn't it?

Bradley Metrock (07:17):

Yeah, for sure. So that's a little bit of a taste of the evolution that's going on right now: multimodal interaction with technology, which inherently involves our voice, is opening a lot of doors. We can talk about individual ones, or we can take it in any direction you want to go, but that's a fact, and it's great to see.

Marsh Naidoo (07:48):

One of the two main domains I'm interested in, and would like to talk more with you about, is the field of education: accessibility, and figuring out how our kids learn. Are they auditory learners, which they well could be, and how could they use voice tech to learn? What are your thoughts on that domain, Bradley?

Bradley Metrock (08:17):

Sure. So yeah, it's a vast blue ocean, is what it is. The reason we haven't seen more progress there already is directly related to the fact that the ubiquity of voice-enabled devices and voice technology has come at the hands of big tech. If it had come from smaller companies instead, whether private or public, you wouldn't be seeing the conversations about privacy that you see. Privacy for defenseless children is obviously extremely serious, so that has slowed the pace down, and I think there's nothing wrong with that. But where are we going? There are a couple of things that sort of jump out. First of all, I'll use a healthcare application to shed some light on what'll happen in education. In healthcare, there are companies that have already figured out that your voice can be analyzed not for the words coming out of your mouth and your vocabulary, but rather for your intonation and the characteristics of your vocal output.

(10:05):

On the basis of the sound of your voice alone, anywhere from five to 10 different diseases can be diagnosed. These range from depression all the way up to predisposition for Parkinson's and Alzheimer's, okay? The work that remains to be done is shrinking the confidence interval to where it meets FDA and regulatory guidelines, because right now there's a wide band of potential false positives or negatives, and it has to be refined. But that's an important realization: voice technology extends well beyond what we're saying, into characteristics of our voice that we can't hide. That brings us over into the educational sphere, where you can just imagine a child speaking to his or her computer or some device and going through a reading exercise. One of the earliest things discovered about smart speakers, the Amazon Echo and later Google's devices, was what happens when you give them to children who have speech impediments. I had one growing up; I had to go to speech therapy once a week when I was in third grade.

(11:46):

When you give these devices to children who have speech problems, the devices don't care if you have trouble pronouncing something. They're just gonna flatly tell you, "I didn't understand what you said." That creates a feedback loop where you say it, you say it again, and you say it again, and you're continuously improving yourself on a drastically shorter time cycle than a typical speech pathologist could offer. It's really impactful. Furthermore, there's no judgment involved. It's just a device. So the child is able to say it and do it and do it and do it, and there's no adult sitting there saying, "Well, what's wrong with you? Why didn't you do this faster?"

(12:46):

Or whatever else might be implied in that interaction. So we've already discovered that. Now imagine that is happening with a child and the technology detects, "Oh, wait a minute, you're not sounding quite the same as you've sounded the last 92 times. We're detecting that you're a little depressed, that something's wrong." That can initiate a conversation: "Hey, is everything okay?" "Well, no, it's not." "Okay, well, what's the matter?" It's just a totally different paradigm for how children receive information in an educational context, when you start to think about technology doing these sorts of things and having this much context on a child. And yes, there are massive privacy concerns, but assuming those can be resolved, the power of this technology to improve education at, and I'm really thinking here, a K through 8 level, even K through 12, is quite significant.

Marsh Naidoo (14:07):

For sure, Bradley. The whole issue of trust, the ethics of the trust component, will definitely need to be addressed. But I take your point about the practice element that assistive technology and AI provide. Also, if there is a physical impairment where you may have limited fine motor coordination and dexterity, being able to use your voice to send your emails or text messages saves effort and energy. I spoke with Liz Persaud on our last podcast, episode 56, and it was amazing to hear the way she implemented voice technology to help her navigate through her day. That's definitely exciting and interesting work coming along to help with digital accessibility. Are there any other applications or domains that you are interested in, Bradley?

Bradley Metrock (15:23):

I mean, how much time you got? (laughs) No. Yeah, there's a lot. I'll give you a few more. So, audiobooks. What has been considered an audiobook and what will be considered an audiobook in the future are two very different things. Right now, an audiobook is a human being getting a physical book and reading it in its entirety while it's recorded, and then that recording is distributed. We're already seeing problems with that. First of all, you have to be able to do that physically. Second, you have to be able to set aside enough time to read it, which is usually a substantial amount of time. Third, if any problems with the recording are discovered after the fact, then you have to go back. And for people with busy schedules, like the celebrities who often do these narrations, that really complicates things.

(16:45):

And fourth, if there's additional content that you want to add at some later point in time, you've got a complex situation of trying to go back. Do you get the original narrator? Do you get a new one? There's a huge amount of logistical challenges with how audiobooks are done today, and that's irrespective of the distribution channels and other things that are relatively limited as well. When you bring voice technology into the mix, first of all, we're already seeing the use of synthetic voices. One of the more common situations is a celebrity who is going to read the book, maybe because they wrote it, so it's gonna be in their voice, and they just don't have the time to do it. The amount of audio you need to record to produce a synthetic voice of that person, one where it would be very tough for you to tell it's not that person, is about 15 minutes. People think synthetic voices are, "Oh, it's computer, it's computery, it doesn't sound right. It'd be annoying to listen to a book." No, no, no. We're way past that.

(18:11):

We're at the point now where these voices sound virtually indistinguishable, to where 95% of people would get it wrong. If I asked, "Which one of these five voices is the fake one?", you'd get it wrong every time. So the economics of audiobooks are changing, and that's exciting for a particular reason. I've always been interested in publishing, because I love the idea that, for really the first time in human history, the gatekeepers to telling our stories are minimal, if not nonexistent. For most of the time on this planet there have been singular gatekeepers, where if they didn't like you, or didn't like the message you were trying to say, or any number of things, you're not saying it. You're just not publishing it. That's the end of it for you. You might say this goes back to Gutenberg, but that's not really accurate. Really only for the last 25 or 30 years has it been true that if you wanted to produce something, largely independent of your socioeconomic station, you could. And the changes underway with audiobooks, and what voice and conversational AI bring to that domain, just open it up.

(19:42):

It just explodes the opportunity for people to tell stories. I'll give you an example. There is a woman named Davar Ardalan, who would make a great guest for your show, by the way. She is the executive producer of audio for National Geographic. But in addition to that, because apparently that's not enough for her, she has created a startup called IVOW AI. What IVOW (ivow.ai) does is use voice technology and AI to preserve and share cultural heritage, and it's really fascinating. The first project they've done is a storytelling project that's a little bit different. They get people from different cultures, whether it's Aleutian Eskimos or different types of folks, to share a recipe for a dish that means something to them and their culture. They're recorded cooking it, and while they're cooking it, they're telling stories about it.

(21:17):

So they're telling stories about their culture, about this dish and what it means to their culture, while they're making it. It's just another form of storytelling, and the voice layer provides some interesting functionality to it. That's something that wasn't possible 50 years ago, if not more recently. Through the work that she's doing, these cultures will be honored in a way that they've never been honored before. And that's just one microscopic example, a dot on the sun, of what I'm talking about here. It's exciting to me that this technology is opening these types of doors, and audiobooks and this storytelling and narrative component are just a fraction of the whole story. Everywhere you turn, the bottom line is pretty simple. Think of anything you do with technology and how we live our lives. Think about how you book a hotel. Think about your calendar, think about how you drive an automobile, think about how you determine where you're gonna get healthcare. Think about how you pay for that healthcare. Think about how you interact with your children. Think about how your children interact with adults. Everything is different with this technology, and it's all moving at its own pace. The cadences are distinct and unique, but the fact is that this evolution is well underway, and it's really interesting to see.

Marsh Naidoo (23:09):

So I'm gonna circle back to Alexa, Bradley. In your book, More Than Just the Weather and Music: 200 Ways to Use Alexa, what are your top three choices?

Bradley Metrock (23:25):

Right now we're working on a revision of the book. We expect version two to come out this Christmas, which will be exciting because of the skills. "Skill" is the name for an experience on Alexa; an Alexa skill is an app for the Alexa platform. Some of them are gone and some of them are still with us, and it's been interesting to see how that's happened. I like some of the games. As I mentioned, I'm a lifelong gamer. There's a big role-playing game that came out about a decade ago called Skyrim that any gamer will have heard of. There is a voice-only version of Skyrim that the studio, Bethesda, came out with. When they did it, people thought, "Oh, this is a joke. There's no way they did this." I think they announced it on April Fool's Day. But it wasn't a joke, which was even funnier. They created this whole separate Alexa version of the game, and it's really interesting to see how they did that. There are several other really good games. One is called Finder, which is the biggest production, by man-hours, of any Alexa skill to date.

(24:51):

It's a great game for people who wanna play something like that. There's a good, lighter sort of game called Question of the Day that a ton of people play, and it's a lot of fun. You just say, "Alexa, open Question of the Day." But then there's some stuff that's a bit more useful. You can do Drop In calling, which is an accessibility-minded feature. Drop In is a first-party ability, which we talk about in the book. There are first-party abilities that Amazon themselves have implemented, and then there are third-party skills that third parties have implemented as well. Drop In calling enables you to set up a device in a business, a home, a senior living facility, or wherever. Typically, if you call somebody, they have to make an affirmative decision to answer your call.

(25:50):

With a Drop In call, they don't have to make that decision; you just drop on in. Drop In calling has already saved a lot of lives, from adults to senior citizens, as well as in some other contexts, because you can just check on somebody whenever you feel like it. I really like that feature as well. I mentioned "Alexa, what am I holding?" That's one that's not commonly talked about that I really think adds value; it's a reason on its own to buy these devices. There have also been interesting implementations on the fintech side: PayPal and Mastercard have done some interesting things with Alexa skills. I could go on and on, but a lot of good things have been done with the Alexa platform already, and there's no reason to think that's not gonna continue.

Marsh Naidoo (26:49):

I would like to close off, Bradley, with the future of AI. What has you most excited about AI?

Bradley Metrock (27:00):

It's aligned with our pandemic way of thinking. Along came the pandemic, and along came a lot of problems. But in addition to all of those problems came some opportunities. One of these opportunities was to revisit and renegotiate our relationship with work. What does that look like? How much time do we give? How do we strike a better and more appropriate balance? For some people, that will revert right back to the way it was pre-pandemic. But for fortunate people, people who really make it a habit and are conscientious about it, some permanent changes can be effected to improve lives and give greater balance and a life more worth living. That's my hope for AI: that it brings a similar sort of adjustment to our culture and our society. As far as privacy issues, they're always gonna be with us.

(28:12):

I just sort of assume those away, because all privacy issues really mean is, "I don't see enough value out of what you're doing to give you information." Everybody's fine giving up information; it's just, what do we get for it? With AI, I'm hoping that it makes our lives more efficient and easier for everybody, across the whole spectrum, especially from an accessibility-related point of view. Everybody's got challenges. No matter what your challenges are, hopefully AI rises up to meet you and makes your life better and easier. That promise is why I'm excited about it, and it's what I'm most looking forward to seeing.

Marsh Naidoo (29:05):

Bradley, thank you so much for your time and for informing us more about voice technology today. Do you have any closing remarks? I'm hoping that you are gonna touch on the upcoming conference you guys have in Chattanooga, so take it away.

Bradley Metrock (29:24):

Sure. So yeah, we do a lot of events. We've got a gaming event taking place in Austin, Texas, on January 19th. That'll be smaller in scale for sure, just a little gathering of 50 or 75 people. But Project Voice, our namesake event, takes place every year. The last time we did it, Amazon, Google, Microsoft, and Samsung were all presenting sponsors. It takes place in Chattanooga, Tennessee, which interestingly is home to the fastest internet in the United States and a quietly burgeoning tech scene, in addition to being a beautiful place. As we have discovered, people like to get off the beaten path a little bit from time to time. We do stuff in New York, we have an event we do at Harvard Medical School each year, and we're out in Silicon Valley. But Chattanooga gives a little bit different of an opportunity. If anything that's been said here on this podcast is of interest, this is an event that you may very well wanna attend. It's April 25th through the 28th at the Chattanooga Convention Center, and somewhere around 500 to 750 very senior-level executive types, either working with this technology now or looking to immediately implement voice and AI solutions in their operations and business, will be there. Those who wanna learn more can go to www.projectvoice.ai.

Marsh Naidoo (30:54):

Bradley, thank you so much for spending some of your morning with us, and have the best day.

Bradley Metrock (31:00):

Marsh, thank you. It's an honor. I appreciate being asked to join you, and thank you for having me.

Marsh Naidoo (31:07):

As always, thank you for your time, and thank you for listening to the Raising Kellan podcast. If there are any topics that you would like us to investigate or research, you can contact us at raisingkellan@gmail.com. We would appreciate a review on your podcast provider, and to stay in contact, we can be found on Facebook at Raising Kellan or on Instagram at raising_kellan. We also have a YouTube channel where these podcasts are posted in an audiogram format, if a podcast player is not available in your geographic location. Until we see you guys the next time, take care. And as always, remember, get to the top of your mountain. This is Marsh Naidoo signing off.
