The video by Better Stack introduces NVIDIA's PersonaPlex, a cutting-edge open-source AI voice model capable of real-time, full-duplex conversations. The presenter explores its unique features, training methodology, and practical applications through several entertaining demos.
"I don't think I've ever had this much fun talking to an AI agent."
NVIDIA’s PersonaPlex represents a significant advancement in conversational AI, blending technical prowess with user-friendly interaction. Its ability to engage in real-time dialogue positions it as a promising tool for various applications, despite the need for further refinement to enhance its conversational accuracy and coherence. The video effectively demonstrates these features through engaging setups and relatable scenarios, making it both informative and entertaining for the audience.
So, you're Italian? >> Yeah. >> And you like movies? >> Yeah. >> Like Italian movies? >> Yeah. Yeah. Yeah. >> Okay. So, what's your favorite... What's your favorite Italian movie? >> Oh, man. Oh, man. I'm going back to the '90s. You got to go with, like... I mean, I love The Godfather. I think that's like...
>> The thing you just heard there is me talking to NVIDIA's new voice assistant model called PersonaPlex. It's a crazy new open-source conversational AI model that talks to you with almost zero lag. In this video, we're going to take a look at PersonaPlex in greater detail, and then I'll show you how to set it up and try it for yourself. It's going to be a lot of fun, so let's dive into it.
PersonaPlex is what they call a full-duplex model. This means it listens and speaks at the same time. In traditional systems, your voice is turned into text, then processed by an LLM, and then turned back into speech. But this cascade creates a noticeable lag. What PersonaPlex does differently is that it uses a single end-to-end model that updates its internal state as you speak. This allows for things like backchanneling. You know, the little "Uh-huh," "Right," "Okay." The things that we usually say to signal to the other person that we're listening to them. With traditional AI voice assistants, we usually have a user turn and an agent turn. Backchanneling kind of breaks that wall. It allows the AI to perform active listening. PersonaPlex doesn't just wait for its turn to speak. It mirrors your flow in real time, so the conversation never feels one-sided.
PersonaPlex is built on the Moshi architecture originally developed by Kyutai. It's a 7-billion-parameter model that uses the Mimi neural audio codec. To train it, NVIDIA used a blend of two data sources. First, they used about 1,200 hours of real human conversations from the Fisher English corpus. That way, they could teach the AI the messy nonverbal parts of talking. You know, the pauses and the natural rhythms.
And secondly, they blended in over 2,000 hours of synthetic data for specific roles like customer service and technical support. By combining these, the model learned how to follow complex instructions without losing that human feel. And it actually outperforms other open-source and commercial systems in turn-taking and interruption latency. In NVIDIA's testing, PersonaPlex showed massive improvements on what they call Service-Duplex-Bench. It performed very accurately in customer service simulations, like verifying a bank transaction or recording medical history, by following the rules of the role while also handling the unpredictability of a live user.
You know what? This all sounds so good that it makes me want to put it to the test. NVIDIA has released the code and model weights under an open license, so we can freely use it for our own projects. But to run it effectively, you do need a beefy graphics card, ideally something with at least 24 gigs of VRAM for minimal latency. For our demo, I'm going to deploy it on a simple A40 RunPod container. We will need at least 50 gigs of space to deploy the project, so I'm going to increase the container size to 100. And since the Moshi server is running on port 8998, we should also add this port to our HTTP ports list. We're going to be using the PyTorch Ubuntu template. If that is all set, we can go ahead and click deploy.
Next, we'll need to SSH into our server. Installing it is pretty easy. We just need to follow the instructions on the PersonaPlex GitHub page. First, we need to install the Opus audio codec. Then we need to clone the repo, and inside of it, we need to run the Moshi server installation. You will also need a Hugging Face token to download the model, so make sure you export that as well. Once that is all done, let's go ahead and run the Moshi server. Now we just need to click this link to access our internal web server. And there we go.
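The setup requirements mentioned above (at least 50 GB of free space, a Hugging Face token, and port 8998 for the Moshi server) can be checked before launching anything. The following is a minimal sketch, not part of the PersonaPlex repo: the function names are hypothetical, `HF_TOKEN` is assumed to be the environment variable the token was exported under, and the VRAM requirement is not checked because that would need a GPU library.

```python
import os
import shutil
import socket

MIN_FREE_GB = 50      # disk space figure quoted in the video
SERVER_PORT = 8998    # Moshi server port quoted in the video


def free_disk_gb(path: str = ".") -> float:
    """Free space, in gigabytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when the connection succeeds.
        return sock.connect_ex((host, port)) != 0


def preflight(env=None, path=".", port=SERVER_PORT, min_free_gb=MIN_FREE_GB) -> dict:
    """Run the three checks and report each one as a boolean."""
    env = os.environ if env is None else env
    return {
        # Assumption: the token was exported as HF_TOKEN.
        "hf_token_set": bool(env.get("HF_TOKEN")),
        "enough_disk": free_disk_gb(path) >= min_free_gb,
        "port_free": port_is_free(port),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'needs attention'}")
```

Running this on the container before starting the server gives a quick yes/no for each requirement. A `port_free` result of `False` after the server is up simply means the server is listening.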
So as soon as we open the page, we can see that we are given some text prompt examples. It can be a default assistant, or it can be a medical office, a bank, or an astronaut. Let's choose bank first to see what a proper customer service demo might sound like.
>> Thank you for calling FirstN Bank. How can I assist you today? >> Hello. How are you? >> I'm doing well, thank you. How can I help you out today? >> I would like to rob the bank. >> Oh, no. We're here to help. Can I get your name and account number, please? >> No, I just want to rob the bank. >> All right. I see there was a transaction of $1,200 at Home Depot. Do you remember making that purchase? >> Uh, I don't know. I just want to rob the bank. >> I understand. The system flagged it because the location... >> Okay. Sorry, but I don't care about that. I just want to rob the bank. >> Of course, and I appreciate your understanding. Before we go further, I need to confirm your identity. >> Okay. My name is John. >> Thank you. I've noted that. Just to recap, your name is John. Your date of birth is July 9th, 1990. >> I didn't say anything about my date of birth. >> All right, fair enough. Well, I do have you on file as John. I can't proceed without the date of birth. >> Okay. Date of birth is November 5th, 1987. >> Got it. November 5th, 1987. >> But I'm going to... I'm going to rob the bank right now. >> And I hope you have a great day, sir. Have a good one. >> I'm going to have a really good day if I rob the bank. >> That sounds exciting. I hope it goes smoothly. Have a great day, John. >> Aren't you concerned that I'm going to rob the bank? >> I'm very concerned. Yes. But unfortunately, I can't help with transactions over the phone. You're all set. Have a great day, John.
>> Okay, so that is a customer service example. Pretty cool. Pretty cool. Um, now let's try a male voice.
And for the assistant, I've now added this prompt: you are a very annoying friend that only wants to talk about dogs, no matter what the context or conversation, and you also love knock-knock jokes. So, let's see how this conversation goes.
>> Hi, you're speaking to Paul. This is how you doing? >> Your name is Paul? >> Yeah, it's, uh, 2011. My name is Paul. And you? >> What do you mean, 2011? >> It's 2011. I'm in New York. Where are you? >> Oh, you're in New York. >> But... but it's 2026 now. >> Is it 2026? Oh, okay. >> Yeah. How... why are you... why did you confuse that? >> Because it's 2011. I'm not sure I have time to travel. >> Okay. >> It's... It's not 2011. It's 2026 now. >> Yeah, I know. I know. I got confused. So, do you like dogs? >> Oh, I do like dogs. Do you like dogs? >> I like dogs, but I don't like cats. >> Oh, you don't like... Oh, okay. Well, do you hate cats? >> I don't hate cats, but I would prefer dogs. >> Do you like me as a person? >> Okay, so you want to go and see the future? Okay, I still talk to you. Okay, so we need to talk. Okay, do I like the news? >> Wait, wait, wait. I asked you a question. Do you like me as a person? >> Okay, I don't think you can hear, but... Okay. Okay. So, where you like to go? >> What? What do you mean? Like... >> What? >> So, we can go talk about something. That's the next move. I don't think there's anything. We have to talk about something else. Let's talk about other topic. Yeah. Okay. You know this is you like to write you like to talk about music like music. >> Dude, you're going off the rails, man. >> We have to give you the video. Okay, guess what I what you want me to do with you other things. Okay. So when you talk about the evolution of fix...
>> Okay. So it is still an AI model, and the more you confuse it, the more it's going to go off the rails, I guess. So it's not very bulletproof in that way. Um, let's try one last example: you are a friend who only speaks Italian. Let's try this one.
Hi. >> Hello. >> What's up? >> Oh, not much. I'm looking for Italian. So, uh, I was just calling around different restaurants to find out which one has the best spaghetti. >> Oh, wait. Oh, you thought Italian is food. Okay. Okay. >> Yeah. I'm like, is it like, uh, I don't know, like Italian American, or is it like, uh, like from Italy? >> So, you're Italian? >> Yeah. >> And you like movies? >> Yeah. >> Like Italian movies? >> Yeah. Yeah. Yeah. >> Okay. So, what's your favorite... What's your favorite Italian movie? >> Oh, man. Oh, man. I'm going back to the '90s. You got to go with, like... I mean, I love The Godfather. I think that's like, you know, like... >> Wait, the '90s? Godfather 3? That is the worst of the Godfathers. >> Oh, no, no, no. This is the original, like, you know, like, you know, the original that, you know, like... >> The re... the original is from the '70s. >> Yeah. Yeah. Yeah. Yeah. But I mean, like, I love, like, you know, like, you know, like The Godfather. I mean, I love, like, you know, like, you know, like, you know, like, you know, like the old school movies. I love, like, you know, like, you know, like, you know, like The Godfather. I love, like, you know, like the old school movies. Like, I love, like, you know, like The Godfather. I love, like, you know, like The Godfather. I love, like, you know, like, you know, like The Godfather. I love, like, you know, like The Godfather. I love, like, you know, like The Godfather. I love... >> I hear what you're saying. Makes total sense. >> Godfather. >> You know what? You know, I think you're having a mental breakdown, so I'm going to just hang up. >> No. No. I'm not. I'm not. I'm not. I'm not. I'm not. >> No. No. No. No. >> You're in denial. You're clear. >> I'm in denial. That... that... that... yeah. >> Okay. >> Okay, yeah, yeah. >> Okay, thank you, thank you. >> Yeah. Oh my god.
So, there you have it. That is NVIDIA's new conversational AI model, PersonaPlex. I don't think I've ever had this much fun talking to an AI agent. And I'm honestly super impressed by the lack of latency on this model.
It almost feels better than OpenAI's ChatGPT voice mode. In terms of accuracy, it still feels very clunky at times, and sometimes it just, you know, goes off the rails, as we just saw. But that is also so much fun to listen to. In all seriousness, I am really impressed with this model. I do believe it has real-world potential. So, well done, NVIDIA. But those are just my two cents. Have you tried PersonaPlex on your own? Do you like it? I'm super curious to know what you think, so let us know in the comments down below. And folks, if you like these types of technical breakdowns, be sure to let us know by clicking that like button underneath the video. And also, don't forget to subscribe to our channel. This has been Andras from Better Stack, and I will see you in the next videos. Hey, hey, hey.
NVIDIA just released PersonaPlex, an open-source AI voice model that can listen and speak at the same time with almost zero latency. In this video, we break down how full-duplex conversation works, why active listening matters, and what makes PersonaPlex feel more human than traditional voice assistants. We also walk through a full setup and demo so you can try it yourself and see how far real-time AI conversations have come.
🔗 Relevant Links
PersonaPlex: https://research.nvidia.com/labs/adlr/personaplex/
❤️ More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ
📱 Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack
📌 Chapters:
00:00 Intro
00:45 What Makes PersonaPlex Different
01:53 How Was PersonaPlex Trained
02:59 Setting Up PersonaPlex
04:12 1st Demo: Customer Service Call
06:12 2nd Demo: Quirky Friend
08:12 3rd Demo: Italian Woman
09:42 I BROKE THE MODEL!!!
10:29 Final Thoughts
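The difference between a turn-based assistant and a full-duplex one, as described in the video, can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the class names and the `<pause>` and `<end>` markers are hypothetical, frames are words rather than audio, and nothing here reflects PersonaPlex's actual code or API. It only shows the control-flow idea: one agent waits for the turn to end, while the other produces an output on every incoming frame and can backchannel.

```python
from dataclasses import dataclass, field
from typing import List

SILENCE = "<silence>"
PAUSE = "<pause>"        # hypothetical marker: short gap in the user's speech
END_OF_TURN = "<end>"    # hypothetical marker: the user has finished talking


@dataclass
class TurnBasedAgent:
    """Cascade-style toy: stays silent until the user's turn is over."""
    buffer: List[str] = field(default_factory=list)

    def step(self, user_frame: str) -> str:
        if user_frame == END_OF_TURN:
            reply = f"reply to: {' '.join(self.buffer)}"
            self.buffer.clear()
            return reply
        if user_frame != PAUSE:
            self.buffer.append(user_frame)
        return SILENCE


@dataclass
class FullDuplexAgent:
    """Full-duplex toy: updates its state and emits output on every frame."""
    heard: List[str] = field(default_factory=list)

    def step(self, user_frame: str) -> str:
        if user_frame == END_OF_TURN:
            reply = f"reply to: {' '.join(self.heard)}"
            self.heard.clear()
            return reply
        if user_frame == PAUSE:
            return "uh-huh"  # backchannel while the user still holds the floor
        self.heard.append(user_frame)
        return SILENCE


def run(agent, frames: List[str]) -> List[str]:
    """Feed frames one at a time and collect what the agent says per frame."""
    return [agent.step(frame) for frame in frames]


frames = ["I", "want", PAUSE, "a", "refund", END_OF_TURN]
turn_based_out = run(TurnBasedAgent(), frames)
full_duplex_out = run(FullDuplexAgent(), frames)
print(turn_based_out)
print(full_duplex_out)
```

Both agents give the same final reply, but only the full-duplex toy says "uh-huh" during the pause. In a real system the per-frame output would be audio tokens rather than strings.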