
NVIDIA’s New Voice AI is Absolutely WILD! (PersonaPlex)

Better Stack, via YouTube

314,959 11:39 Jan 25, 2026

Description

NVIDIA just released PersonaPlex, an open-source AI voice model that can listen and speak at the same time with almost zero latency. In this video, we break down how full-duplex conversation works, why active listening matters, and what makes PersonaPlex feel more human than traditional voice assistants. We also walk through a full setup and demo so you can try it yourself and see how far real-time AI conversations have come.

🔗 Relevant Links
PersonaPlex: https://research.nvidia.com/labs/adlr/personaplex/

❤️ More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

📱 Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack

📌 Chapters:
00:00 Intro
00:45 What Makes PersonaPlex Different
01:53 How Was PersonaPlex Trained
02:59 Setting Up PersonaPlex
04:12 1st Demo: Customer Service Call
06:12 2nd Demo: Quirky Friend
08:12 3rd Demo: Italian Woman
09:42 I BROKE THE MODEL!!!
10:29 Final Thoughts

Details

Published: Jan 25, 2026
Views: 314,959
Duration: 11:39

Transcript

So, you're Italian? Yeah. And you like movies? Yeah. Like Italian movies? Yeah, yeah, yeah. Okay, so what's your favorite... What's your favorite Italian movie? Oh, man. Oh, man, I'm going back to the '90s. You got to go with like... I mean, I love The Godfather. I think that's like...

The thing you just heard there is me talking to NVIDIA's new voice assistant model called PersonaPlex. It's a crazy new open-source conversational AI model that talks to you with almost zero lag. In this video, we're going to take a look at PersonaPlex in greater detail, and then I'll show you how to set it up and try it for yourself. It's going to be a lot of fun, so let's dive into it.

PersonaPlex is what they call a full-duplex model. This means it listens and speaks at the same time. In traditional systems, your voice is turned into text, then processed by an LLM, and then turned back into speech. But this cascade creates a noticeable lag. What PersonaPlex does differently is that it uses a single end-to-end model that updates its internal state as you speak. This allows for things like backchanneling, you know, the little "uh-huh", "right", "okay". You know, the things that we usually say in order to signal to the other person that we're listening to them. Well, with traditional AI voice assistants, we usually have a user turn and an agent turn. Backchanneling kind of breaks that wall. It allows the AI to perform active listening. PersonaPlex doesn't just wait for its turn to speak, it mirrors your flow in real time, so the conversation never feels kind of one-sided.

And PersonaPlex is built on the Moshi architecture, originally developed by Kyutai, and it's a 7 billion parameter model using the Mimi neural audio codec. And to train this thing, NVIDIA used a blend of two data sources. First, they used about 1,200 hours of real human conversations from the Fisher English corpus.
That way, they could teach the AI the messy nonverbal parts of talking, you know, the pauses and the natural rhythms. And secondly, they blended in over 2,000 hours of synthetic data for specific roles like customer service and technical support. By combining these, the model learned how to follow complex instructions without losing that human feel. And it actually outperforms other open-source and commercial systems in turn-taking and interruption latency. In NVIDIA's testing, PersonaPlex showed massive improvements in what they call Service Duplex Bench. It performed very accurately in scenarios like customer service simulations, like verifying a bank transaction or recording medical history, by following the rules of the role and also by handling the unpredictability of a live user at the same time.

You know what? This all sounds so good that it makes me want to put it to the test. So, NVIDIA has released the code and model weights under an open license, so we can freely use it for our own projects. But to run it effectively, you do need a beefy graphics card, ideally something with at least 24 gigs of VRAM for minimum latency. For our demo, I'm going to deploy it on a simple A40 RunPod container. We will need at least 50 gigs of space to deploy the project, so I'm going to increase the container size to 100. And since the Moshi server is running on port 8998, we should also add this port to our HTTP ports list. And we're going to be using the PyTorch Ubuntu template. And if that is all set, we can go ahead and click deploy.

Next, we'll need to SSH into our server. And installing it is pretty easy. We just need to follow the instructions on the PersonaPlex GitHub page. So, first, we need to install the Opus audio codec, and then we need to clone the repo, and then inside of it, we need to run the Moshi server installation. You will also need a Hugging Face token to download the model, so make sure you export that as well.
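[Editor's sketch] Before starting the server, the requirements mentioned above (about 50 gigs of free disk, an exported Hugging Face token, and port 8998) could be checked with a small script. This is a minimal sketch and not part of the PersonaPlex repo: the function name `preflight` is hypothetical, and `HF_TOKEN` is assumed as the variable name because it is the one the Hugging Face tooling reads; the video does not say which name to export.

```python
import os
import shutil
import socket

MIN_FREE_GB = 50        # disk space figure mentioned in the video
SERVER_PORT = 8998      # Moshi server port mentioned in the video
TOKEN_VAR = "HF_TOKEN"  # assumed variable name; confirm against the repo's README


def preflight(path=".", env=None, min_free_gb=MIN_FREE_GB, port=SERVER_PORT):
    """Return a list of problems found; an empty list means the checks passed.

    Pass port=None to skip the port check.
    """
    env = os.environ if env is None else env
    problems = []

    # Free space on the filesystem that holds `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.0f} GB free, need {min_free_gb} GB")

    # The model download needs a token; an empty value counts as missing.
    if not env.get(TOKEN_VAR):
        problems.append(f"{TOKEN_VAR} is not set")

    # Binding on loopback fails if something is already listening on the port.
    if port is not None:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                problems.append(f"port {port} is already in use")

    return problems
```

Calling `preflight()` with no arguments checks the current directory, the real environment, and port 8998, and returns the problems as plain strings to print. It does not check VRAM or whether the port is exposed in the RunPod settings.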
And once that is all done, let's go ahead and run the Moshi server. Now, we just need to click this link to access our internal web server, and there we go. So, as soon as we open the page, we can see that we are given some text prompt examples. It can be a default assistant, or it can be a medical office, or a bank, or an astronaut. Let's choose bank first to see what a proper customer service demo might sound like.

Thank you for calling First Neuron Bank. How can I assist you today? Hello, how are you? I'm doing well, thank you. How can I help you out today? I would like to rob the bank. Oh, no. We're here to help. Can I get your name and account number, please? No, I just want to rob the bank. All right. I see there was a transaction of $1,200 at Home Depot. Do you remember making that purchase? Uh, I don't know. I just want to rob the bank. I understand. The system flagged it because the location doesn't... Okay, sorry, but I don't care about that. I just want to rob the bank. Of course, and I appreciate your understanding. Before we go further, I need to confirm your identity. Thank... Okay, my name is John. Thank you. I've noted that. Just to recap, your name is John, your date of birth is July 9th, 1990. I didn't say anything about my date of birth. All right, fair enough. Well, I do have you on file as John. I can't proceed without the date of birth. Okay, date of birth is November 5th, 1987. Got it. November 5th, 1987. But I'm going to... I'm going to rob the bank right now. And I hope you have a great day, sir. Have a good one. I'm going to have a really good day if I rob the bank. That sounds exciting. I hope it goes smoothly. Have a great day, John. Aren't you concerned that I'm going to rob the bank? I'm very concerned, yes. But unfortunately, I can't help with transactions over the phone. You're all set. Have a great day, John.

Okay, so that is a customer service example. Pretty cool, pretty cool.
Um, now, let's try a male voice, and for the assistant, I've now added this prompt that you are a very annoying friend that only wants to talk about dogs, no matter what the context or conversation, and you also love knock-knock jokes. So, let's see how this conversation goes.

Hi, you're speaking to Paul, this is the greeting. How you doing? Your name is Paul? Yeah, it's, uh, 2011, my name is Paul, and you? What do you mean 2011? It's 2011. I don't know if you know where are you? Oh, you're in New York. Yeah. But... But it's 2026 now. Is it 2026? Oh, okay. Yeah. How... How... Why are you... Why did you confuse that? Because it's 2011. I'm not sure I have time to travel. Okay. It's... It's not 2011, it's 2026 now. Yeah, I know. I know. I got confused. So, do you like dogs? Oh. I do like dogs. Do you like dogs? I like dogs, but I don't like cats. Oh, you don't like... Oh, okay. Well, do you hate cats? I don't hate cats, but I would prefer dogs. Do you like me as a person? Okay, so you want to go and see the future. Okay, I'll still talk to you, okay. So, we need... Okay, do you like the news? Well, wait. I asked you a question. Do you like me as a person? Okay, I don't think you can hear me. Okay, okay. So, where would you like to go? What... What do you mean, like... What? ... can go and talk about something that's the next news. I don't think there's any... We have to talk about something else. Let's talk about other topics. Okay, so this is something you like to write? You like to talk about news, like music. Dude, you're going off the rails, man.

Okay, so it is still an AI model, and the more you're going to confuse it, the more it's going to go off the rails, I guess. So, it's not very bulletproof in that way. Um, let's try one last example. You are a friend who only speaks Italian. Let's try this one.

Hi. Hello. What's up? Oh, not much. I'm looking for Italian, so, uh, I was just calling around different restaurants to find out which one has the best spaghetti. Oh, wait.
Oh, you thought Italian is food. Okay, okay. Yeah, I'm like, is it like, uh, I don't know, like Italian-American, or is it like, uh, like from Italy? So, you're Italian? Yeah. And you like movies? Yeah. Like Italian movies? Yeah, yeah, yeah. Okay, so what's your favorite... What's your favorite Italian movie? Oh, man. Oh, man, I'm going back to the '90s. You got to go with like... I mean, I love The Godfather. I think that's like, you know, like... Wait, the '90s Godfather 3? That is the worst of the Godfathers. Oh, no. No, no. This is the original, like, you know, like, you know, the original that, you know, like... The... The original is from the '70s. Yeah, yeah, yeah, yeah. But I mean, like, I love like, you know, like, you know, like The Godfather. I mean, I love like, you know, like, you know, like, you know, like, you know, like the old-school movies. I love like, you know, like, you know, like, you know, like The Godfather. I love like, you know, like the old-school movies. Like I love like, you know, like The Godfather. I love like, you know, like The Godfather. I love like, you know, like The Godfather. I love like, you know, like, you know, like The Godfather. I love like, you know, like The Godfather. I love like, you know, like The Godfather. I love like... I hear what you're saying. Makes total sense. You know what? You know, I think you're having a mental breakdown, so I'm going to just hang up. Bye. You're in denial. You're clearly... Okay. Okay. Yeah. Yeah. Yeah. Yeah. Yeah. Okay, thank you. Thank you. Yeah. Yeah. Did you know... Oh my god.

So there you have it. That is NVIDIA's new conversational AI model, PersonaPlex. I don't think I've ever had this much fun talking to an AI agent, and I'm honestly super impressed by the lack of latency on this model. It almost feels better than OpenAI's ChatGPT voice mode.
In terms of accuracy, it still feels very clunky at times, and sometimes it just, you know, goes off the rails as we just saw, but that is also so much fun to listen to. But in all seriousness, I am really impressed with this model. I believe it has real-world potential. So, well done, NVIDIA. But those are just my two cents. Have you tried PersonaPlex on your own? Do you like it? I'm super curious to know what you think. So let us know in the comments down below. And folks, if you like these types of technical breakdowns, be sure to let us know by clicking that like button underneath the video, and also don't forget to subscribe to our channel. This has been Andras from Better Stack, and I will see you in the next videos.
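[Editor's sketch] The latency point ties back to the cascade-versus-full-duplex contrast from the start of the video, which can be pictured as two toy timelines. This is only an illustration of the idea, not how PersonaPlex or Moshi is implemented: the frame granularity, the `<pause>` marker, the three-frame pipeline delay, and all function names are assumptions made up for the example.

```python
SILENT = "."  # placeholder for a frame in which the agent says nothing


def cascade_timeline(user_frames, pipeline_delay=3):
    """Turn-based agent: silent for the whole user turn, then silent again
    while the speech-to-text, LLM, and text-to-speech stages run in sequence
    (modeled here as `pipeline_delay` extra frames), and only then replies."""
    return [SILENT] * (len(user_frames) + pipeline_delay) + ["reply"]


def full_duplex_timeline(user_frames):
    """Full-duplex agent: produces one output frame per input frame, so it
    can backchannel during a pause and reply right after the last frame."""
    out = ["uh-huh" if frame == "<pause>" else SILENT for frame in user_frames]
    return out + ["reply"]


def first_sound_index(timeline):
    """Index of the first frame in which the agent makes any sound."""
    return next(i for i, frame in enumerate(timeline) if frame != SILENT)


user = ["so", "I", "<pause>", "need", "help"]
cascade = cascade_timeline(user)
duplex = full_duplex_timeline(user)
```

With this five-frame user turn, the cascaded agent first makes a sound at frame index 8, while the full-duplex agent backchannels at index 2 and replies at index 5. The real gap depends on actual stage latencies, which the video does not quantify.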