Vishing attacks are not new, but they are on the rise. Several factors explain this, ranging from better email protection to improved user awareness, but also the rise of voice AI technology that helps attackers increase both efficacy and attack scale.
In this context, I’m going to focus specifically on what AI brings to the table for vishing attackers.
The takeaway
- AI supercharges vishing with realism and scale – Advances in text-to-speech, voice cloning, and conversational AI now enable attackers to launch highly convincing and scalable voice phishing campaigns, mimicking real human interactions.
- Voice-based trust is no longer reliable – Since people instinctively trust familiar voices, cloned voice attacks can easily bypass traditional skepticism, making them especially effective against busy employees or high-stakes targets.
- The threat landscape is evolving rapidly – Future attacks will likely combine OSINT, automation, and agentic AI to launch hyper-personalized vishing at scale. Defensive training must evolve equally fast to keep up with this shift.
Key AI components in vishing attacks
Unless you’ve been living under a rock for the past year, you’ve probably heard about AI’s impressive progress and growth.
Everybody is adding AI to their products, often renaming decade-old products and services by putting “AI-powered” in front of the value proposition.
This attests not only to questionable marketing choices but also to the rapid rise of a genuinely useful, actionable technology.
Attackers benefit from this technology too, and vishing attacks rely on it more and more.
Text-to-speech and voice synthesis technology
This field isn’t new. We’ve had voice-over and various voice synthesis technologies for a long time now, but recent progress in AI brings new capabilities to text-to-speech engines.
This is a key component in AI vishing attacks: we can now use cloned voices, fake accents, emotional delivery, and more. The possibilities are endless.
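To make this concrete, here is a minimal sketch of driving a text-to-speech engine from code, using the offline pyttsx3 library purely for illustration; modern neural TTS services follow the same pattern but typically accept a cloned-voice identifier and style parameters on top.

```python
# Minimal text-to-speech sketch using pyttsx3 (offline, non-neural).
# Modern neural TTS engines expose similar controls, typically adding
# voice-clone or style parameters on top of rate/volume/voice settings.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 165)    # speaking rate, words per minute
engine.setProperty("volume", 0.9)  # 0.0 to 1.0

# Pick an installed system voice; a neural engine would let you pass
# a cloned-voice identifier here instead.
voices = engine.getProperty("voices")
if voices:
    engine.setProperty("voice", voices[0].id)

engine.say("Hi, this is IT support. I'm calling about your account.")
engine.runAndWait()
```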
Faking human interactions
The other major advance is the quality of the interactions and human-like conversations Large Language Models can hold.
This brings a whole new level of realism to conversational attacks and can be used in AI vishing.
It can be deployed in different ways, from augmented decision trees to fine-tuned models, but the bottom line is: generative AI is really good at faking human interactions.
Case in point: how many times have you said “please” and “thank you” to ChatGPT?
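The mechanics behind this are simple. Below is a minimal sketch of a conversational loop using the OpenAI chat API (the model name and persona prompt are placeholders, not taken from any real campaign): a system prompt pins a persona, and the accumulated message history is what keeps the exchange coherent turn after turn. It is shown here as a building block for defensive simulations.

```python
# Minimal conversational-agent loop (illustrative; framed here as a
# building block for *defensive* vishing simulations).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The persona lives in the system prompt; the running message history
# is what makes the exchange feel like a coherent conversation.
history = [{"role": "system",
            "content": "You play an IT helpdesk agent in a security-awareness "
                       "training simulation. Stay in character and be concise."}]

def reply(user_utterance: str) -> str:
    history.append({"role": "user", "content": user_utterance})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(reply("Hello, who is this?"))
```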
Mass scale vishing
AI vishing attacks can be done at scale.
This is the first and most obvious change in the threat landscape.
Large-scale attacks used to be limited to asynchronous vectors: email, messages, malvertising…
Synchronous attacks used to require a human operator to interact with each target. It was a 1-on-1 situation.
With the rise of AI, large-scale vishing attacks are now very real and easy to mount.
One attacker can program several “bots” that will call and interact with their victims simultaneously.
Note that what is true for attackers is also true for defenders: with the same AI vishing technology, you can craft large-scale vishing simulations to train all of your employees.
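The scale mechanic itself fits in a few lines. Here is a hedged asyncio sketch where each call is a coroutine; `place_simulated_call` is a hypothetical stub standing in for whatever telephony, TTS, and conversation stack a training platform would actually use.

```python
# Sketch of the scale mechanic: one operator, many concurrent calls.
# `place_simulated_call` is a hypothetical stub standing in for a real
# telephony + TTS + conversation stack used in authorized simulations.
import asyncio

async def place_simulated_call(phone_number: str) -> str:
    # A real training platform would dial out, synthesize speech,
    # and run a conversational loop here. We just simulate the delay.
    await asyncio.sleep(1)
    return f"{phone_number}: call completed"

async def run_campaign(targets: list[str]) -> None:
    # Every call runs concurrently: adding a target costs almost nothing,
    # which is exactly what changes the economics of vishing.
    results = await asyncio.gather(*(place_simulated_call(t) for t in targets))
    for line in results:
        print(line)

asyncio.run(run_campaign(["+15550001", "+15550002", "+15550003"]))
```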
Deepfake and voice cloning attacks
Volume is not the only problem here. AI vishing attacks are also more realistic.
Voice cloning technology has been around for some time now.
I remember watching Mission: Impossible ten years ago, with Ethan changing his voice thanks to a piece of high-tech tape on his neck.
Now, with 15 seconds of a clean voice sample, you can build a convincing clone of someone’s voice.
Because this tech is new and we are used to associating a voice with an identity, it abuses an implicit form of authentication.
What I mean is that, by default, when you hear someone’s voice, you identify the person.
And you’re confident about it.
But you shouldn’t be.
Just imagine the impact it can have on busy employees, going about their day, when a direct superior calls and asks them for a specific piece of information or action.
Why would they insist on a double-verification process to make sure it’s the right person? It’s their boss’s voice. Their brain, our brain, will scream at us: it’s the right person, no need to go through that boring process of calling them back and asking a question only the real person can answer.
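That “boring process” is cheap to write down, though. Here is an illustrative sketch of the out-of-band rule (the directory and identity names are hypothetical): never act on the inbound call itself, and always call back on a number you look up yourself.

```python
# Illustrative out-of-band verification rule: trust your own directory,
# never the inbound caller ID or a callback number the caller provides.
# The directory and identities here are hypothetical.
KNOWN_DIRECTORY = {"alice.cfo": "+15551234"}

def callback_number(claimed_identity: str, number_given_by_caller: str) -> str:
    """Return the number to verify on, or refuse if we cannot verify."""
    trusted = KNOWN_DIRECTORY.get(claimed_identity)
    if trusted is None:
        raise ValueError("Unknown identity: escalate, do not comply.")
    # Deliberately ignore number_given_by_caller: an attacker controls it.
    return trusted

# Usage: hang up, then dial callback_number("alice.cfo", "+15559999").
```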
Voice-clone pretexting is very potent and has already been used in the wild. In Italy, for instance, scammers used voice cloning to impersonate the Italian minister of defence and stole more than a million dollars.
Voice cloning attacks are real, effective and dangerous.
Agentic AI vishing attacks: a forecast
Up until now, I’ve talked about what is already happening and has been documented.
I’d like this blog post to age well, so I’m going to do a very risky exercise and talk about what will happen next.
First of all, the two developments I described, scale and realism, are not mutually exclusive.
The increase in volume will not come at the cost of realism: we will see more attacks, and they will be more effective.
Now, these attacks will follow the same trend phishing did.
We first had mass spam and loosely targeted phishing attacks. Then came spear phishing, with personalized emails and context built from open-source intelligence (OSINT).
The OSINT phase can be largely automated: tapping into sales-enrichment databases, mining publicly accessible (or privately bought) leaks, and monitoring social network activity and news to aim a tailored attack at the right person, at the right moment.
And this is doable at scale. We have a few scary prototypes at Arsen that will map out targets, craft a relevant scenario, get the contact info and call.
Agentic AI (another buzzword, sorry about that) will help chain automations into a very comprehensive attack framework, from data collection to exploitation.
We can have agents call you, get you talking for about 15 seconds, clone your voice from that sample, then call your employees to make them install remote access software.
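To illustrate the shape of such a chain, here is a skeleton in which every stage is a hypothetical stub, shown from the simulation side, since this is also how an authorized training platform strings its stages together.

```python
# Skeleton of an agentic chain: collect -> capture -> clone -> engage.
# Every function is a hypothetical stub; a real (authorized) simulation
# platform would back each stage with OSINT, telephony, and TTS services.
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    phone: str
    context: str  # OSINT-derived pretext, e.g. an ongoing project

def collect_targets() -> list[Target]:
    # Stub: enrichment databases, leaks, social-media monitoring.
    return [Target("Jane Doe", "+15550042", "ERP rollout this week")]

def capture_voice_sample(target: Target) -> bytes:
    # Stub: a short pretext call recording ~15 seconds of clean audio.
    return b"<audio sample>"

def build_voice_clone(sample: bytes) -> str:
    # Stub: returns a voice identifier usable by a TTS engine.
    return "voice-clone-001"

def run_engagement(target: Target, voice_id: str) -> None:
    # Stub: call the target's colleagues with the cloned voice
    # and a scenario tailored from the OSINT context.
    print(f"Engaging contacts of {target.name} with {voice_id}: {target.context}")

for target in collect_targets():
    voice = build_voice_clone(capture_voice_sample(target))
    run_engagement(target, voice)
```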
This is not science fiction anymore, and given the rate of progress (although I hate fearmongering), I’d recommend you start training your employees against these threats now.
Conclusion
I tried to shine a light on current AI vishing attacks, but also on where they are headed. Phone numbers don’t have the same level of protection email has, and they are harder to defend with the technical solutions currently available on the market.
If you’re looking for a scalable way to train your people to adopt better behaviors against this new generation of attacks, please reach out and request a demonstration.
Using the same technology attackers do, our platform provides large-scale vishing training, with voice cloning and a high level of customization, to evaluate and train people in realistic situations.