Public sector organisations are constantly under threat from cybercriminals. A recent report revealed that public sector assets remain the most heavily targeted of any industry. In the past year alone, government systems faced around 2,550 cyberattacks, a sharp 91% increase from the 1,332 incidents recorded in 2023. The public sector is an attractive target for a wide range of actors, from financially motivated hackers to state-sponsored groups and hacktivists seeking political leverage or disruption to essential services.
The tactics cybercriminals use to gain entry to systems are growing more advanced, now including highly convincing identity cloning powered by deepfakes and synthetic voices. In one recent incident, an unknown fraudster used AI to impersonate U.S. Senator Marco Rubio, successfully contacting at least five senior officials through the Signal messaging app: no phishing, no malware, just a synthetic voice realistic enough to be believed.
This evolving, AI-enhanced threat landscape makes one thing clear: traditional methods of identity verification, such as passwords, PINs, one-time codes, or voice or face alone, are no longer enough. Public sector organisations must adopt stronger identity verification: AI-powered, fused-biometrics technology that provides continuous, multi-layered protection capable of outpacing a new era of AI-generated threats such as deepfakes and synthetic voices.
How voice cloning works
Voice cloning has never been easier. What once required hours of recordings and specialist equipment can now be achieved with just a few seconds of audio and off-the-shelf tools. Even just a quick Google search brings up multiple free and low-cost tools for creating synthetic voices, eliminating the cost barrier and bringing this technology into the hands of everyday users. That means anyone with minimal technical knowledge can create convincing synthetic voices at scale.
Voice cloning is happening more frequently and has advanced to the point that it is often difficult to identify. Its misuse is already being felt across both the public and private sectors, from impersonating political leaders to attacking emergency services, to bombarding government call centres with fake calls from bots, leading to helpdesks wasting considerable time and money. Synthetic voices are fast becoming one of the most disruptive tools in the cybercriminal playbook.
Political leaders are prime targets
One of the most common uses of voice cloning is the impersonation of political leaders and senior officials. A synthetic voice of an influential individual can be used to spread disinformation or even attempt to gather login credentials for official accounts, which can then be used to compromise other government systems and harvest financial account information.
Recent incidents make it clear just how real this threat has become. In January 2024, voters in New Hampshire in the US were hit with ‘robocalls’ featuring an AI-generated clone of then-President Biden, urging them not to go to the polls. A few months earlier, in October 2023, two fake audio clips of UK Labour leader Keir Starmer spread across social media, one portraying him verbally abusing staff, and another falsely suggesting he criticised the city of Liverpool. More recently, Canada and the US warned of a malicious campaign in which attackers used text and AI-generated voice messages to impersonate senior officials and public figures, targeting business leaders and government executives in an attempt to steal money and sensitive information.
For governments and public sector officials, the implications of a synthetic voice attack are serious. A single convincing synthetic call could be enough to disrupt operations, compromise sensitive information, or threaten confidence in political institutions.
The impact on government services and the public
While politicians and influential government figures are a major target, public services and everyday people are also at risk from synthetic voice attacks. Government call centres, including tax departments and local councils, are now often bombarded with cloned voices flooding the phone lines and wasting the time of staff who should be helping real people.
These aren’t harmless nuisance calls: fraudsters are using fake voices to steal sensitive personal and financial data, such as credit card details. If successful, this can result in devastating financial losses for all parties and an erosion of trust. On average, it costs around 50p per minute to answer a call, so wasted minutes soon add up to a significant financial loss. It’s clear that traditional security measures are no longer enough, and stronger defences against synthetic voices are needed.
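To put that per-minute figure in context, a rough back-of-the-envelope calculation shows how quickly bot traffic adds up. Only the 50p-per-minute handling cost comes from the figures above; the call volumes below are hypothetical, for illustration only:

```python
# Rough cost of synthetic-voice bot calls hitting a government call centre.
# The 0.50 GBP/minute handling cost is the figure cited above; the call
# volumes are illustrative assumptions, not real statistics.
COST_PER_MINUTE_GBP = 0.50

fake_calls_per_day = 1_000      # hypothetical bot-call volume
avg_minutes_per_call = 3        # hypothetical handling time per fake call

daily_cost = fake_calls_per_day * avg_minutes_per_call * COST_PER_MINUTE_GBP
annual_cost = daily_cost * 365

print(f"Wasted per day:  £{daily_cost:,.2f}")   # £1,500.00
print(f"Wasted per year: £{annual_cost:,.2f}")  # £547,500.00
```

Even at these modest assumed volumes, fake calls consume over half a million pounds a year before any fraud losses are counted.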
Defending against synthetic voice attacks
Traditional passwords, PINs and one-time codes are inherently insecure; they can be easily stolen, shared, or guessed. Even basic biometrics like voice or face recognition, once seen as strong alternatives, are no longer sufficient. And because most systems only authenticate once, at the start of a call or login, they can’t detect if a fraudster is still the one speaking minutes later.
In this environment, a one-shot verification process is no longer enough. It assumes that once you have passed this test, you remain the same caller. That’s a big assumption.
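The difference is easy to sketch: a one-shot check looks only at the first moments of a call, while continuous verification re-scores every short audio window against the enrolled voiceprint. The function and scores below are illustrative placeholders standing in for a real speaker-recognition model, not an actual product API:

```python
# Minimal sketch of continuous verification. Instead of authenticating once
# at the start of a call, every short audio window is re-scored against the
# enrolled voiceprint, so a mid-call speaker swap is caught.
# All names, scores and the threshold are illustrative placeholders.

THRESHOLD = 0.8  # illustrative acceptance threshold

def verify_continuously(window_scores):
    """Return the 0-based index of the first audio window that fails
    verification, or None if the caller stays verified throughout."""
    for i, score in enumerate(window_scores):
        if score < THRESHOLD:
            return i
    return None

# One-shot verification would look only at scores[0] and accept this call;
# continuous verification notices the drop when a fraudster takes over.
scores = [0.95, 0.93, 0.91, 0.42, 0.40]  # hypothetical per-window scores
print(verify_continuously(scores))  # 3 -> the call fails at window 3
```

The design point is simply that verification becomes a property of the whole conversation, not of its first few seconds.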
To counter these attacks, continuous, fused speaker and speech recognition, combining the biometric characteristics of the voice with the patterns of what is being said, is crucial.
Fusing speaker and speech recognition extracts both the biometric and the spoken information. This fusion makes voice cloning dramatically harder to pull off. While a synthetic voice might be able to mimic how someone sounds, it cannot easily replicate the unique rhythm of their speech patterns. Cloned voices are often too perfect: where are the ums and ahs, the hesitations, errors and mistakes?
Furthermore, biometric algorithms can identify the acoustic artefacts that cloning algorithms embed in the voice signal, not only recognising a cloned voice (as opposed to a natural one), but also discerning which cloning algorithm was used to produce the deepfake. This, combined with a library of commonly used synthetic voices, such as those found in YouTube and TikTok videos, allows voice biometric systems both to detect a cloned voice and to classify the cloning algorithm behind it.
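The fusion logic described above can be sketched as a simple decision function. The weights, thresholds and score names below are illustrative assumptions for the sake of the sketch, not any vendor's actual method:

```python
# Sketch of fused scoring: a biometric speaker score (how the voice sounds)
# is combined with a speech-pattern score (rhythm, hesitations, the "ums"
# and "ahs") and a clone probability from an artefact detector. The weights
# and decision rule are illustrative assumptions only.

def fused_decision(speaker_score, speech_pattern_score, clone_probability,
                   threshold=0.8):
    """All inputs in 0..1. Returns True only if the fused evidence says
    the caller is both the right person and a live human."""
    if clone_probability > 0.5:   # artefacts of a known cloning algorithm
        return False              # detected: reject outright
    fused = 0.6 * speaker_score + 0.4 * speech_pattern_score
    return fused >= threshold

# A clone may match the target's voice (high speaker score) yet sound
# "too perfect" (low speech-pattern score) and carry synthesis artefacts.
print(fused_decision(0.97, 0.30, 0.85))  # cloned voice -> False
print(fused_decision(0.92, 0.88, 0.05))  # genuine caller -> True
```

The key design choice is that no single signal is trusted on its own: a clone must simultaneously beat the voiceprint match, the speech-pattern analysis and the artefact detector to get through.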
What’s more, this can be taken a step further by combining speaker, speech and face recognition, analysing both unique vocal traits and facial features to verify who is really behind each sentence, ensuring that access is granted only when the person is genuinely who they say they are.
In short, active synthetic voice detection now works, so deploying a voice biometric solution without active deepfake detection should be unthinkable.
Fused biometric authentication can protect political leaders from impersonation, safeguard government call centres, and prevent wasted resources by cutting out fake interactions before they drain time and money. In short, fused biometrics provide the trust and resilience government services need in an age where synthetic identities are just a click away.
Here at FARx, we’re the future of human and computer interaction. To learn more about how fused voice-face biometrics can be used to prevent synthetic voice attacks, get in touch with our expert team here.
