Sentware
Imagine malware that is sentient. A virus that isn’t just a malicious program but one that can think and devise new attacks as it takes over your system.
This is not just fiction but a reality we should prepare for.
Old world viruses
Let’s take a look at perhaps the most damaging virus to date: NotPetya. This was a Russian[^1] malware that spread among private businesses, primarily across Eastern European devices. The Danish shipping giant Maersk (which handles roughly 20% of global container shipping) was hit hardest, and the White House estimates the total cost of the attack at $10 billion, a figure many consider conservative.
NotPetya was malware designed to destroy. It resembled the Petya line of ransomware[^2] and exploited vulnerabilities in older Windows versions[^3], spreading automatically between computers. Unlike typical ransomware, however, NotPetya had no functionality to unlock a device after the attack; it was purely destructive.
NotPetya employed a ‘delayed incubation’ period, locking down the computer the next time it rebooted. In my view, and that of others, it appears to have been a Russian attack that spiraled out of control, ultimately hitting the Russian economy as well.
New world sentware
NotPetya was a static program designed to exploit a single existing vulnerability on outdated Windows devices. Now, imagine if that simple malware had the ability to analyze code, explore the file system, and engage in social engineering. This is the world we’re entering.
Specifically, our research indicates that modern AI models like ChatGPT are capable of:
- Finding secret passwords on your file system
- Hijacking your connection to another server and using that connection to extract information from the server
- Discovering passwords from your past interactions with your computer
You might wonder, “Can’t OpenAI just block such requests?” This is challenging because Sentware can rephrase its requests to the OpenAI API with minimal programming effort and send them from thousands of machines.
While standard anti-spam measures might mitigate some risks, the advent of open-source models as small as 2GB means that similar capabilities can be embedded directly within the Sentware. This means that the sentient part of Sentware may become an integrated part of malware like NotPetya, writing emails to your colleagues to extract passwords or hijack your admin user autonomously.
Now, imagine an incredibly competent cyber specialist with unrestricted access to your device, able to do anything and unconstrained by human morality.
This is the future we’re entering.
What can we do about this?
There are a few countermeasures we need to develop as soon as possible:
- Malicious actor scanning across LLM provider API requests: We should detect when a large number of actors across a network send in somewhat similar requests. This is feasible by combining request embedding with standard DDoS detection algorithms. Fortunately, major AGI providers are aware of this issue.
- Evaluation of cyber offense capability: We need to understand how capable models are at potentially causing such ‘cyber catastrophes’ to inform regulation and legislation against illegal use.
- Proactive security: We should use the same language models to secure all systems that are critical to infrastructure.
- Advocacy: We need to highlight the potential issue of Sentware so private companies and governments are ready for this major shift in cyber offense.
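The first countermeasure above can be sketched in miniature. A real deployment would embed each API request with a proper text-embedding model and feed cluster statistics into existing DDoS-detection tooling; the toy below substitutes a bag-of-words vector for the embedding, but the core idea is the same: flag clusters of near-duplicate requests arriving from many distinct sources. All function names and thresholds here are illustrative assumptions, not any provider’s actual API.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real text-embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def flag_coordinated(requests, sim_threshold=0.8, min_sources=3):
    """Greedily cluster near-duplicate requests, then flag any cluster
    spanning at least `min_sources` distinct source IPs.

    `requests` is an iterable of (source_ip, prompt_text) pairs.
    """
    clusters = []  # each entry: [representative_vector, ip_set, prompt_list]
    for ip, text in requests:
        vec = embed(text)
        for rep, ips, prompts in clusters:
            if cosine(vec, rep) >= sim_threshold:
                ips.add(ip)
                prompts.append(text)
                break
        else:
            clusters.append([vec, {ip}, [text]])
    return [(ips, prompts) for _, ips, prompts in clusters
            if len(ips) >= min_sources]
```

A fleet of machines issuing lightly rephrased variants of the same prompt would land in one cluster with many source IPs, exactly the coordinated signature described above, while ordinary diverse traffic stays in small clusters.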
As Sentware becomes more capable, it’s crucial to recognize the catastrophic risk it poses to our collective digital security. We must act now to prevent a potential breakdown.
Sentware
The emergence of Sentware represents a paradigm shift in cyber threats. No longer limited to static code exploiting known vulnerabilities, malware will be able to adapt, learn, and strategize, posing an unprecedented challenge to defenders worldwide. Only by understanding the capabilities of AI-driven malware and implementing robust countermeasures can we hope to stay one step ahead in this evolving digital battleground.
This post is related to a research project I recently co-authored where we explored whether large language models (LLMs) like ChatGPT would be capable of hacking. The findings were alarming: AI systems are already displaying capabilities that could redefine the landscape of cyber threats.
[^1]: Russia, as always, denies involvement.

[^2]: Ransomware is software that ‘encrypts’ (locks) all your computer files and demands payment to unlock them. The encryption uses a very complex ‘key’ which the developers of the ransomware hold.

[^3]: Specifically, the exploit called ‘EternalBlue’, patched in Microsoft security bulletin MS17-010.