I recently had the enjoyable task of cleaning up a website that had been infected with malware. A friend’s site had suddenly started generating that oh-so-friendly browser warning—you know, the one with the bright red background and the big message that says “Warning: Visiting this site may harm your computer!” Not only does this problem basically kill your search engine rankings, but it also undermines any confidence in your company or product. So, obviously, she wanted it cleaned as soon as possible.
Originally, since the main portion of the site is hosted on Ning, a social networking platform, I thought that a rogue user (or a smart bot) had created a profile with links to malware sites. I assumed the profile had been around long enough for Google to pick up on it and label the site as evil. However, I should have known that Google is smarter than that. Merely having clickable links to evil sites won’t cause Google to flag a site as hosting malware. Good thing, too, since that would implicate many perfectly clean blogs and forums whose comments aren’t completely moderated.
In fact, nearly every page on the site triggered the malware warning. If you ignored the warning and proceeded anyway, you were redirected to one of many nefarious sites and bombarded with popups (or popup attempts, if you have a good browser). This particular website also has a non-Ning portion, and it showed exactly the same symptoms. So I downloaded the entire website over FTP into my IDE and began looking through the files. It was immediately apparent that a rampant infection had spread to nearly every file a normal visitor might load in a browser. Something evil had access to the hosting account.
Well, if you happen to have an exact original copy of the entire site, and you can clean it out and re-upload it, then I congratulate you. That’s definitely the easiest solution. But what if you don’t? What if you only have the infected version, and there is no clean master copy?
Use regular expressions, of course! (I love that comic.)
For this particular infection involving PHP, HTML, and JS files, each evil tag looked something like this:
<?php eval(base64_decode("PHNjcmlwdCBzcmViYWRzaXRlLmNvbT48L3NjcmlwdD4=")); ?>
These tags follow a particular pattern, which lets us clean them out automatically. For one thing, notice the obfuscation in the PHP code: they use base64_decode() to hide the true code. But how often does anyone legitimately use base64_decode() in their PHP code? Not often, I’d guess. Also, notice the lack of quotes around the script tag’s src attribute value. That’s bad form, and hopefully your own code doesn’t look like that.
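Before deleting anything, it can help to see what the obfuscated blocks actually hide. The Python sketch below assumes the injected blocks have exactly the shape shown above: an eval(base64_decode("...")) call with a double-quoted Base64 literal. The regex and the function name decode_injected_payloads are hypothetical illustrations, not part of the original cleanup. Real infections vary, so treat the pattern as a starting point only.

```python
import base64
import re

# Hypothetical pattern: matches an injected block shaped like
#   <?php eval(base64_decode("...")); ?>
# and captures the Base64 literal. Assumes a double-quoted literal;
# real infections vary, so adjust as needed.
EVAL_B64 = re.compile(
    r'<\?php\s+eval\(\s*base64_decode\(\s*"([A-Za-z0-9+/=]+)"\s*\)\s*\);?\s*\?>'
)


def decode_injected_payloads(text: str) -> list[str]:
    """Return the decoded contents of every eval(base64_decode(...)) block."""
    payloads = []
    for match in EVAL_B64.finditer(text):
        raw = base64.b64decode(match.group(1))
        # Decode leniently so odd bytes do not stop the inspection.
        payloads.append(raw.decode("utf-8", errors="replace"))
    return payloads


if __name__ == "__main__":
    sample = '<?php eval(base64_decode("PHNjcmlwdCBzcmViYWRzaXRlLmNvbT48L3NjcmlwdD4=")); ?>'
    for payload in decode_injected_payloads(sample):
        print(payload)
```

Decoding first lets you confirm that a match really is injected markup, not some legitimate use of base64_decode(), before anything gets deleted.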
Here are the regular expressions that I used to wipe out the entire infection. You need an IDE, editor, or shell script that can apply them recursively to an entire source tree. I use PhpED, but others will do the trick. It is also a very good idea to run a find on its own before running a replace, in case you have legitimate code that these regexes match. Don’t assume they will work without testing, although they worked beautifully for me. If your site uses languages other than PHP, JS, and HTML, you may need to modify them or add to them. To clean the infection, replace every match with nothing (i.e., delete it).
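If your editor cannot apply a regex across a whole directory tree, a short script can do the job. This Python sketch is not what was used here (that was PhpED). By default it only reports matches, and it deletes them only when replace=True is passed, which mirrors the advice to find before you replace. The single PATTERNS entry and the EXTENSIONS set are illustrative assumptions, not the three regexes from this cleanup. Substitute your own patterns after testing them.

```python
import re
import sys
from pathlib import Path

# Hypothetical example pattern for the PHP eval(base64_decode(...)) injection.
# These are NOT the three regexes from the article; add or swap in your own.
PATTERNS = [
    re.compile(
        r'<\?php\s+eval\(\s*base64_decode\(\s*"[A-Za-z0-9+/=]+"\s*\)\s*\);?\s*\?>'
    ),
]
# Assumed set of file types the infection touched.
EXTENSIONS = {".php", ".html", ".htm", ".js"}


def clean_tree(root: str, replace: bool = False) -> dict[str, int]:
    """Count matches under root; with replace=True, also delete them in place.

    Returns a mapping of file path -> number of matches found.
    """
    hits: dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        # Decode raw bytes as latin-1: every byte maps to one character,
        # so unmatched content and line endings round-trip exactly.
        text = path.read_bytes().decode("latin-1")
        count = 0
        for pattern in PATTERNS:
            text, n = pattern.subn("", text)
            count += n
        if count:
            hits[str(path)] = count
            if replace:
                path.write_bytes(text.encode("latin-1"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python clean_tree.py SITE_DIR            (find only)
    #        python clean_tree.py SITE_DIR --replace  (delete matches)
    do_replace = "--replace" in sys.argv[2:]
    for file, n in sorted(clean_tree(sys.argv[1], replace=do_replace).items()):
        print(f"{n:5d}  {file}")
```

Run it without --replace first and read through the list of hits. Add the flag only once every reported file looks like a genuine infection.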
Pay special attention to the last one in this list. It will kill any <script> tags that meet three conditions: there are no quotes around the src attribute value, the src attribute immediately follows the tag name, and the source reference is absolute. If you write your script tags this way, be very careful with that one. Again, run a global find with these expressions before you run a global replace, or you may well be sorry.
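A pattern of the kind described above might look like the following Python sketch. It is an illustrative guess, not the regex used in this cleanup. It assumes the injected tag is written as <script src=http://...></script>, with an unquoted absolute http or https URL and nothing between the tag name and src. The example tags are made up for the demonstration.

```python
import re

# Illustrative guess, not the article's regex: matches <script src=http(s)://...></script>
# only when src is the first attribute, its value is unquoted, and the URL is absolute.
UNQUOTED_ABSOLUTE_SCRIPT = re.compile(
    r"<script\s+src=https?://[^\s\"'>]+\s*>\s*</script>",
    re.IGNORECASE,
)

examples = [
    "<script src=http://badsite.com/x.js></script>",  # matches: would be deleted
    '<script src="http://cdn.example.com/lib.js"></script>',  # quoted value
    "<script src=/js/app.js></script>",  # relative URL
    '<script type="text/javascript" src=http://a.example/b.js></script>',  # src not first
]

for tag in examples:
    action = "DELETE" if UNQUOTED_ABSOLUTE_SCRIPT.search(tag) else "keep"
    print(f"{action:6}  {tag}")
```

Only the first tag matches. This is also why the warning matters: a legitimate tag written in that same unquoted, src-first, absolute form would be deleted along with the malware.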
Anyway, after my IDE cleaned over 4,000 infected files with those three regexes, I re-uploaded the entire site and submitted it for review through Google Webmaster Tools. That was only last week, so the warning hasn’t been removed yet, but the site works perfectly and there are no more attempted redirects or popups.