UPDATE (July 7, 2010) Daniel Herding sent me a story about his adventures with Avira’s latest version. Looks like they’ve just paved over the problem.
We’ve been seeing a number of reports that our DotSpots Chrome extension is being reported as infected with the HTML/Crypted.Gen malware. There’s not a lot of information on Avira’s site about this malware, except that it’s a ‘trojan’ with ‘low damage potential‘.
I previously submitted a sample of the false positive which solved the problem until we pushed out a new version of our code. Since we can’t submit each of our builds ahead of time to Avira for approval, I had to spend some time figuring out exactly what was causing them to think our code was malicious.
I started by downloading a trial version of their anti-virus product. Immediately after restart, it detected the HTML/Crypted.Gen malware in the Chrome extension that was already installed. I extracted the script from the extension to my desktop and it continued to pop up infection warnings on the file. Now that I had a reliable reproduction case, I could start working on narrowing down what triggered the alert.
I loaded up the file in my trusty analysis tool, Notepad. Starting from the original file, I deleted large swaths of code until it was no longer detected as a virus, then restored those pieces and deleted smalled chunks. Eventually I reduced it down to a couple of lines, which were then reduced down to a few strings of characters. At the end, a file with only a few hundred characters would trigger the signature:
.fromCharCode .charCodeAt for eval 0,0,0,0,0,0 Math.min
Aside: my original set of pattern strings included “nodeValue” rather than “eval”. The patterns are all case-insensitive and don’t ensure matches happen on JS token boundaries. When I went character-by-character to simplify the triggers further, I discovered that it was the ‘eVal’ in ‘nodeValue’ causing issues.
When I create a file with those six strings in it on a website, Avira will attempt to block the download. This appeared to be the most specific components of the signature. Putting those keywords into Google, I found a few references to the malware it detected. The malicious script seems to construct an iframe from an array of characters, then inserts it into the document to download malware from a third-party site.
I posted a report to the GWT contributor Google Group with my findings. I had expected that since I posted the offending signature in the message, Avira would warn me that the web page I was reading was malicious. It didn’t.
I ran the message page through the same process that I used to figure out what triggered the signature, this time looking for the smallest piece of text that disables detection of the virus. It turns out that this text is the phrase google. So, the heuristic looks for the presence of the six character strings above, but also the absence of the word Google.
It’s a little disappointing to see how poorly this anti-virus product implements heuristic detection of this particular scripting pattern. It was trivial for me to figure out the pattern. I could have worked around any number of ways- by adding whitespace to the array of zeros, using Math['min'], or String['from' + 'CharCode'], all of which breaks this pattern recognition. Having the phrase ‘google’ disable detection of the virus made my job even easier. It’s possible that there are a set of other safewords that do the same thing. If I were writing malware or viruses, I’d definitely spend time altering it to work around this sort of heuristic.
Considering that the risk of false positives is so high (and users might be trained to ignore other, potentially valid virus warnings), I’d say that users are worse off with this virus definition than they are without.
You can find me on twitter as @mmastrac