The sorry state of Avira anti-virus heuristics

UPDATE (July 7, 2010) Daniel Herding sent me a story about his adventures with Avira’s latest version. Looks like they’ve just paved over the problem.

We’ve been seeing a number of reports that our DotSpots Chrome extension is being flagged as infected with the HTML/Crypted.Gen malware. There’s not a lot of information on Avira’s site about this malware, except that it’s a ‘trojan’ with ‘low damage potential’.

I previously submitted a sample of the false positive, which solved the problem until we pushed out a new version of our code. Since we can’t submit each of our builds ahead of time to Avira for approval, I had to spend some time figuring out exactly what was causing them to think our code was malicious.

I started by downloading a trial version of their anti-virus product. Immediately after restart, it detected the HTML/Crypted.Gen malware in the Chrome extension that was already installed. I extracted the script from the extension to my desktop and it continued to pop up infection warnings on the file. Now that I had a reliable reproduction case, I could start working on narrowing down what triggered the alert.

I loaded up the file in my trusty analysis tool, Notepad. Starting from the original file, I deleted large swaths of code until it was no longer detected as a virus, then restored those pieces and deleted smaller chunks. Eventually I reduced it down to a couple of lines, which were then reduced down to a few strings of characters. At the end, a file with only a few hundred characters would trigger the signature:

.fromCharCode
.charCodeAt
for
eval
0,0,0,0,0,0
Math.min
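
The narrowing-down process described above could be automated roughly as follows. This is only a sketch under stated assumptions: `isDetected` is a hypothetical stand-in for the scanner (a case-insensitive check for the six strings, since the real product is a black box), and `reduce` is a simplified chunk-deletion loop, not the manual Notepad procedure itself.

```javascript
// Hypothetical stand-in for the scanner: flags text that contains all six
// strings from the list above, ignoring case.
const TRIGGERS = ['.fromCharCode', '.charCodeAt', 'for', 'eval', '0,0,0,0,0,0', 'Math.min'];

function isDetected(text) {
  const haystack = text.toLowerCase();
  return TRIGGERS.every((t) => haystack.includes(t.toLowerCase()));
}

// Try deleting chunks of the text. A deletion is kept only if the result is
// still detected. When no chunk of the current size can be removed, the
// chunk size is halved, down to single characters.
function reduce(text, detected) {
  let current = text;
  let chunk = Math.max(1, Math.floor(current.length / 2));
  while (chunk >= 1) {
    let removedSomething = false;
    let pos = 0;
    while (pos < current.length) {
      const candidate = current.slice(0, pos) + current.slice(pos + chunk);
      if (detected(candidate)) {
        current = candidate; // this chunk was irrelevant, drop it
        removedSomething = true;
      } else {
        pos += chunk; // this chunk is needed, keep it and move on
      }
    }
    if (!removedSomething) {
      chunk = Math.floor(chunk / 2);
    }
  }
  return current;
}

// Usage: the snippets are only ever handled as text, never executed.
const sample = [
  'var a = String.fromCharCode(1);',
  'var b = "x".charCodeAt(0);',
  'for (;;) {}',
  'node.nodeValue;',
  '[0,0,0,0,0,0];',
  'Math.min(1, 2);',
].join('\n');
console.log(JSON.stringify(reduce(sample, isDetected)));
```

The result is minimal only in the sense that removing any single remaining character stops the detection; a different deletion order may leave a different remainder.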

Aside: my original set of pattern strings included “nodeValue” rather than “eval”. The patterns are all case-insensitive and don’t ensure matches happen on JS token boundaries. When I went character-by-character to simplify the triggers further, I discovered that it was the ‘eVal’ in ‘nodeValue’ causing issues.
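
To illustrate the token-boundary problem, here is a small sketch. Both function names are made up for this example, and the stricter check is a hypothetical alternative, not something Avira is known to do.

```javascript
// Naive check, as the scanner appears to behave: case-insensitive substring.
function containsEvalSubstring(source) {
  return source.toLowerCase().includes('eval');
}

// Stricter, hypothetical alternative: match `eval` only as a whole word.
// \b treats letters, digits and underscore as word characters, so this is
// an approximation of real JavaScript tokenization (it ignores `$`, and it
// still matches inside strings and comments).
function containsEvalToken(source) {
  return /\beval\b/.test(source);
}

const harmless = 'var text = node.nodeValue;';
const suspicious = 'eval(payload);';

console.log(containsEvalSubstring(harmless)); // true (a false positive)
console.log(containsEvalToken(harmless));     // false
console.log(containsEvalToken(suspicious));   // true
```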

When I create a file with those six strings in it on a website, Avira will attempt to block the download. These appeared to be the most specific components of the signature. Putting those keywords into Google, I found a few references to the malware it detected. The malicious script seems to construct an iframe from an array of characters, then insert it into the document to download malware from a third-party site.

Unfortunately, these keywords also end up in the compiled JavaScript of nearly every Google Web Toolkit application, giving Avira anti-virus users false positives when viewing many of these applications.

I posted a report to the GWT contributor Google Group with my findings. I had expected that since I posted the offending signature in the message, Avira would warn me that the web page I was reading was malicious. It didn’t.

I ran the message page through the same process that I used to figure out what triggered the signature, this time looking for the smallest piece of text that disables detection of the virus. It turns out that this text is the word ‘google’. So the heuristic looks for the presence of the six character strings above, but also for the absence of the word ‘google’.
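
Put together, the observed behaviour is consistent with a rule like the one sketched below. This is an inferred model, not Avira's actual code: the names `TRIGGER_STRINGS`, `SAFE_WORD` and `looksInfected` are made up, and treating the safe word as case-insensitive is an assumption.

```javascript
// The six strings that together trigger the warning.
const TRIGGER_STRINGS = ['.fromCharCode', '.charCodeAt', 'for', 'eval', '0,0,0,0,0,0', 'Math.min'];

// The word whose presence was observed to suppress the warning.
const SAFE_WORD = 'google';

// Inferred rule: all six triggers present AND the safe word absent.
function looksInfected(text) {
  const haystack = text.toLowerCase();
  const allTriggersPresent = TRIGGER_STRINGS.every((t) => haystack.includes(t.toLowerCase()));
  return allTriggersPresent && !haystack.includes(SAFE_WORD);
}

const sixStrings = TRIGGER_STRINGS.join('\n');
console.log(looksInfected(sixStrings));                 // true
console.log(looksInfected('// Google\n' + sixStrings)); // false
```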

It’s a little disappointing to see how poorly this anti-virus product implements heuristic detection of this particular scripting pattern. It was trivial for me to figure out the pattern. I could have worked around it in any number of ways: by adding whitespace to the array of zeros, or by using Math['min'] or String['from' + 'CharCode'], any of which breaks this pattern recognition. Having the word ‘google’ disable detection of the virus made my job even easier. It’s possible that there is a set of other safewords that do the same thing. If I were writing malware or viruses, I’d definitely spend time altering them to work around this sort of heuristic.
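
The sketch below shows why each of those rewrites is enough on its own. It assumes the signature behaves like a plain case-insensitive substring match on all six strings; `matchesSignature` is a hypothetical stand-in for the scanner, and the snippets are only ever handled as text.

```javascript
const SIGNATURE = ['.fromCharCode', '.charCodeAt', 'for', 'eval', '0,0,0,0,0,0', 'Math.min'];

function matchesSignature(source) {
  const haystack = source.toLowerCase();
  return SIGNATURE.every((s) => haystack.includes(s.toLowerCase()));
}

// Bracket notation and string concatenation resolve to the same functions.
console.log(Math['min'](3, 7) === Math.min(3, 7));                        // true
console.log(String['from' + 'CharCode'](65) === String.fromCharCode(65)); // true

// A snippet of source text that contains all six strings.
const base =
  'var z=[0,0,0,0,0,0]; for(;;){} node.nodeValue; "a".charCodeAt(0); ' +
  'String.fromCharCode(65); Math.min(1,2);';

// Each variant changes exactly one of the six strings.
const variants = {
  spacedZeros: base.replace('[0,0,0,0,0,0]', '[0, 0, 0, 0, 0, 0]'),
  bracketMin: base.replace('Math.min', "Math['min']"),
  splitFromCharCode: base.replace('String.fromCharCode', "String['from' + 'CharCode']"),
};

console.log(matchesSignature(base)); // true
for (const [name, source] of Object.entries(variants)) {
  console.log(name, matchesSignature(source)); // false for every variant
}
```

Because the modelled rule requires all six strings at once, breaking any single one is sufficient.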

Considering that the risk of false positives is so high (and users might be trained to ignore other, potentially valid virus warnings), I’d say that users are worse off with this virus definition than they are without.

You can find me on twitter as @mmastrac

28 Responses to “The sorry state of Avira anti-virus heuristics”

  1. anonymous says:

    great sleuthing!

  2. Joshua Gruber says:

    So will future builds of your extension have random “Google”s sprinkled in the comments?

  3. Jim WOods says:

    Oh wow, that is indeed some very good stuff dude.

    jess
    http://www.anon-vpn.net.tc

  4. Chris says:

    That’s great work. I tip my hat to you, but the article makes me sad :(

  5. M says:

    This is not heuristic at all; it’s simple pattern matching, and a really lame one!

  6. I thought as much says:

    Well I guess it’s still AVG FTW, then?

    • DavidDeKlabouter says:

      sadly, avg failed on me once, whereas avira detected the malware. also, avg seriously cripples my system’s performance.
      i don’t use the web component of avira though, so the problem above is of less importance to me. still leaves a bad aftertaste…

  7. Andres says:

    Warning: Reverse-engineered code.
    It goes a little bit like this:

    def infected?(target)
      File.foreach(target) do |line|
        # antivirus magic: flag any line with "eVal" unless it also says "google"
        if line.include?("eVal") && !line.include?("google")
          return "You're Infected!"
        end
      end
      nil # no line matched
    end

    Just save it to a ruby file and you too can be the proud owner of a state of the art antivirus. lol

  8. Donald says:

    I switched to MS Security Essentials recently. I used to use AVG, but it was starting to get too unstable and slow. Quite a few false positives were creeping in as well.

    MS SE is quite quick and very stable. No idea if I am actually protected though… :)

  9. I wonder if this is actually an instance of learned behaviour, e.g. if you train a Bayesian classifier on a lot of example code you could probably still reduce the behaviour to a similarly nonsensical “heuristic” by feeding it things that don’t look much like code. It’s possible that the decision boundary is actually more subtle than “if it contains these words and doesn’t contain the word google it must be a virus”.

    Of course, it’s equally possible that it’s really not.

    • Matt Mastracci says:

      Based on my limited testing it doesn’t seem to act Bayesian. I’ve found a couple of minimal subsets of tokens that will trigger the warning, but adding the word ‘google’ will only disable it for one of them. It’s possible that things are sitting precariously close to some Bayesian threshold and ‘google’ isn’t a big enough negative score, but I don’t get that impression, especially as adding ‘google’ in a comment at the beginning of our Chrome extension’s script was sufficient to cancel out all of the other malware-looking things within.

  10. [...] The sorry state of Avira anti-virus heuristics « grack.com: Matt … [...]

  11. [...] of Avira tells the difference? Well, they made it really easy for themselves. If these six strings together with a seventh string, namely »google…. Yes, the people at Avira really believe that code with these six strings is harmless, [...]

  12. Nik says:

    Really great finding. Now I even have a reason for not using this AV solution …

  13. Crony says:

    Hi,
    nice article. I think you opened my eyes: consider copy protection of software/music …
    These techniques fulfill every single criterion for anti-virus companies to call them the pure definition of malware. I recently talked to someone working at an anti-virus company, and he said that they really have huge problems not detecting software with copy protection/DRM … etc.

    If I were a virus author, I would definitely inject my malware into some commercial, copy-protected software and reverse the patterns of the anti-virus software that cause this piece of software to go undetected. This sounds like a kind of free ticket for every piece of malware, thanks to DRM. Finally DRM has saved a vocational field, unfortunately not the field of the music industry.

  14. Opinio says:

    Laugh of the day, already early in the morning…

    Now that is a way to start the day: the violence of the Hells Angels etc. comes from computer games. One of the serious causal factors is that many young people take on combat roles through computer games. At some point you then want to do that too…

  15. According to VirusTotal, there are actually four products which fall for this:
    http://www.virustotal.com/analisis/63c0e2fbdaaefeb3d86080bd4425307fad5f5e141352aeda3fc273957cd14982-1269044775

    The test file contained nothing but the six lines from the black box above.

  16. [...] discovered on another website about the heuristics used in Avira, more precisely on this website: http://grack.com/blog/2010/03/17/the-sorry-state-of-avira-anti-virus-heuristics/. In it, a developer who builds an add-on for Google Chrome describes that his [...]

  17. Robert says:

    Genealogists never die, they just lose their roots.

  18. “Unterm Strich” (“The Bottom Line”): calendar week 11…

    What happened this week? In the column “Unterm Strich” I shine a spotlight on the news and headlines of the past week. Of course this can never paint a complete picture, but hopefully it gives a certain…

  19. [...] or at least something like it. Avira had to put up with the following facepalm from a somewhat too curious computer scientist. [...]

  20. [...] of anti-virus programs are largely worthless and often even dangerous. (Update: see also this article, which shows how ridiculous the detection patterns are.) One can take the warnings seriously [...]

  21. [...] shake one’s head. A developer of a Chrome extension (a plugin for the browser) noticed this after receiving reports about the possible viral nature of his extension. [...]

  22. If you think that Avira had fixed this problem, think again. They have just added a kludge on top of their kludgy heuristics. Users of GWT-generated websites still get false alarms, and it is still simple for malware authors to circumvent the heuristics.

    https://sourceforge.net/apps/mediawiki/protoreto/index.php?title=Avira_AntiVir_triggers_false_alarms_in_GWT-generated_JavaScript

  23. Jason says:

    Hi Matt,

    First of all, a BIG thanks for your investigation and reporting! This is the first actual post I’ve seen with more than just hate/flaming for something that is free!

    That said, I am also getting VERY frustrated with AVIRA recently. I’m getting the same random and unjustified alerts on some web pages, but most frustrating to me is the detection of Microsoft files. Today, Avira suddenly blocked 14 different files on a machine that has had no changes in weeks! Almost all of the files were MS files like SAPISRV.EXE, Rbutton.exe, fpcount.exe etc… (I uploaded fpcount.exe to VirusTotal and ONLY Avira detected anything, so I’m pretty sure it’s an FP.) OK, sure, just add to the exceptions list (which is starting to get very large for me), but if you’re an engineer like me, you’ve got service packs uncompressed in multiple locations, so even after you (and I mean manually) add a file to the exception list (twice), it still blocks all other instances of the same file in different locations, so you have to add the path to each copy!

    This is pretty bad for me as I advise my customers to install (and pay for if they like) Avira free version. (I also advise them to install Malwarebytes Anti Malware)

    I can’t stand Avast as it makes anything I install it on slow to a crawl.

    #1. If you use the paid version, does the ignore option actually work (from within the Guard popup, right click, ignore), or do you still need to open the Avira Console/Control Panel and browse to each instance of each file giving an FP (in both Scanner and Guard!!!)?

    #2. Can you suggest another free, easy to use and actually WORKING AV (with or without Malware detection) that I can suggest to my customers?

    #3. I understand what you are saying about Avira’s flawed method of detection, but is that just Avira? Do they all/is it common that other AV’s detect the same way?

    Regards,
    Jason