6

Is there an algorithm like Aho-Corasick, which can match a set of patterns simultaneously and is applicable to be used in anti-malware comparison? Do all known commercial antivirus software use the Aho-Corasick algorithm?

What are the advantages of the Aho-Corasick algorithm over Boyer-Moore?

jogojapan
  • 68,383
  • 11
  • 101
  • 131
Aan
  • 12,247
  • 36
  • 89
  • 150
  • 2
    Keep in mind most commercial Anti Malware tools probably use something more than just exact string matching, in which case neither algorithm is the right answer. – Billy ONeal Nov 04 '11 at 18:39
  • Yes, I mean the standard comparison process without Heuristic & AI Techniques. – Aan Nov 04 '11 at 18:45
  • 2
    But Aho-Corasick, being a finite-state method, can be extended to fuzzy matching with some basic automata theory. Determining how to weight the dictionary is the hard part. – Fred Foo Nov 04 '11 at 18:48
  • @larsmans Do you have references about how it can be extended to fuzzy matching? Do you know about `ClamAV` if it extended the algorithm? – Aan Nov 04 '11 at 18:53
  • 3
    @Adban: Aho-Corasick constructs a finite-state machine on which the entire algebra of FA operations applies. The references that come to mind are Kornai's *Extended finite state models of language*, Mehryar Mohri's papers and Jurafsky & Martin's *Speech and Language Processing*, first few chapters. – Fred Foo Nov 04 '11 at 18:58

1 Answers1

7

Boyer-Moore: For searching one string in another target string
Aho-Corasick: For searching multiple patterns simultaneously

So the advantage being that Aho-Corasick is optimal if you want to search lot of patterns simultaneously in one pass.

Rabin-Karp string search can also match multiple patterns.

jogojapan
  • 68,383
  • 11
  • 101
  • 131
parapura rajkumar
  • 24,045
  • 1
  • 55
  • 85
  • Is it commonly believed that Aho Corasick is optimal? I mean surely it is if you only do one query, but if you do many, might there not be a more efficient datastructure? – Thomas Ahle Mar 31 '15 at 13:57