3

dawgdic is a great DAWG library, but it has a significant drawback because it is static (not updateable) and has to be constructed form strings sorted in alphabetical order. If the raw data from which the DAWG is constructed is big (several gigabytes), the initial construction of DAWG involving sorting a huge array of strings can demand too much resources.

Is there a library that provides a memory-efficient structure as dawgdic which allows construction from non-sorted dictionary?

lizarisk
  • 7,562
  • 10
  • 46
  • 70

3 Answers3

3

Currently, I don't think there's any library that allows construction of DAWGs from a non-sorted dictionaries.

But, after a lot of searching, I've found this paper, "Incremental Construction of Minimal Acyclic Finite-State Automata" , which I think has exactly what you want. Maybe you could make your own library after reading this, and share it with everyone!

EDIT: Have you looked at this question?

Community
  • 1
  • 1
max
  • 4,248
  • 2
  • 25
  • 38
2

I've found some great libraries that allow online construction from non-sorted data, altough they are not based on DAWG:

  1. cedar - a very fast double-array trie
  2. marisa-trie - а very space efficient string matching library
lizarisk
  • 7,562
  • 10
  • 46
  • 70
2

I currently know of no C++ implementations of a DAWG which support the construction from non-sorted data, but if you're open to creating your own solution which has such a feature, Incremental Construction of Minimal Acyclic Finite-State Automata (2000) is a paper which basically lays out the algorithm behind it.

Alternatively, if you're open to porting solutions from other languages, it may be worth your while to check out MDAG, a Java implementation of the data structure. It supports both on-the-fly string addition and string removal, which is exactly what you are looking for. The code is also easy to follow, and extensively commented, so porting it should be a fairly simple task.

Disclaimer: I am the author of MDAG :) .

Kevin
  • 2,617
  • 29
  • 35