Is it possible at all to build an inverted index that works for all possible regular expressions? I have a few ideas, but they are extremely vague at the moment.
My reasoning is that a search engine that accepts regexes would be pretty useful (I'm sure many people would agree), but the problem with any search engine is that there is a lot of data to search. That is why inverted indexes exist, I guess.
Maybe something similar would work for regexes? I don't really know.
Here's a description of my idea:
The search engine should be a regex search engine. Unlike a normal search engine, which only matches words, it would match documents against a regex supplied by the user.
An example of a search: `[^ ]*ell[^ ]* .*\.`
Something like that. The reason is that sometimes I want to search for something that can't be found because of the limitations of normal search engines.
It would use a simple sed-like regex syntax, maybe a bit JavaScript-like. The basics are similar across flavors anyway.
Edit: I've seen regular expression search engine, but it's not what I am asking. I'm wondering if it's possible to build one.
Edit 2: Maybe an inverted index that stores fragments of words, plus numbers (and their lengths), etc. Some kind of table where I can quickly pick things out, so that if my regex contains a number of a certain length, I can quickly narrow the search down to the indexed numbers that have that length?
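To make the number-length idea a bit more concrete, here is a rough Python sketch of the kind of table I have in mind. The names (`build_number_index`, `docs`, `number_index`) are made up for illustration, and I'm assuming the documents fit in a plain dict. It only uses the table to pick candidates, then runs the real regex on those.

```python
import re
from collections import defaultdict

def build_number_index(docs):
    """Map the length of each digit run to the ids of documents containing one."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for number in re.findall(r"\d+", text):
            index[len(number)].add(doc_id)
    return index

# Hypothetical toy corpus, just for illustration.
docs = {
    1: "call 555 now",
    2: "order 12345 shipped",
    3: "no digits here",
    4: "pin 123 and zip 90210",
}
number_index = build_number_index(docs)

# A regex containing \b\d{5}\b can only match documents that hold a
# 5-digit number, so only those candidates need to be scanned.
candidates = number_index.get(5, set())
matches = sorted(d for d in candidates if re.search(r"\b\d{5}\b", docs[d]))
print(matches)
```

The table only narrows things down; the regex itself still has to run on whatever is left.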
Combining those ideas, I just realized it could work as multiple searches over a shrinking data source, until everything that is left is what matches the regex. E.g. `ell.*\.` would search for everything with an `e`, then everything with an `l` following the `e`, then everything with another `l` following the `el`, and then any number of characters followed by a `.`.
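Here is a minimal sketch of how I imagine the shrinking search, written in Python. It is a simplification of the idea: instead of indexing single characters, it indexes three-character fragments (trigrams), and I pass in the literal piece of the regex (`ell`) by hand. Working out that required literal automatically from an arbitrary regex is the part I don't know how to do. All function and variable names are hypothetical.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_trigram_index(docs):
    """Inverted index: trigram -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index

def search(pattern, required_literal, docs, index):
    """Shrink the candidate set with the index, then verify with the regex."""
    candidates = set(docs)
    for gram in trigrams(required_literal):
        candidates &= index.get(gram, set())  # each step can only shrink the set
    regex = re.compile(pattern)
    return sorted(d for d in candidates if regex.search(docs[d]))

# Hypothetical toy corpus, just for illustration.
docs = {
    1: "hello world.",
    2: "a yellow bell rang",
    3: "nothing to see here.",
    4: "she sells shells. often",
}
index = build_trigram_index(docs)
result = search(r"ell.*\.", "ell", docs, index)
print(result)
```

Document 3 is dropped by the index alone, and document 2 survives the index step but fails the final regex check, which is the "shrinking" behaviour I was describing.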