regular expression search engine

Question

Is there a search engine that would allow me to search by a regular expression?

This would be nice, but regex searches don't allow for efficient indexing and would result in a linear search of the trillion or so pages on the Internet. Smaller collections, such as the articles on a particular website or the posts on StackOverflow, might be feasible, however. – Vortico Nov 12 '12 at 04:47

score 3 · Accepted Answer · answered Jan 01 '11 at 14:01

Google Code Search allows you to search using a regular expression.

As far as I am aware, no such search engine exists for general searches.

Mark Byers

Note that Google Code Search is being retired. – MetaEd Nov 08 '11 at 21:07
Most answers to this question are now outdated. [Google Web Search also supports regular expressions](http://webapps.stackexchange.com/a/82769/20087) now. – Anderson Green Feb 02 '17 at 04:48

score 2 · Answer 2 · answered Jan 01 '11 at 17:36

There are a few problems with regular expressions that currently prohibit employing them in real-world scenarios. The most pressing is that the entire cached Internet would have to be matched against your regex, which would take significant computing resources. Indexes seem to be pretty much useless in a regex context, because regexes can be unbounded (/fo*bar/).

score 1 · Answer 3 · answered Nov 17 '12 at 03:58

If regex takes up too many resources, why not charge for its use by CPU time instead of making it completely unavailable? I'm sure some people would pay to use it (and of course the provider could explain the charge in terms of carbon footprint and CPU resources). Google does support the wildcard * in its searches, for example *go, go*, or intitle:"*go". See: http://www.hackcollege.com/blog/2011/11/23/infographic-get-more-out-of-google.html

score 1 · Answer 4 · answered Dec 15 '11 at 23:45

I don't have a specific engine to suggest.

However, if you could live with a subset of regex syntax, a search engine could store additional tokens to efficiently match rather complex expressions. Solr/Lucene allows for custom tokenization, where the same word can generate multiple tokens under various rule sets.

I'll use my name as an example: "Mark marks the spot."

Case insensitive with stemming: (mark, mark, spot)

Case sensitive with no stemming: (Mark, marks, spot)

Case sensitive with NLP thesaurus expansion: ( [Mark, Marc], [mark, indicate, to-point], [spot, position, location, beacon, coordinate] )

And now, moving towards your question: case insensitive, with stemming, deduplication, and autocomplete prefix matching: ( [m, ma, mar, mark], [s, sp, spo, spot] )

And if you wanted "substring" style matching it would be: ( [m, ma, mar, mark, a, ar, ark, r, rk, k], [s, sp, spo, spot, p, po, pot, o, ot, t] )

A single search index can contain all of these different forms of tokens, and you choose which ones to use for each type of search.
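To make the prefix and substring token lists above concrete, here is a minimal Python sketch. It is not how Solr/Lucene implements its tokenizers; the function names (`prefix_tokens`, `substring_tokens`, `build_index`) are made up for illustration, and the sketch skips stemming and stop-word removal, so "marks" and "the" get indexed as-is.

```python
from collections import defaultdict


def prefix_tokens(word):
    """Every prefix of the word (autocomplete-style matching)."""
    return [word[:i] for i in range(1, len(word) + 1)]


def substring_tokens(word):
    """Every contiguous substring, ordered by start position, deduplicated."""
    seen = []
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece not in seen:
                seen.append(piece)
    return seen


def build_index(docs, tokenizer):
    """Hypothetical inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            word = word.strip(".,!?")  # crude punctuation handling, for the sketch only
            for token in tokenizer(word):
                index[token].add(doc_id)
    return index


docs = {1: "Mark marks the spot.", 2: "A stark contrast"}
prefix_index = build_index(docs, prefix_tokens)
substring_index = build_index(docs, substring_tokens)

print(prefix_tokens("mark"))                   # ['m', 'ma', 'mar', 'mark']
print(sorted(prefix_index.get("mar", set())))  # documents with a word starting with "mar"
print(sorted(substring_index.get("ark", set())))  # documents with a word containing "ark"
```

With the tokens stored up front, a prefix or substring query becomes a single dictionary lookup instead of a scan over every document, which is the trade of index size for query speed described here.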

Let's try the word "Mississippi" with regex-style literal tokens: [ m, m?, m+, i, i?, i+, s, ss, s+, ss+ ... ] etc.

The actual rules would depend on the regex subset, but hopefully the pattern is becoming clearer. You would extend even further to match other regex fragments, and then use a form of phrase searching to locate matches.

Of course the index would be quite large, BUT it might be worth it, depending on the project's requirements. And you'd also need a query parser and application logic.

I realize if you're looking for a canned engine this doesn't do it, but in terms of theory this is how I'd approach it (assuming it's really a requirement!). If all somebody wanted was substring matching and flexible wildcard matching, you could get away with far fewer tokens in the index.

In terms of canned apps, you might check out OpenGrok, used for source code indexing, which is not full regex, but understands source code pretty well.

score 0 · Answer 5 · edited Oct 05 '12 at 08:04

http://www.google.com/codesearch has been shut down...

Regular expression search takes a lot of resources and thus is not affordable for popular search engines.

edited Oct 05 '12 at 08:04

Stephan

answered Oct 05 '12 at 08:00

arabindamoni

score 0 · Answer 6 · answered Oct 23 '12 at 18:01

Globalogiq has an HTML Source Code Search where you can search with regular expressions. It's not free though.

Ben

score 0 · Answer 7 · answered Jan 20 '12 at 18:59

A very good article by Russ Cox on regex search over a trigram index:

http://swtch.com/~rsc/regexp/regexp4.html
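For a rough feel of the idea, here is a small Python sketch of a trigram index used as a pre-filter for regex search. It is a simplification and not the article's implementation: the article derives the trigram query from the regex itself, whereas this sketch assumes the caller passes in a literal string (`required_literal`, a made-up parameter) that every match must contain. All function names are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Set of all 3-character substrings of text (empty if text is shorter than 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(docs):
    """Map each trigram to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(pattern, required_literal, docs, index):
    """Run the regex only on documents that contain every trigram of
    required_literal. The caller must guarantee that any match contains
    that literal (an assumption of this sketch)."""
    candidates = set(docs)
    for gram in trigrams(required_literal):
        candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    # The index only narrows the candidates; the regex still confirms each one.
    return sorted(d for d in candidates if regex.search(docs[d]))


docs = {1: "hello world", 2: "help wanted", 3: "goodbye world"}
index = build_trigram_index(docs)

print(search(r"hello\s+wor.d", "hello", docs, index))  # [1]
print(search(r"wor.d", "wor", docs, index))            # [1, 3]
```

If the literal is shorter than three characters there are no trigrams to intersect, so the search falls back to scanning every document. That is the linear-scan case the other answers worry about, and it is why a pattern with no usable literal fragment gets no help from the index.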

bpgergo
