Is there a search engine that would allow me to search by a regular expression?
-
Character classes only could be doable though. – Joris Geer Feb 24 '11 at 23:32
-
This would be nice, but regex searches don't allow for efficient indexing and would result in a linear search of the trillion or so pages on the Internet. Smaller categories such as articles on a particular website or posts on StackOverflow might be possible however. – Vortico Nov 12 '12 at 04:47
-
[stackse](http://stackse.com/) – ren Jul 04 '17 at 21:07
7 Answers
Google Code Search allows you to search using a regular expression.
As far as I am aware no such search engine exists for general searches.

-
Most answers to this question are now outdated. [Google Web Search also supports regular expressions](http://webapps.stackexchange.com/a/82769/20087) now. – Anderson Green Feb 02 '17 at 04:48
There are a few problems with regular expressions that currently prohibit employing them in real-world scenarios. The most pressing is that the entire cached Internet would have to be matched against your regex, which would take significant computing resources; indexes seem to be of little use in a regex context, because a regex can match strings of unbounded length (/fo*bar/).
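To illustrate the point, here is a minimal Python sketch of why /fo*bar/ resists a plain word index: it matches infinitely many distinct strings, so without an index designed for it the engine has to test every page. The `pages` dictionary and the `linear_search` helper are made up for the example and stand in for a crawled corpus.

```python
import re

# Hypothetical stand-in for a crawled corpus: URL -> page text.
pages = {
    "a.example": "fbar",
    "b.example": "foooobar baz",
    "c.example": "foo bar",
}

def linear_search(pattern, pages):
    """Test the regex against every page; cost grows with corpus size."""
    regex = re.compile(pattern)
    return [url for url, text in pages.items() if regex.search(text)]

# 'fo*bar' matches 'fbar', 'fobar', 'foobar', ... with any number of o's,
# so there is no single word to look up in a conventional inverted index.
hits = linear_search(r"fo*bar", pages)
print(hits)
```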

If regex takes up too many resources, why not charge for its use by CPU time instead of making it completely unavailable? I'm sure some people would pay to use it (and of course the charge could be explained in terms of carbon footprint and CPU resources). Google does support an expansive * wildcard in its searches:
*go
or go*
or intitle:"*go"
Here it is: http://www.hackcollege.com/blog/2011/11/23/infographic-get-more-out-of-google.html

I don't have a specific engine to suggest.
However, if you could live with a subset of regex syntax, a search engine could store additional tokens to efficiently match rather complex expressions. Solr/Lucene allows for custom tokenization, where the same word can generate multiple tokens and with various rule sets.
I'll use my name as an example: "Mark marks the spot."
Case insensitive with stemming: (mark, mark, spot)
Case sensitive with no stemming: (Mark, marks, spot)
Case sensitive with NLP thesaurus expansion: ( [Mark, Marc], [mark, indicate, to-point], [spot, position, location, beacon, coordinate] )
And now evolving towards your question, case insensitive, stemming, dedupe, autocomplete prefix matching: ( [m, ma, mar, mark], [s, sp, spo, spot] )
And if you wanted "substring" style matching it would be: ( [m, ma, mar, mark, a, ar, ark, r, rk, k], [s, sp, spo, spot, p, po, pot, o, ot, t] )
A single search index can contain all of these different forms of tokens, and you choose which ones to use for each type of search.
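As a rough illustration, the token forms listed above can be generated in plain Python. This is only a sketch and not how Solr/Lucene analyzers are actually configured: the stop-word set, the toy `stem` function, and all helper names are assumptions made for the example.

```python
STOPWORDS = {"the"}  # assumption: a tiny stop-word list for the example

def stem(word):
    """Toy stemmer (assumption): strip a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def analyze(text):
    """Lowercase, drop punctuation and stop words, stem, and dedupe."""
    stems = []
    for raw in text.split():
        word = raw.strip(".,").lower()
        if word in STOPWORDS:
            continue
        stemmed = stem(word)
        if stemmed not in stems:
            stems.append(stemmed)
    return stems

def prefixes(word):
    """Autocomplete-style tokens: every leading prefix of the word."""
    return [word[:i] for i in range(1, len(word) + 1)]

def substrings(word):
    """Substring-style tokens: every contiguous slice, duplicates dropped."""
    seen = []
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece not in seen:
                seen.append(piece)
    return seen

terms = analyze("Mark marks the spot.")
print(terms)                           # ['mark', 'spot']
print([prefixes(t) for t in terms])    # prefix tokens per term
print([substrings(t) for t in terms])  # substring tokens per term
```

The substring form grows quadratically with word length, which is where the index size concern comes from.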
Let's try the word "Mississippi" with regex-style literal tokens: [ m, m?, m+, i, i?, i+, s, ss, s+, ss+ ... ] etc.
The actual rules would depend on the regex subset, but hopefully the pattern is becoming clearer. You would extend even further to match other regex fragments, and then use a form of phrase searching to locate matches.
Of course the index would be quite large, BUT it might be worth it, depending on the project's requirements. And you'd also need a query parser and application logic.
I realize if you're looking for a canned engine this doesn't do it, but in terms of theory this is how I'd approach it (assuming it's really a requirement!). If all somebody wanted was substring matching and flexible wildcard matching, you could get away with far fewer tokens in the index.
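One way to get by with fewer tokens is to index fixed-length character n-grams instead of every substring. The sketch below uses trigrams, which is a swapped-in technique rather than the token scheme described above: the index only narrows the candidate set, and the real regex is then run on the survivors. The `TrigramIndex` class is hypothetical, and `required_literal` is supplied by the caller here; a real query parser would have to derive it from the pattern.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Set of all 3-character windows in the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.docs = {}                    # doc_id -> original text
        self.postings = defaultdict(set)  # trigram -> set of doc_ids

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def candidates(self, literal):
        """Docs containing every trigram of the literal (may over-match)."""
        grams = trigrams(literal)
        if not grams:
            return set(self.docs)  # literal too short: fall back to all docs
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern, required_literal):
        """Prefilter by trigrams, then confirm with the real regex."""
        regex = re.compile(pattern, re.IGNORECASE)
        return sorted(d for d in self.candidates(required_literal)
                      if regex.search(self.docs[d]))

index = TrigramIndex()
index.add(1, "Mark marks the spot.")
index.add(2, "A remarkable market")
index.add(3, "No match here")
print(index.candidates("mark"))           # docs worth running the regex on
print(index.search(r"\bmarks?\b", "mark"))
```

The prefilter can return false positives (document 2 above), so the final regex pass is still needed, but only on the candidates rather than on the whole corpus.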
In terms of canned apps, you might check out OpenGrok, used for source code indexing, which is not full regex, but understands source code pretty well.

http://www.google.com/codesearch has been shut down...
Regular expression search takes a lot of resources and is thus not affordable for popular search engines.

Globalogiq has an HTML Source Code Search where you can search with regular expressions. It's not free though.
