Some say I should use regex whenever possible, others say I should use it as little as possible. Is there something like a "Perl Etiquette" about that matter, or is it just TIMTOWTDI?
- Obligatory: [Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.](http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html) (And the [background](http://regex.info/blog/2006-09-15/247)) – Brad Christie Jun 08 '11 at 15:36
- @Brad: If I could downvote your comment, I certainly would. Insofar as I may be something of a virtuoso regexer, I hate to tell people to do as I say not as I do. Written properly, regexes replace pages and pages of complicated and hard-to-debug code. They can make code faster, more maintainable, and clearer, not the other way around (since if they do, you’re doing something wrong). Use `/x`; use whitespace, indentation, grouping, and comments; use named captures; use smaller pieces; use grammatical regexes; etc etc etc. – tchrist Jun 08 '11 at 16:38
- @tchrist: The comment wasn't made to be interpreted as "never use regex", instead I was (in a round-about form) saying that regex shouldn't _always_ be the first solution. My apologies if it came off wrong. – Brad Christie Jun 08 '11 at 16:53
- It is true that Perl programmers tend to think more in terms of regexes than programmers in other languages even when those languages also offer regex facilities. It’s a bit of a cultural thing. We write `$s =~ s/\A.{5}//s` over `substr($s, 0, 5) = ""`, and innumerable similar examples. People make trouble for themselves with regexes when they fail to apply the time-tested principles of structured programming and small-is-better to them. [Grammatical regexes](http://stackoverflow.com/questions/4840988/the-recognizing-power-of-modern-regexes/4843579#4843579) help a lot, and there are other tools. – tchrist Jun 08 '11 at 16:58
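For anyone comparing the two idioms from the last comment, here is a minimal sketch (the sub names `chop5_regex` and `chop5_substr` are made up for illustration). They agree on strings of five or more characters but, as far as I can tell, differ on shorter ones: the regex needs exactly five characters to match, while the lvalue `substr` removes whatever is there.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first five characters with a regex.
sub chop5_regex {
    my ($s) = @_;
    $s =~ s/\A.{5}//s;          # only matches if at least 5 characters exist
    return $s;
}

# Hypothetical helper: same idea with substr used as an lvalue.
sub chop5_substr {
    my ($s) = @_;
    substr($s, 0, 5) = "";      # removes up to 5 characters
    return $s;
}

for my $in ("Hello, world", "abc") {
    print "in=[$in] regex=[", chop5_regex($in),
          "] substr=[", chop5_substr($in), "]\n";
}
```

With the input `"abc"` the regex version leaves the string alone and the `substr` version returns an empty string, so which one is "right" depends on what you want for short input.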
7 Answers
The level of complexity generally dictates whether I use a regex or not. Some of the questions I ask when deciding whether or not to use a regex are:
- Is there no built-in string function that handles this relatively easily?
- Do I need to capture substring groups?
- Do I need complex features like lookbehind or negated sets?
- Am I going to make use of character sets?
- Will using a regex make my code more readable?
If I answer yes to any of these, I generally use a regex.
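As a rough sketch of the "capture substring groups" case, here is a small pattern using `/x` and named captures (available from Perl 5.10). The date format and the name `parse_date` are assumptions chosen for illustration, not anything from the question.

```perl
use strict;
use warnings;

# Hypothetical example: pull the parts out of a date like "2011-06-08".
my $date_re = qr{
    \A
    (?<year>  \d{4} )   # four-digit year
    -
    (?<month> \d{2} )   # two-digit month
    -
    (?<day>   \d{2} )   # two-digit day
    \z
}x;

sub parse_date {
    my ($text) = @_;
    return unless $text =~ $date_re;
    # %+ holds the named captures of the last successful match
    return { year => $+{year}, month => $+{month}, day => $+{day} };
}

my $d = parse_date("2011-06-08");
print "$d->{year}/$d->{month}/$d->{day}\n";   # prints 2011/06/08
```

Doing the same with `index` and `substr` is possible, but you would need several calls plus your own validation, which is roughly where the checklist above tips toward a regex.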

I think a lot of the answers you got already are good. I want to address the etiquette part because I think there is some.
Summed up: if there is a robust parser available, use it instead of regular expressions, 100% of the time. Never recommend anything else to a novice. So:
Don'ts
- Don't split or match against commas for CSV, use Text::CSV/Text::CSV_XS.
- Don't write regexes against HTML or XML, use XML::LibXML, XML::Twig, HTML::TreeBuilder, HTML::TokeParser::Simple, et cetera.
- Don't write regexes for things that are trivial to split or unpack.
Dos
- Do use substr, index, and rindex where appropriate but recognize they can come off "unperly" so they are best used when benchmarking shows them superior to regular expressions; regexes can be surprisingly fast in many cases.
- Do use regular expressions when there is no good parser available and writing a Parse::RecDescent grammar is overkill, too much work, or will be too slow.
- Do use regular expressions for throw-away code like one-liners on well-known/predictable data including the HTML/CSV previously banned from regular expression use.
- Do be aware of alternatives for bigger problems like P::RecD, Parse::Yapp, and Marpa.
- Do keep your own counsel. Perl is supposed to be fun. Do whatever you like; just be prepared to get bashed if you complain when not following advice and it goes sideways. :P
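To make the `substr`/`index`/`rindex` point concrete, here is a minimal sketch that grabs a file extension both ways, using only core Perl. The sub names and the sample path are invented for illustration; whether one is faster than the other is something you would have to benchmark on your own data.

```perl
use strict;
use warnings;

# Hypothetical helper: text after the last "." using rindex + substr.
sub ext_by_rindex {
    my ($name) = @_;
    my $dot = rindex($name, ".");          # -1 when there is no "."
    return $dot < 0 ? undef : substr($name, $dot + 1);
}

# Hypothetical helper: the same thing with a regex.
sub ext_by_regex {
    my ($name) = @_;
    return $name =~ /\.([^.]*)\z/s ? $1 : undef;
}

my $path = "logs/2011/app.error.log";
print "top dir:   ", substr($path, 0, index($path, "/")), "\n";  # prints logs
print "by rindex: ", ext_by_rindex($path), "\n";                 # prints log
print "by regex:  ", ext_by_regex($path), "\n";                  # prints log
```

Both versions return `undef` when the name has no dot at all.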

- In all my time using `index` I had never heard of `rindex`! Thanks, I learned something new. I guess I need to curl up with `perlfunc` again now that I am much deeper into Perl. – Joel Berger Jun 08 '11 at 20:50
I don't know of any "etiquette" about this.
Perl regexes are highly optimized (that's one of the things the language is known for, although there are engines that are faster), and in the end, if your regex is so simple that it could be replaced by a string function, I don't believe that the regex will be significantly less performant. If the problem you are trying to solve is that time-sensitive, you might look into other possibilities of optimization.
Another important aspect is readability. And I think that handling all string transformations through regex also adds to this, instead of mixing and matching different approaches.
Just my two cents.

Though I would classify this as too opinionated for SO, I'll give my point of view.
Use regex when the string is:
- "Too Dynamic" The string could have so much variation to it that making use of the string library(ies) would be cumbersome.
- "Contains patterns" If there is a genuine pattern to the string (and it may be as simple as 1 character or a group of characters), this is where (I feel) regex excels.
- "Too Complex" If you find yourself declaring a whole function block just to do what a single pattern can do, I can see it being worthwhile just to use regex. (However, see "Too Complex" below, too).
Do not use regex to be:
- "Fast" Consider the overhead involved in spinning up a regex library over grabbing information directly from a string.
- "Too Complex" Good code isn't always short. If you begin making a huge pattern to circumvent several lines of code, that's fine, but keep in mind it's at the risk of readability. Coming back to that piece and trying to wrap your head around it again may cost more than just doing it the plain-Jane way.

- I think you mean to say that shorter code isn't always good/better. :-) – Wiseguy Jun 08 '11 at 15:55
- And yes, you are right, I should have been more specific when phrasing the question. But nonetheless I think I got a lot of very useful answers. – AlexTheBird Jun 09 '11 at 15:41
I'd say, if you need more than one or two string function calls to do it, use a regex. ;)

- (My answer assumes that regex would be the right tool for the job, and a proper parser is not needed. So it's just for string functions vs regex.) – Qtax Jun 09 '11 at 18:52
- Yes, I know, I already said to Brad Christie, I am not very happy with the phrasing of my question. I didn't mean to offend you with my comment and if I did I apologize. – AlexTheBird Jun 11 '11 at 14:13
Perl is a great language for regex. It honestly has one of the greatest parsers of any language, so that is why you see so many "use regex" answers. I am not sure what the aversion to regex is, however.
My answer would be: can you sum up the work in a single pattern more easily than with a string function, or would you need multiple string functions versus a single regex? In either case, I would aim for regex. Otherwise, do what feels comfortable for you.

- Since we're talking regexes: `s/PERL/Perl/`. http://perldoc.perl.org/perlfaq1.html#What's-the-difference-between-%22perl%22-and-%22Perl%22%3f – Joel Berger Jun 08 '11 at 16:13
- IMO you can only see the simplicity and elegance of regex when you actually work with them. I guess that is the point where all the aversion comes from. – AlexTheBird Jun 09 '11 at 14:51
- @AlexTheBird: I think another part of the aversion is the learning curve. Other types of string comparison, etc., use the coding practices the user already knows. Regex is a new syntax. – Gregory A Beamer Jun 09 '11 at 15:02
Use regexes for things that are not so complex that the regex becomes bloated, hurts the readability of the code, and causes performance issues. For those cases you can work through a series of steps, using built-in functions and other means. You may not have a cool single-line regex, but your code will be readable and maintainable.
Also avoid them for problems that are too simple because, again, regexes are heavyweight and there are usually built-in functions that handle the simple scenarios.
It is going to depend on what you are going to do. Of course, please don't use regex for parsing (especially HTML etc.).