regular expression for textarea

Question

I am looking for a regular expression that filters all \r\n out of an HTML file, but the contents of any textarea should be passed through without having their line breaks removed.

I am using .NET (C#) technology.

Can you give more information about what you are trying to do? — Matt Kocaj, Dec 02 '09 at 20:06
What platform are you using? RegEx is notoriously unreliable when working with HTML. — 3Dave, Dec 02 '09 at 20:07
Just here to echo cottsak's comment. I can't think of a reason to do this that doesn't imply a larger problem elsewhere. — aehiilrs, Dec 02 '09 at 20:32
This will answer your question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Callum Rogers, Dec 02 '09 at 23:08

Mark Byers · Answer 1 · 2009-12-02T20:30:14.503

3

Don't use regular expressions - use an HTML parser.

edited Dec 02 '09 at 20:30

answered Dec 02 '09 at 20:06

Mark Byers



avoid parsing ̣̉́̃HṬ̣̣́̀ML with ̣̃̉̉̀ŕ̃̃̃̉̀̀ẽ̉̉́̃g̣̣̃̃̃ẹ̣x̣̣̣̉̉̃̃? – aehiilrs Dec 02 '09 at 20:38

score 2 · Answer 2 · answered Dec 02 '09 at 20:22


Speaking of HTML parsers, the Html Agility Pack is great for solving this type of problem.
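As a rough illustration of the parser route, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (`ParserBasedStripper`, `Strip`) are made up for this example. It only touches text nodes, so a \r\n sitting inside a tag or an attribute value would be left alone.

```csharp
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

// Hypothetical helper: removes \r\n from text outside <textarea> elements.
public static class ParserBasedStripper
{
    public static string Strip(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the text nodes first so editing them cannot disturb the enumeration.
        var textNodes = doc.DocumentNode.Descendants().OfType<HtmlTextNode>().ToList();

        foreach (var node in textNodes)
        {
            // Leave the text alone when any ancestor is a <textarea>.
            if (node.Ancestors("textarea").Any())
                continue;

            node.Text = node.Text.Replace("\r\n", "");
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the parser decides what counts as a textarea, attributes on the tag and their quoting do not need any special handling here.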


3Dave


score 0 · Answer 3 · answered Dec 02 '09 at 20:10


Alternative approach:

1. Find, with a regex, the position (in the string) of each textarea element. A suitable regex for this would be: (<textarea>(.*?)</textarea>)
2. Remove the \r\n characters from everywhere except the places you found in step 1.
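The two steps above could be sketched in C# roughly as follows. This is an illustration, not the answerer's code: the names `TextareaNewlineStripper` and `Strip` are invented here, and the `Singleline` and `IgnoreCase` options are assumptions added so that `.` can span the line breaks inside a textarea and so that `<TEXTAREA>` is also found. The pattern itself is unchanged, so it still only recognises a bare `<textarea>` tag without attributes.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the find-then-strip approach.
public static class TextareaNewlineStripper
{
    // Pattern from the answer; the options are assumptions (see lead-in).
    private static readonly Regex Textarea = new Regex(
        "(<textarea>(.*?)</textarea>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Strip(string html)
    {
        var result = new StringBuilder(html.Length);
        int position = 0;

        // Step 1: locate every textarea element.
        foreach (Match match in Textarea.Matches(html))
        {
            // Step 2: strip \r\n from the text before this textarea...
            string before = html.Substring(position, match.Index - position);
            result.Append(before.Replace("\r\n", ""));

            // ...and copy the textarea itself unchanged.
            result.Append(match.Value);
            position = match.Index + match.Length;
        }

        // Strip \r\n from whatever follows the last textarea.
        result.Append(html.Substring(position).Replace("\r\n", ""));
        return result.ToString();
    }
}
```

For example, `TextareaNewlineStripper.Strip("<p>\r\n</p><textarea>a\r\nb</textarea>\r\n")` yields `<p></p><textarea>a\r\nb</textarea>`, with the line break inside the textarea left in place.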


Dor


The problem with this is that the tag will probably have attributes such as id, class, any custom attribs, etc, that may appear in any order, and may or may not be quoted correctly, among other issues. An HTML parser is the only way to get a reliable match. – 3Dave Dec 02 '09 at 20:24
Do you have a sample on how to accomplish this? – Nyla Pareska Dec 02 '09 at 20:28
I built a regex that could find the location of something. `Regexp.new('[\s]* finder.. Course, it's dumb so it doesn't recognize to ignore comments and other such things. I say HTML parser FTW though – Earlz Dec 02 '09 at 20:37

score 0 · Answer 4 · edited May 23 '17 at 12:19


This is extremely similar to this answer I've given before.

Fortunately, .NET has a balanced matching feature.

So you can do this:

(<textarea[^>]*>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</textarea>)|\r\n

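Then you can perform a replace with a replacement value of $1. A textarea match is captured in group 1 and is put back unchanged, while a bare \r\n match leaves group 1 empty and is therefore removed.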

Here it is in action: http://regexhero.net/tester/?id=292c5529-5fe8-42e9-8d72-d7ea9ab9e1fe
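In C#, the replace described above might look like the sketch below. The pattern is the one from this answer, written as a verbatim string so the backslashes reach the regex engine untouched; the class and method names (`BalancedTextareaReplace`, `Strip`) are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the balanced-matching pattern from the answer.
public static class BalancedTextareaReplace
{
    // Group 1 is the whole textarea element; the alternative matches a bare \r\n.
    private const string Pattern =
        @"(<textarea[^>]*>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</textarea>)|\r\n";

    public static string Strip(string html)
    {
        // "$1" re-emits a matched textarea as-is; for a \r\n match group 1
        // did not participate, so the substitution is the empty string.
        return Regex.Replace(html, Pattern, "$1");
    }
}
```

For example, `BalancedTextareaReplace.Strip("a\r\n<textarea id=\"x\">b\r\nc</textarea>\r\n")` returns `a<textarea id="x">b\r\nc</textarea>`, so attributes on the opening tag are tolerated.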

Hope that helps. The benefit of using balanced matching like this is that it's powerful enough to handle nested tags that are inherent to HTML.

However, it's still not 100% reliable. Comments can still throw it off. And of course this is also an insanely complicated regular expression to manage if you ever need to make changes. So you may still want to use an html parser after all.


Community


answered Dec 02 '09 at 20:19

Steve Wortham



Even with nested matching, regex is not a powerful enough tool to parse HTML correctly. Things like `` will cause problems. – Mark Byers Dec 02 '09 at 20:32
Good point, I changed the last paragraph. I think this is about as reliable as you can make a regular expression for this task. But you're right -- an ill-placed comment can throw it off. – Steve Wortham Dec 02 '09 at 21:08

score 0 · Answer 5 · edited May 23 '17 at 12:30

Read this: RegEx match open tags except XHTML self-contained tags

This question is like asking how to do up a bolt with a hammer. I'm sure that if you were determined enough you could tighten the bolt with a hammer. However, it would be difficult and problematic to say the least, and the chances are you would break something by trying.

Take a step back, throw away the assumption that your hammer is the best tool, and go back to your toolbox. If you dig around in there you will find a better tool: it's called an HTML parser.
