I'm working on a regular expression but I just can't make it work.

With a text like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ultricies congue feugiat. 

Tom wrote:
> blablabla
> this is very interesting 
> blabla blabla
> I ate a apple yesterday
> this is very interesting 
> blabla blabla

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ultricies congue feugiat. 

I would like to match the whole part of the text that starts with "Tom wrote:" and ends with the last line starting with ">", so that I can remove it and keep only the two lorem ipsum sentences.

I don't know if I'm clear enough...

Edit: I was thinking about a regexp that matches everything starting with "\w+ wrote:" and stops when it reaches a newline followed by a character that is not a ">".

Edit 2: Found a solution:

\w+ wrote:(\n>[^\n]*)*
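
As a minimal sketch of how the pattern above could be used to strip the quoted block, assuming Python's `re` module (the question does not name a language) and a shortened sample text; the names `QUOTE_BLOCK` and `strip_quotes` are made up for illustration:

```python
import re

# Shortened version of the sample text from the question.
TEXT = (
    "Lorem ipsum dolor sit amet.\n"
    "\n"
    "Tom wrote:\n"
    "> blablabla\n"
    "> this is very interesting \n"
    "> blabla blabla\n"
    "\n"
    "Nunc ultricies congue feugiat.\n"
)

# Pattern from Edit 2: an attribution line followed by zero or more
# lines that start with ">". [^\n]* stops at the end of each line.
QUOTE_BLOCK = re.compile(r"\w+ wrote:(\n>[^\n]*)*")


def strip_quotes(text: str) -> str:
    """Remove every 'X wrote:' line together with the '>' lines after it."""
    return QUOTE_BLOCK.sub("", text)


cleaned = strip_quotes(TEXT)
print(cleaned)
```

The blank lines that surrounded the quote are left in place, so the result may still need its empty lines collapsed afterwards.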
ROMANIA_engineer
philippe87
  • We still would like to know what you got so far first. – Martijn Pieters Oct 30 '12 at 15:22
  • I tried: (\w+ wrote:)|(>.*?\n) But it doesn't match the last line if that line doesn't end with a newline. It also isn't correct, because it would remove lines that start with a ">" even when they are not preceded by "xxx wrote:". – philippe87 Oct 30 '12 at 15:26
  • Giving your question a good title means that other people with a similar issue can find your question and (hopefully) an answer. – Lee Taylor Oct 30 '12 at 15:29
  • Why do you all rate it down.... – philippe87 Oct 30 '12 at 15:34
  • You may have wanted this, but your posted solution will match a line that looks like "Bob wrote:" even if no lines starting with ">" follow it. – Matt Oct 31 '12 at 11:38

3 Answers

Practice your regular expressions in an online regex tester. It makes it very easy to visualize what your regex is doing.

Give this one a go and adjust as necessary:

Tom wrote:(.|\s)*>.*
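
One thing worth checking when adjusting this pattern is its greediness: `(.|\s)*` consumes as much as it can, so the match runs up to the last ">" anywhere in the text. A small sketch, assuming Python's `re` module and two made-up sample strings (`single` and `later_gt`):

```python
import re

# Pattern from this answer, unchanged.
GREEDY = re.compile(r"Tom wrote:(.|\s)*>.*")

# One quote block and no other ">" in the text: behaves as intended.
single = "Intro.\n\nTom wrote:\n> one\n> two\n\nOutro.\n"

# Same text, plus an unrelated ">" on a later line.
later_gt = single + "Note: 3 > 2 holds.\n"

result_single = GREEDY.sub("", single)
result_later = GREEDY.sub("", later_gt)

print(repr(result_single))  # "Outro." survives
print(repr(result_later))   # "Outro." is swallowed by the greedy match
```

If the text can contain ">" after the quote block, a line-anchored pattern such as the one in the question's Edit 2 is likely the safer choice.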

ubiquibacon

Matching regular expressions across multiple lines requires specifying multiline matching.

See this answer for details: Regular expression matching a multiline block of text
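
To illustrate what "multiline matching" can mean in practice, here is a hedged sketch assuming Python's `re` module, where two separate flags are involved: `re.MULTILINE` changes `^` and `$`, while `re.DOTALL` changes `.`. The sample text and variable names are made up for illustration:

```python
import re

TEXT = "Intro.\n\nTom wrote:\n> one\n> two\n\nOutro.\n"

# Without re.MULTILINE, ^ only matches at the very start of the string,
# so an attribution line in the middle of the text is not found.
no_flag = re.search(r"^\w+ wrote:", TEXT)

# With re.MULTILINE, ^ matches at the start of every line.
BLOCK = re.compile(r"^\w+ wrote:\n(?:^>.*\n?)+", re.MULTILINE)
block = BLOCK.search(TEXT)

# Without re.DOTALL, "." does not match a newline, so ".*" cannot
# cross from one line to the next.
no_dotall = re.search(r"wrote:.*two", TEXT)
with_dotall = re.search(r"wrote:.*two", TEXT, re.DOTALL)

print(no_flag)                # None
print(repr(block.group(0)))   # the attribution line plus both ">" lines
print(no_dotall)              # None
print(repr(with_dotall.group(0)))
```

Patterns that spell out `\n` explicitly, like the ones elsewhere on this page, do not depend on either flag.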

Community
Igor Levicki

It looks like this is what you want (adjust the newline characters as necessary for your system):

\w+ wrote:\n(>.*\n)*(>.*)

http://regexr.com?32l21
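
The difference between this pattern and the one in the question's Edit 2 shows up when an attribution line has no quoted lines after it. A small comparison, assuming Python's `re` module; the names `LOOSE` and `STRICT` and both sample strings are made up for illustration:

```python
import re

# Pattern from the question's Edit 2: zero or more ">" lines.
LOOSE = re.compile(r"\w+ wrote:(\n>[^\n]*)*")

# Pattern from this answer: the final (>.*) requires at least one ">" line.
STRICT = re.compile(r"\w+ wrote:\n(>.*\n)*(>.*)")

quoted = "Tom wrote:\n> one\n> two\n\nOutro.\n"
bare = "Bob wrote:\nNo quoted lines follow.\n"

strict_quoted = STRICT.search(quoted)
loose_bare = LOOSE.search(bare)
strict_bare = STRICT.search(bare)

print(repr(strict_quoted.group(0)))  # attribution plus both quoted lines
print(repr(loose_bare.group(0)))     # matches the bare attribution line
print(strict_bare)                   # None: no ">" line, so no match
```

For input with Windows line endings, replacing each `\n` in the pattern with `\r?\n` is one possible adjustment.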

Matt