0

I'm trying to split a text of n phrases into paragraphs using regular expressions in Notepad++ (i.e., after a certain number of phrases, begin a new paragraph).

I have come up with the following regex (in this case, every 3 phrases -> new paragraph):

(([\S\s]*?)(\.)){3}

So far so good. However, how do I refer to the matched phrases in the replacement? $1 and $2 only refer to the parenthesized groups.

Example text:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Desired result (using a count of 2):

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Smandoli
Mihai Galos

3 Answers

2

How about:

Find what: ((?:[^.]+\.){2})
Replace with: $1\n
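
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of this find/replace. The sample text is shortened and made up for illustration, and Python writes the back-reference as \1 where the replace field above uses $1.

```python
import re

# Shortened sample text (an assumption for illustration, not the original).
text = (
    "Lorem ipsum dolor sit amet. Ut enim ad minim veniam. "
    "Duis aute irure dolor. Excepteur sint occaecat."
)

# Two runs of "non-period characters followed by a period", captured as group 1.
# [^.] also matches line breaks, so sentences may span several lines.
pattern = re.compile(r"((?:[^.]+\.){2})")

# Re-emit the captured pair of sentences, followed by a newline.
result = pattern.sub(r"\1\n", text)
print(result)
```

Note that the space after every second period ends up at the start of the next paragraph, because [^.]+ happily consumes it.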

Toto
  • That helps a lot M42, thanks. I however also want to remove the \n\r from the matched lines. Edit : The original text was multi-line, although when one copy-pastes it on Notepad++, one gets a single long line. – Mihai Galos May 14 '14 at 11:40
  • @MihaiGALOS: It'll be too complex in a single regex. I suggest you first remove all the line breaks, then apply this regex. – Toto May 14 '14 at 11:45
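
The two-pass approach suggested in the comment above can be sketched in Python as follows. The wrapped sample text and the whitespace handling in step 1 are assumptions for illustration; in Notepad++ the same thing would be two separate Replace All runs.

```python
import re

# Made-up sample in which sentences span several lines (Windows line endings).
wrapped = (
    "Lorem ipsum dolor\r\nsit amet. Ut enim ad\r\nminim veniam. "
    "Duis aute\r\nirure dolor. Excepteur\r\nsint occaecat."
)

# Step 1: turn every line break (plus adjacent spaces or tabs) into one space.
unwrapped = re.sub(r"[ \t]*\r?\n[ \t]*", " ", wrapped)

# Step 2: apply the answer's pattern to start a new line after every 2 sentences.
paragraphs = re.sub(r"((?:[^.]+\.){2})", r"\1\n", unwrapped)
print(paragraphs)
```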
1

Find using this pattern:

((.*?\.){2})

Breaking it down a bit...

The inner parentheses ...

 (     )

... provide the group which is affected by {2}.

The outer parentheses ...

(          )

...provide the delimiters for the replace pattern. Since they are "top-level", they are what the replace pattern \1 will attach to.

Note the outer parentheses have to enclose the {2}. I'm not good at thinking through how regex will handle everything, but fortunately Notepad++ offers instant confirmation -- just press "Find" to watch it jump through the matches.

The replace pattern is followed by your return and new line, so the whole string looks like this:

\1\r\n

If you want an optional space, make sure you add \s? ... probably like this, but I didn't test it:

((.*?\.\s?){2})

If the issue is inserting a space with the results, just add a space (or two, if you're old-school like me) to the replace pattern:

\1 \r\n
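
Since the \s? variant above is marked as untested, here is a small Python sketch that checks how it behaves. The sample text is made up, and the second pattern (with \s? moved outside group 1) is a hypothetical variation, not part of the answer; it shows one way to discard the space instead of keeping it.

```python
import re

# Made-up sample text for illustration.
text = "A one. B two. C three. D four."

# The answer's pattern: the optional space sits inside the repeated group,
# so it is kept as part of \1 and ends up before the line break.
kept = re.sub(r"((.*?\.\s?){2})", r"\1\r\n", text)

# Hypothetical variation: \s? is outside group 1, so the space is consumed
# by the match but left out of the replacement.
dropped = re.sub(r"((?:.*?\.){2})\s?", r"\1\r\n", text)

print(repr(kept))
print(repr(dropped))
```

As in Notepad++ with ". matches newline" unchecked, the dot here does not cross line breaks, so both patterns assume each group of sentences sits on one line.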
Smandoli
  • Can you add an optional space as well? To discard -- as it is, all new "phrases" will begin with the space after each period. – Jongware May 13 '14 at 23:13
  • This would be a much better answer if it explained to the user what this change does... – Alex Angas May 13 '14 at 23:57
  • @AlexAngas No problem, I explain the change in my answer which is basically same but posted earlier. – Hans Schindler May 14 '14 at 00:19
  • @AlexAngas - quite right, thanks for the prompt. I was called away from my computer, but now I hope I have done more for the cause. – Smandoli May 14 '14 at 03:34
  • Thank you for the responses. The original text has line breaks after each line, with phrases (or sentences, if you will) spanning multiple lines. When you copy-paste it into Notepad++ from stackoverflow, the line feeds + carriage returns are not pasted. The original text should look like: ... ut labore et\n\r ... ut aliquip \n\r This explains why I used [\s\S]* in my original regexp, to ignore line breaks. I also used positive lookahead, until a dot is reached. The regex works perfectly, but I need to replace the phrases (sentences) with the matching pattern now. – Mihai Galos May 14 '14 at 11:35
0

Finding n sentences that end with a period is quite easy. For instance, for two sentences:

(?:.*?\.){2}

To make it a paragraph (insert new line) you replace with

$0\r\n\r\n

This inserts two carriage return + line feed pairs, which is the Windows way of marking a new line. On Unix files \n\n would be enough. If you only want one line break, just use $0\r\n

If you want to make it an HTML paragraph, use the same search and replace with

<p>$0</p>
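
A rough Python equivalent of the two replacements above, for anyone testing outside Notepad++. The sample text is made up; the one real difference is that Python's re module spells the whole-match reference \g<0> rather than $0.

```python
import re

# Made-up sample text for illustration.
text = "A one. B two. C three. D four."

# No capture group is needed: the replacement refers to the whole match.
pattern = re.compile(r"(?:.*?\.){2}")

# Blank line (Windows line endings) after every two sentences.
spaced = pattern.sub(r"\g<0>\r\n\r\n", text)

# Wrap every two sentences in an HTML paragraph.
html = pattern.sub(r"<p>\g<0></p>", text)

print(repr(spaced))
print(html)
```

The space after every second period stays at the start of the following match, which is the same leading-space behaviour discussed under the other answers.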

Hans Schindler
  • I wanted to up-vote your answer, and I didn't mind suggesting fixes; but there was quite a lot to fix, so I made my own. But I felt it was a good contribution anyway. – Smandoli May 13 '14 at 23:11
  • @Smandoli I don't think there's a lot to fix. You just wrote the same answer with Group#1 instead of the overall match, i.e. Group#0, like me. – Hans Schindler May 14 '14 at 00:18
  • Your original answer caused all text to disappear. Your current answer works. With your knowledge of regex, the edit was probably very small. My knowledge is not so great and I had to revise more. Then too, I had Notepad++ open so I could test ... :-) – Smandoli May 14 '14 at 03:20
  • On a separate note, the use of regex on html that you proposed MAY be okay in this case, but in general you would want to be VERY careful with that ... http://stackoverflow.com/questions/1732348 – Smandoli May 14 '14 at 03:40