6

I have a text file with each paragraph on its own line. Some of the paragraphs got split at the start of a word. For example:

Books are an effective way to 
communicate across time, both from the past and into the future.

I could use a regular expression (regex) in the search-and-replace dialog of Notepad++ or Geany to find a lowercase letter at the start of a line and replace the preceding \r\n (carriage return + line feed) with a space.
The problem is that chapters have a subtitle that comes after the word "or", and that "or" is on a line by itself. For example:

Chapter 3 
The Importance of Reading 
or
Literature is the most agreeable way of ignoring life

Using that method would put the "or" lines in the titles of the chapters instead of on their own line.

What I want is a regex that matches a line starting with a lowercase letter (so that the preceding \r\n can be replaced with a space), but not if the line is "or\r\n".

ArchRanger
Dave Brunker
  • 10
    This post is being [discussed on meta](https://meta.stackoverflow.com/questions/419923). – cigien Aug 20 '22 at 22:28
  • 4
    Linked dupe question has many good answers but doesn't seem to be a close dupe as none of the answers there would solve this problem – anubhava Aug 24 '22 at 08:15

1 Answer

20

It looks like you could use [lookarounds](https://www.regular-expressions.info/lookaround.html). Search for:

\h*\R(?=[a-z])(?!or$)

And replace with space. See this demo at regex101 (explanation on the right side).

  • \h matches horizontal space
  • \R matches any newline sequence
  • $ matches end of line (Notepad++'s default)

In Notepad++'s replace dialog, make sure to check [•] Match case and [•] Wrap around.
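
For anyone who would rather script this than use an editor, here is a minimal Python sketch of the same idea. It is an adaptation, not the pattern above verbatim: Python's `re` module has no `\h` or `\R`, so `[ \t]*` and `\r?\n` stand in for them, and because `$` in Python only matches before `\n`, the negative lookahead is written as `or\r?$` to cope with `\r\n` line endings. The function name and sample text are made up for illustration.

```python
import re

# Approximate Python translation of \h*\R(?=[a-z])(?!or$):
#   [ \t]*        trailing spaces/tabs (stand-in for \h*)
#   \r?\n         a line break (stand-in for \R)
#   (?=[a-z])     the next line must start with a lowercase letter
#   (?!or\r?$)    ...but must not consist of just "or"
JOIN_BREAK = re.compile(r"[ \t]*\r?\n(?=[a-z])(?!or\r?$)", re.MULTILINE)


def join_split_paragraphs(text: str) -> str:
    """Replace each unwanted line break with a single space."""
    return JOIN_BREAK.sub(" ", text)


# Hypothetical sample built from the question's two examples.
sample = (
    "Books are an effective way to \r\n"
    "communicate across time, both from the past and into the future.\r\n"
    "Chapter 3 \r\n"
    "The Importance of Reading \r\n"
    "or\r\n"
    "Literature is the most agreeable way of ignoring life\r\n"
)

result = join_split_paragraphs(sample)
print(result)
```

The split paragraph is joined back into one line, while the "or" line keeps its own line because the negative lookahead rejects the break in front of it. Python's matching is case-sensitive by default, which corresponds to checking Match case in Notepad++.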

Peter Mortensen
bobble bubble
  • 9
    I must admit I don't know much about regex (just the bare-bones basics one picks up when using them every few weeks or so). But, to those six people who voted "This answer is not useful": please elaborate. In what sense is the answer "not useful"? It seems to be short, to the point, solving the question that was asked. So please enlighten me: what is wrong with this answer? – CharonX Aug 22 '22 at 13:57
  • @CharonX The important part of the answer is the lookahead. Please enlighten me: how does the answer explain that? Neither the prose nor the formulaic listing even identify the two key parts of the expression. It is a fire-and-forget pattern to copy/paste, not something to be understood and adapted. – MisterMiyagi Aug 22 '22 at 17:01
  • 2
    @MisterMiyagi The regex101 link has an explanation for the entire regular expression, although it may have been helpful to copy it over. – Unmitigated Aug 22 '22 at 20:13
  • 1
    @Unmitigated Links are for extra information and should be considered potentially inaccessible. Including the information in the answer would not just have been *helpful* - it would have made the answer generally *useful*. That the answer already does explain the superfluous parts makes the lack all the more obvious. – MisterMiyagi Aug 23 '22 at 05:13
  • 2
    @MisterMiyagi Yes, as you can see, in the first line I had put [this link](https://www.regular-expressions.info/lookaround.html) on the word *lookarounds* (it's one of the most popular sources for learning about them). – bobble bubble Aug 23 '22 at 06:29