
I would like to combine two regular expressions that clean up some textarea input into a single one. Is that even possible, or should I keep them separate (they work fine, but don't look as pretty or clean)?

I have set both to use the global and multiline flags (/gm) and to replace matches with nothing (''). I tried combining them with brackets and the | (or) operator in various positions, but never got the expected result, so I assume there is either a way I have overlooked or I should keep them as they are.

Regex 1: /^\s+[\r\n]/gm

Regex 2: /^\s+| +(?= )|\s+$/gm

Currently in JavaScript: string.replace(/^\s+[\r\n]/gm,'').replace(/^\s+| +(?= )|\s+$/gm,'')

The goal is to remove:

  • Whitespace at the beginning and end of each line
  • Empty lines (including any in the very beginning and end)
  • Double spaces

The result must not collapse onto a single line: the remaining single line breaks (\r\n) should be preserved.

Regex 1 removes empty lines (^\s+[\r\n]). Regex 2 trims whitespace at the beginning (^\s+) and end (\s+$) of each line, and reduces double (and triple, quadruple, etc.) spaces in between to a single space ( +(?= )).

Input:


   Let's  
make   this
 look

 a    little


    nicer   
  and 
more   

readible


Output:

Let's
make this
look
a little
nicer
and
more
readible
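For comparison, here is a rough non-regex sketch of the same goal (a swapped-in technique, not what the question asks for): split into lines, trim each one, collapse runs of spaces, and drop the empty lines. cleanLines and its eol parameter are hypothetical names, and the sample string is only an approximation of the Input above, since trailing spaces don't reliably survive copy-paste.

```javascript
// Reference cleanup without one big regex (a swapped-in technique, not the
// question's approach): split into lines, tidy each line, drop empty ones.
function cleanLines(text, eol = '\n') {
  return text
    .split(/\r?\n/)                           // accept LF or CRLF input
    .map(line => line.trim()                  // strip leading/trailing whitespace
                     .replace(/ {2,}/g, ' ')) // collapse runs of spaces to one
    .filter(line => line.length > 0)          // drop lines that are now empty
    .join(eol);                               // rejoin with the chosen line break
}

// Approximation of the Input above, with LF line breaks.
const sample = "\n   Let's  \nmake   this\n look\n\n a    little\n\n\n    nicer   \n  and \nmore   \n\nreadible\n";
console.log(cleanLines(sample));
```

Passing eol = '\r\n' rejoins the lines with CRLF if the textarea content needs it.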

Edit: Many thanks to Wiktor Stribiżew, whose comment led to this complete solution (with '$1' as the replacement string):

/^\s*$[\r\n]*|^[^\S\r\n]+|[^\S\r\n]+$|([^\S\r\n]){2,}|\s+$(?![^])/gm
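A minimal sketch of how the pattern above might be used, assuming LF line breaks (cleanupWiktor and sample are illustrative names, and sample only approximates the Input above). I have only traced it on LF input; in JavaScript's multiline mode ^ and $ also match between \r and \n, so CRLF text is worth testing separately.

```javascript
// Pattern from Wiktor Stribiżew's comment (g + m flags), replaced with '$1':
//   ^\s*$[\r\n]*       blank or whitespace-only lines plus their line breaks
//   ^[^\S\r\n]+        leading horizontal whitespace on a line
//   [^\S\r\n]+$        trailing horizontal whitespace on a line
//   ([^\S\r\n]){2,}    a run of 2+ horizontal whitespace chars; $1 keeps the last one
//   \s+$(?![^])        whitespace at the very end of the text ([^] matches any char)
const wiktorRe = /^\s*$[\r\n]*|^[^\S\r\n]+|[^\S\r\n]+$|([^\S\r\n]){2,}|\s+$(?![^])/gm;

function cleanupWiktor(text) {
  // Only the fourth alternative has a group; for the others $1 is empty,
  // so those matches are simply deleted.
  return text.replace(wiktorRe, '$1');
}

const sample = "\n   Let's  \nmake   this\n look\n\n a    little\n\n\n    nicer   \n  and \nmore   \n\nreadible\n";
console.log(cleanupWiktor(sample));
```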

Toine Lille
    Try `s.replace(/^\s*$[\r\n]*|^[^\S\r\n]+|[^\S\r\n]+$|([^\S\r\n]){2,}/gm, '$1')`. To also remove the trailing line breaks, add `|\s+$(?![^])` to the end of the pattern. – Wiktor Stribiżew Jan 28 '20 at 21:11
  • Start with `/^\s*\n|^\s+/gm` to remove empty lines and empty spaces at the beginning and end of lines. It doesn't cover the double spaces between words. – Bojan Bedrač Jan 28 '20 at 21:47
  • @BojanBedrač it definitely does most of it, especially when adding ` +(?= )` for the double spaces and Wiktor's `|\s+$(?![^])` for the trailing line break. Unfortunately, spaces at the end of the lines are still there. – Toine Lille Jan 29 '20 at 08:28
    @BojanBedrač This hackery, however, would work: `/^\s*[\r\n]|^\s+| +(?= )| +$|\s+$(?![^])/gm` – Toine Lille Jan 29 '20 at 08:58
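The same kind of check for the pattern in the last comment, again assuming LF input; cleanupHack is an illustrative name. Note that the ' +(?= )' and ' +$' alternatives only match literal spaces, so tabs between words or at the end of a line are not touched by those two parts.

```javascript
// Combined pattern from the last comment (g + m flags), replaced with '':
//   ^\s*[\r\n]      blank line(s) at a line start, with their line break
//   ^\s+            leading whitespace on a line
//   ' +(?= )'       each space that is followed by another space
//   ' +$'           trailing spaces on a line
//   \s+$(?![^])     whitespace at the very end of the text
const hackRe = /^\s*[\r\n]|^\s+| +(?= )| +$|\s+$(?![^])/gm;

function cleanupHack(text) {
  return text.replace(hackRe, '');
}

const sample = "\n   Let's  \nmake   this\n look\n\n a    little\n\n\n    nicer   \n  and \nmore   \n\nreadible\n";
console.log(cleanupHack(sample));
```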

1 Answer


I'd suggest the following expression with a substitution template "$1$2":

/^\s*|\s*$|\s*(\r?\n)\s*|(\s)\s+/g

Explanation:

  • ^\s* - matches whitespace at the beginning of the text
  • \s*$ - matches whitespace at the end of the text
  • \s*(\r?\n)\s* - matches whitespace between two words on different lines and captures one line break to group $1
  • (\s)\s+ - captures the first whitespace char in a sequence of 2+ whitespace chars to group $2
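A short sketch of the expression above in use, assuming LF input (cleanupAnswer and sample are illustrative names). Because there is no m flag, ^ and $ anchor to the whole text rather than to each line.

```javascript
// Expression from the answer, used with the "$1$2" template. No m flag:
// ^ and $ refer to the start and end of the whole text.
//   ^\s*              whitespace at the start of the text
//   \s*$              whitespace at the end of the text
//   \s*(\r?\n)\s*     whitespace around a line break; $1 keeps one break
//   (\s)\s+           a run of 2+ whitespace chars; $2 keeps the first one
const answerRe = /^\s*|\s*$|\s*(\r?\n)\s*|(\s)\s+/g;

function cleanupAnswer(text) {
  // Each match fills at most one group; an unmatched group is
  // substituted as an empty string, so "$1$2" yields a line break,
  // a single whitespace char, or nothing.
  return text.replace(answerRe, '$1$2');
}

const sample = "\n   Let's  \nmake   this\n look\n\n a    little\n\n\n    nicer   \n  and \nmore   \n\nreadible\n";
console.log(cleanupAnswer(sample));
```

One caveat from tracing it by hand: for CRLF input the greedy \s* before (\r?\n) can backtrack past the \r, so only \n ends up in $1 (for example "x \r\ny" becomes "x\ny"). Making that part lazy, \s*?(\r?\n)\s*, should keep the \r\n pair.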
AndreyCh