1

I'm writing a macro in Excel 2010 in order to remove line breaks in multiple cells of a column. Actually, I have 5 cases for which I have to search within the cells. I enumerate them as follows:

  1. If there's a dot or a comma followed by a blank space and then a newline(\n), replace it with dot or comma( whatever it had before regexp applied) followed by a blank space
  2. If there's a character followed by blank space and then newline, replace it with character and blank space
  3. If there's a character followed by a newline, replace it with character + blank space
  4. If there's a character followed by two blank spaces and then a newline, replace it with character + blank space
  5. If there's a dot at the end of the line, leave it like that cause it's a finishing dot.

As you can see, 2 and 3 are really similar, so I think the regexp could be something like [a-zA-Z0-9]\n but I don't know...first, if it's correct to search for a newline just adding \n and asecond, how to search for a blank space. After detecting the regexp, what follows I think it could be solved with a single .Replace(Text,"regexp","regexp ") where the ending blank space is from the form "char" + " "
So basically my question is, what could be the regexp for this patterns? In the 5th case, how can I search for the line terminator so it does not try to search for a newline after the last dot of a parragraph.
Can I use Chr(10) for newline and Chr(32) for space?
Btw, I've been following these references:
How to use RegExp
VBA Split strings

Community
  • 1
  • 1
Edgar Sampere
  • 263
  • 4
  • 24

1 Answers1

3

This pattern will find any alphanumeric character, ., or , followed by optional whitespace and then a newline and replace it with the matched ending character, followed by one space.

Dim re
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "([\w.,])\s*\n"
strText = re.Replace(strText, "$1 ")

Output from testing. Ignore the brackets. They're just there to show spaces.

[the end.]     = [the end.]
[the end.\n]   = [the end. ]
[the end. \n]  = [the end. ]
[the end.  \n] = [the end. ]
[the end,]     = [the end,]
[the end,\n]   = [the end, ]
[the end, \n]  = [the end, ]
[the end,  \n] = [the end, ]
[the end]      = [the end]
[the end\n]    = [the end ]
[the end \n]   = [the end ]
[the end  \n]  = [the end ]
Bond
  • 16,071
  • 6
  • 30
  • 53
  • nit-picking: `\w` also matches the underscore `_` e.g: `\w` = `[A-Za-z0-9_]` (in vbscript) – Ron Rosenfeld Aug 12 '15 at 17:54
  • The line `strText = re.Replace(strText, "$1 ")` is replacing the patterno found directly in the text searched, right? And the `$1` is the result of the regexp processing? – Edgar Sampere Aug 12 '15 at 17:55
  • @Kronos - The `$1` represents the first capture group (in this case, the ending character, `.`, or `,`). So it replaces the entire pattern found with just the ending character plus a space. Also, as @Ron mentioned, the `\w` includes the underscore character. If you don't want to count that, we'll need to use `[a-zA-Z0-9,.]` instead. – Bond Aug 12 '15 at 17:57
  • 1
    What to capture depends on what he meant by **character** in his original question. Might be as simple as `(.)` ;-). – Ron Rosenfeld Aug 12 '15 at 18:00
  • @RonRosenfeld - Yep. I made sure to specify _alphanumeric_ in my answer but I forgot about the `_`. – Bond Aug 12 '15 at 18:00
  • Also, of possible interest, is that in vbscript, characters that are not in that set are not matched. So some foreign alphabet characters, and Unicode characters, would be excluded. – Ron Rosenfeld Aug 12 '15 at 18:06
  • By character I mean any alphanumeric character, underscore included but it's little chance it have to be considered cause it's not used. – Edgar Sampere Aug 12 '15 at 18:07
  • That worked! Thanks both. The implementation was easier this way – Edgar Sampere Aug 12 '15 at 18:59
  • Sure, don't worry about it. Now I am trying to define my function to be called with a range of cells selected by the user. I tried giving it a range argument and trying to loop with a `For Each` every row within the column range but it gives me an error. Do you know how to achieve this? I found this [link](http://stackoverflow.com/questions/18168151/vba-pass-a-group-of-cells-as-range-to-function) – Edgar Sampere Aug 12 '15 at 19:27