I've been trying to get this working with regular expressions but I keep failing, so maybe someone more experienced can help?

How can I render a string close to the way a web browser renders an HTML string? Example HTML:

<html>
  Hel
lo 
  how
 are   you
</html>

Is rendered:

Hel lo how are you

I want it to be

Hello how are you

So the difference from HTML is that a newline without explicit spaces is simply removed. In Java this string would look like this:

\tHel\nlo \n  how\n are    you

My current solution:

// remove linebreaks and tabs and any leading or trailing whitespace
// this is necessary to avoid converting \t or \n to a space
script = script.replaceAll("\\s+\n\\s+", "");
script = script.replaceAll("\\s+\t\\s+", "");
// remove any length of whitespace and replace it with one
script = script.replaceAll("\\s+", " ");
// remove leading and trailing whitespace
script = script.trim();

This has only one problem: if I have a line with a trailing space followed by a newline and some more text, the trailing space will be removed:

Hello \nhow are you?

will be reduced to

Hellohow are you

So, using an underscore (_) as the space marker, the following should hold:

_ = _
__ = _
\t\n_ = _
_\t\n = _
\t_\n = _
_\t_\n_ = _
\n = // nothing
\t = // nothing
\t\n = // nothing

What combination of replaceAll(regex, string) would I need to use?

dpr
Pete
  • You're looking for regex. – SLaks May 17 '17 at 14:46
  • Possible duplicate of [Java how to replace 2 or more spaces with single space in string and delete leading spaces only](http://stackoverflow.com/questions/2932392/java-how-to-replace-2-or-more-spaces-with-single-space-in-string-and-delete-lead) – Arnaud May 17 '17 at 14:47
  • .replaceAll(" +"," ") – mike May 17 '17 at 14:47
  • What have you tried so far, and what's the problem? I suppose you figured out the obvious regex by yourself?! – dpr May 17 '17 at 14:52
  • I've completely reworked the question and included my current solution. – Pete May 18 '17 at 05:49
  • What about `Hello\nhow are you?`? What should the output be after normalization? – dpr May 18 '17 at 06:49

2 Answers

Given your current examples, I think you want to change your replacement code to this:

// remove any newlines or tabs (leading or trailing whitespace doesn't matter)
// a character class is enough here; no escaped literal tab/newline is needed
script = script.replaceAll("[\\t\\n]", "");
// boil down remaining whitespace to a single space
script = script.replaceAll("\\s+", " ");
script = script.trim();

This will of course cause something like

Hello\nhow are you?

to be reduced to

Hellohow are you?

But that is an inherent consequence of your requirement.
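
The two-step approach above can be tried out with a small self-contained sketch. The class name `WhitespaceNormalizer` and the method names `collapse` and `normalize` are made up for illustration, and the `collapse` step is split out only so that the underscore rules from the question can be checked without `trim()` getting in the way:

```java
class WhitespaceNormalizer {

    // Hypothetical helper: drop tabs and newlines, then squeeze any
    // remaining whitespace run into a single space (no trimming).
    static String collapse(String input) {
        String withoutBreaks = input.replaceAll("[\\t\\n]", "");
        return withoutBreaks.replaceAll("\\s+", " ");
    }

    // Hypothetical helper: same as collapse, plus trimming both ends.
    static String normalize(String input) {
        return collapse(input).trim();
    }

    public static void main(String[] args) {
        // Example string from the question
        System.out.println(normalize("\tHel\nlo \n  how\n are    you")); // Hello how are you
        // Trailing space before the newline is kept as the separator
        System.out.println(normalize("Hello \nhow are you?"));           // Hello how are you?
        // No explicit space around the newline: the words are joined
        System.out.println(normalize("Hello\nhow are you?"));            // Hellohow are you?
    }
}
```

Under these assumptions, a run of whitespace survives as one space only if it contains at least one real space, which matches the rules listed in the question.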

dpr
The regex \s matches any whitespace character, so I believe you just need myString.trim().replaceAll("\\s+", " ");
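
As a quick illustration of what this one-liner does, here is a minimal sketch; the class name `CollapseAllWhitespace` and the method name `collapse` are invented for the example. Note that because \s also matches tabs and newlines, a bare newline between two word fragments ends up as a space:

```java
class CollapseAllWhitespace {

    // Hypothetical helper wrapping the one-liner from this answer.
    static String collapse(String input) {
        return input.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Leading/trailing whitespace is trimmed, inner runs become one space
        System.out.println(collapse("  Hello   world \t")); // Hello world
        // A newline counts as whitespace too, so it turns into a space
        System.out.println(collapse("Hel\nlo"));             // Hel lo
    }
}
```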

bradimus
dumptruckman
  • Won't that leave one leading space? – bradimus May 17 '17 at 14:49
  • True, you could just do `myString.trim()` first. – dumptruckman May 17 '17 at 14:51
  • I've already tried it that way. I've updated my question to explain why that is not enough: I need to explicitly differentiate between spaces and tabs/newlines. – Pete May 18 '17 at 04:45
  • I'll have a look at a new solution tomorrow, but in the meantime, can you tell me why you think trim will change `Hi \nthere` to `Hithere`? If it is the same string, the space in between is not considered trailing... – dumptruckman May 18 '17 at 05:04
  • Sorry, I've updated the comment. Of course it's not "trimmed", but my current regex replacements are reducing it in an unwanted way. – Pete May 18 '17 at 05:08