46

Pretty basic question for someone who knows.

Instead of getting from

"This is my text. 

And here is a new line"

To:

"This is my text. And here is a new line"

I get:

"This is my text.And here is a new line"

Any idea why?

L.replaceAll("[\\\t|\\\n|\\\r]","\\\s");

I think I found the culprit.

On the next line I do the following:

L.replaceAll( "[^a-zA-Z0-9|^!|^?|^.|^\\s]", "");

And this seems to be causing my issue.

Any idea why?

I am obviously trying to do the following: remove all non-chars, and remove all new lines.

Alan Moore
jason m

7 Answers

72

\s is a shorthand for whitespace characters in a regex pattern. It has no such meaning in a replacement string, so you can't use it there. The replacement must contain exactly the character(s) you want to insert. If that is a space, just use " " as the replacement.

The other thing is: Why do you use 3 backslashes as escape sequence? Two are enough in Java. And you don't need a | (alternation operator) in a character class.

L.replaceAll("[\\t\\n\\r]+"," ");

Remark

L itself is not changed: Java strings are immutable, and replaceAll returns the modified string. To keep the result, you need to assign it:

String result = L.replaceAll("[\\t\\n\\r]+"," ");

Test code:

String in = "This is my text.\n\nAnd here is a new line";
System.out.println(in);

String out = in.replaceAll("[\\t\\n\\r]+"," ");
System.out.println(out);
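For reference, here is a minimal sketch of the whole cleanup the question describes (collapse line breaks, then drop unwanted characters), using plain character classes without `|` or repeated `^`. The class and method names are made up for illustration, and the set of characters to keep is an assumption read off the question's second regex.

```java
public class CleanTextSketch {

    // Hypothetical helper, not part of the original post.
    static String clean(String input) {
        // Step 1: replace each run of tabs, newlines and carriage returns with one space.
        String s = input.replaceAll("[\\t\\n\\r]+", " ");
        // Step 2: remove everything except letters, digits, '!', '?', '.' and whitespace.
        // Inside a character class the members are simply listed; no '|' is needed.
        s = s.replaceAll("[^a-zA-Z0-9!?.\\s]", "");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(clean("This is my text.\n\nAnd here is a new line"));
    }
}
```

Because each call returns a new string, every step assigns its result before the next step runs.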
JeanValjean
stema
  • `L.replaceAll("[\\t|\\n|\\r]"," ");` also appears to not work though. – jason m Jun 15 '12 at 10:39
  • You might want to add a `+` after the character class, so you only get one space for each sequence of whitespace characters. – Tim Pietzcker Jun 15 '12 at 10:43
  • @jasonm I updated my solution. The changed string is **returned**, `L` is not changed. Is this your problem? – stema Jun 15 '12 at 10:50
  • ` //How to remove all newlines and returns String _L = L.replaceAll("[\\r?\\n|\\t]+"," "); //Normalize for all non-chars _L = _L.replaceAll( "[^a-zA-Z0-9|^!|^?|^.|^\\s]", ""); //Normalize for the sentences _L = _L.replaceAll("[?|!]", ".");` – jason m Jun 15 '12 at 10:51
  • The above are the three lines of code I am using to clean the results. – jason m Jun 15 '12 at 10:51
  • @jasonm, I think you need to clarify your question (within your question, not as comment) and explain what you want to achieve, what is the expected output? – stema Jun 15 '12 at 11:00
15

The line separator differs between operating systems: '\r\n' on Windows and '\n' on Linux.

To be safe, you can use regex pattern \R - the linebreak matcher introduced with Java 8:

String inlinedText = text.replaceAll("\\R", " ");
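A small sketch of how `\R` behaves on mixed line endings, assuming Java 8 or later. The class and helper names are hypothetical. The second helper is one possible way to also swallow indentation that follows a line break.

```java
public class LineBreakSketch {

    // Hypothetical helper: each run of line breaks (\r\n, \n or \r) becomes one space.
    static String inline(String text) {
        return text.replaceAll("\\R+", " ");
    }

    // Hypothetical helper: a line break plus any whitespace after it becomes one space.
    static String inlineAndDropIndent(String text) {
        return text.replaceAll("\\R\\s*", " ");
    }

    public static void main(String[] args) {
        System.out.println(inline("a\r\nb\nc\rd"));
        System.out.println(inlineAndDropIndent("line one\n    line two"));
    }
}
```

`\R` treats `\r\n` as a single line break, so a Windows line ending yields one space rather than two.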
Dimitar II
  • Nice! Although to replace linebreak and spurious whitespace introduced by indentation? \R[\s]{2,} ? – ndtreviv Jun 08 '22 at 12:27
10

Try

L.replaceAll("(\\t|\\r?\\n)+", " ");

Depending on the system, a line break is either \r\n or just \n.

Keppil
  • This is wrong. `?` means a literal question mark in a character class (just as `|` is a literal here too). And four backslashes are overkill, even for Java. EDIT: But that's @adarshr's fault for not taking out the extra backslashes when reformatting the answer :) – Tim Pietzcker Jun 15 '12 at 10:44
  • Oh, sorry, I didn't realise that. Glad it is fixed now. – adarshr Jun 15 '12 at 10:52
  • Thanks for this alternative answer. I was trying to solve a similar problem - replacing all OS' newlines with my host's standard newline, and this approach can be adapted for that problem - here's the regex: "(\\r?\\n)+" – Ian Durkan Aug 22 '13 at 22:21
  • @Keppil I extended your approach with `|\\n`, so it works no matter which kind of line break is present. – Reporter Feb 19 '14 at 14:01
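To illustrate the point from the comments that `?` and `|` are literal characters inside a character class, here is a short sketch comparing a bracketed class with a grouped alternation. The class and method names are invented for the demonstration.

```java
public class CharClassSketch {

    // Character class: '?' and '|' are ordinary members, so they get replaced too.
    static String withCharClass(String s) {
        return s.replaceAll("[\\r?\\n|\\t]+", " ");
    }

    // Group with alternation: here '?' makes \r optional and '|' means "or".
    static String withGroup(String s) {
        return s.replaceAll("(\\t|\\r?\\n)+", " ");
    }

    public static void main(String[] args) {
        String input = "yes|no?\r\nmaybe";
        System.out.println(withCharClass(input));
        System.out.println(withGroup(input));
    }
}
```

With the character class, the literal `|` and `?` in the input are lost. The grouped version touches only the tab and line-break sequences.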
3

Your regex is good, although I would replace the matches with the empty string:

String resultString = subjectString.replaceAll("[\t\n\r]", "");

You expect a space between "text." and "And" right?

I get that space when I try the regex by copying your sample

"This is my text. "

So all is well here. Maybe if you just replace it with the empty string it will work. I don't know why you replace it with \s. And the alternation | is not necessary in a character class.

buckley
  • I don't write Java, but is it really necessary to add an extra escape character, as in \\t? Isn't \t enough? – buckley Jun 15 '12 at 10:41
  • I think for `\t`, `\r` and `\n` you will get away with a single backslash. The regex engine will then not be given the string `\n` (which it would internally interpret as a newline character) but the newline character directly (0x0A, I think). But since all other backslashes in regexes do need to be doubled (`\s` or `\b` for example), it's convention to always use double backslashes. – Tim Pietzcker Jun 15 '12 at 10:52
  • @jasonm Can you save your file so we can download it? That will certainly allow us to diagnose what's going on – buckley Jun 15 '12 at 13:23
  • ...and again, the `|` does not belong there. There's no need to specify OR in a character class, so `|` just matches a literal `|`. – Alan Moore Jun 16 '12 at 02:23
  • @AlanMoore Of course, alternation is not necessary in a char class – buckley Jun 16 '12 at 09:54
3

I found this.

String newString = string.replaceAll("\n", " ");

Although, as you have a double line, you will get a double space. I guess you could then do another replace all to replace double spaces with a single one.

If that doesn't work try doing:

string.replaceAll(System.getProperty("line.separator"), " ");

If I create lines in "string" by using "\n" I had to use "\n" in the regex. If I used System.getProperty() I had to use that.
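A minimal sketch of the two-pass idea described above: replace each newline with a space, then collapse the doubled spaces that a blank line leaves behind. The class and method names are hypothetical.

```java
public class NewlineToSpaceSketch {

    // Hypothetical helper combining both passes.
    static String flatten(String s) {
        // Pass 1: every "\n" becomes a space, so a blank line produces two spaces.
        String replaced = s.replaceAll("\n", " ");
        // Pass 2: collapse any run of two or more spaces into a single space.
        return replaced.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println(flatten("This is my text.\n\nAnd here is a new line"));
    }
}
```

Note that this only handles `\n`. Text with Windows line endings would still contain `\r` characters after the first pass.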

TikkaBhuna
0

You can also split the string on newlines first and then rejoin the pieces with a single space:

String[] Larray = L.split("[\\n]+");
// Rejoin the pieces with one space between them (requires Java 8+)
L = String.join(" ", Larray);
Ishwar
0

This should take care of space, tab and newline:

data = data.replaceAll("[ \t\n\r]+", " ");
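One thing to watch with this kind of pattern is the quantifier. `*` can match the empty string, so it also matches between every pair of characters and inserts a space there, whereas `+` requires at least one whitespace character. A small sketch of the difference (the class and method names are made up for illustration):

```java
public class QuantifierSketch {

    // '*' allows zero-length matches, so a space is inserted at every position.
    static String withStar(String s) {
        return s.replaceAll("[ \\t\\n\\r]*", " ");
    }

    // '+' needs at least one whitespace character, so other text is left alone.
    static String withPlus(String s) {
        return s.replaceAll("[ \\t\\n\\r]+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + withStar("ab") + "]");
        System.out.println("[" + withPlus("a \t\n b") + "]");
    }
}
```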
Ani Menon