I need to remove the characters '¼' and '½' from a string in Java. How can I do that?

I do not have control over the request; I have to accept the input as it comes. The request string contains the characters '¼' and '½', and I need to replace them with the empty string.

I tried putting it in a list and it did not work:

invalidChars.add('½');

and then searching the input against that list. The Jenkins build fails with: unmappable character for encoding UTF-8 [INFO] 2 errors

Abhijit Deb

2 Answers

Technically speaking, you can write:

input = input.replace("\u00BC", "").replace("\u00BD", "");

. . . but I'm a bit suspicious of your use-case. It seems like this won't fix whatever the real problem is.
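
For the list approach tried in the question, here is a minimal sketch. The class and method names are hypothetical, not from the question. Writing the characters as Unicode escapes keeps the source file pure ASCII, which should avoid the "unmappable character" compile error, assuming that error comes from the source file being saved in an encoding other than the one the compiler is told to read.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the name is an assumption for illustration.
final class FractionStripper {
    // Unicode escapes instead of literal characters, so the file stays ASCII.
    private static final Set<Character> INVALID_CHARS = new HashSet<>();
    static {
        INVALID_CHARS.add('\u00BC'); // VULGAR FRACTION ONE QUARTER
        INVALID_CHARS.add('\u00BD'); // VULGAR FRACTION ONE HALF
    }

    // Returns a copy of input with every character in INVALID_CHARS dropped.
    static String strip(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!INVALID_CHARS.contains(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Calling `FractionStripper.strip(input)` makes a single pass over the string, and adding another character to remove only means adding one more entry to the set.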

ruakh
  • This is a quite inefficient way about it because it will both search twice and allocate the whole request string twice. A regex character class is a much better choice. – Marko Topolnik May 30 '13 at 14:24
  • but it's simpler and who cares about efficiency unless the strings are huge. – Jason S May 30 '13 at 14:36
  • @MarkoTopolnik: Searching twice for a substring, vs. searching once for a regex character-class -- to decide which is more efficient, you'd have to test. (Regexes aren't magical, y'know. `"[\u00BC\u00BD]"` would still have to compare each character of the input-string to both possible values.) Allocating the string twice -- well, the usual regex way is `input.replaceAll("[\u00BC\u00BD]", "")`, which has to create a `Pattern` and a `Matcher` every time, as well as the result-string. *[continued]* – ruakh May 30 '13 at 14:57
  • *[continued]* Overall, since my approach involves copying the entire string twice, it might well perform slightly worse on very large strings, but it's hard to be sure: the number of copies differs only by a small constant factor, and the performance of linear-regex-search vs. substring-search differs by at *least* that, so it's impossible to decide *a priori* which one would end up being faster. – ruakh May 30 '13 at 14:59
  • If we're discussing performance, naturally I assume non-trivial string length. Going over it twice is without doubt less efficient due to cache misses. Also, I assume precompiled Regex. I have tested such things before and the difference is real. – Marko Topolnik May 30 '13 at 15:03
  • @JasonS It's one method call against two, and fewer characters. It obviously scales dramatically better. I wonder how it's more complex. As for performance, if it comes free with the right way to do it, rejecting it is just shooting yourself in the foot. – Marko Topolnik May 30 '13 at 15:14
  • @MarkoTopolnik: You say "I assume non-trivial string length" and "I assume precompiled Regex" as though we were discussing this in the abstract, rather than discussing an actual real-world use case -- one where you've made claims with terms like "quite inefficient" and "much better choice". The fact is, in most cases it's better to write code in the simplest, clearest way than to write it in the most efficient way, because in most cases the difference in simplicity and clarity is greater than the difference in efficiency. – ruakh May 30 '13 at 15:14
  • My way *is* simple and clear. If you disagree, that's just your opinion against mine. – Marko Topolnik May 30 '13 at 15:16
  • It is one method call in your code. I am replying to the claim that Regex code is more complex. – Marko Topolnik May 30 '13 at 15:18
  • @MarkoTopolnik: This whole thing is just opinion. You have no evidence to support your assumptions about efficiency. Maybe this code is called in a very tight loop over a very large number of very small strings, in which case my version will be more efficient than yours. – ruakh May 30 '13 at 15:19
  • As for total method calls executed, did you really inspect both implementations? String#replace is no one-liner, either. – Marko Topolnik May 30 '13 at 15:21
  • It is about processing a *request*. It is not a tight loop, but even if it was, I don't know where you get the assumption that regexes are inefficient. – Marko Topolnik May 30 '13 at 15:25
  • I have just completed the measurements. On an 800-char string, your `replace` is 50% slower than `replaceAll`, which means even when I compile the regex each time, it still beats your solution by a wide margin on a sub-1K string. – Marko Topolnik May 30 '13 at 15:42
  • @MarkoTopolnik: Re: "String#replace is no one-liner, either": Oh, absolutely. Neither one is native code. And I'm really not arguing against regexes; I use them often, and considered using them for this answer, before deciding that `replace` would be simpler and easier for the OP. Rather, I'm arguing against the claims in your initial comment ("quite inefficient", "much better choice"). – ruakh May 30 '13 at 16:20
  • The strongest case I have against replace is that it scales badly with each further char to delete. – Marko Topolnik May 30 '13 at 16:54
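
The precompiled-regex variant discussed in the comments above can be sketched as follows. The class and method names are hypothetical, and this is only an illustration of the technique, not a claim about which approach is faster; that would have to be measured on the actual input.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration.
final class FractionRegexStripper {
    // Compiled once and reused, so each call only creates a Matcher.
    // The character class matches U+00BC (one quarter) or U+00BD (one half).
    private static final Pattern FRACTIONS = Pattern.compile("[\u00BC\u00BD]");

    // Returns a copy of input with every match of FRACTIONS removed.
    static String strip(String input) {
        return FRACTIONS.matcher(input).replaceAll("");
    }
}
```

Further characters can be removed by extending the character class, without adding more calls.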

Assuming the string is held in a variable, you can do this:

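public void asciiReplacer(String str)
{
    // Split the input on spaces and inspect each token.
    String[] temp = str.split(" ");
    for (int i = 0; i < temp.length; i++) {
        // (char) 189 is U+00BD (one half); compare as a String, since
        // String.equals(Character) is always false.
        if (temp[i].equals(String.valueOf((char) 189)))
            temp[i] = " ";
        // (char) 188 is U+00BC (one quarter).
        if (temp[i].equals(String.valueOf((char) 188)))
            temp[i] = " ";
        System.out.println(temp[i] + "\n");
    }
}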
Lucarnosky