5

I have CharSequence source, int start, int end

I would like to strip all "control characters" from source between start and end and return the result as a new CharSequence.

By "control character" I mean undesirable characters like tab, carriage return, line feed and such... basically everything that was in ASCII below 32 (space)... but I don't know how to do it in this "modern age".

What is a char? Is it Unicode? How can I remove these "control characters"?

ycomp
  • Have you checked this: http://stackoverflow.com/questions/4283351/how-to-replace-special-characters-in-a-string – assylias Feb 23 '12 at 16:46
  • Use [`String#replaceAll()`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29). – Matt Ball Feb 23 '12 at 16:48

4 Answers

2

You could use CharSequence.subSequence(int, int) and String.replaceAll(String, String) as follows:

source.subSequence(0, start).toString() + source.subSequence(start, end).toString().replaceAll("\\p{Cntrl}", "") + source.subSequence(end, source.length()).toString()
Go Dan
1

Assuming that you can get the whole source into memory, you can do this:

String tmp = source.toString();
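String prefix = tmp.substring(0, start);   // everything before start, unchanged
String suffix = tmp.substring(end);        // everything from end onward, unchanged
String middle = tmp.substring(start, end).replaceAll("\\s", ""); // \\s also matches the space character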
CharSequence res = prefix + middle + suffix;
Sergey Kalinichenko
  • How does this do anything with control characters? – Highland Mark Feb 23 '12 at 16:56
  • 1
    @HighlandMark OP calls "control characters" what's commonly known as "whitespace" (by "control character" I mean undesirable characters like Tab and Return, line feeds and such...); `replaceAll("\\s", "")` removes every such character from the string it is applied to, here the part between start and end. – Sergey Kalinichenko Feb 23 '12 at 17:00
  • Converting a CharSequence to a String will remove any special formatting (for instance bolded characters). – Aaron Oct 04 '16 at 16:06
1

Use Character.isISOControl(char). It belongs to the standard java.lang.Character class, so it works without Guava.
Yes, char is Unicode: a Java char is a 16-bit UTF-16 code unit.
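
A minimal sketch of the Character.isISOControl approach applied to the question's (source, start, end) inputs. The class name ControlCharStripper and the method name stripControl are made up for illustration, and end is assumed to be exclusive, as in subSequence. Note that isISOControl covers U+0000 to U+001F and also U+007F to U+009F, which is slightly more than "ASCII below 32". Like the String-based answers, this returns plain text, so any span formatting attached to the original CharSequence is not carried over.

```java
class ControlCharStripper {

    /**
     * Returns a copy of source with ISO control characters removed from
     * the half-open range [start, end). Text outside the range is kept as-is.
     * (Hypothetical helper; names are illustrative only.)
     */
    static CharSequence stripControl(CharSequence source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        StringBuilder out = new StringBuilder(source.length());
        out.append(source, 0, start);                 // untouched prefix
        for (int i = start; i < end; i++) {
            char c = source.charAt(i);
            // Control characters are never surrogates, so checking
            // one char at a time is safe here.
            if (!Character.isISOControl(c)) {
                out.append(c);
            }
        }
        out.append(source, end, source.length());     // untouched suffix
        return out.toString();
    }

    public static void main(String[] args) {
        // The tab at index 6 lies outside [0, 6) and is therefore kept.
        CharSequence cleaned = stripControl("a\tb\r\nc\td", 0, 6);
        System.out.println(cleaned);
    }
}
```

Building the result in a StringBuilder avoids the three intermediate strings of the concatenation approach, at the cost of a hand-written loop.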

speksy
Highland Mark
1

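Using Guava's CharMatcher (newer Guava releases expose the same matcher as CharMatcher.javaIsoControl() instead of the constant). This strips the whole string; pass source.subSequence(start, end) to limit it to the range: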

return CharMatcher.JAVA_ISO_CONTROL.removeFrom(string);
Louis Wasserman