
I have a big string containing at most 100,000 characters. Instead of using `string.charAt(index)` to read a character from the string, I converted the string into a char array using `string.toCharArray()` and now work with `charArray[index]`, which takes less time than `string.charAt(index)`. However, I want to know: is there any other way that is faster than `string.toCharArray()`?
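For reference, a minimal sketch of the kind of comparison described above. This is a naive single-run timing, not a proper benchmark (JIT warm-up, dead-code elimination, and GC can all skew results, which is why a harness like JMH is normally recommended); the class name and string contents are made up for illustration.

```java
public class CharAccessTiming {
    public static void main(String[] args) {
        // Build a test string of 100,000 lowercase letters,
        // matching the size described in the question.
        StringBuilder sb = new StringBuilder(100_000);
        for (int i = 0; i < 100_000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        String s = sb.toString();

        // Access every character via charAt(i).
        long t0 = System.nanoTime();
        long sum1 = 0;
        for (int i = 0; i < s.length(); i++) {
            sum1 += s.charAt(i);
        }
        long charAtNanos = System.nanoTime() - t0;

        // Copy once with toCharArray(), then index the array directly.
        long t1 = System.nanoTime();
        char[] chars = s.toCharArray();
        long sum2 = 0;
        for (int i = 0; i < chars.length; i++) {
            sum2 += chars[i];
        }
        long arrayNanos = System.nanoTime() - t1;

        // The sums must match; the timings are only indicative.
        System.out.println("charAt:      " + charAtNanos + " ns (sum " + sum1 + ")");
        System.out.println("toCharArray: " + arrayNanos + " ns (sum " + sum2 + ")");
    }
}
```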

ravi
  • How did you determine that `string.charAt(index)` is slower? I wouldn't think it would be slower. – Louis Wasserman Mar 24 '12 at 11:35
  • 1
    For your convenience, maybe I can suggest using [StringReader](http://docs.oracle.com/javase/6/docs/api/java/io/StringReader.html) – Jakub Zaverka Mar 24 '12 at 11:37
  • I measured with `System.currentTimeMillis()`, and `string.charAt(index)` also uses array indexing internally, so it is better to have the array directly. – ravi Mar 24 '12 at 13:22
  • 1
    @Ravi Joshi: *"using string.charAt[index] to read a character from the string"*... String's *charAt* does *not* read a character from the String. It reads a Java *char*, which is totally inadequate to hold all the Unicode characters. A character, since Java 1.4, may need more than one Java *char* to be represented using *char*. A website like Stackoverflow, for example, fully supports Unicode and all the Unicode codepoints. Java's *char* primitive does not. – TacticalCoder Mar 24 '12 at 14:33
  • @TacticalCoder: Thank you for this information; I was not aware of this fact. However, in my case the string is composed of only lowercase letters, i.e. a-z. – ravi Mar 24 '12 at 20:30
  • @TacticalCoder: what you say is wrong. A char primitive IS a Unicode character. Maybe you are confusing it with the byte primitive? From the official doc: "The char data type is a single 16-bit Unicode character." Source: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html Example: char rr = '華'; – Pierre Henry Nov 01 '13 at 14:34
  • @Pierre Henry: no, I am not confusing anything ; ) Many Unicode codepoints need two Java char to be encoded. By using the .charAt(...) method on such a Unicode codepoint you'd be reading only part of that codepoint. That is why in this day and age methods like *charAt* and *length* are mostly broken. You want to use *codePointAt* instead. Example: how do you put the character 'U+1040B' inside a Java *char*? You simply can't do it. See answer from 100K+ SO user here: http://stackoverflow.com/questions/12280801 (*"...a Java char holds a UTF-16 code unit instead of a Unicode character..."*) – TacticalCoder Nov 03 '13 at 17:45
  • Yes, you are right, sorry about that. I was convinced that Unicode used only 16 bits at most. Thanks for pointing this out. I am not looking forward to having to work with those "astral" planes ;) – Pierre Henry Nov 07 '13 at 15:42
  • This problem has already been discussed on Stack Overflow: http://stackoverflow.com/questions/8894258/fastest-way-to-iterate-over-all-the-chars-in-a-string – it's me Sep 04 '14 at 16:38
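The supplementary-character point raised in the comments above can be demonstrated directly. A minimal sketch (the class name is made up; U+1040B is DESERET CAPITAL LETTER EF, which UTF-16 encodes as the surrogate pair D801 DC0B, i.e. two Java chars):

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // One Unicode character outside the Basic Multilingual Plane,
        // written as its two UTF-16 surrogate char units.
        String s = "\uD801\uDC0B";

        System.out.println(s.length());                      // 2 char units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1040b

        // charAt(0) returns only the high surrogate, not a full character:
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```

For a string known to contain only a-z, as in the question, every character fits in one `char`, so `charAt`/array indexing is safe there.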

1 Answer


I do not think there is a faster way. But please correct me!

A String instance is backed by a char array. charAt() does some index checks which may be the cause for it being slower than working with the array returned by toCharArray(). toCharArray() simply does a System.arraycopy() of the backing array.

nansen
  • While using `string.charAt(index)`, is a new `char[string.length()]` created every time? If yes, that may be the reason for its lower performance. – ravi Mar 24 '12 at 13:26