4

I'm switching from C to Java. How do I find a string inside a ByteBuffer? Is there something like memchr in Java? Only part of the ByteBuffer is a string and the rest is raw bytes, so any Java method has to work on bytes as well as chars.

I am also looking for something like strsep in Java to split strings.

Jonas
Blub

5 Answers

5

You can convert the ByteBuffer into a String and use indexOf, which is likely to work.

ByteBuffer bb = /* non-direct byte buffer */
// String(byte[] ascii, int hibyte, int offset, int count): hibyte 0 maps each byte to one char
String text = new String(bb.array(), 0, bb.position(), bb.remaining());
int index = text.indexOf(searchText);

This has a non-trivial overhead because it creates a String. The alternative is a brute-force string search directly on the bytes, which will be faster but takes time to write.
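As a rough sketch of the same idea with an explicit charset, the snippet below decodes the remaining bytes as ISO-8859-1 (an assumption: one byte per char, as is typical for raw C strings) and then calls indexOf. The class name `DecodeAndSearch` and the method name are hypothetical, and the buffer must be array-backed (`hasArray()` returns true).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeAndSearch {
    /**
     * Decodes the bytes between position and limit as ISO-8859-1 and returns
     * the index of searchText relative to the buffer's position, or -1.
     * Assumes an array-backed (non-direct, writable) buffer.
     */
    public static int indexOf(ByteBuffer bb, String searchText) {
        String text = new String(
                bb.array(),
                bb.arrayOffset() + bb.position(), // first remaining byte in the backing array
                bb.remaining(),
                StandardCharsets.ISO_8859_1);     // one byte per char, so indexes line up
        return text.indexOf(searchText);
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x01, 'k', 'e', 'y', '=', 'v', (byte) 0xFF};
        System.out.println(indexOf(ByteBuffer.wrap(data), "key=")); // prints 2
    }
}
```

Because ISO-8859-1 maps every byte to exactly one char, the returned char index is also a byte offset from the buffer's position; with a multi-byte encoding such as UTF-8 that would no longer hold.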

Peter Lawrey
  • This String constructor is deprecated, because it doesn't take the character encoding into account. Suggested: `String text = new String(bb.array(), 0, bb.position(), charset);` where `charset` is the encoding to use, or the default one `Charset.defaultCharset()` – mins Aug 24 '14 at 13:11
  • If you are reading a raw C string, it is most likely ISO-8859-1 encoded, in which case this method is fine. Being explicit doesn't hurt performance much, so stating the charset for clarity is perhaps better. – Peter Lawrey Aug 24 '14 at 16:57
  • 2
    Downside to this approach, and similar, is that you have to read the whole string - not a streaming solution. – Jmoney38 Jun 20 '18 at 23:24
4

You would need to encode the character string into bytes using the correct character encoding for your application. Then use a string search algorithm like Rabin-Karp or Boyer-Moore to find the resulting byte sequence within the buffer. Or, if your buffers are small, you could just perform a brute force search.
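For small buffers, the brute-force variant might look like the following minimal sketch. It is not Rabin-Karp or Boyer-Moore; the class name `ByteBufferSearch`, the method name, and the choice of ISO-8859-1 in the demo are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferSearch {
    /**
     * Returns the absolute index of the first occurrence of needle between
     * the buffer's position and limit, or -1 if it does not occur.
     */
    public static int indexOf(ByteBuffer haystack, byte[] needle) {
        int last = haystack.limit() - needle.length;
        for (int start = haystack.position(); start <= last; start++) {
            int matched = 0;
            // Absolute get(int) reads a byte without moving the buffer's position.
            while (matched < needle.length
                    && haystack.get(start + matched) == needle[matched]) {
                matched++;
            }
            if (matched == needle.length) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x01, 'k', 'e', 'y', '=', 'v', (byte) 0xFF};
        // Encode the search string with the encoding the data is assumed to use.
        byte[] needle = "key=".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(ByteBuffer.wrap(data), needle)); // prints 2
    }
}
```

Since it only uses absolute `get(int)`, this should also work on direct buffers, and it leaves the buffer's position and limit unchanged.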

I'm not aware of any open source implementations of these search algorithms, and they aren't part of core Java.

erickson
1

From Fastest way to find a string in a text file with java:

The best implementation I've found is in MIMEParser: https://github.com/samskivert/ikvm-openjdk/blob/master/build/linux-amd64/impsrc/com/sun/xml/internal/org/jvnet/mimepull/MIMEParser.java

/**
  * Finds the boundary in the given buffer using Boyer-Moore algo.
  * Copied from java.util.regex.Pattern.java
  *
  * @param mybuf boundary to be searched in this mybuf
  * @param off start index in mybuf
  * @param len number of bytes in mybuf
  *
  * @return -1 if there is no match or index where the match starts
  */

  private int match(byte[] mybuf, int off, int len) {

Needed also:

  private void compileBoundaryPattern();
Community
Grigory Kislin
0

The String class has a nice split method: String.split.
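A small sketch of how this could stand in for strsep, assuming fields are separated by ':' or ','. The class name `SplitExample` and the method `splitFields` are hypothetical. Note that the argument to split is a regular expression, and that trailing empty fields are dropped unless a negative limit is passed.

```java
import java.util.Arrays;

public class SplitExample {
    /** Splits on ':' or ',' and keeps empty fields, including trailing ones. */
    public static String[] splitFields(String line) {
        // "[:,]" is a regex character class matching either delimiter.
        // The limit -1 keeps trailing empty strings, which split drops by default.
        return line.split("[:,]", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("a:b,,c:"))); // prints [a, b, , c, ]
    }
}
```

Unlike strsep, split does not modify its input; it returns a new array of substrings.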

Fortyrunner
0

One option is to use a StringTokenizer, which splits a string into a sequence of tokens according to the given delimiter(s). The returned tokens can include the delimiters if needed. Example:

String s = "abc:def-ghi|jkl";
StringTokenizer tokenizer = new StringTokenizer(s, ":-|");
while (tokenizer.hasMoreTokens()) {
  System.out.print(tokenizer.nextToken());
}

Expected result:

abcdefghijkl
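To get the delimiters back as tokens, the three-argument constructor takes a `returnDelims` flag. A minimal sketch follows; the class name `TokenizerWithDelims` and the helper `tokens` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerWithDelims {
    /** Collects all tokens, returning each delimiter character as its own token. */
    public static List<String> tokens(String s, String delims) {
        List<String> out = new ArrayList<>();
        // The third argument (returnDelims = true) makes delimiters appear as tokens.
        StringTokenizer tokenizer = new StringTokenizer(s, delims, true);
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abc:def-ghi|jkl", ":-|"));
        // prints [abc, :, def, -, ghi, |, jkl]
    }
}
```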

Yuval