0

In my legacy project this code executes more than one minute for 10000 elements

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
        long start = System.currentTimeMillis();
        try {
            byte[] bytes = contacts.getLines()
                    .stream()
                    .map(lineItem -> lineItem.value)
                    .reduce(contacts.getHeader().concat("\n"), (partialString, el) -> partialString + el+ '\n')
                    .getBytes();
            return new ByteArrayInputStream(bytes);
        } finally {
            log.info("Duration is {}ms", System.currentTimeMillis() - start);
        }

Is there any obvious way to make it faster ?

gstackoverflow
  • 36,709
  • 117
  • 359
  • 710
  • 1
    You basically just want to join the lines with `\n`? – Leonard Brünings Aug 19 '21 at 14:46
  • Does this answer your question? [Java: convert List to a String](https://stackoverflow.com/questions/1751844/java-convert-liststring-to-a-string) – Hulk Aug 19 '21 at 14:55
  • @WJS As I've already mentioned this code is legacy – gstackoverflow Aug 19 '21 at 15:10
  • Didn't mean `you` personally. But I presume you know what each map/method returns and the expected final result. – WJS Aug 19 '21 at 15:14
  • @WJS I shared all the context I have. I know that this code is semantically correct but slow. – gstackoverflow Aug 19 '21 at 15:19
  • 1
    Is the return type required to be `ByteArrayInputStream` or is an `InputStream` sufficient? Other point: the code is *not* semantically correct. The reduction function is not associative and the identity element is not really an identity element. Java `Stream`’s `reduce` is not a left-fold. – Holger Aug 19 '21 at 18:13

3 Answers3

2

Well, for over 10_000_000 lines of 36 characters each, this ran in under 4 seconds. Not certain if it does what you want though.

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
       StringBuilder sb = new StringBuilder(contacts.getHeader()).append("\n");
       for (String lineItem : contacts.getLines()) {
          sb.append(lineItem).append("\n");
        }
        return new ByteArrayInputStream(sb.toString().getBytes());

     } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
     }
}
WJS
  • 36,363
  • 4
  • 24
  • 39
  • will share my results as soon as my environment is ready – gstackoverflow Aug 19 '21 at 16:27
  • 1
    Yes, it worked. I've shared my results here https://stackoverflow.com/questions/68849950/how-to-improve-pervormance-of-string-processing-using-streamreduce/68850086#comment121681789_68850086 – gstackoverflow Aug 19 '21 at 17:25
  • 2
    Note that `getBytes()` uses the system’s default charset. It’s better to specify a charset explicitly. You can use `ByteBuffer bb = Charset.defaultCharset().encode(CharBuffer.wrap(sb)); return new ByteArrayInputStream(bb.array(), bb.arrayOffset(), bb.limit());` which does the same but encodes the `StringBuilder`’s content directly, without copying it into a new `String` first. – Holger Aug 19 '21 at 18:29
  • @Holger Well of course, It's so obvious. :) I'm going to have to look at the source(s) and see exactly what is going on. That is something I would never have thought to do. – WJS Aug 20 '21 at 01:31
1

If speed is really important using a StringBuilder will help, but look a bit less functional.

StringBuilder builder = new StringBuilder();
builder.append(contacts.getHeader());
builder.append("\n");
contacts.getLines()
    .stream()
    .map(lineItem -> lineItem.value)
    .forEach(line -> {
      builder.append(line);
      builder.append("\n");
    });
builder.toString().getBytes();
Leonard Brünings
  • 12,408
  • 1
  • 46
  • 66
1

In order to improve performance it may be better to use intermediate ByteArrayOutputStream + OutputStreamWriter to concatenate the values.

The byte array of the concatenated result is returned by ByteArrayOutputStream::toByteArray

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) throws IOException {
    long start = System.currentTimeMillis();
    try {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bos);
        writer.write(contacts.getHeader());
        writer.write("\n");

        contacts.getLines().forEach(line -> { 
            try {
                writer.write(line.value);
                writer.write("\n");
            } catch (IOException ioex) { throw new RuntimeException(ioex);}
        });
        writer.flush();

        return new ByteArrayInputStream(bos.toByteArray());
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}

Another approach could be to use Collectors.joining with prefix and suffix:

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        return new ByteArrayInputStream(
            contacts.getLines()
                .stream()
                .map(item -> item.value)
                .collect(Collectors.joining("\n", contacts.getHeader().concat("\n"), "\n"))
                .getBytes()
        );
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}

If it is really necessary to use Stream::reduce operation (due to some reason) with StringBuilder, the following approach may be applied:

private static ByteArrayInputStream getInputStreamFromContactFileReducing(MyDTO contacts) {

    long start = System.currentTimeMillis();
    try {
        byte[] bytes = contacts.getLines()
                               .stream()
                               .map(lineItem -> lineItem.value)
                               .reduce(new StringBuilder().append(contacts.getHeader()).append("\n"),
                                       (sb, line) -> sb.append(line).append('\n'),
                                       (sb1, sb2) -> sb1.append(sb2))
                               .toString()
                               .getBytes();
        return new ByteArrayInputStream(bytes);
    } finally {
        log.info("Reducing: Duration is {}ms", System.currentTimeMillis() - start);
    }
}
Nowhere Man
  • 19,170
  • 9
  • 17
  • 42
  • why your second approach should be better than original in my topic? could you please explain? – gstackoverflow Aug 19 '21 at 15:25
  • This form of `Collectors.joining` uses `StringJoiner` which is based on a `StringBuilder` and here one instance of the `StringBuilder` should be created -- it is basically similar to @LeonardBrünings answer. In the original code, concatenation in `reduce` part: `partialString + el+ '\n'` is likely to create multiple Strings / StringBuilders. – Nowhere Man Aug 19 '21 at 15:48