3

I have an url like: String url = "https://.../foo/a/555/data1";

Goal: Transform the url to the string: a555data1

I want to build this result traversing the string only once. I decided for the following process:

  1. I want to "stream" the string starting from the back.
  2. If it is not a backslash insert/append at the front of a deque.
  3. If it is the third backslash end

I have successfully written a horrible solution below, can it be made pretty using streams?

Deque<String> lifo = new ArrayDeque<>();

int count = 0;
for (int i = testUrl.length() - 1; count < 3 ; --i) {
    if (testUrl.codePointAt(i) == ((int) '/') ) {
        ++count;
        continue;
    }

    result.addFirst(testUrl.substring(i,i+1));

}

String foo = result.stream().collect(Collectors.joining());
assertThat(foo).isEqualTo("a606KAM1");
Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
Shakka
  • 137
  • 8

6 Answers6

4

An alternative solution without loops and streams:

String[] split = url.split("/");
int n = split.length;
return split[n - 3] + split[n - 2] + split[n - 1];
Schidu Luca
  • 3,897
  • 1
  • 12
  • 27
  • Thats a nice solution if you never need to change amount of slashes you have to pass. – Worthless Nov 14 '18 at 14:26
  • @Worthless personally, I [would do it with a regex](https://stackoverflow.com/a/53302629/1059372), but this solution is the easiest one to read. – Eugene Nov 14 '18 at 14:36
4

Another way would be a regex:

String result = url.replaceAll(".+/(.+)/(.+)/(.+)", "$1$2$3");
Eugene
  • 117,005
  • 15
  • 201
  • 306
  • this is the slowest one, see my answer; this does not matter in tests, however regexps should be used with care in production code – Alex Nov 14 '18 at 15:14
  • @Alex have you ever thought that may be your [tests are wrong?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Eugene Nov 14 '18 at 15:19
  • they very well may be, but you are free to look at the code and point any mistakes to me or do your own benchmarking – Alex Nov 14 '18 at 15:30
  • @Alex I am at the same time free not to do that, also. – Eugene Nov 14 '18 at 15:31
  • @Alex when you really want a fast solution, you would use neither of these solutions, as it’s not the chosen framework, but the data movement that matters here. Compare with [this answer](https://stackoverflow.com/a/53308377/2711488). However, you should really check your test code against the points of [this answer](https://stackoverflow.com/a/513259/2711488). – Holger Nov 14 '18 at 20:41
  • @Holger so no one seems to trust my developer skills :) Okay, I've redone the benchmark with JMH and actually this changed nothing (updated my answer). I've added your algorithm and it's the fastest one as expected. However, regexp is still the slowest one, about 20 (!) times slower than yours, and about 3 times than a one-liner with streams. However this is the most upvoted answer, which really makes me sad. – Alex Nov 15 '18 at 12:38
  • @Alex there isn't really a surprise that regex is the slowest and a no copying one is the fastest; no idea why that would make you sad though. as long as the OP and me too, are happy with the *overall* performance of the app using this - I see no problem; think about it: those are still nano-seconds. Or think of the fact that a `HashSet` is nothing more that a `HashMap` that uses a wrapper, or `Collections::sort` used to do lots of copying - and people would still use that quite a lot... – Eugene Nov 15 '18 at 12:48
  • @Alex don’t forget, sometimes you don’t want to execute an operation thousands of times. If you need it only one time, this solution is simple and still reasonably fast, as all solutions run in less than a microsecond. Much less than the required development time. If you need it more often, even this solution could be tuned, e.g. prepare the `Pattern` once and also change the pattern, e.g. to avoid the catastrophic backtracking. – Holger Nov 15 '18 at 14:45
  • @Holger can you change it here btw? I thought about that, but really could not :| – Eugene Nov 15 '18 at 14:47
  • You may use `url.replaceFirst(".*/([^/]++)/([^/]++)/([^/]++)", "$1$2$3")`, but when not preparing the `Pattern`, the time to analyze the regex will dominate the operation. – Holger Nov 15 '18 at 14:57
4

If you want to do it really fast, you have to reduce the amount of data copying happening with every string construction.

int ix1 = url.lastIndexOf('/'), ix2 = url.lastIndexOf('/', ix1-1),
    ix3 = url.lastIndexOf('/', ix2-1);
String result = new StringBuilder(url.length() - ix3 - 3)
    .append(url, ix3+1, ix2)
    .append(url, ix2+1, ix1)
    .append(url, ix1+1, url.length())
    .toString();

Even when you expand it to support a configurable number of parts,

int chunks = 3;
int[] ix = new int[chunks];
int index = url.length();
for(int a = ix.length-1; a >= 0; a--) index = url.lastIndexOf('/', (ix[a] = index)-1);
StringBuilder sb = new StringBuilder(url.length() - index - chunks);
for(int next: ix) sb.append(url, index+1, index = next);
String result = sb.toString();

it’s likely faster than all alternatives.

Holger
  • 285,553
  • 42
  • 434
  • 765
3

You may do it like so,

final String[] splitStrArr = url.split("/");
String result = Arrays.stream(splitStrArr).skip(splitStrArr.length - 3)
                    .collect(Collectors.joining(""));
Ravindra Ranwala
  • 20,744
  • 6
  • 45
  • 63
  • 1
    "I want to "stream" the string starting from the back." You're not starting from the back – Michael Nov 14 '18 at 14:26
  • 3
    One more thing. You can omit the empty string in `Collectors.joining()`. The result will be the same. – ETO Nov 14 '18 at 14:28
  • @Michael What he wants is to concat last three tokens in the url. For an instance this is what he expects a555data1 given the url https://.../foo/a/555/data1. So the direction of traversal is rather an implementation detail. Some other strategy to solve the problem may mandate you to traverse backward. – Ravindra Ranwala Nov 14 '18 at 15:05
  • @ETO That makes sense. Thanks ! – Ravindra Ranwala Nov 14 '18 at 15:06
1

Initially my thought was that streams should not be used here due to supposed performance overhead, so I created a little performance test for solutions proposed in another answers:

import static java.util.Arrays.stream;

import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(value = 1)
public class CJMH {

    @State(Scope.Thread)
    public static class CState {
        public String url = "https://.../foo/a/555/data1";
    }

    @Benchmark
    public String fastest(CState state) {
        String url = state.url;
        int chunks = 3;
        int[] ix = new int[chunks];
        int index = url.length();
        for(int a = ix.length-1; a >= 0; a--) index = url.lastIndexOf('/', (ix[a] = index)-1);
        StringBuilder sb = new StringBuilder(url.length() - index - chunks);
        for(int next: ix) sb.append(url, index+1, index = next);
        return sb.toString();
    }

    @Benchmark
    public String splitAndStreams(CState state) {
        final String[] splitStrArr = state.url.split("/");
        String result = stream(splitStrArr).
                skip(splitStrArr.length - 3).
                collect(Collectors.joining(""));

        return result;
    };

    @Benchmark
    public String splitAndIterate(CState state) {
        final String[] splitStrArr = state.url.split("/");

        String result = "";
        for (int k=splitStrArr.length - 3; k<splitStrArr.length; k++) {
            result += splitStrArr[k];
        }

        return result;
    };

    @Benchmark
    public String splitAndSum(CState state) {
        String[] split = state.url.split("/");
        int n = split.length;
        return split[n - 3] + split[n - 2] + split[n - 1];
    };

    @Benchmark
    public String regexp(CState state) {
        return state.url.replaceAll(".+/(.+)/(.+)/(.+)", "$1$2$3");
    };
}

And the output was:

Benchmark             Mode  Cnt    Score    Error  Units
CJMH.fastest          avgt    5   46.731 ±  0.445  ns/op
CJMH.regexp           avgt    5  937.797 ± 11.928  ns/op
CJMH.splitAndIterate  avgt    5  194.626 ±  1.880  ns/op
CJMH.splitAndStreams  avgt    5  275.640 ±  1.887  ns/op
CJMH.splitAndSum      avgt    5  180.257 ±  2.986  ns/op

So surprisingly streams are in no way much slower than iterating over the array. The fastest one is a no-copy algorithm provided by @Holger in this answer. And do not use regexps if you could avoid it!

Alex
  • 644
  • 4
  • 10
  • Note that using `Collectors.joining()` may be faster than `Collectors.joining("")`. Also, the `splitAndSum` performance may depend on the platform; compiling for Java 9 and newer will use the improved string concatenation, hence, may perform even better. – Holger Nov 15 '18 at 14:54
  • @Holger yes, `Collectors.joining()` gives about 250 ns/op for `splitAndStreams` (10% faster) on Java 8, and using Java 11 reduces `split*` algorithms times for another 10-15%. Strangely, regexp algorithm performs even worse on Java 11 giving about 1000ns/op. – Alex Nov 16 '18 at 08:40
0

I'd probably simplify your code a bit to:

StringBuilder sb = new StringBuilder();

    char c;
    for (int i = testUrl.length() - 1, count = 0; count < 3 ; --i) {
        if ((c = testUrl.charAt( i )) ==  '/') {
            ++count;
            continue;
        }
        sb.append( c );
    }
    String foo = sb.reverse().toString();

In my opinion there is no real point in using stream here - the url isn't long enough to justify effor spent setting up stream. Also we can use StringBuilder - which will be used during joining anyways.

Worthless
  • 531
  • 3
  • 7