5

This post says that a += b is the equivalent of

a = new StringBuilder()
    .append(a)
    .append(b)
    .toString();

Let's say I have this code:

public class MultiThreadingClass extends SomeThirdPartyClassThatExtendsObject{
    public void beginmt(String st) throws IOException {
        //st is a thread number
        st = new File("c:\\somepath").getCanonicalPath()+"\\"+st;
        System.out.println(st);
    }
}

Assume that beginmt runs multiple times simultaneously (with thread numbers 1 to 15500) on a single instance of MultiThreading class. Could there be instances such that it could print the following i.e. some thread numbers are lost and some numbers are doubled?

c:\somepath\2
c:\somepath\1
c:\somepath\1
c:\somepath\4
c:\somepath\5
c:\somepath\6
c:\somepath\7
c:\somepath\8
c:\somepath\8
c:\somepath\10
...

Edit:

Will it be safe to say that the + operator won't get into some unsafe publication issue? I'm thinking the StringBuilder could be optimized into something that resembles an instance variable in which case it could be unsafely published.

Edit 2:

As far as the JLS, the abovementioned post, and a similar class file for the above code are checked, the StringBuilders to be used seem to have to get contained within different stackframes. However, I'd still like to check whether some form of aggressive optimization could cause the StringBuilders to be replaced by a centralized StringBuilder in some way. This sounds possible as it sounds logical for optimizers to optimize when it predicts that an object is just implemented in a non-constant way when in fact such object could be constant.

Found stringopts.cpp but haven't found the time yet to fully check it. I'm hopefully looking for answers involving details of this source file.

Edit 3:

I'm still looking for answers that include code on aggressive inlining for mutable objects.

Community
  • 1
  • 1
damat-perdigannat
  • 5,780
  • 1
  • 17
  • 33
  • 3
    Your code isn't complete :-) – Smutje Aug 22 '14 at 08:07
  • 1
    `beginmt` is not printing anything as such. – Mena Aug 22 '14 at 08:10
  • Sorry guys, please see my edit :D – damat-perdigannat Aug 22 '14 at 08:13
  • Your code is incomplete and doesn't seem to use threads,besides the primary thing being 'st' seems to be a local var – humblerookie Aug 22 '14 at 08:13
  • 1
    this is still incomplete (and uncompilable actually) ... we would need to see the `Thread` code passing in `st` to know where the duplicates are coming from ... – kares Aug 22 '14 at 08:14
  • 3
    "*I'm thinking the StringBuilder could be optimized into an instance variable in which case it could be unsafely published.*" => that would change the semantics and the JVM is not allowed to do that kind of optimisation. – assylias Aug 22 '14 at 08:32
  • @assylias not even something that could resemble it? – damat-perdigannat Aug 22 '14 at 08:37
  • 1
    The JLS says about [local variables](http://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.12.3) "*Whenever the flow of control enters a block (§14.2) or for statement (§14.14), a new variable is created for each local variable declared in a local variable declaration statement immediately contained within that block or for statement.*" That would be violated by such state sharing. – kiheru Aug 28 '14 at 10:36
  • Possible duplicate of http://stackoverflow.com/questions/12825847/why-are-local-variables-thread-safe-in-java – Raedwald Sep 02 '14 at 12:08

6 Answers6

11

No, there is no state that's being shared between different threads so the situation you described can not happen.

If instead st was a member variable of that class, instead of being passed as an argument, and was incremented - that's a different story.

How it works now is that st will be put on the execution stack, each thread has it's own execution stack and they don't share stuff from there. Therefore each thread has it's own value of st. When it's a member variable of a class it's in memory (single value) and all threads would try to use it (the same one).

@Edit: well I guess it is possible if you call the method several times with the same value :-))

Mateusz Dymczyk
  • 14,969
  • 10
  • 59
  • 94
  • Let's say I didn't call the method several times with the same value. Please see another edit :D – damat-perdigannat Aug 22 '14 at 08:25
  • 1
    Not sure I understand your edit... The method is thread safe, whatever the value of st is... – assylias Aug 22 '14 at 08:30
  • @b16db0 - Strings are by default *immutable* in java. So, like *assylias* says, your method is thread-safe. – TheLostMind Aug 22 '14 at 08:37
  • @assylias yes but the above output is possible, you just need to pass the same value for st several times ;-)) – Mateusz Dymczyk Aug 22 '14 at 08:39
  • @TheLostMind String was used as a local variable so it shouldn't be the problem. However, I'm thinking the StringBuilder used for the plus operator could be optimized into something similar to an instance variable. StringBuilder is mutable. – damat-perdigannat Aug 22 '14 at 08:43
  • 1
    @b16db0: Do you have *any* evidence for the supposition that the JIT would mysteriously introduce an instance variable for no good reason? Why would it do that? If you're going to guess that the JIT will break this kind of code arbitrarily, do you not have confidence in *any* code? – Jon Skeet Aug 30 '14 at 16:10
  • 1
    @b16db0 as you pointed out yourself such an "optimization" might introduce problems, therefore it's not really logical for the JIT to do it. – Mateusz Dymczyk Aug 30 '14 at 16:25
  • @JonSkeet I don't know if the evidence has already been in front of me when I found the following comment in stringopts.hpp: "Examine the use of the SB alloc to see if it can be replace with a single string construction." I'm just thinking that since a single instance of MultithreadingClass is shared by multiple threads, it could be logical to predict that the first two parts of the concatenation (the "new File" stuff and "\\") could be predicted as constant: the third part being a "casualty". It could be that I'm just not familiar with stringopts and I want to know it more. – damat-perdigannat Sep 01 '14 at 02:13
  • 1
    @b16db0 but how does the compiler/JIT know that ```new File("c:\\somepath").getCanonicalPath()+"\\"``` will always return the same result? Even if you'd observe it's behavior over time it's not really safe to assume that if it returned X 9 times it will also return X next time you run it unless you somehow annotate the methods. Concatenating two string literals is a different story but since we are talking about local variables I still don't think anyone would dare to make such an optimization. Should you do exactly the same concat within one method - sure, otherwise let it be. – Mateusz Dymczyk Sep 01 '14 at 02:21
  • @MateuszDymczyk this webpage http://www.artima.com/designtechniques/hotspotP.html could be old but it's about HotSpot and it says that one aspect in optimization is to "make methods final or static wherever possible". I'm just thinking if the server compiler is somewhat aggressive then why not try and predict which variables are actually constant. Actually, I'm still trying to inspect some codes but from what I understand if the object to append is a string, int, or char it's a possible candidate for optimization. If you could help me understand the stringopt code I'd be grateful. :) – damat-perdigannat Sep 01 '14 at 02:41
  • 1
    @b16db0 I'm not saying that such optimization is bad all the time, I'm just pointing out that you can do it only in a few cases, you cannot perform it if a variable is involved (such as in this case since both ```st``` and ```getCanonicalPath()``` might change with each invocation. Should you write ```"/usr/bin/"+"test"``` and use it several times in the same method then sure go ahead and optimize it. Otherwise there's no reason for the compiler to touch your code as it simply cannot know if the value will be same every time. – Mateusz Dymczyk Sep 01 '14 at 02:47
  • @MateuszDymczyk but then `getCanonicalPath()` could be optimized since we are running on a single machine right? As for `st`, it's really what I'm talking about: Since there will be a lot of calls to StringBuilder and its append method at about the same location, `append` could "possibly" be optimized with something like scalar replacement. I think the answers should then boil down to stuff about whether such inlinings on mutable objects especially StringBuilder could happen, and whether concatenation is optimized per thread. After all, sometimes these stuff can be overlooked. – damat-perdigannat Sep 01 '14 at 03:30
  • 1
    @b16db0 yes but if you're the compiler how do you know that ```getCanonicalPath()``` will return the same thing with each invocation? – Mateusz Dymczyk Sep 01 '14 at 03:33
  • @MateuszDymczyk one possible way I could think of would be to notice that the argument is from the constant pool – damat-perdigannat Sep 01 '14 at 03:35
  • 1
    @b16db0 in the constructor yes but how do you know that the ```getCanonicalPath()``` or even the constructor don't use some randomized/volatile values to initialize the state/produce the result? – Mateusz Dymczyk Sep 01 '14 at 05:19
  • @MateuszDymczyk at least for Windows, it doesn't seem like so. Although how the compiler considers which variables are dynamic at runtime is one part of HotSpot I'd like to know also. :D – damat-perdigannat Sep 01 '14 at 05:36
  • 1
    @b16db0 on Windows it might not, no reason it should use any kind of randomization. But the only way for the JIT or the compiler to know that would be to recursively check every single method/variable that's being used by ```getCanonicalPath()```. – Mateusz Dymczyk Sep 01 '14 at 06:17
  • @MateuszDymczyk yeah it could do that. :) – damat-perdigannat Sep 01 '14 at 07:53
  • 1
    @b16db0 yes but first that might be pretty expensive to do and second even with such info it doesn't have the necessary context to make such an optimization as it can introduce synchronization (or other) problems, as you pointed out in your question. You can keep looking for that code but if you do that then following such logic you'd need to check every piece of every technology you use just to make sure it's 100% bug/design flaw free (and you won't such a technology :-)). – Mateusz Dymczyk Sep 01 '14 at 08:09
  • @MateuszDymczyk That's a lot. I'm hoping to look only at a few code including that for inlining, especially on String concatenation. I think I'd be more or less fine with this question once I know that the jvm is fine. – damat-perdigannat Sep 01 '14 at 08:19
5

Could there be instances such that it could print the following i.e. some thread numbers are lost and some numbers are doubled?

st is a method local variable, also st doesn't escape the method's scope so it is thread-safe. So, multithreading will have no effect on st . The messages can be printed out of order depending on which thread runs the method at what time.

TheLostMind
  • 35,966
  • 12
  • 68
  • 104
2

As you don't share any field between your threads, the order of printing can differ, but any problem concerning thread safety (race conditions) shouldn't appear.

Smutje
  • 17,733
  • 4
  • 24
  • 41
1

The Java Language Specification states

The result of string concatenation is a reference to a String object that is the concatenation of the two operand strings. The characters of the left-hand operand precede the characters of the right-hand operand in the newly created string.

So, although a compiler is free to optimize how the concatenation happens, it must do so while following that rule, "a" + "b" becomes "ab". In an unthread-safe, shared StringBuilder, implementation that would potentially not be the case. That implementation would therefore not be correct and could not be considered Java.

Sotirios Delimanolis
  • 274,122
  • 60
  • 696
  • 724
1

Each thread will always have individual StringBuilder instances. Thread-safety is no issue when threads don't share instances.

So, the following simple method ...

public class MyThreadSafeClass
{
  public String myMethod(String field1, String field2, String field3)
  {
     return field1 + field2 + field3;
  }
}

... will be compiled to use a local StringBuilder.

public class MyThreadSafeClass
{
  public String myMethod(String field1, String field2, String field3)
  {
     return new StringBuilder(field1).append(field2).append(field3).toString();
  }
}

Each time the method is entered, a new StringBuilder instance is created. This instance is only used withing the scope of this thread.

You are correct however that StringBuilders are not always thread-safe. (see below) If multiple threads start calling the saveEvent method, they may be using the builder simultaneously.

public class History
{
  // thread-safety issues !!!! 
  // In fact, here you should use a StringBuffer or some locking.
  private StringBuilder historyBuilder = new StringBuilder();

  public void saveEvent(String event)
  {
     historyBuilder.append(event).append('\n');
  }

  public String getHistoryString()
  {
     return historyBuilder.toString();
  }
}

But compiler optimizations will not create these kind of constructions. The StringBuilder is always created and used only within one and the same thread.

We could try to make things more complex (static fields, multiple classloaders, ...) but always again, each StringBuilder instance is created and used by only 1 thread.

EDIT:

Perhaps useful to know: This optimization happens during the generation of the byte-code. There are other optimizations later on during JIT compilation, but this optimization is not one of them. However the JIT compiler does have an important impact in the final performance.

bvdb
  • 22,839
  • 10
  • 110
  • 123
1

I have to partially disagree. This sentence is incomplete / missleading:

"If instead st was a member variable of that class, instead of being passed as an argument, and was incremented - that's a different story."

What matters in the original example is that the expression on the right is a rvalue. If it were not, the outcome would have been different. I will explain a bit.

So yes, Strings are immutable, and beginmt() receives a final reference to a String and this means a final reference to an immutable heap memory area. JVM will make a copy of this final reference and whatever you do inside the beginmt(), it is done on this copy, and this copy, immediately after the string is modified (st = ...), will point to another memory area. Now the point is that this final heap memory area has no pointer to it, because it is created inside the method and it seems that nothing points to it. Well, almost! The JVM may intern the string and if another thread points to the same value as the interned value it could be possible that they would actually share the same heap address. Now having a race condition exactly here is very hard to detect, so I will make a synthetic example to illustrate what could happen in case the expression on the right is a lvalue (induced by JVM's String intern-al):

public class AnExample { private static final int N = 20000;

private static class Foo {
    static HashMap<String, Foo> foos = new 
            HashMap<String, Foo>(N);

    static synchronized Foo createInstance(String i) {
        if (foos.containsKey(i))
            return foos.get(i);
        foos.put(i, new Foo(i));
        return foos.get(i);
    }

    String i;

    private Foo(String i) {
        this.i = i;
    }

    Foo inc() {
        synchronized(Foo.class){ 
            i += "1";
            return createInstance(i);
        }
    }

    @Override public String toString() {
        return i;
    }
}

private static class Bar {
    public void bar(Foo st) {
        st = st.inc();
        System.out.println(st);
    }
}

public static void main(String... args) {
    final Bar cucu = new Bar();
    for (int i = 0; i < N; i++) {
        final Foo st = Foo.createInstance(i + "");
        new Thread(new Runnable(){
            @Override public void run() {
                cucu.bar(st);
            }
        }).start();
    }

    try {
        Thread.sleep(10000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

}

This will produce about 40% duplicates (I have less than 10100 unique values). Note that in the bar() method st = some_expression(st) In my example I intentionally produce a lvalue just to show what might happen in case the JVM will intern the expression and it happens to have a reference to that (which is sent back in another thread to the same method).

The conclusion is that your code is not correct because "st is not a member variable of that class" and "st becomes a local, copied reference", etc - but because the expression on the right is a rvalue.

Paul Ianas
  • 501
  • 3
  • 4