
I'd like to extract the actual address of Java objects for research purposes. To be clear, I want the 48-bit virtual address of the object, not an ID, hash code, or other unique identifier, and I understand that the GC moves objects around, so these addresses can change. I've been reading other posts on Stack Overflow, like here or here.

For the following I use the method from @Peter Lawrey's answer to "Is there a way to get a reference address?", which uses the Unsafe class with the arrayBaseOffset method. What I find strange is that these methods give the same result on every run (at least on my computer), which seems very unlikely: memory allocation is supposed to be randomized for security reasons.

Moreover, I tried to verify those methods with Pintools, the instrumentation tool from Intel, which I used to extract memory traces of the run. My problem is that I am not able to correlate what I see in the Pintools memory trace with the addresses returned by the methods above. The returned addresses are never accessed in my memory trace.

So I am wondering what these methods actually return and how their results have been verified against other tools.

Some info: my OS is Ubuntu x86_64, my JVM is 64-bit OpenJDK 1.8.0_131, and the Pintools version is v3.2.

=================== Big Edit: I realize that my question was not well put, so let me give a more atomic example. Here is the Java code that I am trying to analyze:

`import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class HelloWorld {

    public static void main(String[] args) throws Exception {
        Unsafe unsafe = getUnsafeInstance();
        Integer i = new Integer(42);
        long addr_fromArray;
        long addr_fromObject;

        // Method 1: read the reference slot of a one-element Object[] as a 64-bit value
        Object[] objects = {i};
        long baseOffset = unsafe.arrayBaseOffset(Object[].class);
        addr_fromArray = unsafe.getLong(objects, baseOffset);

        // Method 1 variant: read the same slot as an unsigned 32-bit value and scale it by 8
        long factor1 = 8;
        long addr_withFactor = (unsafe.getInt(objects, baseOffset) & 0xFFFFFFFFL) * factor1;

        // Method 2: read the reference field of a holder object as a 64-bit value
        class Pointer {
            Object pointer;
        }

        Pointer pointer = new Pointer();
        pointer.pointer = i;
        long offset = unsafe.objectFieldOffset(Pointer.class.getDeclaredField("pointer"));
        addr_fromObject = unsafe.getLong(pointer, offset);

        System.out.println("Addr of i from Array : 0x" + Long.toHexString(addr_fromArray));
        System.out.println("Addr of i from Object : 0x" + Long.toHexString(addr_fromObject));
        System.out.println("Addr of i from factor1 : 0x" + Long.toHexString(addr_withFactor));

        System.out.println("!=1"); // Launch the pintools instrumentation
        for (int a = 0; a < 123; a++) {
            i = 10;
        }
        System.out.println("!=1"); // Stop the pintools instrumentation
    }

    // Obtain the Unsafe singleton through reflection on its private static field
    private static Unsafe getUnsafeInstance() throws SecurityException,
            NoSuchFieldException, IllegalArgumentException,
            IllegalAccessException {
        Field theUnsafeInstance = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafeInstance.setAccessible(true);
        return (Unsafe) theUnsafeInstance.get(Unsafe.class);
    }
}`

I get the pointer to the Integer i using the different methods that I have seen on Stack Overflow. Then I loop on i an arbitrary number of times so that I can recognize it in my memory trace. (Note: I checked that no GC call occurs within this code.)

When Pintools sees the specific "!=1" written to standard output, it starts or stops the instrumentation.

On every access during the instrumentation phase, I execute this code:

VOID RecordAccess(VOID* ip, int id_thread, VOID* addr, int id)
{
    PIN_GetLock(&lock, id_thread);
    if (startInstru)
    {
        // Append the access to the trace and update the per-address histogram
        log1 << "Data accessed: " << addr << "\tThread:" << id_thread << endl;
        nb_access++;
        uint64_t dummy = reinterpret_cast<uint64_t>(addr);
        if (accessPerAddr.count(dummy) == 0)
            accessPerAddr.insert(pair<uint64_t,uint64_t>(dummy, 0));
        accessPerAddr[dummy]++;
    }
    PIN_ReleaseLock(&lock); // release the lock taken above so other threads can proceed
}

With this pintool, I generate a memory trace plus a histogram of how many times each memory address is accessed. Note: the pintool is launched with the "follow_execv" option in order to instrument every thread.

I see 2 problems:

1) I see no accesses to any of the printed addresses of i (or anywhere close to them). I tend to trust Pintools because I have used it quite a lot before, but maybe Pintools is not able to retrieve the correct addresses here.

2) I see no address being accessed 123 times (or close to that). My guess is that the JVM optimizes here: it sees that the executed code has no effect, so it does not execute it. However, I also tried a more complex instruction inside the loop than just a store to i (one that cannot be optimized away, like storing a random number), without better results.

I don't care much about the GC effect here; maybe in a second step. I only want to be able to extract native addresses from my Java app, the same ones that I am pretty sure Pintools is giving me.

  • You might want to include your code – Mark Rotteveel Aug 04 '17 at 11:33
  • 'memory allocation is supposed to be randomized for security reasons.' Do you have any reference that this actually happens in Java? This statement makes some sense in C, where a buffer overflow could result in code execution. – Bartosz Bilicki Aug 04 '17 at 11:37
  • Added some code if this helps ! – Grégory Vaumourin Aug 04 '17 at 11:46
  • @BartoszBilicki : You're right, I just assumed this because it is what I see in C/C++ programs. So the deterministic addresses can be explained. It doesn't explain why I don't see memory accesses to this memory location, though – Grégory Vaumourin Aug 04 '17 at 11:51
  • Are you asking about 0x76d1602b0->0x6c7204920->0x6c7204920 being the same on different runs on your machine, or about +-0x18 offsets? – Oleg Estekhin Aug 07 '17 at 08:31
  • One more thought - run the program with different GCs; maybe they will use a different virtual memory layout and will have different starting (virtual) addresses for allocation. Inspired by https://stackoverflow.com/questions/9208421/why-virtual-memory-address-is-the-same-in-different-process – Oleg Estekhin Aug 07 '17 at 08:35
  • I understand that the GC moves objects around and that is why addresses can change after GC calls (depending on the specific implementation of the GC). My question is why I don't see any accesses to those memory regions when generating a memory trace with Pintools. – Grégory Vaumourin Aug 07 '17 at 08:40
  • Pintools instruments the JVM, right? The JVM either interprets the bytecode and/or compiles it dynamically (JIT). That should cause pretty nasty multi-threaded work with sp, pc, ... I wonder if you somehow hit a Pintools limit here, or are missing some switches, so that it lies to you. – Serge Aug 09 '17 at 02:21

1 Answer


So when I am instrumenting this run with pintools, with a similar script as here, I don't see any access performed on the mentioned addresses or nearby addresses

I think you should give more information about how you run it and what you see.

To explore object layouts, you could use JOL: http://openjdk.java.net/projects/code-tools/jol.

import org.openjdk.jol.info.GraphLayout;

import java.io.PrintWriter;
import java.util.Arrays;
import java.util.Collections;
import java.util.SortedSet;

public class OrderOfObjectsAfterGCMain2 {
    public static void main(String... args) {
        // Three arrays of boxed doubles, filled in ascending, descending and shuffled order
        Double[] ascending = new Double[16];
        for (int i = 0; i < ascending.length; i++)
            ascending[i] = (double) i;

        Double[] descending = new Double[16];
        for (int i = descending.length - 1; i >= 0; i--)
            descending[i] = (double) i;

        Double[] shuffled = new Double[16];
        for (int i = 0; i < shuffled.length; i++)
            shuffled[i] = (double) i;
        Collections.shuffle(Arrays.asList(shuffled));

        System.out.println("Before GC");
        printAddresses("ascending", ascending);
        printAddresses("descending", descending);
        printAddresses("shuffled", shuffled);

        System.gc();
        System.out.println("\nAfter GC");
        printAddresses("ascending", ascending);
        printAddresses("descending", descending);
        printAddresses("shuffled", shuffled);

        System.gc();
        System.out.println("\nAfter GC 2");
        printAddresses("ascending", ascending);
        printAddresses("descending", descending);
        printAddresses("shuffled", shuffled);
    }

    // Prints the lowest address in the object graph, then the hex delta to each following address
    public static void printAddresses(String label, Double[] array) {
        PrintWriter pw = new PrintWriter(System.out, true);
        pw.print(label + ": ");
        // GraphLayout.parseInstance((Object) array).toPrintable() has more info
        SortedSet<Long> addresses = GraphLayout.parseInstance((Object) array).addresses();
        Long first = addresses.first(), previous = first;
        // No separator is printed here, so the first delta runs on from the first address
        pw.print(Long.toHexString(first));
        for (Long address : addresses) {
            if (address > first) {
                pw.print(Long.toHexString(address - previous) + ", ");
                previous = address;
            }
        }
        pw.println();
    }
}

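With this tool I get approximately the same results. Note that printAddresses prints no separator after the first address, so the first number on each line appears to be the 9-digit base address immediately followed by the first delta (for example, 76d430c78 followed by 50):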

Before GC
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with escalated privileges: null
ascending: 76d430c7850, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
descending: 76d430e4850, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
shuffled: 76d43101850, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 

After GC
ascending: 6c782859856d88, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
descending: 6c78285e856eb8, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
shuffled: 6c782863856fe8, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 

After GC 2
ascending: 6c7828570548a8, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
descending: 6c78285c0549d8, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
shuffled: 6c782861054b08, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 

Process finished with exit code 0
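The deltas in this output are hexadecimal, so 18 means 24 bytes between consecutive Double objects. As a rough sanity check, the arithmetic below reproduces that figure, plus the 50 (80 bytes) that seems to trail the first address on the "Before GC" lines. This is only a sketch: the class name LayoutArithmetic and its constants are illustrative, and the sizes assume a 64-bit HotSpot JVM with compressed references and class pointers and 8-byte object alignment, which is an assumption about the JVM used here rather than something the output states.

```java
// Hypothetical helper for checking the deltas by hand; not part of JOL or the JDK.
final class LayoutArithmetic {

    // ASSUMED sizes: 64-bit HotSpot, compressed oops and class pointers enabled.
    static final int MARK_WORD = 8;      // object header mark word
    static final int CLASS_POINTER = 4;  // compressed class pointer
    static final int ARRAY_LENGTH = 4;   // length field in an array header
    static final int REFERENCE = 4;      // compressed object reference
    static final int ALIGNMENT = 8;      // object alignment in bytes

    /** Rounds size up to the next multiple of alignment. */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    /** Size of a java.lang.Double: header, then one 8-byte aligned double field. */
    static long boxedDoubleSize() {
        long header = MARK_WORD + CLASS_POINTER;      // 12 bytes
        long valueOffset = alignUp(header, 8);        // field starts at offset 16
        return alignUp(valueOffset + 8, ALIGNMENT);   // 24 bytes in total
    }

    /** Size of an array of references with the given length. */
    static long referenceArraySize(int length) {
        long header = MARK_WORD + CLASS_POINTER + ARRAY_LENGTH; // 16 bytes
        return alignUp(header + (long) length * REFERENCE, ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println("Double     : 0x" + Long.toHexString(boxedDoubleSize()));
        System.out.println("Double[16] : 0x" + Long.toHexString(referenceArraySize(16)));
    }
}
```

Under those assumptions a Double occupies 0x18 bytes and a 16-element Double[] occupies 0x50 bytes, which would mean the array is followed directly by its 16 elements, packed back to back.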

With this example http://hg.openjdk.java.net/code-tools/jol/file/018c0e12f70f/jol-samples/src/main/java/org/openjdk/jol/samples/JOLSample_21_Arrays.java you could test the effects of GC on arrays.
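If adding JOL as a dependency is not an option, the Unsafe approach from the question can be made a little more careful by checking how wide a reference slot actually is before reading it. When references are compressed the slot is only 4 bytes, so an 8-byte getLong also picks up neighbouring bytes. The sketch below is not JOL's implementation: the class AddressSketch and the method addressOf are illustrative names, and the decoding step assumes a 64-bit HotSpot JVM using zero-based compressed oops with a shift of 3, which depends on heap size and JVM flags.

```java
import sun.misc.Unsafe;
import java.lang.reflect.Field;

// Hypothetical helper; the names are illustrative and not part of JOL or the JDK.
final class AddressSketch {

    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Best-effort address of obj at the time of the call; a GC may move the object afterwards. */
    static long addressOf(Object obj) {
        Object[] holder = {obj};
        long base = UNSAFE.arrayBaseOffset(Object[].class);
        int refSize = UNSAFE.arrayIndexScale(Object[].class);
        if (refSize == 4) {
            // Compressed reference. ASSUMPTION: zero heap base and a shift of 3 (8-byte alignment).
            long narrow = UNSAFE.getInt(holder, base) & 0xFFFFFFFFL;
            return narrow << 3;
        }
        // Uncompressed reference: the slot holds the full 64-bit address.
        return UNSAFE.getLong(holder, base);
    }

    public static void main(String[] args) {
        Integer i = Integer.valueOf(42);
        System.out.println("reference size: " + UNSAFE.arrayIndexScale(Object[].class) + " bytes");
        System.out.println("address of i  : 0x" + Long.toHexString(addressOf(i)));
    }
}
```

Comparing the printed value with what JOL reports for the same object is one way to check whether the zero-base, shift-3 assumption holds on a given JVM.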

UPD

You have provided more information, so I have tried to help further. The first thing that caught my eye:

for(int a= 0 ; a < 123 ;a++){   
    i = 10;
}

Java is smart enough to eliminate this loop, since the result is always the same as the single statement "i = 10;". For example,

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Benchmark)
public class TestLoop {

    static final int _123 = 123;
    int TEN = 10;

    @Benchmark
    @OperationsPerInvocation(_123)
    public void oneAssigment() {
        Integer i = 1;
        i = 10;
    }

    @Benchmark
    @OperationsPerInvocation(_123)
    public Integer oneAssigmentAndReturn() {
        Integer i = 1;
        i = TEN;
        return i;
    }

    @Benchmark
    @OperationsPerInvocation(_123)
    public void doWrong() {
        Integer i = 1;
        for (int a = 0; a < _123; a++) {
            i = 10;
        }
    }

    @Benchmark
    @OperationsPerInvocation(_123)
    public void doWrongWithLocalVariable() {
        Integer i = -1;
        for (int a = 0; a < _123; a++) {
            i = TEN;
        }
    }

    @Benchmark
    @OperationsPerInvocation(_123)
    public Integer doWrongWithResultButOneAssignment() {
        Integer i = -1;
        for (int a = 0; a < _123; a++) {
            i = TEN;
        }
        return i;
    }

    @Benchmark
    @OperationsPerInvocation(_123)
    public void doWrongWithConstant(Blackhole blackhole) {
        for (int a = 0; a < _123; a++) {
            blackhole.consume(10);
        }
    }

    @Benchmark
    @OperationsPerInvocation(_123)
    public void doRight(Blackhole blackhole) {
        for (int a = 0; a < _123; a++) {
            blackhole.consume(TEN);
        }
    }

    public static void main(String[] args) throws Exception {
        new Runner(
                new OptionsBuilder()
                        .include(TestLoop.class.getSimpleName())
                        .warmupIterations(10)
                        .measurementIterations(5)
                        .build()
        ).run();
    }


}

This produces:

Benchmark                                    Mode  Cnt             Score            Error  Units
TestLoop.doRight                            thrpt   50     352484417,380 ±    7015412,429  ops/s
TestLoop.doWrong                            thrpt   50  358755522786,236 ± 5981089062,678  ops/s
TestLoop.doWrongWithConstant                thrpt   50     345064502,382 ±    6416086,124  ops/s
TestLoop.doWrongWithLocalVariable           thrpt   50  179358318061,773 ± 1275564518,588  ops/s
TestLoop.doWrongWithResultButOneAssignment  thrpt   50   28834168374,113 ±  458790505,730  ops/s
TestLoop.oneAssigment                       thrpt   50  352690179375,361 ± 6597380579,764  ops/s
TestLoop.oneAssigmentAndReturn              thrpt   50   25893961080,851 ±  853274666,167  ops/s

As you can see, your loop (doWrong) reaches the same throughput as a single assignment (oneAssigment), so the loop is effectively optimized away.
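Separately from any JIT optimization, it may help to look at what "i = 10;" does to an Integer variable. With autoboxing it compiles to Integer.valueOf(10) followed by a store into the local variable i, so the variable is rebound to another object and nothing is written into the object it referred to before. That alone would be a reason not to expect stores to the printed address from this loop. The sketch below illustrates this and, for contrast, shows a loop that does store into a heap object on every iteration. The names RebindSketch and Cell are illustrative, and the expectation that the volatile stores are not dropped is an assumption about HotSpot's usual behaviour, not a guarantee.

```java
// Hypothetical demo class; the names are illustrative only.
final class RebindSketch {

    // Holder with a volatile field, used only to have a heap location to store into.
    static final class Cell {
        volatile int value;
    }

    /** Runs the loop from the question and checks that the original object was left alone. */
    static boolean loopOnlyRebinds() {
        Integer original = Integer.valueOf(42);
        Integer i = original;
        for (int a = 0; a < 123; a++) {
            i = 10; // Integer.valueOf(10), then a store to the local variable i
        }
        return original.intValue() == 42      // the first object still holds 42
                && i != original              // i now refers to a different object
                && i == Integer.valueOf(10);  // namely the cached Integer for 10
    }

    /** Stores into the same heap object on every iteration and returns the last value. */
    static int storeIntoCell(Cell cell, int iterations) {
        for (int a = 0; a < iterations; a++) {
            cell.value = a; // a write to the field of the object that cell refers to
        }
        return cell.value;
    }

    public static void main(String[] args) {
        System.out.println("loop only rebinds i: " + loopOnlyRebinds());
        System.out.println("last value in cell : " + storeIntoCell(new Cell(), 123));
    }
}
```

If the goal is to spot a known object in a memory trace, a pattern like storeIntoCell, combined with the address of the Cell instance, is more likely to produce a recognizable run of writes than reassigning a boxed local.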

egorlitvinenko
  • Sorry, with your answer I realized I wasn't perfectly clear, so I edited my post. Thanks for your answer. It is good to see that these methods have been confirmed against other tools. Maybe Pintools just isn't working properly with Java apps and is not able to retrieve the correct addresses – Grégory Vaumourin Aug 10 '17 at 10:12
  • @GrégoryVaumourin I walked through your updated post and added information. – egorlitvinenko Aug 22 '17 at 14:30