2

A long-standing problem in Perl is how to identify a location with finer granularity than a line number. (Follow the link for more information.) This question is about how to get that.

The most promising way to do this is to use the Perl opcode address that is under consideration and deparse the statements around that. And at the level of the a subroutine, B::Deparse will recreate Perl given a code reference. So ideal would be to modify B::Deparse to allow you to give a supplied op to start deparsing. Failing that, it could instead deparse the enclosing subroutine, displaying op-code addresses for each statement encountered. See the code below for an example of this.

B::Concise can show a op-code disassembly for a subroutine. In its disassembly output, it gives addresses, and those addresses it gives match those returned, say, by Devel::Callsite.

The problem is that after instrumenting B::Deparse as done below, the OP addresses it gives does not match those given by B::Concise or Devel::Callsite. Output given below shows this.

I can normalize addresses so that they refer to relative offsets rather than absolute addresses. However this is a lot of work, is gross, and I'm not even totally sure this will work, since Deparse may change code by "pessimizing" or, I guess, undoing optimization.

For concreteness, below is some code that shows the mismatch. Note that none of the addresses given by deparse is shown in the disassembly.

use B::Deparse;
use B::Concise qw(set_style);
sub foo() {
    my $x=1; $x+=1;
}

my $deparse = B::Deparse->new("-p", "-l", "-sC");

$body = $deparse->coderef2text(\&foo);
print($body, "\n");
my $walker = B::Concise::compile('-basic', 'foo', \&foo);
B::Concise::set_style_standard('debug');
B::Concise::walk_output(\my $buf);
$walker->();            # walks and renders into $buf;
print($buf);

package B::Deparse;

# Modified to show OP addresses
sub lineseq {
    my($self, $root, $cx, @ops) = @_;
    my($expr, @exprs);

    my $out_cop = $self->{'curcop'};
    my $out_seq = defined($out_cop) ? $out_cop->cop_seq : undef;
    my $limit_seq;
    if (defined $root) {
    $limit_seq = $out_seq;
    my $nseq;
    $nseq = $self->find_scope_st($root->sibling) if ${$root->sibling};
    $limit_seq = $nseq if !defined($limit_seq)
               or defined($nseq) && $nseq < $limit_seq;
    }
    $limit_seq = $self->{'limit_seq'}
    if defined($self->{'limit_seq'})
    && (!defined($limit_seq) || $self->{'limit_seq'} < $limit_seq);
    local $self->{'limit_seq'} = $limit_seq;

    my $fn = sub {
        my ($text, $i) = @_;
        my $op = $ops[$i];
        push @exprs, sprintf("# op: 0x%x\n%s ", $op, $text);
    };
    $self->walk_lineseq($root, \@ops, $fn);
    # $self->walk_lineseq($root, \@ops,
    #              sub { push @exprs, $_[0]} );

    my $sep = $cx ? '; ' : ";\n";
    my $body = join($sep, grep {length} @exprs);
    my $subs = "";
    if (defined $root && defined $limit_seq && !$self->{'in_format'}) {
    $subs = join "\n", $self->seq_subs($limit_seq);
    }
    return join($sep, grep {length} $body, $subs);
}

The output I get from running this is:

() {
    # op: 0x14a4b30
#line 4 "deparse-so.pl"
    (my $x = 1) ;
    # op: 0x14a4aa0
#line 4 "deparse-so.pl"
    ($x += 1) ;
}
main::foo:
UNOP (0xeb9978)
    op_next     0
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LEAVESUB]
    op_type     175
    op_flags    4
    op_private  65  
    op_first    0xeab7a0
LISTOP (0xeab7a0)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LINESEQ]
    op_type     181
    op_flags    12
    op_private  0   
    op_first    0xeab7e8
    op_last     0xeb9a20
COP (0xeab7e8)
    op_next     0xeab890
    op_sibling  0xeab848
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeab848)
    op_next     0xeb99c0
    op_sibling  0xeb99c0
    op_ppaddr   PL_ppaddr[OP_SASSIGN]
    op_type     37
    op_flags    69
    op_private  2   
    op_first    0xeab890
    op_last     0xeab8d0
SVOP (0xeab890)
    op_next     0xeab8d0
    op_sibling  0xeab8d0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c40
OP (0xeab8d0)
    op_next     0xeab848
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    178
    op_private  128 
COP (0xeb99c0)
    op_next     0xeab768
    op_sibling  0xeb9a20
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeb9a20)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_ADD]
    op_type     63
    op_flags    70
    op_private  2   
    op_first    0xeab768
    op_last     0xeb9a68
OP (0xeab768)
    op_next     0xeb9a68
    op_sibling  0xeb9a68
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    50
    op_private  0   
SVOP (0xeb9a68)
    op_next     0xeb9a20
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c10
B::Concise::compile(CODE(0xea3c70))
UNOP (0xeb9978)
    op_next     0
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LEAVESUB]
    op_type     175
    op_flags    4
    op_private  65  
    op_first    0xeab7a0
LISTOP (0xeab7a0)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LINESEQ]
    op_type     181
    op_flags    12
    op_private  0   
    op_first    0xeab7e8
    op_last     0xeb9a20
COP (0xeab7e8)
    op_next     0xeab890
    op_sibling  0xeab848
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeab848)
    op_next     0xeb99c0
    op_sibling  0xeb99c0
    op_ppaddr   PL_ppaddr[OP_SASSIGN]
    op_type     37
    op_flags    69
    op_private  2   
    op_first    0xeab890
    op_last     0xeab8d0
SVOP (0xeab890)
    op_next     0xeab8d0
    op_sibling  0xeab8d0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c40
OP (0xeab8d0)
    op_next     0xeab848
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    178
    op_private  128 
COP (0xeb99c0)
    op_next     0xeab768
    op_sibling  0xeb9a20
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeb9a20)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_ADD]
    op_type     63
    op_flags    70
    op_private  2   
    op_first    0xeab768
    op_last     0xeb9a68
OP (0xeab768)
    op_next     0xeb9a68
    op_sibling  0xeb9a68
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    50
    op_private  0   
SVOP (0xeb9a68)
    op_next     0xeb9a20
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c10

Finally, as way of encouraging people to help here, if this is solved, the solution will probably appear in by Perl debugger Devel::Trepan and allow you to reliably know exactly where you are when stopped inside the debugger.

Note: edited to make the question clearer.

rocky
  • 7,226
  • 3
  • 33
  • 74

2 Answers2

2

svref_2object returns an object that allows you to extract information from the structure referenced by the argument passed to svref_2object.

You're printing the address of that object (a scalar blessed into class B::CV).

use B qw( );

sub foo { }

my $cv = B::svref_2object(\&foo);

printf "%x\n", \&foo;                 # Numification of 1st ref to &foo.
printf "%x\n", \&foo;                 # Numification of 2nd ref to &foo.
printf "%x\n", $cv;                   # Numification of ref to B::CV object.
printf "%x\n", $cv->object_2svref();  # Numification of 3rd ref to &foo.
printf "%x\n", $$cv;  # Address of struct referenced by svref_2object's arg (Undocumented)

References numify to the address of what they reference, so we get:

3c5eaf8
3c5eaf8
3c5e1b0
3c5eaf8
3c5eaf8
ikegami
  • 367,544
  • 15
  • 269
  • 518
  • Perhaps, I didn't make the problem clear. What would be helpful is an example using B::Deparse and B::Concise and having B::Deparse output with opcode addresses matching B::Deparse opcode addresses. Change your example to `use B qw( ); sub foo { }; my $foo_ref = B::svref_2object(\&foo); printf "0x%x -- note this\n", $foo_ref; my $foo_ref2 = B::svref_2object(\&foo); printf "0x%x -- now changed\n", $foo_ref2;` and you will see that the address changes. Why? – rocky Oct 28 '15 at 20:00
  • You just repeated yourself, so I'll do the same: The addresses are different because they are the addresses of different things. One's the address of `&foo`, and one's not. I showed how to get the address of `&foo` from the B::CV object. (In fact, I showed two ways.) – ikegami Oct 28 '15 at 20:01
  • I don't understand `$xxx... = B::svref_2object(\&foo)` is the same statement used both times. But independent of that, what I need is the same opcode address output given by B::Concise and B::Deparse. So how do I do that? – rocky Oct 28 '15 at 20:06
  • Well, yeah, you'll get a new object each time you ask to construct a new object. – ikegami Oct 28 '15 at 20:14
  • Re "what I need is the same opcode address output given by B::Concise and B::Deparse", In your question, you asked for the address of the structure a B::CV object reflects. You're now asking how to get the address of the structure a B::BINOP object reflects. The answer is mostly identical, except the first solution I gave, `$binop->object_2svref()` doesn't work (because you can't have a reference to an op), but the second solution I provided (`$$binop`) works. It's undocumented, but it's less likely to change than some of the other things you rely on. – ikegami Oct 28 '15 at 20:32
  • Again, you don't seem to understand main point of the question. I have revised the question hopefully to make it more clear what the end goal is: deparsing statements at or around a given OP address. I'm not interested in B::svref_2object per se. Perhaps that was a red herring on my part. – rocky Oct 28 '15 at 23:38
  • Re "Again, you don't seem to understand main point of the question", Then why did you copy my answer???? – ikegami Oct 29 '15 at 03:34
1

ikegami's answer suggestion buried in the comments lead me to find the conceptual flaw I made in my first-proposed solution: inside B::Deparse a lexical array variable stores OPs and those are implicit pointers to the actual code OP structures. Using the undocumented $$ to get the underlying address that the scalar implicitly points to gives the correct address. So in my monkey-patched code of B::Deparse::lineseq, changing:

push @exprs, sprintf("# op: 0x%x\n%s ", $op, $text);

to:

push @exprs, sprintf("# op: 0x%x\n%s ", $$op, $text);
                                        ^^

gives me an address that I can use to match up results.

Still, there's a bit of work still to get this usable, so if there are any other ways or suggestions, I'd love to hear them.

Devel::Trepan release 0.70 now makes use in its deparse command of the above code suitably modified to be able to show which of multiple statements is about to be run.

rocky
  • 7,226
  • 3
  • 33
  • 74