
I'm writing a pretty printer in Python for gdb and am slowly getting the hang of the methodology. Finding actual documentation on how this system works, with examples of what the methods are expected to return, is like pulling teeth. I've found bits and pieces here and there, but nothing all-inclusive. Some of what I've figured out came through trial and error, which is slow going.

So far, it looks like a pretty printer's to_string() is meant to return a string (sure), while children() returns an iterable of (name, value) pairs, where name is a string and value is either a Python value (such as a string or an int) or a gdb.Value object, which wraps the C/C++ object being printed. I had hoped I could return a pretty printer object as a child and have gdb call it, but alas, that is not to be. I could return a string, but I want the payload elements to be collapsible in an IDE like VSCode, and for that I need to return a gdb.Value. The Natvis equivalent of this is a Synthetic Item.

I've got a C++ class that is a buffer. In its raw form it contains a byte array, and I need it decoded into something readable.

Given the constraints I've gleaned, if I can wrap a pointer in a proxy value object using a pseudo-type, I might be able to break the bytes down into usable units. Here's a hard-coded example of what I'm talking about:

#include <cstdint>
#include <iterator> // std::end
struct alignas(std::uint16_t) buffer {
  enum id : char { id1, id2 };
  // structure is: payload_size, id, payload[]
  // (the first packet's payload is 4 bytes: a uint16_t, then 2 single bytes)
  char buf[11] = { 4, id1, 1, 0, 2, 3
                 , 0, id1
                 , 1, id2, 1
                 };
  char* end = std::end(buf);
};

int main() {
  buffer b;
  return 0;
}

Putting a breakpoint on the return 0; on a little-endian machine, I would like something like the following to show up:

(gdb) p b
$1 = buffer @ 0xaddre55 = { id1[4] = {1, 2, 3}, id1[0] = {}, id2 = {1} }
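To check the byte layout independently of gdb, here's a minimal pure-Python sketch of how such a buffer splits into packets and payload values. It assumes little-endian uint16_t decoding, the id1 = 0, id2 = 1 enum values from the example, and a first size byte of 4 (the first packet carries four payload bytes: 1, 0, 2, 3). The names ID_NAMES, split_packets and decode_payload are hypothetical helpers for illustration, not part of gdb's API.

```python
import struct

# Hypothetical mapping for `enum id : char { id1, id2 }` (id1 = 0, id2 = 1).
ID_NAMES = {0: 'id1', 1: 'id2'}


def split_packets(buf):
    """Yield (id_name, payload_bytes) for each [size, id, payload...] packet.

    Raises ValueError if a header or payload runs past the end of buf.
    """
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError('truncated header at offset {}'.format(pos))
        size, raw_id = buf[pos], buf[pos + 1]
        start, stop = pos + 2, pos + 2 + size
        if stop > len(buf):
            raise ValueError('packet at offset {} overruns the buffer'.format(pos))
        yield ID_NAMES.get(raw_id, '?{}'.format(raw_id)), buf[start:stop]
        pos = stop  # next packet starts right after this payload


def decode_payload(name, payload):
    """Assumed rule from the example: a 4-byte id1 payload is uint16 + 2 bytes."""
    if name == 'id1' and len(payload) == 4:
        (first,) = struct.unpack_from('<H', payload, 0)  # little-endian uint16
        return [first, payload[2], payload[3]]
    return list(payload)  # otherwise one value per byte


example = bytes([4, 0, 1, 0, 2, 3,
                 0, 0,
                 1, 1, 1])
decoded = [(n, decode_payload(n, p)) for n, p in split_packets(example)]
print(decoded)
```

Reading the same two bytes 1, 0 as big-endian ('>H') would give 256 rather than 1, which is why an output of {1, 2, 3} implies a little-endian target.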

Here is what I got so far for the pretty printer python code:

import gdb

class bufferPacketPrinter:
  def __init__(self, p_begin, p_end) -> None:
    self.p_begin = p_begin  # beginning of packet
    self.p_end = p_end      # end of packet
    # Value.cast() takes a gdb.Type, not a type name, hence gdb.lookup_type().
    self.cmd_id       = self.p_begin[1].cast(gdb.lookup_type('buffer::id'))
    self.payload_size = int(self.p_begin[0].cast(gdb.lookup_type('unsigned char')))

  def to_string(self):
    return 'packet {}[{}]'.format(self.cmd_id, self.payload_size)

  def children(self):
    # Every child must be yielded as a (name, value) pair; in a generator,
    # `return value` just ends the iteration without producing a child.
    uchar = gdb.lookup_type('unsigned char')
    payload = self.p_begin + 2
    cmd_id = str(self.cmd_id)  # e.g. 'buffer::id1'
    if cmd_id == 'buffer::id1':
      if self.payload_size == 0:
        return  # no children
      elif self.payload_size == 4:  # uint16_t + 2 single bytes
        yield '[0]', payload.cast(gdb.lookup_type('std::uint16_t').pointer()).dereference()
        yield '[1]', int(payload[2].cast(uchar))
        yield '[2]', int(payload[3].cast(uchar))
        return
    elif cmd_id == 'buffer::id2':
      if self.payload_size == 1:
        yield '[0]', int(payload[0].cast(uchar))
        return
    yield 'ERROR', 'Invalid payload size of {}'.format(self.payload_size)

class bufferPrinter:
  def __init__(self, val) -> None:
    self.val = val
    self.begin = self.val['buf'].cast(gdb.lookup_type('char').pointer())
    self.end = self.val['end']

  def to_string(self):
    return 'buffer @ {}'.format(self.val.address)

  def children(self):
    packet_it = self.begin  # local copy, so children() can run more than once
    while packet_it < self.end:
      # Re-read the size for every packet, not just the first one.
      payload_size = int(packet_it[0].cast(gdb.lookup_type('unsigned char')))
      yield ??? # <=== Here is where the magic that I need is to happen
      packet_it += 2 + payload_size

(I'm still learning python as well as this API, so if there are any errors, please let me know.)
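One Python detail that matters when writing children(): as soon as a function contains yield, it is a generator, and return value merely ends the iteration (the value is attached to StopIteration) instead of producing another element. A consumer such as gdb, which iterates over children(), therefore never sees a returned value. A minimal demonstration, independent of gdb (the function names are made up for illustration):

```python
def children_with_return():
    # Mixing yield with `return value`: the returned pair is NOT an element.
    yield '[0]', 1
    return '[1]', 2  # ends the generator; value rides along on StopIteration


def children_all_yielded():
    # Every child is produced by yield as a (name, value) pair.
    yield '[0]', 1
    yield '[1]', 2


print(list(children_with_return()))
print(list(children_all_yielded()))
```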

The second-to-last line, yield ???, is what I'm stuck on. Any ideas? If this isn't the way to do it, let me know of another way.

Edited by user202729; asked by Adrian.
  • Why don't you return `string/string` pairs from `children()`? – ssbssa Nov 28 '21 at 16:53
  • @ssbssa, because I want the children to be collapsible in an IDE like VSCode. – Adrian Nov 28 '21 at 18:52
  • I also needed something similar once, so I [extended gdb](https://github.com/ssbssa/gdb/commit/95fbc18daed3e0bb80c8ddeee05144c2a1f66329) so you can return another pretty-printer in `children`, but I've never tested it outside of gdb itself. – ssbssa Nov 28 '21 at 19:33
  • @ssbssa, oh nice! I guess I could try to do a rebuild of gdb, but I've had very limited success when compiling things like compilers and the like. Seems that there is always some outstanding bug that keeps the system from compiling. :( :D I'll take a look. – Adrian Nov 29 '21 at 03:52
  • Instead of a pseudo-type you can probably also make a real type. See [Can we define a new data type in a GDB session - Stack Overflow](https://stackoverflow.com/questions/7272558/can-we-define-a-new-data-type-in-a-gdb-session?noredirect=1&lq=1) (not sure how well it works with Visual Studio however) – user202729 Dec 07 '21 at 07:35
  • @user202729, a real type? Looks like that example you are referring to is basically reading in the debug info into the current session, where it's not available by default. That's not what I want. – Adrian Dec 08 '21 at 02:24
  • If I understood correctly, the example compiles a new C file, which defines a new type, and load that to gdb so gdb understands a new data type. – user202729 Dec 08 '21 at 08:26
  • @user202729, It defines the same type in a different object file. It then loads the debug info from that symbol file into the gdb session. However, I do think I see what you're getting at. It is very similar to my 2nd solution [here](https://stackoverflow.com/a/70162512/1366368) except it is putting the actual type in another symbol file. Not sure if that is much better though. If I'm going to use a real type, why not just put it in the actual source? – Adrian Dec 08 '21 at 17:43
  • @Adrian In case you can't edit the source code? – user202729 Dec 08 '21 at 17:44
  • @user202729, Ah, I see that could be an advantage. Not in my case, but in general. – Adrian Dec 08 '21 at 17:45
  • Another idea: make a new gdb.Value of the old type (or its pointer or something, basically just something unique and pretty-printable), but "special case" its address so the pretty printer treats it specially instead. May leak a lot of extra memory. – user202729 Dec 08 '21 at 17:45
  • @ssbssa, can you show some example code on how to make the `children` function return a customized, user-defined pretty printer? The detailed question is that we have seen a similar question in wxWidgets' forum, [[solved] Creating a gdb Pretty-Printer for wxIPV4address](https://forums.wxwidgets.org/viewtopic.php?p=211552#p211552). Thanks. – ollydbg23 Apr 19 '22 at 07:18
  • @ollydbg23 I've read through the topic, so your problem is that the wxIPV4address pretty printer isn't used when it's a class member? I've never had this problem myself with gdb, and I would consider this a very serious bug. – ssbssa Apr 19 '22 at 09:18

2 Answers


Well, I found an answer, but it has some caveats.

Basically, I'm reusing the type: each child is the same buffer object, but before yielding it I change the printer's view, so that when gdb displays it again it shows a single packet instead of the whole buffer. I used this trick when I was working with Natvis for MS Visual Studio. Unfortunately, because gdb doesn't really have a 'view' system, I kludged one together by keeping state that is shared by all bufferPrinters.

import gdb

class bufferPrinter:
  # Which view of the bufferPrinter to see.
  view = 0
  # Parameters passed to the child view.
  stack = []
  
  def __init__(self, val) -> None:
    self.val = val
    self.begin = self.val['buf'].cast(gdb.lookup_type('char').pointer())
    self.end = self.val['end']

  def payload_size(self, packet_it):
    return packet_it[0] \
      .cast(gdb.lookup_type('unsigned char')) \
      .cast(gdb.lookup_type('int'))

  def cmd_id(self, packet_it):
      # The C++ below names the enum buffer::id_e, so look up that type.
      return packet_it[1].cast(gdb.lookup_type('buffer::id_e'))

  def payload(self, packet_it):
      return packet_it + 2

  def to_string(self):
    if bufferPrinter.view == 0:
      return 'buffer @ {}'.format(self.val.address)
  
  def children(self):
    packet_it = self.begin
    if bufferPrinter.view == 0:
      packet_counter = 0
      while packet_it < self.end:
        # Setting the view should be done before viewing self.val
        bufferPrinter.view = 1
        # A stack is a bit overkill in this situation as it is not
        # a recursive structure, but is here as a generic solution
        # to pass parameters to the next level.
        bufferPrinter.stack.append(packet_it)
        yield '{}[{}]'.format(self.cmd_id(packet_it), packet_counter), self.val
        packet_counter += 1
        packet_it += 2 + self.payload_size(packet_it)
      if packet_it != self.end:
        yield 'ERROR', 'Jumped {} bytes past end.'.format(packet_it - self.end)
    else:
      # Setting the view immediately and popping the stack to ensure
      # that they're not forgotten before leaving.
      bufferPrinter.view = 0
      packet_it    = bufferPrinter.stack.pop()
      payload_size = self.payload_size(packet_it)
      payload      = self.payload(packet_it)
      cmd_id       = self.cmd_id(packet_it)
      if str(cmd_id) == 'buffer::id1':
        if payload_size == 0:
          yield '[0]', '{}'
        elif payload_size == 4:
          yield '[0]', payload.cast(gdb.lookup_type('uint16_t').pointer())[0]
          payload += 2
          yield '[1]', payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
          payload += 1
          yield '[2]', payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
        else:
          yield 'ERROR', 'Invalid payload size of {} for {}.'.format(payload_size, cmd_id)
      elif str(cmd_id) == 'buffer::id2':
        if payload_size == 1:
          yield '[0]', payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
        else:
          yield 'ERROR', 'Invalid payload size of {} for {}.'.format(payload_size, cmd_id)
      else:
        yield 'ERROR', 'id {} invalid.'.format(cmd_id)
    return 

def my_pp_fn(val):
  if str(val.type) == 'buffer': return bufferPrinter(val)

gdb.pretty_printers.append(my_pp_fn)

Which, for the following code:

#include <cstdint>
#include <vector>
struct alignas(std::uint16_t) buffer {
  enum id_e : char { id1 = 6, id2 };
  struct packet_header_t { unsigned char payload_size; id_e id; };
  // structure is: payload_size, id, payload[]
  char buf[13] = { 4, id1, 1, 0, 2, 3
                 , 0, id1
                 , 0, id1
                 , 1, id2, 1
                 };
  char* end = std::end(buf);
};

int main() {
  buffer b;
  // Have to use types buffer::packet_header_t and buffer::id_e or they aren't
  // saved in the symbol table.
  buffer::packet_header_t y = {};
  buffer::id_e x = buffer::id1;
  return 0; // <== set breakpoint here
}

Gives me:

(gdb) p b
$1 = buffer @ 0xc5f5bffad0 = {buffer::id1[0] = {[0] = 1, [1] = 2, [2] = 3}, buffer::id1[1] = {[0] = {}}, buffer::id1[2] = {[0] = {}}, buffer::id2[3] = {[0] = 1}}

VS Code will show this in the watch window:

[Screenshot: VS Code watch window]

Though this works on the command line, it doesn't work as expected within the VS Code IDE. Because view and stack are class-level state shared by every bufferPrinter, the level 0 and level 1 views of bufferPrinter get mixed up: VS Code queries child elements directly, and even when it doesn't, the child that is shown may not be the one wanted. If gdb's pretty printing had a view system, this might be avoidable.
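The fragility comes from pairing pushes and pops across separate children() calls: the parent pushes one entry per child, and each child pops one when it is expanded. That only lines up if every child is expanded immediately after it is produced. Below is a toy model in plain Python (no gdb); ToyPrinter and both driver functions are hypothetical, and "list all children first, expand later" is only an assumed approximation of how an IDE front end might behave through gdb/MI, not something verified here.

```python
class ToyPrinter:
    # Class attributes: shared by *every* instance, like view/stack above.
    view = 0
    stack = []

    def __init__(self, packets):
        self.packets = packets  # packet labels standing in for the byte buffer

    def children(self):
        if ToyPrinter.view == 0:
            # Buffer view: one child per packet, each child being this same object.
            for label in self.packets:
                ToyPrinter.view = 1
                ToyPrinter.stack.append(label)
                yield label, self
        else:
            # Packet view: consume whatever was pushed most recently.
            ToyPrinter.view = 0
            yield 'packet', ToyPrinter.stack.pop()


def cli_style(printer):
    """Expand each child as soon as it is produced."""
    return [next(child.children())[1] for _, child in printer.children()]


def ide_style_first_expansion(printer):
    """List all children first, then expand only the first one."""
    kids = list(printer.children())
    _, first_child = kids[0]
    return next(first_child.children())[1]


print(cli_style(ToyPrinter(['a', 'b', 'c'])))
ToyPrinter.view, ToyPrinter.stack = 0, []  # reset the shared state
print(ide_style_first_expansion(ToyPrinter(['a', 'b', 'c'])))
print(ToyPrinter.stack)
```

In the interleaved run, the children labeled a, b and c each get their own packet. When all children are listed first, the child labeled a receives packet c (the stack is LIFO), and a and b are left behind on the shared stack for the next print.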

Although I've posted this as an answer, I am still holding out for a way to generate a pseudo-type so that this side-effect isn't a problem.

Answered by Adrian.

Well, I've been able to do what I wanted, except that instead of a pseudo-type, it uses an actual type that I had to add myself. Given the following C++ code:

#include <cstdint>
#include <vector>
struct alignas(std::uint16_t) buffer {
  enum id_e : char { id1 = 6, id2 };
  struct packet_header_t { unsigned char payload_size; id_e id; };
  // structure is: payload_size, id, payload[]
  char buf[13] = { 4, id1, 1, 0, 2, 3
                 , 0, id1
                 , 0, id1
                 , 1, id2, 1
                 };
  char* end = std::end(buf);
};

int main() {
  buffer b;
  // Have to use types buffer::packet_header_t and buffer::id_e or they aren't
  // saved in the symbol table.
  buffer::packet_header_t y = {};
  buffer::id_e x = buffer::id1;
  return 0;
}

and this pretty-printer:

import gdb

class bufferPrinterSuper:
  # Shared code between pretty-printers
  meaning = {
    #              +-- packet info
    #              |     +-- payload info
    #              |     | +-- element info
    #              v     v v
    'buffer::id1': { 0 : [                            ]                      
                   , 4 : [ [2, 'uint16_t*'            ]       
                         , [1, 'unsigned char*', 'int']  
                         , [1, 'unsigned char*', 'int']
                         ] 
                   }                                  
  , 'buffer::id2': { 1 : [ [1, 'unsigned char*', 'int']
                         ]
                   }
  }

  def payload_size(self, packet_it):
    return int(packet_it[0] \
      .cast(gdb.lookup_type('unsigned char')) \
      .cast(gdb.lookup_type('int')))

  def cmd_id(self, packet_it):
    return packet_it[1].cast(gdb.lookup_type('buffer::id_e'))

  def payload(self, packet_it):
    return packet_it + 2

  def get_value(self, it, element_info):
    for i in range(1, len(element_info)):
      if element_info[i][-1] == '*':
        pointer = gdb.lookup_type(element_info[i][0:-1]).pointer()
        it = it.cast(pointer).dereference()
      else:
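        # Compare the type *code*: comparing a gdb.Type with a TYPE_CODE_*
        # constant never matches, so the original assert could never fire.
        assert it.type.strip_typedefs().code != gdb.TYPE_CODE_PTR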
        value = gdb.lookup_type(element_info[i])
        it = it.cast(value)
    return it

class bufferHeaderPrinter(bufferPrinterSuper):
  def __init__(self, val):
    self.val = val
    self.begin = self.val['payload_size'].address
    self.end = self.begin + self.payload_size(self.val['payload_size'].address)

  def to_string(self):
    return 'packet @ {}'.format(self.val.address)

  def children(self):
    packet_it = self.begin
    cmd_id = self.cmd_id(packet_it)
    if str(cmd_id) in self.meaning:
      payload_info = self.meaning[str(cmd_id)]
      payload_size = self.payload_size(packet_it)
      if payload_size in payload_info:
        payload_it = packet_it + 2
        payload_info = payload_info[payload_size]
        payload_counter = 0
        for element_info in payload_info:
          yield '[{}]' \
            .format(payload_counter), self.get_value(payload_it, element_info)
          payload_it += element_info[0]
          payload_counter += 1

        # Error handling
        if payload_it > packet_it + 2 + payload_size:
          yield 'ERROR: payload_info {} exceeds payload size {}' \
            .format(payload_info, payload_size), 0
        elif packet_it + 2 + payload_size > payload_it:
          # int() so that range() below gets a plain Python int.
          bytes_unaccounted_for = int(packet_it - payload_it + 2 + payload_size)
          # A warning because they could be padding
          yield "WARNING: payload_info doesn't account for {} bytes: {}" \
            .format(bytes_unaccounted_for
                , '['
                + ', '.join('{:02x}'.format(int(payload_it[i]))
                            for i in range(0, bytes_unaccounted_for))
                + ']'), 0
      else:
        yield 'ERROR: Size {} for id {} not recognized.'.format(payload_size, cmd_id), 0
    else:
      yield 'ERROR: Command {} not recognized.'.format(cmd_id), 0


class bufferPrinter(bufferPrinterSuper):
  def __init__(self, val) -> None:
    self.val = val
    self.begin = self.val['buf'].cast(gdb.lookup_type('char').pointer())
    self.end = self.val['end']

  def to_string(self):
    return 'buffer @ {}'.format(self.val.address)

  def children(self):
    packet_it = self.begin
    packet_counter = 0
    while packet_it < self.end:
      cmd_id = self.cmd_id(packet_it)
      yield '[{}] {}({})' \
        .format(packet_counter, cmd_id, self.payload_size(packet_it)) \
        , packet_it.cast(gdb.lookup_type('buffer::packet_header_t').pointer()).dereference()
      packet_counter += 1
      packet_it += 2 + self.payload_size(packet_it)

    if packet_it != self.end:
      yield 'ERROR', 'Jumped {} bytes past end.'.format(packet_it - self.end)
    return 

def my_pp_fn(val):
  if str(val.type) == 'buffer': return bufferPrinter(val)
  if str(val.type) == 'buffer::packet_header_t': return bufferHeaderPrinter(val)

gdb.pretty_printers.append(my_pp_fn)

I get the following output:

(gdb) p b
$1 = buffer @ 0x8c47fffa20 = {[0] buffer::id1(4) = packet @ 0x8c47fffa20 = {[0] = 1, [1] = 2, [2] = 3}, [1] buffer::id1(0) = packet @ 0x8c47fffa26, [2] buffer::id1(0) = packet @ 0x8c47fffa28, [3] buffer::id2(1) = packet @ 0x8c47fffa2a = {[0] = 1}}
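The meaning table is what drives the decoding: for each command id and payload size it lists the elements as [byte_count, type...]. As a gdb-free sketch of the same table-driven idea, here it is with Python struct format codes standing in for gdb type names. That substitution, the little-endian '<' prefix, and the MEANING/decode_packet names are assumptions for illustration; the error and warning branches mirror the ones in bufferHeaderPrinter.children().

```python
import struct

# Hypothetical struct-based analogue of `meaning`:
# id name -> {payload size -> struct format code per element}.
# '<H' = little-endian uint16 (2 bytes), '<B' = unsigned char (1 byte).
MEANING = {
    'id1': {0: [], 4: ['<H', '<B', '<B']},
    'id2': {1: ['<B']},
}


def decode_packet(cmd_id, payload):
    """Return (values, messages) for one packet's payload bytes."""
    sizes = MEANING.get(cmd_id)
    if sizes is None:
        return [], ['ERROR: Command {} not recognized.'.format(cmd_id)]
    layout = sizes.get(len(payload))
    if layout is None:
        return [], ['ERROR: Size {} for id {} not recognized.'
                    .format(len(payload), cmd_id)]
    values, offset = [], 0
    for fmt in layout:
        width = struct.calcsize(fmt)
        if offset + width > len(payload):
            # Table describes more bytes than the packet holds.
            return values, ['ERROR: layout exceeds payload size {}'
                            .format(len(payload))]
        (value,) = struct.unpack_from(fmt, payload, offset)
        values.append(value)
        offset += width
    messages = []
    if offset < len(payload):
        # Like the WARNING branch: leftover bytes might just be padding.
        leftover = ', '.join('{:02x}'.format(b) for b in payload[offset:])
        messages.append("WARNING: layout doesn't account for {} bytes: [{}]"
                        .format(len(payload) - offset, leftover))
    return values, messages


print(decode_packet('id1', bytes([1, 0, 2, 3])))
print(decode_packet('id2', bytes([1, 2])))
```

Keeping the layout in data rather than in if/elif chains means a new packet type only needs a new table row, which seems to be the motivation behind meaning as well.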

One issue with this is that I have to make sure the header type is actually used somewhere so that it remains in the symbol table. This can be a bit tricky, as the optimizer may remove it if it deems it unnecessary; in truth it isn't necessary for the program itself, only for debugging.

Unless someone can tell me how to generate a pseudo-type, or some other way to produce collapsible children, I think I'll have to mark this as the answer, which I'd rather not do. Sigh.

Edited by user202729; answered by Adrian.