I've been playing around with scapy and want to read through and analyse every hex byte. So far I've been using scapy simply because I don't know another way currently. Before just writing tools myself to go through the pcap files I was wondering if there was an easy way to do it. Here's what I've done so far.

from scapy.all import rdpcap, TCP

packets = rdpcap('file.pcap')
tcpPackets = []
for packet in packets:
    # Keep only the packets that contain a TCP layer
    if packet.haslayer(TCP):
        tcpPackets.append(packet)

When I run type(tcpPackets[0]) the type I get is:

<class 'scapy.layers.l2.Ether'>

Then when I try to convert the Ether object into a string, it gives me a mix of hex and ASCII (as shown by the stray parentheses and brackets).

str(tcpPackets[0])
"b'$\\xa2\\xe1\\xe6\\xee\\x9b(\\xcf\\xe9!\\x14\\x8f\\x08\\x00E\\x00\\x00[:\\xc6@\\x00@\\x06\\x0f\\xb9\\n\\x00\\x01\\x04\\xc6)\\x1e\\xf1\\xc0\\xaf\\x07[\\xc1\\xe1\\xff0y<\\x11\\xe3\\x80\\x18 1(\\xb8\\x00\\x00\\x01\\x01\\x08\\n8!\\xd1\\x888\\xac\\xc2\\x9c\\x10%\\x00\\x06MQIsdp\\x03\\x02\\x00\\x05\\x00\\x17paho/34AAE54A75D839566E'"

I have also tried using hexdump but I can't find a way to parse through it.

Henry C Wong
  • `str(...)` here probably just confuses you, because it converts your bytes string/stream into a text representation with the `b` still at the start. Instead, do `tcpPackets[0].decode('hex')`. For example, `str(b'moo')` becomes `"b'moo'"` when in fact it's `b'moo'` to begin with. As a representation (to tell you that it's a bytes string), `str()` will "honor" and keep the bytes indicator so you can understand the data type better (very loosely described). – Torxed May 11 '20 at 19:29
  • @Torxed `.decode` (or, in the 3.7 I'm using, `codecs.decode()`) doesn't work because it doesn't take scapy Ether() objects, only bytes-like objects. And I can't seem to find a way to convert Ether() to any type of object other than with str(). – Henry C Wong May 11 '20 at 19:34
  • I see, I didn't know it was a `scapy.layers.l2.Ether` instance; I thought you had managed to extract the raw representation of the ethernet frame, as the data suggested. In that case, you probably ought to use `packet.dst` or `packet.src` instead. Or try the somewhat nasty hack I refer to below. – Torxed May 11 '20 at 19:43

2 Answers

I can't find the proper dupe now, but this is just a misuse/misunderstanding of str(). The original data is in a bytes format, for instance x = b'moo'.

When str() receives your bytes string, it calls the __str__ function of the bytes class/object, which returns a representation of itself. The representation keeps the b at the beginning, presumably to make it easier for humans to see that it's a bytes object, and perhaps to avoid encoding issues (although that's speculation).
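To make the difference concrete, here is a minimal sketch using only built-in bytes behaviour (no scapy involved). The variable names are just for illustration.

```python
data = b'moo'

as_text = str(data)   # text that describes the bytes object, prefix and quotes included
as_hex = data.hex()   # two hex digits per byte, no prefix
as_ints = list(data)  # iterating over bytes yields integers in the range 0-255

print(as_text)  # b'moo'
print(as_hex)   # 6d6f6f
print(as_ints)  # [109, 111, 111]
```

So the b'...' text is only a description; the actual byte values are available directly from the bytes object.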

Similarly, if you evaluate tcpPackets[0] in an interactive terminal, Python calls __repr__, and scapy most likely shows you a summary of the packet's layers and fields rather than the raw bytes.

As an example code you can experiment with, try this out:

class YourEther(bytes):
    def __str__(self):
        return '<Made Up Representation>'

print(YourEther())

Obviously scapy's returns another representation, not just a static string that says "made up representation". But you probably get the idea.

So in the case of <class 'scapy.layers.l2.Ether'>, its __str__ function probably returns the text form of the packet's bytes, b'$\\xa2\\......., instead of just its default class representation (some correction here might be in order, as I don't remember/know all the technical names for these behaviors).

As a workaround, this might fix your issue. On Python 3, hexlify() only accepts bytes-like objects, so convert the packet with bytes() rather than str():

from binascii import hexlify

hexlify(bytes(tcpPackets[0]))

If you work from the str() output instead, you have to account for the prepended b' as well as the trailing ' and remove those accordingly. (Note: the " at the beginning and end are not part of the data; they are just a second representation in your console when printing.)
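Once you have a real bytes object, walking through every byte is plain Python. The sketch below assumes the bytes came from something like bytes(tcpPackets[0]); to stay runnable without scapy or a capture file, it uses the first 14 bytes of the frame shown in the question. hex_bytes is a made-up helper name, not part of scapy.

```python
def hex_bytes(data):
    """Return each byte of a bytes-like object as a two-digit hex string."""
    return [format(b, '02x') for b in data]

# First 14 bytes (the Ethernet header) of the frame from the question.
frame_start = b'$\xa2\xe1\xe6\xee\x9b(\xcf\xe9!\x14\x8f\x08\x00'

for offset, value in enumerate(hex_bytes(frame_start)):
    print(offset, value)

print(' '.join(hex_bytes(frame_start)))
# 24 a2 e1 e6 ee 9b 28 cf e9 21 14 8f 08 00
```

Characters such as $, ( and ! in the printed representation are simply bytes that happen to be printable ASCII (0x24, 0x28, 0x21), which is why the str() output looks like a mix of hex and text.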

Scapy is probably meant to be used through fields such as tcpPackets[0].dst rather than by grabbing the raw data. I've got very little experience with Scapy, but it's an abstraction layer for a reason, so access to the raw data is probably either hidden or described somewhere in the core docs that I can't find right now.

More info on the __str__ description: Does python `str()` function call `__str__()` function of a class?

One last note: if you actually want to access the raw data, it seems you can do so with the Raw class: Raw load found, how to access?
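For comparison, the payload that scapy exposes through the Raw layer can also be located by hand. This is a rough sketch, not scapy's implementation: it assumes an untagged Ethernet II frame carrying IPv4 and TCP, and tcp_payload is a hypothetical helper name. The frame bytes are the ones from the str() output in the question.

```python
import struct

# Frame copied from the question (Ethernet II / IPv4 / TCP).
FRAME = (
    b'$\xa2\xe1\xe6\xee\x9b(\xcf\xe9!\x14\x8f\x08\x00'
    b'E\x00\x00[:\xc6@\x00@\x06\x0f\xb9\n\x00\x01\x04\xc6)\x1e\xf1'
    b'\xc0\xaf\x07[\xc1\xe1\xff0y<\x11\xe3\x80\x18 1(\xb8\x00\x00'
    b'\x01\x01\x08\n8!\xd1\x888\xac\xc2\x9c'
    b'\x10%\x00\x06MQIsdp\x03\x02\x00\x05\x00\x17paho/34AAE54A75D839566E'
)

def tcp_payload(frame):
    """Return (src_port, dst_port, payload) for an untagged Ethernet II/IPv4/TCP frame."""
    ethertype, = struct.unpack_from('!H', frame, 12)
    if ethertype != 0x0800:
        raise ValueError('not an IPv4 frame')
    ip_start = 14                                  # Ethernet header is 14 bytes
    ip_header_len = (frame[ip_start] & 0x0F) * 4   # IHL field, in 32-bit words
    ip_total_len, = struct.unpack_from('!H', frame, ip_start + 2)
    if frame[ip_start + 9] != 6:                   # IP protocol number 6 is TCP
        raise ValueError('not a TCP segment')
    tcp_start = ip_start + ip_header_len
    src_port, dst_port = struct.unpack_from('!HH', frame, tcp_start)
    tcp_header_len = (frame[tcp_start + 12] >> 4) * 4  # data offset, in 32-bit words
    payload = frame[tcp_start + tcp_header_len:ip_start + ip_total_len]
    return src_port, dst_port, payload

src_port, dst_port, payload = tcp_payload(FRAME)
print(src_port, dst_port, len(payload))
print(payload)
```

For this frame the destination port is 1883 and the payload begins with the MQIsdp protocol name, so it looks like an MQTT CONNECT message.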

Torxed

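You can put the payload bytes of each packet into a numpy array as follows:

import numpy as np

for p in tcpPackets:
    # p.load is the payload after the TCP header; it is missing on packets
    # without one. Use bytes(p) instead if you want the whole frame.
    raw_pack_data = np.frombuffer(p.load, dtype=np.uint8)
    # Manipulate the bytes stored in raw_pack_data as you like.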

This is fast. In my case, rdpcap takes ~20 times longer than putting all the packets into a big array in a similar for loop for a 1.5GB file.
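If rdpcap itself is the bottleneck, the classic .pcap container is simple enough to read with the standard library alone. The following is a minimal sketch under a few assumptions: it handles only the classic microsecond pcap format (not pcapng or the nanosecond variant), and iter_pcap_records is a made-up name. The demo builds a tiny capture in memory so it runs without a file.

```python
import io
import struct

def iter_pcap_records(stream):
    """Yield the raw bytes of every packet in a classic .pcap stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError('truncated pcap global header')
    magic = global_header[:4]
    if magic == b'\xd4\xc3\xb2\xa1':      # 0xa1b2c3d4 written little-endian
        endian = '<'
    elif magic == b'\xa1\xb2\xc3\xd4':    # 0xa1b2c3d4 written big-endian
        endian = '>'
    else:
        raise ValueError('not a classic pcap file')
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break                          # end of file
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + 'IIII', record_header)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break                          # truncated final record
        yield data

# Build a two-packet capture in memory (link type 1 = Ethernet).
capture = struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1)
for packet in (b'\x01\x02\x03', b'moo'):
    capture += struct.pack('<IIII', 0, 0, len(packet), len(packet)) + packet

records = list(iter_pcap_records(io.BytesIO(capture)))
print(records)
```

With a real file you would pass open('file.pcap', 'rb') instead of the BytesIO object. Each yielded record is a full frame that can be handed to np.frombuffer or inspected byte by byte.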

Marcel