I have been analysing packets with Python's Scapy from the start. I recently found another Python module, dpkt. With it I can parse the layers of a packet, create packets, read a .pcap file and write to a .pcap file. The differences I have found between the two are:

  1. dpkt has no live packet sniffer.

  2. In dpkt, some fields have to be unpacked with struct.unpack.
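To illustrate the second point, here is a minimal sketch that uses only the standard library. It assumes the field arrives as packed network-order bytes, which is how dpkt exposes an IPv4 address such as `ip.src`. The sample bytes and variable names are made up for illustration.

```python
import socket
import struct

# Hypothetical raw field values, as they would sit inside a parsed packet
raw_src = b"\xc0\xa8\x01\x0a"        # a packed IPv4 address
raw_ports = b"\x00\x50\xc3\x50"      # two packed 16-bit port numbers

# Dotted-quad string from the 4 packed address bytes
dotted = socket.inet_ntoa(raw_src)

# The same address as one unsigned 32-bit integer ("!" = network byte order)
(as_int,) = struct.unpack("!I", raw_src)

# Two unsigned 16-bit integers in network byte order
src_port, dst_port = struct.unpack("!HH", raw_ports)

print(dotted, as_int, src_port, dst_port)
```

Scapy usually hands back such fields already converted, which is the convenience the list item refers to.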

Are there any other differences I am missing?

RatDon
wonder
  • How about performance? Did you test them to see how they compare? – ZLMN Mar 29 '16 at 15:11
  • Scapy gives a better performance as compared to dpkt. – wonder Jun 22 '16 at 09:18
  • Hmm, [I'm not sure dpkt is always faster - it depends on what you're doing](https://libtins.github.io/benchmark/) – Dan Jan 12 '17 at 21:32
  • But never forget about packet capture interface buffering. You must address this or suffer huge performance penalties for live monitoring – Dan Jan 12 '17 at 21:34
  • As I see it, dpkt is python-only while scapy is reaching out to c libraries. This actually made dpkt together with pypy our choice for parsing pcaps since it's a lot quicker thanks to reduced c<->python overhead. Your mileage may vary. – domenukk Nov 18 '17 at 16:48
  • The above is wrong. Scapy is pure python – Cukic0d Mar 24 '19 at 23:41

2 Answers

Scapy is a better performer than dpkt.

  1. You can create, sniff, modify and send packets using Scapy, while dpkt can only analyse and create them. To send packets built with dpkt, you need raw sockets.
  2. As you mentioned, Scapy can sniff live. It can sniff from a network and can also read a .pcap file, using either the rdpcap function or the offline parameter of the sniff function.
  3. Scapy is generally used to build packet analysers and injectors. Its modules can be used to build an application for a specific purpose.
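The two offline options from point 2 could look like the sketch below. It assumes Scapy is installed. The function names `load_all` and `count_streaming` are made up for illustration, and the imports sit inside the functions only so that the snippet can be loaded without Scapy present.

```python
def load_all(path):
    """Read the whole capture into memory with rdpcap and return the packet list."""
    from scapy.all import rdpcap  # lazy import; requires Scapy
    return rdpcap(path)


def count_streaming(path):
    """Count packets with sniff(offline=...), without keeping them in memory."""
    from scapy.all import sniff  # lazy import; requires Scapy
    counter = {"packets": 0}

    def on_packet(pkt):
        # Called once per packet read from the file
        counter["packets"] += 1

    sniff(offline=path, prn=on_packet, store=False)
    return counter["packets"]


# Example use (hypothetical file name):
#   packets = load_all("capture.pcap")
#   total = count_streaming("capture.pcap")
```

rdpcap loads every packet at once, so for large captures the streaming form is likely the safer choice.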

There may be many other differences as well.

RatDon
  • Well, dpkt is a lot faster, like a factor 10 or so for some (rather common) operations. I am processing about 50 million packets in 15 minutes with some relatively basic operations but still. I wasn't ready to wait for the scapy implementation that I had before to finish. – AdamKalisz Apr 20 '20 at 20:40
  • @AdamKalisz I agree. But with a little tweaking, scapy performs on par with dpkt. You just need to change a few common operations to built-in functions. – RatDon Apr 21 '20 at 08:05
  • Would be great if you would point to a write-up of the changes you mention and maybe get it merged into the main Scapy documentation. – AdamKalisz Apr 22 '20 at 15:57

I don't understand why people say that Scapy is the better performer. I ran the quick check shown below and the winner is dpkt. By speed: dpkt > scapy > pyshark.

My input pcap file used for testing is about 12.5 MB. The times were measured with the bash time command: time python testing.py. In each snippet I make sure that the packet is actually decoded from raw bytes. Set the variable FILENAME to the name of the pcap file you want to test.
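A note on method: time python testing.py also counts interpreter start-up, library import time and the per-packet print, so the parse cost alone is smaller than the totals reported here. A minimal sketch of timing only the decode loop follows. The helper name `time_parser` and the stand-in data are made up for illustration, and the built-in `bytes` stands in for a real decoder such as dpkt's Ethernet or Scapy's Ether.

```python
import time


def time_parser(parse, raw_packets, repeats=3):
    """Return (best wall-clock seconds, decoded byte count) for parse() over all packets."""
    best = float("inf")
    total = 0
    for _ in range(repeats):
        total = 0
        start = time.perf_counter()
        for raw in raw_packets:
            # Calling len() on the result forces the decoder to do its work
            total += len(parse(raw))
        best = min(best, time.perf_counter() - start)
    return best, total


# Stand-in input: 1000 zero-filled 64-byte "frames" (not real traffic)
fake_packets = [bytes(64)] * 1000
elapsed, decoded_bytes = time_parser(bytes, fake_packets)
print("best of 3: %.6f s for %d bytes" % (elapsed, decoded_bytes))
```

Taking the best of several runs reduces noise from disk caching and other processes.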

dpkt

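import os

from dpkt.ethernet import Ethernet
from dpkt.pcap import Reader

FILENAME = "capture.pcap"  # placeholder: path to the pcap file under test

readBytes = 0
fileSize  = os.stat(FILENAME).st_size

with open(FILENAME, 'rb') as f:
    # Reader yields (timestamp, raw frame bytes) for each record
    for t, pkt in Reader(f):
        # Decode the frame so that the timing includes real parsing work
        readBytes += len(Ethernet(pkt))
        print("%.2f" % (float(readBytes) / fileSize * 100))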

The average time is about 0.3 seconds.


scapy -- using PcapReader

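import os

from scapy.all import PcapReader

FILENAME = "capture.pcap"  # placeholder: path to the pcap file under test

readBytes = 0
fileSize  = os.stat(FILENAME).st_size

# PcapReader yields packets that are already dissected
for pkt in PcapReader(FILENAME):
    readBytes += len(pkt)
    print("%.2f" % (float(readBytes) / fileSize * 100))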

The average time is about 4.5 seconds.


scapy -- using RawPcapReader

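import os

from scapy.all import Ether, RawPcapReader

FILENAME = "capture.pcap"  # placeholder: path to the pcap file under test

readBytes = 0
fileSize  = os.stat(FILENAME).st_size

# RawPcapReader yields (raw bytes, metadata); the metadata is not needed here
for pkt, _meta in RawPcapReader(FILENAME):
    # Dissect explicitly so that the timing includes real parsing work
    readBytes += len(Ether(pkt))
    print("%.2f" % (float(readBytes) / fileSize * 100))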

The average time is about 4.5 seconds.


pyshark

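import os

import pyshark

FILENAME = "capture.pcap"  # placeholder: path to the pcap file under test

filtered_cap = pyshark.FileCapture(FILENAME)

readBytes = 0
fileSize  = os.stat(FILENAME).st_size

for pkt in filtered_cap:
    # pkt.length is a string, so convert it before summing
    readBytes += int(pkt.length)
    print("%.2f" % (float(readBytes) / fileSize * 100))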

The average time is about 12 seconds.


I am not advertising dpkt at all; I do not care which library wins. The point is that I currently need to parse 8 GB files. With dpkt, the code above processes an 8 GB pcap file in about 4.5 minutes, which is bearable, while I would not even wait for the other libraries to finish. At least, this is my quick first impression. If I get new information I will update the post.

JenyaKh
  • There shouldn't really be a comparison. In general, you probably shouldn't use any Python-based software to handle lots and lots of data very fast. `dpkt` is faster, but it's waaayyy easier to add your own layer in `scapy`. `pyshark` wraps `tshark` = wireshark, so you can't add anything at all, but it benefits from Wireshark stability and variety. – Cukic0d May 14 '19 at 23:14
  • FTR if you try Scapy using pypy, it speeds it up x4 – Cukic0d Dec 20 '19 at 13:26
  • For those only parsing large PCAP files, I found that pypacker https://gitlab.com/mike01/pypacker is 3x faster to parse than dpkt. – Pickled Jul 29 '21 at 11:05