I am comparing scapy and dpkt in terms of speed. I have a directory of pcap files; I parse each one and count the HTTP requests in it. Here's the scapy code:

import os
import time
from scapy.all import *

def parse(f):
    x = 0
    pcap = rdpcap(f)
    for p in pcap:
        try:
            # Count TCP packets to port 80 that carry a payload
            if p.haslayer(TCP) and p.getlayer(TCP).dport == 80 and p.haslayer(Raw):
                x = x + 1
        except:
            continue
    print x

if __name__ == '__main__':
    path = '/home/pcaps'
    start = time.time()
    for file in os.listdir(path):
        current = os.path.join(path, file)
        print current
        f = open(current, 'rb')
        parse(f)
        f.close()
    end = time.time()
    print (end - start)

The script is really slow (it gets stuck after a few minutes) compared to the dpkt version:

import dpkt
import time
import os


def parse(f):
    x = 0
    try:
        pcap = dpkt.pcap.Reader(f)
    except:
        print "Invalid Header"
        return
    for ts, buf in pcap:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
        except:
            continue
        if eth.type != 2048:  # 0x0800: skip anything that is not IPv4
            continue
        try:
            ip = eth.data
        except:
            continue

        if ip.p == 6:  # protocol 6 is TCP
            if type(eth.data) == dpkt.ip.IP:
                tcp = ip.data

                if tcp.dport == 80:
                    try:
                        # Counts only payloads that parse as an HTTP request
                        http = dpkt.http.Request(tcp.data)
                        x = x + 1
                    except:
                        continue

    print x

if __name__ == '__main__':
    path = '/home/pcaps'
    start = time.time()
    for file in os.listdir(path):
        current = os.path.join(path, file)
        print current
        f = open(current, 'rb')
        parse(f)
        f.close()
    end = time.time()
    print (end - start)

So is there something wrong with the way I am using scapy? Or is it just that scapy is slower than dpkt?

Bellerofont
svink
  • Haven't you already answered your own question mostly? If the first is truly hanging as opposed to just taking a long time, then you either a) have your answer or b) we can't tell without your input data. I would be most surprised if scapy doesn't eventually raise a Python exception, but I've been surprised before. – msw Jan 16 '17 at 21:37
  • Well, I haven't answered my question. I wanted to know whether my scapy code is flawed or whether scapy really is slower than dpkt. I tried a single capture and the difference in speed was 20x. My inputs are large pcap files (300 MB+). – svink Jan 16 '17 at 21:59
  • Try `from scapy.utils import PcapReader`? This one doesn't read all packets at once. – lilydjwg May 30 '17 at 07:29
  • @lilydjwg Yes, I have tried PcapReader and used it as an iterator, but didn't see a noticeable difference: `with PcapReader('file.pcap') as pack: for p in pack: ...` – svink Jun 06 '17 at 19:49
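
A likely reason `PcapReader` makes little difference here: it streams the file, but it still builds a full scapy packet object for every record. A minimal sketch of one way around that, assuming Python 3 and untagged Ethernet/IPv4 captures, is to read raw bytes with scapy's `RawPcapReader` and test the headers directly. The helper names `is_tcp_port80_with_payload` and `count_with_scapy` are made up for illustration, and the check only approximates the `haslayer(TCP)` / `dport == 80` / `haslayer(Raw)` condition from the question (it ignores VLAN tags, IPv6, and IP fragments).

```python
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800  # same value as the 2048 checked in the dpkt version
IPPROTO_TCP = 6


def is_tcp_port80_with_payload(buf):
    """Return True for an untagged Ethernet/IPv4/TCP frame to port 80 carrying data."""
    if len(buf) < ETH_HEADER_LEN + 20:
        return False
    (ethertype,) = struct.unpack_from("!H", buf, 12)
    if ethertype != ETHERTYPE_IPV4:
        return False
    ip_start = ETH_HEADER_LEN
    ihl = (buf[ip_start] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or buf[ip_start + 9] != IPPROTO_TCP:
        return False
    (total_len,) = struct.unpack_from("!H", buf, ip_start + 2)
    tcp_start = ip_start + ihl
    if len(buf) < tcp_start + 20:
        return False
    (dport,) = struct.unpack_from("!H", buf, tcp_start + 2)
    data_offset = (buf[tcp_start + 12] >> 4) * 4  # TCP header length in bytes
    # Use the IP total length so Ethernet padding is not mistaken for payload.
    payload_len = total_len - ihl - data_offset
    return dport == 80 and payload_len > 0


def count_with_scapy(path):
    """Count matching packets without building scapy Packet objects."""
    from scapy.utils import RawPcapReader  # imported lazily; requires scapy

    count = 0
    reader = RawPcapReader(path)
    try:
        for pkt_data, _metadata in reader:
            if is_tcp_port80_with_payload(pkt_data):
                count += 1
    finally:
        reader.close()
    return count
```

If full dissection is still needed for the packets that match, one option is to construct `Ether(pkt_data)` only for those, so the expensive step runs on a small subset of the capture.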

1 Answer

You inspired me to compare. 2 GB PCAP. Dumb test. Simply counting the number of packets.

I'd expect this to be in single digit minutes with C++ / libpcap just based on previous timings of similar sized files. But this is something new. I wanted to prototype first. My velocity is generally higher in Python.

For my application, streaming is the only option. I'll be reading several of these PCAPs simultaneously and doing computations based on their contents. Can't just hold in memory. So I'm only comparing streaming calls.

scapy 2.4.5:

from scapy.all import *
import datetime

i=0
print(datetime.datetime.now())
for packet in PcapReader("/my.pcap"):
    i+=1
else:
    print(i)
    print(datetime.datetime.now())

dpkt 1.9.7.2:

import datetime
import dpkt

pcap_file = "/my.pcap"  # same capture as in the scapy test
print(datetime.datetime.now())
with open(pcap_file, 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    i=0
    for timestamp, buf in pcap:
        i+=1
    else:
        print(i)
        print(datetime.datetime.now())

Results:

Packet count is the same. So that's good. :-)

dpkt - Just under 10 minutes.

scapy - 35 minutes.

dpkt went first. So if disk cache were helping a package, it would be scapy. And I think it might be, marginally. I did this previously with scapy only, and it was over 40 minutes.
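
To take run order out of the comparison, one option is to read the file once before timing anything, then time each reader the same way. This is only a sketch under assumptions: `warm_cache` and `timed` are hypothetical helper names, and a warm-up read only evens things out if the machine has enough free memory to keep the capture in the page cache, which may not hold for a 2 GB file.

```python
import time


def warm_cache(path, chunk_size=1 << 20):
    """Read the file once so later runs start from a similarly warm page cache."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total  # number of bytes read


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Usage would look like `warm_cache("/my.pcap")` followed by `count, seconds = timed(count_packets, "/my.pcap")` for each library, where `count_packets` stands in for whichever counting loop is being measured.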

In summary, thanks for your 5-year-old question. It's still relevant today. I almost bailed on Python here because of the overly long read times from scapy. dpkt seems substantially more performant.

Side note, alternative packages:

https://pypi.org/project/python-libpcap/ I'm on python 3.10 and 0.4.0 seems broken for me, unfortunately.

https://pypi.org/project/libpcap/ I'd like to compare timings to this, but have found it much harder to get a minimal example going. Haven't spent much time though, to be fair.

Evan
  • Digging a little further, I see dpkt does less work in that iteration than scapy does. I'm now doing dpkt.ethernet.Ethernet(buf) in the loop, and timings have not changed appreciably. Just over 10 minutes instead of just under for dpkt when doing the line above and unpacking the start of the data. – Evan May 08 '22 at 16:13
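
For a sense of how little work bare iteration needs, here is a rough standard-library sketch that walks the record headers of a classic pcap file and yields `(timestamp, buf)` pairs, the same shape the dpkt loop above consumes. It is not how dpkt or scapy implement their readers; the function name `iter_pcap_records` is made up, and it assumes the classic pcap format (pcapng is not handled).

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16
# Magic numbers as they read once the right byte order is chosen.
MAGIC_MICROSECONDS = 0xA1B2C3D4
MAGIC_NANOSECONDS = 0xA1B23C4D


def iter_pcap_records(fileobj):
    """Yield (timestamp_seconds, raw_bytes) from a classic pcap file object."""
    header = fileobj.read(PCAP_GLOBAL_HEADER_LEN)
    if len(header) < PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap global header")
    # Try both byte orders and keep the one that gives a known magic number.
    for endian in ("<", ">"):
        (magic,) = struct.unpack_from(endian + "I", header, 0)
        if magic in (MAGIC_MICROSECONDS, MAGIC_NANOSECONDS):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    divisor = 1e6 if magic == MAGIC_MICROSECONDS else 1e9
    record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    while True:
        raw = fileobj.read(PCAP_RECORD_HEADER_LEN)
        if len(raw) < PCAP_RECORD_HEADER_LEN:
            return  # end of file, or a truncated trailing record header
        ts_sec, ts_frac, incl_len, _orig_len = record.unpack(raw)
        buf = fileobj.read(incl_len)
        if len(buf) < incl_len:
            return  # truncated packet data
        yield ts_sec + ts_frac / divisor, buf
```

Counting packets is then `sum(1 for _ in iter_pcap_records(f))` with `f` opened in `'rb'` mode. Timing that against the dpkt and scapy loops gives a rough floor for file reading alone, so whatever is left over is per-packet work in the library.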