
This is a tcpdump capture, originally in pcap format, changed into txt. Input: sip.txt

Ôò¡          Ü     ªkã_¹¦ R  R   hIÿkRT 4V E`D]  @9°Ã'ö%1æËÄÞ÷0ûðSIP/2.0 403 Forbidden
Via: SIP/2.0/UDP XXX.XX.XX.X:57079;branch=94tsjam66cmay5bpswyfta0nebw34zhfctjuuge2baevikbk03opf15t6wvovnb82mjih3v;received=IP;rport=57079
From: "IP" <sip:IP@IP>;tag=0c26cd11
To: <sip:XXXXXX@XXX.XX.XX.XXX>;tag=as3a5a21bf
Call-ID: 88c218486f66a6aa214d483d988dfa9c
CSeq: 2 INVITE
Server: Asterisk
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, PRACK, MESSAGE
Supported: replaces, timer
Reason: Q.850;cause=21
Content-Length: 0

My code

import re
from collections import defaultdict
import io

with io.open('file location','rb',encoding='utf-8') as f:

    text = f.readlines()
result = []
blocks = text.split('\n\n\n')

# print(blocks)

print(len(blocks))
IP_add_dict_list = defaultdict(list)
IP_add_dict_set = defaultdict(set)

for block in blocks:
    if ("CSeq: 1 INVITE" or "CSeq: 1 INVITE") in block:
        caller = r";received=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3};"
        x = re.search(caller, block)
        callerIP = x.group()
        callerIP = callerIP[10:-1]
        to = r"To: <sip:(\d|[+]|(0-9))*@"
        y = re.search(to, block)
        toSip = y.group()
        toSip = toSip[9:-1]
        call = (callerIP, toSip)
        result.append(call)
        # print(callerIP, toSip)
        IP_add_dict_list[callerIP].append(toSip)
        IP_add_dict_set[callerIP].add(toSip)
print(IP_add_dict_list)
print("---------------")
print(IP_add_dict_set)

My output (error):

with io.open('file location', 'rb', encoding='utf-8') as f:
ValueError: binary mode doesn't take an encoding argument
newbee
  • Please [edit] your question and give us something we can use to test. Show us the _exact_ contents of your input file (make sure to use the [formatting tools](https://unix.stackexchange.com/help/formatting) to format it as code), and a minimal example of your script that actually runs and reproduces the problem. Of course, the error suggests that what you have isn't actually a UTF-8 file, but we can't know since you're not giving us the information we would need. – terdon Dec 23 '20 at 17:06
  • AFAIK [`.pcap` are binary files](https://wiki.wireshark.org/Development/LibpcapFileFormat). Changing the name to `.txt` only changes the name. What is the file from? And what are you trying to do? – Kamil Maciorowski Dec 23 '20 at 17:06
  • This is the input text I am getting after saving the file through tcpdump directly into txt. I just need to read the file, but because of this "gibberish" I can't read it. Or, if possible, can I read the .pcap file directly through dpkt? – newbee Dec 23 '20 at 18:05
  • @akki how do you save it? What exactly does "originally in pcap, changed into txt" involve? – ilkkachu Dec 23 '20 at 22:28
  • @ilkkachu I used simply -- tcpdump -i eth0 -s 1500 udp port 5060 -w sip.txt in my system to get txt file. – newbee Dec 23 '20 at 22:30
  • @alecxs ```tcpdump -w >> dump.txt``` would be great, because it will save all the strings , while ```-l``` only save a single line as I mentioned . – newbee Feb 16 '21 at 12:01
  • @alecxs -E show only ```tcpdump version 3.9.4 libpcap version 0.9.4``` and ```usage :``` – newbee Feb 16 '21 at 13:37
  • @alecxs perfect, please let me know – newbee Feb 16 '21 at 14:00
  • please try the `-A` flag: Print each packet in ASCII `tcpdump -lA > dump.txt` – alecxs Feb 16 '21 at 14:08
  • @newbee, the `CHANGES` file in Debian's tcpdump source has the date "September 19, 2005" next to "Summary for 3.9.4 tcpdump release". That's over 15 years ago. You might want to consider updating to a somewhat more current version if you face problems with the software. 4.9.x has existed since 2017. – ilkkachu Feb 16 '21 at 14:09
  • @alecxs if I am using -A , the result is like this ```length: 542 E`.8....@.o@.'.....J...\.$.~SIP/2.0 401 Unauthorized``` and in real it is ```length: 542 SIP/2.0 401 Unauthorized``` so basically -A is no use if I need actually string – newbee Feb 16 '21 at 14:33
  • hm.. on my linux the output is similar to `-w` except the first line is decoded into similar from `-l` (timestamp, address.port IP, port, length) – alecxs Feb 16 '21 at 14:37
  • @alecxs yes I tried everything, fortunately, I got result while doing ```>> dump.txt``` – newbee Feb 16 '21 at 14:40
  • can not reproduce this, no matter if `>` or `>>` dump.txt still contains packet in binary format. while with `-A` flag the strings get converted into ASCII. i can see the difference with/without `-A` with `cat dump.txt` but no difference in redirection `>` or `>>` – alecxs Feb 16 '21 at 14:43
  • regarding your python script - either open binary file without encoding or open ascii file with `'r'` `io.open('dump.txt','r',encoding='utf-8')` - but i run into next error https://stackoverflow.com/q/30042334 – alecxs Feb 16 '21 at 15:10
  • i recommend to upload any example sip.txt in binary format to any filehoster (must not contain any confidentiality) so others with python knowledge can test your script. i could neither run your script with python2 nor python3, that script has to be fixed. furthermore i recommend to convert sip.txt into ascii before actually processing with python: `tcpdump -r sip.txt -A > dump.txt` – alecxs Feb 16 '21 at 15:21
  • @alecxs also you can directly append -v into > txt file ```tcpdump -v > dump.txt``` – newbee Feb 16 '21 at 22:10
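
The point raised in the comments about opening the file (binary mode without an encoding, or text mode with one) can be sketched as follows. This is a minimal sketch, not the asker's script: `load_text` and `extract_calls` are hypothetical helper names, and it assumes that SIP messages in the text are separated by at least one blank line and that the caller address appears as `;received=<IPv4>` in the `Via` header.

```python
import re
from collections import defaultdict

def load_text(path):
    # Binary mode takes no encoding argument, so read bytes and decode afterwards.
    # errors="replace" stops non-UTF-8 packet bytes from raising UnicodeDecodeError.
    with open(path, "rb") as f:
        return f.read().decode("utf-8", errors="replace")

RECEIVED = re.compile(r";received=(\d{1,3}(?:\.\d{1,3}){3})")
TO_USER = re.compile(r"^To: <sip:([+\d]+)@", re.MULTILINE)
INVITE = re.compile(r"^CSeq: \d+ INVITE", re.MULTILINE)

def extract_calls(text):
    """Map each caller IP (from ;received=) to the set of To: users it tried to INVITE."""
    calls = defaultdict(set)
    # Assumption: messages are separated by at least one blank line.
    for block in re.split(r"(?:\r?\n){2,}", text):
        if not INVITE.search(block):
            continue
        received = RECEIVED.search(block)
        to_user = TO_USER.search(block)
        if received and to_user:  # skip blocks where either field is missing
            calls[received.group(1)].add(to_user.group(1))
    return calls
```

Usage would be along the lines of `extract_calls(load_text("dump.txt"))`. Note that `f.read()` returns one string, which can be split, whereas `f.readlines()` returns a list, which has no `split` method. Checking each `re.search` result before using it avoids an `AttributeError` on blocks that lack one of the fields.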

1 Answer


You said in comments that the command you used to save the output was

tcpdump -i eth0 -s 1500 udp port 5060 -w sip.txt

tcpdump -w saves the captured packets in a binary format described e.g. here and here. The name of the file doesn't affect this. The description in the tcpdump man page says:

-w
Write the raw packets to file rather than parsing and printing them out.

If the `-w` option is not set, tcpdump instead parses the packets and prints them out as text. If you want to save this output, you can just redirect it to a file, e.g. with `tcpdump -l ... > dump.txt` or `tcpdump -l ... >> dump.txt`, where the `-l` option tells it to make stdout line buffered, making sure the output actually gets written even if tcpdump is terminated with e.g. Ctrl-C.

Using both -w and an output redirection probably will not do much, as with -w, there will be little to no output on standard output.
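
Since a file written with `-w` is a capture file rather than text, another option is to read it as one. The sketch below uses only Python's standard library and assumes the classic pcap layout with microsecond timestamps (a 24-byte global header followed by a 16-byte record header per packet), Ethernet framing, and IPv4/UDP packets, which would match a capture of `udp port 5060` on `eth0`. `iter_pcap_packets` and `udp_payload` are hypothetical helper names; pcapng files, VLAN tags, IPv6, and other link types are not handled.

```python
import struct

def iter_pcap_packets(data):
    """Yield (timestamp_seconds, frame_bytes) from the contents of a classic pcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian machine
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    offset = 24        # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len

def udp_payload(frame):
    """Return the UDP payload of an Ethernet/IPv4/UDP frame, or None if it is not one."""
    if len(frame) < 14 + 20 + 8 or frame[12:14] != b"\x08\x00":
        return None                      # too short, or EtherType is not IPv4
    ihl = (frame[14] & 0x0F) * 4         # IPv4 header length in bytes
    if frame[14 + 9] != 17:
        return None                      # IP protocol is not UDP
    udp_start = 14 + ihl
    if len(frame) < udp_start + 8:
        return None
    (udp_len,) = struct.unpack_from("!H", frame, udp_start + 4)
    # The UDP length covers the 8-byte header, so this drops any trailing padding.
    return frame[udp_start + 8:udp_start + udp_len]
```

With these helpers, the SIP text of each packet could be obtained by reading `sip.txt` in `'rb'` mode, passing the bytes to `iter_pcap_packets`, and decoding each non-`None` `udp_payload` result with `errors="replace"`.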

ilkkachu
  • just need to add ```>> name.txt``` it will save as a text file – newbee Feb 14 '21 at 01:17
  • @alecxs I tried with > but the saved txt file contained "ö%1æËÄÞ÷0ûð" things, but when I tried >> I got the perfect result, i.e., a text file without any gibberish – newbee Feb 16 '21 at 09:53
  • @akki, you said in the comments to the Q that you used `tcpdump -w foo.txt`, that would be consistent with getting a binary file, because then it's tcpdump saving the raw packets to the file. `tcpdump > foo.txt` would redirect the usual (textual) output to the file, without changes, and it doesn't matter if it's `>` or `>>` there, the latter just appends while the first truncates the output file first. If you use both, `tcpdump -w foo.txt > foo.txt`, then you'd get the raw packets in the file again, because with `-w`, it doesn't output anything to stdout. – ilkkachu Feb 16 '21 at 10:13
  • @akki, it doesn't matter what filename you use. `>` redirects the standard output, `-w` asks tcpdump to save the binary data. You're not showing the actual commands you used, so we can't know what happens. – ilkkachu Feb 16 '21 at 10:15
  • @ilkkachu i also tried to using ``` tcpdump -w foo.txt > foo.txt ``` as ```tcpdump -w test2.txt > test2.txt ``` but got same gibberish language (▒+`▒" 00hI▒kRT4E`"▒d@▒▒▒'▒▒c,▒▒? into my txt file. – newbee Feb 16 '21 at 10:21
  • @akki, yes... because you're telling tcpdump to save the raw packets with `-w`. Don't do that if you don't want the raw packets, but the textual output instead. – ilkkachu Feb 16 '21 at 10:26
  • @ilkkachu in that case how to make sure (command ) that we only need textual output instead of raw ones. – newbee Feb 16 '21 at 10:31
  • @akki, honestly, there's nothing else to say than what I've already said, and I'm getting tired of repeating the same thing and for getting pings on these comments. For redirections (`>` and friends), see http://mywiki.wooledge.org/BashGuide/InputAndOutput . For the options to `tcpdump`, see the man page: https://man7.org/linux/man-pages/man1/tcpdump.1.html . – ilkkachu Feb 16 '21 at 10:39
  • i tried with ```tcpdump -v > dump.txt``` and it worked . – newbee Feb 16 '21 at 22:13