I need to plot a few index values, as addressed in "Pandas - Calculate Relative time from csv".

Sample data

The file is huge; this is just a snippet of it.

highest_layer,transport_layer,src_ip,dst_ip,src_port,dst_port,ip_flag,packet_length,transport_flag,time,timestamp,geo_country,data
    LAN_DISCOVERY,UDP,192.168.1.6,224.0.0.251,5353,5353,0,82,-1,2020-06-10 19:38:08.479232,1591832288479,Unknown,
    LAN_DISCOVERY,UDP,fe80::868:621b:c2ff:cee2,ff02::fb,5353,5353,-1,102,-1,2020-06-10 19:38:08.479261,1591832288479,Unknown,
    LAN_DISCOVERY,UDP,192.168.1.3,192.168.1.6,5353,5353,16384,409,-1,2020-06-10 19:38:08.506399,1591832288506,Unknown,
    DNS,UDP,192.168.1.6,192.168.1.1,32631,53,0,89,-1,2020-06-10 19:38:08.863846,1591832288863,Unknown,
    DNS,UDP,192.168.1.6,192.168.1.1,31708,53,0,79,-1,2020-06-10 19:38:08.864186,1591832288864,Unknown,
    DNS,UDP,192.168.1.6,192.168.1.1,16807,53,0,79,-1,2020-06-10 19:38:08.866492,1591832288866,Unknown,
    SSDP,UDP,192.168.1.6,239.255.255.250,58185,1900,0,216,-1,2020-06-10 19:38:08.887298,1591832288887,Unknown,
    TCP,TCP,192.168.1.6,208.117.252.25,53725,443,16384,66,16,2020-06-10 19:38:10.107603,1591832290107,Unknown,
    TCP,TCP,192.168.1.6,208.117.252.25,53725,443,16384,66,16,2020-06-10 19:38:10.109444,1591832290109,Unknown,
    TCP,TCP,192.168.1.6,208.117.252.25,53725,443,16384,66,16,2020-06-10 19:38:10.109847,1591832290109,Unknown,
    TCP,TCP,192.168.1.6,208.117.252.25,53725,443,16384,66,16,2020-06-10 19:38:10.111238,1591832290111,Unknown,
    TCP,TCP,192.168.1.6,208.117.252.25,53725,443,16384,66,16,2020-06-10 19:38:10.111676,1591832290111,Unknown,

The code:

import pandas as pd

datadis = pd.read_csv('data.txt', sep=',')
dfd = (datadis[(datadis.src_port == 53725)])
if not dfd.empty:  # only proceed if the dataframe is not empty
    dfd1 = dfd.drop(columns=['highest_layer', 'transport_layer','ip_flag', 'transport_flag','geo_country','data']).reset_index()
    dfd1.index = dfd1['timestamp'] - dfd1.loc[0,'timestamp']
    dfd2 = dfd1.groupby(['src_ip'])['packet_length'].cumsum()
    dfd2.plot(x='timestamp',y=['packet_length'])

I want to plot the relative timestamp (dfd1.index) on the x-axis and dfd2 on the y-axis. If the difference in timestamps starts at, say, 3000, I want the x-axis of the plot to start at 3000 and not at 0 (in the example given above it starts at 0).

  • Could you please add some sample data to the question? See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Roy2012 Jun 25 '20 at 04:24
  • I'm not sure if the request makes sense. The index is `dfd1['timestamp'] - dfd1.loc[0,'timestamp']`, which means the **first value** in the `timestamp` column is being subtracted from each value in the `timestamp` column. This means, the first value in the `timestamp` column is always subtracted from itself, so the first index value will always be `0`. – Trenton McKinney Jun 25 '20 at 04:44
  • @TrentonMcKinney yes it is – user13770492 Jun 25 '20 at 05:03
  • @Roy2012 it was me who posted the same question. anyways, i add the data for this question too – user13770492 Jun 25 '20 at 05:05
  • After you filter by port, the data frame is empty. It would be great if you could post a minimal reproducible example. A handful of rows should suffice. – Roy2012 Jun 25 '20 at 05:16
  • @Roy2012. Sorry, since the file had lots of data, i myself was confused. I have updated a few more rows now – user13770492 Jun 25 '20 at 05:40

1 Answer

If I understand correctly, you're trying to plot the timestamp from one dataframe (dfd2) against a column from another dataframe (dfd1).

The easy way to do that is as follows:

import matplotlib.pyplot as plt
plt.plot(dfd2, dfd1.packet_length)
plt.show()

The result, for the sample data in the question, is:

[Image: plot produced by the code above]

As you can see, the x-axis doesn't start at 0 but rather at ~66, which is the first value of dfd2 for this data.
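In the question's code, the relative timestamps are the index of dfd1, and dfd2 holds the cumulative packet_length. If the goal is an x-axis that starts at a non-zero offset, one possible reading is that the offset should be measured from the first packet in the whole file rather than the first packet after filtering. The sketch below works under that assumption. It uses a few rows of the question's sample data inlined as a string, and the column name `rel_time_ms` is made up for illustration (the sample suggests that `timestamp` is in milliseconds).

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's sample, trimmed to the columns used here.
csv_text = """src_ip,src_port,packet_length,timestamp
192.168.1.6,5353,82,1591832288479
192.168.1.6,32631,89,1591832288863
192.168.1.6,53725,66,1591832290107
192.168.1.6,53725,66,1591832290109
192.168.1.6,53725,66,1591832290109
192.168.1.6,53725,66,1591832290111
192.168.1.6,53725,66,1591832290111
"""
datadis = pd.read_csv(io.StringIO(csv_text))

# Assumption: "relative" means relative to the first packet in the whole
# file, so the reference timestamp is taken BEFORE filtering by port.
t0 = datadis["timestamp"].iloc[0]

dfd1 = datadis[datadis["src_port"] == 53725].copy()
dfd1["rel_time_ms"] = dfd1["timestamp"] - t0  # hypothetical column name

# Cumulative packet length per source IP, indexed by the relative time.
dfd2 = (
    dfd1.set_index("rel_time_ms")
    .groupby("src_ip")["packet_length"]
    .cumsum()
)

# Relative time on the x-axis, cumulative packet length on the y-axis.
fig, ax = plt.subplots()
ax.plot(dfd2.index, dfd2.values, marker="o")
ax.set_xlabel("relative time (ms)")
ax.set_ylabel("cumulative packet_length")
# plt.show()  # uncomment to display the figure when running as a script
```

With these rows the first filtered packet arrives 1628 ms after the first packet in the file, so the plotted line begins at x = 1628 instead of 0. Subtracting `dfd1["timestamp"].iloc[0]` instead of `t0` would bring back the zero-based axis from the question.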

Roy2012