parsing ns2 trace file

Question

I'm using NS 2.35 and am trying to determine the end-to-end delay of my routing algorithm.

I think anyone with some good scripting experience should be able to answer this question, sadly that person is not me.

I have a trace file, that looks something like this:

- -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 6 ------- null}
h -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 -1 ------- null}
+ -t 0.55 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1056 -a 0 -x {2.0 17.0 10 ------- null}
+ -t 0.555 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1057 -a 0 -x {2.0 17.0 11 ------- null}
r -t 0.556 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
+ -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
- -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}

But here is what I need to do.

A line that starts with + is when a new packet is added to the network. A line starting with r is when a packet has been received by the destination. the double-typed number after the -t is the time at which that event happened. And finally, after the -i is the identity of the packet.

For me to calculate average end-to-end delay, I need to find every line that has a certain id after the -i. from there I need to calculate the timestamp of the r minus the timestamp of the +

So I figure there could be a regular expression separated by spaces. I could put each of the segements into their own variables. Then I would check the 15th (the packet ID).

But I'm not sure where to go from there, or how to put it all together.

I know there are some AWK scripts on the web for doing this, but they are all outdated and don't fit the current format (and I'm not sure how to change them).

Any help would be greatly appreciated.

EDIT:

Here is an example of a full packet route that I'm looking to find. I've taken out a lot of lines in between these ones, so that you can see a single packets events.

# a packet is enqueued from node 2 going to node 7. It's ID is 1636. this was at roughly 1.75sec 
+ -t 1.74499999999998 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.1s, it left node 2. 
- -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.134 it hopped from 2 to 7 (not important)
h -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# at 2.182 it was received by node 7
r -t 2.182 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it was the enqueued by node 7 to be sent to node 12
+ -t 2.182 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# slightly later it left node 7 on its was to node 12
- -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it hopped from 7 to 12 (not important)
h -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 12
r -t 2.2312 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# added to queue, heading to node 17
+ -t 2.2312 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# left for node 17
- -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# hopped to 17 (not important)
h -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 17 notice the time delay
r -t 2.28 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}

The ideal output of the script would recognize 2.134 as the start time, and 2.28 as the end, and then give me the delay of 0.146sec. It would do this for all packet IDs and only report the average.

It was requested that I expand a bit on how the file works, and what I am expecting.

The file is listing descriptions of about 10,000 packets. Each packet can be in a different state. The important states are + which means a packet has been enqueued at a router, and r which means the packet has been received by its destination.

It is possible that a packet that is enqueued (so a + entry) is not actually received and is instead dropped. This means we cannot assume that for every + entry there will be a r entry.

What I'm trying to measure is the average end to end delay. What this means, is that if you look at a single packet, it will have a time it was enqueued, and a time it was received. I need to make this calculation to find its end-to-end delay. But I also need to do it for 9,999 other packets to get an average.

I've thought about it more, and heres generally how I think the algorithm needs to work.

remove all lines that don't begin with a + or an r because they are unimportant.
go through all of the packet IDs (that is the numbers after -i, such as 1052 in the example), and put them into some sort of groups (multiple arrays perhaps).
each group should now contain all of the information about a particular packet.
inside the group, check if there is a +, ideally we want the very first +. Record its time.
look for any more + lines. Look at their time. It's possible the log is slightly jumbled. So its possible there is a + line later on that is actually earlier in the simulation.
If this new + line has an earlier time, then update the time variable with that.
assuming there are no more + lines, look for an r line.
if there is no r line, the packet was dropped so don't worry about it.
for every r line you find, all we need to do is find the one who has the lastest timestamp
The r line with the latest timestamp is where the packet was finally received.
subtract the + time from the r time, this gives us the time it took for the packet to travel.
Add this value to an array so that later it can be averaged.
repeat this process on every packet ID group, and then finally average the created array of delays.

Thats a lot of typing, but I think its as clear as I can be in what I want. I wish i was a regex master, but I just don't have time to learn it well enough to pull this off.

Thanks for all your help, and let me know if you have any questions.

If you don't mind, can you show what your end result should look like? It's a little difficult to understand. — jaypal singh, Dec 01 '11 at 04:43
1) When you run this, will it be for one specific identity, or do you want to do this matching for all identities that are in the file? 2) How large/how many lines do you expect in your working files? Are we talking about multi-MB files? — Elroy Flynn, Dec 01 '11 at 05:44
What do you need? A regex to put each item into a matching group, or just to match the `-i ####` section? — Nightfirecat, Dec 01 '11 at 06:01
Can you give us some more data to work with. The only complete example I can see (1047) works out to 0 is that correct ? — , Dec 01 '11 at 09:31
I've added more data, what the expected output is, and some general pseudo-code on how to solve the problem (from my perspective). To answer the questions posed here... ElroyFlynn It will need to match each ID and put their events into an array. In short, it will need to match many things, because I need an average. @Iain That is not correct. 1047 is received by node 7, then sent to 12. But I never showed it being received by 12. So its not calculate-able from that — user974703, Dec 01 '11 at 15:14
By "double-typed", it seems that you mean "floating point". :) — Kaz, Jun 14 '13 at 21:17
If + means that a new packet is added, why in your data are there references to a packet before the first + line which mentions it? In what sense is it "new", if it has already been logged at earlier times? — Kaz, Jun 14 '13 at 21:20

flesk · Answer 1 · 2011-12-02T09:06:37.863

There's not much to work with here, as Iain said in the comments to your question, but if I understand what you want to do correctly, something like this should work:

awk '/^[+r]/{$1~/r/?r[$15]=$2:r[$15]?d[$15]=r[$15]-$2:1} END {for(p in d){sum+=r[p];num++}print sum/num}' trace.file

It skips all lines not starting with '+' or 'r'. If the line starts with 'r' it adds time to the r array. Otherwise, it calculates the delay and adds it to the d array if the element is found in the r array. Finally it loops over the elements in the d array, adds up the total delay and number of elements and calculates the average from this. In your case the average is 0.

The :1 at the end of the main block is just in there so I can get away with a ternary expression instead of the significantly more verbose if statement.

EDIT: New expression to work with the added conditions:

awk '/^[+r]/{$1~/r/?$3>r[$15]?r[$15]=$3:1:!a[$15]||$3<a[$15]?a[$15]=$3:1} END {for(i in r){sum+=r[i]-a[i];num++}print "Average delay", sum/num}'

or as an awk-file

/^[+r]/ {
  if ($1 ~ /r/) {
    if ($3 > received[$15])
      received[$15] = $3;
  } else {
    if (!added[$15] || $3 < added[$15])
      added[$15] = $3;
  }
} END {
  for (packet in received) {
    sum += received[packet] - added[packet];
    num++
  }
  print "Average delay", sum/num
}

According to your algorithm it seems like 1.745 would be the start time, while you write that 2.134 is.

parsing ns2 trace file

1 Answers1