This question is a follow-up to "How can I sum rows that with non-numeric factor in R?". I have a data frame stored in data.txt that looks like this:
Latency Port TrafficType Time
1 27821 Port1 ssh "2016/02/05 15:18:25"
2 24186 Port1 http "2016/02/05 15:18:25"
3 17963 Port1 ssh "2016/02/05 15:18:25"
4 20208 Port1 ftp "2016/02/05 15:18:25"
5 20703 Port2 ftp "2016/02/05 15:18:25"
6 29735 Port3 ssh "2016/02/05 15:18:25"
7 20975 Port1 https "2016/02/05 15:18:25"
8 29489 Port1 ssh "2016/02/05 15:18:25"
9 19319 Port4 ssh "2016/02/05 15:18:25"
10 18224 Port1 ssh "2016/02/05 15:18:25"
11 17952 Port1 ftp "2016/02/05 15:18:25"
12 17972 Port1 ssh "2016/02/05 15:18:25"
13 17300 Port1 ssh "2016/02/05 15:18:25"
14 20937 Port1 ssh "2016/02/05 15:18:25"
15 18769 Port1 ssh "2016/02/05 15:18:25"
16 18104 Port2 ssh "2016/02/05 15:18:25"
17 17496 Port2 ssh "2016/02/05 15:18:26"
18 23268 Port1 https "2016/02/05 15:18:26"
19 19457 Port1 ssh "2016/02/05 15:18:26"
20 20937 Port1 ssh "2016/02/05 15:18:25"
21 18769 Port1 ssh "2016/02/05 15:18:25"
22 18104 Port2 ssh "2016/02/05 15:18:25"
23 17496 Port2 ssh "2016/02/05 15:18:26"
24 23268 Port1 https "2016/02/05 15:18:26"
25 19457 Port1 ssh "2016/02/05 15:18:27"
....
I used tapply() to compute some statistics per port:
data <- read.table("data.txt")
fact <- factor(data$Port)
# Per-port latency summary: max, mean, median, then three quantiles
lat <- tapply(data$Latency, fact,
              function(x) {
                c(max(x),
                  mean(x),
                  median(x),
                  quantile(x, c(0.90, 0.99, 0.9999)))
              })
Then I got:
$Port1
90% 99% 99.99%
29489.00 20941.78 19832.50 25276.50 29205.44 29486.16
$Port2
90% 99% 99.99%
20703.00 18380.60 18104.00 19663.40 20599.04 20701.96
$Port3
90% 99% 99.99%
29735 29735 29735 29735 29735 29735
$Port4
90% 99% 99.99%
19319 19319 19319 19319 19319 19319
I wanted to append more statistics to the table above, like this:
$Port1
90% 99% 99.99% ftp http https ssh peak
29489.00 20941.78 19832.50 25276.50 29205.44 29486.16 2 1 3 12 14
$Port2
90% 99% 99.99% ftp http https ssh peak
20703.00 18380.60 18104.00 19663.40 20599.04 20701.96 1 0 0 4 3
$Port3
90% 99% 99.99% ftp http https ssh peak
29735 29735 29735 29735 29735 29735 ? ? ? ? ?
$Port4
90% 99% 99.99% ftp http https ssh peak
19319 19319 19319 19319 19319 19319 ? ? ? ? ?
Yesterday I asked "How can I sum rows that with non-numeric factor in R?", and @akrun showed me how to apply table() to a subset of the data to get the counts of all traffic types:
t <- table(data[c("Port", "TrafficType")])
t
TrafficType
Port ftp http https ssh
Port1 2 1 3 12
Port2 1 0 0 4
Port3 0 0 0 1
Port4 0 0 0 1
Now, my questions are:
How can I append this result to the table (after the 99.99% column)?
How can I compute the peak flow rate (flows/second) for each port? For example, Port1 has 14 flows at 2016/02/05 15:18:25, 3 flows at 2016/02/05 15:18:26 and 1 flow at 2016/02/05 15:18:27, so its peak is 14, and that is the number I need in the peak column.
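To make the definition of "peak" concrete, here is a minimal sketch of the counting I have in mind. It rebuilds only the Port and Time columns of the 25 sample rows with rep() so that it runs without data.txt; the names flows, per_second and peak are just illustrative, and this is one possible way to do it, not necessarily the best one.

```r
# Port and Time columns of the 25 sample rows shown above, rebuilt with rep()
# so the sketch runs without data.txt (run lengths follow the row order).
port <- rep(c("Port1", "Port2", "Port3", "Port1", "Port4",
              "Port1", "Port2", "Port1", "Port2", "Port1"),
            times = c(4, 1, 1, 2, 1, 6, 2, 4, 2, 2))
time <- rep(paste0("2016/02/05 15:18:", c(25, 26, 25, 26, 27)),
            times = c(16, 3, 3, 2, 1))
flows <- data.frame(Port = port, Time = time)

# Count flows per (Port, Time) cell: one row per port, one column per second.
per_second <- table(flows$Port, flows$Time)

# The peak for a port is the largest count in its row.
peak <- apply(per_second, 1, max)
peak
# Port1 Port2 Port3 Port4
#    14     3     1     1
```

For Port1 the row of per_second is 14, 3, 1, so the maximum is the 14 I expect.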
Hopefully I have described my question clearly enough. Thanks a lot for your patience and kind response.
Update: I found an ugly approach, which is to compute the message rate separately:
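# Count flows per (Port, Time) pair; as.data.frame() on a table adds the Freq column
rate_df <- as.data.frame(table(data[c("Port", "Time")]))
rate_fc <- factor(rate_df$Port)
peak <- tapply(rate_df$Freq, rate_fc, max)  # peak flows per second for each port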
and then to use print() to append the peak values after the latency statistics. It looks really ugly, so I would appreciate expert advice here. Thanks a lot.
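What I would like instead is a single table. A rough sketch of the shape I am after, under a few assumptions: the 25 sample rows are inlined with read.table(text = ...) so it runs without data.txt, the per-port statistics are collected into a matrix with sapply() over split() rather than left as the list that tapply() returns, and the names stats, counts, peak and combined are only illustrative. I do not know whether this is the idiomatic way.

```r
# The 25 sample rows from above, inlined so the sketch is self-contained.
sample_rows <- 'Latency Port TrafficType Time
1 27821 Port1 ssh "2016/02/05 15:18:25"
2 24186 Port1 http "2016/02/05 15:18:25"
3 17963 Port1 ssh "2016/02/05 15:18:25"
4 20208 Port1 ftp "2016/02/05 15:18:25"
5 20703 Port2 ftp "2016/02/05 15:18:25"
6 29735 Port3 ssh "2016/02/05 15:18:25"
7 20975 Port1 https "2016/02/05 15:18:25"
8 29489 Port1 ssh "2016/02/05 15:18:25"
9 19319 Port4 ssh "2016/02/05 15:18:25"
10 18224 Port1 ssh "2016/02/05 15:18:25"
11 17952 Port1 ftp "2016/02/05 15:18:25"
12 17972 Port1 ssh "2016/02/05 15:18:25"
13 17300 Port1 ssh "2016/02/05 15:18:25"
14 20937 Port1 ssh "2016/02/05 15:18:25"
15 18769 Port1 ssh "2016/02/05 15:18:25"
16 18104 Port2 ssh "2016/02/05 15:18:25"
17 17496 Port2 ssh "2016/02/05 15:18:26"
18 23268 Port1 https "2016/02/05 15:18:26"
19 19457 Port1 ssh "2016/02/05 15:18:26"
20 20937 Port1 ssh "2016/02/05 15:18:25"
21 18769 Port1 ssh "2016/02/05 15:18:25"
22 18104 Port2 ssh "2016/02/05 15:18:25"
23 17496 Port2 ssh "2016/02/05 15:18:26"
24 23268 Port1 https "2016/02/05 15:18:26"
25 19457 Port1 ssh "2016/02/05 15:18:27"
'
# The header has one field fewer than the rows, so column 1 becomes row names.
data <- read.table(text = sample_rows, header = TRUE)

# Latency statistics: one row per port, one named column per statistic.
stats <- t(sapply(split(data$Latency, data$Port), function(x) {
  c(max = max(x), mean = mean(x), median = median(x),
    quantile(x, c(0.90, 0.99, 0.9999)))
}))

# Traffic-type counts per port (same table() idea as above).
counts <- unclass(table(data$Port, data$TrafficType))

# Peak flows per second: largest per-second count in each port's row.
peak <- apply(table(data$Port, data$Time), 1, max)

# All three pieces are ordered by port name, so they can be bound column-wise.
combined <- cbind(stats, counts, peak = peak)
combined
```

With this layout each port is one row of combined, so something like combined["Port1", "peak"] picks out a single value instead of relying on printed output.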