1

I'm trying to take data from a CSV file that looks like this:

datetime,bytes
2014-10-24T10:38:49.453565,52594
2014-10-24T10:38:49.554342,86594
2014-10-24T10:38:49.655055,196754
2014-10-24T10:38:49.755772,272914
2014-10-24T10:38:49.856477,373554
2014-10-24T10:38:49.957182,544914
2014-10-24T10:38:50.057873,952914
2014-10-24T10:38:50.158559,1245314
2014-10-24T10:38:50.259264,1743074

and compute rates of change of the bytes value (which represents the number of bytes downloaded so far into a file), in a way that accurately reflects my detailed time data for when I took the sample (which should approximately be every 1/10 of a second, though for various reasons, I expect that to be imperfect).

For example, in the above sampling, the second row got (86594-52594=)34000 additional bytes over the first, in (.554342-.453565=).100777 seconds, thus yielding (34000/0.100777=)337,378 bytes/second.

A second example is that the last row compared to its predecessor got (1743074-1245314=)497760 bytes in (.259264-.158559=).100705 seconds, thus yielding (497760/.100705=)4,942,753 bytes/sec.

I'd like to get a graph of these rates over time, and I'm fairly new to R, and not quite figuring out how to get what I want.

I found some related questions that seem like they might get me close:

But none of them seem to quite get me there... When I try using strptime, I seem to lose the precision (even using %OS); plus, I'm just not sure how to plot this as a series of deltas with timestamps associated with them... And the stuff in that one answer (second link, the answer with the AAPL stock delta graph) about diff(...) and -nrow(...) makes sense to me at a conceptual level, but not deeply enough that I understand how to apply it in this case.

I think I may have gotten close, but would love to see what others come up with. What options do I have for this? Anything that could show a rolling average (over, say, a second or 5 seconds), and/or using nice SI units (KB/s, MB/s, etc.)?

Edit:

I think I may be pretty close (or even getting the basic question answered) with:

my_data <- read.csv("my_data.csv")
my_deltas <- diff(my_data$bytes)
my_times <- strptime(my_data$datetime, "%Y-%m-%dT%H:%M:%S.%OS")
my_times <- my_times[2:nrow(my_data)]
df <- data.frame(my_times,my_deltas)
plot(df, type='l', xlab="When", ylab="bytes/s")

It's not terribly pretty (especially the y axis labels, and the fact that, with a longer data file, it's all pretty crammed with spikes), though, and it's not getting the sub-second precision, which might actually be OK for the larger problem (in the bigger graph, you can't tell, whereas with the sample data above, you really can), but still is not quite what I was hoping for... so, input still welcomed.

Community
  • 1
  • 1
lindes-hw
  • 939
  • 8
  • 11

1 Answers1

0

A possible solution:

# reading the data
df <- read.table(text="datetime,bytes
2014-10-24T10:38:49.453565,52594
2014-10-24T10:38:49.554342,86594
2014-10-24T10:38:49.655055,196754
2014-10-24T10:38:49.755772,272914
2014-10-24T10:38:49.856477,373554
2014-10-24T10:38:49.957182,544914
2014-10-24T10:38:50.057873,952914
2014-10-24T10:38:50.158559,1245314
2014-10-24T10:38:50.259264,1743074", header=TRUE, sep=",")

# formatting & preparing the data
df$bytes <- as.numeric(df$bytes)
df$datetime <- gsub("T"," ",df$datetime)
df$datetime <- strptime(df$datetime, "%Y-%m-%d %H:%M:%OS")
df$sec <- as.numeric(format(df$datetime, "%OS6"))

# calculating the change in bytes per second 
df$difftime <- c(NA,diff(df$sec))
df$diffbytes <- c(NA,diff(df$bytes))
df$bytespersec <- df$diffbytes / df$difftime

# creating the plot
library(ggplot2)
ggplot(df, aes(x=sec,y=bytespersec/1000000)) +
  geom_line() +
  geom_point() +
  labs(title="Change in bytes\n", x="\nWhen", y="MB/s\n") +
  theme_bw()

which gives:

enter image description here

Jaap
  • 81,064
  • 34
  • 182
  • 193
  • I think this misses what I was hoping for, in that the labels on the Y axis are in scientific notation, rather than more human-friendly MB/s or similar, and that the X axis labels seem to lack the actual datestamp (I want to know what time, not just relative to my first entry). I've mostly moved on from this problem, though I could still make use of hints for other problems, I think, so feel free to update your answer, and/or ask for clarification if I've failed to word things well. (P.S. Also, sorry for the delay in this response.) – lindes-hw Dec 02 '14 at 02:21
  • No problem, see my updated answer. I hope that this better fits your needs. – Jaap Dec 03 '14 at 12:04