In R, I would like to write a for loop that applies the following equation to data from a .csv file.

Here is an example showing two rows in a .csv file.

6/27/2010 8:45  131.04
6/27/2010 9:00  111.11

The second column would be x in the following equation.

https://i.stack.imgur.com/inOKV.jpg

I need help writing the equation above in R, along with a for loop that writes a .csv file containing the load variability.

Paul Hiemstra
Python_R
  • Can you read your csv into R, e.g. with `X = read.csv("pathtofile")`? If so, can you add the output of `dput(head(X))` to your question? – Seth Aug 10 '12 at 18:31
  • Yes, I have the code, but I did not want to post it because I know that it is completely wrong. What does `dput(head(X))` do? I searched, and it seems to be a Debian package upload tool? I am confused. – Python_R Aug 10 '12 at 20:16
  • `dput` is an R function, as is `head`. Use `?function_name` to get the documentation of that function, e.g. `?head`. – Paul Hiemstra Aug 10 '12 at 21:07
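
To illustrate what the comments above are asking for, here is a minimal sketch of `dput(head(X))`. The data frame and its column names (`timestamp`, `load`) are assumptions standing in for whatever `read.csv` returns for the real file.

```r
# Hypothetical stand-in for the data frame returned by read.csv();
# the column names "timestamp" and "load" are assumptions
X <- data.frame(
  timestamp = c("6/27/2010 8:45", "6/27/2010 9:00"),
  load = c(131.04, 111.11),
  stringsAsFactors = FALSE
)

# head() keeps the first six rows (here, both rows);
# dput() prints R code that recreates the object exactly
dput(head(X))

# Anyone can rebuild the same data frame by evaluating the printed code
X_copy <- eval(parse(text = capture.output(dput(head(X)))))
```

Pasting the printed `structure(...)` text into a question lets others recreate the data without needing the original file.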

1 Answer

To get the L_var for a certain set of numbers I believe this would work:

l_var = sd(x) / mean(x)

where x is the vector of numbers. Next we wrap it in a function:

l_var = function(x) sd(x) / mean(x)
outcome = l_var(input)

where input is a vector of numbers and outcome is the result of the equation.
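
As a small worked sketch, assuming (as this answer does) that the equation is the standard deviation divided by the mean, the function can be tried on the two load values from the question. The `na.rm` argument and the extra missing reading are additions for illustration, not part of the original data.

```r
# Standard deviation divided by the mean (coefficient of variation).
# na.rm = TRUE drops missing values before computing either statistic.
l_var <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# The two load values from the question, plus one made-up missing reading
load_kw <- c(131.04, 111.11, NA)

outcome <- l_var(load_kw)
print(outcome)  # roughly 0.116
```

With `na.rm = FALSE` the same call would return `NA`, which is why some handling of missing values is usually worth adding.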

If your timestamp column is of class POSIXlt, you can use strftime to create a factor column that assigns each row to a time category. See this SO answer for more details on this step. Next, you can use ddply from the plyr package to compute l_var per category (say, per day):

result = ddply(df, .(cat), summarise, l_var = l_var(value))

where df is the input data.frame, cat is the time-category column, and value is the column holding the x values from your equation above. To write the result to a file, you can use write.csv:

write.csv(result, file = "out.csv")
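
Putting the steps together, a rough end-to-end sketch might look like the following. The sample rows beyond the two in the question are made up, the column names `timestamp`, `time`, `cat`, and `value` are assumptions, and the per-group step is written as an anonymous function, which should give the same result as the `summarise` call while not depending on where `l_var` is defined.

```r
library(plyr)  # provides ddply

l_var <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Made-up stand-in for the data read from the .csv file (e.g. via read.csv)
df <- data.frame(
  timestamp = c("6/27/2010 8:45", "6/27/2010 9:00",
                "6/28/2010 8:45", "6/28/2010 9:00"),
  value = c(131.04, 111.11, 120.50, 118.25),
  stringsAsFactors = FALSE
)

# Parse the timestamps, then derive a per-day category with strftime
df$time <- as.POSIXct(df$timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")
df$cat  <- strftime(df$time, format = "%Y-%m-%d", tz = "UTC")

# One row per category, holding l_var of that category's values
result <- ddply(df, .(cat), function(d) data.frame(l_var = l_var(d$value)))

# row.names = FALSE avoids writing an extra index column
write.csv(result, file = "out.csv", row.names = FALSE)
```

Changing the `format` passed to `strftime` (for example to `"%Y-%m-%d %H"`) would group by hour instead of by day.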

I think this covers about all the steps...

Paul Hiemstra
  • You might want to add `na.rm=TRUE` to sd() and mean() or some other method for addressing missing values. – IRTFM Aug 10 '12 at 19:09
  • I'm a bit worried that there is no consideration for the 15 minute period in this solution. I would have thought that a `zoo::rollapply` approach would have been needed. – IRTFM Aug 10 '12 at 19:59
  • I agree with @DWin, I do not have much experience with zoo. Could you provide an example @DWin? – Paul Hiemstra Aug 10 '12 at 20:12
  • If the questioner decides that his data needs to have a rolling average or an aggregation step, which I am still unclear about, he should first do a search: [r] rolling average ... or perhaps: [r] aggregate interval – IRTFM Aug 10 '12 at 20:26
  • Can you explain how to wrap it in a function? It did not work. Also, I need to get an sd by taking the difference between lines; don't I need to write a for loop for that? – Python_R Aug 10 '12 at 20:49
  • I wrapped the calculation from your equation in a function; what did not work? Your equation does not include any difference between lines, so please specify your question more accurately. What do you want to do? – Paul Hiemstra Aug 10 '12 at 21:03
  • I have 860 rows with two columns. The first column holds timestamps at 15-minute intervals and the second column holds the load (kW). I need to get the load variability per 15-minute interval using the equation I posted in my question. Is that clear? – Python_R Aug 10 '12 at 21:17
  • Please provide the new information as an edit to your question. In addition, provide us with a reproducible example. – Paul Hiemstra Aug 10 '12 at 21:24
  • I believe I have already done that. Can you clarify what you mean by a reproducible example? – Python_R Aug 15 '12 at 23:16
  • Reproducible in the sense that it is a piece of R code which reproduces your problem and situation, see also http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – Paul Hiemstra Aug 16 '12 at 05:13
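
For the rolling-window idea raised in the comments, one possible reading is to compute l_var over a trailing window of consecutive readings. The sketch below does this with a plain base R for loop rather than `zoo::rollapply`; the helper name `roll_l_var`, the window width of 4 readings (one hour of 15-minute data), and all sample values after the first two are assumptions, since the thread never settles what the intended window is.

```r
l_var <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Hypothetical helper: l_var over a trailing window of `width` readings.
# Positions without a full window behind them are left as NA.
roll_l_var <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= width) {
    for (i in width:n) {
      out[i] <- l_var(x[(i - width + 1):i])
    }
  }
  out
}

# First two values are from the question; the rest are made up
load_kw <- c(131.04, 111.11, 120.50, 118.25, 125.00, 119.80)

rolling <- roll_l_var(load_kw, width = 4)
print(rolling)
```

The result has one entry per input row, so it can be attached to the original data frame as a new column before calling `write.csv`.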