A little backstory: I'm still new to R and have only recently learned to put data in long format, and even now I'm not sure I'm always doing it correctly.
I've imported my data into a list of data frames:
# after import, data looks something like this
a <- data.frame(x = 1:50, y = rnorm(50), z = "A")
b <- data.frame(x = 1:40, y = rnorm(40), z = "B")
c <- data.frame(x = 1:20, y = rnorm(20), z = "C")
set <- list(a,b,c)
As you can see, nrow() is not the same for each dataframe.
Now I need to perform calculations on the data, such as finding the x at the maximum y, how long it takes for y to reach a certain value, or how long y stays above 0.
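To make that concrete, here is roughly what I mean for a single data frame. This is only a sketch: the helper names (`x_at_max_y`, `x_first_reaching`, `n_above`) and the threshold of 1 are made up for illustration, and I'm assuming x acts as the time axis with a step of 1.

```r
set.seed(1)  # only so the example is reproducible
d <- data.frame(x = 1:50, y = rnorm(50), z = "A")

# x at the (first) maximum of y
x_at_max_y <- function(d) d$x[which.max(d$y)]

# first x at which y reaches `value`; NA if it never does
x_first_reaching <- function(d, value) {
  hit <- which(d$y >= value)
  if (length(hit) == 0) NA else d$x[hit[1]]
}

# number of samples with y above `value`
# (equals the time spent above it only when x steps by 1)
n_above <- function(d, value = 0) sum(d$y > value)

x_at_max_y(d)
x_first_reaching(d, 1)
n_above(d)
```

What I can't work out is how to run functions like these over every group at once.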
I've done this before with my data in the wide format, where each variable has its own column, and I can perform my functions on all columns pretty easily with apply and then perhaps throw results into another row.
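For reference, this is roughly what that wide-format approach looked like. It is a sketch with made-up data and column names (`wide`, `A`, `B`, `C`), not my actual script.

```r
set.seed(1)  # only so the example is reproducible
wide <- data.frame(x = 1:20, A = rnorm(20), B = rnorm(20), C = rnorm(20))

vars <- wide[, c("A", "B", "C")]  # every column except x

# one result per column
x_at_max <- apply(vars, 2, function(y) wide$x[which.max(y)])
n_above0 <- apply(vars, 2, function(y) sum(y > 0))

# collect the results, one row per statistic
results <- rbind(x_at_max, n_above0)
results
```

This only works because every column has the same number of rows, which is not the case for my current data.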
At this point, however, the furthest I've gotten is reformatting the data and graphing it with ggplot:
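library(ggplot2)
library(plyr)  # rbind.fill() comes from plyr, not ggplot2
# stack the list of data frames into one long data frame
t <- do.call(rbind.fill, set)
ggplot(t, aes(x = x, y = y, color = z)) + geom_line()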
My instinct is that it's good to have the data start in long format, and that I should be using something from reshape to get from a long to a wide data frame to perform my calculations.
So far I've made hardly any progress due to the differing number of rows.
In summary, my question is how to get my data, as compiled above in t, into a format where I can perform calculations on the entire list of variables and store the results somewhere for later graphing and reporting.
Thanks