
A little backstory: I'm still new to R and have only recently started formatting data in long format. Even now I'm not sure I'm doing it correctly all the time.

I've imported my data into a list of dataframes:

# after import, data looks something like this
a   <- data.frame(x = 1:50, y = rnorm(50), z = "A")
b   <- data.frame(x = 1:40, y = rnorm(40), z = "B")
c   <- data.frame(x = 1:20, y = rnorm(20), z = "C")
set <- list(a,b,c)

As you can see, nrow() is not the same for each dataframe.

Now I need to perform calculations on the data, such as finding the x at max y, how long it takes for y to reach a certain value, or how long y stays above 0.
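To make these concrete, on a single series the calculations might look like the sketch below. The data frame `d` and the 0.5 threshold are made up for illustration, and the last line counts observations, which equals elapsed time only if x is evenly spaced in steps of 1.

```r
# Hypothetical single series: x is the time index, y the measured value
d <- data.frame(x = 1:50, y = sin((1:50) / 5))

# x at the maximum of y
x_at_max <- d$x[which.max(d$y)]

# First x at which y reaches a threshold (0.5 is an arbitrary choice);
# this is NA if the threshold is never reached
threshold <- 0.5
first_hit <- d$x[which(d$y >= threshold)[1]]

# Number of observations with y above 0
n_above_zero <- sum(d$y > 0)
```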

I've done this before with my data in the wide format, where each variable has its own column, and I can perform my functions on all columns pretty easily with apply and then perhaps throw results into another row.

At this point, however, the furthest I've gotten is reformatting the data and graphing it with ggplot:

require(ggplot2)
require(plyr)  # provides rbind.fill
# stack the list of dataframes into one long dataframe
t <- as.data.frame(do.call(rbind.fill, set))

ggplot(t, aes(x = x, y = y, color = z)) + geom_line()

My instinct is that it's good to have the data starting in the long format and that I should be using something from reshape to get from a long to wide dataframe to perform my calculations.

So far I've made hardly any progress due to the differing number of rows.

In summary, my question is how to get my data, as compiled above in t, into a format where I can perform calculations on the entire list of variables and store the results for later graphing and reporting.

Thanks

tastycanofmalk
  • I recommend looking at the `dplyr` or `data.table` packages for this kind of task. There are good introductions to either one available online. The best format for plotting with ggplot() would probably be to make one data frame out of your list. Something like `bind_rows(a,b,c)` in `dplyr` would do the trick. The question is, however, a bit too vague to offer a specific answer. – yoland Mar 21 '17 at 13:38
  • you could take a look at using `group_by` in `dplyr` – MeetMrMet Mar 21 '17 at 13:39
  • yes, and if you want a quick introduction into how to do this, try this: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf – yoland Mar 21 '17 at 13:40
  • Thanks yoland, the `bind_rows` approach seems to give the same result as above (`t`). To rephrase without fluff, I'm looking for a way to convert from long to wide format, using dataframes with different nrows. – tastycanofmalk Mar 21 '17 at 13:54

1 Answer


This took way too long to find the answer to, so I'm posting it here for me and others in the future.

To summarize the above: long format is useful for graphing with ggplot, while wide format is convenient for applying functions across columns. Unfortunately, the dataframes I'm importing have different numbers of rows, which makes conversion to a single wide dataframe troublesome.

To solve the problem, rather than combining the data with rbind.fill, I basically need a cbind.fill, which doesn't exist. That is, until I found one posted here:

cbind a df with an empty df (cbind.fill?)
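For reference, the general idea can be sketched in base R as below. This is not the function from the linked post: `pad_cbind` is a hypothetical helper that pads each vector with NA up to the longest length and binds the results as columns. The example data mirrors the question, and the row index matches x only because x starts at 1 and increases in steps of 1.

```r
# Hypothetical helper (not the function from the linked post): pad each
# vector with NA to the length of the longest one, then bind as columns.
pad_cbind <- function(vectors) {
  n <- max(lengths(vectors))
  padded <- lapply(vectors, function(v) c(v, rep(NA, n - length(v))))
  as.data.frame(padded)
}

# Example data shaped like the list of dataframes in the question
set.seed(42)
set <- list(
  data.frame(x = 1:50, y = rnorm(50), z = "A"),
  data.frame(x = 1:40, y = rnorm(40), z = "B"),
  data.frame(x = 1:20, y = rnorm(20), z = "C")
)

# One column of y values per dataframe, named after its z label
ys <- lapply(set, function(d) d$y)
names(ys) <- vapply(set, function(d) as.character(d$z[1]), character(1))
wide <- pad_cbind(ys)

# Column-wise calculations; which.max and na.rm = TRUE skip the NA padding
row_of_max   <- sapply(wide, which.max)
n_above_zero <- sapply(wide, function(col) sum(col > 0, na.rm = TRUE))
```

The padded rows carry no information, so any function applied across the columns needs to handle NA explicitly, as the last two lines do.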

tastycanofmalk