0

My previous question, though unanswered, allowed me to better formulate the issue in my head. I am still a beginner in R and I have no programming experience, just a strong desire (more like 'need') to use R in statistics. I will try to be as descriptive as possible.

I have a dataframe (df.x) that looks like this

      Channel  Week.nr    MT    DT    EF    PT    LF    NT 
         A       40    76.0% 83.6% 81.2% 33.7% 76.6% 65.6%
         A       41    79.0% 83.4% 81.3% 35.6% 86.6% 74.3%
         A       42    76.5% 82.0% 83.6% 33.8% 83.5% 71.9%
         B       40    42.2% 68.9% 68.1% 67.4% 70.9%  0.0%
         B       41    47.8% 78.9% 77.5% 76.5% 81.2%  0.0%
         B       42    47.8% 79.1% 78.0% 76.0% 81.2%  0.0%

I want to create multiple dataframes for each column (except Channel and Week.nr), so in the end I should have df.x.MT, df.x.DT, df.x.EF etc.

Furthermore, the format of the dataframe should be changed, meaning that the week number will be columns, instead of elements inside a column. To better visualise what I mean, here is an example of how I need df.x.MT to look

      Channel  W40   W41   W42 
         A    76.0% 79.0% 76.5%
         B    42.2% 47.8% 47.8%

keep in mind that there will be atleast 52 weeks, and around 150 channels, so in the end the dataframe will be huge.

So I have thought of isolating data for each week, so it would be simpler (i hope so) to recall it later in the df.x.MT dataframe. For this I have split the df.x dataframe in several dataframes (one for each week). I have used this code:

unique.weeks <- unique(df.x$Week)   #identify unique weeks number
for(i in unique.weeks) {
  assign(paste("df.x.week.", i, sep = ""), subset(df.x, Week == i)) 
}

Now all I need to do is to populate df.x.MT with df.x.week.40$MT, df.x.week.41$MT, df.x.week.42$MT etc. How can I do this in an automatic manner, rather than manually writing the code for each week? It feels like it should be a manner to call the names with a for*, but I can't figure it out.

*something like this (i know it's not correct)

for(i in unique.weeks) {
df.x.MT$[unique.weeks] <- df.x.week.[unique.weeks]
} 
kozoCMT
  • 3
  • 1
  • 2
    I'd strongly suggest your read my answer to [How do I make a list of data frames](https://stackoverflow.com/a/24376207/903061). I also think the FAQ about [dynamically selecting columns](https://stackoverflow.com/q/18222286/903061) would be a good reference for you. – Gregor Thomas Oct 24 '19 at 14:16
  • And lastly, the FAQ on [reshaping data from long to wide](https://stackoverflow.com/q/5890584/903061) is particularly relevant. – Gregor Thomas Oct 24 '19 at 14:19
  • thanks! I am reading it right now – kozoCMT Oct 24 '19 at 14:24

2 Answers2

1

tidyr

Convert the input to long form and then back out to the required wide form giving df2. Finally split it into a list of data frames L.

library(dplyr)
library(tidyr)

L <- df.x %>% 
  gather("id", "value", -c(Channel, Week.nr)) %>%
  spread(Week.nr, value) %>%
  `names<-`(sub("(\\d+)", "W\\1", names(.))) %>%
  split(.$id)

This could also be written as this which gives the same result. It uses pivot_longer/pivot_wider in place of gather/spread. Both pairs are from tidyr.

L <- df.x %>% 
  pivot_longer(cols = -(1:2), names_to = "id", values_to = "value") %>%
  pivot_wider(names_from = Week.nr, values_from = value, names_prefix = "W") %>%
  split(.$id)    

data.table

This works in much the same way converting df.x to long form, m, and then converting that back to wide form, wide. Finally split that giving L.

library(data.table)

m <- melt(df.x, id = 1:2)
wide <- dcast(m, Channel + variable  ~ ...)
names(wide) <- sub("(\\d+)", "W\\1", names(wide))
L <- split(wide, wide$variable)

reshape

This alternative uses only base R. Like the other alternatives it reshapes the input into long form and then back to wide form splitting at the end.

Time <- names(df.x)[-(1:2)]
long <- reshape(df.x, dir = "long", idvar = 1:2, times = Time, 
  varying = list(Time), v.names = "value", timevar = "id")
wide <- reshape(long, dir = "wide", idvar = c("Channel", "id"), timevar = "Week.nr")
names(wide) <- sub("value.", "W", names(wide))
L <- split(wide, wide$id)

Note

The input in reproducible is assumed to be as follows. I have changed it slightly from the question to show it keeps the week ordering as desired.

Lines <- "
   Channel  Week.nr    MT    DT    EF    PT    LF    NT 
         A       8    76.0% 83.6% 81.2% 33.7% 76.6% 65.6%
         A       41    79.0% 83.4% 81.3% 35.6% 86.6% 74.3%
         A       42    76.5% 82.0% 83.6% 33.8% 83.5% 71.9%
         B       8    42.2% 68.9% 68.1% 67.4% 70.9%  0.0%
         B       41    47.8% 78.9% 77.5% 76.5% 81.2%  0.0%
         B       42    47.8% 79.1% 78.0% 76.0% 81.2%  0.0%"

df.x <- read.table(text = Lines, header = TRUE, as.is = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

Making separate data frames is a terrible idea. Here's how to make a nice list:

library(tidyr)

cols_to_widen = setdiff(names(df.x), c("Channel", "Week.nr"))
result = lapply(cols_to_widen, FUN = function(cc)
  pivot_wider(df.x, id_cols = "Channel", names_from = "Week.nr", values_from = cc, names_prefix = "W")
)
names(result) = cols_to_widen

And then you can access the individual data frames:

result[["MT"]]
# # A tibble: 2 x 4
#   Channel W40   W41   W42  
#   <fct>   <fct> <fct> <fct>
# 1 A       76.0% 79.0% 76.5%
# 2 B       42.2% 47.8% 47.8%'

result[["LF"]]
# # A tibble: 2 x 4
#   Channel W40   W41   W42  
#   <fct>   <fct> <fct> <fct>
# 1 A       76.6% 86.6% 83.5%
# 2 B       70.9% 81.2% 81.2%

Using this input data:

df.x = read.table(text = 'Channel  Week.nr    MT    DT    EF    PT    LF    NT 
         A       40    76.0% 83.6% 81.2% 33.7% 76.6% 65.6%
         A       41    79.0% 83.4% 81.3% 35.6% 86.6% 74.3%
         A       42    76.5% 82.0% 83.6% 33.8% 83.5% 71.9%
         B       40    42.2% 68.9% 68.1% 67.4% 70.9%  0.0%
         B       41    47.8% 78.9% 77.5% 76.5% 81.2%  0.0%
         B       42    47.8% 79.1% 78.0% 76.0% 81.2%  0.0%', header = T)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • this works perfectly and it's exactly what I was looking for. I can see why having separate dataframes is a bad idea, but that was the only way I could've thought right now, as a beginner, of a solution for this simple issue. I will try to study more and to understand exactly what your code does :D – kozoCMT Oct 24 '19 at 14:35