I have many data frames. I would like to split each of them based on the values in a column (a factor), and then store each piece of the split in a separate data frame with a specific name.

For the sake of a minimal reproducible example, consider some generated data:

for (i in 1:10) {
  assign(paste("df_", i, sep = ""), data.frame(x = rep(1, 12), y = c(rep("a", 4), rep("b", 4), rep("c", 4))))
}

Here we have 10 data frames, df_1, df_2, ..., df_10. (The real data is similar to the generated data, but in the real data column z is different for each df.)

Now, I want to split the data frames by 'y' (column 2).

For one df, I can do the following:

splitdf <- split(df_1,df_1$y)
namessplit <- c("a","b","c")
for (i in 1:length(splitdf)) {
  assign(paste("df_1_",namessplit[[i]],sep = ""),splitdf[[i]])
}

While this works for 1 df, how can I do it for all the dfs?

Big thanks in advance!

NiGS
  • [Don't ever create `d1`, `d2`, `d3`, ..., `dn` in the first place. Create a list `d` with `n` elements.](https://stackoverflow.com/a/24376207/1422451) – Parfait Feb 12 '22 at 17:05
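
A minimal sketch of the list-based approach this comment points to, using the generated data from the question. The names `dfs` and `splits` are illustrative and do not appear in the original post.

```r
# Build the ten data frames directly inside one named list instead of via assign()
dfs <- lapply(1:10, function(i) {
  data.frame(x = rep(1, 12), y = rep(c("a", "b", "c"), each = 4))
})
names(dfs) <- paste0("df_", 1:10)

# Split every data frame by column y; the result is a nested list
splits <- lapply(dfs, function(d) split(d, d$y))

# Access a single piece by name instead of creating df_1_a, df_1_b, ...
piece <- splits[["df_1"]][["a"]]
nrow(splits$df_10$c)
```

With this layout, any later step can be applied to all pieces with another `lapply()` call rather than by looking up objects by name.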

1 Answer

Creating multiple objects in the global environment is not recommended. But if we want to know how to create the objects from a nested list: loop over the outer list sequence, then over the inner list sequence, and paste the corresponding names together to assign each extracted inner list element.

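# Collect all objects named df_<digits> into a named list and split each by y
lst1 <- lapply(mget(ls(pattern = "^df_\\d+$")), \(x) split(x, x$y))
# Outer loop: each original data frame; inner loop: each piece of its split
for (i in seq_along(lst1)) {
  for (j in seq_along(lst1[[i]])) {
    # Build a name such as "df_1_a" and assign the corresponding piece to it
    assign(paste0(names(lst1)[i], "_", names(lst1[[i]])[j]), lst1[[i]][[j]])
  }
}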

Checking for the objects created in the global env:

> ls(pattern = "^df_\\d+_[a-z]+$")
 [1] "df_1_a"  "df_1_b"  "df_1_c"  "df_10_a" "df_10_b" "df_10_c" "df_2_a"  "df_2_b"  "df_2_c"  "df_3_a"  "df_3_b"  "df_3_c"  "df_4_a" 
[14] "df_4_b"  "df_4_c"  "df_5_a"  "df_5_b"  "df_5_c"  "df_6_a"  "df_6_b"  "df_6_c"  "df_7_a"  "df_7_b"  "df_7_c"  "df_8_a"  "df_8_b" 
[27] "df_8_c"  "df_9_a"  "df_9_b"  "df_9_c" 
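
As a possible alternative to the nested loop, the nested list can be flattened and handed to `list2env()`. This is only a sketch under stated assumptions: the setup recreates the question's generated data for three data frames, the names `dfs`, `flat`, and `out` are illustrative, and the renaming step assumes the original object names contain no "." of their own.

```r
# Recreate the question's example data as a named list (illustrative setup)
dfs <- setNames(
  lapply(1:3, function(i) data.frame(x = rep(1, 12), y = rep(c("a", "b", "c"), each = 4))),
  paste0("df_", 1:3)
)
lst1 <- lapply(dfs, function(d) split(d, d$y))

# Flatten one level: the names become "df_1.a", "df_1.b", ...
flat <- unlist(lst1, recursive = FALSE)

# Swap the first "." for "_" to match the df_1_a naming scheme
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)

# Assign every element as an object. A fresh environment is used here;
# pass envir = .GlobalEnv to create the objects in the global environment.
out <- new.env()
invisible(list2env(flat, envir = out))
ls(out)
```

Writing into a separate environment first makes it easy to inspect the generated names before they are placed next to the original objects.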
akrun
  • Thanks, it works great on the example! However, could you please explain how the code works so I can apply it to my real data? – NiGS Feb 12 '22 at 16:27
  • @NiGS Here, I am assuming that you want to split by the column named `y`, so I used `x$y` in the first line. Similarly, as you created the objects in your first assign as `df_1`, `df_2`, etc., I used `pattern = '^df_\\d+$'` to load all objects named `df_` followed by one or more digits. There is no info in your post about what your real data looks like or what the column names are. – akrun Feb 12 '22 at 16:30
  • You're right Arun, indeed I didn't provide info on my real data. I have 5 dfs: bike1, dvan1, evan1, avsdv1, avmdv1. Each has 5 columns and 9 rows. I am afraid the pattern lookup does not work. The column 'y' in this example is 'dbdx' in the real data. Big thanks for your help :) – NiGS Feb 12 '22 at 16:33
  • @NiGS In that case, use `lst1 <- lapply(list(bike1 = bike1, dvan1 = dvan1, evan1 = evan1, avsdv1 = avsdv1, avmdv1 = avmdv1), \(x) split(x, x$dbdx))`; the rest of the code stays the same. – akrun Feb 12 '22 at 16:34
  • Works, thanks a bunch! – NiGS Feb 12 '22 at 16:36
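
The exchange above can be put together as one runnable sketch. The data frames built here are hypothetical stand-ins (the thread does not show the real contents, and the `value` column and the levels "p", "q", "r" are invented for illustration). Instead of spelling out a named `list(...)`, this variant passes a character vector of object names to `mget()`, which also returns a named list.

```r
# Hypothetical stand-ins for the real data frames; only the dbdx column name
# and the object names come from the thread
make_df <- function() data.frame(dbdx = rep(c("p", "q", "r"), each = 3), value = 1:9)
bike1 <- make_df(); dvan1 <- make_df(); evan1 <- make_df()
avsdv1 <- make_df(); avmdv1 <- make_df()

# mget() takes a character vector of object names and returns a named list
df_names <- c("bike1", "dvan1", "evan1", "avsdv1", "avmdv1")
lst1 <- lapply(mget(df_names), function(d) split(d, d$dbdx))

# Same nested loop as in the answer: one object per data frame and level
for (i in seq_along(lst1)) {
  for (j in seq_along(lst1[[i]])) {
    assign(paste0(names(lst1)[i], "_", names(lst1[[i]])[j]), lst1[[i]][[j]])
  }
}
```

With the stand-in data this creates objects such as `bike1_p` and `avmdv1_r`; with the real data the suffixes would be the actual values found in `dbdx`.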