Split dataframe by factor and name new df by the factor and additional description like "new_dataframe(factor)"

Question

I need to split my df into several new data frames by a factor via a loop. The problem is that the factor consists of numbers, so the new data frames are called "1", "2" and so on, which makes it hard to refer to them in the next piece of code. How can I name the new data frames something like new_df_1, new_df_2?

what I have so far:

new_df<- split(df, df$cluster)
new_names <- as.character(unique(df$cluster))
for (i in 1:length(new_df))
{assign(new_names[i],new_df[[i]])}

I also tried lapply, but was only able to save the pieces to files, not to create data frames in the Global Environment, and I actually don't need them saved for later.

new_df<- split(df, df$cluster)
lapply(names(new_df),function(nm)
write.csv(new_df[[nm]],paste("new_df",nm,".csv")))

It works, but makes a file: new_df 1.csv

Thanks for any suggestions!

It is better not to create global variables. Instead, keep them in a `list` – akrun, Dec 12 '18 at 15:56

akrun · Answer 1 · 2018-12-12T16:03:48.973

1

If we need objects in the global environment, use list2env

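# Build names such as new_df_1, new_df_2, ... (the prefix must be a quoted string)
names(new_df) <- paste("new_df", seq_along(new_df), sep = "_")
# Copy each list element into the global environment under its name
list2env(new_df, envir = .GlobalEnv)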

NOTE: Not recommended to create multiple global objects. Instead, it can be all processed as a list ('new_df')

Or using assign

nm1 <- names(new_df) # after creating the names with `paste`
for (nm in nm1) {
  assign(nm, new_df[[nm]])  # index by name; `i` is not defined in this loop
}
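
The list-based approach recommended in the note above could look roughly like the following. This is only a minimal sketch: the toy values and the `value` column are made up for illustration, and only `df`, `cluster`, and `new_df` come from the question.

```r
# Toy data; `df` and `cluster` mirror the question, the values are invented.
df <- data.frame(cluster = c(1, 2, 1, 3, 2),
                 value   = c(10, 20, 30, 40, 50))

# One data frame per cluster; split() names the pieces after the factor levels.
new_df <- split(df, df$cluster)
names(new_df) <- paste0("new_df_", names(new_df))

# Access a single piece by name; no global variables are needed.
first_piece <- new_df[["new_df_1"]]

# Apply the same step to every piece in one call.
row_counts  <- sapply(new_df, nrow)
mean_values <- sapply(new_df, function(d) mean(d$value))
```

Because the names are built from `names(new_df)`, each piece keeps the label of its factor level even when the levels are not consecutive integers.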

edited Dec 12 '18 at 16:03

answered Dec 12 '18 at 15:57

akrun


Thanks for the quick suggestions! I used the `list2env` approach, but R computes forever and finally crashes (my `df` is big, `1135 obs. of 22384 variables`). Before I used the loop, `subset` was quite fast: `new_df_1<-subset(df,cluster==1,select= -cluster)` – Konrad Weber Dec 12 '18 at 16:42
@KonradWeber I would not recommend creating this many variables in your global space. They should be kept in a `list`, using `lapply/sapply` etc. to process/manipulate the contents – akrun Dec 12 '18 at 16:44
Sounds like the way to go, but the `list` help doesn't help me a lot. Any suggestions for related examples? Thx! – Konrad Weber Dec 12 '18 at 17:18
@KonradWeber Can you describe why a list doesn't help? You could do almost anything with a list of datasets – akrun Dec 12 '18 at 17:19
I didn't manage to make the list (the help function wasn't really helpful). But it's actually very simple (I got some inspiration [here](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames)). What I did: `cluster_list = split(df, f = df$cluster)`. That works and gives a well-organized list I can work on, thanks a lot! One follow-up question: how can I delete the `cluster` column afterwards in my list, like I did above in the subset using `select= -cluster`? – Konrad Weber Dec 13 '18 at 08:50

score 0 · Accepted Answer · answered Jan 18 '19 at 14:40


cluster_list <- split(df, f = df$cluster)

This builds the cluster list. To manipulate the list, use `lapply`.
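
For the follow-up question in the comments (dropping the `cluster` column from every data frame in the list), one possible way with `lapply` is sketched below. The toy data and the `value` and `other` columns are assumptions for illustration only.

```r
# Toy data; only `df`, `cluster`, and `cluster_list` come from the thread.
df <- data.frame(cluster = c(1, 2, 1, 3, 2),
                 value   = c(10, 20, 30, 40, 50),
                 other   = letters[1:5])

cluster_list <- split(df, f = df$cluster)

# Drop the `cluster` column from every data frame in the list.
# drop = FALSE keeps each result a data frame even if one column remains.
cluster_list <- lapply(cluster_list, function(d) {
  d[, names(d) != "cluster", drop = FALSE]
})
```

The list keeps its names ("1", "2", "3" here), so the cluster each piece belongs to is still recoverable from `names(cluster_list)` after the column is gone.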

answered Jan 18 '19 at 14:40

Konrad Weber

