I'm trying to write an R Markdown script that is as close to one-click as possible for broader use at my company. I'm reading in a CSV file with 4 columns and 20 rows. I want to split this data frame into separate data frames based on the value in one column, and name the new data frames 'df_value1', 'df_value2', and so on.

I'm trying to copy the functionality of this block:

df <- read.csv("data_template", header = TRUE)

df_0000 <- subset.data.frame(df, subset = ID == '0000')
df_1111 <- subset.data.frame(df, subset = ID == '1111')

So far I have this:

IDs = unique(data$ID)
IDs = list(IDs)

for (i in IDs[[1]]){
  i = subset.data.frame(data, subset = ID == i)
}

As I'm sure you can tell, this just overwrites i on every pass, so the loop ends with i holding a correctly subsetted data frame, but only for the last ID in the list.

I imagine I'll want to store the ID 'i' in a string variable and then use it to name each data frame iteratively, but I don't know how to use the value stored in the variable as a name without reassigning the variable itself.

Lochlin

2 Answers

If the column that identifies each subset is called mygroups, you can use dplyr's group_split():

library(dplyr)

my_list_of_dfs <- df %>% 
   group_by(mygroups) %>%
   group_split()

Then access each data frame with ordinary list indexing:

my_list_of_dfs[[1]]
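
Since the question asks for names like 'df_0000', here is a minimal sketch of one way to label the pieces. group_split() returns an unnamed list, so the names are taken from group_keys(), which reports the groups in the same order. The sample data and its value column are made up for illustration, and the grouping column is assumed to be ID as in the question rather than mygroups.

```r
library(dplyr)

# Hypothetical stand-in for the CSV in the question
df <- data.frame(
  ID = c("0000", "0000", "1111"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

grouped <- df %>% group_by(ID)

# One data frame per group, in group order (the list has no names yet)
my_list_of_dfs <- grouped %>% group_split()

# group_keys() returns one row per group, in the same order as group_split()
names(my_list_of_dfs) <- paste0("df_", group_keys(grouped)$ID)

# Access a subset by name instead of by position
my_list_of_dfs[["df_1111"]]
```

Keeping the subsets in one named list, rather than as separate variables, means later steps can loop over them without knowing the IDs in advance.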
Joe Erinjeri

Using base R:

df_list = split(df, df$ID)
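
To get the 'df_<ID>' naming from the question, the list names that split() produces can be prefixed. The sketch below uses made-up sample data (the value column is hypothetical) and stores ID as character. When reading the real file, read.csv() would likely parse an ID such as 0000 as the number 0 unless the column is read as character, for example with colClasses = c(ID = "character").

```r
# Hypothetical stand-in for the CSV in the question
df <- data.frame(
  ID = c("0000", "0000", "1111", "1111"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One data frame per unique ID; the list names are the ID values
df_list <- split(df, df$ID)

# Prefix the names to match the "df_<ID>" pattern
names(df_list) <- paste0("df_", names(df_list))

# Access a subset by name
df_list[["df_0000"]]

# Optional: copy every list element into the global environment,
# creating variables df_0000, df_1111, ...
list2env(df_list, envir = globalenv())
```

The last step is only needed if standalone variables are really required; working with the named list directly (for example with lapply()) usually scales better when the set of IDs changes from file to file.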
Gregor Thomas