I have a large dataframe (just over 8,500,000 cells in total) and I need to create some subsets of this dataframe based on the values in a specific column.
I know I can create these subsets by hand, and I am happy to do so when there are only a few values. At present, I list the unique values (with their counts) using:
table(df$ColumnX)
and then, since there are only a few values, I build each data frame individually:
df.subset1 <- df[df$ColumnX == "Subset1", ]
df.subset2 <- df[df$ColumnX == "Subset2", ]
...
df.subsetX <- df[df$ColumnX == "SubsetX", ]
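The hand-written lines above all follow one pattern, so a minimal base R sketch of automating them is to apply that same expression to every distinct value and collect the results in a named list instead of separate variables. The data frame here is a small hypothetical stand-in for the real one.

```r
# Hypothetical toy data with the same column name as in the question
df <- data.frame(
  ColumnX = c("Subset1", "Subset2", "Subset1", "Subset3"),
  value   = 1:4,
  stringsAsFactors = FALSE
)

# Run the by-hand expression df[df$ColumnX == v, ] once for each distinct value
values  <- unique(df$ColumnX)
subsets <- lapply(values, function(v) df[df$ColumnX == v, ])
names(subsets) <- values

subsets[["Subset1"]]  # rows 1 and 3 of df
```

Because this compares the whole column against each value in turn, its run time grows with the number of distinct values, which may matter for a data frame with millions of cells.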
The problem comes when there are many more unique values: writing out every subset by hand is no longer practical, and I would like the computer to do that work for me in a timely manner.
What I want to know is whether this process can be automated.
Something like this is what I am hoping to achieve:
- List the unique values in ColumnX
- Create a new data frame (subset) for each value in ColumnX
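Base R's `split()` covers both steps in the list above in one call: it groups the rows of a data frame by the values of a column and returns a named list with one data frame per value. A sketch on hypothetical toy data:

```r
# Hypothetical toy data with the same column name as in the question
df <- data.frame(
  ColumnX = c("Subset1", "Subset2", "Subset1", "Subset3"),
  value   = 1:4,
  stringsAsFactors = FALSE
)

# Group the row indices by ColumnX once, then return one data frame per group
subsets <- split(df, df$ColumnX)

names(subsets)   # "Subset1" "Subset2" "Subset3"
subsets$Subset2  # the single row whose ColumnX is "Subset2"
```

`split()` converts a character column to a factor, so the list is ordered by sorted value, and rows whose `ColumnX` is `NA` are dropped.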
Or:
for(all unique values in Column X)
create a new dataframe
end for
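If separate variables really are wanted, the pseudocode translates almost directly into a base R `for` loop that calls `assign()` once per value. This sketch assumes that names built as `df.` followed by the value (for example `df.Subset1`) are acceptable; the toy data is hypothetical.

```r
# Hypothetical toy data with the same column name as in the question
df <- data.frame(
  ColumnX = c("Subset1", "Subset2", "Subset1", "Subset3"),
  value   = 1:4,
  stringsAsFactors = FALSE
)

# One iteration per distinct value; assign() creates a variable named
# "df.<value>" in the current (here: global) environment
for (v in unique(df$ColumnX)) {
  assign(paste0("df.", v), df[df$ColumnX == v, ])
}

ls(pattern = "^df\\.")  # "df.Subset1" "df.Subset2" "df.Subset3"
```

Values containing spaces or other special characters produce names that must be quoted with backticks, and a single named list is usually easier to loop over later than many loose variables.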
Therefore, I would have something like this, based on the values of ColumnX:
df.subset1
df.subset2
...
df.subsetX
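Alternatively, the named list returned by base R's `split()` can be unpacked into exactly this kind of set of objects with `list2env()`. A minimal sketch, again on hypothetical toy data, with the `df.` prefix added as an assumption to mirror the naming above:

```r
# Hypothetical toy data with the same column name as in the question
df <- data.frame(
  ColumnX = c("Subset1", "Subset2", "Subset1", "Subset3"),
  value   = 1:4,
  stringsAsFactors = FALSE
)

# One data frame per distinct value of ColumnX, in a named list
subsets <- split(df, df$ColumnX)

# Prefix the names so the objects are called df.Subset1, df.Subset2, ...
names(subsets) <- paste0("df.", names(subsets))

# Copy every list element into the global environment as its own variable;
# invisible() hides the environment that list2env() returns
invisible(list2env(subsets, envir = globalenv()))
```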