
I have a massive CSV file I'm working with. What I want is to split the rows into separate data frames based on the unique values in a certain column. For a small example, if I had:

1   a   
2   b
3   c
4   a
5   b

I would want my code to sort this into x data frames, where x is the number of unique values in column two (in this example, 3).

What I did was find every unique value in my desired column and subset on each one manually, like this:

    DF1 <- Data[Data$Column2 == "a",]
    DF2 <- Data[Data$Column2 == "b",]
    DF3 <- Data[Data$Column2 == "c",]

But that was super tedious, and since I have way more than just 3 unique values, it took waaaaay too long. There has got to be an easier way to do it. Maybe using loops? Any help is greatly appreciated!


1 Answer


We can use `split` to split the data.frame into separate data.frames in a list, and then use `lapply` to loop over the list and write each one out as a CSV:

    # split into a named list: one data.frame per unique value of Column2
    lst1 <- split(Data, Data$Column2)
    # loop over the names and write each piece to its own CSV,
    # named after the corresponding value in Column2
    lapply(names(lst1), function(nm) write.csv(lst1[[nm]], paste0(nm, ".csv"),
        quote = FALSE, row.names = FALSE))

By doing the split, we avoid the burden of naming many separate objects in the global environment, reduce clutter, and save the time spent finding those objects and writing them out one by one with `write.csv`.
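As a quick usage sketch (hypothetical object names, assuming the small example above where `Column2` holds `"a"`, `"b"`, and `"c"`), each piece of the list stays accessible by name, so nothing is lost by not creating DF1, DF2, DF3:

    lst1 <- split(Data, Data$Column2)
    names(lst1)     # "a" "b" "c"
    lst1[["a"]]     # the rows where Column2 == "a"
    lst1$b          # same idea, using $

If separate objects in the global environment really are wanted, `list2env(setNames(lst1, paste0("DF", seq_along(lst1))), envir = .GlobalEnv)` creates them all in one call, though keeping the list is usually the cleaner design.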

  • Wow, that works perfectly. Will I be able to index the list? In my Excel file, the data I am actually interested in is in columns 3 and on, but I just needed to separate based on column 2. – Beebs Sep 25 '19 at 21:31
  • @Beebs Yes, you can index it as well: `lapply(seq_along(lst1), function(i) lst1[[i]])` (see the sketch below) – akrun Sep 25 '19 at 21:36
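Following up on that comment thread, a minimal sketch of the indexing idea (the column positions are an assumption based on Beebs's description, since the real file isn't shown): keep only columns 3 onward in each piece of the list before working with it.

    # drop the first two columns from every data.frame in the list;
    # drop = FALSE keeps a single remaining column as a data.frame
    lst2 <- lapply(lst1, function(d) d[, 3:ncol(d), drop = FALSE])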