I have a large data frame that I would like to split into smaller subset data frames using a for loop. I want the new data frames to be based on the values in a column of the large/parent data frame. Here is an example:

x<- 1:20
y <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","C","C","C")
df <- as.data.frame(cbind(x,y))

OK, now I want three data frames: one with columns x and y but only the rows where y == "A", a second where y == "B", and so on. The end result will be three new data frames: df.A, df.B, and df.C. I realize that this would be easy to do without a for loop, but my actual data has many levels of y, so using a for loop (or similar) would be nice.

Thanks!

wraymond
  • Can you give an example of your data so we can see all the levels? In most cases this kind of subsetting can be done outside of a loop. – Badger Oct 16 '15 at 23:00
  • I would start by creating the data frame properly. `df <- data.frame(x, y)`. The way you've done it has made the first column into factors. – Rich Scriven Oct 16 '15 at 23:02
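
To illustrate the point in the comment above, here is a small sketch comparing the two ways of building the data frame. The names `df_cbind` and `df_direct` are made up for this illustration, and `rep()` is only a shorter way to write the same `y` vector as in the question.

```r
x <- 1:20
y <- rep(c("A", "B", "C"), times = c(8, 9, 3))

# cbind() builds a character matrix first, so x stops being numeric
df_cbind <- as.data.frame(cbind(x, y))

# data.frame() keeps each column's own type
df_direct <- data.frame(x, y)

is.numeric(df_cbind$x)   # FALSE
is.numeric(df_direct$x)  # TRUE
```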

2 Answers

If you want to create separate objects in a loop, you can use assign. I used unique because you said you had many levels.

 for (i in unique(df$y)) {
   nam <- paste("df", i, sep = ".")   # build the object name, e.g. "df.A"
   assign(nam, df[df$y == i, ])       # create that object with the matching rows
 }

> df.A
  x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 8 A
> df.B
    x y
9   9 B
10 10 B
11 11 B
12 12 B
13 13 B
14 14 B
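
If separate top-level objects are not strictly required, the same loop could collect the subsets in a named list instead of calling `assign`. This is only a sketch of that variation, not part of the approach above; the list name `subsets` is an assumption made for illustration, and `stringsAsFactors = FALSE` is set so the loop behaves the same across R versions.

```r
x <- 1:20
y <- rep(c("A", "B", "C"), times = c(8, 9, 3))
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Collect one subset per level of y in a named list
subsets <- list()
for (i in unique(df$y)) {
  subsets[[paste("df", i, sep = ".")]] <- df[df$y == i, ]
}

names(subsets)           # "df.A" "df.B" "df.C"
nrow(subsets[["df.B"]])  # 9
```

Each subset is then available as `subsets[["df.A"]]`, `subsets[["df.B"]]`, and so on, without adding one variable per level to the workspace.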
Pierre Lapointe

I think you just need the split function:

 split(df, df$y)
$A
  x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 8 A

$B
    x y
9   9 B
10 10 B
11 11 B
12 12 B
13 13 B
14 14 B
15 15 B
16 16 B
17 17 B

$C
    x y
18 18 C
19 19 C
20 20 C

After that, it is just a matter of subsetting the list that `split` returns and storing each element in its own object, for example `dfA <- split(df, df$y)[[1]]`, `dfB <- split(df, df$y)[[2]]`, and so on.
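
When there are many levels, writing one assignment per level gets tedious. One possible way around that, sketched here under the assumption that names like `df.A` are wanted, is to call `split` once, rename the list elements, and copy them into an environment with `list2env`. The names `pieces` and `env` are made up for this illustration.

```r
x <- 1:20
y <- rep(c("A", "B", "C"), times = c(8, 9, 3))
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Split once and keep the whole result as a named list
pieces <- split(df, df$y)
names(pieces) <- paste("df", names(pieces), sep = ".")

# Copy every list element into an environment as its own object.
# A fresh environment is used here; passing envir = .GlobalEnv
# instead would create df.A, df.B, ... in the workspace.
env <- list2env(pieces, envir = new.env())

ls(env)         # "df.A" "df.B" "df.C"
nrow(env$df.C)  # 3
```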

SabDeM
  • The split function might be the way to go, but I am trying to avoid having to create all those dfA, dfB... by hand, because my real data has many levels. Something that starts like `for (i in unique(df$y)){` – wraymond Oct 16 '15 at 23:11