Split data frame by two factors

Question

I have a data frame (sampdata) that looks something like this:

A B  C   D
1 X  5 0.3
2 Y 10 0.9
3 Y  7 0.2
4 Y  5 0.4
5 X 10 0.7

Basically, I want to create new data frames based on both columns B and C. In earlier posts I have seen how to subset the data using 'split' based on one factor, which I did:

test <- split(sampdata, sampdata$B)
str(test)

So far so good. But, when I tried to add in a second split:

testBC <- split(test, test$C)

I received an error message:

Error in split.default(test, test$C) : group length is 0 but data length > 0

I also tried:

testBC <- split(test$B, test$C)

but got another error message. So then I tried a second method, based on ddply from the plyr package:

test2 <- ddply(sampdata, c("B", "C"))

This did organize the data by row such that:

A B  C   D
1 X  5 0.3
5 X 10 0.7 
2 Y 10 0.9
3 Y  7 0.2
4 Y  5 0.4

However, other threads only show how to access a specific data frame based on one column (test2$B), not both. I would prefer to simply generate new data frames based on a subset of B and C, such that:

newdf1
A B  C   D
1 X  5 0.3
5 X 10 0.7

newdf2
A B  C   D
2 Y 10 0.9
3 Y  7 0.2
4 Y  5 0.4

After trying a couple of methods, what is likely a straightforward task has turned out to be surprisingly difficult (for me at least).

Any help is most appreciated.

score 12 · Answer 1 · answered Oct 07 '17 at 04:56

If we need to split by multiple columns, place them in a list (here `df1` is the question's `sampdata`):

split(df1, list(df1$B, df1$C), drop = TRUE)
#$X.5
#  A B C   D
#1 1 X 5 0.3

#$Y.5
#  A B C   D
#4 4 Y 5 0.4

#$Y.7
#  A B C   D
#3 3 Y 7 0.2

#$X.10
#  A B  C   D
#5 5 X 10 0.7

#$Y.10
#  A B  C   D
#2 2 Y 10 0.9
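
For anyone who wants to run this end to end, here is a minimal sketch. It assumes `df1` holds the same values as the question's `sampdata`, retyped by hand below; the names `pieces` and `by_B` are only illustrative. It also shows why the second `split()` in the question failed: the result of the first split is a list, so looking up a column on it returns `NULL`.

```r
# Rebuild the example data from the question (values typed in by hand).
df1 <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

# Split on every B/C combination that actually occurs.
pieces <- split(df1, list(df1$B, df1$C), drop = TRUE)

names(pieces)      # element names have the form "<B>.<C>", e.g. "X.10"
pieces[["X.10"]]   # pull out one combination by name

# split() returns a plain list, so a column lookup on it finds nothing.
by_B <- split(df1, df1$B)
is.null(by_B$C)    # TRUE, which is why split(test, test$C) errors
```

Each element of `pieces` is an ordinary data frame, so it can be assigned to its own name if separate objects are really needed, for example `newdf <- pieces[["X.10"]]`.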

akrun

`drop = T` is to remove empty groups... – jgarces May 25 '23 at 11:35
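
To make that comment concrete, here is a small sketch comparing the two settings. The data frame is retyped from the question, and `with_empty` and `without_empty` are illustrative names. In this data `X` never occurs together with `C == 7`, so that combination is the one empty group.

```r
# Example data retyped from the question.
df1 <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

# Default (drop = FALSE): every B x C combination gets an element,
# including combinations that have no rows.
with_empty <- split(df1, list(df1$B, df1$C))
length(with_empty)          # 6 = 2 values of B x 3 values of C
nrow(with_empty[["X.7"]])   # 0: no X rows have C == 7

# drop = TRUE keeps only the combinations that occur in the data.
without_empty <- split(df1, list(df1$B, df1$C), drop = TRUE)
length(without_empty)       # 5
```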

score 0 · Answer 2 · edited Apr 29 '20 at 23:52

I tried the other suggestion, but I couldn't get it to work with my 'real' data.

Here is what I did

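# Keep only the rows where B is "X" and C is 10.
# (Assigning test10$C <- 10 would overwrite every row instead of filtering.)
test.10.X <- sampdata[sampdata$B == "X" & sampdata$C == 10, ]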

This gave me a single data frame that only had values associated with X and 10 based on cols B and C. Then I will have to repeat for each combination of X, Y and 10, 5, 7 for cols B and C.

I am not good at writing for loops, but maybe I could write some sort of loop so I am not copying and pasting the same code and just changing the values?

Anyhow, this worked for my purposes.
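
On the loop question, one possible sketch follows, assuming `sampdata` has the values shown in the question (retyped below). The names `combos`, `subsets`, `b`, `cc` and `keep` are made up for illustration. It loops over each B/C pair that occurs and filters the rows with a logical index. In practice, `split(sampdata, list(sampdata$B, sampdata$C), drop = TRUE)` should give the same pieces without a loop.

```r
# Example data retyped from the question.
sampdata <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

# One row per B/C pair that actually appears in the data.
combos <- unique(sampdata[, c("B", "C")])

subsets <- list()
for (i in seq_len(nrow(combos))) {
  b  <- combos$B[i]
  cc <- combos$C[i]
  # Filter rows rather than assigning, so the original values stay untouched.
  keep <- sampdata$B == b & sampdata$C == cc
  subsets[[paste(b, cc, sep = ".")]] <- sampdata[keep, ]
}

subsets[["X.10"]]   # rows where B is "X" and C is 10
```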
