I have a data frame (sampdata) that looks something like this:
A B C D
1 X 5 0.3
2 Y 10 0.9
3 Y 7 0.2
4 Y 5 0.4
5 X 10 0.7
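For reproducibility, here is a minimal sketch that rebuilds the sample above. The column types (integer A, character B, numeric C and D) are assumptions, since only the printed values are shown.

```r
# Rebuild the sample data shown above.
# Column types are assumed; the real data may store B or C as factors.
sampdata <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

print(sampdata)
```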
Basically, I want to create two new data frames based on both columns B and C. In earlier posts I have seen how to subset the data using 'split' on a single factor, which I did:
test <- split(sampdata, sampdata$B)
str(test)
So far so good. But, when I tried to add in a second split:
testBC <- split(test, test$C)
I received an error message:
Error in split.default(test, test$Product) : group length is 0 but data length > 0
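One possible reading of this error, sketched under the assumption that the sample data has the column types used below: the result of the first split is a list of data frames named after the levels of B, not a data frame, so it has no element called C and `test$C` evaluates to NULL. A NULL grouping vector has length 0, which would be consistent with the "group length is 0" message.

```r
# Assumed reconstruction of the sample data (column types are a guess).
sampdata <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

test <- split(sampdata, sampdata$B)

# The result is a plain list with one data frame per level of B.
class(test)
names(test)

# The list has no element named "C", so this lookup returns NULL.
is.null(test$C)

# The column C still exists inside each piece of the list.
test[["X"]]$C
```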
I also tried:
testBC <- split(test$B, test$C)
but got another error message. So I then tried a second method, based on ddply from the plyr package:
test2 <- ddply(sampdata, c("B", "C"))
This did organize the data by row such that:
A B C D
1 X 5 0.3
5 X 10 0.7
2 Y 10 0.9
3 Y 7 0.2
4 Y 5 0.4
However, other threads only show how to access a specific data frame based on one column (test2$B), not both. I would prefer to simply generate a new data frame for each subset of B and C, such that:
newdf1
A B C D
1 X 5 0.3
5 X 10 0.7
newdf2
A B C D
2 Y 10 0.9
3 Y 7 0.2
4 Y 5 0.4
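To make the target concrete, here is a hedged sketch of two ways this might be approached with base R's `split`. The names `by_bc` and `by_b` are made up for illustration, and the column types are assumed. Passing a list of columns as the grouping argument gives one data frame per combination of B and C, while splitting on B alone gives the two data frames laid out above.

```r
# Assumed reconstruction of the sample data (column types are a guess).
sampdata <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

# Option 1: one data frame per (B, C) combination.
# drop = TRUE discards combinations that never occur, such as X with 7.
by_bc <- split(sampdata, list(sampdata$B, sampdata$C), drop = TRUE)
names(by_bc)     # element names combine the two values, e.g. "X.5"
by_bc[["X.5"]]   # rows where B is "X" and C is 5

# Option 2: one data frame per level of B, matching newdf1 and newdf2 above.
by_b <- split(sampdata, sampdata$B)
newdf1 <- by_b[["X"]]
newdf2 <- by_b[["Y"]]
```

Each element of the resulting list is an ordinary data frame, so it can be pulled out by name with `[[` and used on its own.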
After trying a couple of methods, what is likely a straightforward task is proving surprisingly difficult (for me at least).
Any help is most appreciated.