I am trying to understand a quirk with the subset() function in R and the use of the $ operator. I'll use the built-in CO2 dataset as an example:
I can run
sub <- subset(CO2, CO2$Type=="Quebec")
without error to arrive at the same dataset as if I were to run
sub <- subset(CO2, Type=="Quebec")
However, this is not always the case. Sometimes using $ inside the subset() call produces the following error:
$ operator is invalid for atomic vectors
What is triggering the '$ operator is invalid for atomic vectors' error? Why is the $ allowed in some instances (like the CO2 example above) but not in others? (This is particularly frustrating with my own data brought in through read.csv(): sometimes I get the error when subsetting with $ and sometimes I do not, with no discernible pattern.)
Thanks!
Per comments below, I'm attempting to post reproducible examples.
Here is the situation that triggers the error:
Moose<-structure(list(Moose = 1:25, Tagging_Loc = structure(c(1L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Gender = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L), .Label = c("F", "M"), class = "factor"), Age = c(20L,
23L, 14L, 15L, 10L, 9L, 5L, 10L, 19L, 22L, 21L, 21L, 7L,
16L, 19L, 9L, 23L, 5L, 9L, 10L, 16L, 8L, 13L, 14L, 6L), Weight = c(1366L,
1006L, 888L, 1359L, 899L, 635L, 400L, 1000L, 1012L, 1480L,
1001L, 1100L, 482L, 1414L, 971L, 725L, 1400L, 416L, 790L,
970L, 921L, 560L, 1103L, 904L, 669L), Distance = c(250.5,
410.239, 457.6402591, 245.8523, 430.9975, 308.8673107, 212.5212497,
414.2093545, 439.6581, 215.6491489, 464.2384, 425.4256828,
233.5635555, 207.98, 453.7098751, 390.0506365, 235.5212497,
207.368, 427.5084899, 443.0452824, 459.8999274, 274.6856592,
350.5661674, 456.9600032, 330.146)), .Names = c("Moose",
"Tagging_Loc", "Gender", "Age", "Weight", "Distance"), class = "data.frame", row.names = c(NA,
-25L))
sub_Moose<-subset(Moose, Moose$Tagging_Loc=="A")
sub_Moose<-subset(Moose, Tagging_Loc=="A")
But if I only change the name of the dataset, both versions of subset() run fine, with no error:
mOose<-structure(list(Moose = 1:25, Tagging_Loc = structure(c(1L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Gender = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L), .Label = c("F", "M"), class = "factor"), Age = c(20L,
23L, 14L, 15L, 10L, 9L, 5L, 10L, 19L, 22L, 21L, 21L, 7L,
16L, 19L, 9L, 23L, 5L, 9L, 10L, 16L, 8L, 13L, 14L, 6L), Weight = c(1366L,
1006L, 888L, 1359L, 899L, 635L, 400L, 1000L, 1012L, 1480L,
1001L, 1100L, 482L, 1414L, 971L, 725L, 1400L, 416L, 790L,
970L, 921L, 560L, 1103L, 904L, 669L), Distance = c(250.5,
410.239, 457.6402591, 245.8523, 430.9975, 308.8673107, 212.5212497,
414.2093545, 439.6581, 215.6491489, 464.2384, 425.4256828,
233.5635555, 207.98, 453.7098751, 390.0506365, 235.5212497,
207.368, 427.5084899, 443.0452824, 459.8999274, 274.6856592,
350.5661674, 456.9600032, 330.146)), .Names = c("Moose",
"Tagging_Loc", "Gender", "Age", "Weight", "Distance"), class = "data.frame", row.names = c(NA,
-25L))
sub_Moose<-subset(mOose, mOose$Tagging_Loc=="A")
sub_Moose<-subset(mOose, Tagging_Loc=="A")
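A much smaller sketch that seems to reproduce the same behaviour, assuming the trigger is that the data frame shares its name with one of its columns (as Moose does above). The names df1, df2, and grp are made up for illustration. As far as I can tell, subset() evaluates the condition with the data frame's columns in scope first, so the name resolves to the integer column rather than to the data frame:

```r
# Hypothetical toy data: the data frame and its first column are both called "df1".
df1 <- data.frame(df1 = 1:4, grp = c("A", "B", "A", "B"))

# Inside subset(), `df1` is looked up among the columns first, so `df1$grp`
# applies `$` to an integer vector. Capture the error message instead of stopping.
msg <- tryCatch(
  {
    subset(df1, df1$grp == "A")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The bare column name does not involve `$`, so the name clash does not matter.
ok <- subset(df1, grp == "A")
print(nrow(ok))

# Same data under a name that matches no column: the `$` form works again.
df2 <- df1
same <- subset(df2, df2$grp == "A")
print(identical(ok, same))
```

If this is the cause, it would also explain why only renaming the dataset (Moose to mOose) makes the error go away, and why the CO2 example works: CO2 has no column named CO2.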