I am hoping someone can help with a problem I am having while creating subsets of my data:
I have a data set named 'LakeK_all'. One of its columns, 'Lake', contains lake names (S001-Out, S002-Out, Y001-Out, Y002-Out, ...). I would like to pull out the subset of rows whose lake name starts with 'S'. This works fine if my data are sorted alphabetically, so that all the sites starting with 'S' come first and those starting with 'Y' come last. If the lakes are mixed up, it does not work. I could sort my data first, but if possible I would like to solve the problem directly and keep the steps simple.
Here is my code:
seki_vector = LakeK_all[grep("^[S].*", LakeK_all$Lake, value=TRUE)]
seki_vector
LakeK = subset(LakeK_all, subset=(LakeK_all$Lake==seki_vector))
LakeK
Here is the output I am getting:
> seki_vector = LakeK_all[grep("^S", LakeK_all$Lake, value=TRUE)]
Error in `[.data.frame`(LakeK_all, grep("^S", LakeK_all$Lake, value = TRUE)) :
undefined columns selected
> seki_vector
[1] "S005-Out" "S003-Out" "S004-Out" "S001-Out" "S040-Out" "S043-Out" "S044-Out" "S048-Out" "S049-Out" "S041-Out" "S047-Out" "S042-Out" "S046-Out" "S039-Out"
> LakeK = subset(LakeK_all, subset=(LakeK_all$Lake==seki_vector))
Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(LakeK_all$Lake, seki_vector) :
longer object length is not a multiple of shorter object length
> LakeK
[1] Y Year WYear Lake Panel Lat Long Cen LowerDL UpperDL InclProb PanelProb AdjInclProb
<0 rows> (or 0-length row.names)
It seems the vector is working, but not the subset step. Again, if I sort the data first, it works just fine.
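A small sketch of what may be happening in the subset step, using a made-up data frame (the name `toy` and its values are hypothetical, not my real data). As far as I understand it, `==` compares two vectors position by position and recycles the shorter one, so it only matches when the rows happen to line up with the vector, whereas `%in%` tests membership regardless of order:

```r
# Hypothetical toy data standing in for LakeK_all (values are made up)
toy <- data.frame(
  Lake = c("Y001-Out", "S002-Out", "Y002-Out", "S001-Out"),
  Y    = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)

# Names starting with "S", in the order they appear in the column
s_names <- grep("^S", toy$Lake, value = TRUE)

# `==` recycles s_names along toy$Lake and compares position by position,
# so a row matches only if it happens to sit opposite its own name
by_equal <- toy$Lake == s_names

# `%in%` asks, for each row, whether its Lake appears anywhere in s_names
by_in <- toy$Lake %in% s_names

# Rows selected by membership, independent of row order
toy[by_in, ]
```

With the rows in this mixed order, `by_equal` picks up only one of the two 'S' rows, while `by_in` picks up both. That would also explain why sorting makes the original code appear to work.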
Reading through previous questions, it sounds like it is better to use [] instead of 'subset'. I tried this and it did not fix the issue.
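For reference, a minimal sketch of the [] approach as I understand it, again on a hypothetical data frame `toy` with made-up values. It assumes `grepl()` is used to build one TRUE/FALSE per row, and that the index goes before a comma so it selects rows. Without the comma, a data frame treats the index as a column selection, which may be where the 'undefined columns selected' error comes from:

```r
# Hypothetical toy data standing in for LakeK_all (values are made up)
toy <- data.frame(
  Lake = c("Y001-Out", "S002-Out", "Y002-Out", "S001-Out"),
  Y    = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)

# grepl() returns a logical vector with one entry per row;
# placing it before the comma selects rows and keeps all columns
s_rows <- toy[grepl("^S", toy$Lake), ]
s_rows
```

This skips the intermediate vector of names entirely, so the row order should not matter.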