
I am hoping someone can help with the following problem I am having while creating subsets of my data:

I have a data set named 'LakeK_all'. One of its columns, 'Lake', contains lake names (S001-Out, S002-Out, Y001-Out, Y002-Out, ...). I would like to pull out the subset of rows whose lake name starts with 'S'. This works fine if my data are sorted alphabetically, so that all the sites starting with 'S' come first and those starting with 'Y' come last. If the lakes are mixed up, it does not work. I could sort my data first, but if possible I would like to solve the problem directly and keep the steps simple.

Here is my code:

seki_vector = LakeK_all[grep("^[S].*", LakeK_all$Lake, value=TRUE)]
seki_vector

LakeK = subset(LakeK_all, subset=(LakeK_all$Lake==seki_vector))
LakeK

Here is the output I am getting:

> seki_vector = LakeK_all[grep("^S", LakeK_all$Lake, value=TRUE)]

Error in `[.data.frame`(LakeK_all, grep("^S", LakeK_all$Lake, value = TRUE)) : 
  undefined columns selected

> seki_vector
 [1] "S005-Out" "S003-Out" "S004-Out" "S001-Out" "S040-Out" "S043-Out" "S044-Out" "S048-Out" "S049-Out" "S041-Out" "S047-Out" "S042-Out" "S046-Out" "S039-Out"

LakeK = subset(LakeK_all, subset=(LakeK_all$Lake==seki_vector))

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(LakeK_all$Lake, seki_vector) :
  longer object length is not a multiple of shorter object length
> LakeK
 [1] Y           Year        WYear       Lake        Panel       Lat         Long        Cen         LowerDL     UpperDL     InclProb    PanelProb   AdjInclProb
<0 rows> (or 0-length row.names)

It seems the vector is working, but not the subset step. Again, if I sort the data first, it works just fine.

From reading previous questions, it sounds like it is better to use [] instead of 'subset'. I tried this and it did not fix the issue.

MrFlick
andi12
  • It's very unclear exactly what class each of the variables involved here is. When posting, you really should include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). I'm guessing you want `seki_vector = grep("^S", LakeK_all$Lake, value=TRUE)` and `LakeK = subset(LakeK_all, subset=Lake %in% seki_vector)` but you've provided no way to test if that will work. – MrFlick Feb 03 '15 at 23:44
  • Thank you for the link to the reproducible example. The answer below solved my issue, but I will certainly refer to this next time I post. – andi12 Feb 04 '15 at 01:10

1 Answer


I think I spot a couple of problems. In grep you don't want to set value to TRUE: that makes it return the matched strings instead of the row indices. You are also missing a comma inside the brackets, so R treats the result as a column selection (hence the "undefined columns selected" error).

Try this: LakeK_all[grep("^S", LakeK_all$Lake), ]
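For anyone who wants to try this without the original data, here is a minimal sketch on a made-up data frame. The rows and the `Y` values are invented for illustration and the lakes are deliberately unsorted; only the names `LakeK_all`, `Lake` and `seki_vector` come from the question. It also shows why `==` only appeared to work on sorted data: `==` compares element by element and recycles the shorter vector, so the positions line up only when the 'S' lakes happen to come first in the same order.

```r
# Hypothetical toy data standing in for LakeK_all (not the asker's real data)
LakeK_all <- data.frame(
  Lake = c("Y002-Out", "S005-Out", "Y001-Out", "S003-Out", "S001-Out"),
  Y    = c(1.2, 0.8, 2.1, 0.5, 1.7),
  stringsAsFactors = FALSE
)

# Option 1: grep() without value = TRUE returns row indices;
# the trailing comma selects rows rather than columns
by_index <- LakeK_all[grep("^S", LakeK_all$Lake), ]

# Option 2: grepl() returns a logical vector with one element per row
by_logical <- LakeK_all[grepl("^S", LakeK_all$Lake), ]

# Option 3: keep the vector of names and test membership with %in%
seki_vector <- grep("^S", LakeK_all$Lake, value = TRUE)
by_in <- subset(LakeK_all, Lake %in% seki_vector)

# What went wrong in the question: == is positional and recycles
# seki_vector, so on unsorted data the positions do not line up
n_eq_matches <- suppressWarnings(sum(LakeK_all$Lake == seki_vector))

print(by_index)
```

All three options should select the same rows regardless of row order, so no sorting step is needed. `grepl()` is arguably the safest of the three: if nothing matches, a logical index returns zero rows, and it also behaves sensibly if you ever negate the condition.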

Jamie