I have one search that uses `which` and another that uses `grepl`, as follows:

dates <- myframe[grepl(abreviation,myframe$geo),"date"]
dates <- c(dates, myframe[which(myframe$geo == fullname),"date"])

abreviation and fullname are two different strings.

I tried combining the two conditions with |, which returned 0 entries. I also tried endsWith, but that gave a warning that only the top result was going to be used, and the resulting list had only one entry.

The issue I'm having is that this does not return the dates as strings, which is how the date column is stored, but as a vector of integers.

What do I need to do differently to get a vector of these dates?

Edit: Here is a sample dataset-

pastebin.com/yXq6khNV

jfa
  • It would be helpful if you provided a sample of your data.frame myframe. [This SO post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) has advice about well structured questions. – G5W Nov 30 '16 at 22:05
  • @kaliczp That was an error, `myframe2` == `myframe` – jfa Nov 30 '16 at 22:31
  • It looks like your `which` call is probably redundant. You should be able to subset appropriately with just the `==` call. That would also allow you to use `|` more easily. – rosscova Nov 30 '16 at 22:32
  • @rosscova Unfortunately "abreviation" is not a subset of "fullname" – jfa Nov 30 '16 at 22:34
  • It shouldn't need to be. `grepl(abreviation,myframe$geo) | myframe$geo == fullname` should work as a subset condition. – rosscova Nov 30 '16 at 22:37
  • @rosscova gave you the solution: `dates <- myframe[myframe$geo == fullname | grepl(abreviation,myframe$geo), "date"]` should work. – kaliczp Nov 30 '16 at 22:43
  • Here is a data sample http://pastebin.com/yXq6khNV – jfa Nov 30 '16 at 22:46

2 Answers

`which` returns an integer vector of positions, whereas `grepl` returns a logical one. To make the two conditions work together, drop the `which` call and combine the logical vectors with `|`. You also need to guard against the NAs in the geo column (I also changed your fullname to "New York, NY", since "New York, USA" didn't appear in your table):

dates <- myframe[ !is.na( myframe$geo ) & 
                      ( grepl(abbreviation,myframe$geo) | myframe$geo == fullname ), 
                  "date" ]

Which gives (the tibble format is because I used readr to read in your dataset):

> dates
# A tibble: 1 × 1
               date
              <chr>
1 12/30/10 02:37 PM
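
To see why dropping `which` and masking the NAs both matter, here is a minimal sketch on a made-up data frame. The `toy` rows and the values of `abbreviation` and `fullname` are invented for illustration and are not taken from the pastebin sample. It shows the different return types, and how an `NA` in `geo` leaks into the subset unless it is filtered out.

```r
# Hypothetical toy data, not the asker's dataset
toy <- data.frame(
  geo  = c("Albany, NY", "New York, USA", NA, "Austin, TX"),
  date = c("12/30/10 02:37 PM", "01/02/11 09:15 AM",
           "01/03/11 10:00 AM", "01/04/11 11:30 AM"),
  stringsAsFactors = FALSE
)
abbreviation <- ", NY"
fullname     <- "New York, USA"

by_pattern <- grepl(abbreviation, toy$geo)  # logical vector; FALSE where geo is NA
by_name    <- toy$geo == fullname           # logical vector; NA where geo is NA
positions  <- which(by_name)                # integer positions of the TRUE entries

# Without the NA guard, the NA in the condition yields an NA entry in the result
unguarded <- toy[by_pattern | by_name, "date"]

# With the guard, only the two matching rows remain
keep  <- !is.na(toy$geo) & (by_pattern | by_name)
dates <- toy[keep, "date"]
```

Because both conditions are logical vectors of the same length, `|` combines them row by row, which is not possible when one side is a vector of positions from `which`.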

If you're losing the format along the way for some reason, you can convert the result explicitly. The column isn't in Date format, so I'll just convert it to character here:

keep <- !is.na( myframe$geo ) & 
            ( grepl(abbreviation,myframe$geo) | myframe$geo == fullname )
dates <- as.character( myframe$date[ keep ] )
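
One way the format can get lost, raised in the comments, is that `date` was read in as a factor, which `read.table()` did by default before R 4.0. A factor stores integer codes internally, and those codes are what surface when the labels are dropped. A small sketch with made-up values (the vector `d` is hypothetical):

```r
# Hypothetical date strings stored as a factor
d <- factor(c("12/30/10 02:37 PM", "01/02/11 09:15 AM"))

typeof(d)        # "integer": a factor is integer codes plus a set of labels
as.integer(d)    # the underlying codes, exposed by an accidental conversion
as.character(d)  # the original strings, recovered from the labels
```

If `class(myframe$date)` reports `"factor"`, this is a likely explanation for the integers; whether it applies here depends on how the data was read in.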
rosscova
  • Unfortunately I've tried that, it seems to be returning the entirety of the data set, whereas when I try the two separately, I get a much smaller number. – jfa Nov 30 '16 at 22:52
  • Can you include the `abbreviation` and `fullname` objects? Preferably using `dput` instead of linking externally. – rosscova Nov 30 '16 at 22:58
  • They're generated based on the input, but the sample I've been using is `abbreviation = ", NY"` and `fullname = "New York, USA"`, so abbreviation is looking based on the city, and the second is the state. – jfa Nov 30 '16 at 23:03
  • How do you read in this dataset? R can silently convert character columns to factors, which may be why the result was integer. If your raw data is in the file `sample.csv`, read it without that conversion, e.g. `myframe <- read.table("sample.csv", sep=";", header = TRUE, stringsAsFactors = FALSE)` – kaliczp Nov 30 '16 at 23:08
  • I think your problem is that your `geo` column has some `NA` values, so `grepl` returns NAs there. – rosscova Nov 30 '16 at 23:08
  • @kaliczp I have a function that casts the date and groups by year, and it keeps failing because of the output of these two lines unfortunately. I think I'm taking too much of a C based approach, and that somehow I should be concatenating data frames a different way. – jfa Dec 01 '16 at 02:30

I solved this issue by passing the already filtered data frame to grepl:

dates <- myframe[grepl(abreviation,myframe[which(myframe$geo == fullname),]),"date"]
jfa