
OK, so I cannot figure this out for the life of me. I want to filter my data based on a partial string match. Here is my data; I am showing only the column I want to filter, but there are more rows in the overall set. I only want to show the rows that begin with "CAO". This is easily achievable in the viewer.

[Image: dataviewer]

Basically I want the R "code" that would reproduce this exact result. I have tried using grepl like so

filter(longdata, grepl("^CAO",longdata[,1]))

I have tried using subset

subset(longdata,longdata[,1]=="^CAO")

I have tried subset with grepl, and no matter what I do I can't figure it out. I am new to R, so please try to explain it thoroughly.

curious
  • If you read ?subset or any introduction to dplyr, you'll see that you can/should use column names instead of numbers there... Btw, yes, you want grepl, not `==`. Without a reproducible example, I don't know that anyone can help beyond that. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 for guidance. – Frank Aug 25 '17 at 19:46
  • Have you tried `subset(longdata, grepl("^CAO",longdata[,1]))`? – Rui Barradas Aug 25 '17 at 19:48
  • I was actually wondering about that too. I used the tidyverse function "read_csv", so technically it's a tibble. The column name has a space in it, so would I reference it like this: `filter(longdata, grepl("^CAO",Issue ID))` or `filter(longdata, grepl("^CAO","Issue ID"))`? – Travers Woodward Aug 25 '17 at 19:48
  • Rui Barradas, when I try that code a "+" appears. Is there something I am supposed to do after that? – Travers Woodward Aug 25 '17 at 19:54
  • No, that happens when you don't close a parenthesis, but everything seems right. – Rui Barradas Aug 25 '17 at 20:14

1 Answer


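The second argument of grepl wasn't recognized in your first attempt: on a tibble, longdata[,1] returns a one-column tibble rather than the character vector that grepl expects.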
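A minimal sketch of the difference, assuming `longdata` is a tibble (as it would be after `read_csv`). The sample values below are made up for illustration and are not the asker's data:

```r
library(tibble)

# Hypothetical stand-in for the real `longdata`
longdata <- tibble(`Issue ID` = c("CAO-2017-20", "AO-2017-20", "CA-2017-20"))

# Single-bracket indexing on a tibble keeps the tibble structure
class(longdata[, 1])    # a one-column tibble, not a character vector

# Double brackets extract the column itself
class(longdata[[1]])    # "character"

# grepl() can then test each element of that vector
starts_with_cao <- grepl("^CAO", longdata[[1]])
starts_with_cao         # TRUE FALSE FALSE
```

Referring to the column by name, as in the code that follows, avoids the positional indexing altogether.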

library(tidyverse) # gives access to dplyr and to tibble's data_frame(), which preserves spaces in column names
longdata <- data_frame(`Issue ID`=c("CAO-2017-20", "CAO-2017-20", "CAO-2017-20", "AO-2017-20", "CA-2017-20"))
longdata %>% filter(grepl("^CAO", `Issue ID`)) # "^" anchors the match to the start; plain "CAO" returns the same rows for this sample

%>% is the pipe operator: it passes the result of the expression on its left as the first argument to the function on its right. Here it's made available by dplyr.
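As a small illustration of what the pipe does (a sketch only; the vector `x` is made up, and base R's `startsWith()` is used here just because it takes the data as its first argument):

```r
library(dplyr)

# Hypothetical example vector
x <- c("CAO-1", "AO-2")

# The pipe supplies `x` as the first argument of startsWith(),
# so these two calls are equivalent
piped  <- x %>% startsWith("CAO")
direct <- startsWith(x, "CAO")

identical(piped, direct)  # TRUE
```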

Basically, what I did was load the tidyverse set of packages (read more on tidyverse here). The ones of interest are tibble and dplyr. Then I created a sample data frame with tibble's data_frame() function (deprecated in current tibble versions in favour of tibble()). Then I applied an adjusted version of the call you suggested, namely

filter(longdata, grepl("^CAO",`Issue ID`))

which is the same in its piped form:

longdata %>% filter(grepl("^CAO", `Issue ID`))
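For completeness, the subset() attempt from the question can be made to work in the same way. This is a sketch under the same assumptions as above (a tibble with a column named `Issue ID`; the sample values are illustrative):

```r
library(tibble)

# Hypothetical sample data
longdata <- tibble(`Issue ID` = c("CAO-2017-20", "CAO-2017-20", "AO-2017-20", "CA-2017-20"))

# subset() evaluates the condition inside the data, so the backticked name works
res_subset <- subset(longdata, grepl("^CAO", `Issue ID`))

# Plain logical indexing; [[ ]] takes the column name as a quoted string
res_index <- longdata[grepl("^CAO", longdata[["Issue ID"]]), ]

nrow(res_subset)  # 2
nrow(res_index)   # 2
```

Backticks mark a name that is not syntactically valid on its own (such as one containing a space), whereas `"Issue ID"` in double quotes is just a string, which is why `grepl("^CAO", "Issue ID")` would test the literal text rather than the column.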
Patrik_P
  • Hey, thank you so much, this works perfectly. Would you mind explaining what '%>%' means? Also, is the single quotation mark how you reference a column name with a space in it? – Travers Woodward Aug 25 '17 at 19:58
  • Glad to help. Check the update. `dplyr` is extremely useful for data manipulation in R. – Patrik_P Aug 25 '17 at 20:04