
I apologize if this question has already been answered. I have searched for this for far too long.

I have coded data whose values are a letter prefix followed by a numeric suffix, e.g.:

A01, A02, ..., A99 (and likewise for each letter A-Z)

I need R code that mirrors this SAS code:

Proc SQL;
Create table NEW as
Select *
From DATA
Where VAR contains 'D';
Quit;

EDIT

Sorry y'all, I'm new! (also, mediocre in R at best.) I thought posting the SAS/SQL code would help make it easier.

Anyway, the data is manufacturing data. One variable holds the codes described above (A01...A99, etc.).

(rough) example of the dataframe:

    OBS  PRODUCT   PRICE  PLANT
    1    phone      8.55  A87
    2    paper    105.97  X67
    3    cord       0.59  D24
    4    monitor   98.65  D99

The data set is massive, and I only want the observations that come from plant 'D', so I'm trying to subset the data to the rows where the 'PLANT' variable contains (or starts with) 'D'. I know how to filter on a complete value (e.g. ==, >=, !=). I just can't figure out how to do it when only part of the value is known, and I have yet to find anything about a 'contains' operator in R. I hope that clarifies things.

aboone
    Welcome to Stack Overflow! Your question could be improved with a few edits. First, please read [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) about how to share some sample data to make your question reproducible. Further, please explain in words, not SAS code, what it is you're trying to do. That will enable those of us who don't know SAS to answer more easily. I'm guessing you're trying to create a binary variable for whether a string contains the character "D", but you should make that explicit if that's the case. – josliber Jun 04 '15 at 21:16
    `NEW <- DATA[grep('D', DATA$VAR), ]`? Is var a column name or are your columns called a01, a02, etc? – rawr Jun 04 '15 at 21:16

2 Answers

Assuming DATA is your data.frame and VAR is the column holding the codes,

# Example data: VAR holds the codes A1, B1, ..., D3; VAL is random
DATA <- data.frame(
    VAR = apply(expand.grid(LETTERS[1:4], 1:3), 1, paste0, collapse = ""),
    VAL = runif(3 * 4)
)

then you can do

subset(DATA, grepl("D", VAR))
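
The edit to the question mentions that the plant code may only need to *start with* 'D'. Here is a minimal sketch of that case, assuming a small data frame rebuilt from the question's example rows (the name `plants` is made up for illustration). Anchoring the pattern with `^` restricts the match to the first character, and base R's `startsWith()` (available from R 3.3.0) should do the same without a regular expression.

```r
# Hypothetical data frame rebuilt from the example rows in the question
plants <- data.frame(
    OBS     = 1:4,
    PRODUCT = c("phone", "paper", "cord", "monitor"),
    PRICE   = c(8.55, 105.97, 0.59, 98.65),
    PLANT   = c("A87", "X67", "D24", "D99"),
    stringsAsFactors = FALSE
)

# "Contains D": matches a D anywhere in the code
contains_d <- subset(plants, grepl("D", PLANT))

# "Starts with D": ^ anchors the match to the first character
starts_d <- subset(plants, grepl("^D", PLANT))

# Same idea without a regular expression (needs R >= 3.3.0)
starts_d2 <- plants[startsWith(plants$PLANT, "D"), ]
```

With codes shaped like A01...Z99 the two patterns select the same rows, since the letter only ever appears in the first position. The anchored form is the safer choice if a 'D' could show up elsewhere in a code.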
MrFlick

A slight alternative to MrFlick's solution: use a vector of row-indices:

DATA[grep('D', DATA$VAR), ]

   VAR        VAL
4   D1 0.31001091
8   D2 0.71562382
12  D3 0.00981055

where we defined:

# Example data: VAR holds the codes A1, B1, ..., D3; VAL is random
DATA <- data.frame(
    VAR = apply(expand.grid(LETTERS[1:4], 1:3), 1, paste0, collapse = ""),
    VAL = runif(3 * 4)
)
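
To make the difference between the two answers concrete, here is a rough sketch comparing `grep()` (integer row positions) with `grepl()` (a logical vector). It reuses the example data above. The `set.seed()` call and the variable names `idx`, `keep`, `same_rows`, `none_int` and `none_lgl` are additions for illustration only. For keeping rows the two behave alike, but they differ when *excluding* rows for a pattern that matches nothing.

```r
set.seed(1)  # only so the random VAL column is repeatable
DATA <- data.frame(
    VAR = apply(expand.grid(LETTERS[1:4], 1:3), 1, paste0, collapse = ""),
    VAL = runif(3 * 4)
)

idx  <- grep("D", DATA$VAR)    # integer positions of the matching rows
keep <- grepl("D", DATA$VAR)   # logical vector, one entry per row

# Both index styles select the same rows
same_rows <- identical(DATA[idx, ], DATA[keep, ])

# Excluding rows for a pattern that matches nothing ("Z" here):
none_int <- DATA[-grep("Z", DATA$VAR), ]    # -integer(0) selects zero rows
none_lgl <- DATA[!grepl("Z", DATA$VAR), ]   # keeps every row
```

So for plain "keep the matching rows" subsetting either form works, while the logical form is the more predictable one if the condition is ever negated.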
smci