How to use str-extract and not lose data when pattern doesn't match

Question

I have a large data set and I'm trying to format the addresses and move the apartment numbers to a different column. I want to extract that pattern from the address and move it into a new column, however, some rows already have the correct format I need. I've been using "str_extract(column, pattern)" but if the row doesn't match the pattern, it's replacing that row with NA. Please help.

An example of the data:

> data
name          add1              add2
John Doe      123 Main St       Apt 2
Jane Doe      2 S Main Apt 5
Jerry Day     77 N Main Unit 1  
Tom May       11 E Main         PO Box 7

I want it to look like this:

> data
name          add1               add2
John Doe      123 Main St        Apt 2
Jane Doe      2 S Main           Apt 5
Jerry Day     77 N Main          Unit 1
Tom May       11 E Main          PO Box 7

My code is as follows and my result:

> data$add2 <- str_extract(data$add1, "Apt [0-9]{1}|Unit [0-9]{1}")
> data
name          add1               add2
John Doe      123 Main St        NA
Jane Doe      2 S Main           Apt 5
Jerry Day     77 N Main          Unit 1
Tom May       11 E Main          NA

Can someone help with my problem? Thank you,

[How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) — Sotos, Jan 09 '18 at 15:04
I'm sorry I didn't make this easier for you to reproduce, thank you for the article - I know better for my next question. — SGarg, Jan 09 '18 at 15:52

Patrik_P · Accepted Answer · 2018-01-09T15:34:57.283

You can assign the data$add2 the right indices of the extracted string as follows:

Your data

data <- data.frame(
name=c("John Doe", "John Doe", "Jerry Day", "Tom May"),
add1=c("123 Main St", "2 S Main Apt 5", "77 N Main Unit 1", "11 E Main"),
add2=c("Apt 2", NA, NA, "PO Box 7"), stringsAsFactors = F)

Assign the variable add2 the correct values

data$add2[is.na(data$add2)] <- str_extract(data$add1, "Apt [0-9]{1}|Unit [0-9]{1}")[is.na(data$add2)]

And remove the relevant string from data$add1

data$add1 <- trimws(gsub("Apt [0-9]{1}|Unit [0-9]{1}", "", data$add1))

Result is

data
#       name        add1     add2
#1  John Doe 123 Main St    Apt 2
#2  John Doe    2 S Main    Apt 5
#3 Jerry Day   77 N Main   Unit 1
#4   Tom May   11 E Main PO Box 7

In case you do not have NAs in data$add2 but "", you can modify the code:

data$add2[data$add2==""] <- str_extract(data$add1, "Apt [0-9]{1}|Unit [0-9]{1}")[data$add2==""]

You could also do already vectorised ifelse in order to get the values in data$add2:

data$add2 <- ifelse(data$add2=="", 
                    str_extract(data$add1, "Apt [0-9]{1}|Unit [0-9]{1}"), 
                    data$add2)

This worked perfectly! I also appreciate the multiple ways you gave that I can do this - Thank you! — SGarg, Jan 09 '18 at 15:51

How to use str-extract and not lose data when pattern doesn't match

1 Answers1