Partially merge two datasets and fill in NAs in R

Question

I have two datasets

a = raw dataset with thousands of observations of different weather events

   STATE       EVTYPE
1     AL WINTER STORM
2     AL      TORNADO
3     AL    TSTM WIND
4     AL    TSTM WIND
5     AL    TSTM WIND
6     AL         HAIL
7     AL    HIGH WIND
8     AL    TSTM WIND
9     AL    TSTM WIND
10    AL    TSTM WIND

b = a dictionary table, which has a standard spelling for some weather events.

                    EVTYPE       evmatch
1    HIGH SURF ADVISORY          <NA>
2         COASTAL FLOOD COASTAL FLOOD
3           FLASH FLOOD   FLASH FLOOD
4             LIGHTNING     LIGHTNING
5             TSTM WIND          <NA>
6       TSTM WIND (G45)          <NA>

both are merged into df_new by evtype

library(dplyr)
df_new <- left_join(a, b, by = c("EVTYPE"))

   STATE       EVTYPE           evmatch
1     AL WINTER STORM      WINTER STORM
2     AL      TORNADO              NA
3     AL    TSTM WIND THUNDERSTORM WIND
4     AL    TSTM WIND THUNDERSTORM WIND
5     AL    TSTM WIND THUNDERSTORM WIND
6     AL         HAIL              NA
7     AL    HIGH WIND         HIGH WIND
8     AL    TSTM WIND THUNDERSTORM WIND
9     AL    TSTM WIND THUNDERSTORM WIND
10    AL    TSTM WIND THUNDERSTORM WIND
11    AL   HEAVY RAIN        NA
12    AL  FLASH FLOOD       NA
13    AL    TSTM WIND THUNDERSTORM WIND
14    AL   HEAVY RAIN        NA
15    AL    TSTM WIND THUNDERSTORM WIND

Fill in the missing NAs

As you can see in the df_new$evmatch, there are a NAs. How can I merge the dataset, but have all NA's in evmatch filled in by the corresponding word from EVTYPE. For example...

Wanted output

 STATE       EVTYPE           evmatch
1     AL WINTER STORM      WINTER STORM
2     AL      TORNADO           TORNADO
3     AL    TSTM WIND THUNDERSTORM WIND
4     AL    TSTM WIND THUNDERSTORM WIND
5     AL    TSTM WIND THUNDERSTORM WIND
6     AL         HAIL              HAIL
7     AL    HIGH WIND         HIGH WIND
8     AL    TSTM WIND THUNDERSTORM WIND
9     AL    TSTM WIND THUNDERSTORM WIND
10    AL    TSTM WIND THUNDERSTORM WIND
11    AL   HEAVY RAIN        HEAVY RAIN
12    AL  FLASH FLOOD       FLASH FLOOD
13    AL    TSTM WIND THUNDERSTORM WIND
14    AL   HEAVY RAIN        HEAVY RAIN
15    AL    TSTM WIND THUNDERSTORM WIND

Seems like a simple `ifelse(is.na(evmatch), EVTYPE, evmatch)`, no? Or am I'm missing something. Or using base in [here](http://stackoverflow.com/questions/15629885/replace-na-in-column-with-value-in-adjacent-column). With `data.table` this could be `setDT(df_new)[is.na(evmatch), evmatch := EVTYPE]` — David Arenburg, Jan 24 '16 at 18:08
or `df_new$evmatch[is.na(df_new$evmatch] <- df_new$EVTYPE[is.na(df_new$evmatch]` — scoa, Jan 24 '16 at 18:09
`dplyr` version: `filter(df_new, is.na(evmatch) %>% select(evmatch) <- filter(df_new, is.na(evmatch) %>% select(EVTYPE)` ...but I'd probably use the `ifelse` version above, honestly. — alistaire, Jan 24 '16 at 18:21
when running `ifelse`, an error `object 'evmatch' not found` is returned. — user3200293, Jan 24 '16 at 18:34

score 1 · Answer 1 · edited Jan 24 '16 at 19:51

Answers as given in the comments to the question:

1: using base R

Method 1:

df_new$evmatch <- with(df_new, ifelse(is.na(evmatch), EVTYPE, evmatch))

Method 2:

df_new$evmatch[is.na(df_new$evmatch] <- df_new$EVTYPE[is.na(df_new$evmatch]

Note: Make sure that both vars are characters or erroneous results will occur. If needed transform with as.character.

2: using data.table

library(data.table)
setDT(df_new)[is.na(evmatch), evmatch := EVTYPE]

3: using dplyr

library(dplyr)
filter(df_new, is.na(evmatch) %>% 
         select(evmatch) <- filter(df_new, is.na(evmatch) %>% 
                                     select(EVTYPE)

You should probably turn this into a community wiki answer. – alistaire Jan 24 '16 at 18:46 — alistaire, Jan 24 '16 at 18:46

Partially merge two datasets and fill in NAs in R

Fill in the missing NAs

Wanted output

1 Answers1

Linked

Related