
I have data which looks like this:

Linking <- data.frame(
  ID = c(round((runif(20, min=10000, max=99999)), digits=0), rep(NA,10)),
  PSU = c(paste("A", round((runif(20, min=10000, max=99999)), digits = 0), sep = ''), rep(NA,10)),
  qtr = c(rep(1:10, 2), rep(NA,10)), 
  date = rep("13/04/56", 30),
  Direct = rep(c('D','M','U','U','M'), 6),
  stringsAsFactors = FALSE)

Linking$Key <- paste(Linking$ID, Linking$PSU, Linking$qtr, sep='_')
Linking$Key[c(21:30)] <- c("87654_A15467_1", "45623_A23456_2", "67891_A12345_4", "65346_A23987_7", 
                       "E3456782_A456321_6", "E3421986_A34564_8", "E9859873_A123456_9", "E3452_A12345_6", "R765498765_A455634_2", "54678_A12345_5")

I want to extract the separate portions of the "Key" variable, to populate ID, PSU, and qtr, where these values are NA.

I can use this code:

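library(dplyr)  # provides filter(), select(), and %>%

# Keep only the rows with a missing ID, then split their keys on "_"
test <- filter(Linking, is.na(ID)) %>%
  select(Key)
test2 <- data.frame(do.call(rbind, strsplit(test$Key, "_")), test$Key)
names(test2) <- c("ID", "PSU", "qtr", "Key")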

This extracts the ID, PSU, and qtr values for the rows where they are NA. But how do I add them back into the original dataset 'Linking'? merge() won't work, because I'll end up with two values each for PSU, ID, and qtr (NA and the real value).

I asked a similar question here: "Populate the NA values in a variable with values from a different variables in R". This question differs in that it includes variable-length values and a more complete dataset, with variables not just related to the 'Key'. Thanks.

Joshua
Laura
    Removed the unnecessary requirement "using substr" from the title. strsplit and its vectorized version stringr::str_split are better. Removed [tag:na] from the tags. – smci May 25 '18 at 01:24

1 Answer


Here's one approach:

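# Split every Key on "_" and keep the original Key alongside the three parts
tmp <- data.frame(do.call(rbind, strsplit(Linking$Key, "_")), Linking$Key)
# Name the four columns explicitly; names(Linking) would also include date and Direct
names(tmp) <- c("ID", "PSU", "qtr", "Key")
tmp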

This works since Linking$Key contains all relevant data for creating your data.frame.
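
For the updated data, where Linking also holds columns that are not encoded in Key (date, Direct), one possible way to avoid merge() altogether is to fill the NA rows in place. This is only a minimal base R sketch, assuming every Key has exactly three parts separated by "_". The small data frame below is a made-up stand-in for Linking, and `missing` and `parts` are illustrative names:

```r
# Small stand-in for the question's `Linking` data (values are made up for illustration).
Linking <- data.frame(
  ID     = c(12345, 67890, NA, NA),
  PSU    = c("A11111", "A22222", NA, NA),
  qtr    = c(1L, 2L, NA, NA),
  date   = rep("13/04/56", 4),
  Direct = c("D", "M", "U", "U"),
  Key    = c("12345_A11111_1", "67890_A22222_2",
             "E3456782_A456321_6", "54678_A12345_5"),
  stringsAsFactors = FALSE
)

# Rows whose ID is missing and has to be rebuilt from Key.
missing <- is.na(Linking$ID)

# Split only those keys; assumes exactly three "_"-separated parts per key.
parts <- do.call(rbind, strsplit(Linking$Key[missing], "_", fixed = TRUE))

# Some IDs inside Key contain letters (e.g. "E3456782"),
# so the ID column is converted to character before filling.
Linking$ID <- as.character(Linking$ID)

# Overwrite only the missing rows; date and Direct are left untouched.
Linking$ID[missing]  <- parts[, 1]
Linking$PSU[missing] <- parts[, 2]
Linking$qtr[missing] <- as.integer(parts[, 3])
```

Because the assignment is restricted to the `missing` rows, the existing values and the unrelated columns stay as they were, and no duplicate ID/PSU/qtr columns appear as they would after a merge.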

Jilber Urbina
  • I have updated my question with your code and given a more complete picture of the data I am working with, in order to ask a further question. Thanks. – Laura May 25 '18 at 00:25