
I want to map string matches to variables in a two-dimensional dataset with missing values.

I would be interested in a solution with map_df or another vectorised approach.

Input:

l <- list(
  c(a = "72 a", b = "74 c"),
  c(a = "12 a", b = "146 d"),
  c(a = "24 a", bb = "145 c", cx = "14 d")
)

Desired output:

[[1]]
match1 match2 match3 
    72     74     NA 

[[2]]
match1 match2 match3 
    12     NA    146 

[[3]]
match1 match2 match3 
    24    145     14 

As you can see, " a" maps to match1, " c" maps to match2 and " d" maps to match3.

What I tried:

library(magrittr)
library(purrr)
l %>% map_df(~list(
  match1 = ifelse(
    test = grepl(pattern = " a", x = .), 
    yes = gsub(pattern = " a", replacement = "", x = .), 
    no = NA
  ),
  match2 = ifelse(
    test = grepl(pattern = " c", x = .), 
    yes = gsub(pattern = " c", replacement = "", x = .), 
    no = NA
  ),
  match3 = ifelse(
    test = grepl(pattern = " d", x = .), 
    yes = gsub(pattern = " d", replacement = "", x = .), 
    no = NA
  )
))
Tlatwork

1 Answer


For your specific use case, you can do this:

Your input:

l <- list(
  c(a = "72 a", b = "74 c"),
  c(a = "12 a", b = "146 d"),
  c(a = "24 a", bb = "145 c", cx = "14 d")
)

Code (with tidyr::pivot_wider):

library(tidyverse)
d <- map_dfr(l, ~tibble(value = .x), .id = "id") %>%
  mutate(
    case = case_when(
      grepl(" a", value) ~ "match1",
      grepl(" c", value) ~ "match2",
      grepl(" d", value) ~ "match3"
    ),
    value = gsub(" a| c| d", "", value)
  ) %>%
  pivot_wider(id_cols = id, names_from = case, values_from = value)
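A variant of the same pipeline, sketched under the assumption that every string is a number, a space and a one-letter suffix: the suffix is looked up in a hypothetical `suffix_map` vector instead of a `case_when()` chain, and the values are converted with `as.numeric()` so the match columns come out as numbers rather than character. A suffix missing from `suffix_map` would produce an `NA` case and hence an extra `NA` column, so this only fits inputs shaped like the example.

```r
library(tidyverse)

l <- list(
  c(a = "72 a", b = "74 c"),
  c(a = "12 a", b = "146 d"),
  c(a = "24 a", bb = "145 c", cx = "14 d")
)

# Hypothetical lookup vector: suffix letter -> output column name
suffix_map <- c(a = "match1", c = "match2", d = "match3")

d2 <- map_dfr(l, ~ tibble(value = .x), .id = "id") %>%
  mutate(
    # "72 a" -> "a" -> "match1" (assumes "<number> <letter>" strings)
    case  = unname(suffix_map[sub("^[^ ]+ ", "", value)]),
    # "72 a" -> 72; runs after `case`, so `case` still sees the original string
    value = as.numeric(sub(" [^ ]+$", "", value))
  ) %>%
  pivot_wider(id_cols = id, names_from = case, values_from = value)

d2
```

Adding a new suffix then only means adding one entry to `suffix_map`.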

Updated code (with tidyr::spread):

library(tidyverse)
d <- map_dfr(l, ~tibble(value = .x), .id = "id") %>%
  mutate(
    case = case_when(
      grepl(" a", value) ~ "match1",
      grepl(" c", value) ~ "match2",
      grepl(" d", value) ~ "match3"
    ),
    value = gsub(" a| c| d", "", value)
  ) %>%
  spread(case, value)

spread() is retired (tidyr now labels it superseded), though, so you will probably want to switch to the pivot_wider()/pivot_longer() syntax at some point.
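Since the question asks about map_df specifically: the attempt in the question seems to return one row per input string rather than one per list element, because each ifelse() call returns a vector as long as its input (two or three strings here). A minimal sketch that keeps the map_df idea but builds exactly one row per element, with no pivot step, using a hypothetical helper `pick()`:

```r
library(purrr)
library(tibble)

l <- list(
  c(a = "72 a", b = "74 c"),
  c(a = "12 a", b = "146 d"),
  c(a = "24 a", bb = "145 c", cx = "14 d")
)

# Hypothetical helper: the first string in x that ends in `suffix`, with the
# suffix removed, or NA if no string ends in it
pick <- function(x, suffix) {
  hit <- x[endsWith(x, suffix)]
  if (length(hit) == 0) NA_character_ else sub(paste0(suffix, "$"), "", hit[[1]])
}

# Each call returns a one-row tibble, so map_dfr() binds exactly length(l) rows
# (map_dfr() needs dplyr installed for the row-binding)
res <- map_dfr(l, ~ tibble(
  match1 = pick(.x, " a"),
  match2 = pick(.x, " c"),
  match3 = pick(.x, " d")
))

res
```

The columns stay character here, like in the pivot_wider version; wrap pick() in as.numeric() if numbers are needed.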

Your output is a tibble with an id column plus the columns match1, match2 and match3:

# A tibble: 3 x 4
  id    match1 match2 match3
  <chr> <chr>  <chr>  <chr> 
1 1     72     74     NA    
2 2     12     NA     146   
3 3     24     145    14    
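The tibble above has character columns, whereas the question shows a list of named numeric vectors. If that exact shape is needed, a base R sketch (no tidyverse) could look like this; the lookup vector `suffix_map` and the function `to_matches()` are hypothetical names, and the parsing assumes every string is a number, a space and a one-letter suffix.

```r
l <- list(
  c(a = "72 a", b = "74 c"),
  c(a = "12 a", b = "146 d"),
  c(a = "24 a", bb = "145 c", cx = "14 d")
)

# Hypothetical lookup vector: suffix letter -> output name
suffix_map <- c(a = "match1", c = "match2", d = "match3")

to_matches <- function(x) {
  # Start from all-NA so suffixes that are absent stay NA
  out <- setNames(rep(NA_real_, length(suffix_map)), unname(suffix_map))
  suffix <- sub("^[^ ]+ ", "", x)              # "72 a" -> "a"
  num    <- as.numeric(sub(" [^ ]+$", "", x))  # "72 a" -> 72
  known  <- suffix %in% names(suffix_map)      # skip unmapped suffixes
  # Assign by name; if two strings share a suffix, the later one wins
  out[suffix_map[suffix[known]]] <- num[known]
  out
}

res <- lapply(l, to_matches)
res
```

lapply() keeps one result per list element, so the output lines up with the input list without any id column.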
  • That looks great, thanks a lot. Great start on #SO! I get an error which seems to be due to a newer version of rlang: ``Error in mutate_impl(.data, dots) : Evaluation error: `as_dictionary()` is defunct as of rlang 0.3.0. Please use `as_data_pronoun()` instead.``. But it seems to be a subcall, so I am not sure how to replace it correctly. – Tlatwork Nov 04 '19 at 09:04
  • It seems that you just have to update the rlang library (see this post: [link](https://stackoverflow.com/questions/52957136/defunct-as-of-rlang-0-3-0-and-mutate-impl)). Simply updating the whole tidyverse (install.packages("tidyverse")) should do the trick. I added an update which uses spread; this will probably solve the issue as well. I'm using rlang version 0.4.1 and tidyverse version 1.2.1. – Matthias Uckert Nov 04 '19 at 09:37
  • Ah, I read it the other way round, thinking that I was ahead. That works perfectly, thank you! – Tlatwork Nov 04 '19 at 11:40
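To compare installed versions with the ones mentioned above before (or after) updating, something like the following should work. Checking rlang, dplyr and tidyr is an assumption on my part: `mutate_impl` in the error message appears to be dplyr's internal mutate code, and the pivoting comes from tidyr.

```r
# Installed versions of the packages most likely involved in the error
pkgs <- c("rlang", "dplyr", "tidyr")
installed <- vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
                    character(1))
print(installed)

# Updating the whole tidyverse also updates these packages:
# install.packages("tidyverse")
```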