Combine 2 dataframes with function, like cartesian product or cross join

Question

I want to combine two dataframes of different sizes with a binary function (namely str_count()), such that the rows of df1 (containing regex) become columns of df2 (containing the text data).

Sample product table result

library(dplyr)

# dummy data
df1 <- 
  tribble(
    ~regex_name, ~regex_data
    , "reg1", "(\\w+ )"
    , "reg2", "\\d+"
  )

df2 <- 
  tribble(
    ~metadata, ~text
    , "meta1", "text 1"
    , "meta2", "text2 3 4"
  )

# should result in something like
df1_2 <- 
  tribble(
    ~metadata, ~text,       ~reg1, ~reg2
    , "meta1", "text 1",    1,     2
    , "meta2", "text2 3 4", 0,     3
  )

What I've tried so far

After searching online for a bit, I think there are a few possible approaches, each of which involves some problems or perhaps some unnecessary intermediate steps.

1. a. Use a full_join (join by = what, though?)
   b. Followed by tidyr::spread() (or pivot_wider()?)
2. Use purrr::cross2() (or cross_dfr()) (but it gives the wrong structure?) followed by (1.b)
3. Use some combination of purrr::map2() and mutate (I've not been able to get this to work properly, and map2 requires the dataframes to be of the same length)
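For reference, the map-based idea can be made to work without map2(). The following is only a minimal sketch, not part of the original post: it assumes that str_count() is vectorised over the text column, so it is enough to map over the patterns alone, and the two dataframes do not need the same number of rows. The names `counts` and `result` are illustrative.

```r
library(dplyr)
library(purrr)
library(stringr)

df1 <- tribble(
  ~regex_name, ~regex_data,
  "reg1", "(\\w+ )",
  "reg2", "\\d+"
)

df2 <- tribble(
  ~metadata, ~text,
  "meta1", "text 1",
  "meta2", "text2 3 4"
)

# Build one count vector per pattern, named after regex_name.
# map() iterates over the patterns only; each call counts matches
# of a single pattern across every row of df2$text.
counts <- df1$regex_data %>%
  set_names(df1$regex_name) %>%
  map(~ str_count(df2$text, .x))

# Attach the count columns to the text data.
result <- bind_cols(df2, as_tibble(counts))
```

Here `result` keeps the columns of df2 and gains one integer column per row of df1, with no join or pivot step in between.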

The regex use case is just an example (though it is also what I'm working with). Also, although I'm using tidyverse libraries, any other elegant (simple?) solution that works is fine (I'm just prone to making mistakes if there are too many intermediate steps).

The sample data doesn't give the correct results for `str_detect`. But it's a sample so I guess that's ok? — Geeky I, Oct 13 '19 at 00:22
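The mismatch can be checked by running the question's function, str_count(), on the sample strings directly. This is a small sketch; the variable names `reg1_counts` and `reg2_counts` are made up for illustration.

```r
library(stringr)

text <- c("text 1", "text2 3 4")

# reg1: runs of word characters followed by a space
reg1_counts <- str_count(text, "(\\w+ )")

# reg2: runs of digits
reg2_counts <- str_count(text, "\\d+")

print(reg1_counts)
print(reg2_counts)
```

For these two strings the counts come out as 1 and 2 for reg1, and 1 and 3 for reg2, which differs from the hand-written expected table in the question.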

score 0 · Answer 1 · answered Oct 13 '19 at 00:21

This answer mentions using tidyr::crossing. Unlike purrr::cross2, it preserves dataframes. This is probably as simple as it will get, but I'm wondering if it's possible in one step?

library(dplyr)
library(tidyr)
library(stringr) # for example function

crossing(df1, df2) %>%
  mutate(regex_data = text %>% str_count(regex_data)) %>%
  pivot_wider(names_from = regex_name, values_from = regex_data) # as alternative to spread
#> # A tibble: 2 x 4
#>   metadata text       reg1  reg2
#>   <chr>    <chr>     <int> <int>
#> 1 meta1    text 1        1     1
#> 2 meta2    text2 3 4     2     3
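On the "one step" question, one possible route is base R's outer(), which applies a vectorised binary function to every pairing of two vectors and returns a matrix. The sketch below is an alternative under that assumption, not the method from the linked answer; the names `counts` and `result` are illustrative.

```r
library(dplyr)
library(stringr)

df1 <- tribble(
  ~regex_name, ~regex_data,
  "reg1", "(\\w+ )",
  "reg2", "\\d+"
)

df2 <- tribble(
  ~metadata, ~text,
  "meta1", "text 1",
  "meta2", "text2 3 4"
)

# outer() expands both vectors to every text/pattern pairing and
# calls str_count() once on the expanded vectors.
# Rows correspond to df2$text, columns to df1$regex_data.
counts <- outer(df2$text, df1$regex_data, str_count)
colnames(counts) <- df1$regex_name

# Attach the count matrix to the text data as extra columns.
result <- bind_cols(df2, as_tibble(counts))
```

This only works because str_count() is vectorised over both arguments; a function that is not would need wrapping with Vectorize() first.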
