I have a data frame that looks like the example below.

element string
x abc
y def
z ghi
z jkl
x mno
y pqr
z stu
x vwx
y yza
z bcd
z efg
z hij
x klm
y nop
z qrs
z tuv
z wxy

The values in the string column all vary, but the values in the element column always follow an x-y-z pattern, although the number of z's varies. I would like to take the strings from each x-y-z set and concatenate them, so that the string column in the data frame above would look like this:

string
abc def ghi jkl
mno pqr stu
vwx yza bcd efg hij
klm nop qrs tuv wxy

I was thinking there might be a way to do this using dplyr::rowwise? The variable number of z rows in each set is tripping me up, though, and I can't figure out something that works.

user438383
BleepBloop
  • https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w, where your group is `cumsum(element == "x")`. – Henrik Jul 28 '22 at 14:05
  • @Henrik That worked beautifully, thanks so much! – BleepBloop Jul 28 '22 at 14:47
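
The grouping suggested in the comment above can be sketched as follows. This is a minimal sketch, not necessarily the exact code the asker ended up with: it assumes the data frame is called `df` (a name chosen here for illustration), that every set starts with an `x` row, and that dplyr 1.0 or later is installed. The result column is named `string` to match the desired output.

```r
library(dplyr)

# The example data from the question (the name df is an assumption)
df <- data.frame(
  element = c("x", "y", "z", "z",
              "x", "y", "z",
              "x", "y", "z", "z", "z",
              "x", "y", "z", "z", "z"),
  string  = c("abc", "def", "ghi", "jkl",
              "mno", "pqr", "stu",
              "vwx", "yza", "bcd", "efg", "hij",
              "klm", "nop", "qrs", "tuv", "wxy")
)

result <- df %>%
  # Every "x" row opens a new set, so the running count of x's is the set id
  group_by(id = cumsum(element == "x")) %>%
  # Collapse each set's strings into one space-separated string
  summarise(string = paste(string, collapse = " "), .groups = "drop")

result$string
# [1] "abc def ghi jkl"     "mno pqr stu"         "vwx yza bcd efg hij"
# [4] "klm nop qrs tuv wxy"
```

Because the id depends only on where the `x` rows are, the number of `z` rows in each set does not matter.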

1 Answer

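The tricky part is building an id that marks each x/y/z chunk. Below is one approach: a new chunk starts whenever the element value sorts before the one in the previous row (an x following a z), so a cumulative sum over those starts gives the id. Once you have the id to group by, you can simply summarize and concatenate the strings.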

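library(tidyverse)

# Small test data: two x/y/z chunks, the first with two z rows
df <- data.frame(element = c(letters[24:26], 'z', letters[24:26]),
                 string = c('abc', 'def', 'ghi', 'ijk', 'lmn', 'o', 'p'))

df %>%
  # Flag rows where element sorts before the previous row's element;
  # the first row has no previous value, hence missing = 1
  mutate(id = cumsum(if_else(element < lag(element), 1, 0, missing = 1))) %>%
  group_by(id) %>%
  # Paste each chunk's strings together, separated by spaces
  summarize(string = str_c(string, collapse = ' '), .groups = 'drop')

With the above test data, this gives:

# A tibble: 2 x 2
     id string         
  <dbl> <chr>          
1     1 abc def ghi ijk
2     2 lmn o p        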
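
To see how the id is built, the `mutate()` step can be split into its intermediate columns. This is only an illustrative sketch on the same test data; the column names `prev` and `new_group` and the object name `steps` are made up here and are not part of the answer's code.

```r
library(dplyr)

# Same test data as above, written out explicitly
df <- data.frame(
  element = c("x", "y", "z", "z", "x", "y", "z"),
  string  = c("abc", "def", "ghi", "ijk", "lmn", "o", "p")
)

steps <- df %>%
  mutate(
    prev      = lag(element),                              # previous row's element (NA on row 1)
    new_group = if_else(element < prev, 1, 0, missing = 1), # 1 where "x" follows "z", and on row 1
    id        = cumsum(new_group)                          # running count of group starts
  )

steps$new_group
# [1] 1 0 0 0 1 0 0
steps$id
# [1] 1 1 1 1 2 2 2
```

The comparison relies on "x" sorting before "y" and "z", so it assumes the element values within each set are in ascending order.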
deschen