Split strings and obtain all combinations of the substrings after the split value, where only 1 item per combination can come from each value before the split

Question

Here's the same task we solved in Python. I've tried a similar approach: creating an empty dictionary keyed by the pre-split strings (using R's strsplit) and unpacking all corresponding post-split strings as values. The next step is to create all combinations, but no pre-split string may appear more than once in any resulting combination.

Here is my input list:

list('ROOM1-abc',
'ROOM1-def',
'ROOM2-abc',
'ROOM2-lol',
'ROOM3-whatever')

And the desired output, shown here for 2-length combinations (I need to be able to pick the length of the combinations returned):

[['ROOM1-abc', 'ROOM2-lol'],
['ROOM1-abc', 'ROOM3-whatever'],
['ROOM1-def', 'ROOM2-abc'],
['ROOM1-def', 'ROOM2-lol'],
['ROOM1-def', 'ROOM3-whatever'],
['ROOM2-abc', 'ROOM3-whatever'],
['ROOM2-lol', 'ROOM3-whatever']]

I'm struggling with the sub-item list indexing syntax in Python vs. R, as well as with having to learn R for a specific need on a problem we've already solved in Python.

Please use `dput` to show the input. The structure of the string is not clear. Maybe you need `combn(unique(unlist(strsplit(str1, '"'))), 2)` — akrun, Jun 05 '19 at 05:09

score 1 · Answer 1 · 2019-06-05T14:04:59.213

If I understand correctly, what you want to do is:

df <- expand.grid(unlist(lst1), unlist(lst1))
df
             Var1           Var2
1       ROOM1-abc      ROOM1-abc
2       ROOM1-def      ROOM1-abc
3       ROOM2-abc      ROOM1-abc
4       ROOM2-lol      ROOM1-abc
5  ROOM3-whatever      ROOM1-abc
6       ROOM1-abc      ROOM1-def
7       ROOM1-def      ROOM1-def
8       ROOM2-abc      ROOM1-def
9       ROOM2-lol      ROOM1-def
10 ROOM3-whatever      ROOM1-def
11      ROOM1-abc      ROOM2-abc
12      ROOM1-def      ROOM2-abc
13      ROOM2-abc      ROOM2-abc
14      ROOM2-lol      ROOM2-abc
15 ROOM3-whatever      ROOM2-abc
16      ROOM1-abc      ROOM2-lol
17      ROOM1-def      ROOM2-lol
18      ROOM2-abc      ROOM2-lol
19      ROOM2-lol      ROOM2-lol
20 ROOM3-whatever      ROOM2-lol
21      ROOM1-abc ROOM3-whatever
22      ROOM1-def ROOM3-whatever
23      ROOM2-abc ROOM3-whatever
24      ROOM2-lol ROOM3-whatever
25 ROOM3-whatever ROOM3-whatever

This gives a data frame with all possible combinations. The difference from akrun's suggestion is that this also pairs each element with itself, e.g. ROOM1-abc | ROOM1-abc, and it is order-sensitive, so it gives you, for example, both ROOM3-whatever | ROOM1-abc and ROOM1-abc | ROOM3-whatever.

If you do not care about order, you can remove the rows that are duplicates once order is ignored:

df[!duplicated(t(apply(df, 1, sort))), ]
             Var1           Var2
1       ROOM1-abc      ROOM1-abc
2       ROOM1-def      ROOM1-abc
3       ROOM2-abc      ROOM1-abc
4       ROOM2-lol      ROOM1-abc
5  ROOM3-whatever      ROOM1-abc
7       ROOM1-def      ROOM1-def
8       ROOM2-abc      ROOM1-def
9       ROOM2-lol      ROOM1-def
10 ROOM3-whatever      ROOM1-def
13      ROOM2-abc      ROOM2-abc
14      ROOM2-lol      ROOM2-abc
15 ROOM3-whatever      ROOM2-abc
19      ROOM2-lol      ROOM2-lol
20 ROOM3-whatever      ROOM2-lol
25 ROOM3-whatever ROOM3-whatever

EDIT

# splits at "-"
split <- strsplit(unlist(lst1), "-")
# adds "-" to each vector
split2 <- lapply(split, function(x){
  c(x[1], "-", x[2])})
# saves everything as a dataframe (if desired)
do.call("cbind.data.frame", split2)
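
As a rough sketch of the dictionary idea from the question, base R's `split()` can group the full strings by the text before the first "-". The names `x`, `prefix`, and `by_room` are made up for illustration, and the sketch assumes every string contains a "-".

```r
# Example input, copied from the question
x <- c("ROOM1-abc", "ROOM1-def", "ROOM2-abc", "ROOM2-lol", "ROOM3-whatever")

# Text before the first "-" (the pre-split key)
prefix <- sub("-.*$", "", x)

# Named list with one entry per key, holding the full strings for that key
by_room <- split(x, prefix)
by_room
```

A named list like this plays the role of the Python dictionary: `by_room$ROOM1` holds every string whose key is "ROOM1".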

This is helpful but my mistake, I did not explicitly list the delimiter. My input is a list with two elements in each item and I need to delimit "-" such that "ROOM1-abc" is split into "ROOM1", "-", "abc" where "abc" is used to create combinations but no more than one "ROOM1" can exist in any combinations. — Chris, Jun 05 '19 at 13:46
Thanks @schwantke. I am trying to use the comments to get my desired output. I don't mind whether the end result is a list or a data.frame, as I can work it into the form I need. Let me see if I can explain this better: I need combinations of the strings after the split at "-", but a returned combination can't contain the same string before the "-" more than once. So the combination of 'ROOM1-abc' and 'ROOM1-def' is not allowed because 'ROOM1' occurs more than once. — Chris, Jun 05 '19 at 14:31
@Chris: but this means that you don't need all combinations. How can we know what combinations are desired? — , Jun 05 '19 at 15:18
That's the whole issue. Getting all combinations from combn is straightforward. The challenge is identifying the string before the split: for any combination length from 2 to n, that string can appear at most once in a returned combination, whether it's a 2-item or a 3-item combo. In Python, we populated a dictionary key with the string before the split, and all the items sharing that string became that key's values. We then got all combinations of values via a product of all values (perhaps this is similar to R's expand?). — Chris, Jun 05 '19 at 15:36
And the resulting combination (at 2-item) is: ``` (('ROOM1-abc', 'ROOM2-lol'), ('ROOM1-abc', 'ROOM3-whatever'), ('ROOM1-def', 'ROOM2-abc'), ('ROOM1-def', 'ROOM2-lol'), ('ROOM1-def', 'ROOM3-whatever'), ('ROOM2-abc', 'ROOM3-whatever'), ('ROOM2-lol', 'ROOM3-whatever')) ``` — Chris, Jun 05 '19 at 15:36
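
The approach described in the comments above (group by the string before the split, then take a product across groups) could be sketched in base R roughly as follows. `room_combos` is a hypothetical helper name, not something from the thread, and the sketch assumes every string contains a "-" and that the key is the text before the first "-".

```r
# Hypothetical helper (not from the thread): all m-item combinations of x
# in which each pre-split key (text before the first "-") appears at most once.
room_combos <- function(x, m = 2) {
  x <- unlist(x)                            # accept a list or a character vector
  groups <- split(x, sub("-.*$", "", x))    # key -> strings, like the Python dict
  if (m > length(groups)) return(list())    # not enough distinct keys
  # Choose which m keys take part, then take one string from each chosen key
  key_sets <- combn(names(groups), m, simplify = FALSE)
  out <- lapply(key_sets, function(keys) {
    grid <- expand.grid(groups[keys], stringsAsFactors = FALSE)
    lapply(seq_len(nrow(grid)), function(i) unname(unlist(grid[i, ])))
  })
  unlist(out, recursive = FALSE)            # flatten to one list of vectors
}

lst1 <- list("ROOM1-abc", "ROOM1-def", "ROOM2-abc", "ROOM2-lol", "ROOM3-whatever")
pairs <- room_combos(lst1, 2)
length(pairs)  # 8

# Optional: also drop combinations that repeat the text after the "-"
distinct_suffix <- Filter(function(p) !anyDuplicated(sub("^[^-]*-", "", p)), pairs)
length(distinct_suffix)  # 7
```

With only the one-key-per-combination rule, this yields 8 pairs, including ('ROOM1-abc', 'ROOM2-abc'), which the output listed in the thread does not contain. The thread does not say whether that omission is intended; the optional filter on the post-split text reproduces the 7 listed pairs under the assumption that it is.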

score 0 · Answer 2 · answered Jun 05 '19 at 05:50

One option is to call combn on the list and return the result as a list of vectors:

library(tidyverse)
combn(lst1, 2, simplify = FALSE) %>%
       map(flatten_chr)

data

lst1 <- list('ROOM1-abc',
'ROOM1-def',
'ROOM2-abc',
'ROOM2-lol',
'ROOM3-whatever')
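
Since `combn` on its own returns every pair, including pairs from the same room, one possible follow-up is to filter its output afterwards. The sketch below uses base R only; `all_pairs` and `valid_pairs` are illustrative names, and it assumes the key is the text before the first "-".

```r
lst1 <- list("ROOM1-abc", "ROOM1-def", "ROOM2-abc", "ROOM2-lol", "ROOM3-whatever")

# Every unordered pair, as a list of character vectors
all_pairs <- combn(unlist(lst1), 2, simplify = FALSE)

# Keep only the pairs whose pre-split keys are all different
valid_pairs <- Filter(
  function(p) anyDuplicated(sub("-.*$", "", p)) == 0,
  all_pairs
)
length(valid_pairs)  # 8
```

This generate-then-filter version is short and works for any combination length by changing the `2`, but it builds every combination before discarding the invalid ones, so it may be slow for large inputs.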

akrun

This creates a variable to hold the list, which I initially did not create but it does not address the desired output. – Chris Jun 05 '19 at 06:56
@Chris Your desired solution seems to be a list of vectors. BTW, the list you showed is not R syntax; it is Python syntax – akrun Jun 05 '19 at 13:52
