
I hope I've phrased this question correctly. I'm not even sure how to word it, which is probably part of why I'm having trouble finding an answer.

Consider a data.frame with multiple string columns. I would like to construct another variable that combines two of these columns pair-wise, row by row, regardless of the order in which the two values appear.

For example, consider the following data.frame:

df <- data.frame(var1 = c('string1', 'string2', 'string3'),
                 var2 = c('string3', 'string4', 'string1'),
                 stringsAsFactors = FALSE)

I'd like a variable whose value is identical for the first and third rows, like:

c('string1, string3', 'string2, string4', 'string1, string3')
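For exactly two columns, one vectorized sketch is to put the element-wise smaller string first with `pmin()` and the larger one second with `pmax()`, so both orderings of a pair produce the same key. This assumes the columns are character rather than factor (hence `stringsAsFactors = FALSE`) and contain no `NA`s; `pair_key` is just an illustrative column name.

```r
df <- data.frame(var1 = c("string1", "string2", "string3"),
                 var2 = c("string3", "string4", "string1"),
                 stringsAsFactors = FALSE)  # character columns, not factors

# pmin()/pmax() compare the two columns element-wise, so the "smaller"
# string of each pair always comes first, whatever the original order
df$pair_key <- paste(pmin(df$var1, df$var2), pmax(df$var1, df$var2), sep = ", ")
df$pair_key
#> [1] "string1, string3" "string2, string4" "string1, string3"
```

This does not extend beyond two columns, because `pmin()` and `pmax()` only keep the smallest and largest value of each row.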

I imagine it might be best to make a variable that is a list of the two component values, but I'm open to any solution. I tried to build such a list column based on this question, without luck:

Create a data.frame where a column is a list
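A list column along the lines of that linked question can be built with `mapply(..., SIMPLIFY = FALSE)`, storing one sorted character vector per row. This is a sketch assuming character columns; `pair` is a hypothetical column name. Grouping tools generally work more naturally with an atomic key (as far as I know, data.table's `by` does not accept a list column), so a pasted string key is usually the more practical choice.

```r
df <- data.frame(var1 = c("string1", "string2", "string3"),
                 var2 = c("string3", "string4", "string1"),
                 stringsAsFactors = FALSE)

# One sorted character vector per row, stored as a list column
df$pair <- mapply(function(a, b) sort(c(a, b)), df$var1, df$var2,
                  SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Rows 1 and 3 now hold the same sorted pair
identical(df$pair[[1]], df$pair[[3]])
#> [1] TRUE
```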

If possible, I'd like a solution that extends to more than two columns and runs efficiently over millions of rows, ideally with a data.table method.
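One data.table sketch that extends to any number of columns is to reshape to long format with `melt()`, sort the values within each original row, and paste them back into a single key. It assumes the listed columns are all character with no `NA`s; `cols`, `row_id` and `pair_key` are illustrative names. data.table sorts strings in C-locale order, which can differ from base `sort()`, so keys built this way should not be mixed with keys built by a base R method.

```r
library(data.table)

dt <- data.table(var1 = c("string1", "string2", "string3"),
                 var2 = c("string3", "string4", "string1"))
cols <- c("var1", "var2")  # any number of character columns

# Row id so the long table can be collapsed back to one key per row
dt[, row_id := .I]

# Long format: one row per (row_id, original column) cell
long <- melt(dt, id.vars = "row_id", measure.vars = cols, value.name = "value")

# Sort values within each original row (C-locale order)
setorder(long, row_id, value)

# Collapse back to one order-independent key per row and join it onto dt
keys <- long[, .(pair_key = paste(value, collapse = ", ")), by = row_id]
dt[keys, pair_key := i.pair_key, on = "row_id"]
dt
```

The key can then be used for grouping, e.g. `dt[, .N, by = pair_key]`.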

Thanks for your help!

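Edit: Here is a rough for-loop version of the idea. It is far too slow for millions of rows, but it shows what I'm after:

# Initialise the new column, then build a sorted, comma-separated key per row
df$var.new <- NA_character_
for (i in seq_len(nrow(df))) {
  df$var.new[i] <- paste(sort(c(df$var1[i], df$var2[i])), collapse = ", ")
}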
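The loop can be vectorized for any number of columns without a per-row `apply()`: ordering the cells of a character matrix by row index first and by value second sorts every row in a single `order()` call. This is a sketch assuming character columns without `NA`s; `cols`, `sorted` and `pair_key` are illustrative names, and the whole matrix is materialized in memory, so memory use grows with rows times columns.

```r
df <- data.frame(var1 = c("string1", "string2", "string3"),
                 var2 = c("string3", "string4", "string1"),
                 stringsAsFactors = FALSE)
cols <- c("var1", "var2")  # works for any number of character columns

m <- as.matrix(df[cols])

# order() by row index first, then by value, sorts every row in one call
o <- order(row(m), m)
sorted <- matrix(m[o], nrow = nrow(m), byrow = TRUE)

# Paste the sorted columns together element-wise to build one key per row
df$pair_key <- do.call(paste, c(unname(split(sorted, col(sorted))), sep = ", "))
df$pair_key
#> [1] "string1, string3" "string2, string4" "string1, string3"
```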
Edited by Community; asked by Sam Asin
  • One option is `apply(df, 1, function(x) paste(sort(x), collapse=" "))` – akrun Feb 16 '17 at 18:47
  • This might be an "X/Y error" (asking the wrong question due to focus on an ineffective potential solution). If you don't tell us what you want to do with this vector, we cannot offer good advice about the best coding strategy to use. – IRTFM Feb 16 '17 at 18:51
  • @42- I think you mean an "XY Problem". A/B seems to imply A/B testing. Just a silly semantics thing though. – Dason Feb 16 '17 at 18:54
  • Fixed. Thanks for the clue. – IRTFM Feb 16 '17 at 18:56
  • Are you trying to find identical rows somehow? If so, transforming to `dat = xtabs( ~ row(df) + as.matrix(df), sparse = TRUE)` might be more informative. (and `aggregate(colnames(dat)[j] ~ i, summary(dat), toString)` can give the desired result if, indeed, that is the only thing needed) – alexis_laz Feb 16 '17 at 18:59
  • Hi all, I have two goals in different scripts that run into the same problem. In the first, I have 8 vectors that I'd like to "combine", and I want to keep only rows that have at least 3 unique values across those 8 vectors. In the second, I have two columns that represent exactly the same thing: "string1" in var1 with "string2" in var2 is the same as "string2" in var1 with "string1" in var2. I want a variable that treats those two rows as identical, so that I can group them with a data.table `DT[, ..., by = ...]` command. – Sam Asin Feb 16 '17 at 19:07
  • Should I try to provide a more detailed example? I appreciate your help; I'm a self-taught coder and new to posting on Stack Exchange. – Sam Asin Feb 16 '17 at 19:12
  • Also, my data has many other columns as well... – Sam Asin Feb 16 '17 at 20:02
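For the 8-column case, the same row-sorting trick gives a vectorized count of distinct values per row: after sorting, each change between neighbouring cells marks one more distinct value. The data below is a made-up toy example, `count_unique_per_row()` is a hypothetical helper, and it assumes no `NA`s (an `NA` would make that row's count `NA`). Other columns in the data frame are carried along untouched by the final subset.

```r
# Toy data (made up): 3 rows, 8 character columns V1..V8, plus an unrelated column
m8 <- rbind(rep("x", 8),
            rep(c("x", "y"), 4),
            c("p", "q", "r", "p", "q", "r", "s", "p"))
wide <- as.data.frame(m8, stringsAsFactors = FALSE)  # columns named V1..V8
wide$other <- c(10, 20, 30)
cols <- paste0("V", 1:8)

# Hypothetical helper: number of distinct values in each row of a character matrix
count_unique_per_row <- function(m) {
  # sort every row in one call: order by row index, then by value
  s <- matrix(m[order(row(m), m)], nrow = nrow(m), byrow = TRUE)
  # each change between neighbouring sorted cells adds one distinct value
  1 + rowSums(s[, -1, drop = FALSE] != s[, -ncol(s), drop = FALSE])
}

n_unique <- count_unique_per_row(as.matrix(wide[cols]))
n_unique
#> [1] 1 2 4

# Keep only rows with at least 3 distinct values across the 8 columns
kept <- wide[n_unique >= 3, ]
```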
