0

Ok, my question might be a bit weirder than what the title suggests. I have this list:

x <- list(
  c("a", "d"),
  c("a", "c"), 
  c("d", "e"),
  c("e", "f"), 
  c("b", "c"), 
  c("f", "c"), # row 6 
  c("c", "e"), 
  c("f", "b"), 
  c("b", "a")
)

And I need to copy these pairs into another list called T. The only condition is that the two letters of a pair must not both be in T already. If only one of them is already in T and the other isn't, that's fine.

Basically in this example I would take the first 5 positions and copy them in T one after another because either one or both letters are new to T.

Then I would skip the 6th position because the letter "f" was already in the 4th position of T and the letter "c" is already in the 2nd and 5th positions of T.

Then I would skip the remaining 3 positions for the same reason (the letters "c", "e", "f", "b", "a" are already in T at this point)

I tried doing this

for(i in 1:length(T)){
   if (!( *first letter* %in% T && *second letter* %in% T)) {
      T[[i]] <- c(*first letter*, *second letter*)
   }
}

But it's like the "if" isn't even there, and I'm pretty sure I'm using %in% in the wrong way.

Any suggestions? I hope what I wrote makes sense, I'm new to R and to this site in general.

Thanks for your time

  • Possibly related: https://stackoverflow.com/questions/28574006/unique-rows-considering-two-columns-in-r-without-order – MrFlick Jul 19 '18 at 17:45
  • So the order of the rows matters? If you reorder them, you'd get a different result? – MrFlick Jul 19 '18 at 17:50
  • Yeah, the order of the rows must be exactly like that because of other parts of the code not shown here, just assume that they are ordered like that and cannot be switched – Dario Ferretti Jul 19 '18 at 20:59

5 Answers

1

Effectively, for each element of the list, you want to lose it if both of its elements exist in earlier elements. A logical index is helpful here.

# Make a logical vector the length of x.
lose <- logical(length(x))

Now you can loop over the indices of lose and compare each element of x against all the elements that come before it. Using seq_len saves us the headache of guarding against the special case of i = 1: seq_len(0) returns a zero-length integer vector, whereas 1:0 would return c(1, 0).

for (i in seq_along(lose)){
  lose[i] <- all(x[[i]] %in% unique(unlist(x[seq_len(i - 1)])))
}
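To make the i = 1 special case concrete, here is a minimal sketch comparing the two ways of indexing "everything before position i". The names x_demo, earlier_safe and earlier_naive are made up for this illustration and are not part of the answer's code.

```r
# Two pairs are enough to show the problem at i = 1.
x_demo <- list(c("a", "d"), c("a", "c"))
i <- 1

# seq_len(0) is an empty index, so nothing is selected.
earlier_safe <- unlist(x_demo[seq_len(i - 1)])

# 1:0 counts down to c(1, 0); the 0 is ignored, so element 1 is selected.
earlier_naive <- unlist(x_demo[1:(i - 1)])

# With the empty set, the first pair is never flagged for removal.
lose_safe <- all(x_demo[[i]] %in% earlier_safe)

# With 1:0, the first pair is compared against itself and wrongly flagged.
lose_naive <- all(x_demo[[i]] %in% earlier_naive)

print(c(safe = lose_safe, naive = lose_naive))
```

With 1:(i - 1) the very first pair would be dropped, which is why the loop relies on seq_len.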

Now let's use the logical vector to subset x to T

T <- x[!lose]

T
#> [[1]]
#> [1] "a" "d"
#> 
#> [[2]]
#> [1] "a" "c"
#> 
#> [[3]]
#> [1] "d" "e"
#> 
#> [[4]]
#> [1] "e" "f"
#> 
#> [[5]]
#> [1] "b" "c"

# Created on 2018-07-19 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
Benjamin
  • Thank you very much for your answer. In the end I picked another solution because it fit better with my code, but if I could I would pick every answer as "best answer", so many great suggestions! – Dario Ferretti Jul 20 '18 at 23:24
1

You can put the set of all previous elements in a list cum.sets, then use Map to check if all elements of the current vector are in the lagged cumulative set.

cum.sets <- lapply(seq_along(x), function(y) unlist(x[1:y]))
keep <- unlist(
          Map(function(x, y) !all(x %in% y)
              , x
              , c(NA, cum.sets[-length(cum.sets)])))

x[keep]

# [[1]]
# [1] "a" "d"
# 
# [[2]]
# [1] "a" "c"
# 
# [[3]]
# [1] "d" "e"
# 
# [[4]]
# [1] "e" "f"
# 
# [[5]]
# [1] "b" "c"

tidyverse version (same output)

library(tidyverse)

cum.sets <- imap(x, ~ unlist(x[1:.y]))
keep <- map2_lgl(x, lag(cum.sets), ~!all(.x %in% .y))

x[keep]
IceCreamToucan
  • Thank you for your contribution, I didn't pick your answer as the best because another one fit better with the rest of my work but if I could I would select all of the answers as "best answers". Again, thank you for your help – Dario Ferretti Jul 20 '18 at 23:25
1

You can use Reduce. If the values of the new pair are not all in the list already, concatenate the pair to the list; otherwise drop it. The initial value is the first element of the list:

 Reduce(function(i, y) c(i, if(!all(y %in% unlist(i))) list(y)), x[-1],init = x[1])

[[1]]
[1] "a" "d"

[[2]]
[1] "a" "c"

[[3]]
[1] "d" "e"

[[4]]
[1] "e" "f"

[[5]]
[1] "b" "c"
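To see how the accumulated list grows step by step, the same reducer can be run with accumulate = TRUE. This is only a sketch: the helper name add_if_new and the variables steps and result are made up here for illustration.

```r
x <- list(
  c("a", "d"), c("a", "c"), c("d", "e"),
  c("e", "f"), c("b", "c"), c("f", "c"),
  c("c", "e"), c("f", "b"), c("b", "a")
)

# Append the pair unless both of its letters already occur in acc.
# When the condition is FALSE, `if` returns NULL and c(acc, NULL) is just acc.
add_if_new <- function(acc, pair) {
  c(acc, if (!all(pair %in% unlist(acc))) list(pair))
}

# accumulate = TRUE keeps every intermediate state of the accumulated list.
steps <- Reduce(add_if_new, x[-1], init = x[1], accumulate = TRUE)

# Number of pairs kept after each input element has been processed.
print(lengths(steps))

# The last state is the final result.
result <- steps[[length(steps)]]
```

The lengths stop growing once a pair is rejected, which makes it easy to spot where elements start being skipped.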
Onyambu
  • You can also run `c(z<-x[1],lapply(x,function(y) z<<-c(z,if(!all(y%in%unlist(z))) list(y))));z` This is similar to having the argument `Reduce(...,accumulate=TRUE)`. – Onyambu Jul 19 '18 at 19:35
  • Thank you, this helped clarify another doubt I had. In the end I picked another answer only because it fit better with the rest of my code. I wish there was a way to choose all the answers as "best answers" since you all helped me very much. Thank you – Dario Ferretti Jul 20 '18 at 23:29
0

The most straightforward option is to store the unique letters seen so far in a second container as you loop through your input data.

Here's a solution that does not consider the position (1 or 2) of a letter within a pair; it simply walks through the input list in the order given.

dat <- list(c('a','d'),c('a','c'),c('d','e'),c('e','f'),c('b','c'),
            c('f','c'),c('c','e'),c('f','b'),c('b','a'))
Dat <- list()
idx <- list()
for(i in dat){
  if(!all(i %in% idx)){
    Dat <- append(Dat, list(i))
    ## append to idx if not previously observed
    if(! i[1] %in% idx) idx <- append(idx, i[1])
    if(! i[2] %in% idx) idx <- append(idx, i[2])
  }
}
print(Dat)
#> [[1]]
#> [1] "a" "d"
#> 
#> [[2]]
#> [1] "a" "c"
#> 
#> [[3]]
#> [1] "d" "e"
#> 
#> [[4]]
#> [1] "e" "f"
#> 
#> [[5]]
#> [1] "b" "c"
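As a possible variation on the loop above, the letters seen so far can be kept in a plain character vector and updated with union(), which removes the need for the two separate append checks. This is a sketch under that assumption; the names kept and seen are made up for this example.

```r
dat <- list(c("a", "d"), c("a", "c"), c("d", "e"), c("e", "f"), c("b", "c"),
            c("f", "c"), c("c", "e"), c("f", "b"), c("b", "a"))

kept <- list()          # pairs that pass the check, in input order
seen <- character(0)    # every letter that occurs in a kept pair

for (pair in dat) {
  # Keep the pair unless all of its letters have been seen already.
  if (!all(pair %in% seen)) {
    kept[[length(kept) + 1]] <- pair
    # union() adds only the letters that are not in `seen` yet.
    seen <- union(seen, pair)
  }
}

print(length(kept))
print(seen)
```

Because a skipped pair contributes no new letters, `seen` always matches the letters present in `kept`.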

On another note, I'd advise against using T as your vector name as it's used as TRUE in R.

nachoes
  • Thanks, every single answer was very good and solved the question, but yours in particular fit best with the rest of the code I was using. I didn't know about T being used as "True" in R, thanks for that too – Dario Ferretti Jul 20 '18 at 23:19
0

We can unlist, check duplicated values with duplicated, reformat as a matrix and filter out pairs of TRUE values:

x[colSums(matrix(duplicated(unlist(x)), nrow = 2)) != 2]
# [[1]]
# [1] "a" "d"
# 
# [[2]]
# [1] "a" "c"
# 
# [[3]]
# [1] "d" "e"
# 
# [[4]]
# [1] "e" "f"
# 
# [[5]]
# [1] "b" "c"
# 
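The one-liner can be unpacked into its intermediate steps. The sketch below assumes, as in the question, that every element of x holds exactly two letters (otherwise the reshape into a 2-row matrix would not line up); the names flat, dup, m and both_seen are made up for this illustration.

```r
x <- list(
  c("a", "d"), c("a", "c"), c("d", "e"),
  c("e", "f"), c("b", "c"), c("f", "c"),
  c("c", "e"), c("f", "b"), c("b", "a")
)

# 1. Flatten the pairs into one vector of 18 letters, keeping their order.
flat <- unlist(x)

# 2. TRUE wherever a letter has already appeared earlier in the vector.
dup <- duplicated(flat)

# 3. Reshape so that column j holds the two flags of pair j.
m <- matrix(dup, nrow = 2)

# 4. A pair is dropped only when both of its letters were seen before.
both_seen <- colSums(m) == 2

print(which(both_seen))
kept <- x[!both_seen]
```

Checking against all earlier letters gives the same result as checking against the kept pairs only, since a pair is skipped only when it would add no new letter.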

I also recommend that you don't use T as a variable name: it means TRUE by default (though using it as such is discouraged), and overwriting it could lead to unpleasant debugging.
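A short sketch of why this matters: unlike TRUE, which is a reserved word, T is an ordinary variable defined in the base package, so assigning to it silently masks the default. The variable names before, during and after are made up for this example.

```r
# By default, T is a base variable whose value is TRUE.
before <- isTRUE(base::T)

# Assigning to T creates a new variable that masks the one in base.
T <- list(c("a", "d"))
during <- isTRUE(T)      # T is now a list, so this is FALSE

# Any later code that relies on T as shorthand for TRUE would now misbehave.
# Removing the variable makes the base definition visible again.
rm(T)
after <- isTRUE(T)

print(c(before = before, during = during, after = after))
```

Spelling out TRUE and FALSE, and picking a different name for the result list, avoids the problem entirely.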

moodymudskipper
  • Thanks, didn't know about T being used like that in R, I'm still new to the language. In the end I chose another solution because it worked better with the rest of my code, but if I could I would choose all the answers I got since they were all very useful, thank you for your time – Dario Ferretti Jul 20 '18 at 23:27