0

I am trying to construct a function which shouldn't be hard in terms of programming but I am having some difficulties to conceptualize it. Hope you'll be able to understand my problem better than me!

I'd like a function that takes a single list of vectors as argument. Something like

arg1 = list(c(1,2), c(2,3), c(5,6), c(1,3), c(4,6), c(6,7), c(7,5), c(5,8))

The function should output a matrix with two columns (or a list of two vectors or something like that) where one column contains letters and the other numbers. One can think of the argument as a list of the positions/values that should be placed in the same group. If in the list there is the vector c(5,6), then the output should contain somewhere the same letters next to the values 5 and 6 in the number column. If there are the three following vectors c(1,2), c(2,3) and c(1,3), then the output should contain somewhere the same letters next to the value 1, 2 and 3 in the number column.

Therefore if we enter the object arg1 in the function it should return:

myFun(arg1)

number_column   letters_column
     1                 A
     2                 A
     3                 A
     5                 B
     6                 B
     7                 B
     4                 C
     6                 C
     5                 D
     8                 D

(the order is not important. The letters E should not be present before the letter D has been used)

Therefore the function has constructed 2 groups of 3 (A:[1,2,3] and B:[5,6,7]) and 2 groups of 2 (C:[4,6] and D:[5,8]). Note one position or number can be in several group.

Please let me know if something is unclear in my question! Thanks!

Blue Magister
  • 13,044
  • 5
  • 38
  • 56
Remi.b
  • 17,389
  • 28
  • 87
  • 168
  • 2
    This equivalent to finding the connected components of a graph, where your numbers are nodes, and the vectors are edges. Take a look at this question: http://stackoverflow.com/questions/12630521/grouping-a-many-to-many-relationship-from-a-two-column-map – Blue Magister Dec 09 '13 at 15:14
  • @BlueMagister thks. Indeed it looks asame! But not exactly… Or at least I don't yet understand how to apply this code to my list. Could someone help me with that? – Remi.b Dec 09 '13 at 15:23
  • I wrote too soon. It looks like you want a list of the [maximal cliques of a graph](http://en.wikipedia.org/wiki/Maximal_clique#Definitions), whereas the question I linked you to earlier gives you a list of the connected components. Try `igraph:::maximal.cliques`. – Blue Magister Dec 09 '13 at 15:38
  • In your example, it appears that "5" is in the group (5,6,7) and in the group (5,8) How do you want to annotate 5, 6, 7, and 8 vs. how you suggested annotating 1,2, and 3 ? that is, why not assign "A" to 1,2, and 3 but also assign "G" to 2 and 3? Where's the cut-off criterion? Edit: @BlueMagister 's "cliques" idea looks like it could answer my comments. – Carl Witthoft Dec 09 '13 at 15:38
  • @CarlWitthoft I don't understand your comment. Did I assign "G" to 2 and 3? If 1 and 2 are in the same group and 2 and 3 are in the same group, then 1 and 3 are not in the same group (except if it is stated that 1 and 3 are in the same group). Therefore they should be splitted into different groups and that's why one value can be found in several different groups. As it is the case for the values 5 and 6 in my example. Does it make sense? Note: The word group that I use might be considered as poorly chosen... – Remi.b Dec 09 '13 at 15:43
  • My question is why you the 3 pairs (1,2), (1,3), (2,3) are all assigned to "A", rather than assigning, say, A and B to 1, A and C to 2, and B and C to 3. But if you require both of each ordered pair to match up with another "group" member, then it's a maximal clique. – Carl Witthoft Dec 09 '13 at 16:10

1 Answers1

2

As I wrote in the comments, it appears that you want a data frame that lists the maximal cliques of a graph given a list of vectors that define the edges.

require(igraph)
## create a matrix where each row is an edge
argmatrix <- do.call(rbind, arg1)
## create an igraph object from the matrix of edges
gph <- graph.edgelist(argmatrix, directed = FALSE)
## returns a list of the maximal cliques of the graph
mxc <- maximal.cliques(gph)
## creates a data frame of the output
dat <- data.frame(number_column = unlist(mxc), 
 group_column = rep.int(seq_along(mxc),times = sapply(mxc,length)))
## converts group numbers to letters 
## ONLY USE if max(dat$group_column) <= 26
dat$group_column <- LETTERS[dat$group_column]
   # number_column group_column
# 1              5            A
# 2              8            A
# 3              5            B
# 4              6            B
# 5              7            B
# 6              4            C
# 7              6            C
# 8              3            D
# 9              1            D
# 10             2            D
Blue Magister
  • 13,044
  • 5
  • 38
  • 56