-3

Suppose I have the following data:

p1 <- c('a','a','a','a','a','b','b','b','b','c','c')
p2 <- c('b','c','d','e','f','c','a','e','d','e','f')
connections <- data.frame(p1, p2)

Where p1 and p2 are persons and each row represents a connection.

Question: How do I write a function that finds the maximum # of common connections between 2 people? (E.g. a & b have 3 common connections: c, d, e)

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Ray
  • 3,137
  • 8
  • 32
  • 59
  • 1
    @MartijnPieters Sorry Martijn! I realized later that there was duplicity with another question, hence the mark for deletion. – Ray Feb 26 '15 at 00:31
  • 2
    Then *vote to close*, don't just invalidate the work of the answerers. Your question is licensed under the [CC-Wiki license](http://creativecommons.org/licenses/by-sa/3.0/) (see section 3 of the [terms of service](http://stackexchange.com/legal) and together with the answers is now a collective work. Vandalising that goes against that license. – Martijn Pieters Feb 26 '15 at 00:32

2 Answers2

5

In Python, you could use collection.Counter() objects, and their intersection:

>>> from collections import Counter
>>> p1_conns = Counter(('a','a','a','a','a','b','b','b','b','c','c'))
>>> p2_conns = Counter(('b','c','d','e','f','c','a','e','d','e','f'))
>>> p1_conns & p2_conns
Counter({'c': 2, 'a': 1, 'b': 1})
>>> sorted(p1_conns & p2_conns)
['a', 'b', 'c']
>>> len(p1_conns & p2_conns)
3

The length is then the number of common connections. That last value is also available if you just use set intersections:

>>> p1_set = {'a','a','a','a','a','b','b','b','b','c','c'}
>>> p2_set = {'b','c','d','e','f','c','a','e','d','e','f'}
>>> p1_set & p2_set
set(['a', 'c', 'b'])
>>> len(p1_set & p2_set)
3

but the counters (multi-sets) also say something about their counts.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • If i'm interpreting your results correctly, in each case you are getting back a,b,c. But that's not the same as showing that a and b are connected via c, d, e as described in the question. So would `p1_set = {'a','b','c','d'}` and `p2_set = {'b','a','d','c'}` return 4 even though each person is connected to no more than 1 person? – MrFlick Feb 26 '15 at 02:20
  • @MrFlick: yup, I misread the question at first then. I was finding the direct connections; nowhere is it clear that indices into those sequences mark edges, which on second reading they appear to mean. Voting on both or answers appears to indicate most people missed those details too; I was going by how often a 'connection' appears. – Martijn Pieters Feb 26 '15 at 08:15
  • @MrFlick: which goes some small distance towards explaining why the OP edited their post to link to [Graph Algorithm To Find All Connections Between Two Arbitrary Vertices](http://stackoverflow.com/q/58306) as a dupe – Martijn Pieters Feb 26 '15 at 08:21
4

It seems like you want find the maximal number of length-2 paths from one vertex to another. I'm not sure this is very efficient, but you could do it in R with

library(igraph)
gg <- graph.data.frame(connections, directed=F)
ln <- sapply(V(gg), function(x) V(gg)[nei(x)])
max(combn(ln,2, function(x) sum(x[[1]] %in% x[[2]])))
# [1] 3

Here we use a proper graph library to connect the nodes. Then we look at overlapping sets of neighbors to count the number of two-step paths.

plot(gg)

enter image description here

MrFlick
  • 195,160
  • 17
  • 277
  • 295