0

I am trying to generalize the graph implementation of the TextRank algorithm following what I found in this Gist on Github.

Basically it's a graph problem where you need to create edges between all pairs of nodes in a sliding window of size n. So for example if I have a list of nodes

nodes=[0,1,2,3,4,5,6] 

and the window size is n=3, I would want to enumerate a set of edges

edges=[(0,1), (0,2), (1,2), (1,3), (2,3), (2,4), (3,4), (3,5), ..., (5,6)]

I've looked at a number of questions on sliding windows (e.g., #6998245 and #6822725) and on enumerating pairs (e.g., #1257413 and #13014595) but, unfortunately, I don't have enough experience with Python and iterators to combine them into a single function.

The solution I came up with is this:

from itertools import islice, combinations, chain

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
       result = result[1:] + (elem,)
       yield result

nodes = [0,1,2,3,4,5,6]

windowed = window(nodes,3)
intermediate_list = [combinations(item,2) for item in windowed]
edges = list(set(chain.from_iterable(intermediate_list)))
edges = sorted(edges)

print edges

This gives me the result that I want, sorting not really being necessary, but is there a more elegant "pythonic" way of doing it?

Community
  • 1
  • 1
nsecord
  • 210
  • 3
  • 7

1 Answers1

0

Actually IMHO what you have isn't bad and is fairly readable. The only improvement I can see with in the way it currently operates would be to change the intermediate_list to be a generator expression object instead of an actual list:

intermediate_list = (combinations(item,2) for item in windowed) # outer parentheses required

You might also want to change its name to better reflect it would then be, say something like pair_generator or gen_pairs.

BTW, it's above and beyond "Pythonic" to avoid prematurely optimizing things (unless, of course, you're doing it just for fun).

martineau
  • 119,623
  • 25
  • 170
  • 301
  • Thanks for your comment, I'll switch it to a generator as you suggest. I guess my concern was with the intermediate step generating duplicates and then using list(set(..)) to remove the duplicates. I've seen it suggested in many places that to remove duplicates in lists you can simply switch to set and then back to list but somehow it leaves me a little uneasy. In other languages you can easily get yourself into trouble switching back and forth between types. – nsecord Jan 24 '13 at 21:38
  • Well, item order is lost when switching from a sequence to something like a set or dictionary -- so there is some information-loss, but that does not appear to be a concern here. There are other ways of removing duplicates -- which I believe is the purpose of using a set in that step -- but using a set is one of the best if ordering isn't an issue. – martineau Jan 24 '13 at 23:00