I have a variable `actorslist`, and printing it outputs 100 lines like these (one line per movie):

[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler']
[u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine']
etc.

Then I have:

pairslist = list(itertools.permutations(actorslist, 2))

This gives me pairs of actors, but only within one movie at a time; after a new line it moves on to the next movie. How can I get all the actors from all the movies into one big array? The idea is that two actors who appeared in a movie together should get a pydot edge.

I tried the following, which successfully writes a dot file, but the data in it is wrong.

graph = pydot.Dot(graph_type='graph', charset="utf8")
for i in pairslist:
  edge = pydot.Edge(i[0], i[1])
  graph.add_edge(edge)
  graph.write('dotfile.dot')

My expected output in the dot file is as follows. (A,B) is the same as (B,A), so the reversed pairs should not appear in the output:

"Tim Robbins" -- "Morgan Freeman";
"Tim Robbins" -- "Bob Gunton";
"Tim Robbins" -- "William Sadler";
"Morgan Freeman" -- "Bob Gunton";
"Morgan Freeman" -- "William Sadler";
"Bob Gunton" -- "William Sadler";
"Christian Bale" -- "Heath Ledger";
"Christian Bale" -- "Aaron Eckhart";
"Christian Bale" -- "Michael Caine";
"Heath Ledger" -- "Aaron Eckhart";
"Heath Ledger" -- "Michael Caine";
"Aaron Eckhart" -- "Michael Caine";

ADDITIONAL INFO:

Some people asked how the variable `actorslist` was created:

file = open('input.txt','rU') ###input is JSON data on each line{"Title":"Shawshank...
nfile = codecs.open('output.txt','w','utf-8')
movie_actors = []
for line in file:
  line = line.rstrip()
  movie = json.loads(line)
  l = []
  title = movie['Title']
  actors = movie['Actors']
  tempactorslist = actors.split(',')
  actorslist = []
  for actor in tempactorslist:
    actor = actor.strip()
    actorslist.append(actor)
  l.append(title)
  l.append(actorslist)
  row = l[0] + '\t' + json.dumps(l[1]) + '\n'
  nfile.writelines(row)
kegewe

3 Answers

from collections import Counter
from itertools import combinations
import pydot

actorslists = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine'],
    [u'Tim Robbins', u'Heath Ledger', u'Michael Caine']
]

# Counter tracks how often each pair of actors has occurred (-> link weight)
actorpairs = Counter(pair for actorslist in actorslists for pair in combinations(sorted(actorslist), 2))

graph = pydot.Dot(graph_type='graph', charset="utf8")
for actors, weight in actorpairs.items():   # .items() works on Python 2 and 3; .iteritems() is Python 2 only
    a, b = actors                           # each key is already a 2-tuple of names
    edge = pydot.Edge(a, b, weight=str(weight))
    graph.add_edge(edge)
graph.write('dotfile.dot')

results in:

[image of the resulting graph]

Hugh Bothwell
  • Thanks, it seems I'm missing the combinations module, might be because I'm running python 2.7. – kegewe Feb 12 '14 at 02:46
  • @kegewe: `itertools.combinations` is part of the standard Python 2.7 library. – Hugh Bothwell Feb 12 '14 at 02:48
  • hmm, I seem to be getting this error message: `a,b = list(actor)` `ValueError: too many values to unpack` – kegewe Feb 12 '14 at 02:53
  • @kegewe: typo on my part (should be actors, plural); fixed. I am currently installing pydot for testing (any idea where dot_parser is supposed to come from?) – Hugh Bothwell Feb 12 '14 at 02:54
  • yea the dot_parser is a bit tricky, but this seemed to help me: http://stackoverflow.com/questions/15951748/pydot-and-graphviz-error-couldnt-import-dot-parser-loading-of-dot-files-will – kegewe Feb 12 '14 at 02:57
  • yea this seemed to work, ill be trying to make a graph using graphviz – kegewe Feb 12 '14 at 02:58
  • Quick question: is there any way I can use Python to transform my existing `actorslist` variable (100 rows) into your `actorslists` variable for use in your solution? – kegewe Feb 12 '14 at 03:20
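
For the transformation asked about in the last comment, here is a minimal sketch, assuming the input really is one JSON object per line with a comma-separated 'Actors' field as in the question. The sample lines, the movie titles in them, and the helper name `collect_actor_lists` are made up for illustration. The point is that the outer list is created once, before the loop, and each movie's cast is appended to it, so nothing is overwritten on the next iteration.

```python
import json

# Hypothetical sample standing in for the lines of input.txt
# (one JSON object per line; titles are illustrative only).
sample_lines = [
    '{"Title": "The Shawshank Redemption", "Actors": "Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler"}',
    '{"Title": "The Dark Knight", "Actors": "Christian Bale, Heath Ledger, Aaron Eckhart, Michael Caine"}',
]

def collect_actor_lists(lines):
    """Return one list of actor names per movie, accumulated across all lines."""
    actorslists = []                      # created once, outside the loop
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        movie = json.loads(line)
        # split the comma-separated string and trim whitespace around each name
        actors = [name.strip() for name in movie['Actors'].split(',')]
        actorslists.append(actors)        # keep this movie's cast
    return actorslists

actorslists = collect_actor_lists(sample_lines)
print(len(actorslists))
```

With a real file, the same function could be called as `collect_actor_lists(open('input.txt'))`, since a file object iterates over its lines.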
0

You'll want something like this:

import itertools

actorslist = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine']
    ]

for movie in actorslist:
    for actor1, actor2 in itertools.permutations(movie, 2):
        print(actor1, actor2)
        # make edge, etc.

Output:

Tim Robbins Morgan Freeman
Tim Robbins Bob Gunton
Tim Robbins William Sadler
Morgan Freeman Tim Robbins
Morgan Freeman Bob Gunton
Morgan Freeman William Sadler
Bob Gunton Tim Robbins
Bob Gunton Morgan Freeman
Bob Gunton William Sadler
William Sadler Tim Robbins
William Sadler Morgan Freeman
William Sadler Bob Gunton
Christian Bale Heath Ledger
Christian Bale Aaron Eckhart
Christian Bale Michael Caine
Heath Ledger Christian Bale
Heath Ledger Aaron Eckhart
Heath Ledger Michael Caine
Aaron Eckhart Christian Bale
Aaron Eckhart Heath Ledger
Aaron Eckhart Michael Caine
Michael Caine Christian Bale
Michael Caine Heath Ledger
Michael Caine Aaron Eckhart

What you have right now is permuting the list of movies, not the list of actors within each movie.
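
If the reversed pairs are unwanted, as the question's expected output suggests, `itertools.combinations` may be a closer fit than `permutations`. A small sketch on one movie's cast, comparing the two (the variable names `ordered` and `unordered` are just for illustration):

```python
from itertools import combinations, permutations

movie = ['Tim Robbins', 'Morgan Freeman', 'Bob Gunton', 'William Sadler']

ordered = list(permutations(movie, 2))    # (A, B) and (B, A) both appear
unordered = list(combinations(movie, 2))  # each pair appears exactly once

print(len(ordered), len(unordered))       # 12 ordered pairs, 6 unordered pairs

# Print the unordered pairs in the same form as the desired dot file lines.
for a, b in unordered:
    print('"%s" -- "%s";' % (a, b))
```

`combinations` keeps the input order, so the six pairs come out in the same order as the first six lines of the expected output in the question.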

senshin
  • Same comment as above: my problem is that the `actorslist` variable I have does not have sets of actors separated by comma, but rather has a new set of actors on each line...do you know how I can transform that variable so it looks like yours on top – kegewe Feb 12 '14 at 03:28
  • @kegewe What? What is the type of `actorslist`? – senshin Feb 12 '14 at 03:30
  • when I say `print type(actorslist)` I get many rows of `` – kegewe Feb 12 '14 at 03:35
  • @kegewe How are you creating `actorslist`? I don't understand how `print type(actorslist)` could give you "many rows". – senshin Feb 12 '14 at 03:35
  • I edited the original question to show how I created `actorslist` – kegewe Feb 12 '14 at 03:41
  • sorry about that, my bad...its `` I was running print type function inside the loop – kegewe Feb 12 '14 at 03:44
  • @kegewe Right, so I do not understand what you mean when you say that `actorslist` "has a new set of actors on each line" or how that is somehow different from what I included in my example code. `actorslist` is just a plain old list of lists, and there is nothing to "transform" here. – senshin Feb 12 '14 at 03:46
  • the `actorslist` variable you posted has sets of `[]` separated by commas. The one I'm producing is just one row after another with no commas – kegewe Feb 12 '14 at 04:02
  • @kegewe There is no such data structure in Python. Whether or not you believe me, what you have is a list of lists. Maybe you are being confused by the way you are printing `actorslist` or something. – senshin Feb 12 '14 at 04:04
  • yea I think you're right, I think that I'm writing to a text file one line at a time, but it's not saving everything in my variable...I'll try outputting the text file as only actors, then re-opening the text file and reading it into a new variable – kegewe Feb 12 '14 at 04:12
0

I am not sure how complicated it needs to be, but this seems to work to generate your output. I only changed your pairs line... (I took the liberty of putting Tim Robbins into Batman, just to give it more realistic overlap)

actorslist = [[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
  [u'Christian Bale', u'Heath Ledger', u'Tim Robbins', u'Michael Caine']]

import itertools
import pydot
graph = pydot.Dot(graph_type='graph', charset="utf8")

# generate a list of all unique actors, if you want that
# allactors = list(set(actor for movie in actorslist for actor in movie))

# this is the key line -- you have to iterate through the list 
# and not try to permute the whole thing
pairs = [list(itertools.permutations(k, 2)) for k in actorslist]


for pair in pairs:
    for a,b in pair:
        edge = pydot.Edge(a,b)
        graph.add_edge(edge)

# write the file once, after all edges have been added
graph.write('dotfile.dot')

Output file (remember I changed the input re Tim Robbins)...

graph G {
charset=utf8;
"Tim Robbins" -- "Morgan Freeman";
"Tim Robbins" -- "Bob Gunton";
"Tim Robbins" -- "William Sadler";
"Morgan Freeman" -- "Tim Robbins";
"Morgan Freeman" -- "Bob Gunton";
"Morgan Freeman" -- "William Sadler";
"Bob Gunton" -- "Tim Robbins";
"Bob Gunton" -- "Morgan Freeman";
"Bob Gunton" -- "William Sadler";
"William Sadler" -- "Tim Robbins";
"William Sadler" -- "Morgan Freeman";
"William Sadler" -- "Bob Gunton";
"Christian Bale" -- "Heath Ledger";
"Christian Bale" -- "Tim Robbins";
"Christian Bale" -- "Michael Caine";
"Heath Ledger" -- "Christian Bale";
"Heath Ledger" -- "Tim Robbins";
"Heath Ledger" -- "Michael Caine";
"Tim Robbins" -- "Christian Bale";
"Tim Robbins" -- "Heath Ledger";
"Tim Robbins" -- "Michael Caine";
"Michael Caine" -- "Christian Bale";
"Michael Caine" -- "Heath Ledger";
"Michael Caine" -- "Tim Robbins";
}
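
The output above lists every edge in both directions. A rough sketch of one way to collapse those, and to avoid repeating a pair that co-stars in more than one movie, is shown here. It builds the DOT text by hand rather than through pydot so it can run anywhere; the function name `dot_lines` is hypothetical, and it assumes actor names contain no double quotes that would need escaping.

```python
from itertools import combinations

def dot_lines(actorslist):
    """Build DOT text with one undirected edge per distinct pair of co-stars."""
    seen = set()
    lines = ['graph G {']
    for movie in actorslist:
        for pair in combinations(movie, 2):
            key = frozenset(pair)         # order-insensitive identity of the pair
            if key in seen:
                continue                  # already linked via an earlier movie
            seen.add(key)
            lines.append('"%s" -- "%s";' % pair)
    lines.append('}')
    return lines

actorslist = [
    ['Tim Robbins', 'Morgan Freeman', 'Bob Gunton', 'William Sadler'],
    ['Christian Bale', 'Heath Ledger', 'Tim Robbins', 'Michael Caine'],
]

print('\n'.join(dot_lines(actorslist)))
```

The same `seen` check could sit in front of `graph.add_edge(...)` in the pydot loop if the graph object is preferred over plain text.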
beroe
  • I guess my problem is that the `actorslist` variable I have does not have sets of actors separated by comma, but rather has a new set of actors on each line...do you know how I can transform that variable so it looks like yours on top – kegewe Feb 12 '14 at 03:28