
I'm not quite sure how to describe my problem, so I'm having trouble googling for solutions. Forgive me if the answer has been described elsewhere.

I have a function that compares two things and returns a tuple of a value and a list of values, where the first value is always part of the list, e.g. (a, [m, n, a, o]). I have a list of things that I want to compare: [thing1, thing2, thing3, thing4]. I've got a function that loops through the things and compares them, but I'm having trouble figuring out how to plot the results:

def compare_thing1(things=[thing2, thing3, thing4]):
    for thing in things:
        pass  # compare thing1 to thing, add the result to a dataframe

    # plot

So if thing1 to thing2 comparison returns (10, [8,9,10,11,12]), the thing3 comparison returns (25, [24,25,26,27]) and thing4 comparison returns (30, [28,29,30,31,32,33...]), I want a graph that looks like this:

[Image: the desired graph]

In other words, the X position is determined by the first value, and then the values in the list are plotted on the y axis.

I think I could sort of cloodge this together by creating a bunch of (x,y) coordinates from each comparison, but I was wondering if there's a better way to do this with Series objects or something. The problem is that all of the lists are different lengths.

Oh, and I'm not sure whether performance is an issue: each of the comparisons can be thousands of values long.

kevbonham
  • what does "comparison" mean here? – Jblasco Jul 28 '15 at 14:53
  • @Jblasco Not sure it's worth going into - but briefly, I'm comparing the genomes of different species and looking at the gene distance. The numbers I'm returning are gene distances, and the `x` value is the distance of the 16S ribosomal subunit gene (which is a standard marker of evolutionary distance) – kevbonham Jul 28 '15 at 14:56
  • Sorry, kevbonham, thought it was part of the code you were looking for. My mistake! – Jblasco Jul 28 '15 at 14:56
  • @Jblasco ahh... no, that code works fine, it's the plotting part I'm having issue with :-) – kevbonham Jul 28 '15 at 14:58
  • @Jblasco I see now why my wording at the beginning is confusing... will make an edit. – kevbonham Jul 28 '15 at 14:59

3 Answers

2

itertools repeat may be helpful here, e.g.:

repeat(thing1[0], len(thing1[1]))

This yields an iterator that produces the first value once for each of the remaining values (wrap it in list() if you need an actual list). Then you can simply plot one vs the other.

EDIT: For completeness, here's how you use this to plot. Assuming things is a list containing objects structured as you described, e.g. things = [thing1, thing2, thing3]

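from itertools import repeat
import matplotlib.pyplot as plt

for thing in things:
    # repeat() returns an iterator; list() turns it into a sequence for scatter
    xthing = list(repeat(thing[0], len(thing[1])))
    plt.scatter(xthing, thing[1])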
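With thousands of points per comparison, it may be simpler to collect everything into two flat lists and plot once. A minimal sketch, assuming `things` holds the `(x, [ys])` tuples from the question; the `flatten` helper and the `all_x`/`all_y` names are hypothetical, not from the original answer:

```python
from itertools import chain, repeat

# Example tuples taken from the question: (x value, list of y values).
things = [
    (10, [8, 9, 10, 11, 12]),
    (25, [24, 25, 26, 27]),
    (30, [28, 29, 30, 31, 32, 33]),
]

def flatten(things):
    """Return parallel x and y lists covering every comparison."""
    # Repeat each x once per y value, then join the pieces end to end.
    all_x = list(chain.from_iterable(repeat(x, len(ys)) for x, ys in things))
    # Join the y lists in the same order so positions line up with all_x.
    all_y = list(chain.from_iterable(ys for _, ys in things))
    return all_x, all_y

all_x, all_y = flatten(things)
print(len(all_x), len(all_y))
```

The two lists have the same length regardless of how long each comparison is, so they can be handed to a single `plt.scatter(all_x, all_y)` call.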
S E Clark
  • Better than mine, with an iterator, instead of building the list – Jblasco Jul 28 '15 at 15:01
  • The iterator is better for performance reasons? You're right that a couple thousand points doesn't seem to be causing an issue. – kevbonham Jul 28 '15 at 15:08
  • It would be if you scale your problem up and say millions instead of thousands, for example. That would mean a big difference. So S E Clark's answer will do better in more cases than mine, although it is essentially the same thing. – Jblasco Jul 28 '15 at 17:36
  • Agreed - In a scaled-up version, the iterator is both faster and more memory efficient. But either construction is faster than appending in a for-loop because you avoid the .append() lookup on each iteration: see [this answer](http://stackoverflow.com/questions/14124610/python-list-comprehension-expensive) – S E Clark Jul 28 '15 at 19:28
1
import matplotlib.pyplot as plt

thing1 = (10, [8,9,10,11,12])
thing2 = (25, [24,25,26,27])
thing3 = (30, [28,29,30,31,32,33])

thing1_y = []
for i in thing1[1]:
    thing1_y.append(i)
thing1_x = []
for i in range(len(thing1_y)):
    thing1_x.append(thing1[0])

This will give you:

In [2]: thing1_x
Out[2]: [10, 10, 10, 10, 10]

In [3]: thing1_y
Out[3]: [8, 9, 10, 11, 12]

In other words, the X position is determined by the first value, and then the values in the list are plotted on the y axis.

Now you can use 

plt.scatter(thing1_x,thing1_y)
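To avoid writing the two loops out again for every comparison, the same construction can be wrapped in a small function. This is only a sketch; the `expand` name is hypothetical, and it uses list multiplication in place of the append loops:

```python
def expand(thing):
    """Turn (x, [y1, y2, ...]) into equal-length x and y lists."""
    x, ys = thing
    # [x] * n builds a list holding x repeated n times.
    return [x] * len(ys), list(ys)

# Second example tuple from the question.
thing2 = (25, [24, 25, 26, 27])
thing2_x, thing2_y = expand(thing2)
print(thing2_x)  # [25, 25, 25, 25]
print(thing2_y)  # [24, 25, 26, 27]
```

Each returned pair can then be passed to `plt.scatter` in the same way as `thing1_x` and `thing1_y`.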
Srivatsan
  • Ok, so there's no magic?.. I can accomplish the same thing by doing `plot([thing1[0] for y in thing1[1]], thing1[1])`... But this is what I was going for. – kevbonham Jul 28 '15 at 15:07
  • @kevbonham: Yep! But in your case you'd have to write `n` number of times, but here you append the values to a list and then plot them easily by calling just your `x list` and `y list` – Srivatsan Jul 28 '15 at 15:10
1

You can split the x and y, which is, I think, what you mention when you say: "I think I could sort of cloodge this together by creating a bunch of (x,y) coordinates from each comparison."

x, y = (10, [8,9,10,11,12])
x = [x] * len(y)  # repeat x once per y value: [10, 10, 10, 10, 10]

I wouldn't expect thousands of points to be a problem computationally, but if it is, x can be formed using a generator.
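One way the generator idea might look, as a sketch only: the `iter_points` name is hypothetical, and the pairs are materialized at the end on the assumption that the plotting call wants full sequences rather than a lazy iterator.

```python
def iter_points(things):
    """Yield (x, y) pairs one at a time without building per-comparison lists."""
    for x, ys in things:
        for y in ys:
            yield x, y

# Two of the example tuples from the question.
things = [(10, [8, 9, 10, 11, 12]), (25, [24, 25, 26, 27])]

# zip(*pairs) splits the stream of pairs back into an x tuple and a y tuple.
xs, ys = zip(*iter_points(things))
print(xs)
print(ys)
```

The generator itself stays lazy, so it only pays off if the points are consumed in a streaming fashion; once they are unpacked for plotting, memory use is comparable to building the lists directly.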

Jblasco