Chart with lots of (but varied number of) Y values for each X value

Question

I'm not quite sure how to describe my problem, so I'm having trouble googling for solutions. Forgive me if the answer has been described elsewhere.

I have a function that compares two things and returns a tuple of a value and a list of values, where the first value is always part of the list, e.g. (a, [m,n,a,o]). I have a list of things that I want to compare: [thing1, thing2, thing3, thing4]. I've got a function that loops through the things and compares them, but I'm having trouble figuring out how to plot the results:

def compare_thing1(things=[thing2, thing3, thing4]):
    for thing in things:
        # compare thing1 to thing, add the result to a dataframe
        ...

    # plot the results

So if thing1 to thing2 comparison returns (10, [8,9,10,11,12]), the thing3 comparison returns (25, [24,25,26,27]) and thing4 comparison returns (30, [28,29,30,31,32,33...]), I want a graph that looks like this:

In other words, the X position is determined by the first value, and then the values in the list are plotted on the y axis.

I think I could sort of kludge this together by creating a bunch of (x,y) coordinates from each comparison, but I was wondering if there's a better way to do this with Series objects or something. The problem is that all of the lists are different lengths.

Oh, also not sure if performance is an issue; each of the comparisons can be thousands of values long.

@Jblasco Not sure it's worth going into - but briefly, I'm comparing the genomes of different species and looking at the gene distance. The numbers I'm returning are gene distances, and the `x` value is the distance of the 16S ribosomal subunit gene (which is a standard marker of evolutionary distance) — kevbonham, Jul 28 '15 at 14:56
Sorry, kevbonham, thought it was part of the code you were looking for. My mistake! — Jblasco, Jul 28 '15 at 14:56
@Jblasco ahh... no, that code works fine, it's the plotting part I'm having issue with :-) — kevbonham, Jul 28 '15 at 14:58
@Jblasco I see now why my wording at the beginning is confusing... will make an edit. — kevbonham, Jul 28 '15 at 14:59

S E Clark · Answer 1 · 2015-07-28T19:56:03.163

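itertools.repeat may be helpful here, e.g.:

list(repeat(thing1[0], len(thing1[1])))

repeat() itself returns an iterator that produces the first value once for each of the remaining values; wrapping it in list() gives a list with the same length as the y values. Then you can simply plot one against the other.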

EDIT: For completeness, here's how you use this to plot. Assuming things is a list containing objects structured as you described, e.g. things = [thing1, thing2, thing3]

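from itertools import repeat
import matplotlib.pyplot as plt

for thing in things:
    # one copy of the x value per y value; list() because scatter expects array-like input
    xthing = list(repeat(thing[0], len(thing[1])))
    plt.scatter(xthing, thing[1])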
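If each comparison runs to thousands of values, it may be cheaper to flatten everything first and call scatter once instead of once per thing. Here is a minimal sketch of that idea. The helper names `flatten_comparisons` and `plot_comparisons` are made up for illustration, and the sample data is taken from the question.

```python
from itertools import chain, repeat


def flatten_comparisons(things):
    """Turn [(x, [y, ...]), ...] into two flat, equal-length coordinate lists."""
    all_x = list(chain.from_iterable(repeat(x, len(ys)) for x, ys in things))
    all_y = list(chain.from_iterable(ys for _, ys in things))
    return all_x, all_y


def plot_comparisons(things):
    """Draw every comparison with a single scatter call."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without it

    all_x, all_y = flatten_comparisons(things)
    plt.scatter(all_x, all_y)
    plt.show()


# Sample data from the question: (x value, list of y values)
things = [(10, [8, 9, 10, 11, 12]), (25, [24, 25, 26, 27])]
all_x, all_y = flatten_comparisons(things)
print(all_x)
print(all_y)
```

The lists can have different lengths because each x value is repeated exactly as many times as its own list is long.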

edited Jul 28 '15 at 19:56

answered Jul 28 '15 at 14:58

Better than mine, with an iterator, instead of building the list – Jblasco Jul 28 '15 at 15:01
The iterator is better for performance reasons? You're right that a couple thousand points doesn't seem to be causing an issue. – kevbonham Jul 28 '15 at 15:08
It would be if you scale your problem up and say millions instead of thousands, for example. That would mean a big difference. So S E Clark's answer will do better in more cases than mine, although it is essentially the same thing. – Jblasco Jul 28 '15 at 17:36
Agreed - In a scaled-up version, the iterator is both faster and more memory efficient. But either construction is faster than appending in a for-loop because you avoid the .append() lookup on each iteration: see [this answer](http://stackoverflow.com/questions/14124610/python-list-comprehension-expensive) – S E Clark Jul 28 '15 at 19:28

score 1 · Accepted Answer · answered Jul 28 '15 at 14:54

import matplotlib.pyplot as plt

thing1 = (10, [8,9,10,11,12])
thing2 = (25, [24,25,26,27])
thing3 = (30, [28,29,30,31,32,33])

thing1_y = []
for i in thing1[1]:
    thing1_y.append(i)
thing1_x = []
for i in range(len(thing1_y)):
    thing1_x.append(thing1[0])

This will give you:

In [2]: thing1_x
Out[2]: [10, 10, 10, 10, 10]

In [3]: thing1_y
Out[3]: [8, 9, 10, 11, 12]

In other words, the X position is determined by the first value, and then the values in the list are plotted on the y axis.

Now you can use:

plt.scatter(thing1_x,thing1_y)
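The loops above only handle thing1. One way to cover all three tuples without repeating them is a small helper. This is just a sketch, and `expand` is a hypothetical name, not something from matplotlib.

```python
def expand(thing):
    """Return (x_list, y_list) for one (x, [y, ...]) tuple."""
    x, ys = thing
    # Repeat the single x value once per y value so both lists line up
    return [x] * len(ys), list(ys)


thing1 = (10, [8, 9, 10, 11, 12])
thing2 = (25, [24, 25, 26, 27])
thing3 = (30, [28, 29, 30, 31, 32, 33])

# One (x_list, y_list) pair per thing, in the same order
coords = [expand(thing) for thing in (thing1, thing2, thing3)]
thing1_x, thing1_y = coords[0]
```

Each pair in `coords` can then be passed to plt.scatter in a loop.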

answered Jul 28 '15 at 14:54

Srivatsan

Ok, so there's no magic?.. I can accomplish the same thing by doing `plot([thing1[0] for y in thing1[1]], thing1[1])`... But this is what I was going for. – kevbonham Jul 28 '15 at 15:07
@kevbonham: Yep! But in your case you'd have to write `n` number of times, but here you append the values to a list and then plot them easily by calling just your `x list` and `y list` – Srivatsan Jul 28 '15 at 15:10

score 1 · Answer 3 · answered Jul 28 '15 at 14:59

You can split the x and y, which is, I think, what you mean when you say: "I think I could sort of kludge this together by creating a bunch of (x,y) coordinates from each comparison."

x, y = (10, [8,9,10,11,12])
x = [x] * len(y)  # repeat x once per y value: [10, 10, 10, 10, 10]

I wouldn't expect thousands of points to be a problem computationally, but if they are, x can be formed using a generator.
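For what it's worth, a generator version could look something like the sketch below (`repeat_x` is a made-up name). As far as I know matplotlib expects array-like input, so the generator is turned into a list only at the point where the values are handed over for plotting.

```python
def repeat_x(x, ys):
    """Yield x once for every value in ys, without building a list up front."""
    for _ in ys:
        yield x


x, y = (10, [8, 9, 10, 11, 12])
x_lazy = repeat_x(x, y)   # nothing is computed yet
x_values = list(x_lazy)   # materialize just before plotting
```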
