How to visualize overlapping data in Python?

Question

Suppose I have two lists:

a=[00000011000000111100001100001]
b=[00000111100000010000001100001]

Is there any way to visualize the intersection? I mean creating a plot or picture where zeros are a white background, ones from [a] are red, ones from [b] are blue, and their overlap is violet.

So you have a == [83010353356805013505L] and b == [674459080744761589761L]? — timgeb, Feb 05 '16 at 12:53
like a=[00000011000000111100001100001] and b=[00000111100000010000001100001], but larger — Polly, Feb 05 '16 at 12:56
... that's the same as in your question, which you just said is not accurate. — timgeb, Feb 05 '16 at 12:57
Well i told you i read both a and b from csv file, where its a column. — Polly, Feb 05 '16 at 13:00
I'll ask you one last time to answer the question how a and b look like in your program before deciding this is a waste of time. WHAT ARE a AND b? Lists of integers, lists of strings? Macaroni? — timgeb, Feb 05 '16 at 13:01
You could try something like [plotly's heatmaps](https://plot.ly/python/heatmaps/). However, you'll first have to import `a` and `b` different from the csv. — Nander Speerstra, Feb 05 '16 at 13:02
dataset=loadtxt(open('data.csv', 'r'), dtype='f8', delimiter=';', skiprows=1), a=[i[1:] for i in dataset], and the same with b. Yes, its lists of integers. — Polly, Feb 05 '16 at 13:04
Please do not use the comments for code. You can update your question with additional details at any time. Also, you did provide your sample inputs, but what is your sample output? — OneCricketeer, Feb 05 '16 at 13:32
When you say `a` is a list of integers, do you mean `a` is a list of 1s and 0s, e.g., `a = [0, 1, 1, 0]`, **or** `a` is a list of integers, each which you want to transform to binary and compare the overlap with the bits in `b`? An example of the latter would be `a = [5, 13]`, which as binary would look like `00101` and `01101` respectively? — Reti43, Feb 05 '16 at 14:56

score 1 · Answer 1 · edited May 23 '17 at 12:01

While my solution is similar to Chris's, I'm also showing how to define custom colours and generalise the approach for multiple overlays.

For each position you get a contribution of the integer in a and the integer in b. Since those can be either 0 or 1, you can use a binary description. By shifting the integer in b to the second position, i.e. multiplying by 2, you can represent any combination with 2 bits, ba, one for each integer in each list (for each position).

Matplotlib also allows you to create your own custom colour map. Combining these aspects you can achieve the result you're after.

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

a = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])

lists = [a, b]
overlap = np.zeros_like(a)

for k, row in enumerate(lists):
    overlap += row * 2**k

cmap = mpl.colors.ListedColormap(['white', 'blue', 'red', 'purple'])
bounds = range((2**len(lists))+1)
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)

plt.pcolor(overlap.reshape((1, overlap.shape[0])), edgecolor='k', cmap=cmap, norm=norm)
plt.gca().set_aspect('equal')  # gca() reuses the current axes; plt.axes() creates a new one in recent Matplotlib
plt.xticks([])
plt.yticks([])
plt.xlim(0, len(a))
plt.show()

Output:

If you had 3 overlays, you'd have to give a list of colours in the following order:

000    white
001    colour A
010    colour B
011    colour A+B
100    colour C
101    colour A+C
110    colour B+C
111    colour A+B+C

Unfortunately, this has to be done manually. You can see a list of colour names in the Matplotlib documentation. If a colour doesn't have an explicit name, you can always describe it by its hexadecimal value in a string, like '#RRGGBB', where RR is the hexadecimal value for the red channel, etc. So, instead of 'white' you could have said '#FFFFFF' (the letters can also be lowercase).
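To double-check the order for any number of overlays, the table above can be generated rather than written out by hand. This is only a sketch: `colour_order` and the overlay names "A", "B", "C" are made up for illustration and are not part of Matplotlib.

```python
def colour_order(names):
    """Return (bit pattern, active overlays) for every colour map index."""
    rows = []
    for index in range(2 ** len(names)):
        # Overlay k contributes 2**k to the index, so it is active when bit k is set.
        active = [name for k, name in enumerate(names) if (index >> k) & 1]
        pattern = format(index, "0{}b".format(len(names)))
        rows.append((pattern, "+".join(active) or "white"))
    return rows


for pattern, label in colour_order(["A", "B", "C"]):
    print(pattern, label)
```

The colours themselves still have to be picked by hand; the helper only tells you which position in the list each combination occupies.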

Code notes:

bounds is the list of numbers [0, 1, 2, 3, 4]. This means that any value from 0 to 1, but not including 1, will be mapped to white, any value between 1 and 2 to blue, etc. If we have k overlays, we need 2**k + 1 boundary numbers.
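The mapping from values to colours can be checked without drawing anything. A minimal sketch, assuming the same colour map and bounds as in the code above; the sample `values` are arbitrary and chosen only to show which bin each one falls into.

```python
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

cmap = ListedColormap(['white', 'blue', 'red', 'purple'])
bounds = [0, 1, 2, 3, 4]
norm = BoundaryNorm(bounds, cmap.N)

# Each value is assigned the index of the half-open interval [lower, upper) it falls in.
values = np.array([0, 0.5, 1, 2, 3])
indices = np.asarray(norm(values))
print(indices)

# Looking up an index in the colour map returns the RGBA colour used for that bin.
print(cmap(int(indices[2])))
```

Here 0 and 0.5 land in the first bin (white), 1 in the second (blue), and so on, which is why the number of boundaries is one more than the number of colours.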

I chose to represent the data with pcolor() since it has the edgecolor option, which makes runs of the same colour easier to see. However, the function requires a 2D array as input, which is why I had to reshape the overlap array from shape (29,) to (1, 29). More generally, this code would also work if a and b were 2D arrays, in which case you would skip the reshaping.
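As a rough illustration of the 2D case, the encoding step can be wrapped in a small function and applied to arrays of any shape. The name `encode_overlays` and the tiny 2x3 arrays are assumptions made up for this sketch.

```python
import numpy as np


def encode_overlays(layers):
    """Combine 0/1 arrays of identical shape into one array of colour indices."""
    layers = [np.asarray(layer) for layer in layers]
    overlap = np.zeros_like(layers[0])
    for k, layer in enumerate(layers):
        # Layer k occupies bit k of the result.
        overlap += layer * 2**k
    return overlap


a2d = np.array([[0, 1, 1],
                [0, 0, 1]])
b2d = np.array([[0, 0, 1],
                [1, 0, 1]])

overlap2d = encode_overlays([a2d, b2d])
print(overlap2d)
```

Since `overlap2d` is already 2D, it could be handed to `plt.pcolor(overlap2d, edgecolor='k', cmap=cmap, norm=norm)` as is, with `cmap` and `norm` defined as in the main example.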

In this section I briefly discuss other approaches I considered but found lacking or unnecessarily complex considering the scope of the OP.

Alpha blending (transparency)

One could create a row of white-blue for the first list and stack on top of it another row of white-red. In theory, white-white would be white, white-blue would be blue, white-red would be red and blue-red would be purple. However, since each row is only semi-transparent, red looks like pink and blue with layers of white becomes "diluted" to light blue. This effect would be even more pronounced with multiple layers, but at least the combination of colours would emerge without any manual definitions.

An advantage of this method is that it doesn't only support 1 or 0 for any individual overlay, but any value in between.

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

a = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])

blue = mpl.colors.LinearSegmentedColormap.from_list('blue', ['white', 'blue'], 256)
red = mpl.colors.LinearSegmentedColormap.from_list('red', ['white', 'red'], 256)

plt.pcolor(a.reshape((1, a.shape[0])), cmap=blue, edgecolor='k', alpha=1.0)
plt.pcolor(b.reshape((1, b.shape[0])), cmap=red, edgecolor='k', alpha=0.5)
plt.gca().set_aspect('equal')  # gca() reuses the current axes; plt.axes() creates a new one in recent Matplotlib
plt.xticks([])
plt.yticks([])
plt.xlim(0, len(a))
plt.show()

Output:

RGB blending

What if one translated the overlays to RGB values? For example, each 1 in a could be represented by the [0, 0, 255] triplet. By mixing the RGB values for each overlay, we could obtain a final RGB value for each position which we could then plot with an RGB colour map. However, this isn't as simple as it sounds.
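To make the idea concrete, here is one possible mixing rule, sketched under explicit assumptions: each position gets the average of the colours of the overlays that are set there, and white where none are. The averaging rule, the short sample arrays and the choice of red for a and blue for b are all assumptions for illustration. Other rules give different results; multiplying red and blue channel by channel, for instance, yields black rather than purple, which is part of why this is less simple than it sounds.

```python
import numpy as np

a = np.array([0, 1, 0, 1])
b = np.array([0, 0, 1, 1])

# One RGB triplet (floats in 0..1) per overlay: a is red, b is blue.
layer_colours = np.array([[1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0]])

layers = np.stack([a, b])          # shape (n_layers, n_positions)
counts = layers.sum(axis=0)        # number of active overlays per position
summed = layers.T @ layer_colours  # summed RGB per position, shape (n_positions, 3)

rgb = np.ones((layers.shape[1], 3))  # start from a white background
active = counts > 0
rgb[active] = summed[active] / counts[active, None]  # average of the active colours
print(rgb)
```

The result can be shown as a single row with `plt.imshow(rgb[np.newaxis, :, :])`. With more than two overlays, averaged colours quickly become muddy and hard to tell apart.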

Thank you very much Reti43! It's exactly what I needed. — Polly, Feb 08 '16 at 11:29

score 0 · Answer 2 · answered Feb 05 '16 at 15:02

My suggestion is to create a list that contains 4 values:

0 = 0 in both, 1 = 1 in b only, 2 = 1 in a only, 3 = 1 in both

You can accomplish this by multiplying the first list by 2, and adding it to the second list. This will result in list c, which has values 0-3, based on where the 1s show up in your original two lists.

Use matplotlib to generate the chart as follows:

import matplotlib.pyplot as plt

a = [0,1,0,1,0,0,0,0,1,0,0,0,1,1,1]
b = [0,0,1,0,0,1,1,0,0,0,1,0,1,1,0]
c = []

for i, item in enumerate(a):
    c.append(item * 2 + b[i])

x = range(0,len(a))
y = [1] * len(a)

plt.scatter(x, y, c=c, s=500)
plt.show()

You can print(c) to reveal how the color index works:

c = [0, 2, 1, 2, 0, 1, 1, 0, 2, 0, 1, 0, 3, 3, 2]

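With the default colour map at the time of writing (jet): dark blue means 0 in both, light blue means 1 in b only, yellow means 1 in a only, and red means 1 in both. Matplotlib 2.0 and later default to viridis, so the colours will differ there.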
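If you would rather not depend on the default colour map, each index can be mapped to a named colour before plotting. A small sketch; the `palette` list is an assumption chosen to match the colours asked for in the question (a red, b blue, overlap violet).

```python
a = [0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]

# Same encoding as above: 0 = neither, 1 = b only, 2 = a only, 3 = both.
c = [2 * x + y for x, y in zip(a, b)]

# The position in the palette corresponds to the value in c.
palette = ['white', 'blue', 'red', 'violet']
point_colours = [palette[value] for value in c]
print(point_colours)
```

The list can then be passed straight to the scatter call, e.g. `plt.scatter(range(len(c)), [1] * len(c), c=point_colours, s=500, edgecolors='k')`; the black edge keeps the white points visible against a white background.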
