I want to visualize some 2D data. For example, the following data has three attributes (columns) and four rows:

       att1  att2   att3
fun1     10     0      2
fun2      0     1      3
fun3      1    10      5
fun4      2     3     10

I want to assign each data point a different colour. The intensity of the color will depend on the value of the attribute in that column, and each column must have a different color.

Following is the desired image:

[image: desired plot]

Does anyone have any idea how I can make it in Python or R?

jrjc
Sangeeta
  • Yes, http://docs.ggplot2.org/current/geom_tile.html. – Roman Luštrik Aug 06 '14 at 07:19
  • Another way is with `lattice`, via `levelplot`: http://stackoverflow.com/questions/8585767/how-do-i-specify-different-color-ranges-for-different-levels – Alex Reynolds Aug 06 '14 at 07:21
  • The OP wants different colors for each column, it's not just a heatmap. – jrjc Aug 06 '14 at 08:18
  • Are the numbers always scaled to be max-10? Do you want the colours scaled to max intensity for each column max or the overall max or scaled so max intensity is 10 even if there's no tens? Same for the minimum - top left looks white but is value 2, not zero. – Spacedman Aug 06 '14 at 09:09
  • And what do you even mean by intensity? Refer to RGB and HSV colour spaces if necessary. How exact is your figure? The third-down green looks more intense (saturated) and the fourth-down green is clearly darker but has a larger value. How did **you** map the numbers to colours? – Spacedman Aug 06 '14 at 09:17

2 Answers

With Python:

I found a better way:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# data loading
df = pd.read_csv("file.csv", index_col=0) 


# plotting
fig,ax = plt.subplots()
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att1")), 
           cmap=cm.Reds) # You can change the colormap here
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att2")), 
           cmap=cm.Greens)
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att3")), 
           cmap=cm.Blues)
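# label the ticks from the frame itself so any number of rows/columns works
plt.xticks(range(len(df.columns)), df.columns)
plt.yticks(range(len(df.index)), df.index)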
plt.show()

[image: resulting plot]

Some details:

df.mask(((df == df) | df.isnull()) & (df.columns != "att1"))
      att1  att2  att3
fun1    10   NaN   NaN
fun2     0   NaN   NaN
fun3     1   NaN   NaN
fun4     2   NaN   NaN
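
The same idea can be written as a loop so that it does not depend on the column names or on there being exactly three columns. This is only a sketch under some assumptions: the helper name `plot_column_heatmap` is made up for illustration, the example data is typed inline instead of read from `file.csv`, and the `Agg` backend is selected just so the snippet runs without a display. Each `matshow` call scales its colormap to the min and max of the one visible column, so pass `vmin`/`vmax` if you want a shared scale instead.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt


def plot_column_heatmap(df, cmaps):
    """Draw every column of df with its own colormap on a single Axes.

    Hypothetical helper: cmaps is a list of colormap names, one per column.
    """
    fig, ax = plt.subplots()
    values = df.to_numpy(dtype=float)
    for i, cmap in enumerate(cmaps):
        # keep column i and blank out all other columns with NaN,
        # which matplotlib leaves uncoloured
        layer = np.full_like(values, np.nan)
        layer[:, i] = values[:, i]
        ax.matshow(layer, cmap=cmap)
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns)
    ax.set_yticks(range(df.shape[0]))
    ax.set_yticklabels(df.index)
    return fig, ax


# the example data from the question, typed inline
df = pd.DataFrame(
    {"att1": [10, 0, 1, 2], "att2": [0, 1, 10, 3], "att3": [2, 3, 5, 10]},
    index=["fun1", "fun2", "fun3", "fun4"],
)
fig, ax = plot_column_heatmap(df, ["Reds", "Greens", "Blues"])
# call plt.show() or fig.savefig(...) to look at the result
```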

Older version, with a numpy masked array:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from numpy.ma import masked_array
import numpy as np

df = pd.read_clipboard() # just copied your example

# define masked arrays to mask all but the given column
# (the mask is built from column positions, not cell values,
#  so it still works when a row contains repeated values)
cols = np.arange(df.shape[1])
c1 = masked_array(df.values, mask=np.tile(cols != 0, (df.shape[0], 1)))
c2 = masked_array(df.values, mask=np.tile(cols != 1, (df.shape[0], 1)))
c3 = masked_array(df.values, mask=np.tile(cols != 2, (df.shape[0], 1)))

fig,ax = plt.subplots()
ax.matshow(c1,cmap=cm.Reds) # You can change the colormap here
ax.matshow(c2,cmap=cm.Greens)
ax.matshow(c3,cmap=cm.Blues)
plt.xticks(range(len(df.columns)), df.columns)
plt.yticks(range(len(df.index)), df.index)
plt.show()

Some details:

df is a DataFrame:

      att1  att2  att3
fun1    10     0     2
fun2     0     1     3
fun3     1    10     5
fun4     2     3    10

c1, c2, c3 are masked arrays (for columns 1, 2 and 3):

>>> c1
masked_array(data =
 [[10 -- --]
 [0 -- --]
 [1 -- --]
 [2 -- --]],
             mask =
 [[False  True  True]
 [False  True  True]
 [False  True  True]
 [False  True  True]],
       fill_value = 999999)

Alternatively, you can start from a 2D numpy array:

>>> data
array([[10,  0,  2],
       [ 0,  1,  3],
       [ 1, 10,  5],
       [ 2,  3, 10]])

and replace all df and df.values with data (the 2D array), except in the labeling part.
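
If you start from a plain array anyway, another option is to compute the colours yourself and draw a single image. The following is a sketch of that alternative, not the method used above: every column is normalized to its own min/max (an assumption, since the question does not say how values should map to intensity), passed through its own colormap, and the resulting RGBA values are shown with one `imshow` call. The `Agg` backend line is there only so that the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

data = np.array([[10, 0, 2],
                 [0, 1, 3],
                 [1, 10, 5],
                 [2, 3, 10]])
cmap_names = ["Reds", "Greens", "Blues"]  # one colormap per column

# one RGBA colour per cell
rgba = np.empty(data.shape + (4,))
for j, name in enumerate(cmap_names):
    column = data[:, j]
    # assumption: scale each column to its own range
    norm = Normalize(vmin=column.min(), vmax=column.max())
    rgba[:, j, :] = plt.get_cmap(name)(norm(column))

fig, ax = plt.subplots()
ax.imshow(rgba, interpolation="nearest")
ax.set_xticks(range(data.shape[1]))
ax.set_yticks(range(data.shape[0]))
# call plt.show() or fig.savefig(...) to look at the result
```

To use one common scale for all columns, replace the per-column `vmin`/`vmax` with `data.min()` and `data.max()`.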

jrjc
  • Hi, I am having a problem creating the dataframe; a normal array is not working with the code. Thanks! – Sangeeta Aug 08 '14 at 14:40
  • @Sangeeta: I didn't understand. Do you know how to read a csv file with pandas? Is there something in the code you don't understand? – jrjc Aug 08 '14 at 15:09
  • Thanks! It is working! I thought it would only work with the "read_clipboard" function. Changing it to "read_csv" gave exactly what I wanted. Thanks :) – Sangeeta Aug 09 '14 at 04:05
  • Hi @Jeanrjc, when I had only 8 rows the graph was fine. But when I increased the number of rows to 800 the graph became very cluttered; nothing is visible in it. Do you have any idea how I can increase the number of rows? Thanks! – Sangeeta Aug 10 '14 at 12:23
  • Do you mean row or column? And what do you mean by your last question? – jrjc Aug 10 '14 at 15:19
  • I have data with 6 columns and 864 rows. After increasing the number of rows to such a large value, nothing is clear in the image. I have asked another question here http://stackoverflow.com/questions/25228723/increasing-number-of-rows-in-python-2d-heatmap . Thanks! – Sangeeta Aug 10 '14 at 16:01

Try:

ddf = structure(list(fn = structure(1:4, .Label = c("fun1", "fun2", 
"fun3", "fun4"), class = "factor"), att1 = c(10L, 0L, 1L, 2L), 
    att2 = c(0L, 1L, 10L, 3L), att3 = c(2L, 3L, 5L, 10L)), .Names = c("fn", 
"att1", "att2", "att3"), class = "data.frame", row.names = c(NA, 
-4L))

ddf
    fn att1 att2 att3
1 fun1   10    0    2
2 fun2    0    1    3
3 fun3    1   10    5
4 fun4    2    3   10

library(reshape2)  # provides melt()
ddfm = melt(ddf)
ddfm
     fn variable value
1  fun1     att1    10
2  fun2     att1     0
3  fun3     att1     1
4  fun4     att1     2
5  fun1     att2     0
6  fun2     att2     1
7  fun3     att2    10
8  fun4     att2     3
9  fun1     att3     2
10 fun2     att3     3
11 fun3     att3     5
12 fun4     att3    10

library(ggplot2)
ggplot(ddfm) + geom_tile(aes(x=variable, y=fn, fill=value))

[image: resulting plot]

rnso