
I am trying to save a large dendrogram made from a large table (10000+ rows, 18 columns), and I came up with this code:

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import pandas as pd

data = pd.read_csv("Input.txt", header=0, index_col=None,
                   sep="\t", memory_map=True)
data = data.fillna(0)
Matrix = data.iloc[:,-18:]

Linkage_Matrix = linkage(Matrix, "ward")
fig=plt.figure(figsize=(20, 200))
#fig, ax = plt.subplots(1, 1, tight_layout=False)
ax = fig.add_axes([0.1,0.1,0.75,0.75])
#fig.title('Hierarchical Clustering Dendrogram')
ax.set_title("Hierarchical Clustering Dendrogram")
ax.set_xlabel("distance")
ax.set_ylabel("name")
dendrogram(
    Linkage_Matrix,
    orientation ="left",
    leaf_rotation=0., 
    leaf_font_size=12.,  
    labels = list(data.loc[:,"name"])
)    
# dendrogram() already sets the tick labels in leaf order; overwriting them
# with the names in their original row order would mislabel the leaves.
ax.yaxis.set_label_position('right')
ax.yaxis.tick_right()

plt.savefig("plt1.png", dpi = 320, format= "png", bbox_inches=None)

Unfortunately, the saved figure does not include the axes, even though I left some space around them as suggested in these questions:
Matplotlib savefig does not save axes
Why is my xlabel cut off in my matplotlib plot?
Matplotlib savefig image trim
Plotting hierarchical clustering dendrograms for large data sets
Dendrogram generated by scipy-cluster customisation

The figure displays correctly in the console and I can save it from there, but the resolution is poor. Ideally I would also like to switch to SVG so that I can adjust the readability afterwards.

Any insights would be greatly appreciated.

Ando Jurai
  • What exactly do you mean by "it doesn't save the axis"? – ImportanceOfBeingErnest Jul 11 '17 at 11:57
  • Exactly what it says: the axes and the elements that depend on them, such as ticks and labels, are absent from the saved figure. – Ando Jurai Jul 11 '17 at 13:13
  • Is the dendrogram itself shown correctly, just without axis spines, ticks and labels, or is the plot completely empty? Could you provide a minimal, complete, and verifiable example that can be copied and run to reproduce the problem? Just looking at the code, I do not see what's wrong. – ImportanceOfBeingErnest Jul 11 '17 at 14:14

1 Answer


Removing this line

ax = fig.add_axes([0.1,0.1,0.75,0.75])

and setting bbox_inches='tight' in plt.savefig() makes it work for me.
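A self-contained sketch of that fix, assuming random data in place of Input.txt (the 30 x 18 array and the rowN names are hypothetical). It mirrors the plotting calls of the code below and saves the figure into memory twice, once as PNG and once as SVG (the vector format the question asks about). Both use bbox_inches='tight', which makes Matplotlib compute the saved region from everything that was drawn, including the title, axis labels and tick labels, instead of using the fixed figure canvas.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend; no display needed
from matplotlib import pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical stand-in for Input.txt: 30 rows x 18 numeric columns.
rng = np.random.default_rng(0)
values = rng.random((30, 18))
names = [f"row{i}" for i in range(len(values))]

link_matrix = linkage(values, "ward")

fig, ax = plt.subplots(figsize=(6, 8))
ax.set_title("Hierarchical Clustering Dendrogram")
ax.set_xlabel("distance")
ax.set_ylabel("name")
dendrogram(link_matrix, orientation="left", labels=names, ax=ax)
ax.yaxis.set_label_position("right")
ax.yaxis.tick_right()

# bbox_inches="tight" sizes the saved image to the bounding box of everything
# drawn (title, axis labels, tick labels) instead of the raw figure canvas.
png_buf = io.BytesIO()
fig.savefig(png_buf, format="png", dpi=100, bbox_inches="tight")

# Same figure as SVG: lines and text are stored as vectors, not pixels.
svg_buf = io.BytesIO()
fig.savefig(svg_buf, format="svg", bbox_inches="tight")
plt.close(fig)

png_bytes = png_buf.getvalue()
svg_bytes = svg_buf.getvalue()
```

For the real data, the same calls with a filename such as "plt1.svg" and format="svg" would write the vector file to disk.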

Also, since you are loading the data with pandas, note that you can declare the 'name' column as the index and use the index values as labels.

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import pandas as pd


data = pd.read_csv('input.txt', header=0, index_col=['name'], sep="\t")
data = data.fillna(0)

link_matrix = linkage(data, 'ward')
fig, ax = plt.subplots(1, 1, figsize=(20,200))
ax.set_title('Hierarchical Clustering Dendrogram')
ax.set_xlabel('distance')
ax.set_ylabel('name')
dendrogram(
    link_matrix,
    orientation='left',
    leaf_rotation=0., 
    leaf_font_size=12.,  
    labels=data.index.values
)    
ax.yaxis.set_label_position('right')
ax.yaxis.tick_right()
plt.savefig('plt1.png', dpi=320, format='png', bbox_inches='tight')
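Note that this version passes the names to dendrogram() through labels= and does not set the tick labels afterwards. dendrogram() reorders the rows so that the leaves of each cluster sit next to each other, and it already places each label on its matching leaf; calling ax.set_yticklabels() afterwards with the names in their original row order would put the wrong name on most leaves. A small check with a hypothetical 6-row array, using no_plot=True to get the layout without drawing: the returned 'leaves' entry holds the original row indices in plotting order and 'ivl' holds the corresponding labels.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical data: rows a, c, f form one tight group, b and d another,
# and e sits between them.
values = np.array([
    [0.0, 0.0, 0.0],    # a
    [10.0, 10.0, 10.0], # b
    [0.1, 0.0, 0.0],    # c
    [10.1, 10.0, 10.0], # d
    [5.0, 5.0, 5.0],    # e
    [0.0, 0.2, 0.0],    # f
])
names = ["a", "b", "c", "d", "e", "f"]

link_matrix = linkage(values, "ward")

# no_plot=True computes the leaf layout without drawing anything.
result = dendrogram(link_matrix, labels=names, no_plot=True)

# 'leaves': original row indices in plotting order.
# 'ivl': the labels in that same order, i.e. the tick labels that get drawn.
leaf_order = result["leaves"]
leaf_labels = result["ivl"]
print(leaf_order, leaf_labels)
```

Because a, c and f end up adjacent, the plotted order cannot match the original a, b, c, ... order, so labels written in row order would be misplaced.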
amain
  • I could verify that it works, but for me the fix seemed to come from bbox_inches rather than from removing the ax line. Thanks for the tip about using the index: I knew I could make any column the index, but I thought it had to be a real column to use it as labels. The remaining problem is that the figure is saved but, most of the time, cannot be reopened, probably because of its size (some programs do not recognize it as a PNG/SVG). – Ando Jurai Jul 11 '17 at 15:46
  • @AndoJurai I had similar problems with those very large PNGs and finally switched to a JS/HTML-based visualization. There are several solutions out there; I went with [InCHlib.js](https://github.com/skutac/InCHlib.js). It is far from perfect, but it suits my needs and is easy to adapt. – amain Jul 12 '17 at 09:07
  • Thanks for these insights. I checked, and indeed PNGs created this way will not open once they are larger than 5 MB, so I will consider such a solution. – Ando Jurai Jul 12 '17 at 11:51
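A back-of-the-envelope check of why these files are hard to open, using figsize=(20, 200), dpi=320 and leaf_font_size=12 from the code above and assuming about 10000 leaves. The 2**16 pixel cap is an assumption about the limit recent Matplotlib versions enforce for Agg raster output, and the "needed height" estimate assumes one font size of vertical spacing per label; both are rough.

```python
# Figure settings from the code above; n_leaves approximates "10000+ rows".
width_in, height_in, dpi = 20, 200, 320
n_leaves = 10_000
font_size_pt = 12  # leaf_font_size used above

# Assumption: Agg refuses rasters of 2**16 pixels or more in either direction
# (recent Matplotlib versions; treat this value as approximate).
AGG_MAX_PX = 2 ** 16

width_px = width_in * dpi      # 6400
height_px = height_in * dpi    # 64000
megapixels = width_px * height_px / 1e6

# 1 inch = 72 points: vertical room per leaf label, assuming the axes span
# the full figure height (in practice they span a bit less).
points_per_leaf = height_in * 72 / n_leaves

# Rough height needed so 12 pt labels do not overlap (one font size per leaf),
# and what that height would be in pixels at the same dpi.
needed_height_in = n_leaves * font_size_pt / 72
needed_height_px = needed_height_in * dpi
fits_in_agg = needed_height_px < AGG_MAX_PX

print(f"PNG: {width_px} x {height_px} px ({megapixels:.1f} MP)")
print(f"room per leaf: {points_per_leaf:.2f} pt for {font_size_pt} pt labels")
print(f"non-overlapping height: {needed_height_in:.0f} in, "
      f"{needed_height_px:.0f} px, fits Agg limit: {fits_in_agg}")
```

The current PNG is already about 410 megapixels while giving each leaf under 1.5 points of height, and a layout with readable, non-overlapping labels would need several times more pixels than the assumed raster limit allows. That is one argument for vector output or an interactive viewer such as the one suggested above.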