
I have a dataset that contains 72 features, and I want to find the most significant features that will affect my model. I am trying to plot a correlation matrix using seaborn and matplotlib, but with 72 features the plot is too crowded to read properly. How can I enlarge the plot so that I can understand it better?

Code:

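%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

# `data` is the DataFrame holding the 72 features
corr = data.corr()
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)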


stone rock
  • Have you tried `plt.matshow(data.corr())`? – Prem Mar 16 '18 at 07:29
  • @Prem I tried it as you suggested, but see the screenshot: https://imgur.com/a/ZLBgK – stone rock Mar 16 '18 at 07:34
  • Doesn't the numbering (as a label) help identify high/low correlated features? The X and Y axis labels are numbers, so I think you can easily identify the feature name corresponding to each number. – Prem Mar 16 '18 at 07:39
  • @Prem I couldn't understand you. – stone rock Mar 16 '18 at 07:40
  • @Prem No, how can I tell which features are correlated? I want to increase the cell size to understand them better. – stone rock Mar 16 '18 at 07:42
  • Let's try `%matplotlib notebook` instead of `%matplotlib inline`, and then run `plt.matshow(data.corr())` after restarting the kernel. You should get an interactive plot. – Prem Mar 16 '18 at 07:54
  • @Prem Now the output is blank; it does not show any plot if I change inline to notebook. – stone rock Mar 16 '18 at 07:58
  • It'll open the graph in another window, but make sure that you restart the kernel before executing the changed magic function. – Prem Mar 16 '18 at 08:26
  • Options you have: (1) [change the figure size](https://stackoverflow.com/questions/332289/how-do-you-change-the-size-of-figures-drawn-with-matplotlib), (2) [change the font size](https://stackoverflow.com/questions/3899980/how-to-change-the-font-size-on-a-matplotlib-plot) to make the labels readable, (3) show the [graph interactively](https://stackoverflow.com/questions/41125690/matplotlib-notebook-showing-a-blank-histogram) to be able to zoom, (4) create a [tooltip in interactive plots](https://stackoverflow.com/questions/46531243/how-to-control-mouseover-text-in-matplotlib). – ImportanceOfBeingErnest Mar 16 '18 at 09:24
  • @ImportanceOfBeingErnest What would be a good way to visualize data with 72 features? Should I display a subset of the features or all 72? I am not able to enlarge the plot using seaborn, and if I use `matshow`, the cell labels disappear. – stone rock Mar 16 '18 at 12:20
  • I gave you 4 options you have. – ImportanceOfBeingErnest Mar 16 '18 at 12:52
  • @ImportanceOfBeingErnest But those are matplotlib-specific, and I want to do it with seaborn. I am new to both of them. – stone rock Mar 16 '18 at 12:54
  • Every seaborn plot is a matplotlib plot. If you have a specific problem with one of the suggested solutions, you can ask about it, showing the code you have a problem with. – ImportanceOfBeingErnest Mar 16 '18 at 13:00
  • @ImportanceOfBeingErnest I am not increasing the font size or the cell size; instead I am taking a subset of features to analyze the correlation and then reducing the font scale. But I have a problem now, check this out: https://stackoverflow.com/questions/49321361/how-to-align-xlabels-and-ylabels-in-seaborn – stone rock Mar 16 '18 at 13:07
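
Two of the options raised in the comments (a larger figure with smaller tick labels, and plotting only a subset of features) might look roughly like the sketch below. It is only an illustration under stated assumptions: the random `data` frame stands in for the real 72-feature dataset, the helper names `plot_large_heatmap` and `strongest_features` are made up for this example, and ranking features by their largest absolute correlation with any other feature is just one possible reading of "most significant".

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def plot_large_heatmap(corr, cell_inches=0.3, label_fontsize=7):
    """Draw a correlation heatmap on a figure that grows with the feature count."""
    n = len(corr.columns)
    side = max(6, n * cell_inches)  # roughly cell_inches per cell, at least 6 inches
    fig, ax = plt.subplots(figsize=(side, side))
    sns.heatmap(corr, ax=ax, cmap="coolwarm", vmin=-1, vmax=1, square=True,
                xticklabels=True, yticklabels=True,  # label every row and column
                cbar_kws={"shrink": 0.5})
    ax.tick_params(axis="both", labelsize=label_fontsize)
    return fig, ax


def strongest_features(corr, k=15):
    """Return the k columns with the largest absolute correlation to another column."""
    # Mask the diagonal so a feature's correlation with itself (1.0) is ignored.
    off_diag = corr.abs().where(~np.eye(len(corr), dtype=bool))
    return off_diag.max().nlargest(k).index


# Hypothetical stand-in for the asker's `data`: 200 rows, 72 random numeric features.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 72)),
                    columns=[f"f{i}" for i in range(72)])

corr = data.corr()

# Options (1) and (2): all 72 features on a large canvas with small labels.
fig, ax = plot_large_heatmap(corr)

# Subset approach: keep only the 15 features with the strongest correlations.
cols = strongest_features(corr, k=15)
fig_sub, ax_sub = plot_large_heatmap(corr.loc[cols, cols], label_fontsize=9)
```

Because seaborn draws onto an ordinary matplotlib `Axes`, the figure size and tick label size are controlled through matplotlib itself. If the inline display still shrinks the large figure, saving it with `fig.savefig("corr.png", dpi=150)` and opening the file in an image viewer allows zooming.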

0 Answers