
I am trying to plot a correlation heatmap for my 13 features. No matter what figure size I choose, the 3rd feature (Part's Volume (cm^3)) does not appear in the heatmap, although all the other features do. Any suggestions? Thank you for your help.

The code that I am using:

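import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data_set = pd.read_excel("Correlation Analysis NEW.xlsx")
# Use the fully qualified option names; the bare 'max_columns' can be ambiguous in newer pandas
pd.set_option('display.max_columns', 35)
pd.set_option('display.max_rows', 300)
data_set.head(300)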


Part's Z-Height (mm)    Part's Weight (N)   Part's Volume (cm^3)    Part's Solid Volume (cm^3)  Part's Surface Area (cm^2)  Material's Density (g/cm^3) Layer Height (mm)   Infill Density (%)  Nozzle/Printing Temperature (C) Platform Temperature (C)    Printing/Scanning Speed (mm/s)  Part's Orientation (Support's height) (mm)  Part's Orientation (Support's volume) (cm^3)
0   53.5773 0.225630    39.79   18.548387   155.57  1.24    0.1 20  210 60  50  35.310  4.919355
1   53.5773 0.225630    39.79   18.548387   155.57  1.24    0.2 20  210 60  50  35.310  4.516129
2   53.5773 0.234459    39.79   19.274194   155.57  1.24    0.3 20  210 60  50  20.201  3.870968
3   53.5773 0.233478    39.79   19.193548   155.57  1.24    0.4 20  210 60  50  20.201  3.870968
4   53.5773 0.235440    39.79   19.354839   155.57  1.24    0.6 20  210 60  50  35.310  3.951613
5   53.5773 0.225630    39.79   18.548387   155.57  1.24    0.1 20  210 60  40  35.310  4.919355
6   53.5773 0.290376    39.79   23.870968   155.57  1.24    0.1 40  210 60  50  35.310  4.919355
7   37.8376 0.224649    39.79   18.467742   155.57  1.24    0.1 20  210 60  50  18.381  3.790323
8   37.8376 0.224649    39.79   18.467742   155.57  1.24    0.2 20  210 60  50  18.381  3.709677
9   37.8376 0.235440    39.79   19.354839   155.57  1.24    0.3 20  210 60  50  18.381  3.629032
10  37.8376 0.235440    39.79   19.354839   155.57  1.24    0.4 20  210 60  50  18.381  3.548387
11  37.8376 0.234459    39.79   19.274194   155.57  1.24    0.6 20  210 60  50  18.381  3.467742
12  37.8376 0.224649    39.79   18.467742   155.57  1.24    0.1 20  210 60  40  18.381  3.790323
13  37.8376 0.289395    39.79   23.790323   155.57  1.24    0.1 40  210 60  50  18.381  3.790323
14  30.0253 0.224649    31.43   18.467742   169.94  1.24    0.1 20  210 60  50  12.484  6.532258
15  30.0253 0.224649    31.43   18.467742   169.94  1.24    0.1 20  210 60  40  12.484  6.532258
16  30.0253 0.224649    31.43   18.467742   169.94  1.24    0.1 20  210 60  45  12.484  6.532258
17  30.0253 0.224649    31.43   18.467742   169.94  1.24    0.1 20  210 60  53  12.484  6.532258
18  30.0253 0.224649    31.43   18.467742   169.94  1.24    0.1 20  210 60  55  12.484  6.532258
19  30.0253 0.386514    31.43   31.774194   169.94  1.24    0.1 100 210 60  50  12.484  6.532258
20  81.6440 0.215820    31.43   17.741935   169.94  1.24    0.1 20  210 60  50  63.289  8.870968
21  81.6440 0.215820    31.43   17.741935   169.94  1.24    0.1 20  210 60  40  63.289  8.870968
22  81.6440 0.215820    31.43   17.741935   169.94  1.24    0.1 20  210 60  45  63.289  8.870968
23  81.6440 0.215820    31.43   17.741935   169.94  1.24    0.1 20  210 60  53  63.289  8.870968



data_set.columns=["Part's Z-Height (mm)","Part's Weight (N)","Part's Volume (cm^3)","Part's Solid Volume (cm^3)","Part's Surface Area (cm^2)","Material's Density (g/cm^3)","Layer Height (mm)","Infill Density (%)","Nozzle/Printing Temperature (C)","Platform Temperature (C)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's height) (mm)","Part's Orientation (Support's volume) (cm^3)"]
# Correlation between different variables
corr = data_set.corr()

# Set up the matplotlib plot configuration
f, ax = plt.subplots(figsize=(12, 10))

# Configure a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Draw the heatmap on the axes created above
sns.heatmap(corr, annot=True, cmap=cmap, ax=ax)

ax.set_title('Correlation Heat Map', weight='bold', fontsize=10)
plt.show()
Z47
  • Did you try `data_set.describe()` to check whether all the columns make sense? – JohanC Aug 14 '22 at 15:07
  • @JohanC thank you. When I do that, the 3rd column/feature does not show up but I get the statistics for the other 12 columns, why? – Z47 Aug 14 '22 at 15:09
  • Maybe it is not completely numeric? Maybe it contains one or more strings? – JohanC Aug 14 '22 at 15:11
  • @JohanC I've checked my Excel file before posting the question, and made sure all the values are numeric under this column. Is there any other thing I should be checking/making sure of? – Z47 Aug 14 '22 at 15:13
  • Is it possible that the problem is with the line `f, ax = plt.subplots(figsize=(12, 10))` ? Try changing it to `f, ax = plt.subplots(figsize=(13, 10))` – joaopfg Aug 14 '22 at 15:15
  • @JohnDoe I've tried all the possible sizes, the problem is still there. Thanks. – Z47 Aug 14 '22 at 15:16
  • You should check your dataframe, not your Excel file. – JohanC Aug 14 '22 at 15:16
  • @JohanC thanks, any ideas on how to do that? – Z47 Aug 14 '22 at 15:18
  • https://stackoverflow.com/questions/70791779/check-if-a-column-value-is-numeric-in-pandas-dataframe https://stackoverflow.com/questions/19900202/how-to-determine-whether-a-column-variable-is-numeric-or-not-in-pandas-numpy Also, `data_set.info()` would show the type of each column (type `object` means the column is not completely numeric). – JohanC Aug 14 '22 at 15:18
  • I see, thanks. I have some rows with NaN in that column. – Z47 Aug 14 '22 at 15:24
  • The problem might be that those NaNs are strings instead of float NaNs. – JohanC Aug 14 '22 at 16:08
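
The checks suggested in the comments can be sketched roughly as follows. This is only an illustration, not a confirmed diagnosis of the asker's file: the small `df` below is a made-up stand-in for the Excel data, and it assumes the missing column contains the text "NaN" rather than a real missing value, which makes pandas store the column with dtype `object`. The names `df`, `non_numeric`, `parsed` and `bad_values` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data (assumption): one column holds the
# text "NaN" instead of a real missing value, so its dtype becomes object.
df = pd.DataFrame({
    "Part's Z-Height (mm)": [53.5773, 37.8376, 30.0253, 81.6440],
    "Part's Volume (cm^3)": [39.79, "NaN", 31.43, 31.43],
    "Layer Height (mm)": [0.1, 0.2, 0.1, 0.3],
})

# Step 1: list the columns that pandas does not treat as numeric.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Step 2: show the raw entries in the suspect column that are not numbers.
col = "Part's Volume (cm^3)"
parsed = pd.to_numeric(df[col], errors="coerce")  # unparseable entries become NaN
bad_values = df.loc[parsed.isna() & df[col].notna(), col].tolist()
print(bad_values)

# Step 3: store the coerced numeric column, then compute the correlations.
df[col] = parsed
corr = df.corr()
print(corr.columns.tolist())
```

After the coercion the column is a float column with real NaN entries, so it takes part in `corr()` (rows with NaN are skipped pairwise). Depending on the pandas version, calling `corr()` on a frame that still has an `object` column may either drop that column silently or raise an error, so converting first is the safer order.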

0 Answers