Why is it not possible to get subplots of the columns of a pandas DataFrame using a heatmap inside a for loop?

I am trying to create subplots of the columns of a pandas DataFrame inside a for loop. Each cycle covers 480 values, and for every cycle I want the three heatmaps belonging to A, B and C side by side in one window. I have found only one related answer here, by @euri10, which uses flat, but I'm afraid it does not fit my case.

My script is as follows:

# Import and call the needed libraries
import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt


'''
Take a list and create the formatted matrix
'''
def mkdf(ListOf480Numbers):
    normalMatrix = np.array_split(ListOf480Numbers,8)     #Take a list and create 8 array (Sections)
    fixMatrix = []
    for i in range(8):
        lines = np.array_split(normalMatrix[i],6)         #Split each section in lines (each line contains 10 cells from 0-9)
        newMatrix = [0,0,0,0,0,0]                         #Empty array to contain reordered lines
        for j in (1,3,5):
            newMatrix[j] = lines[j]                       #lines 1,3,5 remain equal
        for j in (0,2,4):
            newMatrix[j] = lines[j][::-1]                 #lines 2,4,6 are inverted
        fixMatrix.append(newMatrix)                 #After last update of format of table inverted (bottom-up zig-zag)
    return fixMatrix

'''
Print the matrix with the required format
'''
def print_df(fixMatrix):
    values = []
    for i in range(6):
        values.append([*fixMatrix[4][i], *fixMatrix[7][i]])  #lines form section 6 and 7 are side by side
    for i in range(6):
        values.append([*fixMatrix[5][i], *fixMatrix[6][i]])  #lines form section 4 and 5 are side by side
    for i in range(6):
        values.append([*fixMatrix[1][i], *fixMatrix[2][i]])  #lines form section 2 and 3 are side by side
    for i in range(6):
        values.append([*fixMatrix[0][i], *fixMatrix[3][i]])  #lines form section 0 and 1 are side by side
    df = pd.DataFrame(values)
    return (df)

'''
Normalizing Formula
'''

def normalize(value, min_value, max_value, min_norm, max_norm):
    new_value = ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm
    return new_value

'''
Split data in three different lists A, B and C
'''

dft = pd.read_csv('D:\me4.TXT', header=None)
id_set = dft[dft.index % 4 == 0].astype('int').values
A = dft[dft.index % 4 == 1].values
B = dft[dft.index % 4 == 2].values
C = dft[dft.index % 4 == 3].values
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}
#df contains all the data
df = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])  


'''
Data generation phase

'''

#next iteration create all plots, change the number of cycles
cycles = int(len(df)/480)
print(cycles)
for i in df:
    try:
        os.mkdir(i)
    except:
        pass
    min_val = df[i].min()
    min_nor = -1
    max_val = df[i].max()
    max_nor = 1
    for cycle in range(1):             #iterate through all cycles: replace range(1) with range(int(len(df)/480))
        count =  '{:04}'.format(cycle)
        j = cycle * 480
        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        #Print .csv files contains matrix of each parameters by name of cycles respectively
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)            
        if 'C' in i:
            min_nor = -40
            max_nor = 150
            #Applying normalization for C between [-40,+150]
            new_value3 = normalize(df['C'].iloc[j:j+480][i].values, min_val, max_val, -40, 150)
            n_cbar_kws = {"ticks":[-40,150,-20,0,25,50,75,100,125]}
            df3 = print_df(mkdf(new_value3))
        else:
            #Applying normalization for A,B between [-1,+1]
            new_value1 = normalize(df['A'].iloc[j:j+480][i].values, min_val, max_val, -1, 1)
            new_value2 = normalize(df['B'].iloc[j:j+480][i].values, min_val, max_val, -1, 1)
            n_cbar_kws = {"ticks":[-1.0,-0.75,-0.50,-0.25,0.00,0.25,0.50,0.75,1.0]}
        df1 = print_df(mkdf(new_value1))
        df2 = print_df(mkdf(new_value2))    

        #Plotting parameters by using HeatMap
        plt.figure()
        sns.heatmap(df, vmin=min_nor, vmax=max_nor, cmap ='coolwarm', cbar_kws=n_cbar_kws)                             
        plt.title(i, fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        #Print .PNG images contains HeatMap plots of each parameters by name of cycles respectively
        plt.savefig(f'{i}/{i}{count}.png')  



        #plotting all columns ['A','B','C'] in-one-window side by side


        fig, axes = plt.subplots(nrows=1, ncols=3 , figsize=(20,10))

        plt.subplot(131)
        sns.heatmap(df1, vmin=-1, vmax=1, cmap ="coolwarm", linewidths=.75 , linecolor='black', cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
        fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
        plt.title('A', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')

        plt.subplot(132)
        sns.heatmap(df2, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
        fig.axes[-1].set_ylabel('[Mpa]', size=20) #cbar_kws={'label': 'Celsius'}
        #sns.despine(left=True)
        plt.title('B', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')

        plt.subplot(133)
        sns.heatmap(df3, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]}) 
        fig.axes[-1].set_ylabel('[°C]', size=20) #cbar_kws={'label': 'Celsius'}
        #sns.despine(left=True)
        plt.title('C', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')


        plt.suptitle(f'Analysis of data in cycle Nr.: {count}', color='yellow', backgroundcolor='black', fontsize=48, fontweight='bold')
        plt.subplots_adjust(top=0.7, bottom=0.3, left=0.05, right=0.95, hspace=0.2, wspace=0.2)
        #plt.subplot_tool()
        plt.savefig(f'{i}/{i}{i}{count}.png') 
        plt.show()

So far I cannot get the proper output: in each cycle every plot is produced three times. For example, 'A' is plotted on the left, and then 'A' is plotted again under the titles 'B' and 'C' in the middle and on the right of the same window. Next, 'B' is plotted three times instead of once in the middle, and finally 'C' is plotted three times, in the middle and left panels as well as on the right.

The target is to get the subplots of all three columns A, B and C in one window for each cycle (every 480 values) in the main for loop.

1st cycle : 0000 -----> subplots of A,B,C ----> Store it as 0000.png

2nd cycle : 0001 -----> subplots of A,B,C ----> Store it as 0001.png ...

The problem is the usage of df inside the for loop: it passes the values of A, or B, or C three times, while it should pass the values belonging to each column once. I provide a picture of the unsuccessful output here so that you can see exactly where the problem is.

My desired output is below:

picture

I also provide sample text file of dataset for 3 cycles: dataset

Scott Boston
Mario
  • I don't quite see why you couldn't use the linked solution here. Here, for debugging I would start by not calling the dataframe both inside and outside the loop `df` and not calling `sns.heatmap(df)` three times. After all, you want to plot *different* dataframes. – ImportanceOfBeingErnest Jan 21 '19 at 02:37
  • @IOBE Outside the for loop everything is OK. Inside the for loop, which I need in order to plot the data 480 values at a time while iterating through columns A, B and C, I can get the plots individually but not all subplots side by side. It is important for me to have all three in one window so that I can follow how the behavior of the parameters A, B and C correlates. – Mario Jan 21 '19 at 03:06
  • I can help you if you provide a [mcve]. If not, I can refer you to my first comment. – ImportanceOfBeingErnest Jan 21 '19 at 03:18
  • @ImportanceOfBeingErnest As you can see, I updated the post to highlight the target and the problem I face, and I added pictures of the **unsuccessful output** and the **desired output** to make my problem clear. If you run my code on the dataset, which is a text file, you will see it. I just need the subplots in one window for each cycle (480 values). Since I use `sns.heatmap(df)`, I thought I might restrict `df` with something like `df.plot(column='A', ax=axes[0,0])` and so on, but I couldn't fix it yet. I tried different methods to get all the subplots, but I confess I'm stuck. – Mario Jan 21 '19 at 04:04
  • You are using the same dataframe for all subplots, `sns.heatmap(df)`, `sns.heatmap(df)`, `sns.heatmap(df)`. Instead you need to use a different dataframe in each subplot: `sns.heatmap(df1)`, `sns.heatmap(df2)`, `sns.heatmap(df3)`. – ImportanceOfBeingErnest Jan 21 '19 at 11:44
  • @ImportanceOfBeingErnest I already tried that unsuccessfully with: `new_value3 = normalize(df['C'].iloc[j:j+480][i].values, min_val, max_val, -40, 150)` `df3 = print_df(mkdf(new_value3))` `new_value1 = normalize(df['A'].iloc[j:j+480][i].values, min_val, max_val, -1, 1)` `new_value2 = normalize(df['B'].iloc[j:j+480][i].values, min_val, max_val, -1, 1)` `df1 = print_df(mkdf(new_value1))` `df2 = print_df(mkdf(new_value2))` and finally `sns.heatmap(df1)` `sns.heatmap(df2)` `sns.heatmap(df3)`, but I get a **KeyError**. I also updated the post with what I tried. I feel so miserable! – Mario Jan 21 '19 at 12:33

1 Answer

After looking at your code and your requirements, I think I know what the problem is: your for loops are in the wrong order. You want a new figure for each cycle, containing 'A', 'B' and 'C' as subplots.

This means your outer loop should go over the cycles and your inner loop over i. With your current indentation and loop order, you try to plot all of the 'A', 'B', 'C' subplots during the first pass through i (i='A', cycle=1), rather than after the first cycle has been processed for every i (i='A','B','C', cycle=1).
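As a minimal sketch of that loop order (the column values and the cycle length of 4 are made-up stand-ins for the real 480-value cycles, and the names `columns`, `cycle_len` and `figures` are hypothetical), the inner loop collects one panel per column, and the cycle only counts as ready to plot once all three exist:

```python
# Hypothetical stand-in data: three columns, two cycles of 4 values each.
# The real script uses 480 values per cycle.
columns = {
    "A": list(range(0, 8)),
    "B": list(range(10, 18)),
    "C": list(range(20, 28)),
}
cycle_len = 4
n_cycles = len(columns["A"]) // cycle_len

figures = []  # one entry per cycle, standing in for one saved figure
for cycle in range(n_cycles):              # outer loop: one figure per cycle
    start = cycle * cycle_len
    panels = {}
    for name, values in columns.items():   # inner loop: one panel per column
        panels[name] = values[start:start + cycle_len]
    # Only here, after the inner loop, do all three panels exist.
    figures.append(panels)
```

Each entry of `figures` then holds exactly the three slices that belong together in one window.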

This is also why you get the problem (as mentioned in your comment on this answer) of df3 not being defined. df3 is defined inside an if block that checks 'C' in i. On your first pass through the loop this condition is not met, so df3 is not defined, but you are still trying to plot it!
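The effect can be reproduced in isolation. The sketch below uses a hypothetical function and placeholder strings instead of real dataframes; it tries to use df3 during the first pass, before the 'C' branch has ever run. Inside a function Python reports this as UnboundLocalError, while at module level, as in your script, it would be a plain NameError:

```python
def plot_during_first_pass():
    # Hypothetical stand-in for the original loop body.
    for i in ["A", "B", "C"]:
        if "C" in i:
            df3 = "panel for C"   # only assigned when i == 'C'
        else:
            df1 = "panel for A"
        # "Plotting" inside the loop needs df3 already on the first pass (i == 'A').
        return df1, df3


try:
    plot_during_first_pass()
    error_name = None
except NameError as exc:          # UnboundLocalError is a subclass of NameError
    error_name = type(exc).__name__
```

Moving the plotting step out of the loop over i, so that it runs after every branch has been visited, avoids the error.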

You also have the same problem with NaN/inf values as in your other question.
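A minimal sketch of such a cleanup on made-up data (the frame `df_demo` is hypothetical). It also replaces -inf, which is an assumption about what your data may contain; replacing only np.inf would leave negative infinities in place:

```python
import numpy as np
import pandas as pd

# Made-up values containing +inf, -inf and NaN.
df_demo = pd.DataFrame({
    "A": [0.5, np.inf, -np.inf],
    "C": [20.0, np.nan, 35.0],
})

# Turn both infinities into NaN first, then fill every NaN with 0.
cleaned = df_demo.replace([np.inf, -np.inf], np.nan).fillna(0)
```

Whether 0 is a sensible fill value depends on the data; it is used here only because it keeps the min/max of each column finite.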

Rearranging the for loops and the indentation and cleaning up the NaN/inf values gets you the following code:

#...
#df contains all the data
df = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])  
df = df.replace(np.inf, np.nan)
df = df.fillna(0)

'''
Data generation phase

'''

#next iteration create all plots, change the number of cycles
cycles = int(len(df)/480)
print(cycles)
for cycle in range(cycles):             #iterate through all cycles
    count =  '{:04}'.format(cycle)
    j = cycle * 480
    for i in df:
        try:
            os.mkdir(i)
        except:
            pass

        min_val = df[i].min()
        min_nor = -1
        max_val = df[i].max()
        max_nor = 1

        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        #Print .csv files contains matrix of each parameters by name of cycles respectively
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)            
        if 'C' in i:
            min_nor = -40
            max_nor = 150
            #Applying normalization for C between [-40,+150]
            new_value3 = normalize(df['C'].iloc[j:j+480], min_val, max_val, -40, 150)
            n_cbar_kws = {"ticks":[-40,150,-20,0,25,50,75,100,125]}
            df3 = print_df(mkdf(new_value3))
        else:
            #Applying normalization for A,B between [-1,+1]
            new_value1 = normalize(df['A'].iloc[j:j+480], min_val, max_val, -1, 1)
            new_value2 = normalize(df['B'].iloc[j:j+480], min_val, max_val, -1, 1)
            n_cbar_kws = {"ticks":[-1.0,-0.75,-0.50,-0.25,0.00,0.25,0.50,0.75,1.0]}
            df1 = print_df(mkdf(new_value1))
            df2 = print_df(mkdf(new_value2))    

    #        #Plotting parameters by using HeatMap
    #        plt.figure()
    #        sns.heatmap(df, vmin=min_nor, vmax=max_nor, cmap ='coolwarm', cbar_kws=n_cbar_kws)                             
    #        plt.title(i, fontsize=12, color='black', loc='left', style='italic')
    #        plt.axis('off')
    #        #Print .PNG images contains HeatMap plots of each parameters by name of cycles respectively
    #        plt.savefig(f'{i}/{i}{count}.png')  


    #plotting all columns ['A','B','C'] in-one-window side by side
    fig, axes = plt.subplots(nrows=1, ncols=3 , figsize=(20,10))

    plt.subplot(131)
    sns.heatmap(df1, vmin=-1, vmax=1, cmap ="coolwarm", linewidths=.75 , linecolor='black', cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
    fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
    plt.title('A', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')

    plt.subplot(132)
    sns.heatmap(df2, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
    fig.axes[-1].set_ylabel('[Mpa]', size=20) #cbar_kws={'label': 'Celsius'}
    #sns.despine(left=True)
    plt.title('B', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')

    plt.subplot(133)
    sns.heatmap(df3, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]}) 
    fig.axes[-1].set_ylabel('[°C]', size=20) #cbar_kws={'label': 'Celsius'}
    #sns.despine(left=True)
    plt.title('C', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')


    plt.suptitle(f'Analysis of data in cycle Nr.: {count}', color='yellow', backgroundcolor='black', fontsize=48, fontweight='bold')
    plt.subplots_adjust(top=0.7, bottom=0.3, left=0.05, right=0.95, hspace=0.2, wspace=0.2)
    #plt.subplot_tool()
    plt.savefig(f'{i}/{i}{i}{count}.png') 
    plt.show()

This gets you the following three images as three separate figures with the data you provided:

Figure 1, Figure 2, Figure 3

Generally speaking, your code is quite messy. I get it: if you're new to programming and just want to analyse your data, you do whatever works, and it doesn't matter whether it is pretty.

However, I think the messy code means you can't properly see the underlying logic of your script, which is how you got this problem.

If you get a problem like that again, I would recommend writing out some 'pseudo code' with all of the loops and thinking about what you are trying to accomplish in each loop.
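One way such an outline could look once it is turned into code is sketched below. It is not a drop-in replacement for the script: the random data, the `TARGET_RANGE` table and the helper `panels_for_cycle` are hypothetical, and the sketch assumes each column should be normalized against its own minimum and maximum. The plotting step is left as a comment so that only the loop logic is visible:

```python
import numpy as np
import pandas as pd


def normalize(value, min_value, max_value, min_norm, max_norm):
    # Same formula as in the question.
    return (max_norm - min_norm) * ((value - min_value) / (max_value - min_value)) + min_norm


# Hypothetical lookup of target ranges, using the ranges from the question.
TARGET_RANGE = {"A": (-1, 1), "B": (-1, 1), "C": (-40, 150)}
CYCLE_LEN = 480


def panels_for_cycle(df, cycle):
    """For one cycle: build one normalized slice per column."""
    start = cycle * CYCLE_LEN
    panels = {}
    for name in df.columns:                       # inner loop: columns
        low, high = TARGET_RANGE[name]
        chunk = df[name].iloc[start:start + CYCLE_LEN]
        panels[name] = normalize(chunk, df[name].min(), df[name].max(), low, high)
    return panels


# Made-up data: two cycles of 480 rows each.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(960, 3)), columns=["A", "B", "C"])

for cycle in range(len(demo) // CYCLE_LEN):       # outer loop: cycles
    panels = panels_for_cycle(demo, cycle)
    # Here all three panels exist, so this is the place to draw and save one figure.
```

Keeping the per-column settings in one table means the loop body no longer needs an if/else per column, which makes the order of the two loops much easier to see.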

Freya W
  • @Freya I've been amazed by your troubleshooting skills. It would be great if you could have a look at another question of mine, which is important to me, [here](https://stackoverflow.com/questions/54330676/normalization-by-using-2-times-gaussian-function-on-negative-and-positive-number) about the **normalization** I apply using a **Gaussian function** to the **negative** and **positive** numbers of each column in the `dataframe`. – Mario Jan 23 '19 at 15:44
  • BTW, do you have any idea how I can use a **csv** file instead of a **txt** file so that the dataframe fits the solution above, e.g. `dft = pd.read_csv('D:\me24.csv', columns=['A','B','C'], index = id_set[:,0])`? Would you show me how I should define the index and so on if I had a csv file instead of a txt file, so that I can build the dataframe and get the subplots quickly? Thanks a zillion. – Mario Jan 23 '19 at 15:46
  • Do you know how I can smooth the picture, as mentioned [here](https://stackoverflow.com/questions/50086566/smoothen-heatmap-in-plotly) using `zsmooth`? I don't know whether it applies to my case. I was wondering if I can apply this to the subplots and get better ones: `data = [go.Heatmap(z=[[1, 20, 30], [20, 1, 60], [30, 60, 1]], zsmooth = 'best')]` – Mario Jan 23 '19 at 16:26
  • @Mario, taking a look at your other questions, they are always quite extensive. It would be good if, using some debugging and really reviewing your code logic, you could break the problem down to a more minimal code example that isn't working for you. Take the smoothing question, for example. Using some sample data, could you write a small example (let's say 30 lines of code max.) where you try to smooth your data, but it isn't working? Or what exactly isn't working when you read in a csv file (which is exactly what `read_csv` was made for...)? – Freya W Jan 24 '19 at 09:42
  • Hi Freya, honestly, as I mentioned in the **smoothing** question, I don't have any clue how to use that mask matrix, since I use `seaborn` but the example in the answer to that post relates to `go.heatmap`. My question about csv was how to define the index and so on. Assume my dataset is `csv` instead of `txt`: should I define the index and A, B, C as columns in the same way? `df = pd.read_csv('myfile.csv')` `A = dtt[dft.index].values`,..., `df = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])`, even though the `csv` file already has the layout I want? – Mario Jan 24 '19 at 18:34
  • @Mario I don't know what your csv file looks like, but if the data is already structured the way you want it, I don't know why you would redefine it. May I ask how you debug your code at the moment? Do you use an IDE or do you start your scripts from the console? You might really benefit from learning how to debug your code so that you can figure some things out for yourself. For example, I use Spyder, and I find it very helpful to have the Variable explorer and the IPython console to just try things out with variables from the script. – Freya W Jan 25 '19 at 09:13