So I am writing an interface that automates some data analysis steps. The code involves multiple screens, each used for a different function. The first one I wrote is the EDA (exploratory data analysis) screen. The code works, but after it generates the report the Tk window just closes.

Specifically, when I click "Generate Report", the reports are generated and then the program shuts down. I haven't included the report generation code because it doesn't use Tkinter. I also left out the other screens because they aren't problematic and I didn't want to dump the whole project here.

import HTMLCreator as sv
from tkinter import *
import tkinter.filedialog
import time
import pandas as pd
import Credentials as cred
import os
class EDAScreen(Page):
   
    def __init__(self, *args, **kwargs):

        Page.__init__(self, *args, **kwargs)
        self.fields = ['First X Column Name', 'Last X Column Name', 'First Y Column name','Last Y Column Name']
        self.entries = []
        self.df=""
        self.things=[]
        self.description = "Automate EDA for a Dataset"
        self.instructions = "Upload the CSV file with all the data"

        self.descriptionLabel = Label(self, text = self.description, font=("System", 13)).place(x = 140, y = 150)
        self.instructionsLabel = Label(self, text = self.instructions, font=("System", 13)).place(x = 190, y =180)
        self.genResultButton = Button(self, text = "Upload input file", font=("Arial", 18), command = self.open).place(x = 290, y = 230)
        self.edaButton=Button(self, text = "Generate Report", font=("Arial", 18), command = self.EDA).place(x = 290, y = 330)
       
        # root.config(background='gray')
        for ndex, field in enumerate(self.fields):
            Label(self, width=20, text=field, anchor='w').grid(row=ndex, column=0, sticky='ew')
            self.entries.append(Entry(self))
            self.entries[-1].grid(row=ndex, column=1, sticky='ew')


        Button(self, text='Set these Params', command=self.fetch).grid(row=len(self.fields)+1, column=1, sticky='ew')

    def fetch(self):
        for ndex, entry in enumerate(self.entries):
            print('{}: {}'.format(self.fields[ndex], entry.get()))
            self.things.append(entry.get())
    
    def EDA(self):
        sv.createHTML(self.df,self.things[:2],self.things[2:])

    
    # Get the prediction answer by searching for file
    def open(self):
        filename =  tkinter.filedialog.askopenfilename(parent=self,initialdir = "./",title = "Select file",filetypes = (("data files","*.csv"),("all files","*.*")))
        print(filename)
        self.df=pd.read_csv(filename)
        cols=self.df.columns.values
        print(cols)
        try:  
            os.mkdir("./Breakdowns")
        except :
            pass
        # print("Things: ",things)
       
        #iv.processImg(r,'jpg')

It seems to be a problem that happens when you don't have a main loop. I do however have one defined as:

if __name__ == "__main__":
    root = Toplevel()
    root.geometry("800x500")
    main = MainView(root)
    main.pack(side="top", fill="both", expand=True)
    root.mainloop()

The other important definitions are

class Page(Frame):

    def __init__(self, *args, **kwargs):
        Frame.__init__(self, *args, **kwargs)

    def show(self):
        self.lift()


class MainView(Frame):

    def __init__(self, *args, **kwargs):
        
        Frame.__init__(self, *args, **kwargs)
        
        introScreen = IntroScreen(self)
        edaScreen = EDAScreen(self)

        buttonFrame = Frame(self)
        container = Frame(self)
        buttonFrame.pack(side="top", fill="x", expand=False)
        container.pack(side="top", fill="both", expand=True)

        edaScreen.place(in_ = container, x = 0, y = 0, relwidth = 1, relheight = 1)
        introScreen.place(in_ = container, x = 0, y = 0, relwidth = 1, relheight = 1)
        
        introScreenButton = Button(buttonFrame, text = "Go to Intro Screen", 
                                command = introScreen.lift, width = 30, height = 2)
        edaScreenButton = Button(buttonFrame, text = "Understand your data in detail", 
                                command = edaScreen.lift, width = 30, height = 2)
        
        edaScreenButton.pack(side = "left")
        introScreenButton.pack(side = "left")
        
        introScreen.show()

class IntroScreen(Page):

    def __init__(self, *args, **kwargs):

        Page.__init__(self, *args, **kwargs)
        self.backgroundImage = PhotoImage(file = "What is PD.png") 
        #^^ replace with something describing how to use the tool
        backgroundLabel = Label(self, image = self.backgroundImage)
        backgroundLabel.place(x = 0, y = 0, relwidth = 1, relheight = 1)

EDIT: My createHTML is shown below. It has nothing to do with Tkinter (and works). It is in a separate file called HTMLCreator, and the entire file is included here. I've added the import statements so that you can run the code with no issues.

import pandas as pd
from pandas import Series
# from pygame import mixer # Load the required library
import seaborn as sns
import matplotlib.pyplot as plt
import glob
import sweetviz as sv

def createHTML(df,feature_cols,target_col):
    """
    Create the HTML reports for the dataset given the names
    Params:
    df: dataframe passed,
    feature_cols: The independent variables (what you input)
    target_col: The dependent vars (what you predict)
    """
    data= df
    ##TODO Set the y_all and target_cols in a way that they get Tk input
    y_all=data.loc[:, target_col[0]:target_col[1]]
    target_col_names = data.loc[:, target_col[0]:target_col[1]].columns.values
    for col in target_col_names:
        X_all = data.loc[:, feature_cols[0]:feature_cols[1]]
        # print(y_all[col])
        X_all[col]=y_all[col]
        # print(X_all[col])
        advert_report=""
        if(len(X_all.columns.values)>50):
            advert_report = sv.analyze(X_all,pairwise_analysis="off",target_feat=col)
        else:
            print("here")
            advert_report = sv.analyze(X_all,pairwise_analysis="on",target_feat=col)

        #display the report
        
        advert_report.show_html('Breakdowns/'+col+'.html')
    
    advert_report = sv.analyze(y_all,pairwise_analysis="on")
        #display the report
        
    advert_report.show_html('Breakdowns/Preds'+'.html')
pasha

3 Answers

1

I did experience some problems right after an initial, partial generation of the report, but in my case the Tk window did not shut down.

The error had to do with the following lines in your HTMLCreator file:

if(len(X_all.columns.values)>50):
    advert_report = sv.analyze(X_all,pairwise_analysis="off",target_feat=col)
else:
    print("here")
    advert_report = sv.analyze(X_all,pairwise_analysis="on",target_feat=col)

Sweetviz only supports BOOLEAN and NUMERICAL features as targets for now, as per the documentation:

target_feat: A string representing the name of the feature to be marked as "target". Only BOOLEAN and NUMERICAL features can be targets for now.

So if you run the code as below, without target_feat, the error does not show up:

if(len(X_all.columns.values)>50):
    advert_report = sv.analyze(X_all,pairwise_analysis="off")
else:
    print("here")
    advert_report = sv.analyze(X_all,pairwise_analysis="on")

Yet passing only a numerical column as the target still, strangely, threw an error. As a workaround, you can use the FeatureConfig object to force the target column to be treated as numerical. If the column is not actually of a numerical type, you still get an error.

feature_config = sv.FeatureConfig(force_num=col)
if(len(X_all.columns.values)>50):
    advert_report = sv.analyze(X_all, pairwise_analysis="off", feat_cfg=feature_config, target_feat=col)
else:
    print("here")
    advert_report = sv.analyze(X_all, pairwise_analysis="on", feat_cfg=feature_config, target_feat=col)
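Since a non-numerical target still fails, it may help to filter the target columns before the loop. The following is only a sketch: `usable_targets` is a hypothetical helper name, and it checks the pandas dtype only. Sweetviz does its own type inference, which may not agree with pandas for every column.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def usable_targets(df: pd.DataFrame, candidates) -> list:
    """Return the candidate columns whose pandas dtype is boolean or numeric.

    Hypothetical helper: it looks only at the pandas dtype, not at
    whatever type Sweetviz infers for the column.
    """
    return [
        col for col in candidates
        if is_bool_dtype(df[col]) or is_numeric_dtype(df[col])
    ]


if __name__ == "__main__":
    # Made-up demo data: two usable targets and one text column.
    demo = pd.DataFrame(
        {"price": [1.5, 2.0], "sold": [True, False], "city": ["A", "B"]}
    )
    print(usable_targets(demo, ["price", "sold", "city"]))
```

In createHTML you could then loop over `usable_targets(data, target_col_names)` instead of `target_col_names`, so that text columns are skipped rather than passed as target_feat.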
Samuel Kazeem
  • There is no error in the execution for me. Tkinter just closes for me – pasha Aug 31 '20 at 13:02
  • You will have to formulate your question in a way that others can reproduce the problem. – Samuel Kazeem Aug 31 '20 at 15:23
  • I don't understand. Is there something unclear about the question? The code is as is, so I don't understand why it's unclear. – pasha Aug 31 '20 at 16:04
  • I am not sure you are giving sufficient information to reproduce the problem. I am unable, with the information you gave, to reproduce the problem. for instance what do you mean by "It seems to be a problem that happens when you don't have a main loop"? Are you importing the whole code into another main file? – Samuel Kazeem Aug 31 '20 at 16:21
  • This is literally it. My whole code. The reason I said about that Mainloop is because that's what it said online. Do you think you can share the code you have? I'm starting to think it could be my system. – pasha Aug 31 '20 at 16:33
1

So I discovered the reason after trying other EDA tools like Pandas Profiling. Apparently all the ones based on Pandas Profiling just shut down Tkinter (for some reason) after finishing. As far as I can tell, there's nothing that can be done.
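One untested idea, assuming the shutdown comes from the report library running inside the same process as Tk: run the report step in a separate Python interpreter, so that whatever it does on exit cannot take the Tk process with it. The sketch below shows only the isolation part. `run_in_child` is a hypothetical helper, and the script string is a stand-in for the real report code.

```python
import subprocess
import sys


def run_in_child(script: str, timeout: float = 600.0) -> int:
    """Run a Python snippet in a separate interpreter and return its exit code.

    Hypothetical helper: the child gets its own process, so it cannot
    close windows owned by the caller.
    """
    result = subprocess.run([sys.executable, "-c", script], timeout=timeout)
    return result.returncode


if __name__ == "__main__":
    # Stand-in for the real report step (e.g. a script that loads the CSV
    # and calls createHTML); that part is an assumption and not shown here.
    exit_code = run_in_child("print('report step would run here')")
    print("child exit code:", exit_code)
```

Note that `subprocess.run` blocks until the child finishes, so the window will be unresponsive while the report is being built. I have not verified that this avoids the shutdown.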

pasha
0

The mainloop() has to go after the functions that are called (data analysis in your case).
Ex.

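from tkinter import Tk

tk = Tk()
tk.title('Data Analysis')
data_analysis_func(args)  # placeholder for your own analysis call
tk.mainloop()  # blocks here until the window is closed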
PythonSnek
  • When I put in the main (after MainView) isn't it after the others? – pasha Sep 01 '20 at 18:47
  • What is the point of: if __name__ == "__main__": ? – PythonSnek Sep 06 '20 at 01:42
  • https://stackoverflow.com/questions/419163/what-does-if-name-main-do#:~:text=In%20short%2C%20use%20this%20'%20if,when%20the%20module%20is%20imported.&text=Put%20simply%2C%20__name__,run%20as%20an%20imported%20module. – pasha Sep 07 '20 at 11:44