
I have a function that takes a data frame and a column and, after some processing, plots that column, as in the following lines:

import seaborn as sns

def plot_dist(df, col):
    ax = sns.countplot(x=col, data=df)

As I call this function for several dataframes, I'd like to have the dataframe's name in the plot title, like this: "Distribution of col in dataframe df"

plt.title('Distribution of ' + col + ' in dataframe ' + df.name)

Q: How do I get the dataframe's name? According to a linked answer, one can write `df.name='DFNAME'` and then get the string back with `df.name`. But then the name has to be defined by hand, and I am not sure whether that works in a loop. Thank you!
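For the manual `df.name` approach mentioned in the question, a minimal sketch (assuming pandas; `make_title` and the sample data are hypothetical) suggests the attribute does survive a loop, because it is stored on the DataFrame object itself rather than on the variable. Note that `name` is not a built-in DataFrame attribute: operations that return a new frame, such as `copy()`, do not carry it over, and if a frame has a column called `name`, `df.name` refers to that column instead.

```python
import pandas as pd

# Hypothetical sample frames standing in for dfUS / dfEU.
dfUS = pd.DataFrame({"region": ["CA", "NY", "CA"]})
dfEU = pd.DataFrame({"region": ["DE", "FR", "FR"]})

# Store a plain Python attribute on each DataFrame object, set by hand.
dfUS.name = "dfUS"
dfEU.name = "dfEU"


def make_title(df, col):
    # getattr with a default avoids AttributeError for frames never named
    name = getattr(df, "name", "<unnamed>")
    return "Distribution of " + col + " in dataframe " + name


# The attribute travels with each object, so it is available inside the loop.
titles = [make_title(df, "region") for df in (dfUS, dfEU)]
print(titles)
```

Inside `plot_dist`, the same string could be passed to `ax.set_title(...)`.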

physiker
  • It's not clear what you mean in the last sentence. `df.name` is an attribute of the dataframe, and you can set the name with `df.name='NameOfDataFrame'` and then access it with `df.name`, which will return the string 'NameOfDataFrame' – G. Anderson Jan 10 '19 at 22:23
  • @G.Anderson I would like to get exactly the same name directly from the function input and NOT define it by hand, as it is stated in that link. – physiker Jan 10 '19 at 22:26
  • I apologize if I'm just not getting it, but I don't exactly understand the question. If you want to use a dataframe's name, you have to define it somehow. Maybe in your function definition you could add an optional argument? `def plot_dist(df,col, name='mydf'):`. If you don't want to set the name, you can't use the name but you can use any other string you like instead. Maybe an example of your desired behavior would help? When you invoke your function what do you want to happen, and what is the 'loop' you mention? – G. Anderson Jan 10 '19 at 22:35
  • @G.Anderson Sorry if the question wasn't clear enough; I appreciate your comments. I have several dataframes (e.g. dfUS and dfEU) and I just wanted to print that name at the top of the plot, for practical/visual reasons. When I read them in, I wanted to get the name straight from e.g. dfUS=pd.read_csv(...). Please see the accepted answer below, which works. – physiker Jan 11 '19 at 08:53
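Following the suggestion of passing the name explicitly, one option (a sketch, not taken from the thread) is to keep the frames in a dict keyed by the name you want to display, so each name is written once when the data is read and the loop hands it to the function. The dict `frames`, the helper `dist_title`, and the sample data are hypothetical; in practice each value would come from `pd.read_csv(...)`.

```python
import pandas as pd

# Hypothetical data; in practice each value would be pd.read_csv(...).
frames = {
    "dfUS": pd.DataFrame({"region": ["CA", "NY", "CA"]}),
    "dfEU": pd.DataFrame({"region": ["DE", "FR", "FR"]}),
}


def dist_title(col, name="mydf"):
    # Optional name argument with a placeholder default, as in the comment above.
    return "Distribution of " + col + " in dataframe " + name


# The dict key supplies the display name; the value is the DataFrame itself.
titles = [dist_title("region", name) for name, df in frames.items()]
for title in titles:
    print(title)
```

Here nothing is looked up by introspection: the name is just data that travels next to the DataFrame.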

3 Answers


I found a nice function here (Get the name of a pandas DataFrame):

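def get_df_name(df):
    # globals() is the namespace of the module where this function is defined;
    # take the first global name bound to this exact object (IndexError if none)
    name = [x for x in globals() if globals()[x] is df][0]
    return name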

It will help you.

import seaborn as sns

def plot_dist(df, col):
    ax = sns.countplot(x=col, data=df)
    ax.set_title(get_df_name(df))  # title = the global variable name of df
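Because the lookup compares object identity (`is`) against the names in `globals()`, it has a few edge cases. A small sketch, assuming pandas and a hypothetical variant `get_df_names` that returns every matching name instead of only the first, illustrates them: an alias yields two names, a copy yields none (where `[0]` in the function above would raise `IndexError`), and a DataFrame that exists only as a local variable inside another function is not found either.

```python
import pandas as pd


def get_df_names(df):
    # Hypothetical variant of get_df_name: every global name bound to df.
    return [name for name, obj in globals().items() if obj is df]


dfUS = pd.DataFrame({"x": [1, 2]})
alias = dfUS  # a second global name for the same object

print(get_df_names(dfUS))         # ['dfUS', 'alias']
print(get_df_names(dfUS.copy()))  # []: a copy is a different object


def local_only():
    dfLocal = pd.DataFrame({"x": [3]})
    return get_df_names(dfLocal)  # locals are not part of globals()


print(local_only())  # []
```

So the `[0]` in `get_df_name` picks whichever matching global was bound first.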
cors

As a beginner, I was amazed at how difficult it was for experienced programmers to grasp what we were trying to achieve. In my case, I just wanted to print the name and size of the dataframe. I hope this helps:

def dfshape(df):
    # first module-level name bound to this exact object (as in cors' answer)
    dfname = [x for x in globals() if globals()[x] is df][0]
    print(f"'{dfname}' dataframe shape is: {df.shape}")

This is very over-engineered compared to hard-coding the print statement, but it does two things in one. Credit to cors for the original solution.

erp_da

One question remains open: what happens if the function that looks up the name lives in an imported module, while the DataFrame is defined in a file in a parent directory, i.e. in a different scope?

@erp_da

This is a ridiculously over-engineered version that adds thousands separators:

def dfshape(df):
    dfname = [x for x in globals() if globals()[x] is df][0]
    rows, cols = df.shape  # unpack the (rows, columns) tuple directly
    # ':,' inserts thousands separators, e.g. 2942528 -> 2,942,528
    print(f"Dataframe [{dfname}]'s shape is: ({rows:,}) ({cols:,})")

output:

Dataframe [NAME_OF_DATAFRAME]'s shape is: (2,942,528) (7)

Credit to cors for the original solution.