
I have 3 data-frames:

import pandas as pd

d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [1,2,3], 'col2': [3,4,5]}
d3 = {'col1': [1,2,3,4,5], 'col2': [3,4,5,6,7]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
df3 = pd.DataFrame(data=d3)

Now I'm trying to count the number of rows and columns of these 3 data-frames and place the counts in a new data-frame named my_dataframe. This is the code I used:

dataframes = [df1, df2, df3]
number_rows = [df.shape[0] for df in dataframes]
number_columns = [df.shape[1] for df in dataframes]

my_data = {'df': dataframes, 'rows': number_rows, 'columns': number_columns}

my_dataframe = pd.DataFrame(my_data)

print(my_dataframe)

This is my output:

[Screenshot of the printed output; image not included.]

This is my expected output:

    df   -   rows   -   columns      
0   df1  -   2      -   2
1   df2  -   3      -   2
2   df3  -   5      -   2

Can someone explain to me what went wrong and how I can fix this? Thank you all.

NorthAfrican

3 Answers

1

In the line where you define the data to be inserted into my_data, you are inadvertently inserting the original dataframes themselves rather than their names.

my_data = {'df': dataframes, 'rows': number_rows, 'columns': number_columns}

Instead, define a separate list df_names = ['df1', 'df2', 'df3'] and use it as the value for 'df' in my_data in place of dataframes. Keep the dataframes list itself for computing the row and column counts.

I don't think there is a nice, built-in way in pandas to get the name of a dataframe. (I could be wrong, though.)
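
As a minimal sketch of the suggestion above (df_names is a hand-written list of labels, not something pandas provides), the shapes are still read from the DataFrame objects while only the labels go into the 'df' column:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]})
df3 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

# Keep the DataFrame objects for measuring shapes...
dataframes = [df1, df2, df3]
# ...and a separate, manually maintained list of labels for display.
df_names = ['df1', 'df2', 'df3']

my_data = {
    'df': df_names,
    'rows': [df.shape[0] for df in dataframes],
    'columns': [df.shape[1] for df in dataframes],
}

my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
```

The two lists have to be kept in the same order by hand, which is the main weakness of this approach.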

navneethc
  • @naveethc. If I do that I get `AttributeError: 'str' object has no attribute 'shape'` – NorthAfrican Sep 10 '20 at 10:39
  • Oh yeah, sorry, my bad. Perhaps define a separate variable to insert into the final dataframe rather than the one you use to iterate over. I will amend my answer accordingly. – navneethc Sep 10 '20 at 10:41
1

It is better to use a dict that maps each name to its DataFrame:

dataframes = {'df1': df1, 'df2':df2, 'df3':df3}

number_rows = [df.shape[0] for k, df in dataframes.items()]
number_columns = [df.shape[1] for k, df in dataframes.items()]
names = list(dataframes.keys())


my_data = {'df': names, 'rows': number_rows, 'columns': number_columns}

my_dataframe = pd.DataFrame(my_data)

print(my_dataframe)
    df  rows  columns
0  df1     2        2
1  df2     3        2
2  df3     5        2

Or:

dataframes = {'df1': df1, 'df2':df2, 'df3':df3}

my_dataframe = pd.DataFrame([(k, df.shape[0], df.shape[1]) for k, df in dataframes.items()],
                            columns=['df','rows','columns'])

print(my_dataframe)
    df  rows  columns
0  df1     2        2
1  df2     3        2
2  df3     5        2
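
Since df.shape is already a (rows, columns) tuple, the same table can also be built by unpacking it directly. This is only a small variation on the code above, sketched with the same example data:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]})
df3 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

dataframes = {'df1': df1, 'df2': df2, 'df3': df3}

# *df.shape expands to (number of rows, number of columns)
my_dataframe = pd.DataFrame(
    [(name, *df.shape) for name, df in dataframes.items()],
    columns=['df', 'rows', 'columns'],
)

print(my_dataframe)
```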

It is also possible to retrieve the variable names automatically, but this needs the inspect module:

dataframes = [df1, df2, df3]

import inspect

#https://stackoverflow.com/a/40536047
def retrieve_name(var):
    """
    Gets the name of var. Searches from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]

number_rows = [df.shape[0] for df in dataframes]
number_columns = [df.shape[1] for df in dataframes]
names = [retrieve_name(x) for x in dataframes]

my_data = {'df': names, 'rows': number_rows, 'columns': number_columns}

my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
    df  rows  columns
0  df1     2        2
1  df2     3        2
2  df3     5        2
jezrael
1

You can try:

d = pd.DataFrame([{'df': k, 'rows': v.shape[0], 'cols': v.shape[1]}
                  for k, v in zip(('df1', 'df2', 'df3'), (df1, df2, df3))])

print(d)

    df  rows  cols
0  df1     2     2
1  df2     3     2
2  df3     5     2
Shubham Sharma