
I have a dataset of over 10 million rows. Because of Excel's row limit, the data are split across 10 Excel spreadsheets. Hence, in Python, I'll have to read in all 10 of them individually (df1, df2, df3...df10) before concatenating them.

I want to check that all 10 DataFrames have the same number of columns before actually concatenating them. I could call print(df1.shape) ten times, but is there a more elegant way to do this with a for loop? I also want to use the same for loop to make sure that all 10 DataFrames have the same column names. I tried the following and it didn't work (error: 'str' object has no attribute 'shape'):

for i in range(1,11):
    x = "df"+str(i)
    print(x.shape)
  • The error is obvious, isn't it? You are trying to access the shape attribute on a string, but it is defined only for a DataFrame. Instead, you could keep the DataFrames in a list, loop through them, and check df.shape[1] (where df is your DataFrame), since df.shape returns a tuple of the number of rows followed by the number of columns. – RaphX Jun 28 '21 at 07:07
  • You should organize the dataframes as a list rather than give them individual names. – DYZ Jun 28 '21 at 07:07
  • Nk03's proposed answer seems like the most elegant way to me. There are multiple proposals to put the DataFrames in a list, but what if I have 100 DataFrames whose shapes I want to print? – user13399233 Jun 28 '21 at 07:29
  • Accepting that answer means using a non-recommended way of creating multiple DataFrames: https://stackoverflow.com/questions/30635145/create-multiple-dataframes-in-loop/30638956#30638956 – jezrael Jun 28 '21 at 07:31

3 Answers


Why not create a list of DataFrames?

dfs = [df1, df2, df3, df4, df5, df6, df7, df8, df9, df10]

for df in dfs:
    print(df.shape)
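
The same list can also be used to check that the column names agree, which the question asks for as well. The following is only a minimal sketch: `compare_columns` is a hypothetical helper name, it assumes the list is non-empty, and it treats the first DataFrame as the reference. It relies only on the `columns` and `shape` attributes of each DataFrame.

```python
def compare_columns(dfs):
    """Compare every DataFrame's columns against the first one in the list.

    Returns a list of (position, n_columns, differing_names) tuples, one per
    DataFrame whose column names (or their order) do not match the first
    DataFrame's. An empty list means every DataFrame matched.
    """
    reference = list(dfs[0].columns)
    mismatches = []
    for position, df in enumerate(dfs):
        columns = list(df.columns)
        if columns != reference:
            # Symmetric difference: names present in one frame but not the other.
            # It is empty when only the column order differs.
            differing = sorted(set(columns) ^ set(reference), key=str)
            # df.shape is (rows, columns), so index 1 is the column count.
            mismatches.append((position, df.shape[1], differing))
    return mismatches
```

With the list above, `compare_columns(dfs)` returning an empty list would suggest that the DataFrames can be passed to `pd.concat(dfs)` without column surprises. Positions are 0-based, so position 2 refers to df3.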
jezrael

Here's one way:

for i in range(1,11):
    print(eval(f'df{i}').shape)
Nk03

Use this:

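# List the DataFrame variables themselves, not their names as strings
list_of_all_dataframes = [df1, df2, df3, df4, df5, df6, df7, df8, df9, df10]

for each in list_of_all_dataframes:
    # shape is (rows, columns); index 1 is the number of columns
    print(each.shape[1])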

shape is an attribute of a DataFrame, so you need the DataFrame objects themselves, not strings that contain their names.

So instead of building new names with range(1,11) and string concatenation, use the variables you assigned when you loaded your datasets as pandas DataFrames.
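
If there are too many DataFrames to list by hand, one option is to build the list while loading the files, so that no individually named variables are needed. This is a rough sketch under assumptions: `load_frames` is a hypothetical helper, and the `data_part1.xlsx` ... `data_part10.xlsx` file names are made up for illustration. The reader is passed in as a callable, which in practice would likely be `pandas.read_excel`.

```python
def load_frames(paths, reader):
    """Load each path with `reader` and return the results as a list.

    `reader` is any callable that takes a path and returns a DataFrame,
    for example pandas.read_excel.
    """
    frames = []
    for path in paths:
        frames.append(reader(path))
    return frames


# Hypothetical file names; replace the pattern with the real spreadsheet paths.
paths = [f"data_part{i}.xlsx" for i in range(1, 11)]

# Assumed usage with pandas installed and the files present:
#     import pandas as pd
#     dfs = load_frames(paths, pd.read_excel)
#     for df in dfs:
#         print(df.shape)
```

The same loop works unchanged for 100 files, since only the range that builds `paths` needs to change.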

letdatado
  • This is the same as what I tried, with minor changes. It returns the same error I had. – user13399233 Jun 28 '21 at 07:15
  • Kindly check that again. It's going to work this time. To use shape, your data needs to be in DataFrame form. Please check it and let me know if that worked. – letdatado Jun 28 '21 at 07:17
  • To use shape, your data must already be loaded into DataFrames. – letdatado Jun 28 '21 at 07:18