0

I've started using pandas this week so pardon me if I'm asking a really obvious question. I am trying to get pandas to loop through all the sheets in my Excel wb to extract a specific column, column F for counts of occurances via value_counts.

Previously I used value_counts and typed in the specific name of the worksheet and it worked in pulling out the value count of that 1 sheet. However, the minute I replaced the sheet name to df1, it stops working.

df = pd.ExcelFile("filepath.xlsx")
for df1 in df.sheet_names:
    df2 = pd.read_excel("filepath.xlsx", sheet_name=df1, usecols="F")
    df2.dropna(inplace=False)
    print (df2.value_counts())

I expect the output to be the unique values and their occurance numbers but it returns:

AttributeError: 'DataFrame' object has no attribute 'value_counts'

Can somebody please help me?

  • 1
    The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html) mentions this function for a `Series`. If you want to get it for multiple columns, use [`apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html). [*Some discussions about*](https://stackoverflow.com/questions/32589829/how-to-get-value-counts-for-multiple-columns-at-once-in-pandas-dataframe) – Alexandre B. Aug 04 '19 at 18:23
  • @Trenton_M Hi! does this solution work for specific columns in all the excel sheets in the workbook? – MeHasTooManyQueries... Aug 04 '19 at 19:20
  • replace `df2.value_counts()` with `df2.apply(pd.Series.value_counts)` inside the loop – Trenton McKinney Aug 04 '19 at 19:27
  • @Trenton_M tks man! – MeHasTooManyQueries... Aug 04 '19 at 19:42

1 Answers1

0

You have a loop running on each sheet in the source Excel file. It is OK, but you should apply value_counts separately to each DataFrame (read from each sheet).

And a remark concerning variable names: Use df for DataFrames only. Neither Excel file nor sheet name are DataFrames.

So the loop should be somenthing like:

exFile = pd.ExcelFile("filepath.xlsx")
for sheet in exFile.sheet_names:
    df = pd.read_excel("filepath.xlsx", sheet_name=sheet, usecols="F")
    df.dropna(inplace=True)
    print('Sheet', sheet)
    print(df.apply(pd.Series.value_counts))
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41