Calculation on columns using pandas groupby

Question

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

excel_file = "dataset.xlsx"

sheet0 = pd.read_excel(excel_file, 'Title Sheet')
sheet3 = pd.read_excel(excel_file, sheet_name='CustomerDemographic')
sheet4 = pd.read_excel(excel_file, sheet_name='CustomerAddress')
sheet1 = pd.read_excel(excel_file, sheet_name='Transactions')

customer_data = pd.concat([sheet3, sheet4, sheet1])

#TODO: Data calculations
pivot = sheet3.groupby(['customer_id']).mean()
bestCustomers = pivot.loc[:,"past_3_years_bike_related_purchases":"tenure"]

But it raises KeyError: 'customer_id'. Where is the problem?

An example of the data would make it possible to answer directly. In any case, check sheet3.columns to see whether it includes 'customer_id'. — Ezer K, Jul 25 '20 at 19:06
It's a multi-sheet Excel file; the code above reads 4 sheets. sheet3 has 13 columns, and customer_id, tenure and past_3_years_bike_related_purchases (all numeric) are 3 of them. — Imran Rony, Jul 25 '20 at 19:11
You may have misspelled customer_id; check sheet3.columns to see if customer_id exists. — GGS, Jul 25 '20 at 19:12
List sheet3.columns and check whether customer_id has leading or trailing whitespace. — Farid Jafri, Jul 25 '20 at 19:14
I found the error: sheet3.columns shows columns named Unnamed: 1, Unnamed: 2, and so on. So how can I group by them? — Imran Rony, Jul 25 '20 at 19:18
@FaridJafri I don't think so, I copied the name directly from the file. Maybe read_excel can't find the column names because the first row of sheet3 holds some other comments rather than the column names. — Imran Rony, Jul 25 '20 at 19:27
@Al-Imran Yes, you can open the file and check whether it has column names. By default, read_excel uses the first row (row 0) of the sheet as the header. Also check https://stackoverflow.com/questions/44734613/pandas-returning-the-unnamed-columns — Farid Jafri, Jul 25 '20 at 19:30
I would rename the columns to be safe... or find the column that should be titled 'customer_id' and use that column name in the `groupby` — RichieV, Jul 25 '20 at 19:37
Thanks, everyone! **I found the solution: passing skiprows=1 to read_excel, as there are no values in the first row of the file.** — Imran Rony, Jul 25 '20 at 19:46
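
The diagnosis and fix from the comments can be reproduced with a minimal sketch. The sample data below is made up, and it assumes the first row holds a note rather than column names. To keep the example self-contained it uses pd.read_csv on an in-memory string instead of the original dataset.xlsx; pd.read_excel accepts the same skiprows and header parameters.

```python
import io

import pandas as pd

# Hypothetical stand-in for the 'CustomerDemographic' sheet: the first row
# holds a note, and the real column names are on the second row.
raw = (
    "Note: internal data,,\n"
    "customer_id,past_3_years_bike_related_purchases,tenure\n"
    "1,93,11\n"
    "1,81,16\n"
    "2,61,15\n"
)

# Default read: the note row is taken as the header, so 'customer_id' is not
# a column name and the empty header cells become 'Unnamed: N'.
broken = pd.read_csv(io.StringIO(raw))
print(list(broken.columns))

# Skipping the first row makes the second row the header.
fixed = pd.read_csv(io.StringIO(raw), skiprows=1)
print(list(fixed.columns))

# Equivalent alternative: say explicitly which row holds the column names.
alt = pd.read_csv(io.StringIO(raw), header=1)

# With proper column names, the groupby from the question works.
means = fixed.groupby("customer_id").mean()
best_customers = means.loc[:, "past_3_years_bike_related_purchases":"tenure"]
print(best_customers)
```

With the real file, the corresponding call would be pd.read_excel(excel_file, sheet_name='CustomerDemographic', skiprows=1).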
