import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

excel_file = "dataset.xlsx"

sheet0 = pd.read_excel(excel_file, sheet_name='Title Sheet')
sheet3 = pd.read_excel(excel_file, sheet_name='CustomerDemographic')
sheet4 = pd.read_excel(excel_file, sheet_name='CustomerAddress')
sheet1 = pd.read_excel(excel_file, sheet_name='Transactions')

customer_data = pd.concat([sheet3, sheet4, sheet1])

#TODO: Data calculations
pivot = sheet3.groupby(['customer_id']).mean()
bestCustomers = pivot.loc[:,"past_3_years_bike_related_purchases":"tenure"]

But it raises KeyError: 'customer_id'. Where is the problem?

Imran Rony
  • An example of the data would make it possible to answer directly. In any case, try sheet3.columns to see whether it includes 'customer_id' – Ezer K Jul 25 '20 at 19:06
  • It's a multi-sheet Excel file, as you can see from the 4 sheets in the code above. sheet3 has 13 columns; customer_id (numbers), tenure (numbers) and past_3_years_bike_related_purchases (numbers) are 3 of them – Imran Rony Jul 25 '20 at 19:11
  • You may have misspelled customer_id; check sheet3.columns to see if customer_id exists – GGS Jul 25 '20 at 19:12
  • List sheet3.columns and check whether customer_id has leading or trailing whitespace – Farid Jafri Jul 25 '20 at 19:14
  • I found the error: sheet3.columns shows Unnamed: 1, Unnamed: 2, etc. as column names. So how can I group by them? – Imran Rony Jul 25 '20 at 19:18
  • @FaridJafri I think not, I copied it directly from the file. Maybe read_excel can't find the column name because the first row of sheet3 does not hold the column names; there are some other comments on that line! – Imran Rony Jul 25 '20 at 19:27
  • @Al-Imran Yes, you can open the file and check whether it has column names. By default, read_excel treats row 0 of the sheet as the header. Also check https://stackoverflow.com/questions/44734613/pandas-returning-the-unnamed-columns. – Farid Jafri Jul 25 '20 at 19:30
  • I would rename the columns to be safe... or find the column that should be titled 'customer_id' and use that column name in the `groupby` – RichieV Jul 25 '20 at 19:37
  • Thanks, everyone! **I found the solution: passing skiprows=1 to read_excel, as there are no values in the first row of the file** – Imran Rony Jul 25 '20 at 19:46
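
The diagnosis reached in the comments can be reproduced with a small sketch. The data below is made up for illustration and is read with `pd.read_csv` from an in-memory string so the example runs without an Excel file; `pd.read_excel` accepts the same `skiprows` and `header` parameters. The note text and the values are assumptions, not the contents of the real `CustomerDemographic` sheet.

```python
import io
import pandas as pd

# Hypothetical stand-in for the sheet: a stray note on the first row,
# the real column names on the second row, then the data.
raw = io.StringIO(
    "Note: some comment,,\n"
    "customer_id,past_3_years_bike_related_purchases,tenure\n"
    "1,93,11\n"
    "2,81,16\n"
    "1,7,5\n"
)

# Default behaviour: row 0 is taken as the header, so the real names
# end up as data and the empty cells become 'Unnamed: N' columns.
broken = pd.read_csv(raw)
print(list(broken.columns))

# Skipping the first row makes the second row the header.
raw.seek(0)
fixed = pd.read_csv(raw, skiprows=1)
print(list(fixed.columns))

# With the right header, the groupby from the question works.
# numeric_only=True guards against text columns in the real sheet.
pivot = fixed.groupby("customer_id").mean(numeric_only=True)
best_customers = pivot.loc[:, "past_3_years_bike_related_purchases":"tenure"]
print(best_customers)
```

For the file in the question, the equivalent call would presumably be `pd.read_excel(excel_file, sheet_name='CustomerDemographic', skiprows=1)`; passing `header=1` instead should have the same effect, since it tells pandas that the second row holds the column names.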
