What kind of analysis can I run on this large Excel dataset?

Question

I work for a company that sells pails to customers. I am doing this for a class project and I'm not the best when it comes to Python. The file lists thousands of customers who purchased different types of pails from 2015 to 2019. My finance department would like to know which customers returned pails, which didn't, and at what rate.

I used Python to keep only the columns relevant to my analysis (see this data). Now that I've exported them to a CSV file, what kind of analysis should I run in Python to answer the question? I've already made a pivot table and chart (counting how many customers returned pails in each year), but I'd like to use Python to make the results easier to read and analyze.

import pandas as pd

# Load the 2015-2019 sheet from the tracking workbook
data = pd.read_excel(r'C:\Users\Vilma\Documents\CIS450\Inidividual project\ContainerTracker.xlsx',
                     sheet_name='2015-2019')

# Keep only the columns relevant to the return analysis
df = pd.DataFrame(data, columns=['Customer for Tracking::CustomerName',
                                 'Customer for Tracking::CustomerID',
                                 'Order for Tracking::OrderDate',
                                 'Products for Tracking::ProdName',
                                 'Transaction Items for Tracking::Description',
                                 'RemovalNote',
                                 'RemovalDate',
                                 'OrderID'])

# Save the trimmed data to CSV and preview it
df.to_csv(r'C:\Users\Vilma\Documents\CIS450\Inidividual project\ContainerTrackerTrimmed.csv', index=False)
print(df)

Hello please read [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) article on how to post a reproducible Python pandas question. — Ukrainian-serge, Mar 01 '20 at 05:00

score 0 · Answer 1 · answered Mar 01 '20 at 05:16

Please read my comment about posting a reproducible question with examples that we may copy and paste and work on to come up with a solution.

That being said, and if I understand you correctly, I believe .groupby() may help here:

# Fill your blank entries with something like 'No Return'.
# Empty Excel cells are read as NaN, so handle both NaN and empty strings.

df['RemovalNote'] = df['RemovalNote'].fillna('No Return').replace('', 'No Return')

df1 = df.groupby('Customer for Tracking::CustomerName')['RemovalNote'].value_counts()

print(df1)
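
For the "at what rate" part of the question, a minimal sketch along these lines might help. It assumes that each row is one pail and that a blank RemovalNote means the pail was not returned; neither is confirmed by the question. The return_rates helper, the OrderYear label and the sample rows are made up for illustration, so swap in your own df.

```python
import pandas as pd


def return_rates(df, group_key, note_col="RemovalNote"):
    """Count rows, returned rows and the return rate for each group.

    Assumption: a missing or blank note means the pail was NOT returned.
    """
    returned = df[note_col].fillna("").astype(str).str.strip().ne("")
    summary = (
        df.assign(returned=returned)
        .groupby(group_key)["returned"]
        .agg(pails="size", returned="sum", return_rate="mean")
    )
    return summary.sort_values("return_rate", ascending=False)


# Hypothetical sample rows standing in for the real spreadsheet
sample = pd.DataFrame({
    "Customer for Tracking::CustomerName": ["Acme", "Acme", "Acme", "Bolt", "Bolt"],
    "Order for Tracking::OrderDate": pd.to_datetime(
        ["2015-03-01", "2015-07-15", "2016-01-10", "2016-02-20", "2016-05-05"]
    ),
    "RemovalNote": ["Returned", None, "", "Returned", "Returned"],
})

# Return rate per customer
by_customer = return_rates(sample, "Customer for Tracking::CustomerName")

# Return rate per order year (group by a derived Series instead of a column)
order_year = sample["Order for Tracking::OrderDate"].dt.year.rename("OrderYear")
by_year = return_rates(sample, order_year)

print(by_customer)
print(by_year)
```

Sorting by return_rate puts the customers who return the most at the top. If the date column comes back from the CSV as text, convert it with pd.to_datetime before using .dt.year.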

I hope this helps even though your question was very ambiguous.
