import pandas as pd
import numpy as np
df = pd.DataFrame({ 
    'VipNo':np.repeat( range(3), 2 ),
    'DepartureDate':np.random.choice(pd.date_range('1/1/2020', periods=100, freq='D'), 6, replace=False),
    'OrderPrice':np.repeat( range(3), 2 ),
    'DaysofTrip':np.repeat( range(3), 2 ),
    'OrderDate': np.random.choice( pd.date_range('1/1/2020', periods=100, freq='D'), 6, replace=False)})
print(df)

I have a dataset that I imported as df. Now I want to compute some new variables from its columns using groupby. Because the results are aggregated per group, I have to put them in a new data frame, which I called df1.

data = {"CustomerNo": df["VipNo"].unique(),
        "HistoricalAmount" : df.groupby('VipNo')['DepartureDate'].count(),
        "DailyAveragePrice" : df.groupby('VipNo').apply(lambda x: x.OrderPrice.sum()/x.DaysofTrip.sum()),
        "Orderwithin90days" : df.groupby('VipNo').apply(df.OrderDate.between("2019-02-27","2019-03-31").astype(int))}

df1 = pd.DataFrame(data)
df1.reset_index(drop=True, inplace=True)

print(df1)

DailyAveragePrice works perfectly fine when I take it out separately like

DailyAveragePrice = df.groupby('VipNo').apply(lambda x: x.OrderPrice.sum()/x.DaysofTrip.sum())
print(DailyAveragePrice)

But when I put it in the dictionary, it only creates an empty column named DailyAveragePrice (not even NaN is shown, it is simply blank). I'm not sure why it works in the small dataset I created here but not in the real dataset I'm using.

Orderwithin90days always gives me a traceback "'Series' objects are mutable, thus they cannot be hashed".

The first two (CustomerNo and HistoricalAmount) work perfectly though.

So I am wondering whether something goes wrong when I use a dictionary to build the dataframe I want. What is a good way to achieve my goal? I just want the new variables I created to appear in a new dataframe. Putting everything in a dictionary doesn't feel like the most efficient approach, because I have to keep the code short, I have more variables to add, and some of them may require long calculations. Thank you so much in advance!

FEI
  • Can you provide sample data? Your data dictionary refers to a dataframe whose contents we can't see. Also, please provide the expected output. Here is how to create a good pandas question: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jul 17 '20 at 04:19
  • yes! working on it – FEI Jul 17 '20 at 04:21
  • `df.OrderDate.between("2019-02-27","2019-03-31")` Does this specify the correct date range? It refers to 2019, but the sample data is from 2020. Also, the other columns in df1 have 3 rows each. – r-beginners Jul 17 '20 at 06:56
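
One possible way to build df1 without a dictionary of separate groupby calls is sketched below. It rests on a few assumptions: the sample values and the date window are made up for illustration, the helper columns `InWindow`, `TotalPrice` and `TotalDays` are hypothetical names, and named aggregation needs pandas 0.25 or later. `groupby(...).apply` expects a callable, so passing it the Series returned by `between(...)` is the likely source of the "cannot be hashed" traceback. Computing the flag as an ordinary column first and then summing it per group avoids that.

```python
import pandas as pd

# Made-up sample data with the same columns as in the question.
df = pd.DataFrame({
    "VipNo": [0, 0, 1, 1, 2, 2],
    "DepartureDate": pd.to_datetime(
        ["2020-03-01", "2020-03-15", "2020-02-10",
         "2020-04-01", "2020-01-20", "2020-02-25"]),
    "OrderPrice": [100, 200, 150, 50, 300, 300],
    "DaysofTrip": [2, 3, 1, 1, 4, 2],
    "OrderDate": pd.to_datetime(
        ["2020-01-05", "2020-02-20", "2020-03-10",
         "2020-03-20", "2020-01-20", "2020-02-01"]),
})

# Flag each row before grouping; the window is an assumed placeholder.
df["InWindow"] = df["OrderDate"].between("2020-01-10", "2020-02-29").astype(int)

# One groupby with named aggregation; every result shares the VipNo index,
# so the columns stay aligned with each other.
df1 = df.groupby("VipNo").agg(
    HistoricalAmount=("DepartureDate", "count"),
    TotalPrice=("OrderPrice", "sum"),
    TotalDays=("DaysofTrip", "sum"),
    Orderwithin90days=("InWindow", "sum"),
)

# Ratio of the per-group sums, same as sum(OrderPrice) / sum(DaysofTrip).
df1["DailyAveragePrice"] = df1["TotalPrice"] / df1["TotalDays"]

# Drop the helper columns and turn the group key into a regular column.
df1 = (df1.drop(columns=["TotalPrice", "TotalDays"])
          .rename_axis("CustomerNo")
          .reset_index())
print(df1)
```

Further variables can be added as extra entries in `agg(...)` or as extra column assignments on df1, which keeps each calculation on its own line.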
