I'm new to Pandas and I'd like to know what I'm doing wrong in the following example.
I found an example here that explains how to get a data frame, rather than a series, after applying a group-by.
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Baires", "Caracas", "Baires", "Caracas"],
})
df1['size'] = df1.groupby(['City']).transform(np.size)
df1.dtypes #Why is size an object? shouldn't it be an integer?
df1[['size']] = df1[['size']].astype(int) #convert to integer
df1['avera'] = df1.groupby(['City'])['size'].transform(np.mean) #group by again
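For comparison, here is a minimal sketch of the same two steps, assuming a single column (here `Name`) is selected before calling `transform`, so that the result is a Series rather than a DataFrame. It uses the string aliases `"size"` and `"mean"` instead of the NumPy functions; that choice is an assumption of this sketch, not something taken from the linked example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Baires", "Caracas", "Baires", "Caracas"],
})

# Select one column first so transform returns a Series aligned with df1's index.
df1["size"] = df1.groupby("City")["Name"].transform("size")

# Group again and broadcast the per-city mean of the new column back to every row.
df1["avera"] = df1.groupby("City")["size"].transform("mean")

print(df1.dtypes)
```

With this form the `size` column should come back with an integer dtype, so the explicit `astype(int)` step would not be needed.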
Basically, I want to apply the same transformation to a huge data set I'm working on now, but I'm getting an error message:
budgetbid['meanpb']=budgetbid.groupby(['jobid'])['probudget'].transform(np.mean) #can't upload this data for the sake of explanation
ValueError: Length mismatch: Expected axis has 5564 elements, new values have 78421 elements
Thus, my questions are:
- How can I overcome this error?
- Why do I get an object type, instead of an integer type, when I apply a group-by with size?
- Let us say that I want to get a data frame from df1 with unique cities and their respective count(*). I know I can do something like newdf = df1.groupby(['City']).size(). Unfortunately, this is a series, but I want a data frame with two columns: City and a brand new variable, let's say countcity. How can I get a data frame from a group-by operation like the one in this example?
- Could you give me an example of a select distinct equivalent here in pandas?
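To make the last two questions concrete, this is roughly the result being asked for. It is only a sketch, assuming `reset_index(name=...)` is an acceptable way to turn the size series into a data frame and that `drop_duplicates` / `unique` are close enough to SQL's select distinct; the variable names `distinct_rows` and `distinct_cities` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Baires", "Caracas", "Baires", "Caracas"],
})

# size() returns a Series indexed by City; reset_index turns the index into a
# regular column and names the counts column, giving a two-column DataFrame.
newdf = df1.groupby("City").size().reset_index(name="countcity")

# Roughly SELECT DISTINCT City, Name: keep the first occurrence of each pair.
distinct_rows = df1[["City", "Name"]].drop_duplicates()

# Roughly SELECT DISTINCT City: unique values of a single column, in order of appearance.
distinct_cities = df1["City"].unique()

print(newdf)
print(distinct_rows)
print(distinct_cities)
```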