1

I have a dataframe which contains rows of information about different wines, including their variety. I would like to append a new column to this dataframe which counts how many times that variety has been seen in the dataframe up until that point.

Sample data:         Expected output:
wine entries         ordered counts                               

pinot noir           1
chardonnay           1
malbec               1
malbec               2
pinot noir           2
champagne            1
malbec               3

Hopefully this is clear. It feels like it should be straight forward using a groupby and a loop but I just can't make it work!

Edit 1: My attempt

import numpy as np
import pandas as pd

df = pd.read_csv('wine_tiny.csv')
index = range(df.shape[0])

#create df to hold new column of wine number within variety group
df_w= pd.DataFrame(0,index=index, columns=['w_index'])

#how many varieties are present in the data
unique_variety_codes = df.variety_code.unique()
n_varieties = len(unique_variety_codes)

#initialise a array to store the numbers / variety
counts = [0] * (n_varieties)

for row_number, row in enumerate(df.itertuples(),1):

#code breaks at last line, hack to make it work
if (row_number == 5614):
    break
v_code = row.variety_code
# increment the counter for the variety, note both variety_code and array are zero-indexed
counts[v_code ] += 1

#set the column value equal to the current count
#print("before: " + str(df_w.loc[row_number, 'w_index']))
df_w.loc[row_number, 'w_index'] =counts[v_code]
#print(" after " + str(df_w.loc[row_number, 'w_index']))

#now have a complete df_w, one column, containing a wine's placing within its variety group
print(df_w.tail(10))
print(df.tail(10))
#check if the two dataframes have the same no. rows
print("nrows df: " + str(df.shape[0]))
print("nrows df_w: " + str(df_w.shape[0]))

#this doesn't work - varieties getting more than one of each count
result = pd.merge(df, df_w, left_index=True, right_index=True)

I believe this is similar to the answer here: How to add sequential counter column on groups using Pandas groupby

But how do I map the counts produced onto the original dataframe so they are no longer grouped?

0 Answers0