I have a dataframe that looks like this:

   A  B          C
0  1  2  PRODUCT_1
1  3  2  PRODUCT_2
2  3  2  PRODUCT_4
3  3  2  PRODUCT_5
4  5  2  PRODUCT_1
5  3  2  PRODUCT_3

For each unique product, I want to run a model prediction using columns A and B and store the corresponding accuracy.

unique = ["PRODUCT_1", ...]  # unique products
accuracy = []
for i in unique:
    first_subset = ???  # all rows for product `i` - how do I implement this correctly?

    X = first_subset[:, 0]
    Y = first_subset[:, 1]

    prediction = model.predict(X)
    accuracy_i = np.sum(prediction / np.sum(Y))
    accuracy.append([accuracy_i, i])

How could I implement the second point in Python?

cs95
Alessandro Ceccarelli

1 Answer

Starting with -

df = pd.DataFrame(...) # your data 
df

   A  B          C
0  1  2  PRODUCT_1
1  3  2  PRODUCT_2
2  3  2  PRODUCT_4
3  3  2  PRODUCT_5
4  5  2  PRODUCT_1
5  3  2  PRODUCT_3

Find uniques first, using

uniques = df.C.unique()

uniques
array(['PRODUCT_1', 'PRODUCT_2', 'PRODUCT_4', 'PRODUCT_5', 'PRODUCT_3'], dtype=object)
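If you do want to loop over uniques directly, each product's rows can be picked out with a boolean mask. A minimal sketch, assuming the sample frame from the question (the `subsets` dict is just an illustrative name):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 3, 3, 3, 5, 3],
    "B": [2, 2, 2, 2, 2, 2],
    "C": ["PRODUCT_1", "PRODUCT_2", "PRODUCT_4",
          "PRODUCT_5", "PRODUCT_1", "PRODUCT_3"],
})

subsets = {}  # hypothetical container: product name -> rows for that product
for product in df["C"].unique():
    # df["C"] == product is a boolean Series; indexing with it keeps matching rows.
    subsets[product] = df[df["C"] == product]

print(subsets["PRODUCT_1"])
```

This rescans the whole column once per product, which is one reason groupby tends to be the nicer option.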

To get all rows from a particular product, I'd do this using groupby (so, you actually don't need uniques here) -

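acc = {}  # dict: product name -> accuracy
for i, g in df.groupby('C'):
    # i is the product name, g holds all rows for that product
    X, y = g['A'], g['B']
    p = model.predict(X)  # predict takes only the features

    # fraction of predictions that match the true values
    acc[i] = (p == y).sum() / len(y)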

Basically, for each group, call model.predict and store the resulting accuracy in the acc dict, keyed by product name.
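Here is a self-contained version you can run to see the loop end to end. It is only a sketch: `ConstantModel` is a made-up stand-in for whatever fitted model you have, and I'm assuming `predict` accepts a 2-D feature frame, as scikit-learn style estimators do.

```python
import numpy as np
import pandas as pd


class ConstantModel:
    """Hypothetical stand-in for a fitted model: always predicts 2."""

    def predict(self, X):
        return np.full(len(X), 2)


# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 3, 3, 3, 5, 3],
    "B": [2, 2, 2, 2, 2, 2],
    "C": ["PRODUCT_1", "PRODUCT_2", "PRODUCT_4",
          "PRODUCT_5", "PRODUCT_1", "PRODUCT_3"],
})

model = ConstantModel()

acc = {}  # product name -> accuracy
for product, g in df.groupby("C"):
    X, y = g[["A"]], g["B"]  # double brackets keep X two-dimensional
    p = model.predict(X)
    acc[product] = float((p == y).mean())  # share of exact matches

print(acc)
```

Since B is 2 in every row and the stand-in always predicts 2, each product ends up with an accuracy of 1.0 here; with a real model you would of course get different numbers.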

cs95
  • how could it be addressed using numpy instead? – Alessandro Ceccarelli Dec 12 '17 at 15:11
  • @AlessandroCeccarelli If you want a numpy answer, don't tag the question pandas. Also, a numpy solution may not be as efficient or easy to achieve. – cs95 Dec 12 '17 at 15:12
  • Okay, but whenever I try to run the code, it reports an error at the last line, acc[i] = ...: "list assignment index out of range". Could it be because I have not called "append" yet? – Alessandro Ceccarelli Dec 12 '17 at 15:36
  • @AlessandroCeccarelli No, did you see the top of the loop? I've defined `acc` as `{}`, not `[]`. – cs95 Dec 12 '17 at 15:37