Extract Uniques and loop

Question

I have a dataframe that looks like this:

   A  B          C
0  1  2  PRODUCT_1
1  3  2  PRODUCT_2
2  3  2  PRODUCT_4
3  3  2  PRODUCT_5
4  5  2  PRODUCT_1
5  3  2  PRODUCT_3

For each unique product, I want to run a model prediction using columns A and B and store the corresponding accuracy.

unique = ["PRODUCT_1", ...]  # unique products
accuracy = []
for i in unique:
    first_subset = ???  # all rows for product `i` - how do I implement this correctly?

    X = first_subset[:, 0]
    Y = first_subset[:, 1]

    prediction = model.predict(X)
    product_accuracy = np.sum(prediction / np.sum(Y))
    accuracy.append([product_accuracy, i])

How can I select the rows for each product inside the loop in Python?

Take a close look at `df.groupby`. It generates groups (smaller `DataFrames`), one for each unique key — Maarten Fabré, Dec 12 '17 at 10:04
I can't correctly initialize the loop and create a frame for the last point, since I'm not an advanced coder @cᴏʟᴅsᴘᴇᴇᴅ — Alessandro Ceccarelli, Dec 12 '17 at 10:38
Okay, can't you at least provide a [mcve] with sample data and output? You're working with pandas aren't you? How do you expect to get an answer without providing anything? — cs95, Dec 12 '17 at 10:40
[here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) are some more tips for a clear question — Maarten Fabré, Dec 12 '17 at 10:47
Thank you; I tried my best to make it more clear @cᴏʟᴅsᴘᴇᴇᴅ — Alessandro Ceccarelli, Dec 12 '17 at 14:06
I cleaned your question up, it was badly in need of some work. See my answer for a solution to your problem. — cs95, Dec 12 '17 at 14:17

score 0 · Accepted Answer · answered Dec 12 '17 at 14:12

Starting with -

df = pd.DataFrame(...) # your data 
df

   A  B          C
0  1  2  PRODUCT_1
1  3  2  PRODUCT_2
2  3  2  PRODUCT_4
3  3  2  PRODUCT_5
4  5  2  PRODUCT_1
5  3  2  PRODUCT_3

Find uniques first, using

uniques = df.C.unique()

uniques
array(['PRODUCT_1', 'PRODUCT_2', 'PRODUCT_4', 'PRODUCT_5', 'PRODUCT_3'], dtype=object)
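
If you do want to loop over the unique values directly, each product's rows can be selected with a boolean mask. A minimal sketch, rebuilding the sample frame from the table above (the `subsets` dict is only an illustrative name):

```python
import pandas as pd

# Sample frame rebuilt from the table shown above.
df = pd.DataFrame({
    "A": [1, 3, 3, 3, 5, 3],
    "B": [2, 2, 2, 2, 2, 2],
    "C": ["PRODUCT_1", "PRODUCT_2", "PRODUCT_4",
          "PRODUCT_5", "PRODUCT_1", "PRODUCT_3"],
})

subsets = {}
for product in df["C"].unique():
    # Boolean mask: keep only the rows whose C column equals this product.
    subsets[product] = df[df["C"] == product]

print(subsets["PRODUCT_1"])
```

This scans the whole frame once per product, which is fine for a handful of products but slower than a single grouping pass when there are many.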

To get all rows from a particular product, I'd do this using groupby (so, you actually don't need uniques here) -

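acc = {}
for i, g in df.groupby('C'):
    # assumes `model` is already fitted and follows the scikit-learn convention
    X, y = g[['A']], g['B']   # 2-D features, 1-D target
    p = model.predict(X)      # predict takes only the features

    acc[i] = (p == y).sum() / len(y)  # fraction of exact matches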

For each group, call `model.predict` and store the resulting accuracy in the `acc` dict, keyed by product name.
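
To try the loop end to end without a trained model, here is a rough sketch. `ConstantModel` is a hypothetical stand-in, not part of any library, and it assumes `predict` takes only the features, as in scikit-learn. Swap in your own fitted model.

```python
import numpy as np
import pandas as pd


class ConstantModel:
    """Hypothetical stand-in for a fitted model: always predicts 2."""

    def predict(self, X):
        return np.full(len(X), 2)


model = ConstantModel()

# Sample frame rebuilt from the table shown above.
df = pd.DataFrame({
    "A": [1, 3, 3, 3, 5, 3],
    "B": [2, 2, 2, 2, 2, 2],
    "C": ["PRODUCT_1", "PRODUCT_2", "PRODUCT_4",
          "PRODUCT_5", "PRODUCT_1", "PRODUCT_3"],
})

acc = {}
for product, group in df.groupby("C"):
    X = group[["A"]]            # 2-D feature frame
    y = group["B"].to_numpy()   # 1-D target array
    p = model.predict(X)
    acc[product] = (p == y).mean()  # fraction of exact matches

# Optional: turn the dict into a Series for easier inspection.
acc_series = pd.Series(acc, name="accuracy")
print(acc_series)
```

Because column B is always 2 in the sample data and the stand-in always predicts 2, every product scores 1.0 here. With a real model the values will differ.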

answered Dec 12 '17 at 14:12

cs95


How could this be done using numpy instead? – Alessandro Ceccarelli Dec 12 '17 at 15:11
@AlessandroCeccarelli If you want a numpy answer, don't tag the question pandas. Also, a numpy solution may not be as efficient or easy to achieve. – cs95 Dec 12 '17 at 15:12
Okay, but whenever I try to run the code, it reports an error at the last line, `acc[i] = ...`: "list assignment index out of range". Could it be because I have not called "append" yet? – Alessandro Ceccarelli Dec 12 '17 at 15:36
@AlessandroCeccarelli No. Did you see the top of the loop? I've defined `acc` as `{}`, not `[]`. – cs95 Dec 12 '17 at 15:37
