I have a for loop that takes a subsample of my original dataset and makes predictions with a previously fitted model. I then need to match each prediction to its target value in the original DataFrame so I can calculate a different value.
Here are 20 rows of the original subsample:
index f0 f1 f2 product
89641 11.758713 -2.548885 5.007187 134.766305
30665 7.134050 -7.369558 3.990141 107.813044
71148 -13.860892 -2.727111 4.995418 137.945408
63263 -1.949113 6.340399 4.999270 134.766305
34301 2.741874 -5.114227 1.990971 57.085625
28150 -9.194978 -8.220917 4.000539 110.992147
37974 5.416532 -6.685454 3.997102 107.813044
63541 8.116958 -0.106199 1.992089 53.906522
69007 -0.886114 -8.732907 3.004329 84.038886
8808 -10.138814 -5.428649 3.996867 110.992147
77082 -7.427920 -9.558472 5.002233 137.945408
30523 0.780631 -1.872719 1.000312 30.132364
78523 3.096930 -6.854314 3.000831 84.038886
66519 4.459357 -6.787551 4.994414 134.766305
69231 10.113738 -10.433003 4.004866 107.813044
48418 -17.092959 -3.294716 1.999222 57.085625
59715 -0.970615 -1.741134 2.012687 57.085625
30159 -7.075355 -16.977595 4.997697 137.945408
34763 5.850225 -5.069475 2.994821 80.859783
99239 -8.493579 -8.126316 1.004643 30.132364
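Code:

    import pandas as pd

    # model2 (fitted earlier) and revenue() are defined elsewhere in my notebook
    r2_revenue = []
    for i in range(1000):
        subsample = r2_test.sample(500, replace=True)
        features = subsample.drop(['product'], axis=1)
        predict = model2.predict(features)
        top_200 = pd.Series(predict).sort_values(ascending=False).iloc[:200]
        # this is the part that doesn't work: I need the 'product' values for the top_200 rows
        target = subsample['product'].isin(top_200)
        result = revenue(target).sum()
        r2_revenue.append(result)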
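A minimal sketch of the loop with the lookup wired in, assuming `model2.predict()` returns a 1-D array in the same row order as `features` (as scikit-learn regressors do). `r2_test`, `HypotheticalModel` and `revenue()` below are hypothetical stand-ins, since the real data, model and `revenue()` are not shown. It also swaps `sort_values(ascending=False).iloc[:200]` for the equivalent `Series.nlargest(200)`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Hypothetical stand-in for r2_test, with the same column names as the question.
r2_test = pd.DataFrame({
    "f0": rng.normal(0, 8, n),
    "f1": rng.normal(-5, 5, n),
    "f2": rng.uniform(0, 5, n),
    "product": rng.uniform(0, 140, n),
})


class HypotheticalModel:
    """Hypothetical stand-in for model2: predict() returns a 1-D NumPy array."""

    def predict(self, features):
        return features["f2"].to_numpy() * 27.0


def revenue(product):
    """Hypothetical revenue(); the real one is not shown, so this just scales product."""
    return product * 450.0


model2 = HypotheticalModel()

r2_revenue = []
for i in range(1000):
    # sample(replace=True) can repeat original labels, so renumber the rows 0..499
    # to make every label unique before doing label-based lookups.
    subsample = r2_test.sample(500, replace=True, random_state=i).reset_index(drop=True)
    features = subsample.drop(['product'], axis=1)
    # Attach the subsample's index to the predictions so they share labels.
    predict = pd.Series(model2.predict(features), index=subsample.index)
    top_200 = predict.nlargest(200)
    # Look up the actual product for exactly those rows, by label.
    target = subsample.loc[top_200.index, "product"]
    r2_revenue.append(revenue(target).sum())
```

The key change is that `target` now holds the 200 `product` values for the highest-predicted rows, instead of a boolean mask from `isin()`.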
So for `target`, I need to take the index of each entry in top_200 and find the corresponding entry in the 'product' column of the original subsample.
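A small sketch of why `isin()` does not do this: `Series.isin()` compares *values* (here, product values against predicted values), not index labels. Because `pd.Series(predict)` is built from a bare array, its index holds row *positions*, so `.iloc` can read `product` at those positions. The four rows and predictions below are hypothetical, loosely based on the sample above.

```python
import pandas as pd

# Tiny hypothetical subsample whose index labels came from the original data.
subsample = pd.DataFrame(
    {"product": [134.77, 107.81, 137.95, 57.09]},
    index=[89641, 30665, 71148, 34301],
)
predict = [130.0, 105.0, 140.0, 60.0]  # stand-in for model2.predict(features)

top_2 = pd.Series(predict).sort_values(ascending=False).iloc[:2]
# top_2 has a default 0..n-1 index, so its labels (2 and 0) are positions in subsample.

# isin() checks whether each product value equals one of the predicted values.
print(subsample["product"].isin(top_2).tolist())  # [False, False, False, False]

# iloc selects by position, which is what top_2's index holds.
matched = subsample["product"].iloc[top_2.index]
print(matched)
```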
I'm striking out on finding a way to take the index numbers from the series top_200 and look up the corresponding product value in the original dataset.
I feel like I'm missing something obvious, but searches like "matching an index from a series to a value in a dataframe" only turn up results for a single DataFrame, not a Series to a DataFrame.
If I were looking up data by value, I'd use .query(), but I don't know how to do that for an index-to-index lookup.
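The label-based counterpart of such a lookup is `.loc[labels, 'product']`. One catch worth checking: `sample(500, replace=True)` can draw the same original row more than once, so the subsample can contain repeated index labels, and `.loc` then returns every row carrying a matching label. A sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical original data with non-default labels.
r2_test = pd.DataFrame({"product": [10.0, 20.0, 30.0]}, index=[100, 200, 300])

# Six draws from three rows with replacement must repeat at least one label.
subsample = r2_test.sample(6, replace=True, random_state=1)

# A repeated label matches several rows, so .loc returns more rows than requested.
dup_label = subsample.index[subsample.index.duplicated()][0]
dup_rows = subsample.loc[[dup_label], "product"]
print(dup_rows)

# Renumbering the sample 0..n-1 makes the labels unique again,
# after which .loc with a list of labels is a one-to-one lookup.
subsample = subsample.reset_index(drop=True)
print(subsample.loc[[0, 3], "product"])
```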
Any input would be greatly appreciated!
Edit to help clarify (hopefully):
So my series top_200 holds predictions for the subsample DataFrame. The index of the series should be the same as the index of the subsample DataFrame. For a particular row, I want to use its index to look up the value in the product column of the subsample DataFrame with the same index number.
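One assumption worth checking here: if `model2.predict()` returns a plain NumPy array (as scikit-learn estimators do), the array carries no index, so `pd.Series(predict)` is numbered 0..n-1 rather than reusing the subsample's labels such as 89641. Passing `index=subsample.index` carries the labels over. The values below are hypothetical.

```python
import numpy as np
import pandas as pd

subsample = pd.DataFrame(
    {"f2": [5.0, 4.0, 2.0], "product": [134.77, 107.81, 57.09]},
    index=[89641, 30665, 34301],
)
predict = np.array([135.0, 108.0, 55.0])  # stand-in for model2.predict(features)

# Without index=, pandas assigns a fresh 0..n-1 index.
print(pd.Series(predict).index.tolist())  # [0, 1, 2]

# With index=, each prediction keeps the label of the row it came from,
# so predictions and product line up when combined.
pred = pd.Series(predict, index=subsample.index)
side_by_side = pd.DataFrame({"prediction": pred, "product": subsample["product"]})
print(side_by_side.sort_values("prediction", ascending=False))
```

If the subsample comes from `sample(..., replace=True)`, calling `reset_index(drop=True)` on it first keeps those labels unique.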
Here is an example output for that series:
303 139.893243
203 138.886222
21 138.561583
296 138.535309
391 138.491757
The rows are 303, 203, 21, 296 and 391. I now want to get the values in the product column of the subsample DataFrame for rows 303, 203, 21, 296 and 391.
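Since top_200 is built from `pd.Series(predict)`, labels like 303 and 21 are row positions within the subsample, so `subsample['product'].iloc[[303, 203, 21, 296, 391]]` should read the matching product values (assuming `predict` is in the same row order as `features`). The sketch below shows that positional lookup next to an alternative that keeps the predictions as a column, which avoids index bookkeeping altogether. The data and label ranges are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical 500-row subsample whose labels came from the original data (unique here).
subsample = pd.DataFrame(
    {"product": rng.uniform(0, 190, 500)},
    index=rng.choice(100_000, size=500, replace=False),
)
predict = rng.uniform(0, 140, 500)  # stand-in for model2.predict(features)

# Option 1: top_200's labels are positions, so .iloc reads product at those rows.
top_200 = pd.Series(predict).sort_values(ascending=False).iloc[:200]
product_by_position = subsample["product"].iloc[top_200.index]

# Option 2: store the predictions as a column and keep the 200 highest-predicted rows.
top_rows = subsample.assign(prediction=predict).nlargest(200, "prediction")
product_by_column = top_rows["product"]
```

Both options return the same 200 product values in the same order; the second keeps each prediction in the same row as its product, so the two cannot drift apart.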