I asked a related question a few days ago (Element-wise division by rows between dataframe and series), but I did not want to edit it because, although it is related, this is a new task.
Given this data (mydata.csv):
>>> mydata = pd.read_csv("mydata.csv")
>>> mydata
date type id ... 1096 1097 1098
0 2014-06-13 cal 1 ... 17.949524 16.247619 15.465079
1 2014-06-13 cow 32 ... 0.523429 -0.854286 -1.520952
2 2014-06-13 cow 47 ... 7.676000 6.521714 5.892381
3 2014-06-13 cow 107 ... 4.161714 3.048571 2.419048
4 2014-06-13 cow 137 ... 3.781143 2.557143 1.931429
5 2014-06-13 dark 1 ... 168.725714 167.885715 167.600001
6 2014-10-24 cal 1 ... 14.137143 12.584000 11.876191
7 2014-10-24 cal 2 ... -3.980571 -5.753143 -6.504762
8 2014-10-24 cow 32 ... -18.670857 -20.433714 -21.246667
9 2014-10-24 cow 47 ... -20.106286 -21.812571 -22.526667
10 2014-10-24 cow 105 ... -12.625714 -14.267429 -14.946667
11 2014-10-24 cow 107 ... -15.352571 -16.901714 -17.652381
12 2014-10-24 cow 137 ... -20.389143 -22.345143 -23.168571
13 2014-10-24 dark 1 ... 172.965715 172.031429 171.780952
What I want is for each row with type "cow" to be divided by the row with type "cal" that has the same "date". If there is more than one "cal" row for a given date, the one with the higher id should be used.
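The rule for choosing the divisor row can be expressed without a loop. Below is a minimal sketch, assuming "higher id" means the numerically largest value in the "id" column: keep only the "cal" rows, sort them by id, and keep the last row per date. The toy frame is an assumption that reproduces only the date/type/id columns from the printout, not the full CSV.

```python
import pandas as pd

# Toy copy of the identifying columns shown in the printout (numeric columns omitted)
mydata = pd.DataFrame({
    "date": ["2014-06-13"] * 6 + ["2014-10-24"] * 8,
    "type": ["cal", "cow", "cow", "cow", "cow", "dark",
             "cal", "cal", "cow", "cow", "cow", "cow", "cow", "dark"],
    "id": [1, 32, 47, 107, 137, 1, 1, 2, 32, 47, 105, 107, 137, 1],
})

# Keep only "cal" rows, sort by id, and keep the last (highest-id) row per date
cal = (
    mydata[mydata["type"] == "cal"]
    .sort_values("id")
    .drop_duplicates(subset="date", keep="last")
)
print(cal.index.tolist())  # [0, 7]
```

This picks row 0 for 2014-06-13 and row 7 (id 2) for 2014-10-24, matching the rows named in the expected result.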
For the data shown, rows 1-4 should be divided by row 0 and rows 8-12 by row 7 (the column after id, "188", is where the numeric values start). This should be the result:
date type id ... 1096 1097 1098
1 2014-06-13 cow 32 ... 0.029161 -0.052579 -0.098348
2 2014-06-13 cow 47 ... 0.427644 0.401395 0.381012
3 2014-06-13 cow 107 ... 0.231857 0.187632 0.156420
4 2014-06-13 cow 137 ... 0.210654 0.157386 0.124890
8 2014-10-24 cow 32 ... 4.690497 3.551748 3.266325
9 2014-10-24 cow 47 ... 5.051105 3.791418 3.463104
10 2014-10-24 cow 105 ... 3.171835 2.479936 2.297804
11 2014-10-24 cow 107 ... 3.856876 2.937823 2.713763
12 2014-10-24 cow 137 ... 5.122165 3.883989 3.561786
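As a sanity check on the expected output, each value is just the element-wise ratio of a "cow" row to the selected "cal" row of the same date. This sketch reproduces only the three columns visible in the printout (1096-1098) for rows 0/1 and 7/8; the values are copied from the tables above and are rounded to six decimals, so the ratios match the expected result only up to rounding.

```python
import pandas as pd

cols = ["1096", "1097", "1098"]  # only the columns visible in the printout

# Row 0 ("cal", 2014-06-13) and row 1 ("cow" 32, same date)
cal_0 = pd.Series([17.949524, 16.247619, 15.465079], index=cols)
cow_1 = pd.Series([0.523429, -0.854286, -1.520952], index=cols)

# Row 7 ("cal" id 2, 2014-10-24) and row 8 ("cow" 32, same date)
cal_7 = pd.Series([-3.980571, -5.753143, -6.504762], index=cols)
cow_8 = pd.Series([-18.670857, -20.433714, -21.246667], index=cols)

ratio_1 = cow_1 / cal_0  # compare with row 1 of the expected result
ratio_8 = cow_8 / cal_7  # compare with row 8 of the expected result
print(ratio_1.round(6))
print(ratio_8.round(6))
```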
I managed to do it by iterating over the unique dates: for each date I filter out the "cal" row and the "cow" rows, divide the latter by the former, and collect the results in a new DataFrame. Here is the function I used:
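```python
import pandas as pd

def divide_cal_date(mydata):
    # Numeric measurements start at the 4th column (after "date", "type" and "id")
    num_cols = mydata.columns[3:]
    cows = mydata[mydata["type"] == "cow"]
    pieces = []
    for i in mydata["date"].unique():
        # "cal" row with the highest id for this date
        cal_rows = mydata[(mydata["date"] == i) & (mydata["type"] == "cal")]
        curr_cal = cal_rows.sort_values("id").tail(1)[num_cols]
        temp_cows = cows[cows["date"] == i].copy()
        # Divide every cow row column-wise by the single cal row
        temp_cows[num_cols] = temp_cows[num_cols].div(curr_cal.squeeze(axis=0))
        pieces.append(temp_cows)
        print(i, end="\r")
    print("end")
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat(pieces)
```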
As I mentioned in my previous question, I am just getting started with pandas, so I wonder whether there is a cleaner, more efficient, or more straightforward way to do this. The function works for now, but any suggestion is appreciated.
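One possible loop-free approach, sketched here as an alternative and not as the method used above: build a divisor table with one "cal" row per date (the highest id), look up the divisor for every "cow" row by its date with `reindex`, and divide the two equally shaped arrays. It assumes the numeric data is every column after the first three, as stated above. The function name `divide_by_cal` is hypothetical, and the toy frame reproduces only the three visible numeric columns of the printout. A date with no "cal" row would yield NaN for its cows.

```python
import pandas as pd

def divide_by_cal(mydata: pd.DataFrame) -> pd.DataFrame:
    """Divide every "cow" row by the highest-id "cal" row of the same date (sketch)."""
    num_cols = mydata.columns[3:]  # assumption: numeric data starts after date/type/id
    # Divisor table: one "cal" row per date (highest id), indexed by date
    cal = (
        mydata[mydata["type"] == "cal"]
        .sort_values("id")
        .drop_duplicates(subset="date", keep="last")
        .set_index("date")[num_cols]
    )
    cows = mydata[mydata["type"] == "cow"].copy()
    # One divisor row per cow row, in the same order as the cows
    divisors = cal.reindex(cows["date"])
    # Positional division: both arrays have the same shape and row order
    cows[num_cols] = cows[num_cols].to_numpy() / divisors.to_numpy()
    return cows

# Toy data: only the columns visible in the printout (1096-1098)
rows = [
    ("2014-06-13", "cal", 1, 17.949524, 16.247619, 15.465079),
    ("2014-06-13", "cow", 32, 0.523429, -0.854286, -1.520952),
    ("2014-06-13", "cow", 47, 7.676000, 6.521714, 5.892381),
    ("2014-06-13", "cow", 107, 4.161714, 3.048571, 2.419048),
    ("2014-06-13", "cow", 137, 3.781143, 2.557143, 1.931429),
    ("2014-06-13", "dark", 1, 168.725714, 167.885715, 167.600001),
    ("2014-10-24", "cal", 1, 14.137143, 12.584000, 11.876191),
    ("2014-10-24", "cal", 2, -3.980571, -5.753143, -6.504762),
    ("2014-10-24", "cow", 32, -18.670857, -20.433714, -21.246667),
    ("2014-10-24", "cow", 47, -20.106286, -21.812571, -22.526667),
    ("2014-10-24", "cow", 105, -12.625714, -14.267429, -14.946667),
    ("2014-10-24", "cow", 107, -15.352571, -16.901714, -17.652381),
    ("2014-10-24", "cow", 137, -20.389143, -22.345143, -23.168571),
    ("2014-10-24", "dark", 1, 172.965715, 172.031429, 171.780952),
]
mydata = pd.DataFrame(rows, columns=["date", "type", "id", "1096", "1097", "1098"])

result = divide_by_cal(mydata)
print(result.round(6))
```

Dividing the underlying arrays sidesteps index alignment: the reindexed divisor frame is indexed by date, so a label-aligned `div` would not line up with the cow rows' original index.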