I asked a related question a few days ago (Element-wise division by rows between dataframe and series), but I did not want to edit it because, although related, this is a new task.

Given this data (mydata.csv):

>>> mydata = pd.read_csv("mydata.csv")
>>> mydata
          date  type   id     ...            1096        1097        1098
0   2014-06-13   cal    1     ...       17.949524   16.247619   15.465079
1   2014-06-13   cow   32     ...        0.523429   -0.854286   -1.520952
2   2014-06-13   cow   47     ...        7.676000    6.521714    5.892381
3   2014-06-13   cow  107     ...        4.161714    3.048571    2.419048
4   2014-06-13   cow  137     ...        3.781143    2.557143    1.931429
5   2014-06-13  dark    1     ...      168.725714  167.885715  167.600001
6   2014-10-24   cal    1     ...       14.137143   12.584000   11.876191
7   2014-10-24   cal    2     ...       -3.980571   -5.753143   -6.504762
8   2014-10-24   cow   32     ...      -18.670857  -20.433714  -21.246667
9   2014-10-24   cow   47     ...      -20.106286  -21.812571  -22.526667
10  2014-10-24   cow  105     ...      -12.625714  -14.267429  -14.946667
11  2014-10-24   cow  107     ...      -15.352571  -16.901714  -17.652381
12  2014-10-24   cow  137     ...      -20.389143  -22.345143  -23.168571
13  2014-10-24  dark    1     ...      172.965715  172.031429  171.780952

I want each row with type "cow" to be divided by the row with type "cal" that has the same "date". If there is more than one "cal" row for a given date, the one with the highest id should be used. For the data shown, rows 1-4 should be divided by row 0, and rows 8-12 by row 7. (The numeric values start at the column after "id", which is named "188".) This should be the result:

          date type   id    ...         1096      1097      1098
1   2014-06-13  cow   32    ...     0.029161 -0.052579 -0.098348
2   2014-06-13  cow   47    ...     0.427644  0.401395  0.381012
3   2014-06-13  cow  107    ...     0.231857  0.187632  0.156420
4   2014-06-13  cow  137    ...     0.210654  0.157386  0.124890
8   2014-10-24  cow   32    ...     4.690497  3.551748  3.266325
9   2014-10-24  cow   47    ...     5.051105  3.791418  3.463104
10  2014-10-24  cow  105    ...     3.171835  2.479936  2.297804
11  2014-10-24  cow  107    ...     3.856876  2.937823  2.713763
12  2014-10-24  cow  137    ...     5.122165  3.883989  3.561786
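
To make the rule concrete, here is a minimal sketch on a small made-up frame. The names toy, value_cols, "a" and "b" are placeholders I invented for illustration; they are not columns of mydata.csv. It assumes the divisor for each date is the "cal" row with the highest id, as described above.

```python
import pandas as pd

# Toy data (made up for illustration); "a" and "b" stand in for the numeric columns.
toy = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2", "d2", "d2"],
    "type": ["cal", "cow", "cow", "cal", "cal", "cow"],
    "id":   [1, 32, 47, 2, 1, 32],
    "a":    [2.0, 4.0, 6.0, 5.0, 100.0, 10.0],
    "b":    [4.0, 2.0, 8.0, 10.0, 100.0, 5.0],
})
value_cols = ["a", "b"]

# One "cal" row per date: sort by id and keep the last (highest-id) row of each date.
cal = (
    toy[toy["type"] == "cal"]
    .sort_values("id")
    .drop_duplicates("date", keep="last")
    .set_index("date")[value_cols]
)

cows = toy[toy["type"] == "cow"].copy()

# Look up the chosen "cal" row for each cow row's date, then divide position-wise.
divisors = cal.loc[cows["date"].to_numpy()].to_numpy()
cows[value_cols] = cows[value_cols].to_numpy() / divisors

print(cows)
```

For date "d2" the divisor is the "cal" row with id 2, even though the row with id 1 comes later in the frame.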

I managed to do it by iterating over the unique dates, extracting the "cal" row and the "cow" rows for each date, dividing them, and combining the per-date results into a single DataFrame. Here is the function I used:

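import pandas as pd

def divide_cal_date(mydata):
    # Numeric columns start at position 3 (after date, type, id).
    cows = mydata[mydata["type"] == "cow"]
    # Sort by id so that tail(1) really picks the highest-id "cal" row.
    cals = mydata[mydata["type"] == "cal"].sort_values("id")
    divided = []
    for date in mydata["date"].unique():
        # Highest-id "cal" row for this date, as a Series of numeric values.
        curr_cal = cals[cals["date"] == date].tail(1).iloc[:, 3:].squeeze(axis=0)
        temp_cows = cows[cows["date"] == date].copy()
        # Use iloc (not loc) for positional slicing; divide each row by curr_cal.
        temp_cows.iloc[:, 3:] = temp_cows.iloc[:, 3:].div(curr_cal, axis=1)
        divided.append(temp_cows)
        print(date, end="\r")
    print("end")
    # DataFrame.append was removed in pandas 2.0; concatenate once at the end.
    return pd.concat(divided)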

As I said in my previous question, I am just starting out with pandas, so I wonder whether there is a cleaner, more efficient, more straightforward way to do this. It works for now, but any suggestion is appreciated.

gustavovelascoh
    It looks good enough for me, but be careful, the `tail(1)` implies that your ids are sorted. :) – IMCoins Feb 06 '19 at 16:32
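
    The point about `tail(1)` can be illustrated with a small sketch on made-up data (the frame df and the column "x" are hypothetical, not taken from mydata.csv). It compares picking the last "cal" row by position with picking the highest id per date via groupby and idxmax, which does not depend on row order.

```python
import pandas as pd

# Made-up frame in which the "cal" rows are NOT sorted by id.
df = pd.DataFrame({
    "date": ["d1", "d1", "d1"],
    "type": ["cal", "cal", "cow"],
    "id":   [2, 1, 32],
    "x":    [5.0, 100.0, 10.0],
})

cal = df[df["type"] == "cal"]

# Last "cal" row in file order; here that is the row with id 1.
by_position = cal.tail(1)

# Row label of the highest id within each date, then select those rows.
by_id = cal.loc[cal.groupby("date")["id"].idxmax()]

print(by_position["id"].tolist(), by_id["id"].tolist())
```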
