
I have the following data frame df with three columns, "identifier", "values" and "subid":

     identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       2 #index in list x        
3      1          104       2
4      1          105       2
5      2          106       3   
6      2          107       3
7      2          108       3
8      2          109       4 #index in list x
9      2          110       4
10     3          111       5
11     3          112       5 
12     3          113       6 #index in list x

I have a list of indices, say

x = [2, 8, 12] 

I want to insert rows just before the indices given in the list x. For example, the row inserted just before index 2 should have the same identifier as the row at index 2 (i.e. 1) and the same values as the row at index 2 (i.e. 103), but its subid should be (subid at index 2) - 1, which here is simply the subid of the previous row, i.e. 1.
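As a small sketch of that rule, assuming the sample data above with its default 0..12 index, the three rows to insert can be computed in one step (this only builds the new rows; it does not insert them):

```python
import pandas as pd

# Sample data from the question, recreated by hand
df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': list(range(101, 114)),
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Copy identifier and values from each row in x, with subid reduced by 1
new_rows = df.loc[x].assign(subid=lambda d: d['subid'] - 1)

# In this data, subid - 1 equals the subid of the row just before each index in x
prev_subids = df.loc[[i - 1 for i in x], 'subid']
```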

Below is the final resultant df I expect:

   identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       1 #new row inserted     
3      1          103       2 #index in list x        
4      1          104       2
5      1          105       2
6      2          106       3   
7      2          107       3
8      2          108       3
9      2          109       3 #new row inserted
10     2          109       4 #index in list x
11     2          110       4
12     3          111       5
13     3          112       5 
14     3          113       5 #new row inserted
15     3          113       6 #index in list x

The code I have been trying:

m = df.index  # the indices of df

for i in m:
    if i in x:  # x is the given list of indices
        df.iloc[i-1]["identifier"] = df.iloc[i]["identifier"]
        df.iloc[i-1]["values"] = df.iloc[i]["values"]
        df.iloc[i-1]["subid"] = df.iloc[i]["subid"] - 1
df

The above code just overwrites the rows at index i-1 instead of inserting additional rows with those values. Please help.
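The difference comes down to how assignment treats existing rows: .loc with a label that already exists overwrites that row, while .loc with a label that does not exist yet appends a new row (pandas calls this setting with enlargement). iloc can only address existing positions, so it can never add a row. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical small frame, just to show the two behaviours
df = pd.DataFrame({'identifier': [1, 1, 1], 'values': [101, 102, 103], 'subid': [1, 1, 2]})

# Label 1 already exists: its row is overwritten and the length stays 3
overwritten = df.copy()
overwritten.loc[1] = [1, 103, 1]

# Label 3 does not exist yet: a new row is appended and the length becomes 4
enlarged = df.copy()
enlarged.loc[3] = [1, 104, 2]
```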

Please let me know if anything is unclear.

Liza
  • possible [duplicate here](https://stackoverflow.com/questions/15888648/is-it-possible-to-insert-a-row-at-an-arbitrary-position-in-a-dataframe-using-pan) – Lucas Hattori Aug 18 '20 at 19:20

3 Answers


Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do this, but it should work.

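import pandas as pd

x = [2, 8, 12]
rows = []

# collect one new row (as a dict) for each index in x
for i in df.index:
    if i in x:
        rows.append({
            'index': i,                              # position to insert before
            'identifier': df.iloc[i]['identifier'],  # same identifier as row i
            'values': df.iloc[i]['values'],          # ['values'], since df.values is an attribute
            'subid': df.iloc[i]['subid'] - 1,        # subid of row i minus 1
        })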

Then, iterate through the new rows list, and perform an incremental concat, inserting each new row into the correct spot.

offset = 0  # number of rows already inserted; shifts each later insert position

for d in rows:
    pos = d['index'] + offset
    df = pd.concat([df.head(pos), pd.DataFrame([d]), df.tail(len(df) - pos)])
    offset += 1

# drop the helper 'index' column and renumber the rows 0..n-1
df = df.drop(columns='index').reset_index(drop=True)
df

    identifier  values  subid
0            1     101      1
1            1     102      1
2            1     103      1
3            1     103      2
4            1     104      2
5            1     105      2
6            2     106      3
7            2     107      3
8            2     108      3
9            2     109      3
10           2     109      4
11           2     110      4
12           3     111      5
13           3     112      5
14           3     113      5
15           3     113      6
bdiamante
  • Thank you so much, but I have a little problem: what if "identifier", "subid" and "values" are not ordered? Sorting wouldn't put the rows where I expect. The above was just dummy data, sorry for keeping it ordered. How else can I proceed once I concat the two dfs? – Liza Jun 22 '17 at 17:52
  • Well that's unfortunate. This definitely won't work if you need exact unordered placement. Give me a min to rework. – bdiamante Jun 22 '17 at 18:07
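One hedged alternative that never sorts, so it does not depend on identifier, subid or values being ordered, is to duplicate the rows in x in place with Index.repeat and then decrement subid on the first copy of each duplicate. This is a sketch assuming df has a unique index and that x holds index labels; the data is the question's sample:

```python
import numpy as np
import pandas as pd

# Sample data from the question (default 0..12 index)
df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': list(range(101, 114)),
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Repeat every row once, except rows whose label is in x, which appear twice
repeats = np.where(df.index.isin(x), 2, 1)
out = df.loc[df.index.repeat(repeats)]

# For a repeated label, every copy except the last is the inserted row
is_new = out.index.duplicated(keep='last')

# Renumber the rows and subtract 1 from subid on the inserted copies only
out = out.reset_index(drop=True).assign(subid=lambda d: d['subid'] - is_new.astype(int))
```

Because the rows keep their original relative order, the result follows the original row order rather than any sorted order.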

Subtract 1 from values wherever the identifier differs from the identifier in the prior row.

# edit in place
df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
df

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113

Or

# new dataframe
df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113
piRSquared
  • The data frame should be altered as per the indices mentioned in the list l; here, whenever a new identifier is encountered, it's subtracting 1 from the value in that row. If index 2 is mentioned, I want to copy that value in the next row, with a new "identifier", and then continue with that identifier until another is encountered. – Liza Jun 18 '17 at 19:19
  • Please have a look at the question again, I have edited it, hope it's more clear about what I expect. – Liza Jun 21 '17 at 00:08
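To see what the mask in this answer marks, here is a small sketch on the identifier and values columns from the question's sample data (recreated by hand):

```python
import pandas as pd

# identifier and values columns as in the question's sample data
identifier = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
values = pd.Series(range(101, 114))

# shift() moves every value down one row, leaving NaN in row 0;
# bfill() fills that NaN from row 1 so the first row never counts as a change
prev = identifier.shift().bfill()

# True exactly where identifier differs from the previous row's identifier
is_boundary = identifier.ne(prev)

# True counts as 1, so this subtracts 1 only at the boundaries
adjusted = values - is_boundary.astype(int)
```

Here the boundaries fall at labels 5 and 10, which is why only those two rows change in the output above.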

One can create a function to do the work.

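def newrow(df, x):
    df = df.copy()  # work on a copy so the caller's df is not modified

    # For each index in x, add a row labelled i - 0.5, which sorts just before row i
    for i in x:
        df.loc[i-0.5] = [df.loc[i, 'identifier'], df.loc[i, 'values'], df.loc[i, 'subid']-1]

    return df.sort_index().reset_index(drop=True)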

Then one will have to apply the function to the dataframe df and the list of indices x as follows

df_new = newrow(df, x)

[Out]:

    identifier  values  subid
0            1     101      1
1            1     102      1
2            1     103      1
3            1     103      2
4            1     104      2
5            1     105      2
6            2     106      3
7            2     107      3
8            2     108      3
9            2     109      3
10           2     109      4
11           2     110      4
12           3     111      5
13           3     112      5
14           3     113      5
15           3     113      6

Notes:

  • If one passed only i to pandas.DataFrame.loc, the assignment would overwrite the existing row labelled i. One way to get around that is to use a fractional label in between: here one uses i - 0.5, but i - 0.1 or i - 0.000001 would also work, as long as the new label is greater than i - 1 and less than i. Because .loc works with index labels rather than integer positions, this just adds a row with a new label (the index temporarily becomes float, which the reset_index in the return undoes), as the pandas documentation explains:

A single label, e.g. 5 or 'a' (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.). [Source]

Then, if one sorts the index and resets the index (what one is doing in the return), one would end up with the desired output.
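The same fractional-label idea can also be applied to all indices at once with a single concat, which avoids growing the DataFrame row by row and leaves the input unchanged. A sketch, where insert_before is a hypothetical helper name and the data is the question's sample:

```python
import pandas as pd

# Sample data from the question (default 0..12 index)
df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': list(range(101, 114)),
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

def insert_before(df, x):
    """Hypothetical helper: insert a copy of each row in x (with subid - 1) just before it."""
    # Build all new rows at once
    new_rows = df.loc[x].assign(subid=lambda d: d['subid'] - 1)
    # Labels 1.5, 7.5, 11.5 sort just before 2, 8, 12
    new_rows.index = new_rows.index - 0.5
    # One concat, then sort by label and renumber 0..n-1
    return pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

df_new = insert_before(df, x)
```

Like newrow, this assumes the original index is numeric and already sorted, since the placement comes from sorting labels.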

Gonçalo Peres