5

While iterating through the rows of a specific column in a Pandas DataFrame, I would like to add a new row below the currently iterated row, if the cell in the currently iterated row meets a certain condition.

Say for example:

import pandas as pd

df = pd.DataFrame(data = {'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

DataFrame:

      A     B
0  0.15  1500
1  0.15  1500
2  0.70  7000

Attempt:

y = 100                             #An example scalar

i = 1

for x in df['A']:
    if x is not None:               #Values in 'A' are all filled at the moment, but won't necessarily be.
        df.loc[i] = [None, x*y]     #Should insert None into 'A', and product into 'B'.
        df.index = df.index + 1     #Shift index? According to this S/O answer: https://stackoverflow.com/a/24284680/4909923
    i = i + 1

df.sort_index(inplace=True)         #Sort index?

I haven't been able to get this working so far; I end up with a shifted index numbering that doesn't start at 0, and the rows don't seem to be inserted in an orderly way:

      A     B
3  0.15  1500
4   NaN    70
5  0.70  7000

I tried several variants of this, including applymap with a lambda function, but couldn't get it working.

Desired result:

      A     B
0  0.15  1500
1  None  15
2  0.15  1500
3  None  15
4  0.70  7000
5  None  70
P A N
  • As it stands, I see no use case for pandas. You're iterating (mostly a no no) and you want to insert rows (also not really a good thing in numpy, which is the underlying structure) – roganjosh Nov 10 '18 at 10:39
  • @roganjosh I am already using Pandas, this is just a subset of a DataFrame script doing other things also. I need to be able to insert rows as the program will create a DataFrame depending on other factors, therefore not so desirable to preallocate the index (also while inserting rows is inefficient, it doesn't matter as I'm dealing with less than tens of rows, not thousands). – P A N Nov 10 '18 at 10:43
  • So my point still stands. Drop pandas and deal with nested lists. Pandas is just getting in the way here. – roganjosh Nov 10 '18 at 10:44
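For reference, the nested-list approach suggested in the comments might look something like this (a sketch, using the question's sample data and scalar; build the rows in plain Python and construct the DataFrame once at the end):

```python
import pandas as pd

y = 100  # example scalar from the question

rows = [[0.15, 1500], [0.15, 1500], [0.7, 7000]]

out = []
for a, b in rows:
    out.append([a, b])
    if a is not None:             # condition from the question
        out.append([None, a * y]) # derived row inserted directly below

df = pd.DataFrame(out, columns=['A', 'B'])
print(df)
```

Inserting into a growing list is cheap, and the index comes out 0..5 for free.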

2 Answers

3

I believe you can use:

import pandas as pd

df = pd.DataFrame(data = {'A': [0.15, 0.15, 0.7], 
                          'B': [1500, 1500, 7000],
                          'C': [100, 200, 400]})

v = 100
L = []
for i, x in df.to_dict('index').items():
    print (x)
    #append the original row as a dictionary
    L.append(x)
    #append new dictionary; for the missing keys ('B', 'C') the DataFrame constructor adds NaNs
    L.append({'A':x['A'] * v})

df = pd.DataFrame(L)
print (df)
       A       B      C
0   0.15  1500.0  100.0
1  15.00     NaN    NaN
2   0.15  1500.0  200.0
3  15.00     NaN    NaN
4   0.70  7000.0  400.0
5  70.00     NaN    NaN
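The same interleaving pattern can also be written as a single list comprehension (a sketch; same data and output as above, just without the loop body):

```python
import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7],
                        'B': [1500, 1500, 7000],
                        'C': [100, 200, 400]})
v = 100

# interleave each original row-dict with its derived row-dict;
# missing keys ('B', 'C') become NaN in the constructor
L = [d for row in df.to_dict('index').values()
       for d in (row, {'A': row['A'] * v})]

df2 = pd.DataFrame(L)
print(df2)
```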
jezrael
1

It doesn't seem you need a manual loop here:

import numpy as np
import pandas as pd

df = pd.DataFrame(data = {'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

y = 100

# copy slice of dataframe
df_extra = df.loc[df['A'].notnull()].copy()

# assign A and B series values
df_extra = df_extra.assign(A=np.nan, B=(df_extra['A']*y).astype(int))

# increment index partially, required for sorting afterwards
df_extra.index += 0.5

# append, sort index, reset index
res = df.append(df_extra).sort_index().reset_index(drop=True)

print(res)

      A     B
0  0.15  1500
1   NaN    15
2  0.15  1500
3   NaN    15
4  0.70  7000
5   NaN    70
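Note that DataFrame.append was removed in pandas 2.0; on current pandas the same append-and-sort pattern can be written with pd.concat (a sketch of the identical steps):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100

# copy the rows that meet the condition
df_extra = df.loc[df['A'].notnull()].copy()

# derived rows: NaN in 'A', product in 'B'
df_extra = df_extra.assign(A=np.nan, B=(df_extra['A'] * y).astype(int))

# fractional offset interleaves the new rows between the originals on sort
df_extra.index += 0.5

# pd.concat replaces the removed DataFrame.append
res = pd.concat([df, df_extra]).sort_index().reset_index(drop=True)
print(res)
```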
jpp