How to add a new column to a pandas DataFrame while iterating over the rows?

Question

I want to generate a new column from some columns that already exist, but I think it is too difficult to do with an apply function. Can I generate a new column (ftp_price here) while iterating through this dataframe? Here is my code. When I access product_df['ftp_price'] afterwards, I get a KeyError.

for index, row in product_df.iterrows():
    current_curve_type_df = curve_df[curve_df['curve_surrogate_key'] == row['curve_surrogate_key_x']]
    min_tmp_df = row['start_date'] - current_curve_type_df['datab_map'].apply(parse)
    min_tmp_df = min_tmp_df[min_tmp_df > timedelta(days=0)]
    curve = current_curve_type_df.loc[min_tmp_df.idxmin()]
    tmp_diff = row['end_time'] - np.array(row['start_time'])

    if np.isin(0, tmp_diff):
        idx = np.where(tmp_diff == 0)
        col_name = COL_NAMES[idx[0][0]]
        row['ftp_price'] = curve[col_name]
    else:
        idx = np.argmin(tmp_diff > 0)
        p_plus_one_rate = curve[COL_NAMES[idx]]
        p_minus_one_rate = curve[COL_NAMES[idx - 1]]
        d_plus_one_days = row['start_date'] + rate_mapping_dict[COL_NAMES[idx]]
        d_minus_one_days = row['start_date'] + rate_mapping_dict[COL_NAMES[idx - 1]]
        row['ftp_price'] = p_minus_one_rate + (p_plus_one_rate - p_minus_one_rate) * (row['start_date'] - d_minus_one_days) / (d_plus_one_days - d_minus_one_days)

Add the values to a list and then create the `ftp_price` column from that list once the loop is done? Might be less of a headache imo. - Sid, Jun 11 '19 at 10:11
@Sid I wonder whether a list takes up more memory than making the changes directly on the dataframe. Using a list is easy, though. - Shengxin Huang, Jun 11 '19 at 10:15

score 3 · Answer 1 · answered Jun 11 '19 at 10:24

An alternative way to set a new value at a particular index is to use at:

for index, row in product_df.iterrows():
    # val stands for the price computed for this row
    product_df.at[index, 'ftp_price'] = val

Also, you should read up on why using iterrows should be avoided.
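
To make the at pattern concrete, here is a minimal sketch on a toy dataframe. The columns rate and spread and the simple sum are made-up stand-ins for the curve lookup in the question, not the asker's actual logic.

```python
import pandas as pd

# Toy stand-in for product_df; column names and numbers are hypothetical.
product_df = pd.DataFrame({
    "rate": [2.0, 3.0, 5.0],
    "spread": [0.5, 0.25, 0.125],
})

for index, row in product_df.iterrows():
    # Placeholder computation standing in for the interpolation in the question.
    val = row["rate"] + row["spread"]
    # Write into the dataframe itself, not into `row`;
    # the column is created on the first assignment.
    product_df.at[index, "ftp_price"] = val

print(product_df)
```

After the loop, product_df['ftp_price'] exists and no longer raises a KeyError.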

Mohit Motwani

score 1 · Accepted Answer · answered Jun 11 '19 at 10:15

A row can be a view or a copy (and is often a copy), so changing it would not change the original dataframe. The correct way is to always change the original dataframe using loc or iloc:

product_df.loc[index, 'ftp_price'] = ...
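
A small sketch of the difference, using a made-up one-column dataframe (the names df, a and b are illustrative only): assigning to row changes only the temporary Series that iterrows hands out, while assigning through loc changes the dataframe.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Writing to `row` only enlarges the temporary Series for that iteration.
for index, row in df.iterrows():
    row["b"] = row["a"] * 10
column_created_via_row = "b" in df.columns
print(column_created_via_row)

# Writing through loc targets the original dataframe and creates the column.
for index, row in df.iterrows():
    df.loc[index, "b"] = row["a"] * 10
column_created_via_loc = "b" in df.columns
print(column_created_via_loc)
```

This is why the code in the question ends with a KeyError: the ftp_price values were only ever stored on the per-row Series objects.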

That being said, you should try to avoid explicitly iterating over the rows of a dataframe when possible...
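
As a rough illustration of two ways to reduce per-row writes, the sketch below first collects the values in a list and assigns the column once (as Sid suggests in the comments), then computes the same column with whole-column arithmetic. The toy columns rate and spread and the sum are assumptions for the example; whether the interpolation in the question can be fully vectorized depends on details not shown in the thread.

```python
import pandas as pd

# Toy stand-in for product_df; column names and numbers are hypothetical.
product_df = pd.DataFrame({
    "rate": [2.0, 3.0, 5.0],
    "spread": [0.5, 0.25, 0.125],
})

# Option 1: still a loop, but the dataframe is written to only once.
prices = []
for row in product_df.itertuples(index=False):
    prices.append(row.rate + row.spread)
product_df["ftp_price_loop"] = prices

# Option 2: no explicit loop when the rule can be expressed on whole columns.
product_df["ftp_price_vec"] = product_df["rate"] + product_df["spread"]

print(product_df)
```

The list holds one value per row, so its memory cost is on the order of one extra column, which is usually small compared with the dataframe itself.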
