-1

I have this dataframe:

   SRC  Coup  Vint    Bal   Mar   Apr   May   Jun   Jul BondSec
0  JPM   1.5  2021   43.9   5.6   4.9   4.9   5.2   4.4    FNCL
1  JPM   1.5  2020   41.6   6.2   6.0   5.6   5.8   4.8    FNCL
2  JPM   2.0  2021  503.9   7.1   6.3   5.8   6.0   4.9    FNCL
3  JPM   2.0  2020  308.3   9.3   7.8   7.5   7.9   6.6    FNCL
4  JPM   2.5  2021  345.0   8.6   7.8   6.9   6.8   5.6    FNCL
5  JPM   4.5  2010    5.7  21.3  20.0  18.0  17.7  14.6    G2SF
6  JPM   5.0  2019    2.8  39.1  37.6  34.6  30.8  24.2    G2SF
7  JPM   5.0  2018    7.3  39.8  37.1  33.4  30.1  24.2    G2SF
8  JPM   5.0  2010    3.9  23.3  20.0  18.6  17.9  14.6    G2SF
9  JPM   5.0  2009    4.2  22.8  21.2  19.5  18.6  15.4    G2SF

I want to duplicate all the rows that have FNCL as the BondSec, and rename the value of BondSec in those new duplicate rows to FGLMC. I'm able to accomplish half of that with the following code:

if "FGLMC" not in jpm['BondSec']:
    is_FNCL = jpm['BondSec'] == "FNCL"
    FNCL_try = jpm[is_FNCL]
    jpm.append([FNCL_try]*1,ignore_index=True)

But if I instead try to implement the change to the BondSec value in the same line as below:

jpm.append(([FNCL_try]*1).assign(**{'BondSecurity': 'FGLMC'}),ignore_index=True)

I get the following error: AttributeError: 'list' object has no attribute 'assign'

Additionally, I would like to insert the duplicated rows based on an index condition, not just at the bottom as additional rows. The condition cannot be simply a row position because this will have to work on future files with different numbers of rows. So I would like to insert the duplicated rows at the position where the BondSec column values change from FNCL to FNCI (FNCI is not showing here, but basically it would be right below the last row with FNCL). I'm assuming this could be done with an np.where function call, but I'm not sure how to implement that.

I'll also eventually want to do this same exact process with rows with FNCI as the BondSec value (duplicating them and transforming the BondSec value to FGCI, and inserting at the index position right below the last row with FNCI as the value).

Hefe
  • 421
  • 3
  • 23
  • 2
    FWIW `pd.DataFrame.append` is really slow and inefficient. – ddejohn Jun 10 '22 at 18:07
  • 2
    The issue is your parenthesis are incorrect: `([FNCL_try]*1).assign`. Also, what's with the `*1`? – ddejohn Jun 10 '22 at 18:07
  • 1
    If you don't actually care about the order of BondSec groups, I'd recommend using `pd.concat([df, df[df["BondSec"] == "FNCL"].assign(BondSec="FGLMC")])` – ddejohn Jun 10 '22 at 18:07
  • If you *do* care about having your BondSec grouped a certain way, I'd suggest doing all the duplication you need to do first, and then using a [custom ordering](https://stackoverflow.com/questions/13838405/custom-sorting-in-pandas-dataframe). – ddejohn Jun 10 '22 at 18:09
  • Hi @ddejohn, thank you for your help. I wasn't familiar with `pd.concat` formatting but happy to use it instead. I do care about the ordering, but I can check out the custom ordering link you shared. – Hefe Jun 10 '22 at 18:13
  • @ddejohn the `pd.concat` you shared did the trick! Now I need to figure out the ordering. – Hefe Jun 10 '22 at 18:23

1 Answers1

1

I'd suggest a helper function to handle all your duplications:

def duplicate_and_rename(df, target, value):
    return pd.concat([df, df[df["BondSec"] == target].assign(BondSec=value)])

Then

for target, value in (("FNCL", "FGLMC"), ("FNCI", "FGCI")):
    df = duplicate_and_rename(df, target, value)

Then after all that, you can categorize the BondSec column and use a custom order:

ordering = ["FNCL", "FGLMC", "FNCI", "FGCI", "G2SF"]

df["BondSec"] = pd.Categorical(df["BondSec"], ordering).sort_values()
df = df.reset_index(drop=True)

Alternatively, you can use a dictionary for your ordering, as explained in this answer.

ddejohn
  • 8,775
  • 3
  • 17
  • 30
  • Amazing! Thank you, it worked. I was getting an attribute error when `.reset_index()` was attached to the `pd.Categorical()` call, so I just reset the index of the entire df in a new line. – Hefe Jun 10 '22 at 18:53