Pandas: np.where overwriting values

Question

I currently iterate through the rows of an Excel file multiple times and write "XYZ" to a new column when a row meets certain conditions.

My current code is:

 df["new_column"] = np.where(fn == True, "XYZ", "")

The issue I face is that when the fn == True condition is not satisfied, I want to do absolutely nothing and move on to checking the next row of the Excel file. I noticed that each time I iterate, the empty string replaces the "XYZ"s that are already marked in the file. Is there a way to prevent this from happening? Is there something I can use instead of the empty string ("") to prevent overwriting?

Edit:

My dataframe is a huge financial Excel file with multiple columns and rows. This data set has columns like quantity, revenue, sales, etc. Basically, I have a list that contains about 50 conditionals. For each condition, I iterate through all the rows in the Excel file, and for each row that matches the condition, I want to put an "XYZ" in df["new_column"] to flag that row. df["new_column"] is a column added to the original dataframe. Then I move on to the next condition, up to the 50th.

I think the problem is that the way I wrote the code replaces the existing "XYZ" with an empty string when I go on to check the other conditionals in the list. Basically, I want to find a way to lock "XYZ" in so it can't be overwritten.

fn is a helper function that returns a boolean depending on whether the condition matches a row in the dataframe. While I iterate, if the condition matches a row, this function returns True and marks df["new_column"] with "XYZ". The helper function takes multiple arguments to check whether the current condition matches any of the rows in the dataframe. I hope this explanation helps!

I know the name suggests otherwise, but does `df["new_column"]` already exist in the dataframe? — cs95, Aug 31 '17 at 21:56
It overwrites the previous "XYZ"s that were already written in the excel file. — Brian Kim, Aug 31 '17 at 22:21

score 1 · Answer 1 · answered Sep 01 '17 at 01:05

You can try using a lambda.

First, create the function:

def checkIfTrue(FN, new):
    # Return "XYZ" when the condition holds; otherwise keep the existing value.
    if FN:
        return "XYZ"
    return new

Then apply it to the new column like this:

df['new_column'] = df.apply(lambda row: checkIfTrue(row["fn"], row["new_column"]), axis=1)
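
A vectorized variant of the same idea, as a minimal sketch: it assumes each condition can be written as a boolean Series with one value per row, and that the flag column is created once before the loop. The sample data, column names, and the `conditions` list are hypothetical. The only change from the np.where call in the question is the fallback argument, which is the column's current contents rather than "".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"quantity": [5, 20, 7, 40], "revenue": [100, 50, 900, 30]})
df["new_column"] = ""  # create the flag column once, before any condition runs

# Hypothetical conditions, each a boolean Series aligned with df's rows.
conditions = [df["quantity"] > 10, df["revenue"] > 500]

for mask in conditions:
    # Fall back to the current column values instead of "" so earlier flags survive.
    df["new_column"] = np.where(mask, "XYZ", df["new_column"])

print(df["new_column"].tolist())
```

Rows flagged by an earlier condition keep their "XYZ" because a non-matching row is assigned its own current value.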

score 0 · Answer 2 · answered Aug 31 '17 at 23:12


IIUC you want to use .loc[]:

df.loc[fn, "new_column"] = 'XYZ'
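
A minimal sketch of how this assignment might be used in a loop over several conditions, assuming the mask passed to .loc is a boolean Series with one entry per row. The "cannot use a single bool to index into setitem" error reported in the comments is consistent with fn being a single True/False instead. The `matches` helper, the sample data, and the thresholds are hypothetical stand-ins, not the asker's actual fn.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"quantity": [5, 20, 7, 40], "sales": [1, 2, 3, 4]})
df["new_column"] = ""  # create the flag column once, before the loop


def matches(frame, column, threshold):
    # Hypothetical stand-in for fn: returns one bool per row, not a single bool.
    return frame[column] > threshold


# The second condition matches no rows, so it must leave earlier flags alone.
for column, threshold in [("quantity", 10), ("sales", 25)]:
    mask = matches(df, column, threshold)
    df.loc[mask, "new_column"] = "XYZ"  # rows where mask is False are untouched

print(df["new_column"].tolist())
```

Because .loc only assigns to the rows selected by the mask, a condition that matches nothing changes nothing.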

MaxU - stand with Ukraine


This gives me "cannot use a single bool to index into setitem" – Brian Kim Aug 31 '17 at 23:22
@BrianKim, could you provide small sample data sets (`df` and `fn`) and your desired data set? – MaxU - stand with Ukraine Aug 31 '17 at 23:24
I edited my previous response. I hope it's more helpful now! – Brian Kim Aug 31 '17 at 23:38
@BrianKim, no it didn't make it much clearer. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. – MaxU - stand with Ukraine Aug 31 '17 at 23:40
