So I have a DataFrame with 180000+ values and I need to (1) replace duplicate and certain values in cells by row and (2) rearrange. Here is my DataFrame, df:
key sellyr brand makrc item1 item2 item3 item4 item5 item6
0 da12 2013 imp apt furi apt nan nan nan nan
1 da32 2013 sa rye rye app nan nan nan nan
2 da14 2013 sa pro not pro pan fan nan nan
........
nan values represent np.nan. And forbidden string is 'not'.
So what I need to do is check columns item1~6 replace strings that are contained in the makrc column with nan. As well, I also want to replace 'not's' with nan's. After replacing strings to np.nan, I need to rearrange the item1~6 to left justify non-nan data to the leftmost empty cell, as shown below, (expected output):
key sellyr brand makrc item1 item2 item3 item4 item5 item6
0 da12 2013 imp apt furi nan nan nan nan nan
1 da32 2013 sa rye app nan nan nan nan nan
2 da14 2013 sa pro pan fan nan nan nan nan
........
So as you can see in a first index, I have removed apt string in item2 and changed to np.nan because same string is in makrc column. In index 1, I have removed rye and replace with np.nan. But this time, I rearranged the 'app' string from item2 to item1 because np.nan values should come after the values. In index 2, I have replaced pro and not since I need to replace every 'not'string in the item columns to np.nan. Also I have rearranged the items.
I've tried combining all item columns as a list and replacing it, but there are a few rows with only np.nan items. Can you guys recommend an ideal process to solve my problem? Thank you so much.