Logical Tests on Two Dataframe Column Values to Determine Assignment of a Value from Two Dictionaries to a Third Column

Question

OK,

I think I finally understand enough stuff to ask a question. Be easy - I'm a ship driver and water hippie playing with AIS datasets, and my last foray into coding was the early 90s with Fortran 77. I've spent the better part of the day trying to solve this problem. Lots of "close but no cigar" answers here and in randomly Googled articles. So, here goes:

I've got a df (list5_df) with two columns (GENERAL_CLASS, INDIVIDUAL_CLASS) that provide a 2 or 3-letter ship classification code. I need to translate these classification codes by means of a dictionary look-up, which would then be assigned to a new/blank column. BEFORE DOING THIS THOUGH (and where my barrier lies) there are some logical tests I need to apply to the general and individual ship classification codes to determine whether the integer code will be assigned from the general code column via a dictionary lookup using the dictionary "gc_dict" or from the individual code column and a dictionary lookup using the "ic_dict." (yes, I snickered too when I realized how most people would pronounce that dictionary name.)

What have I done so far? Lots and lots of this type of stuff:

np.where((list5_df['GENERAL_CLASS'].eq('FV')|list5_df['GENERAL_CLASS'].eq('NS')), 
         list5_df['AIS_SHIP_TYPE'] = list5_df['GENERAL_CLASS'].map(gc_dict))

^^^doesn't work because np.where doesn't allow expressions after the logical test
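
For reference, np.where takes a condition plus two sets of values and returns a new array, so the assignment has to sit outside the call rather than inside it. A minimal sketch, assuming made-up dictionary values and three made-up rows (the real codes and integers are not shown here), and assuming rows that fail the test should be left as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real codes and integer values are assumptions.
gc_dict = {"FV": 30, "NS": 35, "CG": 70}
list5_df = pd.DataFrame({
    "GENERAL_CLASS": ["FV", "CG", "NS"],
    "INDIVIDUAL_CLASS": ["XXX", "BLK", "ZZZ"],
})

# Row-wise test: True where GENERAL_CLASS is 'FV' or 'NS'.
is_fv_or_ns = list5_df["GENERAL_CLASS"].isin(["FV", "NS"])

# np.where(condition, value_if_true, value_if_false) only builds an array;
# the result is assigned to the new column outside the call.
list5_df["AIS_SHIP_TYPE"] = np.where(
    is_fv_or_ns,
    list5_df["GENERAL_CLASS"].map(gc_dict),
    np.nan,  # assumption: rows that fail the test stay empty
)
print(list5_df)
```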

if list5_df['INDIVIDUAL_CLASS'].eq('XXX'):
    list5_df['AIS_SHIP_TYPE'] = list5_df['GENERAL_CLASS'].map(gc_dict)

^^^doesn't work because of the deadly "truth value of a Series is ambiguous"
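
The error comes from `if` wanting a single True/False while .eq() hands back one True/False per row. One common way around it is to skip the `if` and use the per-row result as a mask with .loc. A rough sketch with made-up data (the dictionary values and rows are assumptions):

```python
import pandas as pd

# Hypothetical sample data; values are made up for illustration.
gc_dict = {"FV": 30, "CG": 70}
list5_df = pd.DataFrame({
    "GENERAL_CLASS": ["FV", "CG", "CG"],
    "INDIVIDUAL_CLASS": ["XXX", "BLK", "XXX"],
})
list5_df["AIS_SHIP_TYPE"] = float("nan")  # start with an empty column

# One True/False per row instead of a single truth value.
mask = list5_df["INDIVIDUAL_CLASS"].eq("XXX")

# Assign only on the rows where the mask is True.
list5_df.loc[mask, "AIS_SHIP_TYPE"] = list5_df.loc[mask, "GENERAL_CLASS"].map(gc_dict)
print(list5_df)
```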

if (list5_df.INDIVIDUAL_CLASS.loc != 'ZZZ') & (list5_df.INDIVIDUAL_CLASS.loc != 'XXX'):
    for key, value in gc_dict.items():
        list5_df.loc[list5_df["INDIVIDUAL_CLASS"] == key, ["AIS_SHIP_TYPE"]] = value

^^^can't remember why this doesn't work...I think it was that the logical test was all jacked up - only evaluating the first row or something and missed a substantial portion of the 680K+ entries.
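
A likely explanation: without square brackets, .loc is just the indexer object, so comparing it to a string yields one plain True or False and never looks at any row. The sketch below, using three made-up rows, contrasts that with a per-row test on the column itself:

```python
import pandas as pd

# Hypothetical rows for illustration only.
list5_df = pd.DataFrame({"INDIVIDUAL_CLASS": ["XXX", "BLK", "ZZZ"]})

# Comparing the .loc indexer itself (no brackets) gives a single Python bool.
single_flag = list5_df.INDIVIDUAL_CLASS.loc != "ZZZ"
print(type(single_flag), single_flag)

# Comparing the column gives one True/False per row.
per_row = ~list5_df["INDIVIDUAL_CLASS"].isin(["XXX", "ZZZ"])
print(per_row.tolist())
```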

if list5_df.INDIVIDUAL_CLASS.loc == "XXX":
    for key, value in gc_dict.items():
        list5_df.loc[list5_df['GENERAL_CLASS'] == key, ['AIS_SHIP_TYPE']] = value

if list5_df.INDIVIDUAL_CLASS.loc == "ZZZ":
    for key, value in gc_dict.items():
        list5_df.loc[list5_df['GENERAL_CLASS'] == key, ['AIS_SHIP_TYPE']] = value
        
if (list5_df.INDIVIDUAL_CLASS.loc != "ZZZ") & (list5_df.INDIVIDUAL_CLASS.loc != "XXX"):
    for key, value in ic_dict.items():
        list5_df.loc[list5_df['INDIVIDUAL_CLASS'] == key, ['AIS_SHIP_TYPE']] = value

^^^this was my first attempt that sort of worked, but not completely. And honestly, I'm not quite sure why...probably the same issue with iterating through each column value and performing the test on all 680K fields.
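
Reading the attempt above, the intended rule seems to be: when INDIVIDUAL_CLASS is 'XXX' or 'ZZZ', look up GENERAL_CLASS in gc_dict; otherwise look up INDIVIDUAL_CLASS in ic_dict. Assuming that reading is right, one possible vectorized sketch maps both columns once and then picks per row. The dictionary contents and rows here are invented for illustration:

```python
import pandas as pd

# Hypothetical dictionaries and rows; the real ones are not shown in the question.
gc_dict = {"FV": 30, "CG": 70}
ic_dict = {"BLK": 71, "TRW": 31}
list5_df = pd.DataFrame({
    "GENERAL_CLASS": ["FV", "CG", "FV", "CG"],
    "INDIVIDUAL_CLASS": ["XXX", "BLK", "TRW", "ZZZ"],
})

# Per-row test: True where the individual code is a placeholder.
use_general = list5_df["INDIVIDUAL_CLASS"].isin(["XXX", "ZZZ"])

# Translate each column through its own dictionary (unmatched codes become NaN).
from_general = list5_df["GENERAL_CLASS"].map(gc_dict)
from_individual = list5_df["INDIVIDUAL_CLASS"].map(ic_dict)

# Keep the general lookup where the test is True, else take the individual lookup.
list5_df["AIS_SHIP_TYPE"] = from_general.where(use_general, from_individual)
print(list5_df)
```

Because both lookups run on whole columns, nothing loops over the 680K+ rows in Python. Codes missing from either dictionary end up as NaN, which makes them easy to find afterwards with .isna().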

And the final coding of desperation involved using a for loop with .iterrows() and that crashed hard (...don't judge, I really was desperate.) Part of me is thinking that perhaps I'm trying to do too many things at once and shouldn't be "greedy" (remember, I learned to code in Fortran 77, so all this new-fangled object-oriented hoo-ha is still blowing my mind.)

Ideas on how to get back to an even keel?

Could you add some example input to illustrate the problem more clearly (i.e., a small dataframe with at most ~10 rows and 2 short dicts) and the expected result? — Shaido, Apr 08 '21 at 06:29
Whelp - apparently I don't know enough yet as figuring out how to copy and paste some of the dataframe is giving me formatting fits. I'll try to mess around with this again tonight. Thanks for your interest and stay tuned... — Amilynn Adams, Apr 08 '21 at 12:56
It does not have to be the real data, as long as it correctly shows the problem it's good. The simpler you can make the example, the better. You can see here for reference: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Shaido, Apr 08 '21 at 14:21
