OK,
I think I finally understand enough stuff to ask a question. Go easy: I'm a ship driver and water hippie playing with AIS datasets, and my last foray into coding was in the early 90s with Fortran 77. I've spent the better part of the day trying to solve this problem. Lots of "close but no cigar" answers here and in randomly Googled articles. So, here goes:
I've got a df (list5_df) with two columns (GENERAL_CLASS, INDIVIDUAL_CLASS) that each hold a 2- or 3-letter ship classification code. I need to translate these classification codes into integer codes by means of a dictionary lookup and assign the result to a new/blank column (AIS_SHIP_TYPE). BEFORE DOING THIS THOUGH (and where my barrier lies), there are some logical tests I need to apply to the general and individual ship classification codes to determine whether the integer code will be assigned from the general code column via a lookup in the dictionary "gc_dict" or from the individual code column via a lookup in the dictionary "ic_dict". (Yes, I snickered too when I realized how most people would pronounce that dictionary name.)
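To make the goal concrete, here is a minimal sketch of the setup and the intended result on toy data. Everything in it is an assumption for illustration: the codes other than 'FV', 'NS', 'XXX' and 'ZZZ', all of the integer values, and the rule itself (fall back to the general code whenever the individual code is 'XXX' or 'ZZZ'), which is only inferred from the attempts in this post. It uses `Series.map` for the two lookups and `Series.where` to pick between them row by row.

```python
import pandas as pd

# Toy stand-ins for the real dictionaries; all values are made up.
gc_dict = {"FV": 30, "NS": 35, "CG": 70}
ic_dict = {"TUG": 52, "PAX": 60}

list5_df = pd.DataFrame({
    "GENERAL_CLASS":    ["FV", "NS", "CG", "CG"],
    "INDIVIDUAL_CLASS": ["XXX", "ZZZ", "TUG", "PAX"],
})

# Assumed rule: 'XXX' / 'ZZZ' mean "no usable individual code".
use_general = list5_df["INDIVIDUAL_CLASS"].isin(["XXX", "ZZZ"])

# Do both lookups for every row; unmatched codes become NaN.
from_general = list5_df["GENERAL_CLASS"].map(gc_dict)
from_individual = list5_df["INDIVIDUAL_CLASS"].map(ic_dict)

# Keep the individual lookup where it applies, otherwise take the general one.
list5_df["AIS_SHIP_TYPE"] = from_individual.where(~use_general, from_general)

print(list5_df)
```

Because the unmatched lookups produce NaN, the new column comes out as floats; codes missing from both dictionaries would stay NaN rather than raise an error.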
What have I done so far? Lots and lots of this type of stuff:
np.where((list5_df['GENERAL_CLASS'].eq('FV')|list5_df['GENERAL_CLASS'].eq('NS')),
list5_df['AIS_SHIP_TYPE'] = list5_df['GENERAL_CLASS'].map(gc_dict))
^^^doesn't work because np.where doesn't allow an assignment statement after the logical test
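For comparison, `np.where` takes three arguments, `np.where(condition, value_if_true, value_if_false)`, and returns a new array, so the assignment has to sit outside the call. A minimal sketch, assuming toy data, made-up integer values, and NaN as the "else" value (the attempt above does not say what the other rows should get):

```python
import numpy as np
import pandas as pd

gc_dict = {"FV": 30, "NS": 35}  # made-up values
df = pd.DataFrame({"GENERAL_CLASS": ["FV", "NS", "TK"]})

# One True/False per row; isin replaces the chained .eq(...) | .eq(...).
is_fv_or_ns = df["GENERAL_CLASS"].isin(["FV", "NS"])

# The assignment is on the left; np.where only chooses between values.
df["AIS_SHIP_TYPE"] = np.where(
    is_fv_or_ns,
    df["GENERAL_CLASS"].map(gc_dict),  # used where the condition is True
    np.nan,                            # assumed placeholder for all other rows
)

print(df)
```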
if list5_df['INDIVIDUAL_CLASS'].eq('XXX'):
    list5_df['AIS_SHIP_TYPE'] = list5_df['GENERAL_CLASS'].map(gc_dict)
^^^doesn't work because of the deadly "truth value of a Series is ambiguous"
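The error arises because `.eq('XXX')` returns one True/False per row, while a plain `if` needs a single True/False. A minimal sketch of the difference, on toy data with made-up integer values, using the Boolean Series as a row mask with `.loc` instead of feeding it to `if`:

```python
import pandas as pd

gc_dict = {"FV": 30, "CG": 70}  # made-up values
df = pd.DataFrame({
    "GENERAL_CLASS":    ["FV", "CG", "FV"],
    "INDIVIDUAL_CLASS": ["XXX", "TUG", "XXX"],
})

mask = df["INDIVIDUAL_CLASS"].eq("XXX")  # Boolean Series, one value per row

# Using the whole Series in an `if` raises the "ambiguous" ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Using it as a row selector applies the lookup only where mask is True;
# the other rows of the new column are left as NaN.
df.loc[mask, "AIS_SHIP_TYPE"] = df.loc[mask, "GENERAL_CLASS"].map(gc_dict)

print(df)
```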
if (list5_df.INDIVIDUAL_CLASS.loc != 'ZZZ') & (list5_df.INDIVIDUAL_CLASS.loc != 'XXX'):
    for key, value in gc_dict.items():
        list5_df.loc[list5_df["INDIVIDUAL_CLASS"] == key, ["AIS_SHIP_TYPE"]] = value
^^^can't remember why this doesn't work...I think it was that the logical test was all jacked up - only evaluating the first row or something and missed a substantial portion of the 680K+ entries.
if list5_df.INDIVIDUAL_CLASS.loc == "XXX":
    for key, value in gc_dict.items():
        list5_df.loc[list5_df['GENERAL_CLASS'] == key, ['AIS_SHIP_TYPE']] = value
if list5_df.INDIVIDUAL_CLASS.loc == "ZZZ":
    for key, value in gc_dict.items():
        list5_df.loc[list5_df['GENERAL_CLASS'] == key, ['AIS_SHIP_TYPE']] = value
if (list5_df.INDIVIDUAL_CLASS.loc != "ZZZ") & (list5_df.INDIVIDUAL_CLASS.loc != "XXX"):
    for key, value in ic_dict.items():
        list5_df.loc[list5_df['INDIVIDUAL_CLASS'] == key, ['AIS_SHIP_TYPE']] = value
^^^this was my first attempt that sort of worked, but not completely. And honestly, I'm not quite sure why...probably the same issue with iterating through each column value and performing the test on all 680K rows.
And the final act of coding desperation involved using a for loop with .iterrows(), and that crashed hard (...don't judge, I really was desperate). Part of me is thinking that perhaps I'm trying to do too many things at once and shouldn't be "greedy" (remember, I learned to code in Fortran 77, so all this new-fangled object-oriented hoo-ha is still blowing my mind).
Ideas on how to get back to an even keel?