0

I need to count values from multiple columns and assign it to new column. Note: I'am sure value types are same for the columns. I couldn't come up with a solution. I know I can compare two series but how to compare 7 columns and get the most frequent value?

Column1  Column2 Column3 Result
   x        x       z       x
   y        y       x       y
   x        z       y     neutral
   z        x       z       z
   x        z       z       z

Sorry if I wasn't so explicit while explaining...

I tried:

df_partial.mode(axis='columns')

But is there any other way to do that?

quilliam
  • 182
  • 1
  • 9
  • [This question](https://stackoverflow.com/questions/15222754/groupby-pandas-dataframe-and-select-most-common-value) might be helpful – Teshan Shanuka J Aug 31 '20 at 11:05

1 Answers1

0

The following should work:

from collections import Counter
df['Result']=''
for i in range(len(df)):
    l=list(df.iloc[i, :])
    l=[x for x in l if str(x) != 'nan']
    d=dict(Counter(l)) 
    m=sorted(list(d.values()))
    if m[-1]==m[-2]:
        df.Result.iloc[i]='neutral'
    else:
        df.Result.iloc[i]=max(d, key=d.get)
IoaTzimas
  • 10,538
  • 2
  • 13
  • 30
  • 1
    In case number of values occur in same same amount? I guess it assigns the first one. But I want to assign as **neutral**. It works but doesn't provide enough functionality – quilliam Aug 31 '20 at 12:36
  • sorry, my mistake, i made some changes so that this scenario will be handled properly. However i only added the case of absolutely equa frequencies. What about cases where values will have frequency of eg 331 (some equal but not all of them)? What would you like to be resulted in this case? For example the following scenarios for the 7 columns: x y x y x y z – IoaTzimas Aug 31 '20 at 12:49
  • Yeah, this is another case actually. Thanks. This should be also considered as **neural** – quilliam Aug 31 '20 at 12:54
  • Ok, i made the changes. Now it will result to 'neutral' in case that the 2 higher frequencies are equal (i guess that you don't mind for equalities in lower frequencies, eg in case of 511 you want the value of 5 despite the equality in lower frequencies) – IoaTzimas Aug 31 '20 at 13:04
  • That's almost perfect. I forgot to handle NaN values. In case NaN values has max number of occurrence, I need to consider 2nd highest. – quilliam Aug 31 '20 at 13:11
  • Ok, i removed NaNs from calculations. It should work now – IoaTzimas Aug 31 '20 at 13:26
  • Thanks for the acceptance. Maybe an upvote of the answer too? Regards – IoaTzimas Aug 31 '20 at 13:30
  • I am not able to right now :( I need more then 15 as you may know – quilliam Aug 31 '20 at 13:41
  • ah ok, sorry i forgot that – IoaTzimas Aug 31 '20 at 13:42
  • 1
    I also had to check empty string since we initialized new column in the first place l=[x for x in l if str(x) != ' '] – quilliam Aug 31 '20 at 14:27