Pandas - Column Addition for duplicate or list values in a column in Dataframe

Question

Lets say a dataframe looks like this

  one  two 
a  1.0  1.0 
b  2.0  2.0 
c  3.0  3.0 
d  NaN  4.0

Adding new three column is like this

df["three"] = df["one"] * df["two"]

Result

   one  two     three 
 a  1.0  1.0    1.0 
 b  2.0  2.0    4.0 
 c  3.0  3.0    9.0   
 d  NaN  4.0    NaN

How about colum values that contains duplicate lists or list, and I need to create a new column and add the number with highest value

Example

    one  two 
 a  1.0  [12,1]
         [12,1]
 b  2.0  2.0    
 c  3.0  3.0    
 d  NaN  4.0

So I want like this

    one  two        flag
 a  1.0  [12,1]      12
         [12,1]
 b  2.0  [200,400]   400
 c  3.0  3.0         3.0
 d  NaN  4.0         4.0

Thanks

Can you add for second DataFrame `df.to_dict('list')` to question ? — jezrael, May 31 '22 at 10:02

jezrael · Accepted Answer · 2022-06-01T05:43:04.353

2

If there is list or nested lists or floats you can flatten lists with max:

df = pd.DataFrame({"two":  [[[12,1],[12,1]] ,[200,400] ,3.0,4.0 ]})
    
from typing import Iterable 
              
#https://stackoverflow.com/a/40857703/2901002
def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x
            
df['new'] = [max(flatten(x)) if isinstance(x, list) else x for x in df['two']]
print (df)
                  two    new
0  [[12, 1], [12, 1]]   12.0
1          [200, 400]  400.0
2                 3.0    3.0
3                 4.0    4.0

EDIT: For max in new DataFrame for all columns use aggregate function max:

df = df_orig.pivot_table(index=['keyword_name','volume'], 
                    columns='asin', 
                    values='rank', 
                    aggfunc=list)

df1 = df_orig.pivot_table(index=['keyword_name','volume'], 
                     columns='asin', 
                     values='rank', 
                     aggfunc='max')

out = pd.concat([df, df1.add_suffix('_max')], axis=1)

edited Jun 01 '22 at 05:43

answered May 31 '22 at 10:10

jezrael

822,522
95
1,334
1,252

Instead of column `two` (df['two']) , I have a dynamic columns of asins as per this question https://stackoverflow.com/questions/72435171/transforming-list-of-dicts-to-pandas-dataframe-with-dynamic-header-columns/72435254?noredirect=1#comment127964770_72435254, so its asin might contain duplicated or repeated lists – ira Jun 01 '22 at 01:16
i recreated it https://onlinegdb.com/xzM_j8e7T – ira Jun 01 '22 at 01:25
I also tried ` for col in df.columns: df["new"] = [max(flatten(x)) if isinstance(x, list) else x for x in col ]` – ira Jun 01 '22 at 02:48
@ira - answer was edited. – jezrael Jun 01 '22 at 05:31
I need to retain the original values in each columns and create a new colum to store the max value for each rows – ira Jun 01 '22 at 05:39
@ira - answer was edited. – jezrael Jun 01 '22 at 05:44
1

I tried the updated answer, but it create new columns for each column.What I want instead is to get max value of all columns and put it in new colum. This one works `[max(flatten(x)) if isinstance(x, list) else x for x in df['column1']]`. But it will only get the max of the specific column. Lets say it has columns ['column1','column2','column3']. I just dont know how to get the max for all columns like `[max(flatten(x)) if isinstance(x, list) else x for x in df[['column1','column2','column3']]]` <-(this will not work) – ira Jun 01 '22 at 10:04
@ira - so need `df1 = df[['column1','column2','column3']].applymap(lambda x: max(flatten(x)) if isinstance(x, list) else x)` ? – jezrael Jun 01 '22 at 10:06
After some digging into pandas docu, I solved it by adding ` margins=True, margins_name="max_value"` in pivot_table, but I am wondering some of rows not calculated properly, so the max_values not all accurate... – ira Jun 02 '22 at 03:27

Pandas - Column Addition for duplicate or list values in a column in Dataframe

1 Answers1