
I have the following dataframe:

p     s
ABCD  AB,AC,AD
XY    XY   
MSD   MS,MD
PQRS  PQ,PR,PS

I'm using the following syntax to split column s into columns s0, s1, s2, ...:

df = df.join(df['s'].str.split(',', expand=True).add_prefix('s').fillna(np.nan))

which results in:

p     s         s0    s1    s2
ABCD  AB,AC,AD  AB    AC    AD 
XY    XY        XY    NaN   NaN
MSD   MS,MD     MS    MD    NaN
PQRS  PQ,PR,PS  PQ    PR    PS
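The number of generated columns is set by the row with the most comma-separated fields, and shorter rows are padded with missing values. A minimal check on the sample data above (the names `split`, `new_names` and `missing_per_col` are illustrative, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({'p': ['ABCD', 'XY', 'MSD', 'PQRS'],
                   's': ['AB,AC,AD', 'XY', 'MS,MD', 'PQ,PR,PS']})

# expand=True creates one column per field of the widest row;
# rows with fewer fields are padded with missing values.
split = df['s'].str.split(',', expand=True).add_prefix('s')

new_names = split.columns.tolist()             # ['s0', 's1', 's2']
missing_per_col = split.isna().sum().tolist()  # [0, 1, 2]
print(new_names, missing_per_col)
```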

Now I want to pass these newly generated column values into a function, along with some other column values. For example:

def compare(p, s0, s1, s2):
    # piece of code
    ...

Suppose the number of generated columns varies from dataset to dataset, depending on how many comma-separated fields column s contains: one time it might be 13 (s0, s1, ..., s12) and another time 15 (s0, s1, ..., s14). Is there a way to pass these column values to the function dynamically, based on the number of columns created?

Something like the following: def compare(p, [list comprehension])

Any suggestions?

Avinash Clinton

1 Answer


You could use the Index.difference method to generate a list of the new columns:

new_columns = df.columns.difference(old_columns).tolist()

For example,

import numpy as np
import pandas as pd

def compare(p, new_columns):
    print(new_columns)

df = pd.DataFrame({'p': ['ABCD', 'XY', 'MSD', 'PQRS'],
                   's': ['AB,AC,AD', 'XY', 'MS,MD', 'PQ,PR,PS']})

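old_columns = df.columns  # remember the columns that existed before the split
df = df.join(df['s'].str.split(',', expand=True).add_prefix('s').fillna(np.nan))
# anything in df.columns now that was not in old_columns was created by the split
new_columns = df.columns.difference(old_columns).tolist()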

compare(df['p'], new_columns)

prints

['s0', 's1', 's2']
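One caveat worth checking once a dataset produces ten or more columns: Index.difference sorts its result by default, and string labels sort lexicographically, so s10 lands before s2. Passing sort=False (accepted by pandas 0.24 and later, as far as I know) keeps the original left-to-right order. A small sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical wider case: twelve split columns s0..s11 after p and s.
old_columns = pd.Index(['p', 's'])
all_columns = pd.Index(['p', 's'] + ['s' + str(i) for i in range(12)])

# Default: the difference is sorted, and strings sort lexicographically.
default_order = all_columns.difference(old_columns).tolist()
# sort=False keeps the columns in their original order.
kept_order = all_columns.difference(old_columns, sort=False).tolist()

print(default_order[:4])  # ['s0', 's1', 's10', 's11']
print(kept_order[:4])     # ['s0', 's1', 's2', 's3']
```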
unutbu
  • Thanks. I was trying to find the maximum length of the split of column s and then build the names with a list comprehension, ['s'+str(i) for i in range(max)]. But it was passing strings (obviously)... – Avinash Clinton Jan 31 '18 at 16:55
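Building on the comment's approach: the list comprehension only produces column names, but indexing each row with those names yields the values, and Python's star-unpacking can hand any number of them to a function declared with *args. A minimal sketch, assuming a hypothetical compare(p, *s_values) whose placeholder body just counts the non-missing values it receives:

```python
import numpy as np
import pandas as pd

def compare(p, *s_values):
    # Hypothetical placeholder body: count the non-missing s values received.
    return sum(1 for v in s_values if pd.notna(v))

df = pd.DataFrame({'p': ['ABCD', 'XY', 'MSD', 'PQRS'],
                   's': ['AB,AC,AD', 'XY', 'MS,MD', 'PQ,PR,PS']})
df = df.join(df['s'].str.split(',', expand=True).add_prefix('s').fillna(np.nan))

# The longest split determines how many s-columns exist (3 here).
n = int(df['s'].str.split(',').str.len().max())
cols = ['s' + str(i) for i in range(n)]  # column names: ['s0', 's1', 's2']

# row[cols] turns the names into that row's values; the * unpacks however
# many there are into compare's variable-length argument list.
df['n_values'] = df.apply(lambda row: compare(row['p'], *row[cols]), axis=1)
print(df[['p', 'n_values']])
```

Because compare takes *s_values, the same call works whether the split produced 3, 13 or 15 columns.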