
I have a DataFrame and want to use the apply function or a lambda on a string column to implement if-else conditions on its values. So far I have only tried iterating with a for loop.

      Input DataFrame
      text1                                        output_column
     ['bread','bread','bread']                      ['bread'] --> [if a value occurs 2 or more times, select it]
     ['bread','butter','jam']                       ['butter'] --> [if all 3 values are unique, select the value at index 1]
     ['bread','jam','jam']                          ['jam'] --> [if a value occurs 2 or more times, select it]
     ['unknown']                                    ['unknown'] --> [if any value is blank or null, mark it as 'unknown']
     

         ################## I tried the code below #########

         output_column = []
         df_value = df[['text_col1','text_col2','text_col3']].values.tolist()
         if np.all(df_value <= 1):
             output_column.append(df_value[1])
         else:
             output_column.append(max_count[np.argmax(df_value)])


       Expected output DataFrame
      text1                                        output_column
     ['bread','bread','bread']                      ['bread'] 
     ['bread','butter','jam']                       ['butter']
     ['bread','jam','jam']                          ['jam']
     ['unknown']                                    ['unknown']
s nandan

1 Answer

import pandas as pd

df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                             ['bread', 'butter', 'jam'],
                             ['bread', 'jam', 'jam'],
                             ['unknown']]})

List-valued cells are awkward to work with in pandas, so let's explode them into one row per element:

df = df.explode('text1')

>>> df.head()
     text1
0    bread
0    bread
0    bread
1    bread
1   butter

Now you can use groupby to apply a function to each original row (by grouping on index level 0).

The details of the heuristic are up to you, but here's something to start with:

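def get_values(s):
    # Count how often each value occurs within one original row.
    counts = s.value_counts()

    # An "unknown" entry marks the whole row as unknown.
    if "unknown" in counts:
        return "unknown"

    # All values are distinct: return the element at position 1.
    if counts.eq(1).all():
        return s.iloc[1]

    # Some value repeats: return the most frequent one.
    if counts.max() >= 2:
        return counts.idxmax()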

Apply to each group:

>>> df.groupby(level=0).text1.apply(get_values)
0      bread
1     butter
2        jam
3    unknown
Name: text1, dtype: object
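To get the result back next to the original lists, and to cover the blank/null case from the question, something like the following might work. It is only a sketch: the helper name pick_value, the extra row containing None, and the rule that blanks and nulls count as "unknown" are assumptions based on the question, not part of the code above.

```python
import pandas as pd

def pick_value(s: pd.Series) -> str:
    """Hypothetical helper: choose one value for the exploded items of one row."""
    # Assumption: nulls and blank strings are treated like "unknown".
    cleaned = s.fillna("").astype(str).str.strip()
    if cleaned.eq("").any() or cleaned.eq("unknown").any():
        return "unknown"

    counts = cleaned.value_counts()
    # All values distinct: take the element at position 1 if there is one.
    if counts.eq(1).all():
        return cleaned.iloc[1] if len(cleaned) > 1 else cleaned.iloc[0]
    # Otherwise some value repeats: return the most frequent one.
    return counts.idxmax()

df = pd.DataFrame({"text1": [["bread", "bread", "bread"],
                             ["bread", "butter", "jam"],
                             ["bread", "jam", "jam"],
                             ["unknown"],
                             ["bread", None, "jam"]]})  # extra row with a null

# Explode to one row per element, reduce each group to a single value,
# then assign back; the group keys line up with the original index.
picked = df["text1"].explode().groupby(level=0).apply(pick_value)
df["output_column"] = picked
print(df)
```

Because explode keeps the original index, the grouped result aligns with df on assignment and the list column itself is left untouched.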
fsimonjetz
  • For a large DataFrame, which solution is better: a for loop or the apply function? – s nandan Mar 21 '22 at 13:43
  • As a rule of thumb, for-loops are almost never the right way to work with pandas. Cf. [this post](https://stackoverflow.com/a/55557758/15873043). – fsimonjetz Mar 21 '22 at 15:56
  • I agree with your point @fsimonjetz, but in my particular scenario the for loop took less time to run than the apply function. I think that is because converting the output column back into a DataFrame after apply takes time. – s nandan Mar 22 '22 at 10:15
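For comparison with the groupby approach, the same heuristic can be written as a plain Python function applied to each list with a list comprehension, without exploding first. This is a sketch only: pick_from_list is a hypothetical name, the blank/null handling is an assumption taken from the question, and whether it runs faster depends on the data, so it is worth timing both versions (for example with timeit) on a realistic sample.

```python
from collections import Counter

import pandas as pd

def pick_from_list(items):
    """Hypothetical helper: apply the heuristic to one Python list."""
    # Assumption: empty lists, nulls, blanks and "unknown" all map to "unknown".
    if not items or any(pd.isna(x) or str(x).strip() in ("", "unknown") for x in items):
        return "unknown"

    # Most frequent value and how often it occurs.
    value, freq = Counter(items).most_common(1)[0]
    if freq >= 2:
        return value
    # All values distinct: take the element at position 1 if there is one.
    return items[1] if len(items) > 1 else items[0]

df = pd.DataFrame({"text1": [["bread", "bread", "bread"],
                             ["bread", "butter", "jam"],
                             ["bread", "jam", "jam"],
                             ["unknown"]]})

# One pass over the list cells; no explode or groupby involved.
df["output_column"] = [pick_from_list(v) for v in df["text1"]]
print(df)
```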