I need to create a function to check the length of a string in dataframe columns.

I have this code, which checks a single column:

df['XXX'] = df['XXX'].map(lambda x: x if isinstance(x, (datetime)) else None)
df_col_len = int(df['XXX'].str.encode(encoding='utf-8').str.len().max())
if df_col_len > 4:
  print("In this step it will send an email")

The problem is that I have about 20 columns and each column should have a different length.

I need to check that the 1st column has max length < 4, the 3rd column max length < 50, the 7th column max length < 47, etc. Then, if a column does not meet its condition, I need to report which column it is.

Do you have an idea how to check the necessary columns at once?

Thanks

Winter

1 Answer

You can use .lt (less than) on dataframes:

Sample data:

import pandas as pd
import numpy as np

d1 = {'A': {0: 'a', 1: 'ab', 2: 'abc'}, 'B': {0: 'abcd', 1: 'abcde', 2: 'abcdef'}, 'C': {0: 'abcdefg', 1: 'abcdefge', 2: 'abcdefgeh'}}
df = pd.DataFrame(d1)

Code:

max_len = {'A': 2, 'B': 5, 'C': 10}

# compute the length of every element in the dataframe
# (in pandas 2.1+, DataFrame.map is the non-deprecated name for applymap)
df_check = df.applymap(len)
# create an auxiliary dataframe that repeats the maximum lengths on every row
df_max = pd.DataFrame(np.repeat(pd.DataFrame(max_len, index=[1]).values, len(df), axis=0), columns=df.columns)

# check whether each actual length is *less than* its column's maximum
df_check.lt(df_max)

The input looks like:

     A       B          C
0    a    abcd    abcdefg
1   ab   abcde   abcdefge
2  abc  abcdef  abcdefgeh


The output looks like:

       A      B     C
0   True   True  True
1  False  False  True
2  False  False  True
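
As a shorter variant of the comparison above, the auxiliary dataframe can probably be skipped, because .lt accepts a Series and aligns it with the column names. This is only a sketch on the same sample data; the names lengths, limits and ok are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "ab", "abc"],
    "B": ["abcd", "abcde", "abcdef"],
    "C": ["abcdefg", "abcdefge", "abcdefgeh"],
})
max_len = {"A": 2, "B": 5, "C": 10}

# Character length of every cell, computed column by column
lengths = df.apply(lambda s: s.str.len())

# One limit per column; the Series index must match the column names
limits = pd.Series(max_len)

# Compare each column with its own limit (True means "shorter than the limit")
ok = lengths.lt(limits, axis="columns")
print(ok)
```

With axis="columns" the Series is matched against the column labels, so each column is compared with its own limit on every row.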

Additional notes:

To then find the names of the columns that fail the check, you can look into this question.
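
One possible way to get the failing column names directly is to compare each column's longest value with its limit. The sketch below assumes the limit is exclusive (a column passes only if its longest value is strictly shorter than the limit) and measures UTF-8 byte length as in the question; columns_exceeding is a hypothetical helper name, not an existing pandas function.

```python
import pandas as pd

def columns_exceeding(df, max_len):
    """Return the names of columns whose longest value is not below its limit."""
    failing = []
    for col, limit in max_len.items():
        # Longest value in the column, measured in UTF-8 bytes
        longest = df[col].str.encode("utf-8").str.len().max()
        if longest >= limit:
            failing.append(col)
    return failing

df = pd.DataFrame({
    "A": ["a", "ab", "abc"],
    "B": ["abcd", "abcde", "abcdef"],
    "C": ["abcdefg", "abcdefge", "abcdefgeh"],
})
max_len = {"A": 2, "B": 5, "C": 10}

failing = columns_exceeding(df, max_len)
if failing:
    # Placeholder for the email step mentioned in the question
    print("Columns over their limit:", failing)
```

Only the columns listed in max_len are checked, so the other columns of a wider dataframe are simply ignored.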

Andreas