I ran into an issue while using a conditional function to create a new column in my dataframe.

When I use the function below to split the text based on an if statement, it raises `ValueError: The truth value of a Series is ambiguous.`

    def split_name(Name):
        if (test_data['Tussenvoegsel'].str.len() != 0) :
            temp = str(re.findall(r'(\.[^.]*)$', Name))
            return (temp[3:100])
        else :
            return ("")
    
    test_data['Achternaam'] = test_data['Naam'].apply(lambda x: split_name(x))
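
For context, here is a minimal sketch of why the `if` line raises. The condition compares the whole `Tussenvoegsel` column, so it evaluates to one boolean per row instead of a single `True`/`False`. The two-row `demo` frame is a made-up stand-in for `test_data`, not the real data.

```python
import pandas as pd

# Hypothetical two-row stand-in for test_data (assumption, not the real data)
demo = pd.DataFrame({"Tussenvoegsel": ["", "van der"]})

# Comparing a whole column yields a boolean Series: one value per row
condition = demo["Tussenvoegsel"].str.len() != 0

try:
    # `if` needs a single bool, so pandas refuses to guess and raises
    if condition:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)
```

Inside `split_name` the same thing happens on every call, because the function looks at the full column rather than at the value belonging to the current row.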

The dataframe columns it is applied to:

                       Naam Tussenvoegsel
0          Dhr. V. Andersen              
1  Mevr. J.C. van der Kosan       van der
2     Dhr. P.M.M. van Zomer           van
3     Mevr. M.J.J. Raimondo              
4     Mevr. E. van de Doorn        van de

Example of the expected outcome:

                       Naam Tussenvoegsel Achternaam
0          Dhr. V. Andersen               Andersen
1  Mevr. J.C. van der Kosan       van der 
2     Dhr. P.M.M. van Zomer           van 
3     Mevr. M.J.J. Raimondo               Raimondo              
4     Mevr. E. van de Doorn        van de 

I've tried the split method, and tried it both with and without a lambda, but I can't get it working.

Thanks!

Luuk_148
  • You should use `if (len(Name) != 0)`. There might be a vectorized solution to your problem though; please provide a sample dataset and the matching expected output – mozway Jul 24 '22 at 13:56
  • @mozway, thanks, that already helps, but the condition should depend on a different column than the one being manipulated. How can that be done? – Luuk_148 Jul 24 '22 at 14:01
  • The logic is unclear: do you want to extract the name except if it starts with van/van de/van der? – mozway Jul 24 '22 at 14:39
  • You should run `test_data.apply(split_name, axis=1)` (instead of `test_data['Naam'].apply(split_name)`). Then you get the full `row` in the function `def split_name(row):` and you can use `row['Tussenvoegsel']` and `row["Naam"]` – furas Jul 24 '22 at 17:40
  • Thanks a lot for your answers. They reminded me to build the function with two arguments, so I changed it to:

    ```
    def split_name(tussenvoegsel, Name):
        if (len(tussenvoegsel) == 0) :
            temp = str(re.findall(r'(\.[^.]*)$', Name))
            return (temp[3:-2])
        else:
            temp = re.search(tussenvoegsel, Name)
            tempint = int(temp.end())
            return(Name[tempint:100])

    test_data['Achternaam'] = test_data.apply(lambda x: split_name(x['Tussenvoegsel'], x['Naam']),axis=1)
    ```

    – Luuk_148 Jul 28 '22 at 07:18
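
The two-argument approach from the comments can be sketched without regular expressions, under a few assumptions: empty `Tussenvoegsel` cells are empty (or whitespace-only) strings, the surname is whatever follows the last dot of the initials, and the prefix, when present, sits directly in front of the surname. The frame below is retyped from the sample in the question, and this version of `split_name` is an illustrative rewrite rather than the asker's exact function.

```python
import pandas as pd

# Sample data retyped from the question; blank cells assumed to be empty strings
test_data = pd.DataFrame({
    "Naam": [
        "Dhr. V. Andersen",
        "Mevr. J.C. van der Kosan",
        "Dhr. P.M.M. van Zomer",
        "Mevr. M.J.J. Raimondo",
        "Mevr. E. van de Doorn",
    ],
    "Tussenvoegsel": ["", "van der", "van", "", "van de"],
})


def split_name(tussenvoegsel, naam):
    """Return the surname, without the prefix, for a single row."""
    # Everything after the last dot, i.e. after the title and initials
    rest = naam.rsplit(".", 1)[-1].strip()
    prefix = tussenvoegsel.strip()
    # Drop the prefix only when the remainder really starts with it
    if prefix and rest.startswith(prefix + " "):
        rest = rest[len(prefix):].strip()
    return rest


# axis=1 hands each row to the lambda, so both columns are scalars here
test_data["Achternaam"] = test_data.apply(
    lambda row: split_name(row["Tussenvoegsel"], row["Naam"]), axis=1
)
print(test_data)
```

Plain string methods avoid passing `tussenvoegsel` to `re.search` as a pattern, which could misbehave if a prefix ever contained regex metacharacters.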

0 Answers