Populating column based on function that returns boolean

Question

I have a function called ValidString(s) that tests a string and returns a boolean (True/False) indicating whether it's valid. I need to apply this function to a specific column that contains strings and, based on the True/False result, populate a new column with 'good' or 'no good'.

I tried the line below, but it raises one of two errors: "'float' object is not subscriptable" or "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

df["good"] = df['string'].apply(ValidString)

Edit: the function checks whether a string complies with the format "A5b2c3", where the numbers after b and c need to equal the number after A. The letters A, b and c are fixed text. I'm not using any libraries. For example:

isValid = False
if s == "" or s[0] != "A" or len(s) <= 5 or s[1] == "0" or "b" not in s or "c" not in s:
    return isValid

I know the function does work on the string. The new dataframe has a column with values exactly the same format - "A5b2c3".
I think the setup is OK, but I'm struggling with basic Python syntax somewhere and have been banging my head against this for hours. Is it somehow because the original function is ValidString(s), with (s), and I'm trying to use .apply without that?

Maybe I don't need to add it as a new column directly. Can I store the booleans from the test in another dataframe, then combine them afterwards using if statements? In other words, how do I use this function, ValidString(s), to test a column instead of a single value (s)?

The error may be caused by something in your function, so without providing your code we can't tell. — LITzman, Jul 16 '23 at 22:43
As LITzman says, you need to add the code you are trying to debug to your question. Please read the [How to Ask](https://stackoverflow.com/help/how-to-ask) section of the help in order to make it easier for others to help you. — Tony, Jul 16 '23 at 22:48
Hi Tony & Litz, I'm not sure I can add the whole code, it may be against academic policy. I'll add a little more info to original post. — Jack, Jul 16 '23 at 22:55
Firstly, welcome to Stack Overflow! Please take the [tour]. None of the code here would cause those errors, I don't think. And to get that second error, you must be using some library, probably Pandas. So for debugging help, you'll need to make a [mre]. Building one from the ground up should help you get around that academic policy. See also [How to ask and answer homework questions](//meta.stackoverflow.com/q/334822/4518341) and [How to make good reproducible pandas examples](/q/20109391/4518341). — wjandrea, Jul 16 '23 at 23:10

mrtig · Accepted Answer · 2023-07-16T23:31:17.437

In principle, df["<col_name>"].apply(func_with_one_arg) should produce a pandas Series that you can assign to a new column.
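To illustrate the point about .apply, here is a minimal sketch. The function starts_with_a and the sample data are hypothetical stand-ins, not the asker's code. It shows that .apply takes the function object without parentheses and calls it once per element, whereas passing the whole column to the function is one likely way to get the "truth value of a Series is ambiguous" error.

```python
import pandas as pd


def starts_with_a(s):
    # Hypothetical one-argument check, written for a single string.
    if s == "" or s[0] != "A":
        return False
    return True


col = pd.Series(["A5b2c3", "B1b0c1"])

# .apply receives the function itself (no parentheses) and calls it per element.
result = col.apply(starts_with_a)
print(result.tolist())

# Calling the function on the whole Series makes `s == ""` a Series of booleans,
# and `or` then needs a single truth value for it, which pandas refuses to guess.
try:
    starts_with_a(col)
except ValueError as exc:
    error = str(exc)
    print(error)
```

So the (s) in the function definition is not the problem: .apply supplies that argument for each value in the column.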

It looks like your ValidString function does not check for nulls, or for inputs such as numbers or other types that don't support subscripting (e.g. s[0]).
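The first error can be reproduced without pandas. This sketch assumes the missing values in the column are float NaN, which is how pandas typically stores them in an object column; the variable names are illustrative only.

```python
nan = float("nan")  # assumed stand-in for a missing value in the column

# Subscripting a float fails the same way s[0] does inside the function.
try:
    nan[0]
except TypeError as exc:
    message = str(exc)
    print(message)

# Two ways to detect the problem value before subscripting:
print(isinstance(nan, str))  # a NaN is not a string
print(nan != nan)            # NaN is the only float that is not equal to itself
```

Either check can be used as an early return at the top of the validation function.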

Here's your code with some corrections:

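def ValidString(s):
    # None, NaN (a float) and any other non-string input cannot be valid,
    # and must be rejected before s[0] or len(s) is evaluated.
    if not isinstance(s, str):
        return False

    # Guard checks from the question: any one of these makes the string invalid.
    if (
        s == ""
        or s[0] != "A"
        or len(s) <= 5
        or s[1] == "0"
        or "b" not in s
        or "c" not in s
    ):
        return False

    # The remaining format checks from your original function would go here.
    return True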


# Now your column logic should work
df["good"] = df['string'].apply(ValidString)
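The question also asks for 'good' or 'no good' rather than True/False. One possible way to do that, sketched here with numpy.where, is to convert the boolean Series afterwards. The function valid_string is a simplified stand-in that contains only the guard checks quoted in the question, and the sample data is made up.

```python
import numpy as np
import pandas as pd


def valid_string(s):
    # Simplified stand-in for ValidString: only the checks quoted in the question.
    if not isinstance(s, str):
        return False
    if s == "" or s[0] != "A" or len(s) <= 5 or s[1] == "0" or "b" not in s or "c" not in s:
        return False
    return True  # the real function would go on to compare the numbers


# Made-up sample data, including a missing value and a string that is too short.
df = pd.DataFrame({"string": ["A5b2c3", "X5b2c3", np.nan, "A5"]})

is_valid = df["string"].apply(valid_string)          # boolean Series
df["good"] = np.where(is_valid, "good", "no good")   # label each row

print(df)
```

Keeping the boolean Series in its own variable also answers the "store the booleans first" idea: there is no need for a second dataframe or row-by-row if statements.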

Hey mrtig! Thank you so much for this! I followed up on your points: 1) the syntax was OK, so I created a simpler dataframe and it tested OK, which meant the problem was the new data; 2) the function was testing for everything except NaNs. Adding if s is None or s != s: fixed it! Thank you again — Jack, Jul 17 '23 at 00:45
