I have a conditional statement I'm trying to work out in pandas in Anaconda. I've imported numpy as np.
I need to create a new "Text" field. If the existing "truncated" field is "False", it should use the string in the existing "text" field; otherwise (that is, if "truncated" is "True"), it should use the string in the existing "extended_tweet.full_text" field.
I'm trying to follow the instructions in the Stack Overflow question "Pandas conditional creation of a series/dataframe column", but it's not a direct parallel: my choices are the values of other fields, not fixed strings.
Here's my code:
conditions = [
    (df['truncated'] == 'False'),
    (df['truncated'] == 'True')]
choices = ['text'], ['extended_tweet.full_text']
df['Text'] = np.select(conditions, choices, default='null')
After running that, every 'Text' value is 'null'.
I've tried variations of the 'choices' line, and I think the problem is how I'm specifying the options there (the example I'm following uses fixed string values). But I can't work out how to say that I want the string values stored in those fields to go into the new 'Text' field.
Any help greatly appreciated.
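Why literal strings (or the default) come out can be seen with numpy alone. np.select(condlist, choicelist, default) treats each choice as an array and, for every position, takes the value from the choice belonging to the first condition that is True there; a plain string is broadcast as a constant, so every matching row simply gets that text. The small arrays below are hypothetical stand-ins for the columns, a sketch rather than the real data:

```python
import numpy as np

# Hypothetical boolean conditions for a 3-row table
conditions = [
    np.array([False, True, True]),   # rows where 'truncated' is False
    np.array([True, False, False]),  # rows where 'truncated' is True
]

# Strings are broadcast as constants: each selected row gets the literal name
literal = np.select(conditions, ['text', 'extended_tweet.full_text'], default='null')
print(literal)  # ['extended_tweet.full_text' 'text' 'text']

# Arrays are picked element by element, which is what "use the column's value" needs
text = np.array(['Hello', 'Howdy', 'Hi'])
full_text = np.array(['fine', 'good', 'bien'])
values = np.select(conditions, [text, full_text], default='null')
print(values)  # ['fine' 'Howdy' 'Hi']
```

The tuple-of-lists form, choices = ['text'], ['extended_tweet.full_text'], behaves the same way: each one-element list is broadcast across all rows.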
PART 2: RESPONSE TO INPUT BELOW:
Thank you. I wasn't familiar with minimal reproducible examples.
Here's what I've come up with:
df5 = pd.DataFrame([["True", "Hello", "fine"], ["False", 'Howdy', 'good'], ["False", "Hi", "bien"]], columns=['truncated', 'text', 'extended_tweet.full_text'])
print(df5)
  truncated   text extended_tweet.full_text
0      True  Hello                     fine
1     False  Howdy                     good
2     False     Hi                     bien
conditions = [
    (df5['truncated'] == 'False'),
    (df5['truncated'] == 'True')]
choices = ['text'], ['extended_tweet.full_text']
df5['Text'] = np.select(conditions, choices, default='null')
df5['Text']
0    extended_tweet.full_text
1                        text
2                        text
Name: Text, dtype: object
However, it returns the column names 'text' and 'extended_tweet.full_text' as literal strings, not the values in those columns.
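A minimal sketch of a fix for df5, assuming the goal is to copy values out of the columns: pass the columns themselves (pandas Series) as the choices instead of their names. The conditions stay as they are, because df5 stores 'truncated' as the strings "True" and "False":

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame(
    [["True", "Hello", "fine"], ["False", "Howdy", "good"], ["False", "Hi", "bien"]],
    columns=['truncated', 'text', 'extended_tweet.full_text'],
)

conditions = [
    df5['truncated'] == 'False',
    df5['truncated'] == 'True',
]
# The choices are the Series themselves, so np.select picks values row by row
choices = [df5['text'], df5['extended_tweet.full_text']]

df5['Text'] = np.select(conditions, choices, default='null')
print(df5['Text'].tolist())  # ['fine', 'Howdy', 'Hi']
```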
I tried the two suggestions from the comments (I can't see them while I'm editing), and here are my results:
I changed the 'choices' line to:
choices = ['text', 'extended_tweet.full_text']
It produced this warning, and every 'Text' value was still 'null':
/anaconda3/lib/python3.7/site-packages/pandas/core/ops.py:1649: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = method(y)
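That FutureWarning is typically emitted when numpy cannot compare arrays of incompatible types elementwise, for example booleans against a string. A likely explanation, not confirmed by the output shown, is that in the full df the 'truncated' column holds real booleans (dtype bool) rather than the strings used in df5, so == 'False' matches no rows and every row falls through to default='null'. The sketch below uses a hypothetical boolean stand-in for the real data; it checks the dtype, then converts the column with astype(str) so the same string comparison works whether the column holds booleans or the strings "True"/"False":

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 'truncated' holds booleans, not strings
df = pd.DataFrame({
    'truncated': [True, False, False],
    'text': ['Hello', 'Howdy', 'Hi'],
    'extended_tweet.full_text': ['fine', 'good', 'bien'],
})

print(df['truncated'].dtype)  # bool

# Booleans become the strings 'True'/'False'; existing strings are unchanged
truncated = df['truncated'].astype(str)
conditions = [truncated == 'False', truncated == 'True']
choices = [df['text'], df['extended_tweet.full_text']]

df['Text'] = np.select(conditions, choices, default='null')
print(df['Text'].tolist())  # ['fine', 'Howdy', 'Hi']
```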
I also tried this version:
conditions = [~df['truncated'],df['truncated']]
choices = ['text'], ['extended_tweet.full_text']
df['Text'] = np.select(conditions, choices, default='null')
But, as in my minimal example, it produced the strings 'text' and 'extended_tweet.full_text', not the values in those fields.
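The ~df['truncated'] conditions look right for a boolean column; what remains is that the choices are still names rather than columns. For a simple two-way choice, np.where(condition, x, y) is a compact alternative to np.select. A sketch on hypothetical sample data, assuming 'truncated' is boolean:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a boolean 'truncated' column
df = pd.DataFrame({
    'truncated': [True, False, False],
    'text': ['Hello', 'Howdy', 'Hi'],
    'extended_tweet.full_text': ['fine', 'good', 'bien'],
})

# Where truncated is True take the full text, otherwise take 'text'
df['Text'] = np.where(df['truncated'], df['extended_tweet.full_text'], df['text'])
print(df['Text'].tolist())  # ['fine', 'Howdy', 'Hi']
```

pandas' own Series.where gives the same result without numpy: df['text'].where(~df['truncated'], df['extended_tweet.full_text']) keeps 'text' where the row is not truncated and takes the full text everywhere else.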