
I'm working with pandas. I need to create a new column in a DataFrame based on conditions on other columns. For each value in a Series, I check whether it contains a given string (the condition for returning some text). This works when the values are exactly equal, but not when the string is only part of the value in the Series.

Sample data:

df = pd.DataFrame([["ores"], ["ores + more texts"], ["anything else"]], columns=['Symptom'])
def conditions(df5):
    if ("ores") in df5["Symptom"]: return "Things"

df["new_column"] = df.swifter.apply(conditions, axis=1)

It doesn't work because any("something") is always True.

So I tried:

df['new_column'] = np.where(df2["Symptom"].str.contains('ores'), 'yes', 'no') : return "Things"

It doesn't work because it's inside a loop. I can't use np.select because it needs two separate lists, and my code has to be easy to edit (and it can't come from a dict). It also doesn't work with find_all, nor with:

df["new_column"] == "ores" is True: return "things"

I don't really understand why nothing works, or what I have to do.

Edit:

df5 = pd.DataFrame([["ores"], ["ores + more texts"], ["anything else"]], columns=['Symptom'])
def conditions(df5):
    (df5["Symptom"].str.contains('ores'), 'Things')
df5["Deversement Service"] = np.where(conditions)
df5

For the moment, I get an error about the length of the values.

SRP
  • [How to make good reproducible pandas examples](https://stackoverflow.com/a/20159305/9177877) – It_is_Chris Oct 22 '21 at 18:16
  • `df2['new'] = np.where(df2["Symptom"].str.contains('something'), 'yes', 'no')` should suffice. What's the `return "things"` for? – Henry Yik Oct 22 '21 at 18:21
  • I edited the question to add sample data. You're right, it should, but it's inside a function: I use apply, and the condition is this function. The edit should make it clearer. – SRP Oct 22 '21 at 18:24
  • It still doesn't make any sense. If you simply want to create a new column if `Symptom` contains `ores`, use `df2['new'] = np.where(df2["Symptom"].str.contains('ores'), 'Things')` outside the function. – Henry Yik Oct 22 '21 at 18:35
  • I understand, but I need to keep the condition in another file so it's easier to edit. How can I do that and import it into the np.where? – SRP Oct 22 '21 at 18:46
  • I edited the question with what you suggested. – SRP Oct 22 '21 at 18:48
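
One way to keep the condition editable in a separate file, as asked in the comments above, might be to wrap the whole `np.where` call in a function that takes the DataFrame and returns the new column. This is only a minimal sketch: the function name `label_symptoms` is hypothetical, and in practice it could live in its own module and be imported. Note that `np.where` is given both a "true" value and a "false" value here, not just the condition.

```python
import numpy as np
import pandas as pd


def label_symptoms(df: pd.DataFrame) -> np.ndarray:
    """Return 'Things' where Symptom contains 'ores', else an empty string."""
    # Hypothetical helper; it could sit in a separate file and be imported.
    # regex=False treats 'ores' as a plain substring rather than a pattern.
    mask = df["Symptom"].str.contains("ores", regex=False)
    return np.where(mask, "Things", "")


df5 = pd.DataFrame(
    [["ores"], ["ores + more texts"], ["anything else"]], columns=["Symptom"]
)
# The function receives the whole DataFrame, so no apply() is needed.
df5["Deversement Service"] = label_symptoms(df5)
print(df5)
```

The function works on the whole column at once, so it replaces the row-by-row `apply` rather than being passed to it.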

1 Answer

To add a new column based on a condition, use np.where:

import numpy as np
import pandas as pd

df = pd.DataFrame([["ores"], ["ores + more texts"], ["anything else"]], columns=['Symptom'])
# 'Things' where Symptom contains 'ores', empty string otherwise
df['new'] = np.where(df["Symptom"].str.contains('ores'), 'Things', "")

print(df)

             Symptom     new
0               ores  Things
1  ores + more texts  Things
2      anything else

If you need a single boolean value, use pd.Series.any:

if df["Symptom"].str.contains('ores').any():
    print("Things")

# Things
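
When several substrings should all produce the same label, one possibility is to join them into a single regular expression and pass that to `str.contains`. The sketch below assumes a hypothetical keyword list (`"abc"` and `"cde"` are placeholders) and escapes each keyword with `re.escape`, so that characters such as `+` or `.` are matched literally.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["ores"], ["ores + more texts"], ["anything else"]], columns=["Symptom"]
)

# Hypothetical keyword list; "abc" and "cde" are placeholders.
keywords = ["ores", "abc", "cde"]

# Escape each keyword so regex metacharacters are matched literally,
# then join with "|" so that any one keyword is enough to match.
pattern = "|".join(re.escape(k) for k in keywords)

df["new"] = np.where(df["Symptom"].str.contains(pattern), "Things", "")
print(df)
```
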
Henry Yik
  • I understand that np.where works, but where do I store my hundreds of conditions? I would like to put them in a function in a separate file, for example. – SRP Oct 22 '21 at 18:59
  • It is not clear what you mean by hundreds of conditions. Please provide a sample. – Henry Yik Oct 22 '21 at 19:00
  • If you have a list of strings like `list_of_strings=["ores", "abc", "cde"...]`, you can chain them by `df["Symptom"].str.contains('|'.join(list_of_strings))`. – Henry Yik Oct 22 '21 at 19:04
  • I have the same condition, but with another string instead of "ores". I also check more columns, and sometimes I combine conditions with "and" and "or". A screenshot will be useful here (I may delete it later): https://i.imgur.com/27N3SNz.png – SRP Oct 22 '21 at 19:04
  • You can obviously group a lot of your conditions in one using `np.select`. Also see [this](https://stackoverflow.com/a/19913845/9284423) on how to use `and` on different columns. – Henry Yik Oct 22 '21 at 19:11
  • I agree, but np.select needs two separate lists, which makes my code far less editable, and I need to edit it every week. – SRP Oct 22 '21 at 19:21
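
The concern about `np.select` needing two separate lists could be worked around by keeping each rule as a single (condition, label) pair and splitting the pairs only at the point where `np.select` is called. The following is a sketch under stated assumptions, not a definitive design: the `Service` column, the `RULES` list and the `apply_rules` helper are all hypothetical names introduced for illustration. "and" and "or" between conditions are written with `&` and `|`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Symptom": ["ores", "ores + more texts", "anything else"],
        "Service": ["A", "B", "A"],  # hypothetical second column
    }
)

# One editable entry per rule: (function building a boolean mask, label).
# Rules are checked in order, and the first matching rule wins.
RULES = [
    (lambda d: d["Symptom"].str.contains("ores") & (d["Service"] == "A"), "Things A"),
    (lambda d: d["Symptom"].str.contains("ores") | d["Symptom"].str.contains("abc"), "Things"),
]


def apply_rules(d, rules, default=""):
    """Split the (mask function, label) pairs into the two lists np.select expects."""
    conditions = [make_mask(d).to_numpy() for make_mask, _ in rules]
    labels = [label for _, label in rules]
    return np.select(conditions, labels, default=default)


df["new_column"] = apply_rules(df, RULES)
print(df)
```

With this layout, `RULES` could live in a separate file and be imported, and a weekly edit means adding or changing one line per rule while `apply_rules` stays untouched.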