
I have a cannabis dataset that has a column for "Effects" and I'm trying to add a binary "nice_buds" column for strains that do not include certain effects. This is the code:

nice_buds = []
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

for row in sample["Effects"]:
    if "Sleepy" not in row and "Hungry" not in row and "Giggly" not in row and "Tingly" not in row and "Aroused" not in row and "Talkative" not in row:
        nice_buds.append(1)
    else:
        nice_buds.append(0)

sample["nice_buds"] = nice_buds

As of now, the undesired_effects list is doing nothing, and the code works perfectly fine in terms of giving me the desired output.

My question, though, is whether there is a more "Pythonic" or "DRY" way to go about this.

ekselan
    Does this answer your question? [Test if lists share any items in python](https://stackoverflow.com/questions/3170055/test-if-lists-share-any-items-in-python) – wwii Jun 05 '20 at 04:38
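
The linked question boils down to a set-disjointness test. A minimal sketch, assuming each Effects value is a comma-separated string such as "Happy,Relaxed,Sleepy" (the question does not show the column's format, so the rows in effects_rows and the helper is_nice below are hypothetical):

```python
undesired_effects = {"Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"}

# Hypothetical rows standing in for sample["Effects"]; the comma-separated
# format is an assumption, not something the question confirms.
effects_rows = ["Happy,Relaxed,Euphoric", "Relaxed,Sleepy,Happy", "Creative,Giggly"]


def is_nice(row):
    """Return 1 if the row shares no effect with undesired_effects, else 0."""
    effects = {effect.strip() for effect in row.split(",")}
    return int(undesired_effects.isdisjoint(effects))


nice_buds = [is_nice(row) for row in effects_rows]
print(nice_buds)  # [1, 0, 0]
```

Unlike a substring check, this compares whole effect names, so it only applies if the column can be split into individual effects.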

2 Answers


You could use all() with a generator expression to simplify the if statement:

nice_buds = []
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

for row in sample["Effects"]:
    if all(effect not in row for effect in undesired_effects):
        nice_buds.append(1)
    else:
        nice_buds.append(0)

sample["nice_buds"] = nice_buds

Or use any() & check for the presence of an effect:

nice_buds = []
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

for row in sample["Effects"]:
    if any(effect in row for effect in undesired_effects):
        nice_buds.append(0)
    else:
        nice_buds.append(1)

sample["nice_buds"] = nice_buds
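
Either loop can also be collapsed into a list comprehension, since int(True) is 1 and int(False) is 0. A minimal sketch, with a hypothetical list effects_rows standing in for sample["Effects"]:

```python
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

# Hypothetical rows standing in for sample["Effects"].
effects_rows = ["Happy, Relaxed", "Sleepy, Happy", "Uplifted, Talkative", "Focused"]

# 1 when none of the undesired effects appears in the row, 0 otherwise.
nice_buds = [
    int(not any(effect in row for effect in undesired_effects))
    for row in effects_rows
]
print(nice_buds)  # [1, 0, 0, 1]
```

With the real dataframe, the comprehension would iterate over sample["Effects"] and the result would be assigned to sample["nice_buds"] as before.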
Karl Knechtel
rdas

Given a dataframe sample:

  • Use np.where
  • Use pandas.Series.str.contains
  • Strings can be upper- or lowercase, so it is better to force one case, because "Giggly" != "giggly"
  • for row in sample["Effects"] tells me you're using a dataframe. Avoid iterating through a dataframe with a for-loop when a vectorized operation is available.

import pandas as pd
import numpy as np

# create dataframe
data = {'Effects': ['I feel great', 'I feel sleepy', 'I fell hungry', 'I feel giggly', 'I feel tingly', 'I feel aroused', 'I feel talkative']}

sample = pd.DataFrame(data)

|    | Effects          |
|---:|:-----------------|
|  0 | I feel great     |
|  1 | I feel sleepy    |
|  2 | I fell hungry    |
|  3 | I feel giggly    |
|  4 | I feel tingly    |
|  5 | I feel aroused   |
|  6 | I feel talkative |

undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

# words should be 1 case for matching, lower in this instance
undesired_effects = [effect.lower() for effect in undesired_effects]

# values to match as string with | (or)
match_vals = '|'.join(undesired_effects)

# create the nice buds column
sample['nice buds'] = np.where(sample['Effects'].str.lower().str.contains(match_vals), 0, 1)

display(sample)

|    | Effects          |   nice buds |
|---:|:-----------------|------------:|
|  0 | I feel great     |           1 |
|  1 | I feel sleepy    |           0 |
|  2 | I fell hungry    |           0 |
|  3 | I feel giggly    |           0 |
|  4 | I feel tingly    |           0 |
|  5 | I feel aroused   |           0 |
|  6 | I feel talkative |           0 |
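
Two details of the joined pattern are worth keeping in mind: it is interpreted as a regular expression, so an effect name containing a regex metacharacter would need escaping, and case can be handled by a flag instead of lowercasing both sides. A minimal sketch of both ideas using only the standard-library re module, with a hypothetical list texts standing in for the Effects column (pandas' Series.str.contains also accepts case=False, which should give the same case-insensitive behavior):

```python
import re

undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

# Escape each term so it is matched literally, then join with | (or).
pattern = re.compile(
    "|".join(re.escape(effect) for effect in undesired_effects),
    flags=re.IGNORECASE,
)

# Hypothetical values standing in for sample["Effects"].
texts = ["I feel great", "I feel SLEEPY", "I fell hungry", "I feel giggly"]

# 0 when any undesired effect is found, 1 otherwise.
nice_buds = [0 if pattern.search(text) else 1 for text in texts]
print(nice_buds)  # [1, 0, 0, 0]
```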
Trenton McKinney