Hello, I have a problem: I want to search a column for any word from a list, then create a Boolean column that is True if any of the listed words is found. Here is my code:

# Code naf related to sport.
code = ["3230Z","4764Z","7721Z","8551Z","9311Z", "9312Z", "9313Z", "9319Z",
        "9329Z", "364Z" "524W", "714B", "804C", "926A", "926C", "930L", "927C",
        "923K"]

# check keywords of code into "Code_Naf" column
for branch in code:
    df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].str.contains("3230Z" or "4764Z" or "7721Z" or "8551Z"
                                                                      or "9311Z" or "9312Z" or "9313Z" or "9319Z"
                                                                      or "9329Z" or "364Z" "524W" or "714B" or
                                                                      "804C" or "926A" or "926C" or "930L" or
                                                                      "927C" or "923K")

When I look at the topNAF column, I find only 2 True values, but in reality there are more than two. What's wrong with my code? Thanks

abdoulsn
  • `"xxx" or "yyy"` isn't doing what you think it does – pault Sep 11 '19 at 13:57
  • First, I suggest removing the huge `contains` call. There should be another method to check whether a value is in a list... – Lore Sep 11 '19 at 13:58
  • https://stackoverflow.com/questions/26577516/how-to-test-if-a-string-contains-one-of-the-substrings-in-a-list-in-pandas – BENY Sep 11 '19 at 13:58
  • Further reading: [Strange use of “and” / “or” operator](https://stackoverflow.com/questions/47007680/strange-use-of-and-or-operator) – pault Sep 11 '19 at 14:00
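
To illustrate pault's comment, here is a minimal sketch of what a chain of `or` between strings evaluates to in plain Python. It also shows a second pitfall in the list from the question: two adjacent string literals with no comma between them are joined into one string. The variable names `pattern` and `codes` are chosen for this illustration only.

```python
# `or` returns its first truthy operand, so a chain of non-empty
# strings collapses to the first string in the chain.
pattern = "3230Z" or "4764Z" or "7721Z"
print(pattern)

# Adjacent string literals without a comma are concatenated,
# so this list has two elements, not three.
codes = ["9329Z", "364Z" "524W"]
print(codes)
```

So `str.contains("3230Z" or "4764Z" or ...)` only ever searches for `"3230Z"`, which would explain why so few rows come back True.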

2 Answers

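Your problem is that you overwrite df_codeNaf['topNAF'] on every iteration, and the loop never uses branch: the chained `or` expression evaluates to just "3230Z", so only that one code is ever searched. Your code can be fixed by: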

df_codeNaf['topNAF'] = False
for branch in code:
    df_codeNaf['topNAF'] = df_codeNaf['topNAF'] | df_codeNaf['Code_NAF'].str.contains(branch)

Better yet, you can pass a regex to contains and do it in one line:

pattern = '|'.join(code)
df_codeNaf['topNAF'] = df_codeNaf['Code_NAF'].str.contains(pattern)
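
As a rough, self-contained sketch of the one-line regex approach: the sample DataFrame below is hypothetical (the real `df_codeNaf` is not shown in the question) and the code list is shortened. The `re.escape` call and the `na=False` argument are optional additions, not part of the answer above; they guard against regex metacharacters in a code and against missing values.

```python
import re

import pandas as pd

# Hypothetical sample data; the real df_codeNaf is not shown in the question.
df_codeNaf = pd.DataFrame({"Code_NAF": ["9311Z", "0111Z", "926C", None, "4764Z"]})

# Shortened list of sport-related codes, for illustration only.
code = ["3230Z", "4764Z", "9311Z", "926C"]

# Build one alternation pattern: "3230Z|4764Z|9311Z|926C".
# re.escape keeps any regex metacharacter in a code literal.
pattern = "|".join(re.escape(c) for c in code)

# na=False makes missing values evaluate to False instead of NaN.
df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].str.contains(pattern, na=False)
print(df_codeNaf)
```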
Quang Hoang

Here is a method using a lambda:

code = ["3230Z","4764Z","7721Z","8551Z","9311Z", "9312Z", "9313Z", "9319Z",
        "9329Z", "364Z", "524W", "714B", "804C", "926A", "926C", "930L", "927C",
        "923K"]

df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].apply(lambda x: True if x in code else False)
HazimoRa3d
  • `apply` is slow and inefficient - since `str.contains` can do the same job, it would be much more recommended. – r.ook Sep 11 '19 at 14:00
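
One caveat on the comment above: `x in code` is an exact match against the list, while `str.contains` is a substring match, so the two are not always interchangeable. A minimal sketch of the difference, using `Series.isin` as a vectorized exact-match alternative to `apply`. The sample data and the names `exact` and `substring` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df_codeNaf = pd.DataFrame({"Code_NAF": ["9311Z", "0111Z", "X9311Z", "926C"]})
code = ["9311Z", "926C"]

# Exact match, vectorized: same result as apply(lambda x: x in code).
exact = df_codeNaf["Code_NAF"].isin(code)

# Substring match: also flags "X9311Z" because it contains "9311Z".
substring = df_codeNaf["Code_NAF"].str.contains("|".join(code))

print(pd.DataFrame({"Code_NAF": df_codeNaf["Code_NAF"],
                    "exact": exact,
                    "substring": substring}))
```

Which one is right depends on whether the Code_NAF column holds a bare code or a longer string that merely includes one.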