What I'm trying to do: Pass a column through a regex search in order to return a value that will be added to another column

How: By writing a function with simple if-else clauses:

def category(series):
    pattern = 'microsoft|office|m365|o365'
    if re.search (series,pattern,re.IGNORECASE) != None:
        return 'Microsoft 365'
    else:
        return 'Not Microsoft 365'

df['Category'] = df['name'].apply(category)

Expected Output: A series with values set to Microsoft 365 or Not Microsoft 365

Actual Output: A series with None values

How I've solved it currently:

df.loc[df['name'].str.contains(pattern, case=False), 'Category'] = 'Microsoft 365'

A snippet of the dataset:

name Category
Microsoft None
M365 None

I am trying to understand why the apply function did not work. Any insights will be appreciated. I'm fairly new to Pandas, so I'm not 100% sure what's going wrong.

Thank you!

yd132

2 Answers

I think there is a small mistake when you call apply. It should be as follows.

df['Category'] = df['name'].apply(category)

The argument to the apply method should be the function you need to apply to each element of your series.
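
To illustrate the point about apply working element by element, here is a minimal sketch. The names show_type and seen_types are hypothetical helpers made up for this example, and the three sample names are invented data, not the asker's dataset.

```python
import pandas as pd

# Invented sample data for illustration only
names = pd.Series(['Microsoft', 'M365', 'other'])

seen_types = []

def show_type(value):
    # Series.apply calls this once per element, so `value` is a plain str,
    # not the whole Series
    seen_types.append(type(value).__name__)
    return value.upper()

result = names.apply(show_type)

print(seen_types)       # the type of each value the function received
print(result.tolist())  # one return value per element
```

This is why a function passed to apply can use string-level tools such as re.search directly on its argument.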

Manik Tharaka

This should work:

import pandas as pd
import re

df = pd.DataFrame({
    'name': ['Microsoft Exchange Pro', 'Microsoft', 'microsoft', 'office', 'Office', 'M365', 'm365', 'other'], 
    'Category':[None, None, None, None, None, None, None, None]
})

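def category(series):
    # Despite its name, `series` is a single string: apply passes one element at a time
    pattern = 'microsoft|office|m365|o365'
    # re.search takes the pattern first, then the string to search
    if re.search(pattern, series, re.IGNORECASE) is not None: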
        return 'Microsoft 365'
    else:
        return 'Not Microsoft 365'

df['Category'] = df['name'].apply(category)

print(df)

Result:

                     name           Category
0  Microsoft Exchange Pro      Microsoft 365
1               Microsoft      Microsoft 365
2               microsoft      Microsoft 365
3                  office      Microsoft 365
4                  Office      Microsoft 365
5                    M365      Microsoft 365
6                    m365      Microsoft 365
7                   other  Not Microsoft 365
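
For comparison, the same labeling can probably be done without apply, building on the str.contains approach mentioned in the question. This is only a sketch: the use of numpy.where and the na=False argument (which treats missing names as non-matches) are assumptions added here, and the sample rows are a subset of the ones above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Microsoft Exchange Pro', 'office', 'M365', 'other']
})
pattern = 'microsoft|office|m365|o365'

# Case-insensitive regex test on every element at once;
# na=False (an assumption) makes missing values count as "no match"
is_m365 = df['name'].str.contains(pattern, case=False, na=False)

# Pick one of the two labels per row based on the boolean mask
df['Category'] = np.where(is_m365, 'Microsoft 365', 'Not Microsoft 365')

print(df)
```

Unlike the masked assignment in the question, this fills in both labels in one step.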
René
  • This did work, thank you! A follow-up would be to understand why it fails on my original dataset. For example, passing Microsoft Exchange Pro fails and returns Not Microsoft. Any thoughts? To my understanding, re.search should not return None unless NO match is found. Am I correct? – yd132 Sep 03 '21 at 10:17
  • I updated my answer with additional examples and fixed a typo (pattern and series in the correct order now). Expect this to work for your dataset as well. – René Sep 03 '21 at 15:05
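
The Microsoft Exchange Pro behaviour described in the comments is consistent with the argument order in the question's code, where re.search receives the name first and the pattern second. A small sketch of that effect follows; category_swapped and category_fixed are hypothetical names used only here.

```python
import re

pattern = 'microsoft|office|m365|o365'

def category_swapped(name):
    # Wrong order: `name` is treated as the regex and `pattern`
    # as the text being searched
    if re.search(name, pattern, re.IGNORECASE) is not None:
        return 'Microsoft 365'
    return 'Not Microsoft 365'

def category_fixed(name):
    # Correct order: pattern first, then the string to search
    if re.search(pattern, name, re.IGNORECASE) is not None:
        return 'Microsoft 365'
    return 'Not Microsoft 365'

for name in ['Microsoft', 'Microsoft Exchange Pro']:
    print(name, '|', category_swapped(name), '|', category_fixed(name))
```

With the swapped order, Microsoft still matches because that word happens to occur inside the pattern string, while Microsoft Exchange Pro does not occur there and so falls through to Not Microsoft 365. So re.search does return None only when no match is found; the search was simply running in the wrong direction.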