
I'm trying to modify one column in a pandas DataFrame. The data originally comes from an Excel worksheet.

I tried adapting the approach from the question below, using str.replace and re.sub, and this is the script I have at the moment.

The question I based my script on: How to replace text in a string column of a Pandas dataframe?

Here is what the data looks like in Excel

import re
import pandas as pd


# reading the excel file
df = pd.read_excel('Upload to dashboard - Untitled (28).xlsx', skiprows = 7)

print(df.head(3))
print(df.dtypes)
df['trim']= df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)|(.*)','')
df['trim2']=re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])

df.to_csv("C:/Users/sward/Downloads/out.csv")
#pd.options.display.max_colwidth = None
print(df['trim'])
print(df['trim2']) 

Currently I'm getting a warning followed by an error:

C:\Users\sward\.spyder-py3\temp.py:24: FutureWarning: The default value of regex will change from True to False in a future version.
 df['trim']= df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)|(.*)','')
Traceback (most recent call last):

 File ~\.spyder-py3\temp.py:25 in <module>
   df['trim2']=re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])

 File ~\Anaconda3\lib\re.py:210 in sub
   return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object
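
For context on the traceback: re.sub operates on a single string, so passing a whole Series to it raises this TypeError. The pattern on that line is also not a raw string, so `\b` is read as a backspace character rather than a word boundary. Below is a minimal sketch of applying the substitution per element instead. It assumes a made-up two-row frame in place of the Excel file, a simplified `\bm\.` pattern, and no missing values in the column.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the 'Publisher URL' column.
df = pd.DataFrame({"Publisher URL": [
    "https://m.example.com/page",
    "https://www.healthline.com/health/gerd#home-remedies",
]})

# re.sub handles one string at a time, so compile the pattern once
# and map it over the column (assumes every value is a string).
mobile_prefix = re.compile(r"\bm\.")
df["trim2"] = df["Publisher URL"].map(
    lambda url: mobile_prefix.sub("", url, count=1)
)

# Vectorised equivalent; passing regex=True explicitly avoids the FutureWarning.
df["trim"] = df["Publisher URL"].str.replace(r"\bm\.", "", n=1, regex=True)

print(df[["trim", "trim2"]])
```

Both columns should come out identical. The `count=1` / `n=1` arguments limit the change to the first match, which is only a rough guard against touching an `m.` later in the path.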

I'm trying to use a regex to extract the domain from the Publisher URL column, but I can't get the expression right. I want to turn

https://www.healthline.com/health/gerd#home-remedies

into

www.healthline.com
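
A minimal sketch of that transformation, using urllib.parse from the standard library instead of a hand-written regex. The helper name `extract_domain` and the sample Series are made up for illustration, and the sketch assumes every value is a full URL that includes its scheme.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_domain(url: str) -> str:
    """Return the network location (host) part of a full URL."""
    # urlparse only fills netloc when the scheme ("https://") is present.
    return urlparse(url).netloc


# Hypothetical stand-in for df['Publisher URL'].
urls = pd.Series([
    "https://www.healthline.com/health/gerd#home-remedies",
    "https://m.example.com/article",
])

domains = urls.map(extract_domain)
print(domains.tolist())
```

The same thing could probably be done with `Series.str.extract` and a capture group, but letting `urlparse` handle ports, paths, and fragments avoids maintaining a pattern by hand.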

In this step I'm looking for all of the mobile versions of the websites and trying to take out the m. part of the URL - the expression

Aki
