I'm trying to modify one column in a pandas DataFrame. The data originally comes from an Excel worksheet.
I tried adapting the approach from the question below, using str.replace and re.sub, and this is the script I have at the moment.
Question I based my script on: How to replace text in a string column of a Pandas dataframe?
import re
import pandas as pd
# reading the excel file
df = pd.read_excel('Upload to dashboard - Untitled (28).xlsx', skiprows = 7)
print(df.head(3))
print(df.dtypes)
df['trim']= df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)|(.*)','')
df['trim2']=re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
df.to_csv("C:/Users/sward/Downloads/out.csv")
#pd.options.display.max_colwidth = None
print(df['trim'])
print(df['trim2'])
Currently I'm getting a FutureWarning followed by an error:
C:\Users\sward\.spyder-py3\temp.py:24: FutureWarning: The default value of regex will change from True to False in a future version.
df['trim']= df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)|(.*)','')
Traceback (most recent call last):
File ~\.spyder-py3\temp.py:25 in <module>
df['trim2']=re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
File ~\Anaconda3\lib\re.py:210 in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
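For context on the traceback above: re.sub works on a single string, so passing it a whole Series raises the TypeError, and the FutureWarning goes away once regex=True is passed explicitly to str.replace. The sketch below shows both routes on made-up sample data. The sample URLs and the MOBILE_PREFIX name are assumptions for illustration, and the pattern is narrowed to just the "m." part, since the full pattern in the script matches the entire string and would replace everything with an empty string.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the 'Publisher URL' column.
df = pd.DataFrame({
    "Publisher URL": [
        "https://m.healthline.com/health/gerd",
        "https://www.healthline.com/health/gerd#home-remedies",
    ]
})

# Raw string so \b stays a regex word boundary rather than a backspace character.
MOBILE_PREFIX = r"\bm\."

# Vectorised route: regex=True is passed explicitly, which avoids the FutureWarning.
df["trim"] = df["Publisher URL"].str.replace(MOBILE_PREFIX, "", regex=True)

# Element-wise route: re.sub receives one string at a time, not the whole Series.
df["trim2"] = df["Publisher URL"].apply(lambda url: re.sub(MOBILE_PREFIX, "", url))

print(df[["trim", "trim2"]])
```

Both columns should end up identical here. If the real column contains missing values, the apply route would need a guard, because re.sub cannot handle NaN, whereas str.replace passes NaN through.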
I was trying to use regex to extract the domain from the Publisher URL column. I can get the regex expression. I wanted to make
https://www.healthline.com/health/gerd#home-remedies
into
www.healthline.com
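One possible way to get from the full URL to the domain without a regex is the standard library's urllib.parse.urlparse, which splits a URL into its components. This is only a sketch: the extract_domain helper is a hypothetical name, and it assumes every value is a string that includes a scheme such as https://, since urlparse returns an empty netloc otherwise.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the network location (host) part of a URL."""
    # netloc is the part between the scheme's '//' and the first '/' of the path.
    return urlparse(url).netloc


print(extract_domain("https://www.healthline.com/health/gerd#home-remedies"))
# prints: www.healthline.com
```

On the DataFrame this could be applied with something like df['Publisher URL'].map(extract_domain), assuming the column holds no missing values.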
In this step I'm looking for all of the mobile versions of the website and taking out the m. part of the URL - the expression