Changing object to string in python

Question

I'm trying to use the re library to use regular expression to substitute text. However, even after I use astype(str), the column that I want to change is still being stored as an object.

import os
import re
import pandas as pd


#https://www.geeksforgeeks.org/update-column-value-of-csv-in-python/


# reading the csv file
df = pd.read_excel('Upload to dashboard - Untitled (1).xlsx', skiprows = 7)
df.drop(df.tail(7).index,inplace=True) # drop last n rows
print(df.head(3))

df['Publisher URL'] = df['Publisher URL'].fillna('')
df['Publisher URL'] = df['Publisher URL'].astype(str)
df['Publisher URL'] = df['Publisher URL'].str.replace('gerd#','')
print(df.dtypes)
df['Publisher URL2'] = df['Publisher URL'].str.replace('www.','')
trim = re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
#https://docs.python.org/3/library/re.html#re.sub

print(df['trim'])
df.to_csv("C:/out.csv")

this is my output for print(df.dtypes)

[3 rows x 8 columns]
Unnamed: 0        object
Buying Agency     object
Advertiser        object
Publisher URL     object
Date              object
Buy Type          object
Ad Type           object
No. of Ads       float64

And my error is

  File ~\.spyder-py3\temp.py:30 in <module>
    trim = re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])

  File ~\Anaconda3\lib\re.py:210 in sub
    return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object

How can I change my script so I can use the re.sub function?

The regex is correct, I had some help from this question:Regex Exclusions

You'll also need to deal with nulls. So do `df['Publisher URL2'] = df['Publisher URL2'].fillna('')` or somethin g like that. Followed by `df['Publisher URL2'] = df['Publisher URL2'].astype(str)`. — chitown88, Oct 12 '22 at 10:36
You probably wanted `df['trim'] = df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)', r'\1\2', regex=True)` — Wiktor Stribiżew, Oct 12 '22 at 10:37
hi @WiktorStribiżew, I want to replace mobile version of websites, so I want to find and replace the "m." version of websites ``` https://m.gsmarena.com/sony_xperia_5_iv-price-11838.php``` should be ``` https://gsmarena.com/sony_xperia_5_iv-price-11838.php``` However, when I try and find an replace "m." this includes websites like this ``` https://redfm.ie/``` becomes ``` https://redfie/``` which changes the website name. — Aki, Oct 12 '22 at 16:16
https://regex101.com/r/VsiuWO/1 this regex code shows the regex working the way i want it to — Aki, Oct 12 '22 at 16:17
So, `df['trim'] = df['Publisher URL'].str.replace(r'^(.*?)\bm\.', r'\1', regex=True)` will work as you need. — Wiktor Stribiżew, Oct 12 '22 at 17:26
It seems you have been tricked by regex101 regex flavor setting, too. You are using a PCRE option, and the replacement pattern is `$1`, in Python, it is `\1`. — Wiktor Stribiżew, Oct 12 '22 at 17:33

score 1 · Accepted Answer · answered Oct 12 '22 at 17:32

1

You can use

df['trim'] = df['Publisher URL'].str.replace(r'^(.*?)\bm\.', r'\1', regex=True)

Or, a better idea is to use

df['trim'] = df['Publisher URL'].str.replace(r'^(.*?[/.])m\.', r'\1', regex=True)

See regex demo #1 and regex demo #2

Details:

^ - start of string
(.*?[/.]) - Group 1 (in Python, \1, not $1, refers to this group pattern match): any chars other than line break chars, as few as possible, and then a / or .
m\. - m. substring.

answered Oct 12 '22 at 17:32

Wiktor Stribiżew

607,720
39
448
563

1

that's worked, thank you so much for the indepth answer! I didn't know that regex101 had a separate python option as well – Aki Oct 13 '22 at 10:49

Changing object to string in python

1 Answers1