I have loaded data into a pandas DataFrame, as the picture shows.
Some rows contain URLs, and I want to remove every URL and replace it with an empty string (nothing, just wiping it out). Row 4 has a URL, for example, and other rows do too. I want to go through all the rows in the status_message column, find any URL, and remove it. I have been looking at "How to remove any URL within a string in Python", but I am not sure how to apply it to a DataFrame. Row 4 should end up as: vote for labour register now.
Viewed 7,095 times
2

Muneeb Khan
4 Answers
8
You can use str.replace with the case=False parameter:
import pandas as pd
df = pd.DataFrame({'status_message': ['a s sd Www.labour.com',
                                      'httP://lab.net dud ff a',
                                      'a ss HTTPS://dd.com ur o']})
print (df)
status_message
0 a s sd Www.labour.com
1 httP://lab.net dud ff a
2 a ss HTTPS://dd.com ur o
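# regex=True is required in recent pandas versions, where str.replace treats the pattern literally by default
df['status_message'] = df['status_message'].str.replace(r'http\S+|www\.\S+', '', case=False, regex=True)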
print (df)
status_message
0 a s sd
1 dud ff a
2 a ss ur o
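The printed output hides that removing a URL leaves the surrounding spaces behind. Here is a minimal sketch, assuming you also want to collapse the leftover whitespace so a row reads like "vote for labour register now". The column name follows the question; the sample messages and the `url_pattern` name are made up for illustration.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the question
df = pd.DataFrame({'status_message': ['vote for labour http://labour.org.uk register now',
                                      'a s sd Www.labour.com',
                                      'no link here']})

# Assumed pattern: anything starting with "http" or "www." up to the next whitespace
url_pattern = r'http\S+|www\.\S+'

cleaned = (df['status_message']
           .str.replace(url_pattern, '', case=False, regex=True)  # drop the URLs
           .str.replace(r'\s+', ' ', regex=True)                   # collapse runs of whitespace
           .str.strip())                                           # trim both ends

df['status_message'] = cleaned
print(df['status_message'].tolist())
```

Note that `http\S+` is deliberately loose: it would also remove an ordinary word that happens to start with "http".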

jezrael
- Yes, very similar; the only difference is `case=False` for case-insensitive matching. – jezrael Jul 30 '17 at 04:11
- Plus one for `case=False`. – Bharath M Shetty Jul 30 '17 at 04:12
2
You can use .replace() with regex=True to do that, i.e.:
import pandas as pd
df = pd.DataFrame({'A':['Nice to meet you www.xy.com amazing','Wow https://www.goal.com','Amazing http://Goooooo.com']})
# Remove http(s) links first, then any remaining bare www links
df['A'] = df['A'].replace(r'http\S+', '', regex=True).replace(r'www\S+', '', regex=True)
Output:
A
0 Nice to meet you amazing
1 Wow
2 Amazing

Bharath M Shetty
0
I think you could do something as simple as
for index, row in data.iterrows():
    desc = row['status_message'].split()
    # Compare in lowercase so 'WWW.' and 'HTTP' match too, without lowercasing the kept words
    print(' '.join(word for word in desc if not word.lower().startswith(('www.', 'http'))))
as long as the URLs start with "www." or "http".
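The loop above only prints each cleaned message; it does not change the DataFrame. Here is a minimal sketch of the same word-filtering idea written back to the column with `apply`. The helper name `drop_url_words` and the sample rows are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the question
data = pd.DataFrame({'status_message': ['Vote for Labour www.labour.org.uk register now',
                                        'HTTP://lab.net dud ff a']})

def drop_url_words(text):
    # Hypothetical helper: keep the words that do not start with 'www.' or 'http'
    # (checked in lowercase, so the original casing of kept words is preserved)
    words = text.split()
    return ' '.join(w for w in words if not w.lower().startswith(('www.', 'http')))

# Store the cleaned text back in the column instead of printing it
data['status_message'] = data['status_message'].apply(drop_url_words)
print(data['status_message'].tolist())
```

Because this splits on whitespace and re-joins with single spaces, it also normalizes spacing, but it will miss a URL that is glued to other text without a space before it.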

Gayatri