1

I am using tweepy to fetch tweets for one or more hashtags, and I then pass them to a black box for processing. However, tweets containing a URL should not be sent. What is the most appropriate way to exclude such tweets?

3 Answers

5

Add -filter:links to your query. This excludes tweets containing URLs.
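
As a minimal sketch of what that query string could look like: `build_query` below is a hypothetical helper, not part of tweepy, and the result is meant to be passed as the query argument of whichever tweepy search call your version provides.

```python
def build_query(hashtag):
    """Return a search query for one hashtag that excludes tweets with links."""
    # Accept the tag with or without a leading '#'
    tag = hashtag.lstrip("#")
    return f"#{tag} -filter:links"


# One query per hashtag of interest
queries = [build_query(tag) for tag in ["python", "#tweepy"]]
print(queries)
```

Because the filtering happens on Twitter's side, tweets with links never reach your script, so no post-processing is needed.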

JeffProd
1

To go with @Colin's suggestion, this question covers the issue of finding urls with regex.

An example code snippet would be:

import re

# tweet_list is a list of tweet strings to filter
# Raw string so the regex escapes (\w, \d) reach the re module unchanged
pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
# Keep only the tweets in which the pattern finds no URL
filtered_tweet_list = [tweet for tweet in tweet_list if not re.search(pattern, tweet)]
Pax Vobiscum
0

You can also skip tweets with URLs while iterating over the results:

if 'http://' not in tweet.text and 'https://' not in tweet.text:
    pass  # do something here, e.g. get the tweet or, in your case, send it
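
The same check can be sketched as a small self-contained filter. The names `has_url` and `without_urls` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins for tweets, assuming only that each tweet exposes its text through a `.text` attribute.

```python
from types import SimpleNamespace


def has_url(text):
    """Return True if the text contains an http:// or https:// link."""
    return "http://" in text or "https://" in text


def without_urls(tweets):
    """Yield only the tweets whose text contains no link."""
    for tweet in tweets:
        if not has_url(tweet.text):
            yield tweet


# Stand-in objects for illustration; not real tweepy results
sample = [
    SimpleNamespace(text="Loving #python today"),
    SimpleNamespace(text="Read this https://t.co/abc123 #python"),
]
clean = list(without_urls(sample))
print([tweet.text for tweet in clean])
```

This works equally well on tweets that were collected earlier, since it only inspects the stored text.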
Lyrax
  • This does not answer the question. Hence, it should be removed... came here from review – finnmglas Sep 15 '20 at 18:47
  • I know, hence my use of 'also'. My answer is intended to help other programmers coming here to get ideas to edit their scripts to fit this need. Also, this same method can be used on already scraped tweets to exclude those with URLs! – Lyrax Sep 16 '20 at 20:08