
I created a scraper with Python that gets all the followers of a particular Twitter user. The issue is that when I use this list of user IDs to get their tweets with Logstash, I get an error. I used http://gettwitterid.com/ to manually check whether these IDs are working, and they are, but the list is far too long to check one by one.

Is there a solution in Python to split the IDs into two lists, one containing the valid IDs and the other containing the invalid ones, so that I can use the valid list as input for Logstash? The first 10 rows of the CSV file look like this:

"id"
"602169027"
"95104995"
"874339739557670912"
"2981270769"
"93054327"
"870723159011545088"
"3008493180"
"874804469082533888"
"756339889092829184"
"1077712806"

I tried this code to get tweets using the IDs imported from the CSV, but unfortunately it raises error 144 (Not Found):

import tweepy
import pandas as pd

consumer_key = ""
consumer_secret = ""
access_token_key = "-"
access_token_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

dfuids = pd.read_csv('Uids.csv')
for index, row in dfuids.iterrows():
    print row['id']
tweet = api.get_status(dfuids['id'])


lazurens
  • Isn't a 144 exactly what you'd expect if an ID is invalid? Since your doc contains invalid IDs? Also, why do you call `get_status` from outside your loop and from `dfuids` rather than `row`? – patrick Jul 18 '17 at 21:10
  • The problem is that all the IDs got 144 (Not Found). Those IDs with 144 I manually found to be valid using the website in the question. Not all IDs are invalid, only some of them, and since the list is long I couldn't check it manually! I am also a beginner Python user, so I don't trust my coding skills! – lazurens Jul 18 '17 at 22:13

2 Answers


Try to change your code to this:

for index, row in dfuids.iterrows():
    print row['id']
    tweet = api.get_status(row['id'])

To guard against potential errors, you can wrap the call in a try/except block.
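For example, a minimal sketch (assuming Tweepy 3.x, where API errors raise `tweepy.TweepError`):

for index, row in dfuids.iterrows():
    try:
        # fetch the tweet for this ID; an invalid ID raises TweepError
        tweet = api.get_status(row['id'])
    except tweepy.TweepError as e:
        print row['id'], e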

patrick
  • I think the problem is that when reading the IDs from the CSV they are not kept in the same format: the original ID is "602169027", but when printed on screen after importing the IDs with Python it becomes 602169027.0 – lazurens Jul 18 '17 at 22:44
  • @lazurens can you paste an excerpt from your csv file? Also, can you run `dfuid.head()` and `print type(row['id'])` (the last one in the loop)? That should give us an idea – patrick Jul 18 '17 at 23:30
  • Thanks for the help, and fortunately I got the solution after some experiments – lazurens Jul 19 '17 at 20:57

I got the solution after some experiments:

dfuids = pd.read_csv('Uids.csv')
valid = []
notvalid = []
for index, row in dfuids.iterrows():
    print index
    x = str(row.id)
    #print x , type(x)
    try:
        # if the ID is valid, fetching the user's timeline succeeds
        tweet = api.user_timeline(row.id)
        #print "Fine :",row.id
        valid.append(x)
        #print x, "added to valid"
    except:
        # any exception (e.g. 144 Not Found) means the ID is treated as invalid
        #print "NotOk :",row.id
        notvalid.append(x)
        #print x, "added to notvalid"

This part of the code was what I needed: it loops over all the IDs and tests whether each user ID returns some tweets from the timeline. If it does, the ID is appended as a string to the list called valid; if an exception is raised for any reason, it is appended to notvalid.

We can then save these lists into DataFrames and export them to CSV:

df = pd.DataFrame(valid)
dfnotv = pd.DataFrame(notvalid)
df.to_csv('valid.csv', index=False, encoding='utf-8')
dfnotv.to_csv('notvalid.csv', index=False, encoding='utf-8')
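As a side note on the float issue from the comments above ("602169027" becoming 602169027.0): this is presumably pandas inferring a numeric dtype for the column, and forcing the id column to be read as strings should avoid it, e.g.:

dfuids = pd.read_csv('Uids.csv', dtype={'id': str})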
lazurens