Why does the Graph API skip feed posts?

Question

I am trying to implement a Facebook scraper to get insights about the reactions to feed posts on Facebook pages. I've noticed that the results (posts) for the current day and the last few days are correct, but the further back in time it goes, the more feed posts get skipped, and the number of returned results is very low.

Why is the Graph API skipping so many posts? Sometimes it even skips entire months!
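One way to quantify the skipping is to list the calendar months that contain no posts at all between the oldest and newest post returned. This is a minimal sketch: `missing_months` is a hypothetical helper, and it assumes `created_time` strings in the `2018-03-01T12:00:00+0000` form that the Graph API returns.

```python
from datetime import datetime


def missing_months(created_times):
    """Return 'YYYY-MM' strings for months with no posts, between the oldest
    and newest post. Assumes Graph API strings like '2018-03-01T12:00:00+0000'."""
    if not created_times:
        return []
    seen = set()
    for ts in created_times:
        dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")
        seen.add((dt.year, dt.month))
    first, last = min(seen), max(seen)
    gaps = []
    year, month = first
    # Walk month by month from the oldest to the newest month seen.
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


print(missing_months([
    "2018-05-02T10:00:00+0000",
    "2018-02-15T08:30:00+0000",
]))  # → ['2018-03', '2018-04']
```

Feeding it the `created_time` values of every collected post shows at a glance whether whole months are absent.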

Here is the code I'm using:

```python
import requests
import pandas as pd

page_id = "nytimes"

# User access token obtained from the Graph API Explorer:
# https://developers.facebook.com/tools/explorer/
token = "my_User_Token_Here"

# Field expansions: shares, comment totals, likes and per-type reaction totals
fields = (
    "id,created_time,message,"
    "shares.summary(true).limit(0),"
    "comments.summary(true).limit(0),"
    "likes.summary(true),"
    "reactions.type(LOVE).limit(0).summary(total_count).as(Love),"
    "reactions.type(WOW).limit(0).summary(total_count).as(Wow),"
    "reactions.type(HAHA).limit(0).summary(total_count).as(Haha),"
    "reactions.type(SAD).limit(0).summary(total_count).as(Sad),"
    "reactions.type(ANGRY).limit(0).summary(total_count).as(Angry)"
)

url = "https://graph.facebook.com/v2.12/" + page_id + "/posts"
params = {"fields": fields, "access_token": token, "limit": 100}

# created_time values carry a UTC offset (e.g. "2018-03-01T12:00:00+0000"),
# so the cutoff must be timezone-aware too, or pandas refuses to compare them.
cutoff = pd.Timestamp("2018-03-01", tz="UTC")

posts = []

try:
    while url:
        print(url)
        response = requests.get(url, params=params)
        response.raise_for_status()
        json_object = response.json()

        found = False
        # Iterate over the posts actually returned; a page may hold fewer than 100.
        for post in json_object.get("data", []):
            posts.append(post)  # also keeps the first post older than the cutoff
            if pd.to_datetime(post["created_time"]) <= cutoff:
                found = True
                break
        if found:
            break

        # "next" already embeds fields, limit, token and the paging cursor;
        # it is absent on the last page, which ends the loop.
        url = json_object.get("paging", {}).get("next")
        params = None
except Exception as ex:
    print(ex)

df = pd.DataFrame(posts)
```
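To test the paging logic without hitting the network, the loop can be factored into a function that receives the "fetch one page" step as a parameter. This is a sketch: `collect_until` and the fake `page1`/`page2` URLs are hypothetical, and it assumes only the `data` and `paging.next` keys of the Graph API response.

```python
from datetime import datetime, timezone

GRAPH_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # e.g. 2018-03-01T12:00:00+0000


def collect_until(fetch, first_url, cutoff):
    """Follow paging.next links until a post at or before `cutoff` appears.

    fetch(url) must return the decoded JSON dict for one page of results.
    As in the question's loop, the first post older than the cutoff is kept.
    """
    posts = []
    url = first_url
    while url:
        page = fetch(url)
        for post in page.get("data", []):
            posts.append(post)
            created = datetime.strptime(post["created_time"], GRAPH_TIME_FORMAT)
            if created <= cutoff:
                return posts  # reached the cutoff: no need to fetch more pages
        # A page without paging.next is the last one.
        url = page.get("paging", {}).get("next")
    return posts


# Offline demo: fake pages keyed by URL instead of real HTTP requests.
fake_pages = {
    "page1": {
        "data": [{"id": "a", "created_time": "2018-03-05T09:00:00+0000"}],
        "paging": {"next": "page2"},
    },
    "page2": {
        "data": [
            {"id": "b", "created_time": "2018-03-02T09:00:00+0000"},
            {"id": "c", "created_time": "2018-02-27T09:00:00+0000"},
            {"id": "d", "created_time": "2018-02-20T09:00:00+0000"},
        ],
    },
}
cutoff = datetime(2018, 3, 1, tzinfo=timezone.utc)
result = collect_until(fake_pages.__getitem__, "page1", cutoff)
print([p["id"] for p in result])  # → ['a', 'b', 'c']
```

With real requests, `fetch` could be `lambda u: requests.get(u).json()`, with the first `/posts` URL (including its query string) as `first_url`. Because the loop only trusts `paging.next`, short pages and the final page no longer cause an `IndexError` or `KeyError`.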

frodik · Answer 1 · 2018-05-25T13:34:23.327

This is a reported bug. Since it was reported, the rules have changed: as of API v2.12, only the top 600 posts per year can be retrieved. This is obviously bad news for developers and researchers.
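To see whether a scrape is running into the limit this answer describes, the collected posts can be counted per year: a year whose count reaches the limit while older months are missing points to a cap rather than to a bug in the paging loop. This sketch takes the 600 figure from this answer as the `cap` parameter, treats "per year" as a calendar year (an assumption), and uses hypothetical helper names.

```python
from collections import Counter


def posts_per_year(created_times):
    """Count posts by year, given Graph API created_time strings."""
    # created_time looks like '2018-03-01T12:00:00+0000'; the first four
    # characters are the year.
    return Counter(ts[:4] for ts in created_times)


def years_near_cap(created_times, cap=600):
    """Return the years whose post count reaches `cap`, sorted."""
    counts = posts_per_year(created_times)
    return sorted(year for year, n in counts.items() if n >= cap)


times = ["2017-06-01T00:00:00+0000"] * 600 + ["2018-01-15T00:00:00+0000"] * 42
print(dict(posts_per_year(times)))  # → {'2017': 600, '2018': 42}
print(years_near_cap(times))        # → ['2017']
```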

edited May 25 '18 at 13:34

answered May 25 '18 at 13:29


What about newer versions of the Graph API? Is there any version where this bug is fixed? And is there any other way to get all the posts now? Do you think nobody scrapes feed data from Facebook pages anymore? – ZelelB May 26 '18 at 22:15
I don't think it is possible, according to the API documentation. You can scrape feed data in real time, but history is limited. Please let me know if I am wrong. – frodik May 28 '18 at 10:30
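If only recent history is reachable, one workaround is to collect posts on a schedule and keep them yourself, asking each time only for posts newer than the last one stored. This is periodic polling, not the Graph API's real-time Webhooks mechanism. A minimal sketch, assuming the Graph API's time-based pagination parameter `since` (a Unix timestamp) is accepted on the page's `/posts` edge; `newest_unix_time` and `incremental_url` are hypothetical helpers, the field list is trimmed for brevity, and the HTTP request itself is left out.

```python
from datetime import datetime
from urllib.parse import urlencode

GRAPH_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # e.g. 2018-03-01T12:00:00+0000


def newest_unix_time(posts):
    """Unix timestamp of the most recent created_time in posts, or None."""
    if not posts:
        return None
    return max(
        int(datetime.strptime(p["created_time"], GRAPH_TIME_FORMAT).timestamp())
        for p in posts
    )


def incremental_url(page_id, token, since, version="v2.12"):
    """Build a /posts request URL; adds the assumed `since` filter if given."""
    params = {"fields": "id,created_time,message", "limit": 100, "access_token": token}
    if since is not None:
        params["since"] = since
    return f"https://graph.facebook.com/{version}/{page_id}/posts?{urlencode(params)}"


# Posts saved by a previous run (in practice loaded from disk or a database).
stored = [{"id": "1", "created_time": "2018-05-28T10:30:00+0000"}]
url = incremental_url("nytimes", "my_User_Token_Here", newest_unix_time(stored))
print(url)
```

Each run would fetch this URL (following `paging.next` as usual), append the new posts to the store, and the next run picks up from the newest stored timestamp.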
