
I'm getting JSON data from the Facebook Graph API about:

  1. my relationship with my friends
  2. my friends' relationships with each other.

Right now my program looks like this (in Python pseudo code; please note some variables have been changed for privacy):

import json
import requests

# protected
_accessCode = "someAccessToken"
_accessStr = "?access_token=" + _accessCode
_myID = "myIDNumber"

r = requests.get("https://graph.facebook.com/" + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)

terminate = len(raw["data"])

# list used to store the friend/friend relationships
a = list()

for j in range(0, terminate):
    # calculate terminating displacement:
    term_displacement = terminate - (j + 1) 
    print("Currently processing: " + str(j) + " of " + str(terminate))
    for dj in range(1, term_displacement + 1):
        # construct urls based on the raw data:
        url = "https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/" + raw["data"][j + dj]["id"] + "/" + _accessStr
        # visit site *THIS IS THE BOTTLENECK*:
        reqTemp = requests.get(url)
        rawTemp = json.loads(reqTemp.text)
        if len(rawTemp["data"]) != 0:
            # data dumps to list which dumps to file
            a.append(str(raw["data"][j]["id"]) + "," + str(rawTemp["data"][0]["id"]))

outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")

# write all me/friend relationships to file
for k in range(0, terminate):
    output.write(_myID + "," + raw["data"][k]["id"] + "\n")

# write all friend/friend relationships to file
for i in range(0, len(a)):
    output.write(a[i] + "\n")

output.close()

So what it's doing is: first it calls my page and gets my friend list (this is allowed through the Facebook API using an access_token). Calling a friend's friend list is NOT allowed, but I can work around that by requesting the relationship between a friend on my list and another friend on my list. So in part two (the double for loop) I'm making another request to see whether some friend, a, is also a friend of b (both of whom are on my list); if so, the response is a JSON object of length one containing friend a's name.
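
For illustration, the pairwise check described above comes back shaped roughly like this (the id here is made up); an empty data list means the two friends are not connected:

# hypothetical parsed response for GET /<friend_a_id>/friends/<friend_b_id>/
rawTemp = {"data": [{"name": "Some Friend", "id": "100000000000001"}]}   # connected
# rawTemp = {"data": []}                                                 # not connected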

But with about 357 friends that means tens of thousands of page requests, one per pair: 357 × 356 / 2 ≈ 63,500. In other words, the program spends most of its time just waiting around for the JSON requests.

My question, then: can this be rewritten to be more efficient? Currently, due to security restrictions, reading a friend's friend list directly is disallowed, and it doesn't look like the API will allow this. Are there any Python tricks that can make this run faster? Maybe parallelism?
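
To make the parallelism idea concrete, here is a minimal sketch that fires the pairwise requests from a thread pool. It assumes the raw friend list and _accessStr defined above, Python 3's concurrent.futures (or the futures backport on Python 2), and an arbitrary pool size of 10; Facebook's rate limits still apply:

import json
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

import requests

def check_pair(pair):
    # return "idA,idB" if the two friends are connected, otherwise None
    a_id, b_id = pair
    url = ("https://graph.facebook.com/" + a_id
           + "/friends/" + b_id + "/" + _accessStr)
    rawTemp = json.loads(requests.get(url).text)
    return a_id + "," + b_id if len(rawTemp["data"]) != 0 else None

# every unordered pair of friend ids from the first request
pairs = combinations([f["id"] for f in raw["data"]], 2)

# the pool keeps several requests in flight at once instead of one at a time
with ThreadPoolExecutor(max_workers=10) as pool:
    results = [r for r in pool.map(check_pair, pairs) if r is not None]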

Update: modified code is pasted below in the answers section.

franklin
  • I would be worried about running into an API call limit with this type of query. Have you tried using FQL to make this request? – DMCS Dec 31 '12 at 21:45
  • I had suspected there was a better way; for example, the Wolfram|Alpha site makes the request in less than a minute. But no, I have never used FQL. Do you have any suggestions as to the implementation? I am intent on sticking to Python. – franklin Dec 31 '12 at 21:50
  • Python Requests can get results asynchronously; see http://stackoverflow.com/questions/9110593/asynchronous-requests-with-python-requests – Gerrat Dec 31 '12 at 21:52
  • You can use Python to do FQL. FQL is Facebook query language that you script up and pass to a normal Graph API call via Python. See https://developers.facebook.com/docs/reference/fql/ for information on the FQL syntax. – DMCS Dec 31 '12 at 21:53
  • BTW, I've answered this question before (http://stackoverflow.com/questions/9788392/build-social-graph-of-friends), and the answer I gave is still relevant today. This will save you the thousands of permutations of calls. – DMCS Dec 31 '12 at 21:54
  • Nice. @Gerrat, does this mean that requests can be gathered concurrently? – franklin Dec 31 '12 at 21:54
  • Yes... asynchronously means your requests are run in parallel instead of synchronously. You may still run into Facebook API limits, though (I think it limits how many requests you can make per hour). – Gerrat Dec 31 '12 at 22:05
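
A minimal sketch of the FQL route DMCS suggests: it assumes the (since-retired) FQL friend table and the graph.facebook.com/fql endpoint that existed at the time, reuses _accessCode from the question, and the query itself is illustrative rather than taken from the linked answer:

import json

import requests

# Hypothetical FQL version: one query instead of thousands of pairwise calls.
# The friend table lists (uid1, uid2) pairs; restricting both sides to my own
# friend list yields the friend/friend edges I'm allowed to see.
fql = ("SELECT uid1, uid2 FROM friend "
       "WHERE uid1 IN (SELECT uid2 FROM friend WHERE uid1 = me()) "
       "AND uid2 IN (SELECT uid2 FROM friend WHERE uid1 = me())")

resp = requests.get("https://graph.facebook.com/fql",
                    params={"q": fql, "access_token": _accessCode})
edges = json.loads(resp.text)["data"]

for edge in edges:
    print(str(edge["uid1"]) + "," + str(edge["uid2"]))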

2 Answers


Update: this is the solution I came up with. Thanks @DMCS for the FQL suggestion, but I just decided to use what I had. I will post the FQL solution when I get a chance to study the implementation. As you can see, this method just makes use of more condensed API calls: one mutualfriends request per friend instead of one request per pair.

Incidentally, for future reference, the API call limit is 600 calls per 600 seconds, per token and per IP. So for every unique IP address with a unique access token, the number of calls is limited to an average of 1 call per second. I'm not sure what that means for asynchronous calling, @Gerrat, but there is that.

import json
import requests

# protected
_accessCode = "someaccesscode"
_accessStr = "?access_token=" + _accessCode
_myID = "someidnumber"

r = requests.get("https://graph.facebook.com/" 
    + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)

terminate = len(raw["data"])

a = list()
for k in range(0, terminate - 1):
    friendID = raw["data"][k]["id"]
    friendName = raw["data"][k]["name"]
    url = ("https://graph.facebook.com/me/mutualfriends/" 
        + friendID + _accessStr)
    req = requests.get(url)
    temp = json.loads(req.text)
    print("Processing: " + str(k + 1) + " of " + str(terminate))
    for j in range(0, len(temp["data"])):
        a.append(friendID + "," + temp["data"][j]["id"] + "," 
            + friendName + "," + temp["data"][j]["name"])

# dump contents to file:
outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")
print("Dumping to file...")
# write all me/friend relationships to file
for k in range(0, terminate):
    output.write(_myID + "," + raw["data"][k]["id"] 
        + ",me," + str(raw["data"][k]["name"].encode("utf-8", "ignore")) + "\n")

# write all friend/friend relationships to file
for i in range(0, len(a)):
    output.write(str(a[i].encode("utf-8", "ignore")) + "\n")

output.close()  
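
Given the 600 calls per 600 seconds limit mentioned above, one simple way to stay inside it is to pace the requests. A minimal sketch, assuming an average of one call per second is the target rate; throttled_get is a hypothetical helper, not part of the Graph API:

import time

import requests

# keep roughly one Graph API call per second, matching 600 calls / 600 seconds
_MIN_INTERVAL = 1.0          # seconds between successive calls
_last_call = [0.0]           # mutable holder so the function can update it

def throttled_get(url):
    wait = _MIN_INTERVAL - (time.time() - _last_call[0])
    if wait > 0:
        time.sleep(wait)
    _last_call[0] = time.time()
    return requests.get(url)

Swapping requests.get(url) in the loop above for throttled_get(url) keeps the script inside the quota without any other changes.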
franklin
  • I haven't been searching specifically for this, but your code helped me to understand how to use Facebook API requests with a URL. Thanks! – rzaaeeff Jul 01 '15 at 15:34

This likely isn't optimal, but I tweaked your code a bit to use the Requests async method (untested):

import json
import requests
from requests import async
from functools import partial

# protected
_accessCode = "someAccessToken"
_accessStr = "?access_token=" + _accessCode
_myID = "myIDNumber"

r = requests.get("https://graph.facebook.com/" + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)

terminate = len(raw["data"])

# list used to store the friend/friend relationships
a = list()

def add_to_list(friend_id, reqTemp):
    # friend_id is bound with functools.partial below; reading the loop
    # variable j from inside the hook would only ever see its final value
    rawTemp = json.loads(reqTemp.text)
    if len(rawTemp["data"]) != 0:
        # data dumps to list which dumps to file
        a.append(str(friend_id) + "," + str(rawTemp["data"][0]["id"]))

async_list = []
for j in range(0, terminate):
    # calculate terminating displacement:
    term_displacement = terminate - (j + 1) 
    print("Currently processing: " + str(j) + " of " + str(terminate))
    for dj in range(1, term_displacement + 1):
        # construct urls based on the raw data:
        url = "https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/" + raw["data"][j + dj]["id"] + "/" + _accessStr

        req = async.get(url, hooks = {'response': partial(add_to_list, raw["data"][j]["id"])})
        async_list.append(req)

# gather up all the results
async.map(async_list)

outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")
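
Note that requests.async was later removed from Requests and split out into the separate grequests package, which keeps the same gevent-based interface. A minimal sketch of the same idea with grequests; url_list is a hypothetical list of (friend_id, url) pairs built the same way as the loop above:

import json

import grequests

# url_list is assumed to hold (friend_id, url) pairs built as in the loop above
url_list = []

# build all requests up front, fire them concurrently, then read the results
# back in order; matching responses by position avoids the hook/closure issue
reqs = [grequests.get(url) for _, url in url_list]
responses = grequests.map(reqs)

a = []
for (friend_id, _), resp in zip(url_list, responses):
    if resp is None:
        continue  # the request failed or timed out
    rawTemp = json.loads(resp.text)
    if len(rawTemp["data"]) != 0:
        a.append(str(friend_id) + "," + str(rawTemp["data"][0]["id"]))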
Gerrat