This is the Python script I built in a Jupyter Notebook to query my own Instagram account. I cannot get all of the data into a pandas DataFrame. Can anyone help me?

    from instagram.client import InstagramAPI    
    import pandas as pd    
    import requests  
    import json   

    access_token = "xxxx"  
    client_secret = "xxxx"  

    recentMediaResponse = requests.get("https://api.instagram.com/v1/users/self/media/recent/",params = {"access_token": access_token})
    recentMediaJson = json.loads(recentMediaResponse.text)     

    DataDF = pd.DataFrame(columns = ['numComments', 'likes', 'id', 'tags'])

    numcomments = pd.DataFrame({'numComments' : [recentMediaJson['data'][1]['comments']['count']],
                               'likes': [recentMediaJson['data'][1]['likes']['count']],
                               'id': [recentMediaJson['data'][1]['id']],
                               'tags': [recentMediaJson['data'][1]['tags']]


                               })
    Final = DataDF.append(numcomments)
    print Final

When I print the 'Final' variable, I only get one id and its corresponding values.

I know there is a lot more data than that, because the 'recentMediaJson' response contains many more posts.

Ulises Sotomayor

2 Answers

In this section you are selecting a single element from the list of dictionaries, recentMediaJson['data'][1] (index 1 is actually the second entry, since Python lists are zero-indexed). You need to loop through all entries in the list and append a row to DataDF for each one.

    numcomments = pd.DataFrame({'numComments' : [recentMediaJson['data'][1]['comments']['count']],
                                'likes': [recentMediaJson['data'][1]['likes']['count']],
                                'id': [recentMediaJson['data'][1]['id']],
                                'tags': [recentMediaJson['data'][1]['tags']]
                                })
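
The loop described above might look like the following minimal sketch. It runs on a small made-up sample (`sample_json`) instead of the live API response, and the helper name `media_to_rows` is hypothetical. It assumes every entry carries the same 'comments', 'likes', 'id' and 'tags' keys used in the question.

```python
import pandas as pd

# Hypothetical sample standing in for recentMediaJson; the real response
# is assumed to have the same layout: a 'data' list with one dict per post.
sample_json = {
    "data": [
        {"id": "1", "comments": {"count": 3}, "likes": {"count": 10}, "tags": ["a"]},
        {"id": "2", "comments": {"count": 0}, "likes": {"count": 5}, "tags": []},
    ]
}


def media_to_rows(media_json):
    """Return one dict per post found in the 'data' list."""
    rows = []
    for post in media_json["data"]:
        rows.append({
            "numComments": post["comments"]["count"],
            "likes": post["likes"]["count"],
            "id": post["id"],
            "tags": post["tags"],
        })
    return rows


# Build the DataFrame once from the collected rows.
DataDF = pd.DataFrame(media_to_rows(sample_json),
                      columns=["numComments", "likes", "id", "tags"])
print(DataDF)
```

Collecting the rows in a list and building the DataFrame once also avoids DataFrame.append, which was removed in pandas 2.0.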

You can also inspect the JSON object that the API returns much more easily with pprint. Import pprint and run pprint.pprint(recentMediaJson), and you will see the structure much more clearly.
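
As a small illustration of the pprint suggestion, here is a sketch on a made-up dictionary (`sample_json` is hypothetical, not real API output). The `depth` argument is optional; it collapses anything nested deeper than the given level, which can help when the response is large.

```python
import pprint

# Hypothetical sample standing in for recentMediaJson.
sample_json = {
    "data": [
        {"id": "1", "comments": {"count": 3}, "likes": {"count": 10}, "tags": ["a"]},
        {"id": "2", "comments": {"count": 0}, "likes": {"count": 5}, "tags": []},
    ]
}

# Full structure, one key per line where it does not fit on a single line.
pprint.pprint(sample_json)

# Collapsed view: containers nested deeper than two levels are abbreviated.
formatted = pprint.pformat(sample_json, depth=2)
print(formatted)
```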

aegon52

You are only including one record from the JSON data.

I got my answer from here: JSON to pandas DataFrame

This is untested, but something like this should work.

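    data = json.loads(recentMediaResponse.text)
    # One list per column; 'ids' avoids shadowing the built-in id().
    numComments, likes, ids, tags = [], [], [], []
    # The posts live under the 'data' key, as in recentMediaJson['data'] in the question.
    for result in data['data']:
        numComments.append(result['comments']['count'])
        likes.append(result['likes']['count'])
        ids.append(result['id'])
        tags.append(result['tags'])
    # Each list becomes a row here, so transpose to get one row per post.
    df = pd.DataFrame([numComments, likes, ids, tags]).T
    df.columns = ['numComments', 'likes', 'id', 'tags']
    print(df)
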
bikemule
  • Thank you, this worked! Do you happen to know how I can get more than 20 posts per query? – Ulises Sotomayor Sep 13 '17 at 23:01
  • Glad to help. Not sure about posts per query, but I would check the Instagram API documentation. It's almost definitely something you would put in the params dict. Would you please mark this as the correct answer and upvote it? – bikemule Sep 14 '17 at 18:07