
I have a list of places from an Excel file which I would like to enrich with GeoNames IDs. Starting from the Excel file I made a pandas DataFrame, and I would like to use the values from the DF as params in my request. Here is the script I made:

import pandas as pd
import requests
import json

require_cols = [1]

# only read specific columns from an excel file
required_df = pd.read_excel('grp.xlsx', usecols=require_cols)
print(required_df)

url = 'http://api.geonames.org/searchJSON?'

params = {'username': "XXXXXXXX",
          'name_equals': (required_df),
          'maxRows': "1"}

e = requests.get(url, params=params)

pretty_json = json.loads(e.content)
print(json.dumps(pretty_json, indent=2))

The problem is related to the definition of this parameter:


 'name_equals': (required_df)

I would like to use the places (around 15k) from the DF as the param, iteratively retrieve the related GeoNames IDs, and write the output to a separate Excel file.

The simple request works:

import requests
import json

url = 'http://api.geonames.org/searchJSON?'

params = {'username': "XXXXXXX",
          'name_equals': "Aire",
          'maxRows': "1"}

e = requests.get(url, params=params)

pretty_json = json.loads(e.content)
print(json.dumps(pretty_json, indent=2))
#print(e.content)

As does the definition of the pandas DataFrame:

# import pandas lib as pd
import pandas as pd

require_cols = [0, 1]

# only read specific columns from an excel file
required_df = pd.read_excel('grp.xlsx', usecols=require_cols)

print(required_df)

I also tried via SPARQL without results, so I decided to go the Python route.

Thanks for your time.

Constantin Groß
Pelide
  • use a loop to get every element separately and use it in `requests` - i.e. `for item in required_df['Places']: requests(..., params={'name_equals': item})` – furas Jul 28 '19 at 10:16

2 Answers


You can use a for-loop:

import pandas as pd

df = pd.DataFrame({'Places': ['London', 'Paris', 'Berlin']})

for item in df['Places']:
    print('requests for:', item)
    # ... rest of code ...

or `df.apply()`:

import pandas as pd

def run(item):
    print('requests for:', item)
    # ... rest of code ...
    return 'result for ' + item

df = pd.DataFrame({'Places': ['London', 'Paris', 'Berlin']})

df['Results'] = df['Places'].apply(run)
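Combining the loop idea with the request from the question, one way to structure it is to keep the JSON parsing in its own function, so it can be checked without hitting the API. This is only a sketch: `YOUR_USERNAME`, the helper names, and the sample response below are stand-ins, not part of GeoNames or pandas.

```python
URL = 'http://api.geonames.org/searchJSON'

def extract_geoname_id(response_json):
    """Return the first geonameId in a searchJSON response, or None if no match."""
    matches = response_json.get('geonames', [])
    return matches[0]['geonameId'] if matches else None

def fetch_geoname_id(place, username='YOUR_USERNAME'):
    """Query the GeoNames search API for a single place name (needs network access)."""
    import requests  # imported here so the offline demonstration below runs without it
    params = {'username': username, 'name_equals': place, 'maxRows': '1'}
    return extract_geoname_id(requests.get(URL, params=params).json())

# The parsing step can be checked offline against a sample response:
sample = {'totalResultsCount': 1,
          'geonames': [{'geonameId': 3037598, 'name': 'Aire'}]}
print(extract_geoname_id(sample))            # 3037598
print(extract_geoname_id({'geonames': []}))  # None
```

With a valid username, something like `df['geonameId'] = df['Places'].apply(fetch_geoname_id)` would fill a new column; for ~15k places it is worth throttling the requests to stay within the API's rate limits.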
furas

Thanks @furas for your reply.

I solved it like this:


import pandas as pd
import requests
import json

url = 'http://api.geonames.org/searchJSON?'

df = pd.read_excel('Book.xlsx', sheet_name='Sheet1', usecols="B")

for item in df.place_name:

    params = {'username': "XXXXXX",
              'name_equals': item,
              'maxRows': "1"}

    e = requests.get(url, params=params)

    pretty_json = json.loads(e.content)

    for result in pretty_json["geonames"]:
        print(json.dumps(result["geonameId"], indent=2))

        with open('data.json', 'w', encoding='utf-8') as f:
            json.dump(result["geonameId"], f, ensure_ascii=False, indent=4)

    #print(e.content)

The only problem now is related to the JSON output: the print shows the complete list of IDs, but when I write the output to a file I get just the last ID from the list.

Pelide
  • if you have a new problem then click the `Add Question` button and create a new question with all the information. – furas Jul 29 '19 at 10:25
  • to get all elements in the file you have to open the file before the `for`-loop and close it after the `for`-loop. OR you have to open it inside the `for`-loop but in `append` mode - `open(..., 'a')`. Mode `w` always deletes the previous content. – furas Jul 29 '19 at 10:26
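Following the comments above, a minimal sketch of the fix: gather the IDs in a list during the loop and write the file once afterwards, so mode `'w'` never overwrites earlier results. The response dicts here are stand-ins for the parsed API output.

```python
import json

# stand-ins for the parsed geonames responses collected in the loop
results = [{'geonames': [{'geonameId': 3037598}]},
           {'geonames': [{'geonameId': 2988507}]}]

geoname_ids = []
for pretty_json in results:
    for item in pretty_json['geonames']:
        geoname_ids.append(item['geonameId'])

# One write after the loop keeps every ID; opening with mode 'w' inside
# the loop would truncate the file on each iteration, leaving only the last ID.
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(geoname_ids, f, ensure_ascii=False, indent=4)

print(geoname_ids)  # [3037598, 2988507]
```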