0

My code is:

import json
import requests

if __name__ == '__main__':
    json_data = requests.get("https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds?articleIds=PMC%3A4771370&section=Abstract&provider=Europe%20PMC&format=JSON").content
    r = json.loads(json_data)
    df = json_to_dataframe(r)
    print(df)

My only problem is how to run this for multiple IDs; I have at least thousands of IDs in a file. Please help. I'm using Python.

Arvind

2 Answers

0

Assuming you know Python and can get all the IDs from the file into a list article_ids, you can use the following script:

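import json

import requests

URL = 'https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds'

# Replace this with the full list of IDs read from your file
article_ids = ['PMC:4771370']

for article_id in article_ids:
    # requests URL-encodes these values when it builds the query string
    params = {
        'articleIds': article_id,
        'section': 'Abstract',
        'provider': 'Europe PMC',
        'format': 'JSON'
    }
    json_data = requests.get(URL, params=params).content
    r = json.loads(json_data)
    # json_to_dataframe is the asker's own function (not shown here)
    df = json_to_dataframe(r)
    print(df)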
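
To fill article_ids from a file, a minimal sketch follows. It assumes the file holds one ID per line, already in the SOURCE:EXTERNAL_ID form; the function name read_article_ids is made up for illustration. A common pitfall with readlines() is that every entry keeps its trailing newline, which then ends up in the request, so each line is stripped here.

```python
def read_article_ids(path):
    """Return the non-empty lines of the file at `path`, stripped of whitespace.

    Assumes one article ID per line (e.g. PMC:4771370).
    """
    with open(path, encoding="utf-8") as handle:
        # Iterating over the file yields lines with their trailing "\n";
        # strip() removes it, and blank lines are skipped.
        return [line.strip() for line in handle if line.strip()]
```

With a file called, say, ids.txt (a placeholder name), `article_ids = read_article_ids("ids.txt")` would replace the hard-coded list in the script above.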
Siddhartha
  • Hi Siddhartha, I have updated the info in the question; can you please take another look? – Arvind Feb 05 '22 at 10:44
  • Thanks, Siddhartha. Can you also tell me how to open my file? I'm using readlines, which is not giving the results. – Arvind Feb 05 '22 at 11:04
  • @Arvind, perhaps a [Python tutorial](https://www.freecodecamp.org/news/python-open-file-how-to-read-a-text-file-line-by-line/) can help. – Siddhartha Feb 05 '22 at 11:12
  • @Siddhartha I already tried that; maybe I'm implementing it wrong. Can you help? – Arvind Feb 05 '22 at 11:21
0

After analyzing the shared URL and reading the URL Encodings article, I observed that each value passed to annotationsByArticleIds in the articleIds parameter has the format SOURCE:EXTERNAL_ID.

TEST1: If you hit the URL:

https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds?articleIds=PMC

Output is: It must contain values with format SOURCE:EXTERNAL_ID where SOURCE must have one of the following values [PMC, MED, PAT, AGR, CBA, HIR, CTX, ETH, CIT, PPR, NBK] and EXTERNAL_ID must be a number when SOURCE=PMC

  • The output above lists the possible sources
  • Each source is separated from its EXTERNAL_ID by a colon
  • A colon is represented by %3A in URL encoding
  • To separate one SOURCE:EXTERNAL_ID pair from the next, you can use a comma
  • A comma is represented by %2C in URL encoding

ANSWER: So to fetch multiple articles, you could generate a string of article IDs in the format SOURCE1:EXTERNAL_ID1,SOURCE2:EXTERNAL_ID2,...,SOURCEn:EXTERNAL_IDn and append it to the main URL
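
As a rough sketch of that string building, the snippet below joins the IDs with commas and lets the standard library do the percent-encoding, so %3A and %2C never have to be typed by hand. The function name build_url is made up for illustration, and the other query parameters are copied from the URL in the question.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"


def build_url(article_ids):
    """Build one request URL covering all of `article_ids`.

    Each ID is expected in the SOURCE:EXTERNAL_ID form, e.g. PMC:4771370.
    """
    params = {
        "articleIds": ",".join(article_ids),  # comma-separated pairs
        "section": "Abstract",
        "provider": "Europe PMC",
        "format": "JSON",
    }
    # quote_via=quote encodes spaces as %20 (instead of "+"),
    # and ':' / ',' as %3A / %2C.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```

Called with the single ID from the question, `build_url(["PMC:4771370"])` reproduces the URL the asker is already using; with several IDs, the pairs appear joined by %2C.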

A few limitations:

  • The maximum URL length may be limited to 2048 characters
  • Depending on the length of the IDs, that allows roughly 150 to 200 articles per request
  • You could loop over the IDs in batches of 150 and fetch the required information for each batch
  • Would that be a good idea for 100k IDs? I'm new to programming, so it would help if you could provide some code. – Arvind Feb 05 '22 at 11:13
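
For a long list such as 100k IDs, the batching idea could be sketched roughly as below. The helper names batched and batch_params are made up for illustration, and the batch size is left as a parameter on purpose: the 150 figure is only an estimate from URL length, and the service may enforce its own, lower cap on IDs per request, so it is worth checking the API documentation before settling on a value.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def batch_params(article_ids, batch_size):
    """Yield one query-parameter dict per batch of article IDs."""
    for batch in batched(article_ids, batch_size):
        yield {
            "articleIds": ",".join(batch),  # SOURCE:ID pairs joined by commas
            "section": "Abstract",
            "provider": "Europe PMC",
            "format": "JSON",
        }
```

Each dict can then be passed as `params=` to `requests.get(URL, params=params)`, exactly as in the loop from the first answer. With this many requests, pausing briefly between calls (for example with `time.sleep`) and collecting the per-batch results in a list before combining them is probably a sensible precaution.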