
I am trying to retrieve all journals that exist within a Scopus subject area, say 'Medicine', using the Python package pybliometrics.

According to the Scopus search (online), there are 13,477 Journals in this category.

I access the Serial Title API of Scopus via pybliometrics.scopus.SerialSearch(). For the category Medicine, subjArea='MEDI' and subjCode='2700'. The codes associated with the Scopus subject categories are listed here

With subjCode='2700' I am not able to get more than 5,000 journals. With subjArea='MEDI', however, I am able to retrieve more than 5,000 journals, but not more than 10,000.

I do not understand why searching with subjArea and subjCode fetches different results for me. Can anyone help me understand why this could be happening?

I am adding my code for both these search queries for better understanding:

import pandas as pd
from pybliometrics.scopus import SerialSearch

def search_by_subject_area(subject_area):
    """Collect journals for a subject area abbreviation such as 'MEDI'."""
    print("Searching journals by subject area....")
    df = pd.DataFrame()
    i = 0
    # The i < 10000 limit is needed; without it a Scopus500 error is raised
    while i < 10000:
        # start is the offset of the first result to fetch
        s = SerialSearch(query={"subj": str(subject_area)}, start=str(i), refresh=True)
        if s.get_results_size() == 0:
            break
        i += s.get_results_size()
        df = pd.concat([df, pd.DataFrame(s.results)], axis=0, ignore_index=True)
    print(i, "journals obtained!")
    return df

def search_by_subject_code(code):
    """Collect journals for a numeric subject code such as '2700'."""
    print("------------------------------------------------\n Searching journals by subject codes....")
    df = pd.DataFrame()
    i = 0
    while True:
        # start is the offset of the first result to fetch
        s = SerialSearch(query={"subjCode": str(code)}, start=str(i), refresh=True)
        if s.get_results_size() == 0:
            break
        i += s.get_results_size()
        df = pd.concat([df, pd.DataFrame(s.results)], axis=0, ignore_index=True)
    print(i, "journals obtained!")
    return df

if __name__ == '__main__':

    search_by_subject_area(subject_area = 'MEDI')

    search_by_subject_code('2700')

1 Answer


Certain Scopus APIs, including the Serial Search API, are restricted: They do not allow more than 5,000 results.

Other Scopus Search APIs have pagination enabled, which lets you cycle through a potentially unlimited number of results.
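To make the effect of such a cap concrete, here is a minimal sketch of an offset-based paging loop that stops at 5,000 records. It does not call Scopus at all: `fetch_all`, `fetch_page` and `fake_fetch` are hypothetical names introduced only for illustration, and the page size of 200 and total of 12,000 are made-up values.

```python
from typing import Callable, Dict, List

MAX_RESULTS = 5000  # cap for restricted APIs, as stated in this answer


def fetch_all(fetch_page: Callable[[int], List[Dict]],
              cap: int = MAX_RESULTS) -> List[Dict]:
    """Collect pages until one comes back empty or the cap is reached.

    fetch_page is a hypothetical callable: given a start offset, it
    returns the records on that page (an empty list when none are left).
    """
    records: List[Dict] = []
    while len(records) < cap:
        page = fetch_page(len(records))
        if not page:
            break
        # Never keep more than the cap, even if the last page overshoots
        records.extend(page[: cap - len(records)])
    return records


def fake_fetch(start: int, total: int = 12000,
               page_size: int = 200) -> List[Dict]:
    """Stand-in for a real API call; simulates a source with `total` records."""
    stop = min(start + page_size, total)
    return [{"id": n} for n in range(start, stop)]


if __name__ == "__main__":
    journals = fetch_all(fake_fetch)
    print(len(journals), "records collected")
```

With pybliometrics, `fetch_page` could presumably wrap a call like the one in the question, `SerialSearch(query=..., start=..., refresh=True)`, and return `s.results or []`. Treat that as an assumption to check against your installed version, not as a documented recipe.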

MERose
My query was about using the same SerialSearch API with different parameters. These different parameters affect the number of results fetched. – stackersTech101 Jul 04 '22 at 04:47