I am trying to retrieve all journals that exist within a subject area of Scopus, say 'Medicine', using the Python package pybliometrics.
According to the Scopus search (online), there are 13,477 Journals in this category.
I am accessing the SerialTitle API of Scopus via pybliometrics.scopus.SerialSearch(). For the category Medicine, subjArea='MEDI' and subjCode='2700'. The codes associated with the Scopus subject categories are listed here.
With subjCode='2700', I am not able to get more than 5000 journals. But with the parameter subjArea='MEDI', I am able to retrieve more than 5000 journals, though not more than 10,000.
I do not understand why searching with subjArea and subjCode fetches different results for me. Can anyone help me understand why this could be happening?
I am adding my code for both these search queries for better understanding:
import pandas as pd
from pybliometrics.scopus import SerialSearch


def search_by_subject_area(subject_area):
    print("Searching journals by subject area....")
    df = pd.DataFrame()
    i = 0  # offset of the next page to request
    # limitation of i<10000 is added otherwise raises error of scopus500
    while i < 10000:
        s = SerialSearch(query={"subj": str(subject_area)}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        i += s.get_results_size()
        df_new = pd.DataFrame(s.results)
        df = pd.concat([df, df_new], axis=0, ignore_index=True)
    print(i, " journals obtained!")
    return df


def search_by_subject_code(code):
    print("------------------------------------------------\n Searching journals by subject codes....")
    df = pd.DataFrame()
    i = 0  # offset of the next page to request
    # no upper limit here: the loop stops only when a page comes back empty
    while True:
        s = SerialSearch(query={"subjCode": str(code)}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        i += s.get_results_size()
        df_new = pd.DataFrame(s.results)
        df = pd.concat([df, df_new], axis=0, ignore_index=True)
    print(i, " journals obtained!")
    return df


if __name__ == '__main__':
    search_by_subject_area(subject_area='MEDI')
    search_by_subject_code('2700')
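
To narrow this down, the paging logic can be separated from the API call so that it is visible why each loop stops. The following is only a minimal sketch under assumptions: `collect_pages`, `make_fake_fetch` and the `fetch_page(start)` callable are hypothetical names made up for illustration and are not part of pybliometrics. The fake fetcher stands in for the real request so the loop can be checked offline.

```python
from typing import Callable, List, Optional, Tuple


def collect_pages(
    fetch_page: Callable[[int], List[dict]],
    max_start: Optional[int] = None,
) -> Tuple[List[dict], str]:
    """Accumulate pages until one comes back empty or the offset cap is hit.

    fetch_page is a hypothetical callable that takes a start offset and
    returns that page's records; it is not part of pybliometrics.
    Returns the collected records and the reason the loop stopped.
    """
    records: List[dict] = []
    start = 0
    while True:
        if max_start is not None and start >= max_start:
            return records, "offset cap reached"
        page = fetch_page(start)
        if not page:
            return records, "empty page"
        records.extend(page)
        start += len(page)


def make_fake_fetch(total: int, page_size: int) -> Callable[[int], List[dict]]:
    """Stand-in for the real API call: serves `total` dummy records in pages."""
    def fake_fetch(start: int) -> List[dict]:
        stop = min(start + page_size, total)
        return [{"id": n} for n in range(start, stop)]
    return fake_fetch


if __name__ == "__main__":
    # Uncapped run: stops when the fake source is exhausted.
    records, reason = collect_pages(make_fake_fetch(total=450, page_size=200))
    print(len(records), reason)
    # Capped run: stops as soon as the next offset would reach the cap.
    records, reason = collect_pages(
        make_fake_fetch(total=450, page_size=200), max_start=400
    )
    print(len(records), reason)
```

With the real API, `fetch_page` could wrap the `SerialSearch` call from my functions above and return `s.results` (or an empty list when nothing is returned). Comparing the stop reason and record count for the two queries would show whether each loop ended on an empty page or on the offset cap.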