I have a piece of code I use to retrieve specific rows in a large usage data file for specific items which I write out as a list. For any matches in the DOI field the entire row is retrieved.
import pandas as pd
import numpy as np
from pandas import DataFrame
df_usage = pd.read_csv(r"usage_data.csv", encoding='latin-1')
article_usage = df.loc[df['DOI'] .isin(['DOI_1','DOI_2','DOI_3'])]
article_usage.to_csv(r"article_usage.csv")
However, I now need to retrieve a much larger amount of rows and so instead of writing out a list in my code with each item I want to change this to read a CSV containing all the items for which I need to retrieve rows in the data file.
df_usage = pd.read_csv(r"usage_data.csv", encoding='latin-1')
df_item_list = pd.read_csv(r"item_list.csv", encoding='latin-1', header=None)
article_usage = df_usage.loc[df_usage['DOI'] .isin(df_item_list)]
article_usage.to_csv(r"article_usage.csv")
The usage data file is a large CSV (100K+ rows and 53 columns with "DOI" being the column I am trying to match by).
DOI Jan Feb Mar
0 DOI_1 27 123 345
1 DOI_2 89 34 234
2 DOI_3 65 56 102
The item list CSV is just a file containing each DOI I am trying to match.
0
0 DOI_1
1 DOI_2
2 DOI_3
When I run the above the article_usage.csv file will only contain the data for the first item in my article list. How can i modify this code to retrieve data for all the items in the list? Any help would be greatly appreciated!