I have created a script that reads multiple PDF files and extracts information from them one by one. For each PDF, the script generates a dictionary of data. For example, the first iteration (first PDF file) gives:
d = {"GGT":["transl","mut"], "ATT":["alt3"], "ATC":["alt5"], "AUC":["alteration"]}
The second iteration (second PDF file) gives:
d = {"GGT":["transl","mut"], "AUC":["alteration"]}
... and so on, for up to 200 PDF files.
Initially I have an empty dataframe whose columns are all the genes that the analysis can detect:
df = pd.DataFrame(data=None, columns=["GGT","AUC","ATC","ATT","UUU","UUT"], dtype=None, copy=False)
Desired output: I would like to obtain a dataframe in which the values from each iteration are stored line by line, so that each PDF adds one new row. For example:
Is there an easy way to implement this, or are there functions that can help me?
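One possible approach, as a minimal sketch: collect the dictionary from each PDF in a list and build the dataframe once at the end, rather than appending to it inside the loop. The `per_pdf` list below is a hypothetical stand-in for the output of the extraction script, and keeping each list as a single cell value is an assumption about the desired layout. Genes missing from a given PDF are left as NaN.

```python
import pandas as pd

# Gene columns taken from the dataframe definition in the question.
columns = ["GGT", "AUC", "ATC", "ATT", "UUU", "UUT"]

# Hypothetical stand-in for the dictionaries produced per PDF;
# in the real script each one would come from one loop iteration.
per_pdf = [
    {"GGT": ["transl", "mut"], "ATT": ["alt3"], "ATC": ["alt5"], "AUC": ["alteration"]},
    {"GGT": ["transl", "mut"], "AUC": ["alteration"]},
]

# Accumulate one dictionary per PDF, then build the dataframe in one go.
rows = []
for d in per_pdf:
    rows.append(d)

# Each dictionary becomes one row; keys are matched to the column names,
# and genes absent from a dictionary are filled with NaN.
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Building the dataframe once from a list of dictionaries is usually simpler and faster than growing it row by row. If plain strings are preferred over lists in the cells, each value could be joined first, for example with `", ".join(values)`.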