
I have written a script that reads multiple PDF files and extracts information from them recursively, one by one. For each PDF, the script generates a dictionary of data. For example, 1º iteration, from the 1º PDF file:

d = {"GGT":["transl","mut"], "ATT":["alt3"], "ATC":["alt5"], "AUC":["alteration"]}

2º iteration, from the 2º PDF file:

d = {"GGT":["transl","mut"], "AUC":["alteration"]}

...and so on, up to 200 PDF files.

Beforehand, I create an empty DataFrame whose columns are all the genes that the analysis can detect:

import pandas as pd
df = pd.DataFrame(data=None, columns=["GGT","AUC","ATC","ATT","UUU","UUT"], dtype=None, copy=False)

Desired output: What I would like to obtain is a dataframe where the information of the values is stored in a recursive way line by line. For example:

[Image: example of the desired output table]

Is there an easy way to implement this, or are there functions that can help me?

Enrique
  • What does 1º mean? Did you mean №1? Or is this some scientific notation? – Ivan Gorin Jan 01 '21 at 23:47
  • How are the different dictionaries stored, e.g. your 1 degree and 2 degree dictionaries? Are they 200 separate variables stored as dictionaries, or one list of 200 dictionaries? Also, it sounds like you are simply trying to create a 200 × 6 DataFrame from the keys and values of the 200 dictionaries? – David Erickson Jan 01 '21 at 23:55
  • @ivan I mean n°1. – Enrique Jan 02 '21 at 00:00
  • @david Erickson The dictionary is created in a for loop that reads each PDF one by one, not as 200 separate variables. For that reason I would like to export the information to dataframe, then read the next PDF and replace the dictionary variable with the new data. I don't know if I answered your question. – Enrique Jan 02 '21 at 00:06
  • Does my answer solve this part of your problem: " For that reason I would like to export the information to dataframe"? – David Erickson Jan 02 '21 at 00:12
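The comments describe a loop in which the dictionary variable is overwritten for every PDF. A minimal sketch of that pattern, assuming the parsing code can be wrapped in a function: collect each dictionary in a list inside the loop and build the DataFrame once afterwards, using the file names as the index so each row can be traced back to its PDF. `extract_genes`, `pdf_paths` and the `report_*.pdf` names are hypothetical placeholders, not part of the original script.

```python
import pandas as pd

# Gene columns the analysis can detect (taken from the question).
GENE_COLUMNS = ["GGT", "AUC", "ATC", "ATT", "UUU", "UUT"]


def extract_genes(pdf_path):
    """Hypothetical stand-in for the real PDF parser.

    It returns canned dictionaries so the sketch runs; replace it with
    the code that actually reads each PDF.
    """
    fake_results = {
        "report_001.pdf": {"GGT": ["transl", "mut"], "ATT": ["alt3"],
                           "ATC": ["alt5"], "AUC": ["alteration"]},
        "report_002.pdf": {"GGT": ["transl", "mut"], "AUC": ["alteration"]},
    }
    return fake_results[pdf_path]


pdf_paths = ["report_001.pdf", "report_002.pdf"]  # hypothetical file names

rows = []   # one dictionary per PDF
names = []  # which PDF each row came from
for path in pdf_paths:
    d = extract_genes(path)  # `d` is overwritten on every pass, as in the question
    rows.append(d)
    names.append(path)

# Build the DataFrame once, after the loop; genes missing from a PDF become NaN.
df = pd.DataFrame(rows, index=names, columns=GENE_COLUMNS)
print(df)
```

Building the frame once after the loop avoids copying the whole DataFrame on every iteration, which is what repeated row-by-row appends do.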

1 Answer


IIUC, you are trying to loop through the dictionaries and add each one as a row of your DataFrame? I'm not sure how recursion applies here, given "What I would like to obtain is a dataframe where the information of the values is stored in a recursive way line by line."

import pandas as pd

d1 = {"GGT":["transl","mut"], "ATT":["alt3"], "ATC":["alt5"], "AUC":["alteration"]}
d2 = {"GGT":["transl","mut"], "AUC":["alteration"]}
dicts = [d1, d2]  # imagine this list contains the 200 dictionaries
columns = ["GGT","AUC","ATC","ATT","UUU","UUT"]
# DataFrame.append was removed in pandas 2.0, so build all rows in a single call;
# genes missing from a dictionary become NaN
df = pd.DataFrame(dicts, columns=columns)
df
Out[1]: 
             GGT           AUC     ATC     ATT  UUU  UUT
0  [transl, mut]  [alteration]  [alt5]  [alt3]  NaN  NaN
1  [transl, mut]  [alteration]     NaN     NaN  NaN  NaN
David Erickson
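One side effect of passing a fixed column list to `pd.DataFrame` together with a list of dictionaries: pandas performs column selection, so a key that is not in the list is dropped without warning. If a PDF might report a gene outside the predefined columns, a small check like the following sketch can flag it before the rows are combined; `unexpected_genes` and the gene name `XYZ` are made up for illustration.

```python
import pandas as pd

GENE_COLUMNS = ["GGT", "AUC", "ATC", "ATT", "UUU", "UUT"]


def unexpected_genes(d, columns):
    """Hypothetical helper: return keys of `d` that are not expected gene columns."""
    return sorted(set(d) - set(columns))


# "XYZ" is a made-up gene name used only to trigger the check.
d = {"GGT": ["transl", "mut"], "XYZ": ["alt1"]}
extra = unexpected_genes(d, GENE_COLUMNS)
df = pd.DataFrame([d], columns=GENE_COLUMNS)

print(extra)               # ['XYZ']
print("XYZ" in df.columns)  # False: the key was dropped by column selection
```

Whether to extend the column list, log the extra keys, or raise an error is a choice for the surrounding script; the check only makes the silent drop visible.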