0

I am extracting selected pages from a PDF file and want to name each dataframe after the page it was extracted from:

file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
for i in selected_pages():
    df{str(i)} = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True,area = [100,10,740,950],pages= (i), index = False)
    print (df{str(i)} )

The idea, as in the example above, is to end up with dataframes named df10 and df11. I have tried "df" + str(i), "df" & str(i), and df{str(i)}, but all of them give the error message SyntaxError: invalid syntax. Any better way of doing this is most welcome. Thanks.

murfyang
  • Possible duplicate of [How can you dynamically create variables via a while loop?](https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-via-a-while-loop) – oreopot Oct 13 '19 at 08:31
  • Assign to a dictionary instead. df = dict() outside of your for loop, and replace the first line with df[i] = read_pdf(...). – QuantStats Oct 13 '19 at 08:32

2 Answers

0

This is where a dictionary would be a much better option.

Also note the error you have at the start of the loop. selected_pages is a list, so you can't do selected_pages().

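from tabula import read_pdf  # assumption: read_pdf here is tabula-py's

file = "abc"
selected_pages = ['10', '11']  # can be any combination, e.g. ['6', '14', '20']

# One dictionary keyed by page number replaces separate df10, df11, ... variables.
# path is assumed to be defined earlier, as in the question.
df = {}
for i in selected_pages:
    df[i] = read_pdf(path + file + ".pdf", encoding='ISO-8859-1', stream=True, area=[100, 10, 740, 950], pages=i, index=False)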
ba_ul
  • Hi, however these two pages (dataframes df10 and df11) are tables with different numbers of rows. Once the names are assigned, the code will use them to continue with some cleaning, removing different rows from each. – murfyang Oct 13 '19 at 08:42
  • That shouldn't be a problem. You're just storing them in the dictionary for now. Later you can do whatever you want to do with that data. What exactly do you want to do eventually? It's not clear in your question. – ba_ul Oct 13 '19 at 08:46
  • Thanks, the first part is working. After running the loop, I have df['10'] and df['11'], and i is '11'. My next few lines of code: – murfyang Oct 13 '19 at 09:22
  • Thanks greatly, QuantStats. I am new to the forum and still figuring out how to add a comment or another answer. I worked it out yesterday and posted it, but somehow it is not appearing in the forum yet. Yes, following your solution, it is df[str(i)]. – murfyang Oct 14 '19 at 12:21
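
The point made in these comments, that tables of different sizes can sit in one dictionary and be picked out by page number later, can be sketched roughly as follows. This is only an illustration: the two small tables are made-up stand-ins for whatever read_pdf returns, and the names `page`, `table`, and `shapes` are hypothetical.

```python
import pandas as pd

# Stand-in tables with different row counts (hypothetical data, not read from a PDF).
df = {
    "10": pd.DataFrame({"x": range(6)}),
    "11": pd.DataFrame({"x": range(9)}),
}

# The keys are strings, so an integer page number has to be converted first.
page = 10
table = df[str(page)]
print(table.shape)

# Every table stays reachable by its page number, whatever its size.
shapes = {key: t.shape for key, t in df.items()}
print(shapes)
```

Because the keys are strings, df[10] would raise a KeyError here while df['10'] or df[str(10)] works.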
0
i = int(i) - 1  # i is '11' after the loop, so this gives 10
dfB = df[str(i)]
# drop the first four rows (positions 0 to 3)
dfB.drop(dfB.index[0:4], axis=0, inplace=True)
dfB.columns = ['col1', 'col2', 'col3', 'col4', 'col5']
murfyang
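
A possible variation on the cleaning step above, offered as a sketch and not as the poster's method: the page is passed in explicitly, so the code does not depend on the value `i` happens to hold after the loop, and the table stored in the dictionary is left unmodified. The helper name `clean_page`, its parameters, and the sample data are all hypothetical, and the `reset_index` call is an addition that the original snippet does not make.

```python
import pandas as pd

def clean_page(tables, page, n_drop, columns):
    """Return a cleaned copy of tables[str(page)]: drop leading rows, rename columns."""
    table = tables[str(page)]
    # drop() without inplace=True returns a new DataFrame, so the original is kept.
    cleaned = table.drop(table.index[:n_drop]).reset_index(drop=True)
    cleaned.columns = columns
    return cleaned

# Stand-in for the table read from page 10 (hypothetical data).
df = {"10": pd.DataFrame([[r, r * 2] for r in range(6)])}

dfB = clean_page(df, 10, n_drop=4, columns=["col1", "col2"])
print(dfB)
```

The same helper could then be called once per page with a different `n_drop`, which fits the case where each table needs different rows removed.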