Following the post "Python module for converting PDF to text", I scraped a PDF file and extracted its data. During scraping, the data gets broken into two separate variables. How can I merge the data and extract it as a dictionary?
For example:

content = ['Sample Questions Set 1 ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '01  Which function among the following can’t be accessed outside ', 'the class in java in same package? ', 'A. public void show()。 ', 'B. void show()。 ', 'C. protected show()。 ', 'D. static void show()。 ', '02  How many private member functions are allowed in a class ? ', 'A. Only 1 ', 'B. Only 7 ', 'C. Only 255 ', 'D. As many as required ', '03  Can main() function be made private? ', 'A. Yes, always。 ', 'B. Yes, if program doesn’t contain any classes。 ', 'C. No, because main function is user defined。 ', 'D. No, never。 ', '04  If private member functions are to be declared in C++ then_________。 ', 'A. private:  ', 'B. private ', 'C. private(private member list) ', 'D. private :- <private members> ', '05  If a function in java is declared private then it _________。 ', 'A. Can’t access the standard output ', 'B. Can access the standard output。 ', 'C. Can’t access any output stream。 ', 'D. Can access only the output streams。 ']

Desired output:

questions = [{'Qid':01,'Qtext':'Which function among the following can’t be accessed outside the class in java in same package?','A.':'public void show()。','B.':' void show()。','C.':'protected show()。','D.':'static void show()'},{'Qid':02,....},{...},{...},{...}]
Zara

2 Answers


The following will do:

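questions = []
for s in content:
    s = s.lstrip()  # tolerate leading spaces such as ' A. OOPS'
    if s:
        if s[0].isdigit():
            # New question: number it and drop the leading number from the text.
            questions.append({'Qid': len(questions) + 1, 'Qtext': s.split(maxsplit=1)[1]})
        elif s[0].isalpha() and s[1:2] == '.':
            # Option line: the key is the label ('A.'), the value is the rest.
            if questions:
                questions[-1][s[:2]] = s[2:].lstrip()
        elif questions:
            # Anything else continues the text of the current question.
            questions[-1]['Qtext'] += s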

questions will become:

[{'Qid': 1, 'Qtext': 'Which function among the following can’t be accessed outside the class in java in same package? ', 'A.': 'public void show()。 ', 'B.': 'void show()。 ', 'C.': 'protected show()。 ', 'D.': 'static void show()。 '}, {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ', 'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '}, {'Qid': 3, 'Qtext': 'Can main() function be made private? ', 'A.': 'Yes, always。 ', 'B.': 'Yes, if program doesn’t contain any classes。 ', 'C.': 'No, because main function is user defined。 ', 'D.': 'No, never。 '}, {'Qid': 4, 'Qtext': 'If private member functions are to be declared in C++ then_________。 ', 'A.': 'private:  ', 'B.': 'private ', 'C.': 'private(private member list) ', 'D.': 'private :- <private members> '}, {'Qid': 5, 'Qtext': 'If a function in java is declared private then it _________。 ', 'A.': 'Can’t access the standard output ', 'B.': 'Can access the standard output。 ', 'C.': 'Can’t access any output stream。 ', 'D.': 'Can access only the output streams。 '}]
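
If you also want the values without the trailing spaces, as in the output the question asks for, the same idea can be sketched with two regular expressions. This is only a sketch under assumptions drawn from the sample data: `parse_questions`, `QUESTION` and `OPTION` are made-up names, question lines are assumed to start with digits, and option lines are assumed to start with a single letter followed by a dot. Unlike the loop above, it takes `Qid` from the printed number instead of counting.

```python
import re

# Hypothetical patterns, guessed from the sample data in the question.
QUESTION = re.compile(r'^(\d+)\s+(.*)$')       # e.g. '01  Which function ...'
OPTION = re.compile(r'^([A-Za-z]\.)\s*(.*)$')  # e.g. 'A. public void show()'


def parse_questions(lines):
    """Merge scraped lines into a list of question dictionaries."""
    questions = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip the blank padding strings
        q = QUESTION.match(line)
        o = OPTION.match(line)
        if q:
            questions.append({'Qid': int(q.group(1)), 'Qtext': q.group(2)})
        elif o and questions:
            questions[-1][o.group(1)] = o.group(2)
        elif questions and len(questions[-1]) == 2:
            # Only 'Qid' and 'Qtext' exist so far, so the question text wrapped.
            questions[-1]['Qtext'] += ' ' + line
    return questions


sample = [
    'Sample Questions Set 1 ', ' ',
    '01  Which function among the following can’t be accessed outside ',
    'the class in java in same package? ',
    ' A. public void show() ',
    'B. void show() ',
    '02  Can main() function be made private? ',
    'A. Yes, always ',
    'D. No, never ',
]
result = parse_questions(sample)
print(result)
```

Lines that match neither pattern are only appended while the current question has no options yet, so stray text after the last option is dropped rather than glued onto the question.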
blhsing
  • Condition is false for Option ' A. OOPS' as line[0].isspace() is true. – Zara Aug 15 '18 at 13:49
  • If you have strings with a leading space like that, you can use the `str.lstrip()` method to strip it first. I've updated my answer accordingly. – blhsing Aug 15 '18 at 13:59

This will merge them into a questions list:

import re

questions = []

for res in content:
    if not res:
        continue  # skip empty strings so res[0] is safe
    prefix = res[0]
    if prefix.isalpha() and res[1:2] == '.':
        # Option line such as 'A. ...': strip only the leading label.
        if questions:
            questions[-1][prefix + '.'] = re.sub(r'^[A-Za-z]\.\s*', '', res)
    elif prefix.isdigit():
        # Question line such as '01  ...': strip only the leading number.
        questions.append({'Qid': len(questions) + 1, 'Qtext': re.sub(r'^\d+\s+', '', res)})
    elif questions:
        # Continuation of a question that was cut across two strings.
        questions[-1]['Qtext'] += res

Result:

[{'Qid': 1, 'Qtext': 'Which function among the following can’t be accessed outside the class in java in same package? ', 'A.': 'public void show()。 ', 'B.': 'void show()。 ', 'C.': 'protected show()。 ', 'D.': 'static void show()。 '}, {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ', 'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '}, {'Qid': 3, 'Qtext': 'Can main() function be made private? ', 'A.': 'Yes, always。 ', 'B.': 'Yes, if program doesn’t contain any classes。 ', 'C.': 'No, because main function is user defined。 ', 'D.': 'No, never。 '}, {'Qid': 4, 'Qtext': 'If private member functions are to be declared in C++ then_________。 ', 'A.': 'private:  ', 'B.': 'private ', 'C.': 'private(private member list) ', 'D.': 'private :- <private members> '}, {'Qid': 5, 'Qtext': 'If a function in java is declared private then it _________。 ', 'A.': 'Can’t access the standard output ', 'B.': 'Can access the standard output。 ', 'C.': 'Can’t access any output stream。 ', 'D.': 'Can access only the output streams。 '}]
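
One thing to watch with regex substitutions in this kind of code: without a `^` anchor, `re.sub` removes every match in the string, not just the label at the start. The sample data happens not to trigger this, but other question sets might. A small sketch with made-up strings (not taken from the PDF) shows the difference:

```python
import re

# Made-up lines that contain a second label-like or number-like fragment.
option = 'A. See section B. below '
question = '02  Pick 2 of these '

# Unanchored patterns strip every occurrence.
option_unanchored = re.sub(r"[ABCD]\.\s*", '', option)
question_unanchored = re.sub(r"\d+\s+", '', question)

# Anchored patterns strip only the leading label or number.
option_anchored = re.sub(r"^[ABCD]\.\s*", '', option)
question_anchored = re.sub(r"^\d+\s+", '', question)

print(repr(option_unanchored))    # 'See section below '
print(repr(option_anchored))      # 'See section B. below '
print(repr(question_unanchored))  # 'Pick of these '
print(repr(question_anchored))    # 'Pick 2 of these '
```

Passing `count=1` to `re.sub` would also limit it to one replacement, but anchoring with `^` additionally guarantees that the match is at the start of the line.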
Azhy