I am trying to capture all of the claim text in a batch of XML patent files, but I am having trouble with tags nested inside <claim-text>. Sometimes there is another <claim-text> inside it, and sometimes a <claim-ref> interrupts the text. In my output, the text gets cut off. There are usually more than 10 claims per file. I only want the text inside the claim text.

I've already looked at and tried the following, but they don't work: "xml elementree missing elements python" and "How to get all sub-elements of an element tree with Python ElementTree?" I've included only a snippet of the XML here, as the full claims section is quite long.

[Screenshot: Claims_text_xml]

My code for this is below (where fullname is the file name and directory).

    for _, elem in iterparse(fullname):

        description = '' # reset to empty string at beginning of each loop
        abtext = '' # reset to empty string at beginning of each loop
        claimtext= '' # reset to empty string

        if elem.tag == 'claims':
            for node4 in tree.findall('.//claims/claim/claim-text'):
                claimtext =  claimtext + node4.text
                f.write('\n\nCLAIMTEXT\n\n\n') 
                f.write(smart_str(claimtext) + '\n\n')


      #put row in df          
    row = dict(zip(['PATENT_ID', 'CLASS', 'ABSTRACT', 'DESCRIPTION','CLAIMS'], [data,cat,abtext,description,claimtext]))
    row_s = pd.Series(row)           
    row_s.name = i
    df = df.append(row_s)

So the resulting problem is twofold: (a) only one of the texts gets written to the file, and (b) nothing ends up in the DataFrame at all. I'm not sure whether these are part of the same problem or two separate problems. I can get the claims to print to a file, and that works, but it skips some of the text.
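For context on why the text may be getting cut off: in ElementTree, `Element.text` holds only the text up to the first child element. Anything that follows a nested <claim-ref> or <claim-text> is stored in that child's `.tail`. `Element.itertext()` walks both in document order. Below is a minimal sketch of that idea, not a tested fix for the real files. The `SAMPLE` XML and the `claim_texts` helper are made up for illustration, and it assumes the claims section has no default namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real patent XML will differ.
SAMPLE = """<claims>
  <claim id="c1" num="1">
    <claim-text>1. A device comprising:
      <claim-text>a housing; and</claim-text>
      <claim-text>a sensor mounted in the housing.</claim-text>
    </claim-text>
  </claim>
  <claim id="c2" num="2">
    <claim-text>2. The device of <claim-ref idref="c1">claim 1</claim-ref>, wherein the sensor is optical.</claim-text>
  </claim>
</claims>"""


def claim_texts(claims_elem):
    """Return one whitespace-normalized string per <claim> element."""
    texts = []
    for claim in claims_elem.iter('claim'):
        # itertext() yields the .text and .tail of every descendant,
        # so text after a nested <claim-ref> or <claim-text> is kept.
        raw = ''.join(claim.itertext())
        # Collapse the newlines and indentation left by the XML layout.
        texts.append(' '.join(raw.split()))
    return texts


claims = claim_texts(ET.fromstring(SAMPLE))
for c in claims:
    print(c)
```

With `iterparse` and its default "end" events, the element is complete by the time it is yielded, so the same helper could presumably be called on `elem` itself when `elem.tag == 'claims'`, and the joined result stored as the CLAIMS value for the row.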

maric
  • can you try xpath with `.//claims//claim/claim-text` and see if all `claim`s come up? – Anzel Jul 15 '16 at 13:30
  • Does your XML have a default namespace? (a namespace declared without a prefix, like `xmlns="some_namespace_URI_here"`) – har07 Jul 16 '16 at 04:18
  • A [mcve] would make it easier to help. It's hard to reproduce the problem with just a screenshot of an XML fragment. – mzjn Jul 16 '16 at 13:06

0 Answers