
Alright, I've been banging my head against this problem for the past couple of days without a solution (scouring the internet and StackOverflow posts for potential fixes, to no avail), so maybe someone else knows what's going on here.

I'm trying to parse JMDict with SimpleXML, but I'm hitting really strange errors. The other weird thing is that the errors only happen when I'm debugging on Android (SDK 23, version 6.0.1), never locally on my box (as a unit test). There are 2 errors I'm currently getting:

Error 1

org.xmlpull.v1.XmlPullParserException: Unexpected token (position:TEXT\n@399:1 in java.io.InputStreamReader@e2fbd29)

I know this error is caused by the comment between the end of the DTD declaration and the root element, together with the newline that follows it. The error goes away when I remove them.

<!ENTITY joc "jocular, humorous term">
<!ENTITY anat "anatomical term">
]>
<!-- JMdict created: 2016-04-26 -->
<JMdict>
<entry>

But why does this happen? How can I suppress this error? And more importantly, why does it happen only when I run it on Android? (It runs completely fine in my unit test.)

Error 2

04-27 23:58:13.038 9527-9527/ca.fuwafuwa.kaku W/System.err: org.xmlpull.v1.XmlPullParserException: unresolved: &n; (position:ENTITY_REF null@408:9 in java.io.InputStreamReader@bc1fe42)

The element that's causing this is probably:

<entry>
    <ent_seq>1000000</ent_seq>
    <r_ele>
        <reb>ヽ</reb>
    </r_ele>
    <r_ele>
        <reb>くりかえし</reb>
    </r_ele>
    <sense>
        <pos>&n;</pos>    <------------------------------------ THIS GUY
        <gloss>repetition mark in katakana</gloss>
        <gloss xml:lang="ita">simbolo di ripetizione in katakana</gloss>
    </sense>
</entry>

Except this is definitely defined in the DTD:

<!ENTITY n "noun (common) (futsuumeishi)">
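A possible workaround sketch, assuming the parser on the device simply isn't reading the internal DTD subset: pull the entity declarations out of the DOCTYPE by hand and keep them in a map. `EntityTable` and `parse` are hypothetical names, and the regex only covers simple quoted declarations like the ones above. Feeding the map to the pull parser (for example through `XmlPullParser.defineEntityReplacementText`) is untested here, since SimpleXML creates its own parser internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SimpleXML or Android.
class EntityTable {

    // Matches simple internal declarations such as:
    //   <!ENTITY n "noun (common) (futsuumeishi)">
    // Parameter entities (%name) and single-quoted values are not handled.
    private static final Pattern ENTITY =
            Pattern.compile("<!ENTITY\\s+([^\\s%\"']+)\\s+\"([^\"]*)\"\\s*>");

    // Returns entity name -> replacement text, in declaration order.
    static Map<String, String> parse(String dtdText) {
        Map<String, String> entities = new LinkedHashMap<>();
        Matcher m = ENTITY.matcher(dtdText);
        while (m.find()) {
            entities.put(m.group(1), m.group(2));
        }
        return entities;
    }

    public static void main(String[] args) {
        String dtd = "<!ENTITY joc \"jocular, humorous term\">\n"
                   + "<!ENTITY n \"noun (common) (futsuumeishi)\">\n"
                   + "]>\n";
        Map<String, String> entities = parse(dtd);
        System.out.println(entities.get("n")); // prints: noun (common) (futsuumeishi)
    }
}
```

With a table like this, `&n;` could in principle be resolved by the application instead of by the parser, at the cost of maintaining a second, hand-rolled reading of the DTD.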

Stuff Attempted

And again, neither of these errors occurs when the code is run as a unit test (on the computer, not on an Android device / emulator), and the entire JMDict gets parsed correctly. Here's the unit test:

Stuff Works as a Unit Test!

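import java.io.File;

import org.junit.BeforeClass;
import org.junit.Test;
import org.simpleframework.xml.Serializer;
import org.simpleframework.xml.core.Persister;

public class ExampleUnitTest {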

    @BeforeClass
    public static void ClassSetup(){
        // Needed because there's a 64000 default limit in the JDK
        // I didn't see this limit on Android for some reason
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
    }

    @Test
    public void wtfXmlSrsly() throws Exception {
        Serializer serializer = new Persister();
        File file = new File("D:\\Android\\JMDictOriginal.xml");
        JmDict dict = serializer.read(JmDict.class, file, false);
    }
}

Here's the Android version failing gloriously:

Not a unit test? Haha time to fail!

public void parseDict() throws Exception{

    Log.d(TAG, "INITIALIZING DICTIONARY");

    long startTime = System.currentTimeMillis();

    Serializer serializer = new Persister();
    String fileLoc = mContext.getExternalFilesDir(null).getAbsolutePath();
    File file = new File(fileLoc, "JMDict.xml");

    Log.d(TAG, file.getAbsolutePath());

    JmDict dict = serializer.read(JmDict.class, file, false);

    Log.d(TAG, String.format("FINISHED, TOOK %d", System.currentTimeMillis() - startTime));
}

As far as I can tell, those are basically doing the same thing.

SimpleXML Deserialization Objects

@Root(name="JMdict")
public class JmDict {
    @ElementList(entry = "entry", inline = true)
    List<JmEntry> entry;
}

@Root(name="entry")
public class JmEntry {
    @Element(name = "ent_seq")
    private String ent_seq;
    @ElementList(entry = "k_ele", inline = true, required = false)
    private List<JmKEle> k_ele;
    @ElementList(entry = "r_ele", inline = true)
    private List<JmREle> r_ele;
    @Element(name = "info", required = false)
    private JmInfo info;
    @ElementList(entry = "sense", inline = true)
    private List<JmSense> sense;
}

@Root(name = "k_ele")
public class JmKEle {
    @Element(name = "keb")
    private String keb;
    @ElementList(entry = "ke_inf", inline = true, required = false)
    private List<String> ke_inf;
    @ElementList(entry = "ke_pri", inline = true, required = false)
    private List<String> ke_pri;
}

@Root(name = "r_ele")
public class JmREle {
    @Element(name = "reb")
    private String reb;
    @Element(name = "re_nokanji", required = false)
    private String re_nokanji;
    @ElementList(entry = "re_restr", inline = true, required = false)
    private List<String> re_restr;
    @ElementList(entry = "re_inf", inline = true, required = false)
    private List<String> re_inf;
    @ElementList(entry = "re_pri", inline = true, required = false)
    private List<String> re_pri;
}

@Root(name = "info")
public class JmInfo {
    @ElementList(entry = "links", inline = true, required = false)
    private List<Links> links;
    @ElementList(entry = "bibl", inline = true, required = false)
    private List<Bibl> bibl;
    @ElementList(entry = "etym", inline = true, required = false)
    private List<String> etym;
    @ElementList(entry = "audit", inline = true, required = false)
    private List<Audit> audit;
}

@Root(name = "links")
public class Links {
    @Element(name = "link_tag")
    private String link_tag;
    @Element(name = "link_desc")
    private String link_desc;
    @Element(name = "link_uri")
    private String link_uri;
}

@Root(name = "bibl")
public class Bibl {
    @Element(name = "bib_tag", required = false)
    private String bib_tag;
    @Element(name = "bib_txt", required = false)
    private String bib_txt;
}

@Root(name = "audit")
public class Audit {
    @Element(name = "upd_date")
    private String upd_date;
    @Element(name = "upd_detl")
    private String upd_detl;
}

@Root(name = "sense")
public class JmSense {
    @ElementList(entry = "stagk", inline = true, required = false)
    private List<String> stagk;
    @ElementList(entry = "stagr", inline = true, required = false)
    private List<String> stagr;
    @ElementList(entry = "pos", inline = true, required = false)
    private List<String> pos;
    @ElementList(entry = "xref", inline = true, required = false)
    private List<String> xref;
    @ElementList(entry = "ant", inline = true, required = false)
    private List<String> ant;
    @ElementList(entry = "field", inline = true, required = false)
    private List<String> field;
    @ElementList(entry = "misc", inline = true, required = false)
    private List<String> misc;
    @ElementList(entry = "s_inf", inline = true, required = false)
    private List<String> s_inf;
    @ElementList(entry = "lsource", inline = true, required = false)
    private List<String> lsource;
    @ElementList(entry = "dial", inline = true, required = false)
    private List<String> dial;
    @ElementList(entry = "gloss", inline = true, required = false)
    private List<String> gloss;
    @ElementList(entry = "example", inline = true, required = false)
    private List<String> example;
}

Alternatives

Alternatively, if someone can suggest an XML parsing framework on Android that actually works for this use case, that works too. It should also handle this mixed-content case (SimpleXML doesn't do it very well):

<!ELEMENT gloss (#PCDATA | pri)*>
<!ELEMENT pri (#PCDATA)>
<!-- Above breaks down into: -->

<gloss>GLOSS TEXT</gloss>
<!-- or -->
<gloss><pri>PRI TEXT</pri></gloss>
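To make the requirement concrete, here is a minimal sketch of the behaviour I'm after, written against the JDK's DOM API rather than SimpleXML. `GlossText` and `glosses` are hypothetical names. It only illustrates flattening both forms of `gloss` into plain text; loading all of JMDict into a DOM tree is probably too memory-hungry for a phone, so treat it as a statement of the goal rather than a solution.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper showing the desired handling of (#PCDATA | pri)*.
class GlossText {

    // Returns the text of every <gloss>, whether it is direct text or wrapped in <pri>.
    static List<String> glosses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("gloss");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() concatenates all descendant text nodes,
                // so a nested <pri> is flattened into the gloss text.
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sense>"
                   + "<gloss>GLOSS TEXT</gloss>"
                   + "<gloss><pri>PRI TEXT</pri></gloss>"
                   + "</sense>";
        System.out.println(glosses(xml)); // prints: [GLOSS TEXT, PRI TEXT]
    }
}
```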

I thought parsing XML was a solved problem by now. I never expected to be stuck on XML parsing for 3 days straight :/

Update

Alright, I'm almost sure something's broken on Android now, and it can't parse DTD entities or something. I read here that you have to exclude some dependencies that are there by default in Android - maybe the implementation changed there somehow?

I wasn't even able to parse this really simple XML file with an ENTITY declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE XmlTest [
<!ELEMENT XmlTest (sometest, atest*)>
<!ELEMENT sometest (#PCDATA) >
<!ELEMENT atest (#PCDATA) >
<!ENTITY noun "noun">
]><XmlTest>
    <sometest>sometest</sometest>
    <atest>test1</atest>
    <atest>&noun;</atest>
</XmlTest>

It resulted in:

org.xmlpull.v1.XmlPullParserException: unresolved: &noun; (position:ENTITY_REF null@10:18 in java.io.InputStreamReader@4c793e1)
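For comparison, here is a sketch of checking the same document against the JDK's built-in SAX parser on the desktop. `EntityCheck` and `atestValues` are hypothetical names. It only shows that the file is well-formed and that `&noun;` is resolvable from the internal DTD subset; it says nothing about what the parser on Android does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical desktop-side check, not Android code.
class EntityCheck {

    // Collects the text of every <atest> element, with entities expanded by the parser.
    static List<String> atestValues(String xml) {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <atest>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("atest".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("atest".equals(qName)) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("could not parse XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<!DOCTYPE XmlTest [\n"
                   + "<!ELEMENT XmlTest (sometest, atest*)>\n"
                   + "<!ELEMENT sometest (#PCDATA) >\n"
                   + "<!ELEMENT atest (#PCDATA) >\n"
                   + "<!ENTITY noun \"noun\">\n"
                   + "]><XmlTest>\n"
                   + "    <sometest>sometest</sometest>\n"
                   + "    <atest>test1</atest>\n"
                   + "    <atest>&noun;</atest>\n"
                   + "</XmlTest>\n";
        System.out.println(atestValues(xml)); // prints: [test1, noun]
    }
}
```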
