10

I am currently developing an Android dictionary app that fetches meanings online through the Wiktionary API with this request: http://en.wiktionary.org/w/api.php?action=query&prop=revisions&titles=overflow&rvprop=content&format=jsonfm
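
For reference, a minimal sketch of how such a request URL could be built for an arbitrary word. The class and method names are hypothetical, and it assumes `format=json` rather than `jsonfm`, since `jsonfm` returns an HTML-wrapped, pretty-printed view meant for debugging rather than for parsing in an app.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the same kind of query as the URL above.
final class WiktionaryQuery {
    private static final String ENDPOINT = "https://en.wiktionary.org/w/api.php";

    static String urlFor(String title) {
        try {
            // Encode the title so spaces and non-ASCII letters survive in the query string.
            String encoded = URLEncoder.encode(title, "UTF-8");
            return ENDPOINT
                    + "?action=query&prop=revisions&rvprop=content&format=json"
                    + "&titles=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `WiktionaryQuery.urlFor("overflow")` yields a URL with the same parameters as the one in the question, apart from the output format.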

But I want to download the Wiktionary database and embed it inside my Android app so that it works offline.

Here are the Wiktionary database download pages:
1. Wiktionary
2. Wikimedia Downloads

From my research, the Wiktionary offline database is available as XML and SQL dumps. But these files are too big, and embedding them would make the APK huge.
So is there any easy way to embed this data in my app?

zackygaurav
  • How much space do those resources take? – GvSharma Feb 07 '16 at 09:58
  • The uncompressed offline Wiktionary is 700 MB, and the 7zip-compressed one is 100 MB – zackygaurav Feb 07 '16 at 09:59
  • Is there a version with the more common words? – Carlos Feb 07 '16 at 10:11
  • http://developer.android.com/intl/zh-cn/google/play/expansion-files.html this may help you – GvSharma Feb 07 '16 at 12:20
  • 100mb is not that much, people download the Facebook app which is more than 140mb, if your app fits in 10mb. – Akash Kava Feb 14 '16 at 10:00
  • @AkashKava it's only 100mb whilst compressed. Each time it's used, the end-user will wait a long time for it to uncompress to 700mb and must have nearly a gigabyte of mobile RAM free to hold that data until the app is closed. It'll crash every time alongside running games, Facebook, YouTube etc. – VC.One Feb 14 '16 at 20:30
  • @VC.One, if you create a service that will run in background and uncompress when using it first time, you can use some uncompress algorithm with fixed buffer size of few kb that would be fine. You don't need gigabyte to uncompress. – Akash Kava Feb 15 '16 at 07:38
  • @Akash I hear you. I'm no expert on **bzip** compression. My impression was you can't just sample a random x-num of bytes from the middle and de-compress to find a "half paragraph" of some text document. If possible then fine. I downloaded their listed 267mb **bz2** file and it de-compressed to a 1.2 gig XML file. I didn't want him doing **that** to his users' device RAM or storage just to check one word in a dictionary. I assume `using it first time` means save that in memory for later? Is it fair... just for text? – VC.One Feb 15 '16 at 19:21
  • @VC.One What I mean by using it first time, you ship zip file with your app, then uncompress your file and save it as a new file in the SD card. You can query uncompressed file from next time. – Akash Kava Feb 16 '16 at 07:54
  • @AkashKava, like I said, I hear your point. My final say on this is that you seem to be talking from a **technical possibility**, and with such logic why worry about a 1.2 gig XML? Let's make it unzip an 8 gig XML as long as there is a background service decompressing. `In real life... I'm deleting that dictionary and going with his competition` that gives the same result for 22mb. **Practical possibility** is an awareness that a dictionary (text) never needs some background service uncompressing it. A 20 volume Oxford dictionary fits on a CD-ROM of 650mb. Involving gigabytes is a bad idea. – VC.One Feb 19 '16 at 18:36
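
The fixed-buffer, unpack-on-first-run idea from the comments above can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the archive is assumed to be a plain zip with the dictionary as its first entry (the standard `java.util.zip` package does not read bz2 or 7zip), and an 8 KB buffer is an arbitrary choice.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: streams the first zip entry to a file using a small buffer.
final class DictionaryUnpacker {
    static long unpackFirstEntry(InputStream zipped, File target) throws IOException {
        byte[] buffer = new byte[8 * 1024]; // memory use stays at a few KB regardless of file size
        long total = 0;
        try (ZipInputStream zin = new ZipInputStream(new BufferedInputStream(zipped));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null) {
                throw new IOException("archive is empty");
            }
            int n;
            // read() returns -1 at the end of the current entry
            while ((n = zin.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total; // number of uncompressed bytes written
    }
}
```

This addresses only the RAM concern. The uncompressed file still occupies its full size on storage, which is the objection raised in the last comment.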

2 Answers

4

The developer [of English Dictionary - Offline] claims that they are using Wiktionary. I am still wondering where they got a Wiktionary dump file >22 MB

I'm not being paid enough to tell you that... (joke). The thing is, you need to extract just the dictionary entries from the XML files; once you keep only those, the final content (text) file becomes smaller.

Alternatively...

You can try this TSV file (courtesy of: semisignal.com) which is a snapshot of November 2012 definitions. This contains most words your end-user checking English would need. The TSV is 54MB and is handled like a text file.

Try a definition: brushable. The TSV has the lines below (compare with Wiktionary's entry for Brushable):

English brushable Adjective # Able to be [[brushed]]
English brushable Adjective # Able to be controlled by [[brushing]]


TIPS: To reduce file size, you can trim off the leading "English", since you already know it's all English definitions. Each trim will save you 7 bytes (8 with the separator that follows), multiplied by the total number of definitions.

  • Use a String.replace on "English " (with that space) to clear it.

  • Also replace "Adjective", "Verb" and "Noun" with short codes that your app knows the meaning of, so it can show the entry type in the user interface. For example, the code 1 could mean the entry is listed as an Adjective.

Your trimmed text file could look like the example below. Each double full stop just means "next section of entry", so the layout is basically entry..type..definition, where <xyz> is a link to another entry in the dictionary. The 54 bytes of the TSV entry now become 35 bytes for that one line.

brushable..1..Able to be <brushed>.

Save the final edited (reduced) text file. Embed that into your APK.
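
The trimming steps above could be run once on your own machine before packaging. Here is a minimal sketch, assuming the TSV has four tab-separated columns (language, word, type, definition) and that definitions start with "# ". The class name and the code table (1, 2, 3) are made up for illustration.

```java
// Hypothetical one-off converter: TSV line -> entry..type..definition
final class TsvCompactor {

    // Assumed short codes; the app must use the same table when displaying entries.
    static String typeCode(String type) {
        switch (type) {
            case "Adjective": return "1";
            case "Verb":      return "2";
            case "Noun":      return "3";
            default:          return type; // unknown types are kept unchanged
        }
    }

    // Returns the compact line, or null if the input does not have four columns.
    static String compact(String tsvLine) {
        String[] cols = tsvLine.split("\t", 4);
        if (cols.length < 4) {
            return null;
        }
        // cols[0] is the language column ("English"), which is simply dropped.
        String word = cols[1];
        String code = typeCode(cols[2]);
        String definition = cols[3]
                .replaceFirst("^#\\s*", "") // strip the leading "# " marker
                .replace("[[", "<")         // [[link]] becomes <link>
                .replace("]]", ">");
        return word + ".." + code + ".." + definition;
    }
}
```

For the first brushable line this gives `brushable..1..Able to be <brushed>`. The sketch does not handle piped wiki links such as `[[a|b]]` or definitions that themselves contain "..", so check the real data for those before relying on it.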

VC.One
3

I suggest implementing the online API access first, so that a small app can be downloaded and used, and adding a button somewhere that downloads the offline part. Also check the network connection and, if it's not Wi-Fi, warn the user so that the mobile data plan is not used up downloading a 100 MB dictionary.
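
The decision behind that download button could look roughly like the sketch below. Everything in it is hypothetical naming. In a real app the `Connection` value would come from Android's `ConnectivityManager` (for example `isActiveNetworkMetered()`), which is left out here so that the logic stays plain Java.

```java
// Hypothetical policy for the "download offline dictionary" button.
final class OfflineDownloadPolicy {
    enum Connection { NONE, WIFI, METERED }
    enum Action { DOWNLOAD, WARN_FIRST, UNAVAILABLE }

    static Action decide(Connection connection, boolean userAcceptedWarning) {
        switch (connection) {
            case WIFI:
                return Action.DOWNLOAD; // no data-plan cost, start right away
            case METERED:
                // Ask before spending roughly 100 MB of mobile data.
                return userAcceptedWarning ? Action.DOWNLOAD : Action.WARN_FIRST;
            default:
                return Action.UNAVAILABLE; // no connection, nothing to download
        }
    }
}
```

The app would call `decide(...)` when the button is tapped, show a confirmation dialog on `WARN_FIRST`, and call it again with `userAcceptedWarning` set to true once the user agrees.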

Ivan Marinov