10

Background

I wanted to get information about APK files (including split APK files), even if they are inside compressed zip files (without decompressing them). In my case, this includes various things, such as the package name, version code, version name, app label, app icon, and whether it's a split APK file or not.

Do note that I want to do it all inside an Android app, not on a PC, so some tools can't be used.

The problem

This means I can't use the getPackageArchiveInfo function, as this function requires a path to the APK file, and works only on non-split-apk files.

In short, there is no framework function to do it, so I have to find a way to do it by going into the zipped file and using an InputStream as the input to a parsing function.

There are various solutions online, including outside of Android, but I don't know of one that is stable and works for all cases. Many might be good even for Android (example here), but might fail parsing and might require a file path instead of Uri/InputStream.
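
To illustrate the "APK inside a zip, without decompressing to disk" requirement, here is a minimal sketch that is not part of the original post. It nests one ZipInputStream inside another and returns the raw (still binary) AndroidManifest.xml bytes of each APK entry. The helper names readEntry and readManifestsFromZippedApks are hypothetical, and the sketch assumes ZipInputStream can read both archives, which the comments below note is not always the case.

```kotlin
import java.io.InputStream
import java.util.zip.ZipInputStream

/**
 * Hypothetical helper: returns the bytes of the first entry named [entryName],
 * or null if there is none. It deliberately does NOT close [input].
 */
fun readEntry(input: InputStream, entryName: String): ByteArray? {
    val zip = ZipInputStream(input)
    while (true) {
        val entry = zip.nextEntry ?: return null
        if (entry.name == entryName) return zip.readBytes()
    }
}

/**
 * Hypothetical helper: walks an outer zip stream and maps the name of each
 * *.apk entry to the raw bytes of that APK's AndroidManifest.xml.
 */
fun readManifestsFromZippedApks(outer: InputStream): Map<String, ByteArray> {
    val result = LinkedHashMap<String, ByteArray>()
    ZipInputStream(outer).use { outerZip ->
        while (true) {
            val entry = outerZip.nextEntry ?: break
            if (entry.isDirectory || !entry.name.endsWith(".apk", ignoreCase = true)) continue
            // While an entry is open, outerZip yields only that entry's bytes,
            // so it can be treated as the APK's own stream. The inner stream is
            // left unclosed because closing it would also close outerZip.
            readEntry(outerZip, "AndroidManifest.xml")?.let { result[entry.name] = it }
        }
    }
    return result
}
```

The returned bytes are still in the binary XML format, so they would have to go through a decoder such as decompressXML afterwards.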

What I've found & tried

I've found this on StackOverflow, but sadly, according to my tests, although it always generates content, in some rare cases the content is not valid XML.

So far, I've found these apps (package names and version codes) that the parser fails to parse, as the output XML content is invalid:

  1. com.farproc.wifi.analyzer 139
  2. com.teslacoilsw.launcherclientproxy 2
  3. com.hotornot.app 3072
  4. android 29 (that's the "Android System" system app itself)
  5. com.google.android.videos 41300042
  6. com.facebook.katana 201518851
  7. com.keramidas.TitaniumBackupPro 10
  8. com.google.android.apps.tachyon 2985033
  9. com.google.android.apps.photos 3594753

Using an XML viewer and XML validator, here are the issues with these apps:

  • For #1 and #2, I got very weird content, starting with <mnfs.
  • For #3, it doesn't like the "&" in <activity theme="resourceID 0x7f13000b" label="Features & Tests" ...
  • For #4, it missed the end tag of "manifest" in the end.
  • For #5, it missed multiple end tags, at least of "intent-filter","receiver" and "manifest". Maybe more.
  • For #6, it got "allowBackup" attribute twice in the "application" tag for some reason.
  • For #7, it got a value without attribute in the manifest tag: <manifest versionCode="resourceID 0xa" ="1.3.2".
  • For #8, it missed a lot of content after getting some "uses-feature" tags, and didn't have an ending tag for "manifest".
  • For #9, it missed a lot of content after getting some "uses-permission" tags, and didn't have an ending tag for "manifest".
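
Issue #3 arises because decompressXML appends attribute values verbatim (sb.append(" $attrName=\"$attrValue\"")), so a label such as "Features & Tests" ends up as a bare "&" in the output. A minimal sketch of an escaping step follows; escapeXmlAttribute is a hypothetical helper, not part of the original code, and it only addresses this one issue, not the missing or duplicated tags.

```kotlin
/**
 * Hypothetical helper: escapes the five XML-reserved characters so that
 * [value] can be placed inside a quoted XML attribute.
 */
fun escapeXmlAttribute(value: String): String {
    val sb = StringBuilder(value.length)
    for (c in value) {
        when (c) {
            '&' -> sb.append("&amp;")
            '<' -> sb.append("&lt;")
            '>' -> sb.append("&gt;")
            '"' -> sb.append("&quot;")
            '\'' -> sb.append("&apos;")
            else -> sb.append(c)
        }
    }
    return sb.toString()
}
```

In the parser, this would be applied to attrValue just before it is appended to the attribute string.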

Surprisingly, I didn't find any issue with split APK files. Only with main APK files.

Here's the code (also available here):

MainActivity.kt

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        thread {
            val problematicApkFiles = HashMap<ApplicationInfo, HashSet<String>>()
            val installedApplications = packageManager.getInstalledPackages(0)
            val startTime = System.currentTimeMillis()
            for ((index, packageInfo) in installedApplications.withIndex()) {
                val applicationInfo = packageInfo.applicationInfo
                val packageName = packageInfo.packageName
//                Log.d("AppLog", "$index/${installedApplications.size} parsing app $packageName ${packageInfo.versionCode}...")
                val mainApkFilePath = applicationInfo.publicSourceDir
                val parsedManifestOfMainApkFile =
                        try {
                            val parsedManifest = ManifestParser.parse(mainApkFilePath)
                            if (parsedManifest?.isSplitApk != false)
                                Log.e("AppLog", "$packageName - parsed normal APK, but failed to identify it as such")
                            parsedManifest?.manifestAttributes
                        } catch (e: Exception) {
                            Log.e("AppLog", e.toString())
                            null
                        }
                if (parsedManifestOfMainApkFile == null) {
                    problematicApkFiles.getOrPut(applicationInfo, { HashSet() }).add(mainApkFilePath)
                    Log.e("AppLog", "$packageName - failed to parse main APK file $mainApkFilePath")
                }
                if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP)
                    applicationInfo.splitPublicSourceDirs?.forEach {
                        val parsedManifestOfSplitApkFile =
                                try {
                                    val parsedManifest = ManifestParser.parse(it)
                                    if (parsedManifest?.isSplitApk != true)
                                        Log.e("AppLog", "$packageName - parsed split APK, but failed to identify it as such")
                                    parsedManifest?.manifestAttributes
                                } catch (e: Exception) {
                                    Log.e("AppLog", e.toString())
                                    null
                                }
                        if (parsedManifestOfSplitApkFile == null) {
                            Log.e("AppLog", "$packageName - failed to parse split APK file $it")
                            problematicApkFiles.getOrPut(applicationInfo, { HashSet() }).add(it)
                        }
                    }
            }
            val endTime = System.currentTimeMillis()
            Log.d("AppLog", "done parsing. number of files we failed to parse:${problematicApkFiles.size} time taken:${endTime - startTime} ms")
            if (problematicApkFiles.isNotEmpty()) {
                Log.d("AppLog", "list of files that we failed to get their manifest:")
                for (entry in problematicApkFiles) {
                    Log.d("AppLog", "packageName:${entry.key.packageName} , files:${entry.value}")
                }
            }
        }
    }
}

ManifestParser.kt

class ManifestParser{
    var isSplitApk: Boolean? = null
    var manifestAttributes: HashMap<String, String>? = null

    companion object {
        fun parse(file: File) = parse(java.io.FileInputStream(file))
        fun parse(filePath: String) = parse(File(filePath))
        fun parse(inputStream: InputStream): ManifestParser? {
            val result = ManifestParser()
            val manifestXmlString = ApkManifestFetcher.getManifestXmlFromInputStream(inputStream)
                    ?: return null
            val factory: DocumentBuilderFactory = DocumentBuilderFactory.newInstance()
            val builder: DocumentBuilder = factory.newDocumentBuilder()
            val document: Document? = builder.parse(manifestXmlString.byteInputStream())
            if (document != null) {
                document.documentElement.normalize()
                val manifestNode: Node? = document.getElementsByTagName("manifest")?.item(0)
                if (manifestNode != null) {
                    val manifestAttributes = HashMap<String, String>()
                    for (i in 0 until manifestNode.attributes.length) {
                        val node = manifestNode.attributes.item(i)
                        manifestAttributes[node.nodeName] = node.nodeValue
                    }
                    result.manifestAttributes = manifestAttributes
                }
            }
            result.manifestAttributes?.let {
                result.isSplitApk = (it["android:isFeatureSplit"]?.toBoolean()
                        ?: false) || (it.containsKey("split"))
            }
            return result
        }

    }
}

ApkManifestFetcher.kt

object ApkManifestFetcher {
    fun getManifestXmlFromFile(apkFile: File) = getManifestXmlFromInputStream(FileInputStream(apkFile))
    fun getManifestXmlFromFilePath(apkFilePath: String) = getManifestXmlFromInputStream(FileInputStream(File(apkFilePath)))
    fun getManifestXmlFromInputStream(ApkInputStream: InputStream): String? {
        ZipInputStream(ApkInputStream).use { zipInputStream: ZipInputStream ->
            while (true) {
                val entry = zipInputStream.nextEntry ?: break
                if (entry.name == "AndroidManifest.xml") {
//                    zip.getInputStream(entry).use { input ->
                    return decompressXML(zipInputStream.readBytes())
//                    }
                }
            }
        }
        return null
    }

    /**
     * Binary XML doc ending Tag
     */
    private var endDocTag = 0x00100101

    /**
     * Binary XML start Tag
     */
    private var startTag = 0x00100102

    /**
     * Binary XML end Tag
     */
    private var endTag = 0x00100103


    /**
     * Reference var for spacing
     * Used in prtIndent()
     */
    private var spaces = "                                             "

    /**
     * Parse the 'compressed' binary form of Android XML docs
     * such as for AndroidManifest.xml in .apk files
     * Source: http://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/4761689#4761689
     *
     * @param xml Encoded XML content to decompress
     */
    private fun decompressXML(xml: ByteArray): String {

        val resultXml = StringBuilder()

        // Compressed XML file/bytes starts with 0x24 bytes of data,
        // 9 32 bit words in little endian order (LSB first):
        //   0th word is 03 00 08 00
        //   3rd word SEEMS TO BE:  Offset at the end of StringTable
        //   4th word is: Number of strings in string table
        // WARNING: Sometimes I indiscriminately display or refer to a word in
        //   little endian storage format, or in integer format (i.e. MSB first).
        val numbStrings = lew(xml, 4 * 4)

        // StringIndexTable starts at offset 0x24, an array of 32 bit LE offsets
        // of the length/string data in the StringTable.
        val sitOff = 0x24  // Offset of start of StringIndexTable

        // StringTable, each string is represented with a 16 bit little endian
        // character count, followed by that number of 16 bit (LE) (Unicode) chars.
        val stOff = sitOff + numbStrings * 4  // StringTable follows StrIndexTable

        // XMLTags, The XML tag tree starts after some unknown content after the
        // StringTable.  There is some unknown data after the StringTable, scan
        // forward from this point to the flag for the start of an XML start tag.
        var xmlTagOff = lew(xml, 3 * 4)  // Start from the offset in the 3rd word.
        // Scan forward until we find the bytes: 0x02011000(x00100102 in normal int)
        run {
            var ii = xmlTagOff
            while (ii < xml.size - 4) {
                if (lew(xml, ii) == startTag) {
                    xmlTagOff = ii
                    break
                }
                ii += 4
            }
        } // end of hack, scanning for start of first start tag

        // XML tags and attributes:
        // Every XML start and end tag consists of 6 32 bit words:
        //   0th word: 02011000 for startTag and 03011000 for endTag
        //   1st word: a flag?, like 38000000
        //   2nd word: Line of where this tag appeared in the original source file
        //   3rd word: FFFFFFFF ??
        //   4th word: StringIndex of NameSpace name, or FFFFFFFF for default NS
        //   5th word: StringIndex of Element Name
        //   (Note: 01011000 in 0th word means end of XML document, endDocTag)

        // Start tags (not end tags) contain 3 more words:
        //   6th word: 14001400 meaning??
        //   7th word: Number of Attributes that follow this tag(follow word 8th)
        //   8th word: 00000000 meaning??

        // Attributes consist of 5 words:
        //   0th word: StringIndex of Attribute Name's Namespace, or FFFFFFFF
        //   1st word: StringIndex of Attribute Name
        //   2nd word: StringIndex of Attribute Value, or FFFFFFFF if ResourceId used
        //   3rd word: Flags?
        //   4th word: str ind of attr value again, or ResourceId of value

        // TMP, dump string table to tr for debugging
        //tr.addSelect("strings", null);
        //for (int ii=0; ii<numbStrings; ii++) {
        //  // Length of string starts at StringTable plus offset in StrIndTable
        //  String str = compXmlString(xml, sitOff, stOff, ii);
        //  tr.add(String.valueOf(ii), str);
        //}
        //tr.parent();

        // Step through the XML tree element tags and attributes
        var off = xmlTagOff
        var indent = 0
//        var startTagLineNo = -2
        while (off < xml.size) {
            val tag0 = lew(xml, off)
            //int tag1 = LEW(xml, off+1*4);
//            val lineNo = lew(xml, off + 2 * 4)
            //int tag3 = LEW(xml, off+3*4);
//            val nameNsSi = lew(xml, off + 4 * 4)
            val nameSi = lew(xml, off + 5 * 4)

            if (tag0 == startTag) { // XML START TAG
//                val tag6 = lew(xml, off + 6 * 4)  // Expected to be 14001400
                val numbAttrs = lew(xml, off + 7 * 4)  // Number of Attributes to follow
                //int tag8 = LEW(xml, off+8*4);  // Expected to be 00000000
                off += 9 * 4  // Skip over 6+3 words of startTag data
                val name = compXmlString(xml, sitOff, stOff, nameSi)
                //tr.addSelect(name, null);
//                startTagLineNo = lineNo

                // Look for the Attributes
                val sb = StringBuffer()
                for (ii in 0 until numbAttrs) {
//                    val attrNameNsSi = lew(xml, off)  // AttrName Namespace Str Ind, or FFFFFFFF
                    val attrNameSi = lew(xml, off + 1 * 4)  // AttrName String Index
                    val attrValueSi = lew(xml, off + 2 * 4) // AttrValue Str Ind, or FFFFFFFF
//                    val attrFlags = lew(xml, off + 3 * 4)
                    val attrResId = lew(xml, off + 4 * 4)  // AttrValue ResourceId or dup AttrValue StrInd
                    off += 5 * 4  // Skip over the 5 words of an attribute

                    val attrName = compXmlString(xml, sitOff, stOff, attrNameSi)
                    val attrValue = if (attrValueSi != -1)
                        compXmlString(xml, sitOff, stOff, attrValueSi)
                    else
                        "resourceID 0x" + Integer.toHexString(attrResId)
                    sb.append(" $attrName=\"$attrValue\"")
                    //tr.add(attrName, attrValue);
                }
                resultXml.append(prtIndent(indent, "<$name$sb>"))
                indent++

            } else if (tag0 == endTag) { // XML END TAG
                indent--
                off += 6 * 4  // Skip over 6 words of endTag data
                val name = compXmlString(xml, sitOff, stOff, nameSi)
                resultXml.append(prtIndent(indent, "</$name>")) //  (line $startTagLineNo-$lineNo)
                //tr.parent();  // Step back up the NobTree

            } else if (tag0 == endDocTag) {  // END OF XML DOC TAG
                break

            } else {
//                println("  Unrecognized tag code '" + Integer.toHexString(tag0)
//                        + "' at offset " + off
//                )
                break
            }
        } // end of while loop scanning tags and attributes of XML tree
//        println("    end at offset $off")

        return resultXml.toString()
    } // end of decompressXML


    /**
     * Tool Method for decompressXML();
     * Compute binary XML to its string format
     * Source: http://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/4761689#4761689
     *
     * @param xml Binary-formatted XML
     * @param sitOff
     * @param stOff
     * @param strInd
     * @return String-formatted XML
     */
    private fun compXmlString(xml: ByteArray, @Suppress("SameParameterValue") sitOff: Int, stOff: Int, strInd: Int): String? {
        if (strInd < 0) return null
        val strOff = stOff + lew(xml, sitOff + strInd * 4)
        return compXmlStringAt(xml, strOff)
    }


    /**
     * Tool Method for decompressXML();
     * Apply indentation
     *
     * @param indent Indentation level
     * @param str String to indent
     * @return Indented string
     */
    private fun prtIndent(indent: Int, str: String): String {

        return spaces.substring(0, min(indent * 2, spaces.length)) + str
    }


    /**
     * Tool method for decompressXML()
     * Return the string stored in StringTable format at
     * offset strOff.  This offset points to the 16 bit string length, which
     * is followed by that number of 16 bit (Unicode) chars.
     *
     * @param arr StringTable array
     * @param strOff Offset to get string from
     * @return String from StringTable at offset strOff
     */
    private fun compXmlStringAt(arr: ByteArray, strOff: Int): String {
        val strLen = (arr[strOff + 1] shl (8 and 0xff00)) or (arr[strOff].toInt() and 0xff)
        val chars = ByteArray(strLen)
        for (ii in 0 until strLen) {
            chars[ii] = arr[strOff + 2 + ii * 2]
        }
        return String(chars)  // Hack, just use 8 bit chars
    } // end of compXmlStringAt


    /**
     * Return value of a Little Endian 32 bit word from the byte array
     * at offset off.
     *
     * @param arr Byte array with 32 bit word
     * @param off Offset to get word from
     * @return Value of Little Endian 32 bit word specified
     */
    private fun lew(arr: ByteArray, off: Int): Int {
        return (arr[off + 3] shl 24 and -0x1000000 or ((arr[off + 2] shl 16) and 0xff0000)
                or (arr[off + 1] shl 8 and 0xff00) or (arr[off].toInt() and 0xFF))
    } // end of LEW

    private infix fun Byte.shl(i: Int): Int = (this.toInt() shl i)
//    private infix fun Int.shl(i: Int): Int = (this shl i)
}

The questions

  1. How come I get invalid XML content for some APK manifest files (hence the XML parsing fails for them)?
  2. How can I get it to work, always?
  3. Is there a better way to parse the manifest file into valid XML? Maybe a better alternative that could work with all kinds of APK files, including ones inside zipped files, without decompressing them?
android developer
  • I think that some of the manifests are obfuscated by DexGuard (see [here](https://www.guardsquare.com/en/blog/dexguard-vs-proguard)), where manifest file obfuscation is mentioned. This seems to be the case for #1 on your list, com.farproc.wifi.analyzer. Its manifest file starts with "<mnfs". – Cheticamp Mar 14 '20 at 21:37
  • @Cheticamp Still, the framework itself can read it just fine. Those are all APK files that are installed fine on my device. Some didn't have this exact issue that you describe, and one of them is extremely old. – android developer Mar 14 '20 at 22:34
  • And yet, DexGuard claims to be able to obfuscate the manifest file. I don't know how they do it and still have the framework read the manifest, but it's an area to look into IMO. As for the other issues, have you looked into using XmlPullParser to extract just what you need? Maybe you already tried this and I didn't read carefully enough. – Cheticamp Mar 14 '20 at 22:40
  • I already mentioned all of the issues I've found, and it's not "mnfs" for most cases. It's only for the first 2 cases. Also, if you try to parse those via some online tool, it will still work fine. – android developer Mar 14 '20 at 22:44
  • What doesn't work with [apk-parser](https://github.com/hsiafan/apk-parser)? I was able to run it on an emulator and it worked OK. Would it be required to accept an InputStream? – Cheticamp Mar 19 '20 at 00:03
  • @Cheticamp I already provided a link to this library, and said that it supports only file-path and not InputStream. This means it can't parse files from SAF well, and can't parse APK files that are inside zip files. Not only that, but sadly this library has issues parsing even normal APK sometimes: https://github.com/hsiafan/apk-parser/issues – android developer Mar 19 '20 at 00:55
  • That's clear. Thanks. That library can be made to accept InputStream but the issues will remain. Even Apktool which, I believe, is the gold standard of this type of application has issues and recent ones. – Cheticamp Mar 19 '20 at 01:13
  • @Cheticamp What about jadx? I've tested it online and I think it can handle APK files. It's also open sourced with apache license and it's in Java. Sadly though, I couldn't find how to use it for Android, and I think it also uses a file path, and it has a weird warning on the FAQ about OOM, that has the answer of raising the memory allowed to be quite high... But still, I wonder if it's possible to try it out. Could be a nice step in the long path to make it work. – android developer Mar 19 '20 at 08:46
  • I learned today that _ZipInputStream_ can't always unzip a file for several reasons identified [here](https://stackoverflow.com/a/54236244/6287910). What I saw was "java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor" on one APK but the remaining 430 APKs were OK. Out of curiosity, I tried a test of the relative speed of extracting and parsing manifests on my S7 with 431 APKs: Extract using ZipFile (no copying): 65 seconds; extract by copying input stream to local file then using ZipFile: 285 secs; Extract using ZipInputStream without copying: 285 secs. I used apk-parser. – Cheticamp Mar 19 '20 at 21:07
  • @Cheticamp I tested it. Seems that a lot of zip compression types are not supported, and in some weird cases, ZipFile can open while ZipInputStream can't. So I reported here: https://issuetracker.google.com/issues/151990857 . However, as I've tested, I couldn't reproduce this issue on APK files. The file he tested was of "xapk" which is something that some website invented, to include more than just APK file. The link says that we can use SeekableInMemoryByteChannel . Wonder how this works. Couldn't find a sample. I don't understand what is the conclusion from your testing. – android developer Mar 20 '20 at 09:23
  • @Cheticamp Also, the zip opening isn't the issue. You can probably find various libraries that can handle all kinds of zip files (maybe this: https://github.com/zeroturnaround/zt-zip ) . The issue is parsing using InputStream. It doesn't have to be ZipInputStream. I used it in the sample to show the real issue. – android developer Mar 20 '20 at 09:30
  • The only APK I had trouble unpacking was FireFox (org.mozilla.firefox). I only noted the issue because I thought that it was a structural issue, i.e., that unzipping couldn't be done reliably with a sequential read. As you say, other libs might handle sequential reads just fine. – Cheticamp Mar 20 '20 at 11:43
  • As for the tests, I was curious about the penalty that sequential reads would impose. I didn't mention a final test I did which was to use the code I posted here instead of apk-parser using ZipInputStream. Running code resulted in scan time of 83 seconds which compares favorably with ZipFile. What the code doesn't do is to read the _resources.arsc_ file which is required for a full parse and which tends to be at the very end of APKs. When running this parser while just reading the _resources.arsc_ file, the run time is 979 seconds. – Cheticamp Mar 20 '20 at 11:44
  • @Cheticamp So what can be done? You think that maybe jadx could be sufficient? Even if Google would allow to parse APKs when using SAF and file-path (I think they might), I want to know how to do it for InputStream, as I want to parse even APK files that are inside a zip file (or maybe other sources too). – android developer Mar 21 '20 at 14:49
  • I see two issues: 1) Getting a reliable unzipping methodology for input streams (but that's not the question) and 2) adapting an open source APK unpacker to run on Android. All the unpackers I have looked at work with input streams anyway due to the nature of APKs so they could be adapted. Unfortunately, the simple ones aren't robust and the robust ones do many other things and are involved. It's doable, IMO, but it will be a job. If you are looking to extract just a few manifest entries, though, that would be more approachable. – Cheticamp Mar 22 '20 at 04:16
  • @Cheticamp I know it's probably a lot of work, but I thought that maybe the first step that someone has done might help. I'm also ok with third party libraries, if they are both reliable and won't cause issues (example: too high memory usage). – android developer Mar 22 '20 at 15:05

2 Answers

0

Likely you'd have to handle all of the special cases you've already identified.

Aliases & hexadecimal references might confuse it; these would need to be resolved.

For example, falling back from manifest to mnfs would at least solve one issue:

fun getRootNode(document: Document): Node? {
    var node: Node? = document.getElementsByTagName("manifest")?.item(0)
    if (node == null) {
        node = document.getElementsByTagName("mnfs")?.item(0)
    }
    return node
}

"Features & Tests" would require TextUtils.htmlEncode() for &amp; or another parser configuration.

Making it parse single AndroidManifest.xml files would make it easier to test, because each additional package may bring more unexpected input, until the parser comes close to the manifest parser that the OS uses (the source code might help). As one can see there, it may set cookies for reading the manifest. Take this list of package names and set up a test case for each of them; then the issues are rather isolated. But the main issue is that these cookies are most likely not available to third-party applications.

Martin Zeitler
  • It's not just that, but as I wrote the XML itself is invalid. The issue is before even parsing the XML. Meaning: some tags don't exist, and some don't have end-tags. Please, if you've found a way to fix this issue, tell me how. – android developer Mar 15 '20 at 08:43
0

It seems that ApkManifestFetcher doesn't handle all cases, such as text (between tags), name space declarations and, maybe, a few other things. Below is a rework of ApkManifestFetcher that handles all 300+ APKs on my phone except for the Netflix APK, which comes up with some blank attributes.

I no longer believe that the files that start with <mnfs have anything to do with obfuscation. Rather, they are encoded using UTF-8 instead of the UTF-16 that the app assumes (8-bit rather than 16-bit units). The reworked app handles UTF-8 encoding and can parse these files.
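
The <mnfs symptom is consistent with this: the original compXmlStringAt takes every second byte, and every second byte of the UTF-8 bytes of "manifest" is "mnfs". As a minimal sketch (not the code from this answer), a string-pool entry could be decoded along the following lines. readPoolString is a hypothetical name, and the length-prefix layout is assumed to be the one the platform's string pool uses: two length fields of one or two bytes each for UTF-8, one length field of one or two 16-bit words for UTF-16.

```kotlin
import java.nio.charset.StandardCharsets

/**
 * Hypothetical helper: decodes the string-pool entry that starts at [offset].
 * [isUtf8] is expected to come from the UTF-8 flag in the pool header.
 */
fun readPoolString(data: ByteArray, offset: Int, isUtf8: Boolean): String {
    var pos = offset
    fun u8(): Int = data[pos++].toInt() and 0xFF
    fun u16(): Int {
        val lo = u8()
        val hi = u8()
        return lo or (hi shl 8)
    }

    if (isUtf8) {
        // Each length is one byte, or two bytes when the first has its high bit set.
        fun len8(): Int {
            val first = u8()
            return if ((first and 0x80) != 0) ((first and 0x7F) shl 8) or u8() else first
        }
        len8()                   // length in UTF-16 units, not needed here
        val byteCount = len8()   // length in bytes
        return String(data, pos, byteCount, StandardCharsets.UTF_8)
    }

    // UTF-16: one 16-bit length, or two words when the first has its high bit set.
    val first = u16()
    val charCount = if ((first and 0x8000) != 0) ((first and 0x7FFF) shl 16) or u16() else first
    return String(data, pos, charCount * 2, StandardCharsets.UTF_16LE)
}
```

Decoding with a real charset, rather than copying single bytes, also keeps non-ASCII labels intact.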

As mentioned above, name spaces are not handled correctly by the original class or this rework although the rework can skip past them. Comments in the code describe this a little.
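
One way to skip block types a parser does not understand is to step by the size each block declares (the "size of the block" word listed in the code comments) instead of assuming a fixed number of words. The following is a minimal sketch, not the answer's code; Chunk and listChunks are hypothetical names. It assumes every block starts with a 16-bit type, a 16-bit header size and a 32-bit total size, which is why a start tag reads as 0x00100102 when taken as a single 32-bit word.

```kotlin
/** Hypothetical record of one block: its type, header size, total size and position. */
data class Chunk(val type: Int, val headerSize: Int, val size: Int, val offset: Int)

/**
 * Hypothetical helper: lists the top-level blocks of a binary XML file by
 * following each block's declared size.
 */
fun listChunks(xml: ByteArray): List<Chunk> {
    fun u16(off: Int): Int =
        (xml[off].toInt() and 0xFF) or ((xml[off + 1].toInt() and 0xFF) shl 8)
    fun u32(off: Int): Int = u16(off) or (u16(off + 2) shl 16)

    val chunks = ArrayList<Chunk>()
    var off = u16(2) // the file header's own header size (normally 8)
    while (off + 8 <= xml.size) {
        val chunk = Chunk(u16(off), u16(off + 2), u32(off + 4), off)
        // Stop on a malformed size rather than looping forever or overrunning.
        if (chunk.size < 8 || off + chunk.size > xml.size) break
        chunks += chunk
        off += chunk.size
    }
    return chunks
}
```

With such a list, a decoder can handle the types it knows (0x0102 start tag, 0x0103 end tag, 0x0104 text) and step over everything else without losing its place.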

That said, the code below may be good enough for certain applications. The better, although longer, course of action would be to use code from apktool which seems to be able to handle all APKs.

ApkManifestFetcher

object ApkManifestFetcher {
    fun getManifestXmlFromFile(apkFile: File) =
            getManifestXmlFromInputStream(FileInputStream(apkFile))

    fun getManifestXmlFromFilePath(apkFilePath: String) =
            getManifestXmlFromInputStream(FileInputStream(File(apkFilePath)))

    fun getManifestXmlFromInputStream(ApkInputStream: InputStream): String? {
        ZipInputStream(ApkInputStream).use { zipInputStream: ZipInputStream ->
            while (true) {
                val entry = zipInputStream.nextEntry ?: break
                if (entry.name == "AndroidManifest.xml") {
                    return decompressXML(zipInputStream.readBytes())
                }
            }
        }
        return null
    }

    /**
     * Binary XML name space starts
     */
    private const val startNameSpace = 0x00100100

    /**
     * Binary XML name space ends
     */
    private const val endNameSpace = 0x00100101

    /**
     * Binary XML start Tag
     */
    private const val startTag = 0x00100102

    /**
     * Binary XML end Tag
     */
    private const val endTag = 0x00100103

    /**
     * Binary XML text Tag
     */
    private const val textTag = 0x00100104

    /*
     * Flag for UTF-8 encoded file. Default is UTF-16.
     */
    private const val FLAG_UTF_8 = 0x00000100

    /**
     * Reference var for spacing
     * Used in prtIndent()
     */
    private const val spaces = "                                             "

    // Flag if the manifest is in UTF-8 but we don't really handle it.
    private var mIsUTF8 = false

    /**
     * Parse the 'compressed' binary form of Android XML docs
     * such as for AndroidManifest.xml in .apk files
     * Source: http://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/4761689#4761689
     *
     * @param xml Encoded XML content to decompress
     */
    private fun decompressXML(xml: ByteArray): String {
        val resultXml = StringBuilder()
        /*
        Compressed XML file/bytes starts with 0x24 bytes of data
            9 32 bit words in little endian order (LSB first):
                0th word is 03 00 (Magic number) 08 00 (header size words 0-1)
                1st word is the size of the compressed XML. This should equal size of xml array.
                2nd word is 01 00 (Magic number) 1c 00 (header size words 2-8)
                3rd word is offset of byte after string table
                4th word is number of strings in string table
                5th word is style count
                6th word are flags
                7th word string table offset
                8th word is styles offset
                [string index table (little endian offset into string table)]
                [string table (two byte length followed by text for each entry UTF-16, nul)]
        */

        mIsUTF8 = (lew(xml, 24) and FLAG_UTF_8) != 0

        val numbStrings = lew(xml, 4 * 4)

        // StringIndexTable starts at offset 0x24, an array of 32 bit LE offsets
        // of the length/string data in the StringTable.
        val sitOff = 0x24  // Offset of start of StringIndexTable

        // StringTable, each string is represented with a 16 bit little endian
        // character count, followed by that number of 16 bit (LE) (Unicode) chars.
        val stOff = sitOff + numbStrings * 4  // StringTable follows StrIndexTable

        // XMLTags, The XML tag tree starts after some unknown content after the
        // StringTable.  There is some unknown data after the StringTable, scan
        // forward from this point to the flag for the start of an XML start tag.
        var xmlTagOff = lew(xml, 3 * 4)  // Start from the offset in the 3rd word.
        // Scan forward until we find the bytes: 0x02011000(x00100102 in normal int)
        run {
            var ii = xmlTagOff
            while (ii < xml.size - 4) {
                if (lew(xml, ii) == startTag) {
                    xmlTagOff = ii
                    break
                }
                ii += 4
            }
        }

        /*
        XML tags and attributes:

        Every XML start and end tag consists of 6 32 bit words:
            0th word: 02011000 for startTag and 03011000 for endTag
            1st word: a flag?, like 38000000
            2nd word: Line of where this tag appeared in the original source file
            3rd word: 0xFFFFFFFF ??
            4th word: StringIndex of NameSpace name, or 0xFFFFFFFF for default NS
            5th word: StringIndex of Element Name
            (Note: 01011000 in 0th word means end of XML document, endDocTag)

        Start tags (not end tags) contain 3 more words:
            6th word: 14001400 meaning??
            7th word: Number of Attributes that follow this tag(follow word 8th)
            8th word: 00000000 meaning??

        Attributes consist of 5 words:
            0th word: StringIndex of Attribute Name's Namespace, or 0xFFFFFFFF
            1st word: StringIndex of Attribute Name
            2nd word: StringIndex of Attribute Value, or 0xFFFFFFFF if ResourceId used
            3rd word: Flags?
            4th word: str ind of attr value again, or ResourceId of value

        Text blocks consist of 7 words
            0th word: The text tag (0x00100104)
            1st word: Size of the block (28 bytes)
            2nd word: Line number
            3rd word: 0xFFFFFFFF
            4th word: Index into the string table
            5th word: Unknown
            6th word: Unknown

        startNameSpace blocks consist of 6 words
            0th word: The startNameSpace tag (0x00100100)
            1st word: Size of the block (24 bytes)
            2nd word: Line number
            3rd word: 0xFFFFFFFF
            4th word: Index into the string table for the prefix
            5th word: Index into the string table for the URI

        endNameSpace blocks consist of 6 words
            0th word: The endNameSpace tag (0x00100101)
            1st word: Size of the block (24 bytes)
            2nd word: Line number
            3rd word: 0xFFFFFFFF
            4th word: Index into the string table for the prefix
            5th word: Index into the string table for the URI
        */

        // Step through the XML tree element tags and attributes
        var off = xmlTagOff
        var indent = 0
        while (off < xml.size) {
            val tag0 = lew(xml, off)
            val nameSi = lew(xml, off + 5 * 4)

            when (tag0) {
                startTag -> {
                    val numbAttrs = lew(xml, off + 7 * 4)  // Number of Attributes to follow
                    off += 9 * 4  // Skip over 6+3 words of startTag data
                    val name = compXmlString(xml, sitOff, stOff, nameSi)

                    // Look for the Attributes
                    val sb = StringBuilder()  // No synchronization needed here
                    for (ii in 0 until numbAttrs) {
                        val attrNameSi = lew(xml, off + 1 * 4)  // AttrName String Index
                        val attrValueSi = lew(xml, off + 2 * 4) // AttrValue Str Ind, or 0xFFFFFFFF (-1) if none
                        val attrResId = lew(xml, off + 4 * 4)  // AttrValue ResourceId or dup AttrValue StrInd
                        off += 5 * 4  // Skip over the 5 words of an attribute

                        val attrName = compXmlString(xml, sitOff, stOff, attrNameSi)
                        val attrValue = if (attrValueSi != -1)
                            compXmlString(xml, sitOff, stOff, attrValueSi)
                        else
                            "resourceID 0x" + Integer.toHexString(attrResId)
                        sb.append(" $attrName=\"$attrValue\"")
                    }
                    resultXml.append(prtIndent(indent, "<$name$sb>"))
                    indent++
                }
                endTag -> {
                    indent--
                    off += 6 * 4  // Skip over 6 words of endTag data
                    val name = compXmlString(xml, sitOff, stOff, nameSi)
                    resultXml.append(prtIndent(indent, "</$name>"))
                }
                textTag -> {  // Text that is hanging out between start and end tags
                    val text = compXmlString(xml, sitOff, stOff, lew(xml, off + 16))
                    resultXml.append(text)
                    off += lew(xml, off + 4)
                }
                startNameSpace -> {
                    //Todo startNameSpace and endNameSpace are effectively skipped, but they are not handled.
                    off += lew(xml, off + 4)
                }
                endNameSpace -> {
                    off += lew(xml, off + 4)
                }
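                else -> {
                    Log.d(
                            "Applog", "  Unrecognized tag code '" + Integer.toHexString(tag0)
                            + "' at offset " + off
                    )
                    // Stop here: off is not advanced in this branch, so looping
                    // again would never terminate.
                    return resultXml.toString()
                }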
            }
        }
        return resultXml.toString()
    }

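    /**
     * Tool method for decompressXML().
     * Returns the string stored at index strInd of the StringTable.
     * Source: http://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/4761689#4761689
     *
     * @param xml Binary-formatted XML
     * @param sitOff Offset of the start of the StringIndexTable
     * @param stOff Offset of the start of the StringTable
     * @param strInd Index of the string in the StringIndexTable
     * @return The string at that index, or null if strInd is negative
     */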
    private fun compXmlString(
            xml: ByteArray, @Suppress("SameParameterValue") sitOff: Int,
            stOff: Int,
            strInd: Int
    ): String? {
        if (strInd < 0) return null
        val strOff = stOff + lew(xml, sitOff + strInd * 4)
        return compXmlStringAt(xml, strOff)
    }

    /**
     * Tool Method for decompressXML();
     * Apply indentation
     *
     * @param indent Indentation level
     * @param str String to indent
     * @return Indented string
     */
    private fun prtIndent(indent: Int, str: String): String {
        return spaces.substring(0, min(indent * 2, spaces.length)) + str
    }

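    /**
     * Tool method for decompressXML()
     * Return the string stored in StringTable format at offset strOff.
     * For UTF-16 this offset points to the 16 bit character count, which is
     * followed by that number of 16 bit (LE) chars. For UTF-8 it points to the
     * character count, followed by the byte count and then the bytes.
     *
     * @param arr StringTable array
     * @param strOff Offset to get string from
     * @return String from StringTable at offset strOff
     */
    private fun compXmlStringAt(arr: ByteArray, strOff: Int): String {
        var pos = strOff
        if (mIsUTF8) {
            // Each length is one byte, or two bytes when the high bit of the first is set.
            pos += if ((arr[pos].toInt() and 0x80) != 0) 2 else 1  // Skip the character count
            var byteLength = arr[pos++].toInt() and 0xFF
            if ((byteLength and 0x80) != 0) {
                byteLength = ((byteLength and 0x7F) shl 8) or (arr[pos++].toInt() and 0xFF)
            }
            return String(arr, pos, byteLength, Charsets.UTF_8)
        }
        // UTF-16LE: a 16 bit LE character count, two bytes per character
        val charCount = ((arr[pos + 1].toInt() and 0xFF) shl 8) or (arr[pos].toInt() and 0xFF)
        return String(arr, pos + 2, charCount * 2, Charsets.UTF_16LE)
    }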

    /**
     * Return value of a Little Endian 32 bit word from the byte array
     * at offset off.
     *
     * @param arr Byte array with 32 bit word
     * @param off Offset to get word from
     * @return Value of Little Endian 32 bit word specified
     */
    private fun lew(arr: ByteArray, off: Int): Int {
        return (arr[off + 3] shl 24 and -0x1000000 or ((arr[off + 2] shl 16) and 0xff0000)
                or (arr[off + 1] shl 8 and 0xff00) or (arr[off].toInt() and 0xFF))
    }

    private infix fun Byte.shl(i: Int): Int = (this.toInt() shl i)
}
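
One gap worth noting in the listing above: attribute values and text are appended as-is, so a label such as "Features & Tests" produces output that an XML parser will reject. A minimal sketch of an escaping helper follows. `escapeXml` is a hypothetical name that is not part of the original code, and it assumes values are written inside double quotes as the attribute loop does.

```kotlin
/**
 * Hypothetical helper (not part of the original listing): escapes the five
 * characters that are special in XML so that a value can be placed inside a
 * double-quoted attribute or in element text.
 */
fun escapeXml(value: String?): String {
    if (value == null) return ""  // Assumption: emit an empty value for a missing string
    val sb = StringBuilder(value.length)
    for (ch in value) {
        when (ch) {
            '&' -> sb.append("&amp;")
            '<' -> sb.append("&lt;")
            '>' -> sb.append("&gt;")
            '"' -> sb.append("&quot;")
            '\'' -> sb.append("&apos;")
            else -> sb.append(ch)
        }
    }
    return sb.toString()
}

fun main() {
    println(escapeXml("Features & Tests"))  // prints: Features &amp; Tests
}
```

In the attribute loop it could be applied as `sb.append(" $attrName=\"${escapeXml(attrValue)}\"")`, and likewise to the string appended for text blocks. Note that this sketch writes an empty value where the original would print `null`.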
Cheticamp
  • So it's still not a reliable one. Have you perhaps tried jadx? I wonder if this one can handle APK files well even on Android app itself. – android developer Mar 17 '20 at 10:40
  • @androiddeveloper I haven't looked at jadx. I have perused [Apktool](https://github.com/iBotPeaches/Apktool) and think that it is a good source (and open.) It would take some work to host it on Android but maybe just the manifest part would be doable. What is posted here is definitely not production worthy since there are many aspects of manifest files that it does not address. – Cheticamp Mar 17 '20 at 12:36
  • @androiddeveloper I believe that the manifests that start with " – Cheticamp Mar 17 '20 at 13:08
  • jadx is also open sourced, and in Java too. Not sure if it supports InputStream though (as I asked about). If you find a good solution, please let me know. I've tried various solutions already and didn't find any reliable one. I'm afraid of using the big tools too, because maybe they could take too much memory which could cause a crash on some devices (jadx talks about OOM on its FAQ: https://github.com/skylot/jadx/wiki/Troubleshooting-Q&A ) . So I prefer a minimal solution/library that is best working for Android. – android developer Mar 17 '20 at 13:46
  • Curious about whether you are still looking for an answer to this question? – Cheticamp Jun 10 '20 at 21:40
  • I'm not sure. For now I've put some workaround for the library I forked : https://github.com/AndroidDeveloperLB/apk-parser . But, it could always be nice to see if there is a reliable way to get the manifest, which will handle all cases. – android developer Jun 13 '20 at 18:32
  • @androiddeveloper Got a chance to look at this again. See [this project](https://github.com/Cheticamp/ApkManifestReader) on GitHub. I would be interested to know if you have any problem APKs that it can't handle. It doesn't look up resources, but I believe it does reliably produce manifests that will parse. – Cheticamp Jun 30 '20 at 18:42
  • This worked well on almost all apps. All, except "com.keramidas.TitaniumBackupPro" : https://i.imgur.com/WkzP7R1.png . The way I tested is by using my XMLTag class to try to parse the manifest string I get via the function "decodeXml" : https://stackoverflow.com/a/19115036/878126 – android developer Jul 01 '20 at 22:42
  • @androiddeveloper Probably an ill-formed manifest that doesn't affect Android. Can you supply the APK that failed? Is it just the one? – Cheticamp Jul 02 '20 at 00:02
  • OK here: https://ufile.io/hrozswq3 . For now, this is the only one that failed. – android developer Jul 02 '20 at 14:34
  • @androiddeveloper That link is not working for me but I grabbed it from the Play Store. The problem is that the package is reporting a blank (zero-length string) for a name space which is not legal. I have a fix but won't push it out just yet. – Cheticamp Jul 02 '20 at 15:08
  • Wait, you are the one who made the GitHub sample? How did you do that? Where did you read about how to parse it? Or did you get it from somewhere else? – android developer Jul 02 '20 at 16:12
  • @androiddeveloper Yes, that's my repo. This question gave me the general outline but I also referred to the [AOSP source code](https://cs.android.com/) and some online search results. – Cheticamp Jul 02 '20 at 16:18
  • Wow you are incredible – android developer Jul 02 '20 at 22:22
  • If you wish, you can use my app that uses the recent solutions I've found about apps and APKs : https://play.google.com/store/apps/details?id=com.lb.app_manager . The recent changes handle installing various kinds of split-apk files: APKS, APKM, XAPK, and APK (that are in the same folder) - all when opened via file manager apps. – android developer Jul 05 '20 at 07:52
  • @androiddeveloper Thanks. I'll check it out. – Cheticamp Jul 05 '20 at 12:53
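
The check described in the comments (feed the decoded manifest string to a parser and see whether it fails) can be approximated without the XMLTag class. The sketch below swaps in the standard `javax.xml.parsers` DOM parser instead; `isWellFormedXml` is a hypothetical name, and the check only covers well-formedness, not whether the manifest is meaningful to Android.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.SAXException
import org.xml.sax.helpers.DefaultHandler

/**
 * Hypothetical helper: returns true if the given string parses as a
 * well-formed XML document, false if the parser reports an error.
 */
fun isWellFormedXml(xml: String): Boolean =
    try {
        val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        // DefaultHandler rethrows fatal errors and keeps the parser from
        // printing them to stderr.
        builder.setErrorHandler(DefaultHandler())
        builder.parse(InputSource(StringReader(xml)))
        true
    } catch (e: SAXException) {
        false
    }

fun main() {
    // An unescaped '&' and a missing end tag are two of the failures listed in the question.
    println(isWellFormedXml("<manifest><activity label=\"Features &amp; Tests\"/></manifest>"))
    println(isWellFormedXml("<manifest><activity label=\"Features & Tests\"/></manifest>"))
    println(isWellFormedXml("<manifest><application/>"))
}
```

Running it over the decoded manifests of all installed apps would give a quick list of package names that still fail.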