My app needs to process input from PDF files that consist mostly of text. I could do the parsing on my server, but I'd prefer not to. After exploring my options for text extraction, I found the PDFBox library and its Android port (https://github.com/TomRoush/PdfBox-Android).
In the app, I show users the standard UI for selecting the source document via ACTION_OPEN_DOCUMENT, then override onActivityResult to get the Uri. You know, the usual stuff.
The problem is that I can't figure out how to feed that Uri to PDFBox, since we're dealing with "documents" rather than "files" and the library wants a real file path. If I give it the path of a specific file, the text parsing works, but that's certainly not best practice and it can't be done for every document out there (cloud storage etc.), so instead I do this:
InputStream inputStream = getContentResolver().openInputStream(uri);
and then read it line by line so that in the end I have it all in one string. That part works okay.
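As a minimal sketch of that reading step (not the exact code from the app): the class and method names below (StreamToString, readAll) are hypothetical, UTF-8 decoding is an assumption, and a ByteArrayInputStream stands in for the stream returned by getContentResolver().openInputStream(uri) so the snippet runs outside Android.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox or Android.
class StreamToString {

    // Reads the stream line by line and joins the lines into one string.
    // Assumes the bytes decode as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream obtained from the content resolver.
        InputStream in = new ByteArrayInputStream(
                "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(in));
    }
}
```

In the app, the same method would be called with the stream obtained from the content resolver instead of the in-memory one.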
But how do I actually pass this data to PDFBox so it can do its text extraction magic? I can't find any docs on how to do this when I don't have a "real file path".
Maybe there are better ways now? This library is quite old. Basically, I need to extract text from a PDF on the Android device itself, not through an API call. I'm really stuck here.