Recently a new requirement came up: retrieve the content of a Word document and display and edit it in our application in some kind of editor pane.
I am free to choose the workflow. At first sight, two ideas came to mind:
- Using the clipboard to get the Word document content into my editor pane via copy/paste.
- Parsing the Word document within my application and inserting the result into an editor pane.
I started with the first idea, since it was quick and easy to test and might even work with other programs (it is not Word-specific). I already got promising results when I copy/pasted the content into a SimplyHTML editor pane: formatting such as bold was kept, signatures were shown, and even pictures were displayed.
Unfortunately, just displaying the content is not enough; I need to be able to edit it and save the changes in my application. And this is where it gets complicated. When I copy/paste content into my editor pane, Word emits a lot of unnecessary Word-specific tags (like the famous o-tag), which have no use when displaying the content as HTML and sometimes even have unwanted side effects. Since I do not need to transfer the data back to Word, I don't need those tags at all.
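To make the tag problem concrete, here is a minimal sketch of how such a cleanup could look, assuming the pasted markup resembles typical Word HTML (namespaced tags such as `<o:p>`, plus conditional comments). The class name `WordHtmlCleaner` and the choice of namespace prefixes (o, w, v, m) are my own assumptions, and a regex-based approach like this is only a rough filter, not a full HTML sanitizer:

```java
import java.util.regex.Pattern;

public class WordHtmlCleaner {

    // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->,
    // removed together with everything inside them.
    private static final Pattern CONDITIONAL_COMMENTS = Pattern.compile(
            "<!--\\[if[^\\]]*\\]>.*?<!\\[endif\\]-->",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Bare markers such as <![if !vml]> and <![endif]>. Only the markers are
    // removed; the content between them (often the <img> tag) is kept.
    private static final Pattern CONDITIONAL_MARKERS = Pattern.compile(
            "<!\\[(?:if[^\\]]*|endif)\\]>", Pattern.CASE_INSENSITIVE);

    // Opening and closing tags in the assumed Office namespaces (o:, w:, v:, m:).
    // The text between an opening and a closing tag is kept.
    private static final Pattern OFFICE_TAGS = Pattern.compile(
            "</?(?:o|w|v|m):[^>]*>", Pattern.CASE_INSENSITIVE);

    public static String clean(String html) {
        String result = CONDITIONAL_COMMENTS.matcher(html).replaceAll("");
        result = CONDITIONAL_MARKERS.matcher(result).replaceAll("");
        return OFFICE_TAGS.matcher(result).replaceAll("");
    }
}
```

The order matters in this sketch: the conditional comments are dropped first, so the namespaced tags inside them never reach the last pattern.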
Moreover, pictures are only created temporarily in some kind of temp folder and are lost as soon as I copy/paste another document or restart the system. I therefore thought encoding these pictures in Base64 could be a solution, since I wouldn't need to manage files on the file system and could save the pictures within an HTML string in our database.
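The Base64 idea could be sketched roughly as follows. This assumes the pasted HTML references the temporary pictures through quoted `file:` URLs in `<img src="...">` attributes; the class name `ImageInliner` and its methods are hypothetical, and images that cannot be read are simply left untouched:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageInliner {

    // Groups: (1) the tag up to and including the opening quote of src,
    // (2) the file: URL, (3) the closing quote.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*\")(file:[^\"]+)(\")",
            Pattern.CASE_INSENSITIVE);

    // Builds a data URI from raw image bytes and a MIME type.
    public static String toDataUri(byte[] bytes, String mimeType) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(bytes);
    }

    // Replaces every readable file: image reference with an inline data URI.
    public static String inlineImages(String html) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: keep the match as it is
            try {
                Path file = Paths.get(URI.create(m.group(2)));
                if (Files.isRegularFile(file) && Files.isReadable(file)) {
                    String mime = Files.probeContentType(file);
                    if (mime == null) {
                        mime = "application/octet-stream"; // type could not be detected
                    }
                    replacement = m.group(1)
                            + toDataUri(Files.readAllBytes(file), mime)
                            + m.group(3);
                }
            } catch (IllegalArgumentException e) {
                // Malformed or non-local URL: keep the original reference.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The conversion would have to run while the temporary files still exist, so right after the paste rather than at save time.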
Thanks to this entry, I was able to display Base64-encoded pictures in my editor pane, but unfortunately I have no idea how to convert image tags "on the fly" into Base64-encoded pictures. I thought about some kind of clipboard listener, but I am not sure whether this is the right way. I also checked which data flavors Word offers on the clipboard. RTF looked promising, though, since the pictures there seem to be already encoded as base64, but I am not sure whether I can influence the behavior of an editor pane by telling it which data flavor to use.
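For reference, this is roughly how the clipboard flavors can be inspected and one of them requested explicitly with the standard `java.awt.datatransfer` API. It is only a sketch: the class `ClipboardFlavors` and the helper `pickFlavor` are names I made up, and wiring the result into the editor pane (for example through a custom `TransferHandler`) is not shown:

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;
import java.util.Optional;

public class ClipboardFlavors {

    // Returns the first offered flavor with the given MIME type (primary type
    // and subtype only) and the given representation class, if there is one.
    public static Optional<DataFlavor> pickFlavor(
            DataFlavor[] available, String mimeType, Class<?> representation) {
        for (DataFlavor flavor : available) {
            if (flavor.isMimeTypeEqual(mimeType)
                    && flavor.getRepresentationClass() == representation) {
                return Optional.of(flavor);
            }
        }
        return Optional.empty();
    }

    // Prints all flavors currently on the system clipboard and returns the
    // HTML variant as a String, or null if no such flavor is offered.
    // Needs a graphical environment (throws HeadlessException otherwise).
    public static String readClipboardHtml()
            throws IOException, UnsupportedFlavorException {
        Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
        Transferable content = clipboard.getContents(null);
        if (content == null) {
            return null;
        }
        DataFlavor[] available = content.getTransferDataFlavors();
        for (DataFlavor flavor : available) {
            System.out.println(flavor.getMimeType());
        }
        Optional<DataFlavor> html = pickFlavor(available, "text/html", String.class);
        return html.isPresent() ? (String) content.getTransferData(html.get()) : null;
    }
}
```

The same `pickFlavor` call could be used with `"text/rtf"` and whatever representation class the printed list shows for it.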
So, in short, my question is: How would you retrieve Word document content (with pictures) and save it (for example as an HTML string) in a database backend within your application?
I am curious whether any of you have had similar goals, have ideas or recommendations on how to include such functionality in our application, or can at least point me in the right direction. Thank you in advance for taking the time to read through this question; hopefully you have some ideas!