
I need to replace the text of a paragraph in a .doc file (an HWPFDocument) when it contains a particular string, using Java. The text does get replaced, but the output is written in a strange way. Please help me fix this. Code snippet used:

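import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public static HWPFDocument processChange(HWPFDocument doc)
{
    try
    {
        Range range = doc.getRange();
        // Visit every paragraph and mask those containing the placeholder.
        for (int i = 0; i < range.numParagraphs(); i++)
        {
            Paragraph paragraph = range.getParagraph(i);
            if (paragraph.text().contains("Place Holder"))
            {
                // Replace the entire paragraph text with the mask.
                String text = paragraph.text();
                paragraph.replaceText(text, "*******");
            }
        }
    }
    catch (Exception ex)
    {
        ex.printStackTrace();
    }
    return doc;
}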

Input:

Place Holder 
Textvalue1
Textvalue2
Textvalue3

Output:

*******Textvalue1
Textvalue1
Textvalue2
Textvalue3
Sherin

1 Answer


The HWPF library is not in a good state for modifying and writing .doc files (at least it was not the last time I looked). Some time ago I developed a custom variant of HWPF for a client which, among many other things, provides correct replace and save operations, but that library is not publicly available.

If you absolutely must use .doc files and Java, you may get away with replacing a string with another string of exactly the same length, for instance "12345" -> "abc__" (_ being spaces or whatever works for you). It might also make sense to find the absolute location of the string to be replaced in the .doc file (using HWPF) and then change it in the file directly (without using HWPF).
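The same-length idea can be sketched in plain Java, independent of HWPF. This is only an illustration: the class and method names (`SameLengthReplacement`, `padToLength`, `replaceSameLength`) are made up for this example, and the sketch works on an ordinary `String`, not on a .doc file.

```java
final class SameLengthReplacement {

    /** Pads the replacement with padChar until it is exactly targetLength characters long. */
    static String padToLength(String replacement, int targetLength, char padChar) {
        if (replacement.length() > targetLength) {
            // A longer replacement would change the text length, which is what we want to avoid.
            throw new IllegalArgumentException("replacement is longer than the text it replaces");
        }
        StringBuilder sb = new StringBuilder(targetLength);
        sb.append(replacement);
        while (sb.length() < targetLength) {
            sb.append(padChar);
        }
        return sb.toString();
    }

    /** Replaces every occurrence of placeholder with a padded replacement of equal length. */
    static String replaceSameLength(String text, String placeholder, String replacement, char padChar) {
        String padded = padToLength(replacement, placeholder.length(), padChar);
        return text.replace(placeholder, padded);
    }

    public static void main(String[] args) {
        System.out.println(padToLength("abc", 5, '_'));                           // prints abc__
        System.out.println(replaceSameLength("id=12345;", "12345", "abc", '_'));  // prints id=abc__;
    }
}
```

Because the padded replacement has the same number of characters as the placeholder, the overall text length and the offsets of everything after it stay unchanged, which is the property the workaround relies on.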

The Word file format is very complicated, and "doing it right" is not a trivial task. Unless you are willing to spend many man-months, it will also not be possible to fix just one part of the library so that saving alone works. Many data structures must be handled very precisely, and a single slip-up makes Word crash on the generated output file.

Rainer Schwarze
  • Thanks for your valuable reply! How would it be possible to change the string in the .doc file directly, without using HWPF? Could you please explain this a bit? – Sherin Apr 30 '15 at 17:20
  • You need to dive into the HWPF source code. There are two levels of the data model: `usermodel` and `model` (both in package `org.apache.poi.hwpf`). When you have a text run in the `usermodel`, you can look at how it references the data in the `model`. Eventually you will probably end up at class `CHPBinTable`. There, look for things with `FC` in their name, which are already very close to a file location. Then descend to the `POIFS` package, which represents the underlying OLE2 data format. You may have to customize HWPF a little to make private classes/methods/fields accessible. – Rainer Schwarze Apr 30 '15 at 20:11
  • I'm trying the code change you suggested. I would appreciate your advice if I run into any difficulty. Thanks for your suggestions. – Sherin May 02 '15 at 05:50
  • Problem resolved: I trimmed the whitespace from the paragraph text before replacing it, and now it works fine. – Sherin May 04 '15 at 10:29
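
The fix from the last comment can be sketched as follows. This is a plain-Java illustration, not the asker's actual code: the class and method names (`ParagraphTextTrim`, `replaceableText`) are hypothetical, and the sample string assumes that `paragraph.text()` returns the trailing space from the input plus a paragraph terminator such as `\r`. That assumption would explain why replacing the full text also affected the following paragraph, but it is not confirmed in the thread.

```java
final class ParagraphTextTrim {

    /**
     * Returns the paragraph text without leading or trailing whitespace.
     * String.trim() also strips control characters such as '\r'.
     */
    static String replaceableText(String paragraphText) {
        return paragraphText.trim();
    }

    public static void main(String[] args) {
        // Assumed value of paragraph.text() for the first input paragraph.
        String raw = "Place Holder \r";
        String target = replaceableText(raw);
        System.out.println("[" + target + "]"); // prints [Place Holder]

        // In the question's loop, the replace call would then become:
        //   paragraph.replaceText(target, "*******");
    }
}
```

With the trimmed string as the search text, only the visible words are replaced and the end of the paragraph is left untouched.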