I have a huge XML document, about 200 MB in size, containing textual information. The data was originally stored in a two-column PageMaker file. After tagging, I found that some of the text contains hyphens, because words that did not fit within a column were split into two parts separated by a hyphen. The XML document also uses hyphens for another purpose: to separate short sentences (in notes).
I want to find the hyphens that occur inside words. I have noticed that the hyphens I want to find and remove follow a standard pattern. For example:
The first use of the hyphen (which I want to find and replace):
question appears as ques-tion
answer appears as ans-wer
The other use of the hyphen (not to be matched):
Pattern matching - Regex Expressions - ...
So the standard patterns for the two cases are:
space-space
letter-letter
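
To make the two patterns concrete, here is a minimal sketch of them as a regular expression in Python. It assumes ASCII letters only and that it is applied to text content, not to tag or attribute names; the function names are hypothetical, chosen just for illustration. Genuine compounds such as well-known would also match the letter-letter pattern, so the hits probably need a review before anything is removed.

```python
import re

# A hyphen with a letter immediately before and after it (letter-letter).
# A hyphen surrounded by spaces (space-space) does not match.
INTRAWORD_HYPHEN = re.compile(r"(?<=[A-Za-z])-(?=[A-Za-z])")

# A whole word containing at least one letter-hyphen-letter sequence.
HYPHENATED_WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)+")


def find_broken_words(text):
    """Return the words in `text` that contain a hyphen between letters."""
    return HYPHENATED_WORD.findall(text)


def join_broken_words(text):
    """Remove hyphens that sit directly between two letters."""
    return INTRAWORD_HYPHEN.sub("", text)
```

For example, join_broken_words("ques-tion") gives "question", while a note separator such as "Pattern matching - Regex Expressions" is left unchanged because its hyphens have spaces on both sides.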
How can I use XQuery to find all occurrences of the second pattern (letter-letter)? Or is there any other way to find them? Finding and replacing them by hand in such a huge XML file is not practical.