So I have a long list of words like this and based on the first space I want to split the words into word-meaning. Basically I am using Apache POI
for this as I have to read the docx file and then fetch the data from it.
abash humiliate, embarrass
abdicate relinquish power or position
aberrant abnormal
abet aid, encourage (typically of crime)
abeyance postponement
aboriginal indigenous
abridge shorten
abstemious moderate
...
So what regex would suit my purpose so that I can display it like:
word :abash
meaning : humiliate, embarrass
...
MY code is :
public class WordFileReader {
/**
* @param args
*/
public static void main(String[] args) {
try {
FileInputStream fis = new FileInputStream("E:\\important.docx");
org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
System.out.print(oleTextExtractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
}
}
--Edit-- Based on a suggested answer I am using this
public static void main(String[] args) {
try {
FileInputStream fis = new FileInputStream("E:\\Words.docx");
org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
//System.out.print(oleTextExtractor.getText());
Scanner sc = new Scanner(oleTextExtractor.getText());
while(sc.hasNextLine()) {
String line = sc.nextLine();
int i = line.indexOf(' ');
String word = line.substring(0, i);
String meaning = line.substring(i).trim();
System.out.println("word "+word);
System.out.println("meaning "+meaning);
}
} catch (Exception e) {
e.printStackTrace();
}
}
But i get
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(Unknown Source)
at WordFileReader.main(WordFileReader.java:25)