-3

I am working in Java. I have a list of parameters stored in a string that comes from Excel. I want to split it only at the hyphen that starts each new line. One such string is stored in each Excel cell, and I am extracting it with Apache POI. The format is as below:

String text =
        "- I am string one\n" +
        "-I am string two\n" +
        "- I am string-three\n" +
        "with new line\n" +
        "-I am string-four\n" +
        "- I am string five";

What I want

An array or ArrayList that looks like this:

[I am string one, 
I am string two,
I am string-three with new line,
I am string-four,
I am string five]

What I Tried

I tried to use split function like this:

String[] newline_split = text.split("-");

but the output I get is not what I want.

My output

[,  I am string one,
 I am string two,
 I am string, // wrong  
 three // wrong
   with new line, // wrong
 I am string, // wrong! 
 four,        // wrong! 
 I am string five]  

I might have to tweak the split call a bit, but I can't work out how, because the string contains so many hyphens and newlines.

P.S.

If I try splitting only at newlines, then the item - I am string-three \n with new line breaks into two parts, which again is not correct.

EDIT:

Please note that the data inside the string is badly formatted, exactly as shown above. It comes from an Excel file that I received. I am using Apache POI to extract the content of each Excel cell as a string.

I intentionally kept the format the client gave me. For those who are confused about the description inside A: I changed it because I cannot post the real contents here, as that would violate my workplace's privacy rules.

Nimantha
Radheya
  • I can't read your description of `A`. Please post real code. – Elliott Frisch Feb 23 '18 at 16:35
  • So basically (1) you want to remove line separators if they don't have `-` after them, (2) and then split on remaining line separators. – Pshemo Feb 23 '18 at 16:41
  • @Pshemo yes that's right – Radheya Feb 23 '18 at 16:44
  • @ElliottFrisch Some of the data in files I am working with is meant for privacy so thats why I didnt pasted the real data here. but the format which I took is just as shown here and it comes from an excel cell – Radheya Feb 23 '18 at 16:51
  • @DhruvJ That's fine, but I can't follow the description. Give me an actual `String A` constant. What is all this `// no space between '-' and 'I'`? – Elliott Frisch Feb 23 '18 at 16:56
  • @ElliottFrisch those are comments which I typed so that it becomes easy to understand. just a tip to a reader so that he/she doesn't think that the format is not correct of string – Radheya Feb 23 '18 at 17:00
  • @DhruvJ Is [this](https://stackoverflow.com/a/48952681/2970947) reading of your description correct? – Elliott Frisch Feb 23 '18 at 17:01
  • @Pshemo yes i understand. I will also reformat my question now. I guess i forgot to add some syntax inside. – Radheya Feb 23 '18 at 17:17
  • All the downvotes here really wind me up, it's an honest question, maybe could be improved with better data, but why downvote rather than suggest? So much arrogance here at times – Steven Feb 23 '18 at 17:41

3 Answers

1

This is how I would do it:

import java.util.ArrayList;
import java.util.List;

public class MyClass {
    public static void main(String[] args) {
        String A = "- I am string one \n" +
        "    -I am string two\n" +
        "    - I am string-three \n" +
        "    with new line\n" +
        "    -I am string-four\n" +
        "- I am string five";

        // split into physical lines (handles both \n and \r\n)
        String[] s2 = A.split("\r?\n");
        List<String> lines = new ArrayList<>();
        String line = "";
        for (int i = 0; i < s2.length; i++) {
            String ss = s2[i].trim();
            if (i == 0) { // first line MUST start with "-"
                line = ss.substring(1).trim();
            } else if (ss.startsWith("-")) {
                // a new item starts: store the finished one, then start over
                lines.add(line);
                line = ss.substring(1).trim();
            } else {
                // continuation line: append it to the current item
                line = line + " " + ss;
            }
        }
        lines.add(line); // store the last item

        System.out.println(lines);
    }
}

I hope it helps.

A little explanation:

I process the input line by line, trimming each one. If a line starts with '-', the previous item is finished, so I add that item to the list. If not, I append the line to the current item.

Rafael Paulino
  • Thank you for this, I understood your logic well. However I must accept answer of Pshemo because it fits really well to my code. Also one advantage of using his technique is it takes too less space and I got to learn some more regular expressions – Radheya Feb 23 '18 at 17:15
1

You can

  1. replace each line separator with a space if it is not followed by - (at the start of the next line): .replaceAll("\\R(?!-)", " ") should do the trick
    • \R (written as "\\R" in a string literal) has been available since Java 8 and matches any line separator
    • (?!...) is a negative lookahead: it ensures that there is no - after the position where it is used. A lookahead is zero-width, so the character it inspects is never part of the match and is not removed.
  2. then remove the - at the start of each line (let's also include any whitespace that follows it, to trim the start of each item). In other words, remove each - placed

    • after a line separator: can be represented by "\\R"
    • after the start of the string: can be represented by ^

    This should do the trick: .replaceAll("(?<=\\R|^)-\\s*","")

  3. split on the remaining line separators: .split("\\R")

Demo:

String text =
        "- I am string one\n" +
        "-I am string two\n" +
        "- I am string-three\n" +
        "with new line\n" +
        "-I am string-four\n" +
        "- I am string five";

String[] split = text.replaceAll("\\R(?!-)", " ") 
                     .replaceAll("(?<=\\R|^)-\\s*","") 
                     .split("\\R");
for (String s: split){
    System.out.println("'"+s+"'");
}

Output (surrounded with ' to show start and end of results):

'I am string one'
'I am string two'
'I am string-three with new line'
'I am string-four'
'I am string five'
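
Since the question asks for an array or ArrayList, the same three steps could be wrapped in a small helper that returns a list. This is only a sketch: the class name HyphenItemList and the method name toItems are made up for illustration, and the extra trim() on each item is an addition that is not part of the chain shown above.

```java
import java.util.ArrayList;
import java.util.List;

class HyphenItemList {
    // Hypothetical helper: applies the three steps from the answer and
    // collects the results into a mutable list.
    static List<String> toItems(String text) {
        String[] parts = text
                .replaceAll("\\R(?!-)", " ")        // 1. join continuation lines
                .replaceAll("(?<=\\R|^)-\\s*", "")  // 2. drop the leading hyphen and spaces
                .split("\\R");                      // 3. one item per remaining line
        List<String> items = new ArrayList<>();
        for (String part : parts) {
            items.add(part.trim()); // assumption: stray edge whitespace is unwanted
        }
        return items;
    }

    public static void main(String[] args) {
        String text =
                "- I am string one\n" +
                "-I am string two\n" +
                "- I am string-three\n" +
                "with new line\n" +
                "-I am string-four\n" +
                "- I am string five";
        System.out.println(toItems(text));
    }
}
```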
Pshemo
  • thank you very much. finally the problem is solved. It was rough ride asking this question. Probably I must reformat my string in question. Your solution is working well for me. Sorry for all the confusion – Radheya Feb 23 '18 at 17:16
0

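It looks as if you are splitting at the FIRST - of each line, so you need to replace every instance of "newline -" with a plain newline (both arguments must be strings, and the result must be assigned back):

 str = str.replace("\n-", "\n");

Then remove the initial "-":

str = str.substring(1);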
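
A variation on this idea, as a rough sketch rather than a tested solution: drop the initial "-", split directly on "line separator followed by -", and then fold any line breaks left inside an item into spaces. The names LeadingHyphenSplit and splitItems are invented for this example, and it assumes each hyphen sits immediately after the line break, as in the question's sample (no indentation before it).

```java
import java.util.ArrayList;
import java.util.List;

class LeadingHyphenSplit {
    // Hypothetical helper: splits only at hyphens that start a line.
    static List<String> splitItems(String text) {
        List<String> items = new ArrayList<>();
        String body = text.trim();
        if (body.startsWith("-")) {
            body = body.substring(1); // remove the initial "-"
        }
        // Split where a line separator is directly followed by "-";
        // hyphens inside a line (e.g. "string-three") are left alone.
        for (String part : body.split("\\R-")) {
            // Any line break still inside the item is a continuation line.
            items.add(part.replaceAll("\\R", " ").trim());
        }
        return items;
    }

    public static void main(String[] args) {
        String text =
                "- I am string one\n" +
                "-I am string two\n" +
                "- I am string-three\n" +
                "with new line\n" +
                "-I am string-four\n" +
                "- I am string five";
        for (String item : splitItems(text)) {
            System.out.println("'" + item + "'");
        }
    }
}
```

If the cell text can contain indented hyphens, the split pattern would need to allow whitespace between the line break and the "-".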
Swisstone