3

I have the following 2 rows in a file:

16.1 14.3 8.8 7.0 7.85 13.29 18.75 13.08 13.10

6.7 5.4 6.39

I am able to split the 1st row using the "\\s+" regex, but I cannot split the 2nd row that way because some of its columns are empty. I want to split the above strings so that I get the following output:

row[1] = [16.1, 14.3, 8.8, 7.0, 7.85, 13.29, 18.75, 13.08, 13.10]
row[2] = [6.7, 5.4, null, null, 6.39, null, null, null, null]

Below is the screenshot of what I have to parse :

[screenshot of the input file]

Youcef LAIDANI
Pranit More
  • 4
    Not sure that edit just now was a good idea because it definitely changed the formatting of the input. It now has a defined number of whitespaces as before it did not have. Could the original poster please confirm whether the number of whitespaces between your entries follows some definition? – Ben Mar 15 '18 at 12:30
  • 1
    Maybe to further elaborate, how would the file look if we replace the `18.75` in the first row with a `23132.3312`? Is that even possible? To say it simple: When there is no definition how your file is going to look like there isn't a way to parse it. – Ben Mar 15 '18 at 12:33
  • 2
    `"\\\s+"` is not a valid RegEx. It should be `"\\s+"` – Saif Ahmad Mar 15 '18 at 12:33
  • This seems to work: https://regex101.com/r/OFWVUP/1 – jrtapsell Mar 15 '18 at 12:34
  • What is the maximum length of single number? If it can grow to more than distance between two numbers, you won't be able to use regular expressions for the split. – vasek Mar 15 '18 at 12:35
  • @Ben, that edit was not a good idea because my original post did not have a specific number of spaces. I will re-edit it. – Pranit More Mar 15 '18 at 12:35
  • Okay. Maybe also add what definitions there are. Is the number of columns fixed? The spacing? The size of the numbers? Encoding? Stuff like this would be helpful to find some way to parse this. – Ben Mar 15 '18 at 12:37
  • That screenshot doesn't contain "empty" values. Anyway it looks like we can assume that columns contain some fixed amount of characters (including spaces). But is it same amount of characters for each column? – Pshemo Mar 15 '18 at 12:40
  • @Pshemo, I have replaced with the correct image. – Pranit More Mar 15 '18 at 12:41
  • is the data `TAB` spaced? – Saif Ahmad Mar 15 '18 at 12:58
  • @saifahmad, No, the data is not `TAB` spaced. Please look at @YCF_L's solution. It is the correct solution. – Pranit More Mar 15 '18 at 13:02
  • But his solution depends on fixed 7 spaces. – Saif Ahmad Mar 15 '18 at 13:05
  • @saifahmad, At first I didn't recognize that all the columns have specific length of characters, i.e. 7. That's why YCF_L's solution is correct. – Pranit More Mar 15 '18 at 14:25

4 Answers

2

It seems that your input has fixed-width columns: each column takes 7 characters (the number plus its padding):

16.1   14.3    8.8    7.0    7.85  13.29  18.75  13.08   13.10
^^^^^^^--------(7)

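In this case you can split your input with the regex (?<=\\G.{7}), which cuts the string after every 7 characters (\\G anchors each match at the end of the previous one). Take a look at this: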

String text1 = "16.1   14.3    8.8    7.0    7.85  13.29  18.75  13.08   13.10";
String text2 = "6.7    5.4                   6.39                             ";

String[] split1 = text1.split("(?<=\\G.{7})");
String[] split2 = text2.split("(?<=\\G.{7})");

Outputs

[16.1   , 14.3   ,  8.8   ,  7.0   ,  7.85  , 13.29  , 18.75  , 13.08  ,  13.10]
[6.7    , 5.4    ,        ,        ,  6.39  ,        ,        ,        ,       ]

Better Solution

If you want to get null instead of blank cells, you can use:

// needs java.util.Arrays, java.util.List and java.util.stream.Collectors
List<String> result = Arrays.stream(text2.split("(?<=\\G.{7})"))
        .map(input -> input.matches("\\s*") ? null : input.trim())
        .collect(Collectors.toList());

Outputs

[16.1, 14.3, 8.8, 7.0, 7.85, 13.29, 18.75, 13.08, 13.10]
[6.7, 5.4, null, null, 6.39, null, null, null, null]
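
The split above relies on the short row being padded with trailing spaces. If the file does not pad its lines, the same idea can be written without a regex by cutting the line at fixed offsets. The following is only a sketch under stated assumptions: the class `FixedWidthRow`, the method `parse`, and the column count of 9 are hypothetical names and values chosen for illustration, and the width of 7 is taken from the example rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class FixedWidthRow {

    /**
     * Cuts a line into `columns` cells of `width` characters each.
     * Blank or missing cells become null; all other cells are trimmed.
     */
    static List<String> parse(String line, int width, int columns) {
        List<String> cells = new ArrayList<>(columns);
        for (int i = 0; i < columns; i++) {
            int start = i * width;
            if (start >= line.length()) {
                cells.add(null); // the line ended before this column
                continue;
            }
            // The last cell of a line may be shorter than `width`.
            int end = Math.min(start + width, line.length());
            String cell = line.substring(start, end).trim();
            cells.add(cell.isEmpty() ? null : cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        String row1 = "16.1   14.3    8.8    7.0    7.85  13.29  18.75  13.08   13.10";
        // Built with format so the 7-character columns are explicit;
        // note that there is no trailing padding after the last value.
        String row2 = String.format("%-7s%-7s%-7s%-7s%s", "6.7", "5.4", "", "", " 6.39");
        System.out.println(parse(row1, 7, 9));
        System.out.println(parse(row2, 7, 9));
    }
}
```

Passing the expected column count explicitly means every row comes back with the same number of entries, whether or not the line was padded.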
Youcef LAIDANI
0

You can use streams and split rows then cells, resulting in a list of lists:

List<List<String>> matrix = Arrays.stream(text.split("\n"))
            .map(line -> Arrays.asList(line.split("\\s+")))
            .collect(Collectors.toList());

This gives you a 2D array/list of the values.

When tested with:

String text = "16.1   14.3    8.8    7.0    7.85  13.29  18.75  13.08   13.10\n" + " 6.7    5.4                   6.39";

That outputs:

[[16.1, 14.3, 8.8, 7.0, 7.85, 13.29, 18.75, 13.08, 13.10], [, 6.7, 5.4, 6.39]]
ernest_k
0

Use Guava's Splitter.fixedLength(int):

String[] rows = {
    "16.1   14.3    8.8    7.0    7.85  13.29  18.75  13.08   13.10",
    "6.7    5.4                   6.39                             "
  };
Splitter splitter = Splitter.fixedLength(7);
for(String row: rows) {
  List<String> data = splitter.splitToList(row);
  for (int i = 0; i < data.size(); i++) {
    System.out.printf("Column %d: %s%n", i+1, data.get(i));
  }
}
Olivier Grégoire
-1

To me this seems like a fixed-width file.

Please try the following regex:

.{7}

You can change the value within the curly braces depending on the width of the column:

.{column_width_goes_here}

Sample https://regex101.com/r/SZZxbB/1
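
In Java, a pattern like this can be applied with `Matcher.find()`, which collects one match per column instead of splitting. The sketch below is an illustration, not the method from the linked sample: the class `FixedWidthMatcher` and the method `chunks` are hypothetical names, and the quantifier is written as `{1,width}` rather than `{width}` on the assumption that the last column of a line may be shorter than the others (as with the final " 13.10" in the first row, which a strict `.{7}` would skip).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class FixedWidthMatcher {

    /** Returns the trimmed cells of a single line; blank cells become null. */
    static List<String> chunks(String line, int width) {
        // Greedy: takes `width` characters when available, fewer at the end of the line.
        Pattern column = Pattern.compile(".{1," + width + "}");
        Matcher m = column.matcher(line);
        List<String> cells = new ArrayList<>();
        while (m.find()) { // each find() resumes where the previous match ended
            String cell = m.group().trim();
            cells.add(cell.isEmpty() ? null : cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        String row = String.format("%-7s%-7s%-7s%s", "6.7", "5.4", "", " 6.39");
        System.out.println(chunks(row, 7));
    }
}
```

Because `.` does not match line terminators by default, this is meant to be called once per line, not on the whole file content.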

Vaibhav Patil