
I have a CSV file in this format:

22/09/2011 15:15:11 "AT45 - Km 2 +300   Foo " "PL - 0460" 70 096 123456_110922_151511_000001M.jpg 123456 "DBx 4U02" 428008 100 95 "AB123CD"
22/09/2011 15:15:16 "AT45 - Km 2 +300   Foo " "PL - 0460" 70 087 123456_110922_151516_000002M.jpg 123456 "DBx 4U02" 428008 100 95 "EF456GH"
22/09/2011 15:16:30 "AT45 - Km 2 +300   Foo " "PL - 0460" 70 079 123456_110922_151630_000005M.jpg 123456 "DBx 4U02" 428008 200 96 "LM789NP"

And I need a regex to split each value correctly, for example the first line would be:

22/09/2011
15:15:11
"AT45 - Km 2 +300   Foo "
"PL - 0460"
70 096 123456_110922_151511_000001M.jpg
123456
"DBx 4U02"
428008
100
95
"AB123CD"

I have found this regex: `([^,"]+|"([^"]|)*")`, but it doesn't quite do the job.

Can somebody give me a good hint?

lch
  • This shouldn't be done by regex, but by a CSV parser. – Pshemo Apr 08 '19 at 15:42
  • See https://stackoverflow.com/questions/18144431/regex-to-split-a-csv bearing in mind your data is space-delimited, not comma-delimited. – Alex K. Apr 08 '19 at 15:45
  • This stuff is easy to solve if you iterate character by character: within the quotes you add the spaces to the current element, and outside of quotes a space moves you on to the next element. The duplicate mark should redirect to a Java question rather than a Python one. – Tatarize Apr 08 '19 at 16:15
  • Did you try `("[^"]*"|[^\s]+)`? – dnep Apr 08 '19 at 16:34
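
For illustration, here is a minimal sketch of how the regex suggested in the last comment could be applied in Java. Instead of splitting, it collects every match with `Matcher.find()`. The class and method names (`QuotedSplit`, `split`) are made up for this example, and it assumes quoted values never contain an escaped double quote. Note that it returns `70`, `096` and the file name as three separate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from any library.
public class QuotedSplit {
    // A token is either a double-quoted run (quotes kept) or a run of non-whitespace characters.
    // Assumption: no escaped quotes inside quoted values.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+");

    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "22/09/2011 15:15:11 \"AT45 - Km 2 +300   Foo \" \"PL - 0460\" 70 096 "
                + "123456_110922_151511_000001M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"AB123CD\"";
        // Print one token per line.
        split(line).forEach(System.out::println);
    }
}
```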

1 Answer


This kind of task is better handled with a CSV parser. One of them is opencsv (http://opencsv.sourceforge.net/), which lets you specify your own separator (among many other things).

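import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import java.io.StringReader;

String csv =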
        "22/09/2011 15:15:11 \"AT45 - Km 2 +300   Foo \" \"PL - 0460\" 70 096 123456_110922_151511_000001M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"AB123CD\"\n" +
        "22/09/2011 15:15:16 \"AT45 - Km 2 +300   Foo \" \"PL - 0460\" 70 087 123456_110922_151516_000002M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"EF456GH\"\n" +
        "22/09/2011 15:16:30 \"AT45 - Km 2 +300   Foo \" \"PL - 0460\" 70 079 123456_110922_151630_000005M.jpg 123456 \"DBx 4U02\" 428008 200 96 \"LM789NP\"";

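// Use a space instead of the default comma as the field separator.
CSVParser parser = new CSVParserBuilder().withSeparator(' ').build();

CSVReader reader = new CSVReaderBuilder(new StringReader(csv))
        .withCSVParser(parser)
        .build();

// Each row is a String[] of fields; the parser strips the surrounding quotes.
for (String[] row : reader) {
    for (String str : row) {
        System.out.println(str);
    }
    System.out.println("----");
}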

Output (at least its beginning):

22/09/2011
15:15:11
AT45 - Km 2 +300   Foo 
PL - 0460
70
096
123456_110922_151511_000001M.jpg
123456
DBx 4U02
428008
100
95
AB123CD
----
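
If adding a library is not an option, the character-by-character approach mentioned in the comments could be sketched roughly as below. This is only an illustration using the standard library, not something opencsv provides. The names `QuoteAwareTokenizer` and `tokenize` are hypothetical, and the sketch assumes fields are separated by single spaces and that quoted values contain no escaped quotes. Unlike the opencsv output above, it keeps the surrounding quotes, as in the question's expected result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of opencsv.
public class QuoteAwareTokenizer {
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                // Toggle quoted state; keep the quote character in the field.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ' ' && !inQuotes) {
                // A space outside quotes ends the current field (empty fields are skipped).
                if (current.length() > 0) {
                    fields.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Any other character, including spaces inside quotes, belongs to the field.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            fields.add(current.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "22/09/2011 15:15:11 \"AT45 - Km 2 +300   Foo \" \"PL - 0460\" 70 096";
        tokenize(line).forEach(System.out::println);
    }
}
```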
Pshemo