I have a CSV file that contains the following lines.

INPUT:

No,NAme,ID,Description
1,Stack,232,"ABCDEFGHIJKLMNO
 -- Jiuaslkm asdasdasd"
2,Queue,454,"PQRSTUVWXYZ
 -- Other 
 words here"
3,Que,4343,"sdfwerrew"

OUTPUT EXPECTED:

No,NAme,ID,Description
1,Stack,232,"ABCDEFGHIJKLMNO \n -- Jiuaslkm asdasdasd"
2,Queue,454,"PQRSTUVWXYZ \n -- Other \n  words here"
3,Que,4343,"sdfwerrew"

or

No,NAme,ID,Description
1,Stack,232,"ABCDEFGHIJKLMNO -- Jiuaslkm asdasdasd"
2,Queue,454,"PQRSTUVWXYZ -- Other  words here"
3,Que,4343,"sdfwerrew"

Is there a Java regex pattern that can find these lines and merge them, based on the opening and closing double quotes?

Mister X
  • You could try with `"[\s\S]*?"` http://regexr.com/3fa0e – BackSlash Feb 15 '17 at 09:19
  • If all the input is in one character sequence and you enable multiline regex this should work. Try some regex and if you fail show us what you've tried. One side question though: you don't allow escaped double quotes in those strings, do you? – Thomas Feb 15 '17 at 09:20
  • @BackSlash `[\s\S]` means whitespace and non-whitespace which should thus be the same as `.` (anything). – Thomas Feb 15 '17 at 09:21
  • @BackSlash I can match with the regex, but I need to merge those lines into a single line – Mister X Feb 15 '17 at 09:22
  • Once you have the right regex, all you have to do is remove the newline characters (`\r\n` or `\n`) or replace them with their literal form by escaping the backslash (`\\n`) – jhamon Feb 15 '17 at 09:23
  • Are you maybe looking for a way to replace the line endings inside quotes with a `\"` string before using that CSV? – SerCrAsH Feb 15 '17 at 09:25
  • @jhamon replacing the newline characters `\r\n` did not work – Mister X Feb 15 '17 at 09:27
  • @SerCrAsH `\"` worked as the search value, but the replacement value doesn't match (`\r\n`) – Mister X Feb 15 '17 at 09:29
  • @Thomas No. `.` does not include newline characters by default. http://ideone.com/XcP5vL - You could also specify `Pattern.DOTALL` though. – BackSlash Feb 15 '17 at 09:29
  • One possible trick could be to replace linefeeds that aren't followed by a number : `str = str.replaceAll("\\r?\\n(?!\\d)", "")` – LukStorms Feb 15 '17 at 09:33
  • @BackSlash ah you're right, I'm so used to `DOTALL` that I completely forgot about that :) – Thomas Feb 15 '17 at 09:39
  • @Thomas, @BackSlash, @LukStorms I tried it at the link below. It doesn't replace correctly. http://regexr.com/3fa0k – Mister X Feb 15 '17 at 09:41
  • Your expression works just fine, i.e. it matches the multiline text. If you want to replace any line breaks in that match, just apply a new expression to the match. But I'm with GhostCat: you're probably better off using a regular CSV parser since those support escaped double quotes and other things as well. That's more flexible and less error prone than you messing with regex. – Thomas Feb 15 '17 at 10:34
  • Thanks for the accept! – GhostCat Feb 17 '17 at 04:53
  • @GhostCat Only CSV parsing worked for me, so I accepted it as the answer – Mister X Feb 17 '17 at 05:25
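
For reference, the two-step idea from the comments (match each quoted field, then replace the line breaks inside that match) could be sketched roughly as below. This is only a sketch under assumptions: the class name `QuotedNewlines` and the method `flatten` are made up for illustration, the whole file is assumed to be in one string, and the quotes are assumed to be balanced. It uses the pattern `"[^"]*"` rather than the `"[\s\S]*?"` suggested above; both are meant to match one quoted field across lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the original thread.
final class QuotedNewlines {
    // One quoted field: opening quote, any non-quote characters
    // (line breaks included), closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");
    private static final Pattern LINE_BREAK = Pattern.compile("\\r?\\n");

    /** Replaces every line break inside a quoted field with the given text. */
    static String flatten(String csv, String replacement) {
        Matcher m = QUOTED.matcher(csv);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the quoted field unchanged.
            out.append(csv, last, m.start());
            // Second expression, applied to the matched field only.
            // quoteReplacement keeps a backslash in the replacement literal.
            out.append(LINE_BREAK.matcher(m.group())
                    .replaceAll(Matcher.quoteReplacement(replacement)));
            last = m.end();
        }
        out.append(csv, last, csv.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "No,NAme,ID,Description\n"
                + "1,Stack,232,\"ABCDEFGHIJKLMNO\n -- Jiuaslkm asdasdasd\"\n"
                + "3,Que,4343,\"sdfwerrew\"\n";
        System.out.print(flatten(csv, " \\n")); // literal backslash-n variant
        System.out.print(flatten(csv, ""));     // plain merge variant
    }
}
```

Passing `" \\n"` as the replacement corresponds to the first expected output in the question, and passing `""` to the second. Line breaks between records are left alone because they fall outside the quoted matches.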

1 Answer

You are going down the wrong path. Not everything should be solved using regular expressions. CSV parsing is one of those things.

Seriously: you are about to re-invent the wheel. And the wheel you are about to create will be deficient, and prone to break over and over again.

The sane approach: there are many existing CSV parsers for Java out there. They deal perfectly with multi-line values. So: use one of them (see here as a starting point for the many choices you have).
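
To make concrete what "dealing with multi-line values" involves, here is a toy sketch of the state a parser has to track: whether it is currently inside quotes, and whether a quote is doubled. It is plain JDK code with a made-up class name (`TinyCsv`), written only for illustration. It is not one of the existing parsers, it skips most of what they handle (other delimiters, encodings, malformed input, streaming), and it is no substitute for using one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser for illustration only; use a maintained library in real code.
final class TinyCsv {
    /** Splits CSV text into records of fields, keeping line breaks that sit inside quotes. */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote means one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);   // commas and line breaks are data here
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                // An unquoted line break ends the record.
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Last record when the text does not end with a line break.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "No,NAme,ID,Description\n"
                + "1,Stack,232,\"ABCDEFGHIJKLMNO\n -- Jiuaslkm asdasdasd\"\n";
        // The multi-line description arrives as a single field value.
        String description = parse(csv).get(1).get(3);
        System.out.println(description.replace("\n", " \\n"));
    }
}
```

Once the value is a single field, merging it into one line is an ordinary string replacement on that field instead of a pattern over the whole file.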

There is a nice rule of thumb: when your regex becomes so complicated that you can't write it down yourself, consider doing things differently. You are the person who owns this code; you will have to maintain and maybe enhance it, not the folks here who are able to write down a regex that solves this one flavor of CSV example input.

GhostCat