I have messages in a file like the ones below, and I am using com.univocity.parsers.csv.CsvParser to split each string on a delimiter (in this case, -):

1-bc-"name"-def-address

1-abc-"name-def-address

I create my CsvParser object like this:

import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

private val settings = new CsvParserSettings()
settings.getFormat.setDelimiter('-')
settings.setIgnoreLeadingWhitespaces(true)
settings.setIgnoreTrailingWhitespaces(true)
settings.setReadInputOnSeparateThread(false)
settings.setNullValue("")
settings.setMaxCharsPerColumn(-1)
val parser = new CsvParser(settings)

and parse the input messages like this:

for (line <- Source.fromFile("path\\test.txt").getLines) {
  println(parser.parseLine(line).toList)
}

and the output is:

List(1, bc, name, def, address)
List(1, abc, name-def-address)

As the output shows, the first message is split correctly, but for the second message everything after the first double quote is taken as a single value. Does anyone know why the parser behaves this way, and how I can get the desired output? I am reading every message as a string, so it should simply treat a single or double quote as an ordinary character.

Explorer

1 Answer

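Author of this library here. When a quote is found right after your - delimiter, the parser treats it as the start of a quoted value and tries to find the closing quote. Your second message has no closing quote, so the rest of the line ends up in that one value.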
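To make that behaviour concrete, here is a minimal sketch in plain Scala of a quote-aware splitter. It is a simplified model written for illustration, not univocity's actual implementation; `QuoteModel` and `split` are hypothetical names.

```scala
// Simplified model of quote handling; NOT univocity's real algorithm.
object QuoteModel {
  /** Splits `line` on `delimiter`. When `quote` is defined, a field that starts
    * with the quote character runs until the next quote character, or to the
    * end of the line if the quote is never closed. */
  def split(line: String, delimiter: Char, quote: Option[Char]): List[String] = {
    val fields = scala.collection.mutable.ListBuffer.empty[String]
    var i = 0
    while (i <= line.length) {
      if (i < line.length && quote.contains(line.charAt(i))) {
        // Quoted field: look for the closing quote, ignoring delimiters on the way
        val close = line.indexOf(quote.get, i + 1)
        val end = if (close < 0) line.length else close
        fields += line.substring(i + 1, end)
        // Resume after the delimiter that follows the closing quote, if any
        val nextDelim = line.indexOf(delimiter, end)
        i = if (nextDelim < 0) line.length + 1 else nextDelim + 1
      } else {
        // Unquoted field: runs up to the next delimiter or the end of the line
        val nextDelim = line.indexOf(delimiter, i)
        val end = if (nextDelim < 0) line.length else nextDelim
        fields += line.substring(i, end)
        i = end + 1
      }
    }
    fields.toList
  }
}
```

With `Some('"')` as the quote, the second message yields `List(1, abc, name-def-address)`, matching the output in the question. With `None`, the quote stays in the value and the line is split on every `-`.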

The easiest way around this is to make the parser simply ignore quotes with:

settings.getFormat().setQuote('\0');

Hope it helps.
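For reference, a sketch of how that setting could slot into the Scala setup from the question. The `DashSplitter` object and its `split` method are hypothetical names used only for this example, and the NUL character is written as `'\u0000'` on the assumption that some Scala versions deprecate or reject octal escapes such as `'\0'`.

```scala
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical wrapper around the parser configuration from the question.
object DashSplitter {
  private val settings = new CsvParserSettings()
  settings.getFormat.setDelimiter('-')
  // NUL as the quote character: '"' is then treated as ordinary text
  settings.getFormat.setQuote('\u0000')
  settings.setIgnoreLeadingWhitespaces(true)
  settings.setIgnoreTrailingWhitespaces(true)
  settings.setReadInputOnSeparateThread(false)
  settings.setNullValue("")
  settings.setMaxCharsPerColumn(-1)

  private val parser = new CsvParser(settings)

  /** Splits one message; guards against a null result (assumed possible for blank lines). */
  def split(line: String): List[String] =
    Option(parser.parseLine(line)).map(_.toList).getOrElse(Nil)
}
```

Calling `DashSplitter.split` on the second message should now split on every `-`, with the double quote kept as part of the third value.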

Jeronimo Backes
  • Thanks for creating a wonderful utility :) Can you please explain what `\0` stands for? – Explorer Mar 01 '19 at 04:31
  • Glad to help. `\0` is the null character. – Jeronimo Backes Mar 01 '19 at 04:34
  • I see one more issue here. I have a record like `10 0`, i.e. the values are separated by the group separator: `1^]0^]^@^@^@^@^@^@^]0^]`. The record also contains the char "^@", i.e. ASCII "000", and when I add the setting `settings.getFormat().setQuote('\0');` and try to split the string, the parser ignores the delimiter after "^@". If I remove the setting, it splits the string properly. Can you please help me here? – Explorer Mar 13 '19 at 16:12
  • Use a non-character then: '\uFFFF' instead of '\0' – Jeronimo Backes Mar 13 '19 at 16:15
  • So I need to have `settings.getFormat().setQuote('\uFFFF');`? I need to ignore both the quote and the non-character, and I tried adding both settings, but I am still facing the same issue. – Explorer Mar 13 '19 at 16:22
  • Never mind. Thanks for your help, `'\uFFFF'` solves both. – Explorer Mar 13 '19 at 16:29
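A small sketch of the variant the comments settle on, for data that may itself contain NUL characters. The `QuoteFreeSettings` object and `forDelimiter` helper are hypothetical names, and `'\u001D'` is assumed to be the group separator shown as `^]` in the comment above.

```scala
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical helper: settings that never treat any input character as a quote.
object QuoteFreeSettings {
  def forDelimiter(delimiter: Char): CsvParserSettings = {
    val settings = new CsvParserSettings()
    settings.getFormat.setDelimiter(delimiter)
    // U+FFFF is a Unicode non-character, so it should not occur in real data,
    // unlike NUL ('\u0000'), which the record in the comments does contain
    settings.getFormat.setQuote('\uFFFF')
    settings
  }
}

object GroupSeparatorExample {
  // Assumption: the records are delimited by the ASCII group separator (0x1D)
  val settings: CsvParserSettings = QuoteFreeSettings.forDelimiter('\u001D')
  val parser = new CsvParser(settings)
}
```

The other options from the question (`setNullValue`, `setMaxCharsPerColumn`, and so on) can be applied to the returned settings before the parser is created.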