4

Possible Duplicate:
Parsing CSV input with a RegEx in java

I have an input file in which each line contains values in the following form:

   "  ab  cd  " ,    "  efgh,ijk.",  4,"lmno"

i.e.,

  1. Each value is either enclosed in quotes or has no quotes at all.
  2. Whitespace before the first word and after the last word of a value must not appear in the output.

EDIT: 3. The input can also consist of values separated only by commas, e.g. abc,"Hi Mary,Joe",5

Using String.split() in Java, I need a regular expression that produces this output:

ab  cd
efgh,ijk.
4
lmno

I tried this:

[^",]*[\",]

But this doesn't work on "efgh,ijk."

Here is a link for regex testing: http://regexpal.com/. I need some assistance with this. Please help. Thank you.

Community
Crocode
  • That's the [CSV format](http://tools.ietf.org/html/rfc4180). There are [plenty of](https://www.google.com/search?q=java+csv+parser) parsers for this format. You could even easily [write one](http://stackoverflow.com/a/2241950) yourself. Don't try to use regex for non-regular patterns, it'll only hurt you. – BalusC Nov 06 '12 at 21:17
  • How is `5, ebo"eu ooeu" euoe, oeuou` supposed to be handled? – durron597 Nov 06 '12 at 21:17
  • @durron597 - That violates OP's condition for what the input looks like. Values should either have no quotes or be _surrounded_ by quotes. – Ted Hopp Nov 06 '12 at 21:23
  • @TedHopp his program (if it's going to be in production) should handle illegal input. The answer to my question could be "throw an exception" – durron597 Nov 06 '12 at 21:24
  • @BalusC - looks like a regular expression to me. How not so? – Ed Staub Nov 06 '12 at 21:49
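
As a rough illustration of the "write one yourself" suggestion in the comments, here is a minimal sketch of a hand-written parser for a single line. The class name SimpleCsvLine and its parse method are made up for this example. It assumes quotes only ever wrap a whole value, and it does not handle escaped quotes ("") or multi-line records the way RFC 4180 describes. An unterminated quote is treated as illegal input and raises an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any answer in this thread.
final class SimpleCsvLine {

    // Splits one line on commas that are outside quotes and trims each value.
    static List<String> parse(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Quotes only delimit a value; they are never copied to the output.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // A comma outside quotes ends the current value.
                values.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quote in: " + line);
        }
        values.add(current.toString().trim());
        return values;
    }

    public static void main(String[] args) {
        String line = "   \"  ab  cd  \" ,    \"  efgh,ijk.\",  4,\"lmno\"";
        for (String value : parse(line)) {
            System.out.println(value);
        }
    }
}
```

For the sample line from the question this prints the four values ab  cd, efgh,ijk., 4 and lmno, one per line. A quote in the middle of an unquoted value is silently dropped rather than rejected, so stricter validation would need extra checks.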

4 Answers

2

DEMO

Regex pattern: (?:\s*(?:\"([^\"]*)\"|([^,]+))\s*,?)+?

Update for null values: (?:\s*(?:\"([^\"]*)\"|([^,]+))\s*,?|(?<=,)(),?)+? DEMO

Here is an example of it working. I know the input is basically CSV format, but as long as you don't feed it really weird input, the pattern will match all of the values.

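// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// Group 1 captures a quoted value (without the quotes); group 2 captures an unquoted value.
Matcher ma = Pattern.compile("(?:\\s*(?:\\\"([^\\\"]*)\\\"|([^,]+))\\s*,?)+?").matcher("   \"  ab  cd  \" ,    \"  efgh,ijk.\",  4,\"lmno\"");
while (ma.find()) {
    String value = (ma.group(1) == null) ? ma.group(2) : ma.group(1);
    // The pattern keeps whitespace inside the quotes, so trim it here.
    System.out.println(value.trim());
}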
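
A sketch of how the null-value variant of the pattern might be used. The class and method names (NullValueDemo, parse) are hypothetical, and the trim() call is an addition, since the pattern itself keeps the whitespace inside quotes. The third capture group is the empty one that matches between two adjacent commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the "null values" pattern from this answer.
final class NullValueDemo {

    private static final Pattern VALUE = Pattern.compile(
            "(?:\\s*(?:\"([^\"]*)\"|([^,]+))\\s*,?|(?<=,)(),?)+?");

    static List<String> parse(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(line);
        while (m.find()) {
            String value;
            if (m.group(1) != null) {
                value = m.group(1);   // quoted value
            } else if (m.group(2) != null) {
                value = m.group(2);   // unquoted value
            } else {
                value = "";           // empty value between two commas
            }
            values.add(value.trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [abc, , Hi Mary,Joe, 5]
        System.out.println(parse("abc,,\"Hi Mary,Joe\",5"));
    }
}
```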

Edit: by the way, if you wanted us to write the code for you, don't point us to an online regex tester; doing so suggests you already know how to work with regexes. If you don't know how to use the pattern in code, ask about that too.

Javier Diaz
2

I suggest finding the matches and then trimming them to get the final results.

Matcher m = Pattern.compile("\\s*(?:\"[^\"]*\"|(?:^|(?<=,))[^,]*)").matcher(s);
while (m.find()) {
  System.out.println(m.group().replaceAll("^\\s*\"?\\s*(.*?)\\s*\"?\\s*$", "$1"));
}

See this demo.

Ωmega
1

Try calling split() with (?:^\s*"\s*|\s*"\s*$|\s*"?\s*,\s*"?\s*) (demo).

This will also split on the comma inside a quoted string, which is wrong in your case. But it's the only way if you are going to use split(). You could introduce some way of escaping the contained comma (like \,), which could easily be added to the regex.

Otherwise you will have to use some other means of dissecting the String, and split() won't help you.
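
To make the limitation concrete, here is a small sketch that runs this split() pattern on the sample line from the question (the class name SplitDemo is made up for illustration). Besides splitting the quoted value at its inner comma, the result starts with an empty string, because the separator also matches the leading whitespace and opening quote at the very beginning of the input.

```java
import java.util.Arrays;

// Hypothetical demo class showing what split() returns with this pattern.
final class SplitDemo {

    static final String SEPARATOR =
            "(?:^\\s*\"\\s*|\\s*\"\\s*$|\\s*\"?\\s*,\\s*\"?\\s*)";

    static String[] split(String line) {
        return line.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String line = "   \"  ab  cd  \" ,    \"  efgh,ijk.\",  4,\"lmno\"";
        // Prints [, ab  cd, efgh, ijk., 4, lmno]
        System.out.println(Arrays.toString(split(line)));
    }
}
```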

barfuin
0

In case you don't want to use a regex: the name implies a 'regular' expression, and "I think there is a pattern here" does not a regular expression make. They're good, they're fast, but I only use them when I completely control the input being fed into the regex.

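// No development environment up, so there may be compilation errors.
// Requires: import org.apache.commons.lang.StringUtils;
// Note: this splits on every comma, including commas inside quoted values.
private static String[] csv(final String input){
  String[] inputArray = input.split(",");
  for(int i = 0; i < inputArray.length; i++){
    // Trim first so that a quote next to padding whitespace is actually at the start/end.
    String value = StringUtils.trim(inputArray[i]);
    value = StringUtils.removeEnd(value, "\"");
    value = StringUtils.removeStart(value, "\"");
    // Trim again to drop whitespace that was inside the quotes.
    value = StringUtils.trim(value);

    inputArray[i] = value;
  }
  return inputArray;
}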
DefyGravity