2

Here I am trying to find strings within double quotes.

  List<String> getList(String value) {
    String regex = "\"[^\"]*\"|[^,]+";
    List<String> allMatches = new ArrayList<String>();
    if (StringUtils.isNotBlank(value)) {
        Matcher m = Pattern.compile(regex).matcher(value);
        while (m.find() && StringUtils.isNotBlank(m.group())) {
            String str=m.group().replaceAll("^\"|\"$", "");
            allMatches.add(str.trim());
        }
    }
    return allMatches;
  }

  result = getList("400,test,\"don't split, this\",15");
  // result contains [400, test, don't split, this, 15]: the input is split on every comma except those inside quotes.

It works well for straight quotes "" but not for curly quotes “”. “foo,bar” is different from "foo,bar", and the regex does not handle the curly-quoted form.

Emma

3 Answers

0

If the different quotes should match but should not be mixed, you could use an alternation to match either of the formats. If you don't want to match newlines, that can be added to the negated character class.

(?:“[^“”]+”|"[^"]+"|(?<=,|^)[^“”,"]+(?=(?:,|$)))

Explanation

  • (?: Non capturing group
    • “[^“”]+” Match “, then one or more chars other than “ or ”, then match ”
    • | Or
    • "[^"]+" Match ", then one or more chars other than ", then match " again
    • | Or
    • (?<=,|^) Assert that what is on the left is a comma or the start of the string
    • [^“”,"]+ Match any char that is not in the character class
    • (?=(?:,|$)) Assert what on the right is a comma or end of the string
  • ) Close non capturing group

Regex demo | Java demo

The whole pattern is an alternation which has 3 options. The first 2 options match from an opening till a closing quote.

The third option matches all except any of the type of quotes or comma, but makes sure that at the start and the end of the match there is either a comma or the start or end of the string.
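As a minimal sketch of how this pattern could be plugged into the `getList` method from the question: the class name `QuoteAwareSplit` is hypothetical, and stripping the enclosing quotes after matching is an assumption carried over from the question's code, not part of the pattern itself. The curly quotes are written as Unicode escapes (U+201C and U+201D) to avoid source-encoding problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // The pattern from this answer; U+201C and U+201D are the curly quotes.
    private static final Pattern FIELD = Pattern.compile(
        "(?:\u201C[^\u201C\u201D]+\u201D"                  // curly-quoted field
        + "|\"[^\"]+\""                                    // straight-quoted field
        + "|(?<=,|^)[^\u201C\u201D,\"]+(?=(?:,|$)))");     // unquoted field between delimiters

    static List<String> getList(String value) {
        List<String> fields = new ArrayList<>();
        if (value == null) {
            return fields;
        }
        Matcher m = FIELD.matcher(value);
        while (m.find()) {
            // Strip one enclosing quote of either style, then trim (as in the question's code).
            String field = m.group().replaceAll("^[\"\u201C]|[\"\u201D]$", "");
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(getList("400,test,\"don't split, this\",15"));
        System.out.println(getList("400,test,\u201Cdon't split, this\u201D,15"));
    }
}
```

Because every alternative uses `+`, empty fields (for example `""` or two consecutive commas) produce no match and are skipped rather than returned as empty strings.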

The fourth bird
  • Yes, the pattern captures the `""` and `“”` quotes correctly! – ecle Jun 01 '19 at 09:18
  • @The fourth bird now it misses those strings which are separated by commas – Vivek Sahni Jun 01 '19 at 12:53
  • Do you have an example? – The fourth bird Jun 01 '19 at 12:54
  • @Thefourthbird https://stackoverflow.com/questions/18893390/splitting-on-comma-outside-quotes have a look at this; it is exactly what I want, but in my case the `RegEx` is unable to do so in some cases – Vivek Sahni Jun 01 '19 at 13:06
  • Did you try the updated pattern `"[^",]+"|“[^“”,]+”` In that case I think you could add those cases for which it does not work to your question. – The fourth bird Jun 01 '19 at 13:10
  • @Thefourthbird it is also not working see the link in my edited question. – Vivek Sahni Jun 01 '19 at 13:16
  • @Thefourthbird its working but very hard to understand – Vivek Sahni Jun 01 '19 at 17:03
  • I have added an explanation in the answer. Which part is not yet clear? Maybe I can explain it. The whole pattern is an alternation which has 3 options. The first 2 options match from an opening till a closing quote. The third option matches all except any of the type of quotes or comma, but makes sure that at the start and the end there is either a comma or the start or end of the string. – The fourth bird Jun 01 '19 at 17:04
0

You can do a CSV style hybrid using your Java code, but the regex has to be changed.

Java

import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{

    public static List<String> getList(String value)
    {
        String regex = "(?:(?:^|,|\\r?\\n)\\s*)(?:(?:(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|“[^“”\\\\]*(?:\\\\[\\S\\s][^“”\\\\]*)*”))(?:\\s*(?:(?=,|\\r?\\n)|$))|([^,]*)(?:\\s*(?:(?=,)|$)))"; 
        List<String> allMatches = new ArrayList<String>();
        if ( value.length() > 0  )
        {
            Matcher m = Pattern.compile( regex ).matcher( value );
            while ( m.find() ) {
                String str = m.group(2);
                if ( str == null ) {
                    str = m.group(1);
                    str = str.replaceAll( "^[\"“”]|[\"“”]$", "" );
                }
                allMatches.add(str.trim());
            }
        }
        return allMatches;
    }


    public static  void main (String[] args) throws java.lang.Exception
    {
        List<String>  result = getList("400,test,\"QT_don't split, this_QT\",15");
        System.out.println( result );

        result = getList("500,test,“LQT_don't split, this_RQT”,15");
        System.out.println( result );

        result = getList("600,test,\"QT_don't split, this_QT\",15");
        System.out.println( result );

    }
}

https://ideone.com/b8Wnz9

Output

[400, test, QT_don't split, this_QT, 15]
[500, test, LQT_don't split, this_RQT, 15]
[600, test, QT_don't split, this_QT, 15]

Regex Expanded

 (?:
      (?: ^ | , | \r? \n )          # Delimiter comma or newline
      \s*                           # leading optional whitespaces
 )
 (?:                           # Double Quoted field
      (?:
           "                             # Quoted string field ""
           (                             # (1), double quoted string data
                [^"\\]* 
                (?: \\ [\S\s] [^"\\]* )*
           )
           "

        |                              # or

           “                             # Quoted string field Left/right double quotes “”   
           (                             # (2), double quoted string data
                [^“”\\]* 
                (?: \\ [\S\s] [^“”\\]* )*
           )
           ”
      )
      (?:
           \s*                           # trailing optional whitespaces
           (?:
                (?= , | \r? \n )              # Delimiter ahead, comma or newline
             |  $ 
           )
      )
   |                              # OR
      ( [^,]* )                     # (3), Non quoted field
      (?:
           \s*                           # trailing optional whitespaces 
           (?:
                (?= , )                       # Delimiter ahead, comma
             |  $ 
           )
      )
 )
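The expanded form can also be compiled as written by passing `Pattern.COMMENTS`, which makes Java ignore whitespace and `#` comments in the pattern. The following is a sketch under that assumption. It follows the three-group numbering of the expanded form (which differs from the two-group one-line string in the Java code above), and the class name `ExpandedRegexSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpandedRegexSketch {
    // Free-spacing pattern; U+201C and U+201D are the curly quotes.
    private static final String REGEX = String.join("\n",
        "(?: (?: ^ | , | \\r? \\n ) \\s* )          # delimiter, then leading whitespace",
        "(?:",
        "   (?:",
        "      \" ( [^\"\\\\]* (?: \\\\ [\\S\\s] [^\"\\\\]* )* ) \"      # (1) straight-quoted data",
        "    |",
        "      \u201C ( [^\u201C\u201D\\\\]* (?: \\\\ [\\S\\s] [^\u201C\u201D\\\\]* )* ) \u201D   # (2) curly-quoted data",
        "   )",
        "   \\s* (?: (?= , | \\r? \\n ) | $ )      # delimiter ahead, or end of input",
        " |",
        "   ( [^,]* )                              # (3) unquoted field",
        "   \\s* (?: (?= , ) | $ )",
        ")");

    private static final Pattern PATTERN = Pattern.compile(REGEX, Pattern.COMMENTS);

    static List<String> getList(String value) {
        List<String> fields = new ArrayList<>();
        if (value == null || value.isEmpty()) {
            return fields;
        }
        Matcher m = PATTERN.matcher(value);
        while (m.find()) {
            // Exactly one of the three groups participates in each match.
            String field = m.group(1) != null ? m.group(1)
                         : m.group(2) != null ? m.group(2)
                         : m.group(3);
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(getList("400,test,\"QT_don't split, this_QT\",15"));
        System.out.println(getList("500,test,\u201CLQT_don't split, this_RQT\u201D,15"));
    }
}
```

Since the quotes sit outside groups 1 and 2, no quote stripping is needed afterwards. Backslash-escaped quotes inside a quoted field are kept together with their backslashes.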
-1

Try this:

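// One alternative per quote style, so mixed pairs such as "...” are not matched.
Pattern regex = Pattern.compile("\"([^\"]*)\"|\u201C([^\u201C\u201D]*)\u201D");
List<String> allMatches = new ArrayList<>();
Matcher m = regex.matcher(value);
while (m.find()) {
    // group(1) is set for straight quotes, group(2) for curly quotes
    String inner = m.group(1) != null ? m.group(1) : m.group(2);
    allMatches.add(inner.trim());
}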

A lot simpler and does exactly what you want (matches things in either normal quotes, or 'nice' quotes, but not if you mix them or if you fail to start or close them).

rzwitserloot