I'm struggling to find the right regex for parsing a string containing key/value pairs. The string should be split on whitespace, except where the whitespace is inside double quotes.

Example string:

2013-10-26    15:16:38:011+0200 name="twitter-message" from_user="MyUser" in_reply_to="null" start_time="Sat Oct 26 15:16:21 CEST 2013" event_id="394090123278974976" text="Some text" retweet_count="1393"

The desired output is:

2013-10-26
15:16:38:011+0200
name="twitter-message"
from_user="MyUser" 
in_reply_to="null" 
start_time="Sat Oct 26 15:16:21 CEST 2013" 
event_id="394090123278974976" 
text="Some text" 
retweet_count="1393"

I found an answer that gets me close to the desired result, "Regex for splitting a string using space when not surrounded by single or double quotes":

List<String> list = new ArrayList<>();
Matcher m = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'").matcher(str);
while (m.find()) {
    list.add(m.group());
}

This gives a list of:

2013-10-26
15:16:38:011+0200
name=
"twitter-message"
from_user=
"MyUser"
in_reply_to=
"null"
start_time=
"Sat Oct 26 15:16:21 CEST 2013"
event_id=
"394090123278974976"
text=
"Some text"
retweet_count=
"1393"

Each key is separated from its quoted value right after the = sign, so something is still missing to get to the desired output.

Preben

3 Answers

Try:

Matcher m = Pattern.compile("(?:[^\\s\"']|\"[^\"]*\"|'[^']*')+").matcher(str);

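Your original regex can be read as "match either a run of characters that are neither whitespace nor quotes, or a single quoted string", so every match stops at the opening quote. This one reads as "match a run made up of such characters and quoted strings, in any mix", so a key, its = sign, and its quoted value stay in one match.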
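To see the difference concretely, here is a minimal sketch that runs both patterns on a shortened version of the sample line. The class name TokenizerDemo and the helper tokenize are made up for this illustration and are not part of the question or the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class TokenizerDemo {
    // Pattern from the question: each match is EITHER a bare run OR one quoted string.
    static final Pattern ORIGINAL = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
    // Pattern from this answer: each match is a run that may mix bare characters and quoted strings.
    static final Pattern GROUPED = Pattern.compile("(?:[^\\s\"']|\"[^\"]*\"|'[^']*')+");

    // Collects every match of the given pattern, in order of appearance.
    static List<String> tokenize(Pattern p, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "2013-10-26 text=\"Some text\" retweet_count=\"1393\"";
        // Keys and quoted values come out as separate tokens.
        System.out.println(tokenize(ORIGINAL, line));
        // Each key="value" pair comes out as a single token.
        System.out.println(tokenize(GROUPED, line));
    }
}
```

With the grouped pattern the space inside "Some text" is consumed as part of the quoted alternative, so it never ends a token.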

pobrelkey

Try this:

[^\\s=]+(=\"[^\"]+\")?
  • [^\\s=]+ matches one or more characters that are neither whitespace nor =, so for start_time="Sat Oct 26 15:16:21 CEST 2013" it matches the start_time part.
  • (=\"[^\"]+\")? is optional and matches the ="zzz" part (where z can be any character except ").

Example

Matcher m = Pattern.compile("[^\\s=]+(=\"[^\"]+\")?").matcher(str);
while (m.find())
    System.out.println(m.group());

Output:

2013-10-26
15:16:38:011+0200
name="twitter-message"
from_user="MyUser"
in_reply_to="null"
start_time="Sat Oct 26 15:16:21 CEST 2013"
event_id="394090123278974976"
text="Some text"
retweet_count="1393"
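If the goal is to end up with the keys and values themselves, the same pattern shape can be given capturing groups. The following is only a sketch under some assumptions: the class name KeyValueParser and the method parse are hypothetical, bare tokens such as the date and time are simply skipped, and the value part uses [^\"]* instead of [^\"]+ so that an empty value like key="" is also captured (that tweak is an assumption, not part of the pattern above).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class KeyValueParser {
    // group 1 = key (or a bare token), group 2 = value without the quotes (null for bare tokens).
    static final Pattern TOKEN = Pattern.compile("([^\\s=]+)(?:=\"([^\"]*)\")?");

    // Returns the key/value pairs in the order they appear; bare tokens are ignored.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                pairs.put(m.group(1), m.group(2));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "2013-10-26 15:16:38:011+0200 name=\"twitter-message\" text=\"Some text\"";
        // Prints each key with its unquoted value, one pair per line.
        parse(line).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A LinkedHashMap is used so the pairs keep their original order; a repeated key would overwrite the earlier value.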
Pshemo

This should work for you:

// str is the input string
// Split on runs of spaces that are followed by an even number of double quotes,
// i.e. spaces that are not inside a quoted value.
String[] arr = str.split(" +(?=(?:([^\"]*\"){2})*[^\"]*$)");
for (String s : arr)
    System.out.printf("%s%n", s);

OUTPUT:

2013-10-26
15:16:38:011+0200
name="twitter-message"
from_user="MyUser" 
in_reply_to="null" 
start_time="Sat Oct 26 15:16:21 CEST 2013" 
event_id="394090123278974976" 
text="Some text" 
retweet_count="1393"
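As a rough illustration of how the lookahead behaves, the sketch below wraps the same split in a small helper. The class name LookaheadSplitDemo and the method split are hypothetical, and the input is assumed to have balanced double quotes, since the even/odd count is what decides whether a space is inside a quoted value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class LookaheadSplitDemo {
    // Same regex as above: a run of spaces is a separator only when the rest
    // of the string contains an even number of double quotes.
    static final String SEPARATOR = " +(?=(?:([^\"]*\"){2})*[^\"]*$)";

    static List<String> split(String input) {
        return Arrays.asList(input.split(SEPARATOR));
    }

    public static void main(String[] args) {
        // Several spaces between tokens collapse into one separator,
        // while the space inside "Some text" is left alone.
        System.out.println(split("a=\"1\"   text=\"Some text\" b=\"2\""));
    }
}
```

Two things to keep in mind with this approach: a line that starts with spaces produces an empty first element, and the lookahead rescans the remainder of the string for every candidate space, so it may be noticeably slower than a Matcher loop on very long lines.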
anubhava