I am new to the forum. I am currently trying to take this string:

65101km,Sedan,Manual,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC,Heated Seats, Heated Mirrors, Keyless Entry},2010

and split it in order to get this:

65101km
Sedan
Manual
18131A
FWD
Used
5.5L/100km
Toyota
camry
SE
{AC,Heated Seats, Heated Mirrors, Keyless Entry}
2010

I have the following regex:

data_from_file.split(/[{},]+/)

But it also splits on the braces and on the commas inside them, so I am having a hard time keeping the set {...} together as one element.

Any ideas?
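
For reference, a minimal sketch of what goes wrong, using a shortened input string (the shorter string is an assumption made only for illustration). The character class `[{},]+` treats braces and commas alike as separators, so the braces are consumed and the items of the set come back as separate fields:

```ruby
# Shortened, hypothetical input; the full string behaves the same way.
data_from_file = "SE,{AC,Heated Seats},2010"

# Braces and commas are all separators here, so the {...} set is torn apart.
fields = data_from_file.split(/[{},]+/)
p fields
# prints ["SE", "AC", "Heated Seats", "2010"]
```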

Andres V.
  • Maybe this answer will be useful: https://stackoverflow.com/questions/42475528/split-a-ruby-string-by-colon-except-inside-parenthesis-using-regex – vovan Nov 16 '18 at 15:57
  • In future, please ensure all values in examples are valid Ruby objects. Here that would mean putting the string in quotes and displaying the output as an array of strings (`["65101km", "Sedan",..., "2010"]`). Here your intent is clear, but if your array had been an input every reader who wanted to use it in code would have to convert it to a valid object. Also, it's helpful to assign a variable to all inputs (here just one) in your example (`str = "65101km,..."`), so readers can refer to those variables in answers and comments. In case you didn't know, you can upvote answers you checkmark. – Cary Swoveland Nov 17 '18 at 20:23

2 Answers

str = "65101km,Sedan,Manual,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC,Heated Seats, Heated Mirrors, Keyless Entry},2010"

r = /
    (?<=\A|,)  # match the beginning of the string or a comma in a positive lookbehind
    (?:        # begin a non-capture group
      {.*?}    # match an open brace followed by any number of characters,
               # lazily, followed by a closed brace
      |        # or
      .*?      # match any number of characters, lazily 
    )          # close non-capture group
    (?=,|\z)   # match a comma or the end of the string in a positive lookahead
    /x         # free-spacing regex definition mode

str.scan r
  #=> ["65101km", "Sedan", "Manual", "18131A", "FWD", "Used", "5.5L/100km", "Toyota",
  #    "camry", "SE", "{AC,Heated Seats, Heated Mirrors, Keyless Entry}", "2010"]

Two notes follow. I'll illustrate these with a simpler string.

str = "65101km,Sedan,{AC,Heated Seats},2010"

1. {.*?} must precede .*? in (?:{.*?}|.*?)

If

r = /(?<=\A|,)(?:.*?|{.*?})(?=,|\z)/

then

str.scan r
  #=> ["65101km", "Sedan", "{AC", "Heated Seats}", "2010"]

2. Both occurrences of .* must be lazy (aka non-greedy), i.e. written as .*?

If

r = /(?<=\A|,)(?:{.*?}|.*)(?=,|\z)/

then

str.scan r
  #=> ["65101km,Sedan,{AC,Heated Seats},2010"]

If

r = /(?<=\A|,)(?:{.*}|.*?)(?=,|\z)/

then

"65101km,Sedan,{AC,Heated Seats},2010,{starter motor, pneumatic tires}".scan r
  #=> ["65101km", "Sedan", "{AC,Heated Seats},2010,{starter motor, pneumatic tires}"]
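
The two notes can be checked side by side with a short script. This is only a sketch: the hash keys are hypothetical labels chosen for illustration, and the braces are escaped for readability, which does not change what is matched.

```ruby
str = "65101km,Sedan,{AC,Heated Seats},2010"

# Hypothetical labels for the three variants discussed in the notes.
variants = {
  braces_first_lazy: /(?<=\A|,)(?:\{.*?\}|.*?)(?=,|\z)/, # the regex from the answer
  plain_first:       /(?<=\A|,)(?:.*?|\{.*?\})(?=,|\z)/, # note 1: alternatives swapped
  greedy_plain:      /(?<=\A|,)(?:\{.*?\}|.*)(?=,|\z)/   # note 2: second .* made greedy
}

# Run every variant against the same string and print what it extracts.
results = variants.transform_values { |re| str.scan(re) }
results.each { |name, fields| puts "#{name}: #{fields.inspect}" }
```

Only the first variant keeps `{AC,Heated Seats}` as a single field while still splitting the remaining fields on commas.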
Cary Swoveland
  • Hey Cary, for some reason when I put this on regex101, it doesn't read the 2010 at the end. I thank you for the answer ! – Andres V. Nov 16 '18 at 16:03
  • What can I say? Ruby matches `"2010"`. Did you perchance test with a string that contains a space between the last comma and "2010"? – Cary Swoveland Nov 16 '18 at 16:21
  • Just tried it out, it's perfect! Thanks guys. I had put a space like Cary said. – Andres V. Nov 17 '18 at 17:37

You may use

s.scan(/(?:{[^{}]*}|[^,])+/)

See the Rubular and regex101 demos.

Pattern details

  • (?: - start of a non-capturing group:
    • {[^{}]*} - a {, then 0 or more chars other than { and }, and then a }
    • | - or
    • [^,] - any 1 char other than ,
  • )+ - end of the group, repeated 1 or more times.
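
A short sketch of the pattern applied to the string from the question. The variable names are assumptions for illustration, and the braces outside the character class are escaped here only for readability:

```ruby
s = "65101km,Sedan,Manual,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE," \
    "{AC,Heated Seats, Heated Mirrors, Keyless Entry},2010"

# Each match is a run of "brace groups or non-comma characters".
fields = s.scan(/(?:\{[^{}]*\}|[^,])+/)
p fields.length # prints 12
p fields[10]    # prints "{AC,Heated Seats, Heated Mirrors, Keyless Entry}"

# Because the group must match at least once, empty fields are skipped.
sparse = "a,,b".scan(/(?:\{[^{}]*\}|[^,])+/)
p sparse        # prints ["a", "b"]
```

Since `[^{}]*` cannot cross a brace, this pattern assumes the sets are not nested.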
Wiktor Stribiżew