
I want to split my string on commas (',') but ignore any commas that are inside quotes. For example:

" 2,2,4,'hello', 'world', 'hi, there' "

I want to split it so that 'hi, there' is not split into two different array elements in Ruby. How can I do that? Probably with a regex?

EDIT: If I use this (from the link to the possible duplicate):

values = values.split(',(?=([^\"]*\"[^\"]*\")*[^\"]*$)', -1)

my string is split correctly, but now I cannot use the .delete_at() method on my array. Before, I could do:

values.delete_at(20)
yerassyl
  • This is a great question. Why did you close it? – Cary Swoveland Jun 15 '15 at 05:17
  • @CarySwoveland: I assume because it is a duplicate, as the linked question has a regular expression that fits this question (even if the rest of it is in Java), particularly since the title explicitly asks for regexp. (If it wasn't, I would have been tempted to mention `CSV.parse_line`, but it's malformed for CSV.) – Amadan Jun 15 '15 at 05:19
  • Somewhat different. 1) Java needs to escape double quotes in regexps. Ruby does not. 2) The linked question had double-quoted substrings; you are using single quotes. 3) `-1` is unnecessary. 4) Ruby's `split` will split on a literal string or a regexp, you are using the wrong one. 5) Ruby's `split` will give you values of capture groups if you have any. `values.split(/,(?=(?:[^']*'[^']*')*[^']*$)/)` – Amadan Jun 15 '15 at 05:35
  • It is not an answer, it is a clarification of the answer in the duplicated question. (One can't post answers on closed questions.) – Amadan Jun 15 '15 at 05:38
  • @Amadan I just reopened this with my golden hammer in case you want to provide an answer. Java and Ruby are different enough that I don't think this was a dup. – mu is too short Jun 15 '15 at 06:15
  • @muistooshort One can easily work out the solution for this question from the linked question. What if the same question is asked in Python or Perl or PHP or ...? The regex is the same. But in Java, we have to escape the backslash one more time because it treats a single backslash char as an escape sequence. Is there any solution other than @Amadan's? – Avinash Raj Jun 15 '15 at 06:20

2 Answers


Very well. Taking inspiration from this answer, the regular expression you are looking for is:

values.split(/,(?=(?:[^']*'[^']*')*[^']*$)/)

This will not work if you have escaped quotes (e.g. "'O\'Reilly\'s car'").
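As a quick check, here is the regex applied to the string from the question. This is only a sketch: the constant and variable names are made up for illustration, and the `strip` step is an extra that the question did not ask for. It also shows the two points from the comments: `split` needs a regexp literal rather than a string, and any capture group in the regexp ends up in the result (hence the non-capturing `(?:...)`).

```ruby
str = " 2,2,4,'hello', 'world', 'hi, there' "

# A comma counts as a separator only if an even number of single quotes
# follows it up to the end of the string, i.e. it sits outside any quotes.
QUOTE_AWARE_COMMA = /,(?=(?:[^']*'[^']*')*[^']*$)/

fields = str.split(QUOTE_AWARE_COMMA)
# fields is [" 2", "2", "4", "'hello'", " 'world'", " 'hi, there' "]

# Optional clean-up (not part of the original question): trim whitespace.
stripped = fields.map(&:strip)

# The result is an ordinary Array, so delete_at works as usual.
removed = stripped.delete_at(1)

# Capture groups are included in the output of split:
with_capture = "a,b".split(/(,)/)    # ["a", ",", "b"]

# A String argument is matched literally, not as a regexp:
literal = "a,b".split(',(?=x)')      # ["a,b"]
```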

However, this looks a bit like an XY problem. If you want to parse CSV, as it seems, and if this was a compliant CSV, you could use CSV.parse or CSV.parse_line. It is not, due to extra spaces between column separators. Using standard formats and standard parsers is, if possible, almost always preferable to home-grown solutions.
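For comparison, a minimal sketch of the CSV route. It assumes the input can be produced in a compliant form, with no spaces around the separators, and that single quotes are meant as the quote character. The `line` below is a rewritten version of the string from the question, not the original.

```ruby
require 'csv'

# Assumed compliant input: same data as in the question, minus the stray spaces.
line = "2,2,4,'hello','world','hi, there'"

# quote_char tells the parser that fields are wrapped in single quotes.
row = CSV.parse_line(line, quote_char: "'")
# row is ["2", "2", "4", "hello", "world", "hi, there"]
```

Unlike the regex split, the parser also removes the surrounding quotes from each field.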

Amadan
  • You could also use this split: `str.split(/('[^']*')|\s*,\s*|^\s+|\s+$/)` – Avinash Raj Jun 15 '15 at 06:28
  • @Avinash, there's a problem with that regex: `"'''".split(/('[^']*')|\s*,\s*|^\s+|\s+$/) #=> ["", "''", "'"]`. It should just return an array containing the receiver (as Amadan's does). – Cary Swoveland Jun 15 '15 at 06:59

Here's a non-regex solution:

str = " 2,2,4,'hello', 'world', 'hi, there' "

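first_quote_read = false   # true while we are between an opening and a closing quote
str.each_char.with_object(['']) do |c, a|
  if c == ',' && !first_quote_read
    a << ''                # comma outside quotes: start a new element
  else
    a[-1] << c             # any other character extends the current element
    first_quote_read = !first_quote_read if c == "'"
  end
end
  #=> [" 2", "2", "4", "'hello'", " 'world'", " 'hi, there' "]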
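The same character-by-character idea can be wrapped in a method and extended to cope with backslash-escaped quotes, which the regex approach does not handle. This is only a sketch: the method name `split_outside_quotes` and its keyword arguments are hypothetical, and treating a backslash as an escape for the next character is an assumption, not something the question specifies.

```ruby
# Hypothetical helper: split `str` on `sep`, except inside `quote`-delimited runs.
# Assumption: a backslash escapes the character that follows it.
def split_outside_quotes(str, sep: ',', quote: "'")
  fields = [String.new]
  in_quotes = false
  escaped = false

  str.each_char do |c|
    if escaped
      fields[-1] << c          # escaped character is kept as-is
      escaped = false
    elsif c == '\\'
      fields[-1] << c          # keep the backslash, escape the next char
      escaped = true
    elsif c == quote
      fields[-1] << c
      in_quotes = !in_quotes   # toggle on every unescaped quote
    elsif c == sep && !in_quotes
      fields << String.new     # separator outside quotes: new field
    else
      fields[-1] << c
    end
  end

  fields
end

plain   = split_outside_quotes(" 2,2,4,'hello', 'world', 'hi, there' ")
escaped = split_outside_quotes("'O\\'Reilly\\'s car',x")
```

Here `plain` matches the output of the loop above, and `escaped` keeps `'O\'Reilly\'s car'` together as one element followed by `"x"`.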
Cary Swoveland
  • I failed to close this question as duplicate because of people like you should come with a non-regex solution.. – Avinash Raj Jun 15 '15 at 07:58