
I am working in elisp and I have a string that represents a list of items. The string looks like

"apple orange 'tasty things' 'my lunch' zucchini 'my dinner'"

and I'm trying to split it into

("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")

This is a familiar problem. My obstacles to solving it are less about the regex, and more about the specifics of elisp.

What I want to do is run a loop like:

  • (while (> (length my-string) 0) do-work)

where do-work is:

  • applying the regex \('[^']*?'\|[[:alnum:]]+\)\([[:space:]]*\(.+\)\) to my-string
  • appending \1 to my results list
  • re-binding my-string to \2

However, I can't figure out how to get split-string or replace-regexp-in-string to do that.

How can I split this string into values I can use?

(Alternatively: "which built-in Emacs function that does this have I not yet found?")
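The loop described in the question can be written almost literally with `string-match` and `match-string`. This is a minimal sketch that assumes single-line input. The function name `my-split-literal` is hypothetical. Compared with the question's regex, this version makes three changes. It gives quoted and bare items separate groups, so the quotes can be dropped. It uses `.*` instead of `.+` for the rest, so the last item also matches. And it re-binds the string to group 3 instead of `\2`.

```elisp
;; Sketch of the question's loop.  `my-split-literal' is a hypothetical name.
(defun my-split-literal (my-string)
  "Split MY-STRING into bare words and single-quoted items (quotes dropped)."
  (let ((regexp (concat "\\(?:'\\([^']*\\)'\\|\\([[:alnum:]]+\\)\\)" ; item
                        "[[:space:]]*"                               ; separator
                        "\\(.*\\)"))                                 ; the rest
        (results '()))
    ;; Keep going while text remains and the regexp still finds an item.
    (while (and (> (length my-string) 0)
                (string-match regexp my-string))
      ;; Group 1 is a quoted item without its quotes, group 2 a bare word;
      ;; exactly one of them matched.
      (push (or (match-string 1 my-string) (match-string 2 my-string))
            results)
      ;; Re-bind MY-STRING to the text after the item and its spaces.
      (setq my-string (match-string 3 my-string)))
    (nreverse results)))

(my-split-literal "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```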

Brighid McDonnell

4 Answers


Something similar, but without a regexp:

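(require 'cl-lib)

(defun parse-quotes (string)
  "Split STRING at spaces, keeping single-quoted runs together."
  (let ((i 0) result current quotep escapedp word)
    (while (< i (length string))
      (setq current (aref string i))
      (cond
       ;; An unquoted space ends the current word.
       ((and (char-equal current ?\s)
             (not quotep))
        (when word (push word result))
        (setq word nil escapedp nil))
       ;; An unescaped quote outside quotes opens a quoted item.
       ((and (char-equal current ?\')
             (not escapedp)
             (not quotep))
        (setq quotep t escapedp nil))
       ;; An unescaped quote inside quotes closes it.
       ((and (char-equal current ?\')
             (not escapedp))
        (push word result)
        (setq quotep nil word nil escapedp nil))
       ;; A backslash escapes the next character; two in a row give one
       ;; literal backslash.
       ((char-equal current ?\\)
        (when escapedp (push current word))
        (setq escapedp (not escapedp)))
       ;; Anything else is part of the current word.
       (t (setq escapedp nil)
          (push current word)))
      (cl-incf i))
    (when quotep
      (error "Unbalanced quotes at %d"
             (- (length string) (length word))))
    ;; Keep a trailing unquoted word.
    (when word (push word result))
    ;; Each word was accumulated as a reversed list of characters.
    (mapcar (lambda (x) (cl-coerce (reverse x) 'string))
            (reverse result))))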

(parse-quotes "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")

(parse-quotes "apple orange 'tasty thing\\'s' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty thing's" "my lunch" "zucchini" "my dinner")

(parse-quotes "apple orange 'tasty things' 'my lunch zucchini 'my dinner'")
;; Debugger entered--Lisp error: (error "Unbalanced quotes at 52")

Bonus: it also lets you escape a quote with a backslash (written \\ inside an Emacs Lisp string literal, as in the second example). It also signals an error if the quotes are unbalanced, that is, if it reaches the end of the string without finding the closing quote.

  • Oh hey that's properly _parsing_ things, now. This answer was educational. :) – Brighid McDonnell Oct 11 '12 at 15:02
  • Improved my knowledge and was the only answer that met the answer-spec fully - *so accepted.* Thank you. :D – Brighid McDonnell Oct 11 '12 at 17:36
  • +1 for this answer three years later - I can't find this anywhere else on the whole internet :D Is there a package anywhere that provides this? There really should be. EDIT: apparently `split-string-and-unquote` pretty much does this but I can't make it do single quotes as well... hmm. – Commander Coriander Salamander May 16 '15 at 23:09

Here is a straightforward way to implement your algorithm using a temporary buffer. I don't know whether there is a way to do this using replace-regexp-in-string or split-string.

(defun my-split (string)
  (with-temp-buffer
    (insert string " ")     ;; insert the string in a temporary buffer
    (goto-char (point-min)) ;; go back to the beginning of the buffer
    (let ((result nil))
      ;; search for the regexp (and just return nil if nothing is found)
      (while (re-search-forward "\\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\)" nil t)
        ;; (match-string 1) is "\1"
        ;; append it after the current list
        (setq result (append result (list (match-string 1))))
        ;; go back to the beginning of the second part
        (goto-char (match-beginning 2)))
      result)))

Example:

(my-split "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
  ==> ("apple" "orange" "'tasty things'" "'my lunch'" "zucchini" "'my dinner'")
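`re-search-forward` already leaves point just after each match. Because of that, the same buffer-based idea can also drop the surrounding quotes and skip the trailing-space trick by giving quoted and bare items separate groups. This is a sketch only. The name `my-split-unquoted` is hypothetical. It collects with `push`/`nreverse` rather than `append`, which avoids copying the list for every item.

```elisp
;; Variant of `my-split' above; `my-split-unquoted' is a hypothetical name.
(defun my-split-unquoted (string)
  "Split STRING into bare words and single-quoted items (quotes dropped)."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((result nil))
      ;; Group 1: text inside '...'; group 2: a run of alphanumerics.
      ;; Each successful search leaves point just after the match.
      (while (re-search-forward "'\\([^']*\\)'\\|\\([[:alnum:]]+\\)" nil t)
        (push (or (match-string-no-properties 1)
                  (match-string-no-properties 2))
              result))
      ;; Items were pushed in reverse order.
      (nreverse result))))

(my-split-unquoted "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```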
François Févotte

You might like to take a look at split-string-and-unquote.
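`split-string-and-unquote` keeps double-quoted text together, reading each quoted run like an Emacs Lisp string. The question's single-quoted input therefore needs converting first. This is a minimal sketch. `my-split-single-quoted` is a hypothetical helper, and it assumes the input contains no double quotes or backslashes of its own.

```elisp
;; `split-string-and-unquote' keeps double-quoted text together:
(split-string-and-unquote
 "apple orange \"tasty things\" \"my lunch\" zucchini \"my dinner\"")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")

;; Hypothetical helper for single quotes: swap ' for " and reuse it.
(defun my-split-single-quoted (string)
  "Split STRING at whitespace, keeping single-quoted runs together.
Assumes STRING contains no double quotes or backslashes of its own,
since those would be read as Emacs Lisp string syntax."
  (split-string-and-unquote (subst-char-in-string ?\' ?\" string)))
```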

Stefan

If you manipulate strings often, you should install the s.el library via the package manager. It provides a large set of string utility functions under a consistent API. For this task you need the function s-match, whose optional third argument is the starting position. Then you need a correct regexp; try:

(concat "\\(\\b[a-z]+\\b\\|'[a-z ]+'\\)" "[[:space:]]*")

\| means matching either a sequence of letters that constitutes a word (\b means a word boundary), or a sequence of letters and spaces inside quotes. The group \(...\) captures the item, and the trailing [[:space:]]* also consumes the spaces after it. So, for a string that starts with an item and contains only letters, spaces and quotes, each match begins exactly where the previous one ended. s-match returns the whole match followed by the group, so the loops below collect (cadr m) and advance start by the length of (car m). Then use a loop (cl-loop is cl-lib's version of the Common Lisp loop macro):

;; let s = given string, r = regex
(require 'cl-lib)
(require 's)
(cl-loop for start = 0 then (+ start (length (car m)))
         for m = (s-match r s start)
         while m
         collect (cadr m))

For educational purposes, I also implemented the same functionality with a recursive function:

;; cl-labels is cl-lib's local function definition macro
(cl-labels
    ((i
      (start result)
      ;; s-match searches from start
      (let ((m (s-match r s start)))
        (if m
            ;; recursive call: skip the whole match, keep the item
            (i (+ start (length (car m)))
               (cons (cadr m) result))
          ;; push/nreverse idiom
          (nreverse result)))))
  ;; recursive helper function
  (i 0 '()))

As Emacs lacks tail-call optimization, running this on a long string can cause a stack overflow (Emacs signals an error once max-lisp-eval-depth is exceeded). Therefore you can rewrite it with the cl-do* macro:

(cl-do* ((start 0)
         (m (s-match r s start) (s-match r s start))
         (result '()))
    ((not m) (reverse result))
  (push (cadr m) result)
  (cl-incf start (length (car m))))
Mirzhan Irkegulov
  • While that's helpful, it's also important to note that s.el is a package that doesn't come with emacs. The better answers, especially the accepted one, accomplish the task with an equal or lesser level of code complexity but without involving third-party packages. Your proposed answer is more complex on multiple axes. – Brighid McDonnell Mar 21 '14 at 16:26
  • `s-match` + `loop` macro is not complex. Please see the updated answer, I tried to clarify. – Mirzhan Irkegulov Mar 21 '14 at 16:48
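s.el is not bundled with Emacs. For comparison, here is the same kind of loop written with built-ins only. `string-match` also accepts a start index, and `match-end` gives the exact position to resume from, however many separators were skipped. This is a sketch under those assumptions. `my-split-builtin` is a hypothetical name. The regexp is `\\b[a-z]+\\b\\|'[a-z ]+'`, and the quotes are kept in the result.

```elisp
;; Built-ins only; `my-split-builtin' is a hypothetical name.
(defun my-split-builtin (s)
  "Return the words and single-quoted runs of S, quotes included."
  (let ((r "\\b[a-z]+\\b\\|'[a-z ]+'")
        (start 0)
        (result '()))
    ;; `string-match' returns the match position or nil; `match-end' says
    ;; where to resume, however many separators were skipped.
    (while (string-match r s start)
      (push (match-string 0 s) result)
      (setq start (match-end 0)))
    (nreverse result)))

(my-split-builtin "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "'tasty things'" "'my lunch'" "zucchini" "'my dinner'")
```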