Lisp - Splitting Input into Separate Strings

Question

I'm trying to take user input and store it in a list, but instead of a list consisting of a single string, I want each word that is read in to be its own string. Example:

> (input)
This is my input. Hopefully this works

would return:

("this" "is" "my" "input" "hopefully" "this" "works")

Note that I don't want any spaces or punctuation in my final list.

Any input would be greatly appreciated.

Check out http://cl-cookbook.sourceforge.net/strings.html. It has a bunch of functions for common use cases, one of which is a simple split on spaces that you could modify to remove punctuation and the like. – Daniel Williams, Mar 13 '13 at 18:48
The Cookbook continues here: https://lispcookbook.github.io/cl-cookbook/strings.html – Ehvince, Nov 26 '19 at 09:56

sds · Answer 1 · 2014-06-23T13:58:43.023

score 23

split-sequence is the off-the-shelf solution.

You can also roll your own:

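(defun my-split (string &key (delimiterp #'delimiterp))
  "Split STRING into substrings separated by characters that satisfy DELIMITERP."
  ;; beg: start of the next token; end: first delimiter after it (NIL at end of string)
  (loop :for beg = (position-if-not delimiterp string)
    :then (position-if-not delimiterp string :start (1+ end))
    :for end = (and beg (position-if delimiterp string :start beg))
    :when beg :collect (subseq string beg end)
    :while end))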

where delimiterp checks whether you want to split on this character, e.g.

(defun delimiterp (c) (or (char= c #\Space) (char= c #\,)))

or

(defun delimiterp (c) (position c " ,.;/"))

PS. looking at your expected return value, you seem to want to call string-downcase before my-split.

PPS. you can easily modify my-split to accept :start, :end, :delimiterp &c.

PPPS. Sorry about bugs in the first two versions of my-split. Please consider that an indicator that one should not roll one's own version of this function, but use the off-the-shelf solution.
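Putting the pieces of this answer together, here is a minimal sketch of what the question asks for. It uses the same loop as my-split, but the names split-words and word-delimiter-p are hypothetical, and treating every non-alphanumeric character as a delimiter is an assumption about what "punctuation" should mean here:

```lisp
;; Hypothetical helper: any character that is not a letter or digit
;; counts as a delimiter (an assumption, adjust to taste).
(defun word-delimiter-p (c)
  (not (alphanumericp c)))

;; Hypothetical wrapper: downcase first, then split with the same
;; position-if / position-if-not loop as my-split above.
(defun split-words (string)
  "Return the lowercase words of STRING, dropping spaces and punctuation."
  (loop :with s = (string-downcase string)
        :for beg = (position-if-not #'word-delimiter-p s)
          :then (position-if-not #'word-delimiter-p s :start (1+ end))
        :for end = (and beg (position-if #'word-delimiter-p s :start beg))
        :when beg :collect (subseq s beg end)
        :while end))

(split-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```

To apply it to a line typed by the user, something like (split-words (read-line)) should do.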

edited Jun 23 '14 at 13:58

answered Mar 13 '13 at 18:58


I find plenty of material on split-sequence, but apparently I need to import the cl-utilities package, which I just can't figure out how to do =/ #imanewb – Sean Evans Mar 13 '13 at 19:18

@SeanEvans: careful! `import` is a CL function which you do *not* want here! What you need is to *install* the package using, e.g., *quicklisp*: `(ql:quickload "split-sequence")` – sds Mar 13 '13 at 19:31
@sds: Your edit broke your code (for instance, test with `""` and `"a"`). – MicroVirus Jun 23 '14 at 09:19
To clarify, the first version can't handle strings that end with a delimiter (e.g. `"abc "`), and the second version most of the time fails to get the last token (e.g. `"ab cd" -> ("ab")`). – MicroVirus Jun 23 '14 at 10:34
I think I fixed the code now. Sorry about the bugs. – sds Jun 23 '14 at 13:59
I don't know the details but I have to change delimiterp into #'delimiterp for the code to work. – LLS Jun 23 '14 at 18:05
This is useful. However, when I try to install "split-sequence", it seems to run successfully, but then complains that this function is unknown. What could cause this? (I am using Aquamacs with SBCL and SLIME). – ArtforLife Nov 25 '16 at 00:42
Also, if one uses "my-split" function from above, is it possible to split on an empty character? That is, is it possible to do a character-by-character split? – ArtforLife Nov 25 '16 at 00:43
char-by-char split is easier done by `coerce` to `list` – sds Nov 25 '16 at 03:49
if you have problems installing `split-sequence`, you should ask for support from the vendor, not here. e.g., a separate question would be fine. – sds Nov 25 '16 at 03:49
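As a small illustration of the `coerce` suggestion in the comments above: `coerce` yields a list of characters, and if one-character strings are wanted instead, mapping `string` over the input is one option. The name split-chars is hypothetical:

```lisp
;; COERCE turns a string into a list of character objects.
(coerce "abc" 'list)   ; => (#\a #\b #\c)

;; Hypothetical helper: each character as its own one-character string.
(defun split-chars (string)
  (map 'list #'string string))

(split-chars "abc")    ; => ("a" "b" "c")
```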

hsrv · Answer 2 · 2019-03-28T10:16:48.363

score 9

For that task in Common Lisp I found (uiop:split-string str :separator " ") useful. The uiop package in general has a lot of utilities; take a look at the docs: https://common-lisp.net/project/asdf/uiop.html#index-split_002dstring.
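A minimal sketch of how this could be applied to the question's example. Note that uiop:split-string treats every character of :separator as a delimiter and keeps empty strings between adjacent delimiters, so they are removed afterwards. The name uiop-words and the particular separator set are assumptions for illustration:

```lisp
(require :asdf) ; UIOP is bundled with ASDF

;; Hypothetical helper: downcase, split on spaces and some common
;; punctuation, then drop the empty strings left between adjacent delimiters.
(defun uiop-words (line)
  (remove "" (uiop:split-string (string-downcase line) :separator " .,;!?")
          :test #'string=))

(uiop-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```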

edited Mar 28 '19 at 10:16

answered Jan 14 '19 at 13:12


Ehvince · Answer 3 · 2019-11-26T09:59:56.530

There's cl-ppcre:split:

* (split "\\s+" "foo   bar baz
frob")
("foo" "bar" "baz" "frob")

* (split "\\s*" "foo bar   baz")
("f" "o" "o" "b" "a" "r" "b" "a" "z")

* (split "(\\s+)" "foo bar   baz")
("foo" "bar" "baz")

* (split "(\\s+)" "foo bar   baz" :with-registers-p t)
("foo" " " "bar" "   " "baz")

* (split "(\\s)(\\s*)" "foo bar   baz" :with-registers-p t)
("foo" " " "" "bar" " " "  " "baz")

* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
("foo" "," NIL "bar" NIL ";" "baz")

* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
("foo" "," "bar" ";" "baz")

* (split ":" "a:b:c:d:e:f:g::")
("a" "b" "c" "d" "e" "f" "g")

* (split ":" "a:b:c:d:e:f:g::" :limit 1)
("a:b:c:d:e:f:g::")

* (split ":" "a:b:c:d:e:f:g::" :limit 2)
("a" "b:c:d:e:f:g::")

* (split ":" "a:b:c:d:e:f:g::" :limit 3)
("a" "b" "c:d:e:f:g::")

* (split ":" "a:b:c:d:e:f:g::" :limit 1000)
("a" "b" "c" "d" "e" "f" "g" "" "")

http://weitz.de/cl-ppcre/#split

For common cases there is the (new, "modern and consistent") cl-str string manipulation library:

(str:words "a sentence    with   spaces") ; cut with spaces, returns words
(str:replace-all "," "" "a, sentence") ; replaces literal text: arguments are old, new, string; the pattern is not treated as a regexp (cl-ppcre treats it as one)

You have cl-slug to remove non-ascii characters and also punctuation:

 (asciify "Eu André!") ; => "Eu Andre!"

as well as str:remove-punctuation (that uses cl-change-case:no-case).

score 1 · Answer 4 · edited Mar 05 '15 at 00:14

; in AutoLisp usage (splitStr "get off of my cloud" " ") returns (get off of my cloud)

(defun splitStr (src delim / word letter)

  (setq wordlist (list))
  (setq cnt 1)
  (while (<= cnt (strlen src))

    (setq word "")

    (setq letter (substr src cnt 1))
    (while (and (/= letter delim) (<= cnt (strlen src)) ) ; endless loop if hits NUL
      (setq word (strcat word letter))
      (setq cnt (+ cnt 1))      
      (setq letter (substr src cnt 1))
    ) ; while

    (setq cnt (+ cnt 1))
    (setq wordlist (append wordlist (list word)))

  )

  (princ wordlist)

  (princ)

) ;defun

score -1 · Answer 5 · answered Jun 08 '17 at 14:44


(defun splitStr (src pat /)
    (setq wordlist (list))
    (setq len (strlen pat))
    (setq cnt 0)
    (setq letter cnt)
    (while (setq cnt (vl-string-search pat src letter))
        (setq word (substr src (1+ letter) (- cnt letter)))
        (setq letter (+ cnt len))
        (setq wordlist (append wordlist (list word)))
    )
    (setq wordlist (append wordlist (list (substr src (1+ letter)))))
)

answered Jun 08 '17 at 14:44

Henadzi Siarchenia


While this may answer the question, it is always good to provide an explanation of your code and any references that may be helpful. Check out [answer] for details on answering questions. – Tim Hutchison Jun 08 '17 at 14:54
