
I have a file named test.txt that contains:

"hello this is a test file"

I want to read the file so that each word becomes a list of characters and each paragraph becomes a list of words. In other words, I want to store the contents in a nested list like:

((h e l l o) (t h i s) (i s) (a) (t e s t) (f i l e))

I am totally new to Lisp and quite confused about how to approach this problem.

Trenton McKinney
Ashraf Yawar
    Would you mind adding details about what exactly is confusing? Also, have a look at [14. Files and File I/O](http://www.gigamonkeys.com/book/files-and-file-io.html) in PCL (and *the preceding chapters first*). Your example confuses characters, written for example `#\A` or `#\Space` in Common Lisp (strings are made of such characters), with symbols (objects that have a name, which is a string). Here `h`, `e`, etc. are symbols. You'll have to split your problem into subproblems, like: (1) how to read one character at a time from a file, and – coredump Oct 18 '19 at 12:22
    (2) how to recognize words and paragraphs in a stream of characters (a state machine?), etc. Stack Overflow is, however, best suited to answering precise questions; maybe edit the question to clarify what is currently preventing you from trying things out (bad environment setup? not knowing where to find documentation? not knowing how to run programs?) – coredump Oct 18 '19 at 12:22
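The two subproblems from the comments can be handled in a single pass over the file. The following is a minimal sketch, not part of the original answer: `read-paragraphs` is a hypothetical name, and it assumes that words are separated by spaces, tabs, carriage returns, or newlines and that a blank (or whitespace-only) line ends a paragraph. It reads one character at a time with `read-char` and keeps a little state: the current word, the current paragraph, and how many newlines it has seen since the last word character.

```lisp
;; Hypothetical helper (not from the answer): read STREAM one character
;; at a time and return a list of paragraphs, each a list of words, each
;; word a list of characters.
;; Assumptions: words are separated by spaces, tabs, carriage returns or
;; newlines; a blank (or whitespace-only) line ends a paragraph.
(defun read-paragraphs (stream)
  (let ((paragraphs '())  ; finished paragraphs, most recent first
        (words '())       ; words of the current paragraph, most recent first
        (chars '())       ; characters of the current word, most recent first
        (newlines 0))     ; state: newlines seen since the last word character
    (flet ((end-word ()
             ;; move the current word (if any) into WORDS
             (when chars
               (push (nreverse chars) words)
               (setf chars '())))
           (end-paragraph ()
             ;; move the current paragraph (if any) into PARAGRAPHS
             (when words
               (push (nreverse words) paragraphs)
               (setf words '()))))
      (loop for c = (read-char stream nil)
            while c
            do (case c
                 (#\Newline
                  (end-word)
                  (incf newlines)
                  ;; second newline with no word character in between
                  (when (>= newlines 2)
                    (end-paragraph)))
                 ;; #\Return is a semi-standard name; treating it as a
                 ;; separator keeps CRLF files from leaking it into words
                 ((#\Space #\Tab #\Return)
                  (end-word))
                 (t
                  (setf newlines 0)
                  (push c chars))))
      ;; flush the last word and paragraph at end of input
      (end-word)
      (end-paragraph)
      (nreverse paragraphs))))
```

To apply it to a file, open a character stream and pass it in, e.g. `(with-open-file (f "test.txt") (read-paragraphs f))`. Unlike the line-based answer, this keeps one extra level of nesting for paragraphs.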

1 Answer


Solution without any dependencies

(defun split (l &key (separators '(#\Space #\Tab #\Newline)) (acc '()) (tmp '()))
  (cond ((null l) (nreverse (if tmp (cons (nreverse tmp) acc) acc)))
        ((member (car l) separators)
         (split (cdr l) :separators separators 
                        :acc (if tmp (cons (nreverse tmp) acc) acc)
                        :tmp '()))
        (t 
         (split (cdr l) :separators separators
                        :acc acc
                        :tmp (cons (car l) tmp)))))
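`split` recurses once per character. That is fine for short lines, but the standard does not require tail-call elimination, so very long lines could exhaust the stack in some implementations or debug settings. A loop-based variant that works on the string directly is sketched below; `word-separator-p` and `string-to-char-words` are hypothetical names, and the separator set mirrors the default `:separators` of `split` plus `#\Return`.

```lisp
;; Hypothetical helpers (not from the answer): a loop-based alternative
;; to the recursive SPLIT, working directly on a string.
(defun word-separator-p (c)
  ;; same default separators as SPLIT, plus #\Return for CRLF files
  (member c '(#\Space #\Tab #\Newline #\Return)))

(defun string-to-char-words (string)
  "Return the words of STRING, each as a list of characters."
  (let ((words '())
        (start 0))
    (loop
      ;; skip separators to find the start of the next word
      (setf start (position-if (complement #'word-separator-p) string
                               :start start))
      (unless start (return))
      ;; the word ends at the next separator or at the end of the string
      (let ((end (or (position-if #'word-separator-p string :start start)
                     (length string))))
        (push (coerce (subseq string start end) 'list) words)
        (setf start end)))
    (nreverse words)))
```

It could stand in for `(split (coerce s 'list))` inside `read-file-to-word-characters`.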

(defun read-file-lines (file-path)
  (with-open-file (f file-path :direction :input)
    (loop for line = (read-line f nil)
          while line
          collect line)))

(defun read-file-to-word-characters (file-path)
  (mapcan (lambda (s) (split (coerce s 'list))) 
          (read-file-lines file-path)))

(read-file-to-word-characters "~/test.lisp.txt")
;; ((#\h #\e #\l #\l #\o) (#\t #\h #\i #\s) (#\i #\s) (#\a) (#\t #\e #\s #\t)
;; (#\f #\i #\l #\e))
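The question asks for one list per paragraph, but `read-file-to-word-characters` returns a single flat list of words. Assuming paragraphs are separated by blank lines, the lines from `read-file-lines` could be grouped first. This is a sketch; `blank-line-p` and `group-lines-into-paragraphs` are hypothetical helpers, not part of the answer.

```lisp
;; Hypothetical helpers (not from the answer). Assumption: a blank or
;; whitespace-only line separates paragraphs.
(defun blank-line-p (line)
  ;; EVERY is true for the empty string, so "" counts as blank
  (every (lambda (c) (member c '(#\Space #\Tab #\Return))) line))

(defun group-lines-into-paragraphs (lines)
  "Group LINES (a list of strings) into a list of paragraphs,
each a list of that paragraph's non-blank lines."
  (let ((paragraphs '())
        (current '()))   ; lines of the current paragraph, most recent first
    (dolist (line lines)
      (cond ((blank-line-p line)
             ;; a blank line closes the current paragraph, if any
             (when current
               (push (nreverse current) paragraphs)
               (setf current '())))
            (t (push line current))))
    ;; close the last paragraph
    (when current
      (push (nreverse current) paragraphs))
    (nreverse paragraphs)))
```

Each paragraph can then be split into words with the answer's `split`, e.g. `(mapcar (lambda (para) (mapcan (lambda (s) (split (coerce s 'list))) para)) (group-lines-into-paragraphs (read-file-lines "~/test.lisp.txt")))`.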

Convert the characters to one-letter strings:

;; apply FN to every leaf (atom) of a nested list (= a tree), here the conversion function `string`
(defun map-tree (fn tree)
  (cond ((null tree) '())
        ((atom tree) (funcall fn tree))
        (t (mapcar (lambda (branch) (map-tree fn branch)) tree))))

(map-tree #'string (read-file-to-word-characters "~/test.lisp.txt"))
;; (("h" "e" "l" "l" "o") ("t" "h" "i" "s") ("i" "s") ("a") ("t" "e" "s" "t")
;;  ("f" "i" "l" "e"))
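As the comments point out, the question's `(h e l l o)` notation contains symbols, not characters or strings. If symbols are really what is wanted, each character can be interned. A minimal sketch, assuming upcased symbols in the current package are acceptable; `char-to-symbol` is a hypothetical name.

```lisp
;; Hypothetical helper: #\h -> the symbol H, interned in *PACKAGE*.
;; STRING-UPCASE makes the symbol print as H rather than |h| under the
;; default readtable case (:UPCASE). VALUES drops INTERN's second value.
(defun char-to-symbol (c)
  (values (intern (string-upcase (string c)))))
```

Combined with `map-tree`, `(map-tree #'char-to-symbol (read-file-to-word-characters "~/test.lisp.txt"))` gives `((H E L L O) (T H I S) (I S) (A) (T E S T) (F I L E))` for the sample file.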

Content of "~/test.lisp.txt":

hello this
is a test file

Solution using cl-ppcre (Edi Weitz's congenial regex package)

;; for how to use cl-ppcre:split, see this answer:
;; https://stackoverflow.com/questions/15393797/lisp-splitting-input-into-separate-strings
(ql:quickload :cl-ppcre)

(defun read-file-lines (file-path)
  (with-open-file (f file-path :direction :input)
    (loop for line = (read-line f nil)
          while line
          collect line)))

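;; trim first: with leading blanks, cl-ppcre:split would return a leading ""
(defun string-to-words (s)
  (cl-ppcre:split "\\s+" (string-trim '(#\Space #\Tab) s)))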
(defun to-single-characters (s) (coerce s 'list))

(defun read-file-to-character-lists (file-path)
  (mapcan (lambda (s) 
            (mapcar #'to-single-characters
                    (string-to-words s)))
          (read-file-lines file-path)))

(read-file-to-character-lists "~/test.lisp.txt")
;; ((#\h #\e #\l #\l #\o) (#\t #\h #\i #\s) (#\i #\s) (#\a) (#\t #\e #\s #\t)
;;  (#\f #\i #\l #\e))

;; or convert the characters to one-letter strings with map-tree from above:
(map-tree #'string (read-file-to-character-lists "~/test.lisp.txt"))
;; (("h" "e" "l" "l" "o") ("t" "h" "i" "s") ("i" "s") ("a") ("t" "e" "s" "t")
;;  ("f" "i" "l" "e"))


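;; or split each word directly into one-letter strings
;; (the regex "\\s*" also matches the empty string between letters):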
(defun to-single-letter-strings (s) (cl-ppcre:split "\\s*" s))

(defun read-file-to-letter-lists (file-path)
  (mapcan (lambda (s) 
            (mapcar #'to-single-letter-strings
                    (string-to-words s)))
          (read-file-lines file-path)))

(read-file-to-letter-lists "~/test.lisp.txt")
;; (("h" "e" "l" "l" "o") ("t" "h" "i" "s") ("i" "s") ("a") ("t" "e" "s" "t")
;; ("f" "i" "l" "e"))
Gwang-Jin Kim