11

So I've read the theory, now trying to parse a file in Haskell - but am not getting anywhere. This is just so weird...

Here is how my input file looks:

        m n
        k1, k2...

        a11, ...., an
        a21,....   a22
        ...
        am1...     amn

Where m,n are just intergers, K = [k1, k2...] is a list of integers, and a11..amn is a "matrix" (a list of lists): A=[[a11,...a1n], ... [am1... amn]]

Here is my quick python version:

def parse(filename):
    """
    Input of the form:
        m n
        k1, k2...

        a11, ...., an
        a21,....   a22
        ...
        am1...     amn

    """

    f = open(filename)
    (m,n) = f.readline().split()
    m = int(m)
    n = int(n)

    K = [int(k) for k in f.readline().split()]

    # Matrix - list of lists
    A = []
    for i in range(m):
        row = [float(el) for el in f.readline().split()]
        A.append(row)

    return (m, n, K, A)

And here is how (not very) far I got in Haskell:

import System.Environment
import Data.List

main = do
    (fname:_) <- getArgs
    putStrLn fname --since putStrLn goes to IO ()monad we can't just apply it
    parsed <- parse fname
    putStrLn parsed

parse fname = do
    contents <- readFile fname
    -- ,,,missing stuff... ??? how can I get first "element" and match on it?

    return contents

I am getting confused by monads (and the context that the trap me into!), and the do statement. I really want to write something like this, but I know it's wrong:

firstLine <- contents.head
(m,n) <- map read (words firstLine)

because contents is not a list - but a monad.

Any help on the next step would be great.

So I've just discovered that you can do:

 liftM lines . readFile

to get a list of lines from a file. However, still the example only only transforms the ENTIRE file, and doesn't use just the first, or the second lines...

Community
  • 1
  • 1
Andriy Drozdyuk
  • 58,435
  • 50
  • 171
  • 272

5 Answers5

10

The very simple version could be:

import Control.Monad (liftM)

-- this operates purely on list of strings
-- and also will fail horribly when passed something that doesn't 
-- match the pattern
parse_lines :: [String] -> (Int, Int, [Int], [[Int]])
parse_lines (mn_line : ks_line : matrix_lines) = (m, n, ks, matrix)
    where [m, n] = read_ints    mn_line
          ks     = read_ints    ks_line
          matrix = parse_matrix matrix_lines

-- this here is to loop through remaining lines to form a matrix
parse_matrix :: [String] -> [[Int]]
parse_matrix lines = parse_matrix' lines []
    where parse_matrix' []       acc = reverse acc
          parse_matrix' (l : ls) acc = parse_matrix' ls $ (read_ints l) : acc

-- this here is to give proper signature for read
read_ints :: String -> [Int]
read_ints = map read . words

-- this reads the file contents and lifts the result into IO
parse_file :: FilePath -> IO (Int, Int, [Int], [[Int]])
parse_file filename = do
    file_lines <- (liftM lines . readFile) filename
    return $ parse_lines file_lines

You might want to look into Parsec for fancier parsing, with better error handling.

*Main Control.Monad> parse_file "test.txt"
(3,3,[1,2,3],[[1,2,3],[4,5,6],[7,8,9]])
Cat Plus Plus
  • 125,936
  • 27
  • 200
  • 224
10

An easy to write solution

import Control.Monad (replicateM)

-- Read space seperated words on a line from stdin
readMany :: Read a => IO [a]
readMany = fmap (map read . words) getLine

parse :: IO (Int, Int, [Int], [[Int]])
parse = do
    [m, n] <- readMany
    ks     <- readMany
    xss    <- replicateM m readMany
    return (m, n, ks, xss)

Let's try it:

*Main> parse
2 2
123 321
1 2
3 4
(2,2,[123,321],[[1,2],[3,4]])

While the code I presented is quite expressive. That is, you get work done quickly with little code, it has some bad properties. Though I think if you are still learning haskell and haven't started with parser libraries. This is the way to go.

Two bad properties of my solution:

  • All code is in IO, nothing is testable in isolation
  • The error handling is very bad, as you see the pattern matching is very aggressive in [m, n]. What happens if we have 3 elements on the first line of the input file?
Tarrasch
  • 10,199
  • 6
  • 41
  • 57
9

liftM is not magic! You would think it does some arcane thing to lift a function f into a monad but it is actually just defined as:

liftM f x = do
  y <- x
  return (f y)

We could actually use liftM to do what you wanted to, that is:

[m,n] <- liftM (map read . words . head . lines) (readFile fname)

but what you are looking for are let statements:

parseLine = map read . words

parse fname = do
  (x:y:xs) <- liftM lines (readFile fname)
  let [m,n]  = parseLine x
  let ks     = parseLine y
  let matrix = map parseLine xs
  return (m,n,ks,matrix)

As you can see we can use let to mean variable assignment rather then monadic computation. In fact let statements are you just let expressions when we desugar the do notation:

parse fname = 
   liftM lines (readFile fname) >>= (\(x:y:xs) ->
   let [m,n]  = parseLine x
       ks     = parseLine y  
       matrix = map parseLine xs
   in return matrix )
HaskellElephant
  • 9,819
  • 4
  • 38
  • 67
  • So... __let__ allows me to do "non-monad"-like stuff, in the monad-like context? – Andriy Drozdyuk Dec 03 '11 at 22:32
  • @drozzy - erm...basically, yes. You might want to check out this recent question about `let`: http://stackoverflow.com/questions/8274650/in-haskell-when-do-we-use-in-with-let – Dan Burton Dec 04 '11 at 00:10
8

A Solution Using a Parsing Library

Since you'll probably have a number of people responding with code that parses strings of Ints into [[Int]] (map (map read . words) . lines $ contents), I'll skip that and introduce one of the parsing libraries. If you were to do this task for real work you'd probably use such a library that parses ByteString (instead of String, which means your IO reads everything into a linked list of individual characters).

import System.Environment
import Control.Monad
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString as B

First, I imported the Attoparsec and bytestring libraries. You can see these libraries and their documentation on hackage and install them using the cabal tool.

main = do
    (fname:_) <- getArgs
    putStrLn fname
    parsed <- parseX fname
    print parsed

main is basically unchanged.

parseX :: FilePath -> IO (Int, Int, [Int], [[Int]])
parseX fname = do
    bs <- B.readFile fname
    let res = parseOnly parseDrozzy bs
    -- We spew the error messages right here
    either (error . show) return res

parseX (renamed from parse to avoid name collision) uses the bytestring library's readfile, which reads in the file packed, in contiguous bytes, instead of into cells of a linked list. After parsing I use a little shorthand to return the result if the parser returned Right result or print an error if the parser returned a value of Left someErrorMessage.

-- Helper functions, more basic than you might think, but lets ignore it    
sint = skipSpace >> int
int = liftM floor number

parseDrozzy :: Parser (Int, Int, [Int], [[Int]])
parseDrozzy = do
   m <- sint
   n <- sint
   skipSpace
   ks  <- manyTill sint endOfLine
   arr <- count m (count n sint)              
   return (m,n,ks,arr)

The real work then happens in parseDrozzy. We get our m and n Int values using the above helper. In most Haskell parsing libraries we must explicitly handle whitespace - so I skip the newline after n to get to our ks. ks is just all the int values before the next newline. Now we can actually use the previously specified number of rows and columns to get our array.

Technically speaking, that final bit arr <- count m (count n sint) doesn't follow your format. It will grab n ints even if it means going to the next line. We could copy Python's behavior (not verifying the number of values in a row) using count m (manyTill sint endOfLine) or we could check for each end of line more explicitly and return an error if we are short on elements.

From Lists to a Matrix

Lists of lists are not 2 dimensional arrays - the space and performance characteristics are completely different. Let's pack our list into a real matrix using Data.Array.Repa (import Data.Array.Repa). This will allow us to access the elements of the array efficiently as well as perform operations on the entire matrix, optionally spreading the work among all the available CPUs.

Repa defines the dimensions of your array using a slightly odd syntax. If your row and column lengths are in variables m and n then Z :. n :. m is much like the C declaration int arr[m][n]. For the one dimensional example, ks, we have:

fromList (Z :. (length ks)) ks

Which changes our type from [Int] to Array DIM1 Int.

For the two dimensional array we have:

let matrix = fromList (Z :. m :. n) (concat arr)

And change our type from [[Int]] to Array DIM2 Int.

So there you have it. A parsing of your file format into an efficient Haskell data structure using production-oriented libraries.

Thomas M. DuBuisson
  • 64,245
  • 7
  • 109
  • 166
  • Cool! Will this Array type allow me to query for things like 1st column or 2nd row? – Andriy Drozdyuk Dec 03 '11 at 22:30
  • It sure does. One question about that operation is [here](http://stackoverflow.com/questions/6170008/how-to-take-an-array-slice-with-repa-over-a-range). The general form, assuming you want column `i`, is `slice arr (Z :. All :. i)`. – Thomas M. DuBuisson Dec 04 '11 at 00:17
5

What about something simple like this?

parse :: String -> (Int, Int, [Int], [[Int]])
parse stuff = (m, n, ks, xss)
        where (line1:line2:rest) = lines stuff
              readMany = map read . words
              (m:n:_) = readMany line1
              ks = readMany line2
              xss = take m $ map (take n . readMany) rest

main :: IO ()
main = do
        stuff <- getContents
        let (m, n, ks, xss) = parse stuff
        print m
        print n
        print ks
        print xss 
lbolla
  • 5,387
  • 1
  • 22
  • 35
  • Are you sure this is correct? `(line1:line2:rest) = lines stuff` I thought it was supposed to be `[line1:line2:rest] = lines stuff` – Andriy Drozdyuk Dec 03 '11 at 22:37
  • Yes, I'm sure: [check this out](http://www.haskell.org/tutorial/patterns.html). Note that `[1:2:[3]] == [[1,2,3]]`, but `(1:2:[3]) == [1,2,3]`. – lbolla Dec 04 '11 at 00:00
  • @drozzy: `[foo]` is sugar for `(foo:[])`, and `[foo, bar]` is sugar for `(foo:bar:[])`. Pattern matching always uses parens, although with list sugar you don't actually see the parens until it gets desugared. – Dan Burton Dec 04 '11 at 00:12