Get Column in Haskell CSV and infer the column type

Question

I'm exploring a CSV file in an interactive GHCi session (in a Jupyter notebook):

import Text.CSV
import Data.List
import Data.Maybe

-- parseCSVFromFile returns Either ParseError CSV, so unwrap the Right case
Right dat <- parseCSVFromFile "/home/user/data.csv"
headers = head dat
records = tail dat

-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index

indexRow records 1
-- this works! 

-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (\x -> x !! index) records

This works if I know the type of column 3 in advance:

map (\x -> read x :: Double) $ indexField records 3

How can I ask read to infer the type when, for example, a column could contain strings or numbers? I'd like it to try for me, but:

map read $ indexField records 3

fails with

Prelude.read: no parse

I don't care whether the values are strings or numbers. I just need them all to be the same type, and I can't find a way to express that generally, at least not with the read function.

Weirdly, if I define a mean function like so:

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum xs / fromIntegral (length xs))

This works:

mean $ map read $ indexField records 2
Just 13.501359655240003

But without the mean, this still fails:

map read $ indexField records 2
Prelude.read: no parse

Try `(map read $ indexField records 2) :: (Read a, Fractional a) => [a]`. Moreover, if your columns could contain strings or numbers, you cannot put them in one list, since all elements of a list must have the same type. — assembly.jc, Dec 28 '18 at 10:21
@assembly.jc, I expect data to be consistent within a column, so there won't be mixed types, but I do expect each column may differ from the others: column 1 is integers, column 2 is doubles, column 3 is strings, etc. (Or if there is a mix within a column, you'd have to infer a list of strings.) — Mittenchops, Dec 28 '18 at 10:35
Even though the data type is consistent within a column, you still need to specify the type for `read`, or let it be inferred from context. For example, in `sum $ map read $ indexField records 2` the compiler knows you want a numeric type and tries to pick one for you; in this example it will try `Integer`. — assembly.jc, Dec 28 '18 at 11:22

score 2 · Accepted Answer · answered Dec 28 '18 at 10:37

Unfortunately, read is at the end of its wits when it comes to situations like this. Let's revisit read:

read :: Read a => String -> a

As you can see, a doesn't depend on the input, only on the output, and therefore on the context in which read is used. If you use read a + read b, the additional Num constraint limits the type to Integer or Double due to the defaulting rules. Let's see it in action:

> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a

Ok, a is still not helpful. Is there any type that we can read without additional context? Sure, unit, which is what GHCi's extended defaulting rules fall back to when nothing else constrains the type:

> read "()"
()
it :: Read a => a

That's still not helpful at all, so let's enable the monomorphism restriction:

> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer

Aha. In the end, we had an Integer. Due to +, we had to decide on a type. Now, with the MonomorphismRestriction enabled, what happens on read "1234" without additional context?

> read "1234"
<interactive>:20:1
   No instance for (Read a0) arising from a use of 'read'
   The type variable 'a0' is ambiguous

Now GHCi doesn't pick any (default) type and forces you to choose one, which makes the underlying error much clearer.
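To make the point about context concrete, here is a minimal sketch in which the same string is read at two different types. The names asInteger, asDouble and total are made up for illustration; the only thing that changes between them is the annotation.

```haskell
-- The same input string parsed at two different result types.
-- The annotation, not the string, decides which Read instance is used.
asInteger :: Integer
asInteger = read "1234"

asDouble :: Double
asDouble = read "1234"

-- An inline annotation works the same way inside a larger expression.
total :: Double
total = sum (map read ["1.5", "2.5"] :: [Double])
```

Without any such annotation (or a use site that fixes the type), the compiler has nothing to go on, which is the situation map read runs into.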

So how do we fix this? As CSV can contain arbitrary fields at run-time and all types are determined statically, we have to cheat by introducing something like

data CSVField = CSVString String | CSVNumber Double | CSVUnknown

and then write

parse :: Field -> CSVField

After all, our type needs to cover all possible fields.
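One possible way to fill in parse is sketched below. The classification rules are my assumption, not something the CSV format dictates: an empty field becomes CSVUnknown, anything readMaybe accepts as a Double becomes CSVNumber, and everything else is kept as CSVString. Field is redeclared as a String synonym (which is how Text.CSV defines it) only so the sketch compiles on its own; drop that line when importing Text.CSV.

```haskell
import Text.Read (readMaybe)

-- Stand-in for Text.CSV's Field, which is a synonym for String.
type Field = String

data CSVField = CSVString String | CSVNumber Double | CSVUnknown
  deriving (Show, Eq)

-- Assumed rules: empty -> CSVUnknown, parses as Double -> CSVNumber,
-- anything else -> CSVString.
parse :: Field -> CSVField
parse field
  | null field = CSVUnknown
  | otherwise  = maybe (CSVString field) CSVNumber (readMaybe field)
```

With this, map parse (indexField records 3) gives a [CSVField] for any column, and you pattern match on the constructors to find out what you got.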

However, in your case, we can just restrict read's type:

myRead :: String -> Double
myRead = read

But that's not wise, as we can still end up with errors if the column doesn't contain Doubles to begin with. So instead, let's use readMaybe and mapM:

import Text.Read (readMaybe)

columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

That way, the type is fixed, and we're forced to check whether we have Just something or Nothing:

mean <$> columnAsNumbers (indexField records 2)
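Note that the mean from the question already returns a Maybe, so mapping it over a Maybe [Double] yields a nested Maybe (Maybe Double). A small sketch of how the two could be chained with >>= instead follows. columnMean is a hypothetical helper name, and Field is again a local stand-in for the Text.CSV synonym.

```haskell
import Text.Read (readMaybe)

-- Stand-in for Text.CSV's Field, which is a synonym for String.
type Field = String

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- Nothing as soon as a single field fails to parse as a Double.
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

-- Hypothetical helper: >>= flattens the two layers of Maybe, so the
-- result is Nothing for a non-numeric column and for an empty one.
columnMean :: [Field] -> Maybe Double
columnMean column = columnAsNumbers column >>= mean
```

For example, columnMean ["1.0", "2.0", "3.0"] is Just 2.0, while columnMean ["1.0", "x"] is Nothing.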

If you find yourself using columnAsNumbers often, you can define an operator for it:

(!!$) :: [[Field]] -> Int -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexField records index
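If you really want the column type to be "guessed", the same mapM readMaybe trick can be tried at several types in turn. The following is only a sketch under my own assumptions: the Column type and inferColumn are hypothetical names, the order of attempts (Integer, then Double, then plain strings) is a design choice, and Field is a local stand-in for the Text.CSV synonym.

```haskell
import Text.Read (readMaybe)

-- Stand-in for Text.CSV's Field, which is a synonym for String.
type Field = String

-- Hypothetical sum type: one constructor per column type we try.
data Column
  = IntColumn [Integer]
  | DoubleColumn [Double]
  | StringColumn [String]
  deriving (Show, Eq)

-- Try the most specific type first and fall back to raw strings.
-- The constructors fix the element type of each readMaybe attempt.
inferColumn :: [Field] -> Column
inferColumn fields =
  case (mapM readMaybe fields, mapM readMaybe fields) of
    (Just ints, _)          -> IntColumn ints
    (Nothing, Just doubles) -> DoubleColumn doubles
    (Nothing, Nothing)      -> StringColumn fields
```

The type is still decided statically: the caller receives a Column and has to pattern match on it. An empty column comes back as IntColumn [] under these rules, because mapM readMaybe succeeds trivially on an empty list.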
