
I am trying to write a module which returns the external IP address of my computer. Using the get function from Network.Wreq and then applying a lens to obtain responseBody, I end up with a value of type Data.ByteString.Lazy.Internal.ByteString. Since I want to strip the trailing "\n" from the response body, I want to run a regular expression over it. Problem: the regex library does not seem to accept that very specific ByteString type, and I found no way to convert it to a String.

Here is my feeble attempt so far (it does not compile).

{-# LANGUAGE OverloadedStrings #-}

module ExtIp (getExtIp) where
import Network.Wreq
import Control.Lens
import Data.BytesString.Lazy
import Text.Regex.Posix

getExtIp :: IO String
getExtIp = do
    r <- get "http://myexternalip.com/raw"
    let body = r ^. responseBody
    let addr = body =~ "[^\n]*\n"
    return (addr)

So my question is obviously: how do I convert that funny special ByteString to a String? An explanation of how I can approach such a problem myself is also appreciated. I tried to use unpack and toString but have no idea what to import to get those functions, if they exist.

Being a very sporadic Haskell user, I also wonder if someone could show me the idiomatic Haskell way of defining such a function. After all, the version I show here does not account for possible runtime errors or exceptions.

BitTickler
  • I think it is overkill to use regex in this case. If you go for a version that still uses `ByteString`, you can easily implement something like `trim` with `Data.Char.isSpace`, `dropWhile` and `reverse`. – epsilonhalbe Jun 01 '16 at 14:54
  • to get an idea where you find functions - use goo… uhm no - [hoogle](https://www.haskell.org/hoogle/?hoogle=isspace) – epsilonhalbe Jun 01 '16 at 14:58
  • @epsilonhalbe So in my case, given I already imported ``Data.ByteString.Lazy`` it would be ``let body = Char8.unpack (r ^. responseBody)``? For me it yields: extip.hs:6:1: error: Failed to load interface for `Data.BytesString.Lazy' Perhaps you meant Data.ByteString.Lazy (from bytestring-0.10.8.1) Data.ByteString.Lens (from lens-4.14) Data.ByteString.Char8 (from bytestring-0.10.8.1) Use -v to see a list of the files searched for. Failed, modules loaded: none. – BitTickler Jun 01 '16 at 15:00
  • typo in import statement ("BytesString" -> "ByteString") - starts to look a bit better by now. – BitTickler Jun 01 '16 at 15:12
  • What a mess lol. Now I get: ``ghci> :t (unpack (r ^. responseBody))`` -> ``(unpack (r ^. responseBody)) :: [GHC.Word.Word8]`` Still not a String. – BitTickler Jun 01 '16 at 15:20
  • `ByteString` is made of bytes, not characters! There's some module in the package that helps fake it a bit, but I think only for strict bytestrings. – dfeuer Jun 01 '16 at 15:39
  • terrible situation on the string front. module system so needed here.. – nicolas Nov 24 '16 at 12:01
  • @nicolas One would think a string specific type class could sort out the mess but no one ever bothered to write one. – BitTickler Nov 25 '16 at 11:15

2 Answers


Short answer: Use unpack from Data.ByteString.Lazy.Char8
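As a minimal sketch of the short answer: the helper name `bodyToString` and the sample value are hypothetical, and `sampleBody` stands in for `r ^. responseBody`. Note that `unpack` from `Data.ByteString.Lazy.Char8` simply maps each byte to the `Char` with the same code point, so it is only appropriate for ASCII or Latin-1 data such as an IP address.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Hypothetical helper: convert a lazy ByteString body to a String,
-- keeping only the text before the first newline.
bodyToString :: BLC.ByteString -> String
bodyToString = BLC.unpack . BLC.takeWhile (/= '\n')

-- Stand-in for the response body (a documentation-range address).
sampleBody :: BLC.ByteString
sampleBody = BLC.pack "203.0.113.7\n"
```

With the question's code, this would presumably be used as `return (bodyToString (r ^. responseBody))`, which removes the need for a regex.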

Longer answer:

In general when you want to convert a ByteString (of any variety) to a String or Text you have to specify an encoding - e.g. UTF-8 or Latin1, etc.

When retrieving an HTML page, the encoding you are supposed to use may appear in the Content-Type header or in the response body itself as a <meta ...> tag.

Alternatively you can just guess at what the encoding of the body is.

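In your case I presume you are accessing a site like http://whatsmyip.org and you only need to parse out your IP address. So without examining the headers or looking through the HTML, a safe encoding to use would be Latin1: every byte maps to exactly one character, so decoding cannot fail, and the digits and dots of an IP address are plain ASCII anyway.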

To convert ByteStrings to Text via an encoding, have a look at the functions in Data.Text.Encoding, for instance the decodeLatin1 function.
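A rough sketch of that route, assuming the body is a lazy ByteString as returned by wreq. The names `decodeBody`, `bodyAsString` and `sampleBody` are made up for illustration. `decodeLatin1` from `Data.Text.Encoding` works on strict ByteStrings, so the lazy value is converted with `toStrict` first.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- Hypothetical helper: decode a lazy ByteString as Latin-1 and strip
-- leading and trailing whitespace (including the trailing newline).
decodeBody :: BL.ByteString -> T.Text
decodeBody = T.strip . decodeLatin1 . BL.toStrict

-- Only needed if a String is required at the interface boundary.
bodyAsString :: BL.ByteString -> String
bodyAsString = T.unpack . decodeBody

-- Stand-in for `r ^. responseBody`.
sampleBody :: BL.ByteString
sampleBody = BLC.pack "203.0.113.7\n"
```

Keeping the result as `Text` and converting to `String` only at the very end avoids most of the conversion friction described in the question.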

ErikR
  • Given: ``let b1 = r ^. responseBody``, ``decodeLatin1 b1`` yields: :119:14: error: * Couldn't match expected type `Data.ByteString.ByteString' with actual type `Data.ByteString.Lazy.ByteString' NB: `Data.ByteString.Lazy.ByteString' is defined in `Data.ByteString.Lazy.Internal' `Data.ByteString.ByteString' is defined in `Data.ByteString.Internal' * In the first argument of `decodeLatin1', namely `b1' In the expression: decodeLatin1 b1 In an equation for `it': it = decodeLatin1 b1 – BitTickler Jun 01 '16 at 16:10
  • It's not easy to see from the docs, but decodeLatin1 requires a _strict_ ByteString. Use `toStrict` to convert the lazy string first. – ErikR Jun 01 '16 at 16:26

I simply do not understand why you insist on using Strings when you already have a ByteString at hand, which is the faster and more efficient representation. Importing a regex library gives you almost no benefit here; for parsing an IP address I would use attoparsec, which works great with ByteStrings.

Here is a version that does not use regex but returns a String. Note that I did not compile it, as I have no Haskell setup where I am right now.

{-# LANGUAGE OverloadedStrings #-}

module ExtIp (getExtIp) where
import Network.Wreq
import Control.Lens
import qualified Data.ByteString.Lazy.Char8 as Char8
import Data.Char (isSpace)

getExtIp :: IO String
getExtIp = do
    r <- get "http://myexternalip.com/raw"
    return $ Char8.unpack $ trim (r ^. responseBody)
  where trim = Char8.reverse . (Char8.dropWhile isSpace) . Char8.reverse . (Char8.dropWhile isSpace)
epsilonhalbe
  • What I want is to return a canonical type which does not produce extra work (such as this SO question here). Had Wreq returned String I would not have the trouble I am facing. If the language has no canonical string type, whenever libraries are used, such conversion problems arise. Call me old fashioned...but I prefer to stick to one canonical type at interface boundaries. – BitTickler Jun 01 '16 at 15:58
  • extip.hs:14:41: error: * Couldn't match type `GHC.Word.Word8' with `Char' Expected type: GHC.Word.Word8 -> Bool Actual type: Char -> Bool * In the first argument of `B.dropWhile', namely `isSpace' In the first argument of `(.)', namely `(B.dropWhile isSpace)' In the second argument of `(.)', namely `(B.dropWhile isSpace) . Char8.reverse . (B.dropWhile isSpace)' ... – BitTickler Jun 01 '16 at 16:24
  • I think I fixed it - have no ghc at hand (as I already said). – epsilonhalbe Jun 01 '16 at 16:31