6

I have just finished reading the book "Learn You a Haskell for Great Good!", so my question may be very naive. What I don't understand is how to call "impure" IO functions from pure code.

Here is a working example written in C#. In our business logic we plan some actions based on the weather. We do it in the usual C# manner.

interface IWeatherForecast
{
    WeatherData GetWeather(Location location, DateTime timestamp);
}

// concrete implementation reading weather from DB
class DbWeather : IWeatherForecast
{
    public WeatherData GetWeather(Location location, DateTime timestamp)
    {...}
}

class WeatherFactory
{
    public IWeatherForecast GetWeatherProvider()
    {...}
}

// Business logic independent from any DB
class MaritimeRoutePlanner
{
    private IWeatherForecast weatherProvider = new WeatherFactory().GetWeatherProvider();

    public bool ShouldAvoidLocation(Location location, DateTime timestamp)
    {
        WeatherData weather = weatherProvider.GetWeather(location, timestamp);
        if(weather.Beaufort > 8)
            return true;
        else...
            ...
    }
}

How do I implement this logic in Haskell?

In reality, the "pure logic" in MaritimeRoutePlanner calls weatherProvider.GetWeather(), which is "impure IO" stuff.

Is it possible in Haskell? How would you model this in Haskell?

HTNW
Stefan Dorn
  • 4
    It is possible but only by breaking the fundamental expectations of the language, compiler and programming model. See e.g. https://stackoverflow.com/questions/19371636/am-i-abusing-unsafeperformio – Yann Vernier Aug 29 '19 at 06:19
  • 3
    I highly recommend Mark Seemann's article series on how to get from thinking in terms of DI to structuring your program in a functional way: https://blog.ploeh.dk/2017/01/27/from-dependency-injection-to-dependency-rejection/ – Jonas Høgh Aug 29 '19 at 06:22
  • Yann, so how would you model this MaritimeRoutePlanner in Haskell in a clean way? – Stefan Dorn Aug 29 '19 at 06:22
  • Is the DB ever updated while the program is running? If it is not, does `ShouldAvoidLocation()` always return the same boolean result for the same arguments? – chi Aug 29 '19 at 07:55
  • It's updated 4 times/hour. – Stefan Dorn Aug 29 '19 at 08:07
  • 2
    If the DB changes while the program is running, `ShouldAvoidLocation()` is not a pure function, so it cannot have type `... -> Bool`. It can have type `... -> IO Bool` or some other similar monadic type that indicates the "impurity" of `ShouldAvoidLocation()`. – chi Aug 29 '19 at 11:00
  • Related: https://stackoverflow.com/questions/41522491/is-there-a-way-to-place-some-impure-code-inside-pure-functions . – atravers Nov 17 '20 at 05:56

3 Answers

12

The general question (how to call an impure function from a pure function) is a FAQ. See e.g. this question and its answers: How to return a pure value from a impure method

How you structure code in a more functional manner depends, like any other topic related to software architecture, on circumstances. What sort of program are you writing? A REST API? A smartphone app? A console program? A batch job? An add-in?

In many cases, you can get away with what I call an impure-pure-impure sandwich:

  1. Gather all required data from impure sources
  2. Call a pure function with that data
  3. Do something impure with the return value from the pure function

In Haskell you can do this because entry points are always impure. Here's a simple sketch of the weather decision problem. You start by defining the data on which you'll work. Here, I only include the beaufort value, but I assume that WeatherData would include more data than that (which is why I define it as data and not as a newtype).

data WeatherData = WeatherData { beaufort :: Int } deriving (Eq, Show)

You can now write the decision logic as a pure function:

shouldAvoidLocation :: WeatherData -> Bool
shouldAvoidLocation weather = beaufort weather > 8

Loading data is an entirely concrete operation:

-- LocalTime comes from Data.Time; Location is assumed to be defined elsewhere.
readWeatherFromDb :: Location -> LocalTime -> IO WeatherData
readWeatherFromDb location timestamp = undefined -- implementation goes here...

There's no explicit abstraction in place here. This function reads data and returns impure data. That could be the first (impure) step in the impure-pure-impure sandwich.

The entry point of the application can now be structured according to that architecture:

main :: IO ()
main = do
  w <- readWeatherFromDb Berlin $ LocalTime (fromGregorian 2019 8 29) (TimeOfDay 8 55 8)
  if shouldAvoidLocation w
    then putStrLn "Avoid"
    else putStrLn "Go"

The call to shouldAvoidLocation is the good, pure stuff in the middle of the sandwich, which is then followed by impure putStrLn calls.

Mark Seemann
  • The problem here is that we cannot gather all data in advance. shouldAvoidLocation is buried deep in the logic, and it may not be called at all: for example, if the harbour rejects all vessels, we don't even need weather data. That's why we need data on demand inside the logic: the business logic dictates which data we need in each particular case. – Stefan Dorn Aug 29 '19 at 07:48
  • 1
    @StefanDorn First things first: there are ways to model [pure interactions](https://blog.ploeh.dk/2017/07/10/pure-interactions) if, for whatever reason, an impure-pure-impure sandwich isn't possible. – Mark Seemann Aug 29 '19 at 08:13
  • @StefanDorn This is, however, a case where thinking explicitly about architecture could help. I don't know the details of the software you'd like to develop, but I often hear that sort of counter-argument. It often turns out, however, that either the data is small or gets cached anyway, in which case it might not be a problem fetching it even if it isn't always used. After all, what problem are you trying to solve by sticking strictly to data-on-demand? – Mark Seemann Aug 29 '19 at 08:18
  • I would say it like this - algorithm for the business logic depends on data from IO (in this case DB). if(get IO a == something) then if(get IO b == something) then if(get IO c == something)... and it's nested many, many levels. Each step depends on some data from DB. – Stefan Dorn Aug 29 '19 at 10:42
  • @StefanDorn I understand that, but it's often the case that it might *still* be more efficient to just fetch all of the data in bulk. Again, I may be wrong on this particular occasion. I've never done any work with meteorological data, but I suppose that the data sets could be truly prohibitively big. In that case, I'd reach for a free monad as outlined in the above link. – Mark Seemann Aug 29 '19 at 10:48
  • @StefanDorn I do, however, fairly often run into questions like this, and once we start analysing how large the involved data sets are, it turns out that fetching everything up front (even the stuff you don't need) might be all of 1 MB. This should be contrasted with the size of average web pages or image sizes on said web pages. I don't know if your context is like that, but I often meet engineers who needlessly complicate their architecture to save fetching 100 kB of data... – Mark Seemann Aug 29 '19 at 10:52
  • @StefanDorn My overall point is this: the *impure-pure-impure sandwich* architecture is much simpler than free monads, so if there's any sort of trick I can pull to enable that architecture, I'll do that. If I truly can't, I'll go with free monads, but my experience is that I rarely need to do that. – Mark Seemann Aug 29 '19 at 10:56
  • thank you very much for interesting point! I will take a look at your link! – Stefan Dorn Aug 29 '19 at 10:57
5

In short, you don't pull data from an impure "function" (aka action); you push your pure function into a new action.

data WeatherData = WeatherData { beaufort :: Int, ... }

-- getWeather is a pure function
-- getWeather someLocation someDate is an action
getWeather :: Location -> DateTime -> IO WeatherData
getWeather l d = ...


-- badWeather is a pure function
badWeather :: WeatherData -> Bool
badWeather wd = beaufort wd > 8

-- avoidLocation is a pure function
-- avoidLocation someLocation someDate is an action
-- We can simply use fmap to lift (or wrap) the pure function badWeather
-- into a new action.
avoidLocation :: Location -> DateTime -> IO Bool
avoidLocation l d = fmap badWeather (getWeather l d)

avoidLocation doesn't actually produce a Boolean value; it creates an action that, when finally executed, uses badWeather to produce a Boolean value.
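Because avoidLocation is itself an action, logic that needs data on demand can sequence such actions in do notation, and an action that is never reached is never run. The following is only a minimal sketch under assumptions: Location and DateTime are stand-in type aliases, getWeather is a stub that prints instead of querying a database, and harbourAcceptsVessels, canCallAt and demo are hypothetical names introduced for illustration.

```haskell
-- Hypothetical stand-ins for the types left open above.
type Location = String
type DateTime = String

newtype WeatherData = WeatherData { beaufort :: Int }

-- Stub for the database read (assumption): it announces itself so that
-- we can see when it actually runs.
getWeather :: Location -> DateTime -> IO WeatherData
getWeather l _ = do
  putStrLn ("Reading weather for " ++ l)
  return (WeatherData 5)

badWeather :: WeatherData -> Bool
badWeather wd = beaufort wd > 8

avoidLocation :: Location -> DateTime -> IO Bool
avoidLocation l d = fmap badWeather (getWeather l d)

-- Hypothetical earlier check: some harbours reject all vessels.
harbourAcceptsVessels :: Location -> IO Bool
harbourAcceptsVessels l = return (l /= "ClosedHarbour")

-- The weather is fetched only if the harbour check passes.
canCallAt :: Location -> DateTime -> IO Bool
canCallAt l d = do
  open <- harbourAcceptsVessels l
  if not open
    then return False
    else fmap not (avoidLocation l d)

demo :: IO ()
demo = do
  canCallAt "ClosedHarbour" "2019-08-29T08:55" >>= print
  canCallAt "Hamburg" "2019-08-29T08:55" >>= print
```

Running demo prints False for the closed harbour without any "Reading weather" line, then reads the weather for Hamburg and prints True. The decision itself (badWeather) stays pure; only the sequencing lives in IO.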

chepner
  • 1
    The abstraction in the original code can also be included easily by way of a type alias: `type WeatherSource = Location -> DateTime -> IO WeatherData`. Then `getWeatherProvider` may be an `IO WeatherSource`, or better, a pure function such as `Options -> WeatherSource`, and it may return `dbWeather` or another function. When you would add a method to `IWeatherForecast`, you can make `WeatherSource` a `data` type and add a field to it. – Jon Purdy Aug 30 '19 at 17:27
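The type-alias idea from the comment above could look roughly like this. It is a sketch under assumptions: the Options constructors, fixedWeather, the stubbed dbWeather body, and the stand-in Location and DateTime aliases are all hypothetical, introduced only for illustration.

```haskell
-- Hypothetical stand-ins for the types left open above.
type Location = String
type DateTime = String

newtype WeatherData = WeatherData { beaufort :: Int }

-- The "interface": any function of this shape is a weather source.
type WeatherSource = Location -> DateTime -> IO WeatherData

-- Hypothetical configuration that selects a source.
data Options = UseDb | UseFixed Int

-- Stub for the DB-backed source (assumption: a real one would query the DB).
dbWeather :: WeatherSource
dbWeather location _ = do
  putStrLn ("Reading weather for " ++ location ++ " from the DB")
  return (WeatherData 5)

-- A source that always reports the same wind force, handy for tests.
fixedWeather :: Int -> WeatherSource
fixedWeather force _ _ = return (WeatherData force)

-- The "factory" is a pure function from options to a source.
getWeatherProvider :: Options -> WeatherSource
getWeatherProvider UseDb            = dbWeather
getWeatherProvider (UseFixed force) = fixedWeather force

-- The logic receives its source as an ordinary argument.
avoidLocation :: WeatherSource -> Location -> DateTime -> IO Bool
avoidLocation source l d = fmap ((> 8) . beaufort) (source l d)
```

For example, avoidLocation (getWeatherProvider (UseFixed 9)) "Hamburg" "now" is an action that yields True when run, without touching any database.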
4

If the intertwining between effects and pure logic is too complex for a "sandwich"-based solution, one option is to parameterize your dependencies with the monad in which their effects take place, and then make your logic polymorphic over all monads.

For example, here's an approximate translation of your code:

{-# LANGUAGE ExplicitForAll #-}

data WeatherData = WeatherData -- dummy type
data Location = Location       -- dummy type
data DateTime = DateTime       -- dummy type

newtype WeatherForecast m = 
    WeatherForecast { getWeather :: Location -> DateTime -> m WeatherData }

-- simply a monadic action that creates a forecast
type WeatherFactory m = m (WeatherForecast m)

-- A concrete factory that works in the IO monad
aWeatherFactory :: WeatherFactory IO
aWeatherFactory = 
    do putStrLn "I'm effectfully allocating a WeatherForecast!"
       return 
            WeatherForecast {
                getWeather = \_ _ -> 
                    do putStrLn "I'm connecting to the Internet!"
                       return WeatherData
            }

newtype MaritimeRoutePlanner m =
    MaritimeRoutePlanner { shouldAvoidLocation :: m Bool }

-- The logic only knows that m is a monad and nothing more. 
makeMaritimeRoutePlanner :: forall m. Monad m 
                         => WeatherFactory m -> MaritimeRoutePlanner m
makeMaritimeRoutePlanner forecastFactory =
    MaritimeRoutePlanner {
        shouldAvoidLocation =
            do forecast <- forecastFactory
               WeatherData <- getWeather forecast Location DateTime
               return False
    }

Both WeatherForecast and WeatherFactory have a type parameter for the monad in which their methods have effects. In particular, aWeatherFactory is a WeatherFactory that works over IO.

But notice the forall in the signature of makeMaritimeRoutePlanner. It forces the logic to work over all possible monads, which means it can't use functionality particular to any concrete monad.

An example of use:

*Main> let planner = makeMaritimeRoutePlanner aWeatherFactory
*Main> shouldAvoidLocation planner
I'm effectfully allocating a WeatherForecast!
I'm connecting to the Internet!
False
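One consequence of the polymorphism is that the same logic can run in a monad other than IO, for example in tests. Here is a minimal sketch, assuming a variant of the types above in which WeatherData carries a beaufort field and the logic is written as a plain function (shouldAvoid, stubFactory and avoidsAt are hypothetical names, not part of the code above). The stub factory uses Identity from base.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Assumed variants of the dummy types above.
newtype WeatherData = WeatherData { beaufort :: Int }
data Location = Location
data DateTime = DateTime

newtype WeatherForecast m =
    WeatherForecast { getWeather :: Location -> DateTime -> m WeatherData }

type WeatherFactory m = m (WeatherForecast m)

-- The logic knows only that m is a monad.
shouldAvoid :: Monad m => WeatherFactory m -> Location -> DateTime -> m Bool
shouldAvoid forecastFactory location time = do
    forecast <- forecastFactory
    weather <- getWeather forecast location time
    return (beaufort weather > 8)

-- A stub factory with no effects: it always reports the given wind force.
stubFactory :: Int -> WeatherFactory Identity
stubFactory force =
    return WeatherForecast { getWeather = \_ _ -> return (WeatherData force) }

-- Running in Identity yields a plain Bool; no IO is involved.
avoidsAt :: Int -> Bool
avoidsAt force = runIdentity (shouldAvoid (stubFactory force) Location DateTime)
```

With these definitions, avoidsAt 9 evaluates to True and avoidsAt 3 to False.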

Passing your effectful dependencies as parameters (or as the environment of a Reader monad) is relatively common. The further trick of making the logic polymorphic over the monad is less popular, I think. Ultimately, living in IO might be too convenient to forgo, or at least not problematic enough to bother putting the "veil of polymorphism" in place.

(Of course, there are other possible solutions like free/freer monads and the like.)

danidiaz
  • 2
    There's also an `mtl`-style option where you offer `class Monad m => WeatherMonad m where ...` that offers the capacity to get weather data but not to perform arbitrary `IO`. This approach is particularly useful for mocking. – dfeuer Aug 29 '19 at 19:14
  • @dfeuer Yeah, I get the impression that the record-of-functions-in-IO, free/freer and MTL-like approaches are more used in practice than this polymorphic trick. I do like it though, at least conceptually, because it feels like a somewhat "natural" evolution of the record-of-functions-in-IO approach. – danidiaz Aug 29 '19 at 20:13
  • To my mind, the "factory" approach only makes sense if there can be two *different* monads: `m (WeatherForecast n)`, separating the effect of connecting to a weather server from the event of retrieving forecasts. – dfeuer Aug 29 '19 at 20:20
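The `mtl`-style option mentioned in the comments above might be sketched as follows. The class name WeatherMonad comes from the comment; the method name, the stand-in types, and both instances are assumptions made for illustration only.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Hypothetical stand-ins for the real types.
type Location = String
type DateTime = String

newtype WeatherData = WeatherData { beaufort :: Int }

-- Capability class: code constrained by it can ask for weather data
-- but cannot perform arbitrary IO.
class Monad m => WeatherMonad m where
    getWeather :: Location -> DateTime -> m WeatherData

-- Stubbed IO instance (assumption: a real one would query the DB).
instance WeatherMonad IO where
    getWeather location _ = do
        putStrLn ("Reading weather for " ++ location)
        return (WeatherData 5)

-- Mock instance for tests: always reports a storm.
instance WeatherMonad Identity where
    getWeather _ _ = return (WeatherData 10)

-- The logic is written once against the class.
shouldAvoidLocation :: WeatherMonad m => Location -> DateTime -> m Bool
shouldAvoidLocation location time = do
    weather <- getWeather location time
    return (beaufort weather > 8)
```

In a test, runIdentity (shouldAvoidLocation "Hamburg" "now") evaluates to True with the mock instance, while the same function used at type IO Bool goes through the stubbed IO instance and yields False.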