Breadth First Traversal in Haskell

Question

I need to find all the paths in a tree-like structure.

I recently defined Deep First traversal (or iterate) as follows:

dfIterate:: (a -> [a]) -> a -> [[a]]
dfIterate f a = map (a:) ([] : concatMap (dfIterate f) (f a))

It takes a seed a and a function a -> [a] (from each "a", you can get to multiple a's). The result is a list of paths starting from the seed. It works well:

ghci> let f a = if a == 1 then [2, 3] else if a == 2 then [4] else []
ghci> dfIterate f 1
[[1],[1,2],[1,2,4],[1,3]]

My function dfIterate iterates correctly and shows me all the paths. The function f simulates the tree above.

But how to make a "Breadth First Traversal"? The result in this case should be: [[1],[1,2],[1,3],[1,2,4]]. My first attempt:

bfIterate :: (a -> [a]) -> [a] -> [[a]]
bfIterate _ [] = [[]]
bfIterate f (a:as) = map (a:) (bfIterate f (as ++ (f a)))

I tried to use the second argument as a queue. I know I'm quite far from the result... Thanks

EDIT: This link gives interesting lines of approach: Breadth-First Search using State monad in Haskell. However, my question is about finding all paths (i.e. [[a]]), while that question is for finding single solutions (i.e. [a])

Does this answer your question? [Breadth-First Search using State monad in Haskell](https://stackoverflow.com/questions/28573287/breadth-first-search-using-state-monad-in-haskell) — alias, Nov 04 '21 at 12:00
@alias, thanks, your link gives interesting lines of approach. However, my question is about finding all paths (i.e. `[[a]]`), while that question is for finding single solutions (i.e. `[a]`). — cdupont, Nov 04 '21 at 12:20
I see no searching here, I see a traversal of an entire tree. — molbdnilo, Nov 04 '21 at 12:29

Will Ness · Accepted Answer · 2021-11-10T08:50:50.227

1

Correctness first, efficiency later.

bfPaths :: (a -> [a]) -> a -> [[a]]
bfPaths step seed  =  go [ (seed, [seed]) ]
 where
 go []  =  []
 go ((s, path) : q) = path :
   go (q ++ [ (x, path ++ [x]) | x <- step s ])

indeed maintaining a queue and adding the nodes to it, level by level.

go's argument is a list, used as a "first in first out" queue. It contains pairs of (node_value, that_node's_path). The initial node value is seed, and its path is thus [seed].

At each step, the queue is deconstructed into it head pair (s, path) and the rest of the queue, q. Then that path is returned, followed by the result of processing the rest of the q, with the next pairs produced by step function from this seed s, appended after the q: (q ++ [....]).

Appending at the end of the list at each step produces a queue, while prepending would have produced a "last in first out" stack.

This list is thus used as a "to-do" list, an "agenda", or a frontier of exploration of this implicit graph. With queue the exploration is breadth-first; with stack it is depth-first.

Trying it:

> bfPaths f 1
[[1],[1,2],[1,3],[1,2,4]]

Now you can look for ways to make it more efficient. In particular, repeated appending of singleton lists to build a result list leads to a quadratic behavior. The execution time will grow as the square of the input size.

The simplest improvement is to build the paths in reverse and either return them reversed (or reverse only on final processing, if you must), thus avoiding the quadratic slowdown:

bfPathsR :: (a -> [a]) -> a -> [[a]]
bfPathsR step seed  =  go [ (seed, [seed]) ]
 where
 go []  =  []
 go ((s, path) : q) = path :
   go (q ++ [ (x, [x] ++ path) | x <- step s ])

Looks like it's the best, too, since it allows for maximum sharing of structure while the paths are being built (in reverse) and of course adding new value at the front is O(1), so on the whole it will be linear (execution time growing as the size of the input).

If you want to only output the completed paths, i.e. such that have no continuation, you just add this condition to the code:

bfPathsRC :: (a -> [a]) -> a -> [[a]]
bfPathsRC step seed  =  go [ (seed, [seed]) ]
 where
 go []  =  []
 go ((s, path) : q) = [path | null next] ++
   go (q ++ [ (x, [x] ++ path) | x <- next ])
  where
  next = step s

-- bfPathsRC f 1  =  [[3,1],[4,2,1]]

edited Nov 10 '21 at 08:50

answered Nov 04 '21 at 13:33

Will Ness

70,110
9
98
181

the simplest improvement is to build paths in reverse and either return them reversed, or reverse only on output, if you must. another approach is to fuse in the `++` like in `bflist` [here](https://stackoverflow.com/a/60561480/849891). (but that refers to the `++` in the queue building, not paths, unfortunately). – Will Ness Nov 04 '21 at 14:25
Speaking of simple, I would really like to see the tree made explicit, separating the tree formation from the traversal. Reversing lists is fine, but I wonder how performance compares to using scheduled banker's queue or something like that. I would like to see more appends fused. – dfeuer Nov 06 '21 at 05:51
@dfeuer the problem is that `[a]` paths can not share structure (can't have shared tails). so the final reversing of each path is unavoidable. and building them in reverse does not require a queue. moreover, maintaining paths in reverse allows for maximal sharing of structure between them (shared prefixes, while still reversed). a win-win. thanks for the feedback, I've edited the answer. – Will Ness Nov 06 '21 at 06:35
I certainly agree that there needs to be some restructuring of the queue at the end. But list reversal is a bit harsh: it introduces a pause linear in the list length and it smashes the cache. So I think it's worth comparing it to something a little gentler. I'm not saying a fancy queue will turn out to be better, but it's not *obviously* worse. – dfeuer Nov 06 '21 at 06:41
@dfeuer there are two appending pipelines going on here -- in the paths and in the queue. are you talking about the latter? I've addressed it in the first comment. in regards this queue there;s no reversing. so I don't see exactly what you're concerned with. :) if you're concerned about the final reversings of the paths, it can only be avoided by changing their representation to "open lists" `([a], Int)`, but then we can only share structure for paths along the same path to the leaf. so it'll come at the expense of more space usage. maybe it's worth it.... – Will Ness Nov 06 '21 at 06:45
The final reversings can be made gentler with an altogether different queue representation. – dfeuer Nov 06 '21 at 06:48
@dfeuer ah I see what you mean. you want it amortized along the building process. – Will Ness Nov 06 '21 at 06:51
I also wonder if the `q++` can be removed by passing in the tail or something. – dfeuer Nov 06 '21 at 06:52
More *deamortized*, but yeah. – dfeuer Nov 06 '21 at 06:53
@dfeuer I address this (`q++`) in the top comment. and perhaps it's not that critical an issue anyway. – Will Ness Nov 06 '21 at 06:54
@dfeuer so, "spreading it out, intertwining with the building" but that will come at a cost of more space usage, less sharing. it might have its impact on performance too... – Will Ness Nov 06 '21 at 06:56
@dfeuer I don't think any tricks with `[]` can help there. if we have `(front, rev_back)`, there's still the final `(front ++ reverse rev_back)` which retraces the `front` as well as `rev_back`. even switching to something like `Seq`, converting such sequence to a list in the end will be again linear, so that long pause comes back again. so I don't see any way around this atm if the paths are to be `[]`. keeping the paths as `Seq`s _in the function's result_ might be the only option. so, not `[[a]]` but `[Seq a]`. – Will Ness Nov 06 '21 at 07:15
Hi @WillNess, I implemented this in my code and it works well. Thank you so much. In order to understand your algo, I "ran" it by hand. However it's still very difficult to understand. So, if you have time, it would be nice if you can explain it more... E.g. what are in `s`, `path` and `q` in `go ((s, path) : q)`? How the algorithm moves in the graph? – cdupont Nov 09 '21 at 21:25
Regarding performance, from your comments it's not clear to me if this version is best? – cdupont Nov 09 '21 at 21:30
I also wonder if there if a version that outputs only "completed" paths, e.g. `[[1,3],[1,2,4]]` – cdupont Nov 09 '21 at 21:33
don't have time atm, will address your comments later. but with complete paths only you run the risk of unproductive looping when there's a loop in the implicit graph. with the partial paths included as well, at least the looping ill be productive, i.e. will produce more and more partial paths. in real graphs you'd employ graph node identities, not just values contained in them, and so could detect the loops in the graph and cut them off. and *you've specifically asked for the partial paths in the question.* /later/ – Will Ness Nov 10 '21 at 04:56

Breadth First Traversal in Haskell

1 Answers1