I will use three separate tricks to solve your question.
- Trick 1: Use the
pipes
library to stream file names concurrent with traversing the tree.
- Trick 2: Use the
StateT (Seq FilePath)
transformer to achieve a breadth-first traversal.
- Trick 3: Use the
MaybeT
transformer to avoid manual recursion when writing the loop and exit.
The following code combines these three tricks in one monad transformer stack.
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Maybe
import Control.Monad.State.Lazy
import Control.Pipe
import Data.Sequence
import System.FilePath.Posix
import System.Directory
loop :: (Monad m) => MaybeT m a -> m ()
loop = liftM (maybe () id) . runMaybeT . forever
quit :: (Monad m) => MaybeT m a
quit = mzero
getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
= fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path
permissible :: FilePath -> IO Bool
permissible file
= fmap (\p -> readable p && searchable p) $ getPermissions file
traverseTree :: FilePath -> Producer FilePath IO ()
traverseTree path = (`evalStateT` empty) $ loop $ do
-- All code past this point uses the following monad transformer stack:
-- MaybeT (StateT (Seq FilePath) (Producer FilePath IO)) ()
let liftState = lift
liftPipe = lift . lift
liftIO = lift . lift . lift
liftState $ modify (|> path)
forever $ do
x <- liftState $ gets viewl
case x of
EmptyL -> quit
file :< s -> do
liftState $ put s
liftPipe $ yield file
p <- liftIO $ doesDirectoryExist file
when p $ do
names <- liftIO $ getUsefulContents file
-- allowedNames <- filterM permissible names
let namesfull = map (path </>) names
liftState $ forM_ namesfull $ \name -> modify (|> name)
This creates a generator of breadth-first file names that can be consumed concurrent with the tree traversal. You consume the values using:
printer :: (Show a) => Consumer a IO r
printer = forever $ do
a <- await
lift $ print a
>>> runPipe $ printer <+< traverseTree path
<Prints file names as it traverses the tree>
You can even choose to not demand all the values:
-- Demand only 'n' elements
take' :: (Monad m) => Int -> Pipe a a m ()
take' n = replicateM_ n $ do
a <- await
yield a
>> runPipe $ printer <+< take' 3 <+< traverseTree path
<Prints only three files>
More importantly, that last example will only traverse the tree as much as necessary to generate the three files and then it will stop. This prevents wastefully traversing the entire tree when all you wanted was 3 results!
To learn more about the pipes
library trick, consult the pipes tutorial at Control.Pipes.Tutorial
.
To learn more about the loop trick, read this blog post.
I couldn't find a good link for the queue trick for breadth first traversal, but I know it's out there somewhere. If somebody else knows a good link for this, just edit my answer to add it.