OK, this has been hurting my brain (if any) for some time now – yes, recursive functions are hard!

What I'm trying to achieve: create an object that simulates a directory structure containing subdirectories and files. Each directory becomes a key whose value is an object, with filenames as keys and the corresponding file contents as values (see Fig. 2).

If I have a directory structure that looks like this:

Fig 1

LEVEL_1
    LEVEL_2
    |   LEVEL_3_1
    |   |   FILE_3_1_1
    |   |   FILE_3_1_2
    |   LEVEL_3_2
    |   |   FILE_3_2_1
    |   |   FILE_3_2_2
    |   |   LEVEL_4
    |   |   |   FILE_4_1
    |   |   |   FILE_4_2
    |   |   |   ... this could go on forever ...
    |   FILE_2_1
    |   FILE_2_2
    FILE_1_1
    FILE_1_2

I'd like to get an object that looks like this (the object itself represents LEVEL_1):

Fig 2

{
    LEVEL_2 : {
        LEVEL_3_1 : {
            FILE_3_1_1 : "FILE CONTENT",
            FILE_3_1_2 : "FILE CONTENT"
        },
        LEVEL_3_2 : {
            FILE_3_2_1 : "FILE CONTENT",
            FILE_3_2_2 : "FILE CONTENT",
            LEVEL_4 : {
                FILE_4_1 : "FILE CONTENT",
                FILE_4_2 : "FILE CONTENT"
            }
        },
        FILE_2_1 : "FILE CONTENT",
        FILE_2_2 : "FILE CONTENT"
    },
    FILE_1_1 : "FILE CONTENT",
    FILE_1_2 : "FILE CONTENT"
}

So, basically, all directories become objects, all files they contain become keys on that object, and the file contents become the corresponding values.

I've managed to get this far, but I have issues dynamically creating the nested objects based on this recursive function (basically, how do I check if a deeply nested object already exists and add another object to it):

    let views_dir = config.root + '/views/',
        vo = {};

    var walkSync = function( dir, filelist ) {
        var fs = fs || require('fs'),
            files = fs.readdirSync(dir);

        filelist = [];

        files.forEach(function( file ) {
            if ( fs.statSync( dir + file ).isDirectory() ) {

                /**
                 * Create nested object of namespaces in some dynamic fashion
                 * Check for current dir in object and add it as namespace in the right structure in vo (object) …
                 */

                vo[file] = {};

                filelist = walkSync(dir + file + '/', filelist);

                filelist.forEach(function ( filename ) {
                    // I shouldn't have to be doing this in here since files are handled in the else clause below ... but, I told you, recursion makes my head spin.
                    vo[file][filename.split('.')[0]] = "FILE CONTENT";
                });
            } else {
                filelist.push(file);

                /**
                 * Add file to current namespace if any
                 */
                vo[file.split('.')[0]] = "FILE CONTENT";
            }
        });

        return filelist;
    };

    return walkSync( views_dir );

Now I'm looking for some way to dynamically add nested 'namespaces' to an object. I've tried building arrays from the directories and then concatenating them into dot syntax, and all sorts of other weird stuff ... now my brain just hurts and I need some help.

And I've found hundreds of recursive functions online that do everything except what I need ...

  • I don't know about you, but many folders on my machine end up with dots (`.`s) in their names. A dot syntax would not work to describe my folder structures. – Scott Sauyet Aug 23 '19 at 14:30
  • Well, in this case there will be no file names with a . in it. In the end I simply do a check for what is returned ... and go from there. –  Aug 24 '19 at 16:01

2 Answers

2

To verify any of this works, we first recreate the directory structure in the original question. I'm using unique file contents so we can verify file contents are properly matched with their corresponding keys -

$ mkdir -p level_1/level_2/level_3_1 level_1/level_2/level_3_2/level_4
$ echo "file_1_1 content" > level_1/file_1_1
$ echo "file_1_2 content" > level_1/file_1_2
$ echo "file_3_1_1 content" > level_1/level_2/level_3_1/file_3_1_1
$ echo "file_3_1_2 content" > level_1/level_2/level_3_1/file_3_1_2
$ echo "file_3_2_1 content" > level_1/level_2/level_3_2/file_3_2_1
$ echo "file_3_2_2 content" > level_1/level_2/level_3_2/file_3_2_2
$ echo "file_4_1 content" > level_1/level_2/level_3_2/level_4/file_4_1
$ echo "file_4_2 content" > level_1/level_2/level_3_2/level_4/file_4_2

Now our function, dir2obj, which builds an object representation of a file system, starting from a root path -

const { readdir, readFile, stat } =
  require ("fs") .promises

const { join } =
  require ("path")

const dir2obj = async (path = ".") =>
  (await stat (path)) .isFile ()
    ? String (await readFile (path))
    : Promise
        .all
          ( (await readdir (path))
              .map
                ( p => 
                    dir2obj (join (path, p))
                      .then (obj => ({ [p]: obj }))
                )
          )
        .then (results => Object.assign({}, ...results)) // {} seed: an empty directory yields {} instead of throwing

// run it
dir2obj ("./level_1")
  .then (console.log, console.error)
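
The `fs.promises` API used above needs a reasonably recent Node; it is missing in the 8.x line, which is what one of the comments below ran into. As a rough sketch for older runtimes, and assuming `util.promisify` is available, the same three functions could be built from the callback API instead:

```javascript
// Sketch: promise-returning readdir/readFile/stat without fs.promises.
// Assumes util.promisify exists (Node 8+); the names mirror the ones used above.
const fs = require("fs")
const { promisify } = require("util")

const readdir = promisify(fs.readdir)
const readFile = promisify(fs.readFile)
const stat = promisify(fs.stat)

// quick check: stat the current directory
stat(".")
  .then(s => console.log("cwd is a directory:", s.isDirectory()))
  .catch(console.error)
```

With these definitions in place of the `require ("fs") .promises` line, the rest of the program should not need to change.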

If your console is truncating the output object, you can JSON.stringify it to see all keys and values -

// run it
dir2obj ("./level_1")
  .then (obj => JSON.stringify (obj, null, 2))
  .then (console.log, console.error)

Here's the output -

{
  "file_1_1": "file_1_1 content\n",
  "file_1_2": "file_1_2 content\n",
  "level_2": {
    "level_3_1": {
      "file_3_1_1": "file_3_1_1 content\n",
      "file_3_1_2": "file_3_1_2 content\n"
    },
    "level_3_2": {
      "file_3_2_1": "file_3_2_1 content\n",
      "file_3_2_2": "file_3_2_2 content\n",
      "level_4": {
        "file_4_1": "file_4_1 content\n",
        "file_4_2": "file_4_2 content\n"
      }
    }
  }
}
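
As an alternative to JSON.stringify, Node's built-in `util.inspect` with `depth: null` should also print every nesting level. A small sketch, using a hand-written stand-in object rather than real directory output:

```javascript
const { inspect } = require("util")

// stand-in for a dir2obj result (hypothetical data, not read from disk)
const sample = { a: { b: { c: { d: { file: "content\n" } } } } }

// depth: null tells inspect to recurse without a depth limit
const text = inspect(sample, { depth: null })
console.log(text)
```

The printed form uses Node's inspection syntax rather than JSON, so it is meant for reading, not for parsing.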

Refactor with generics

The program above can be simplified by extracting out a common function, parallel -

// parallel : ('a array promise, 'a -> 'b promise) -> 'b array promise
const parallel = async (p, f) =>
  Promise .all ((await p) .map (f))

// dir2obj : string -> object
const dir2obj = async (path = ".") =>
  (await stat (path)) .isFile ()
    ? String (await readFile (path))
    : parallel // <-- use generic
        ( readdir (path) // directory contents of path
        , p =>           // for each descendent path as p ...
            dir2obj (join (path, p))
              .then (obj => ({ [p]: obj }))
        )
        .then (results => Object.assign({}, ...results)) // {} seed: an empty directory yields {} instead of throwing
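
To see `parallel` on its own, here is a tiny sketch with made-up inputs (the `double` helper is hypothetical). It waits for the promised array, maps an async function over it, and resolves once every result is in:

```javascript
// parallel : ('a array promise, 'a -> 'b promise) -> 'b array promise
const parallel = async (p, f) =>
  Promise.all((await p).map(f))

// hypothetical async worker used only for this demo
const double = async x => x * 2

const demo = parallel(Promise.resolve([1, 2, 3]), double)

demo.then(console.log, console.error) // [ 2, 4, 6 ]
```

In dir2obj the promised array is the directory listing and the worker is the recursive call, so all entries of a directory are processed concurrently.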

Including the root object

Notice the output does not contain the "root" object, { level_1: ... }. If this is desired, we can change the program like so -

const { basename } =
  require ("path")

const dir2obj = async (path = ".") =>
  ( { [basename (path)]: // <-- always wrap in object
      (await stat (path)) .isFile ()
        ? String (await readFile (path))
        : await parallel
            ( readdir (path)
            , p => dir2obj (join (path, p)) // <-- no more wrap
            )
            .then (results => Object.assign({}, ...results)) // {} seed: an empty directory yields {} instead of throwing
    }
  )

dir2obj ("./level_1/level_2/level_3_2/level_4") .then (console.log, console.error)

The result is now wrapped in a root object keyed by the base name of the input path -

{
  "level_4": {
    "file_4_1": "file_4_1 content\n",
    "file_4_2": "file_4_2 content\n"
  }
}

This version of the program behaves more consistently: the result is always an object, even if the input path is a file -

dir2obj ("./level_1/level_2/level_3_2/level_4/file_4_2")
  .then (obj => JSON.stringify (obj, null, 2))
  .then (console.log, console.error)

Still returns an object -

{
  "file_4_2": "file_4_2 content\n"
}

Rewrite using imperative style without async-await

In a comment you remark on the "unreadable" style above, but I find boilerplate syntax and verbose keywords highly unpalatable. In a style I suspect you'll recognize as more familiar, take notice of all the added chars -

const dir2obj = function (path = ".") {
  return stat(path).then(stat => {
    if (stat.isFile()) {
      return readFile(path).then(String)
    }
    else {
      return readdir(path)
        .then(paths => paths.map(p => dir2obj(join(path, p))))
        .then(Promise.all.bind(Promise))
        .then(results => Object.assign({}, ...results)) // {} seed: an empty directory yields {} instead of throwing
    }
  }).then(value => {
    return { [basename(path)]: value }
  })
}

Our variables are more difficult to see because we have words like "function", "return", "if", "else", and "then" interspersed through the entire program. Countless {} are added just so the keywords can even be used. It costs more to write more — let that digest for a moment.

It's slightly better with the parallel abstraction, but not much, imo -

const parallel = function (p, f) {
  return p
    .then(a => a.map(f))
    .then(Promise.all.bind(Promise))
}

const dir2obj = function (path = ".") {
  return stat(path).then(stat => {
    if (stat.isFile()) {
      return readFile(path).then(String)
    }
    else {
      return parallel
        ( readdir(path)
        , p => dir2obj(join(path, p))
        )
        .then(results => Object.assign({}, ...results)) // {} seed: an empty directory yields {} instead of throwing
    }
  }).then(value => {
    return { [basename(path)]: value }
  })
}

When we look back at the functional-style program, we see each character printed on the screen as representative of some program semantic. p ? t : f evaluates to t if p is true, otherwise f. We don't need to write if (...) { ... } else { ... } every time. x => a takes x and returns a because that's what arrow functions do, so we don't need function (x) { ... } or "return" every time.

I originally learned C-style languages, so having {} everywhere was a familiar feeling. Over time, I've learned to look at p ? t : f or x => a and instantly understand exactly what they mean, and I've come to appreciate not having all the other words and arcane symbols in my way.

There's an added benefit to writing programs in an expression-based style, too. Expressions are so powerful because they can be composed with one another to create more complex expressions. We begin to blur the lines between program and data, where everything is just pieces that can be combined like Lego. Even functions (sub-programs) become ordinary data values that we manipulate and combine, just like any other data.
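
As a small illustration of that composability (the helper names here are invented for the example), functions can be passed around and combined like any other value:

```javascript
// compose : ('b -> 'c, 'a -> 'b) -> 'a -> 'c
const compose = (f, g) => x => f(g(x))

// hypothetical helpers, each an ordinary value
const trim = s => s.trim()
const shout = s => s.toUpperCase() + "!"

// a new function built purely by combining expressions
const trimThenShout = compose(shout, trim)

console.log(trimThenShout("  file_1_1 content\n")) // FILE_1_1 CONTENT!
```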

Imperative programs rely on side-effects and imperative statements cannot be combined with one another. Instead, more variables are created to represent intermediate state, which means even more text on the screen and more cognitive load in the programmer's mind. In imperative style, we're forced to think about programs, functions, statements, and data as different kinds of things, and so there is no uniform way to manipulate and combine them.

Related: async and await are not statements

Still, both variants have the exact same behavior as the functional-style program. Ultimately the program's style is left to you, the programmer. Choose any style that you like best.
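
For comparison with the `readdirSync` approach in the question, here is a synchronous sketch of the same idea. The name `dir2objSync` is made up for this example. The key point is that each call returns the object for its own directory, so no shared variable has to be patched from inside the recursion:

```javascript
const fs = require("fs")
const { join } = require("path")

// dir2objSync : string -> object | string  (hypothetical name)
const dir2objSync = (path = ".") => {
  // a file is represented by its content
  if (fs.statSync(path).isFile()) {
    return fs.readFileSync(path, "utf8")
  }
  // a directory is an object with one key per entry
  const result = {}
  for (const name of fs.readdirSync(path)) {
    result[name] = dir2objSync(join(path, name))
  }
  return result
}
```

Calling `dir2objSync("./level_1")` on the directory created earlier should produce the same shape as the first output above (without the root key), at the cost of blocking while the disk is read.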


Similar problem

To gain more intuition on how to solve this kind of problem, please see this related Q&A

Mulan
  • Thank you very very much, really appreciate it. And it does indeed seem to produce 'something' along the lines I'm looking for. But I have a few issues, first of all I don't get what's going on. This is due to these 'fairly' new features (on my part) like Promises, async/await and what not. And the code is very compact (close to being a one liner), and that's probably good for keeping line count down, performance etc. But, it's not that good for humans. And it also returns an array – not a big deal, though. I'll dig in and see if I can manage to get the final result. Thanks again. ; ) –  Aug 23 '19 at 07:55
  • OK, I have a couple of challenges with this one, the first being readability. I'm now trying to 'expand' this into a more human readable form, and my third challenge is: how do I expand the: `(path = ".") => (await stat (path)) .isFile` into a normally formatted curly braces method for better readability? I need to be able to do more stuff within each function than fits into a one liner, and I'm having a seriously hard time converting this into something that gives me more flexibility in terms of being able to manipulate the incoming and outgoing data of these 'functional' methods ; ) –  Aug 23 '19 at 08:35
  • I've managed to do some of what I need. But I'm having trouble passing a Handlebars evaluated template as the file content back through the 'chain'. How do I change `{ [basename (path)]: await readFile (path) }` into this: `eval( ( template ) => 'Handlebars.template(' + Handlebars.precompile( htmlclean( template ) ) + ')' )` so that it actually returns the precompiled template? See my need of an ordinary if / else statement? –  Aug 23 '19 at 10:43
  • @Sam: please note that the style here is perhaps unusual, but it's not at all obscure. It is not an attempt toward minimalism but rather a way of working entirely with expressions rather than statements. It takes some getting used to, but I find that the work to understand this style pays off handsomely. It leads to a better understanding of how to break down a problem into meaningful component parts and how to describe a solution in terms of how the parts fit together rather than how to sequence some steps to get to a solution. I find it's more than worth the effort. – Scott Sauyet Aug 23 '19 at 14:41
  • @Sam I was able to get a working example put together in my edited answer. The one big thing I missed was forgetting to wrap `await readFile (path)` (which returns a Buffer) in `String(...)` (which converts a Buffer to a String). I also included an imperative-style rewrite so you understand the program in a way that's currently familiar to you. Both programs are identical in terms of what they output and how they arrive at that output. I hope you can translate from one to the other and see how the first revision of the program offers a different way to think about the problem. – Mulan Aug 23 '19 at 15:41
  • Nicely done as always. I believe there's a school of thought that says we simply see beyond the clutter when reading code, that `function` and `return` and `if-else` and their like disappear into the background. I find that school wrong. I don't espouse any particular minimalism, but the less noise introduced into our code, the easier it is to read. For this, writing with expressions shines. – Scott Sauyet Aug 23 '19 at 15:58
  • Thanks @Scott, and I agree. Writing with expressions isn't about making minimal programs per se, though often they are more compact than their imperative translations. What I particularly like about writing with expressions is their composability - data is a value is an expression; a function is a value is an expression; a program is a function is a value is an expression; everything is an expression is a value is an expression is a value... Btw, do you like recursion? ^_^ – Mulan Aug 23 '19 at 16:29
  • I do get your points, and I kind of agree ... I just tend to still have real humans – on various skill and faith levels working together on the same projects – in mind. But either way, thank you for your help. ; ) –  Aug 24 '19 at 15:05
  • So, now – out of the blue – I suddenly get this: ``TypeError: Cannot destructure property `readdir` of 'undefined' or 'null'.`` from this part: `const { readdir, readFile, stat } = require ("fs").promises` ??? –  Aug 31 '19 at 07:58
  • Removing `.promises` from `fs` seems to fix this ... this must no longer be an experimental feature I guess. But now I get problems with `isFile()` ... `DeprecationWarning: Calling an asynchronous function without callback is deprecated.` where did I miss a callback? –  Aug 31 '19 at 08:05
  • Harh, all this shit hit me, as Node obviously was set back to version 8.9.4 … sigh! Forgot to do: nvm alias default … –  Aug 31 '19 at 08:18
  • No problem, Sam. It sounds like the problem is fixed with a current version of node then? If so, we can delete these comments as not to confuse future readers. – Mulan Aug 31 '19 at 14:18
-1

OK, after soooome fiddling around I did manage to get it to work like it should. Thanks to @user633183 for the kick off …

I changed what it returns for a file, plus a bit of other stuff ... like now I know I can have a fairly complex method in a ternary operator ; ). I'm just not sure I would write code this way, as I find it way too hard to understand and therefore maintain ... not even thinking about how other devs would feel about it. Well, never mind. Always good to learn something new. And if others find a use for it, here's the final version, which returns an object of precompiled Handlebars templates easily accessible through the folder structure of your views, like:

let template = [ global.view ].path.to.view.based.on.dir.structure.using.dot.syntax

In this case I've attached the output to a global view, and from there I can access all templates.

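// Assumption: Handlebars and htmlclean come from the npm packages "handlebars" and "htmlclean"
const Handlebars = require("handlebars")
const htmlclean = require("htmlclean")
const { readdir, readFile, stat } = require("fs").promises
const { join } = require("path")

const dir2obj = async ( path = "." ) =>
    ( await stat( path ) ).isFile()
        ? readFile( path )
            .then(function ( template ) {
                // precompile the cleaned template source, then evaluate it into a template function
                let tpl = 'Handlebars.template(' + Handlebars.precompile( htmlclean( template.toString() ) ) + ')';
                return eval( tpl );
            })
            .catch(function ( err ) {
                console.log("Error", err);
            })
        : Promise.all( ( await readdir( path ) )
            .map( p =>
                dir2obj( join( path, p ) )
                    .then( ( obj ) => {
                        return { [ p.split('.')[0] ] : obj }
                    })
            )
        )
        .then(function ( results ) {
            // {} seed so an empty directory yields {} instead of throwing
            return Object.assign({}, ...results);
        })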

// Use
dir2obj ( dir )
.then( console.log )
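
One possible generalization of the version above, sketched under a few assumptions: the file-handling step is passed in as a `transform` function (a hypothetical parameter, as is the name `dir2objWith`), so that precompiling templates or anything else stays outside the traversal. Keys are derived with `path.parse(p).name`, which strips only the final extension instead of cutting at the first dot:

```javascript
const { readdir, readFile, stat } = require("fs").promises
const { join, parse } = require("path")

// dir2objWith : (string, string -> 'a) -> ('a | object) promise  (hypothetical helper)
const dir2objWith = async (path = ".", transform = s => s) =>
  (await stat(path)).isFile()
    // a file becomes whatever transform makes of its text
    ? transform(String(await readFile(path)))
    // a directory becomes an object with one key per entry
    : Promise
        .all(
          (await readdir(path)).map(p =>
            dir2objWith(join(path, p), transform)
              .then(value => ({ [parse(p).name]: value }))
          )
        )
        // {} seed keeps an empty directory from throwing
        .then(results => Object.assign({}, ...results))
```

For the Handlebars case, `transform` could be whatever function turns template source into a compiled template; that part is left out here because it depends on the library setup. Note that two entries differing only by extension would still collide on the same key.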