Create structured JS object based on unstructured JS object

Question

I have this js object data:

const content = 
  [ { h1 : 'This is title number 1'          } 
  , { h2 : 'Description'                     }
  , { p  : 'Description unique content text' }
  , { h2 : 'Content'                         }
  , { p  : 'Content unique text here 1'      }
  , { ul : [ 
           'string value 1',
           'string value 2'
       ]                       
  , } 
  , { p  : 'Content unique text here 2'      }
  , { h2 : 'CTA message'                     }
  , { p  : 'CTA message unique content here' }
  , { h2 : 'CTA button'                      }
  , { p  : 'CTA button unique content here'  }
  , { p  : ''                                }
  , { h1 : 'This is title number 2'          } 
  , { h2 : 'Description'                     }
  , { p  : 'Description unique content text' }
  , { h2 : 'Content'                         }
  , { p  : 'Content unique text here 1'      }
  , { h2 : 'CTA message'                     }
  , { p  : 'CTA message unique content here' }
  , { h2 : 'CTA button'                      }
  , { p  : 'CTA button unique content here'  }
  , { p  : ''                                }
  ]

h1 indicates a new object, the ul is optional

I want to map it to such structure:

interface Content {
  title: string;
  description: string;
  content: any[];
  cta: {
    message: string;
    button: string;
  }
}

I am wondering what is the best way of doing that?

I think I have to loop through the items and populate a new object based on my interface. The first element is always the title; after that, if an item is the "Description" heading, the next item holds the description value.

    const json = []; // intended type: Content[]
    content.forEach(element => {
      if (element.h1) {
        // add props to new object
        // add object to json array
      }
    });

I just wonder how you would create multiple Content objects from that original content array?

Here is the result I am expecting:

json = [
  {
    title: 'This is title number 1',
    description: 'Description unique content text',
    content: [
      {
        p: 'Content unique text here 1',
      },
      {
        ul: [
            'string value 1',
            'string value 2'
        ]
      },
      {
        p: 'Content unique text here 2'
      } 
    ],
   cta: {
      message: 'CTA message unique content here',
      button: 'CTA button unique content here'
   }
  },
  ...
]

UPDATE: Based on the comments below, I am looking for a top-down parser solution. The solution should be easily extensible in case the input array changes slightly, e.g. by introducing new unique h2+p or h2+p+ul elements.

You're effectively building a parser. Should be relatively simple with a top-down greedy approach. — Bergi, Jun 17 '22 at 01:53
@Bergi currently looking into this library https://github.com/brunoimbrizi/array-unflat/blob/main/index.js, thinking of splitting the original array into groups and then using the map function. Do you think that would be an optimal path? — Sergino, Jun 17 '22 at 01:56
No. That `unflat` function you linked (better known as [`chunk`](https://stackoverflow.com/q/8495687/1048572)) works only if each group has the same size, which is not the case with your data. You need to group by other factors. Have you written a parser before? — Bergi, Jun 17 '22 at 02:01
@Bergi right, I see. I haven't dealt with flat data like that in js/ts. Can you point me in the right direction? Maybe there is a lib that can help or a code snippet to look at? — Sergino, Jun 17 '22 at 02:03
Maybe https://11l-lang.org/archive/simple-top-down-parsing/ and https://en.wikipedia.org/wiki/Parsing#Computer_languages can help — Bergi, Jun 17 '22 at 02:12
@Bergi Oh, now I clearly see that I haven't written any parsers like that before. Found another one here https://github.com/codebox/top-down-parser but it is not so helpful yet due to my knowledge gap in that area — Sergino, Jun 17 '22 at 02:33
Try to find one that doesn't take a grammar as input but one that lets you build rules in code. And you'll need one that takes in an array of elements, not a string. — Bergi, Jun 17 '22 at 02:36
Or look at the output (the generated code, not the source code) of a parser generator to get an idea — Bergi, Jun 17 '22 at 02:37
Here are some articles that might be more helpful: https://blog.mgechev.com/2017/09/16/developing-simple-interpreter-transpiler-compiler-tutorial/ https://depth-first.com/articles/2019/01/22/scanner-driven-parser-development/ - very concrete parsers, no abstract grammars — Bergi, Jun 17 '22 at 02:42
I think there is a problem with the approach here. You might think that `{p: "abc"}` can represent the html tag `<p>abc</p>`, but it doesn't. `{p: "abc"}` means ... — Rashad Saleh, Jun 21 '22 at 13:39
@sreginogemoh ... does the OP expect each of the provided solutions to come with a detailed explanation? Are 2 to 3 introducing sentences and readable code good enough? — Peter Seliger, Jun 23 '22 at 10:53
@PeterSeliger yeah, that is fine, forgot to mention that algorithm should be extensible in case if input data will be changed slightly — Sergino, Jun 23 '22 at 15:30
@NinaScholz all the `h2` values become prop names of some sort. `CTA message` + `CTA button` => `cta: { message: '..', button: '...'}`. `Description` => `description: '..'` — Sergino, Jun 26 '22 at 07:01
this is an awful data structure with random keys and control data inside strings of data. — Nina Scholz, Jun 26 '22 at 12:58
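
The top-down greedy approach suggested in the comments could look roughly like the sketch below. It is not taken from any of the answers: `parseContent`, `sectionRules` and the other helper names are hypothetical, and the sketch assumes that every item has exactly one key, that `h1` starts a new object, and that each `h2` heading owns all `p`/`ul` items up to the next heading.

```javascript
// Minimal top-down (recursive-descent style) sketch over an array of
// single-key items. All names in this block are hypothetical.
function parseContent(items) {
  let pos = 0;

  // Key and value of the item under the cursor, or null at end of input.
  const peek = () => {
    if (pos >= items.length) return null;
    const [key, value] = Object.entries(items[pos])[0];
    return { key, value };
  };
  // Return the current item and advance the cursor.
  const next = () => {
    const token = peek();
    pos += 1;
    return token;
  };

  // body := (p | ul)*  (greedily consume everything up to the next heading)
  function parseBody() {
    const body = [];
    while (peek() && peek().key !== 'h1' && peek().key !== 'h2') {
      const { key, value } = next();
      if (value !== '') body.push({ [key]: value }); // drop empty separators
    }
    return body;
  }

  // Text of the first paragraph in a body, or '' if there is none.
  const firstText = (body) => {
    const para = body.find((item) => 'p' in item);
    return para ? para.p : '';
  };

  // One rule per h2 heading; extend this table to support new sections.
  const sectionRules = new Map([
    ['description', (doc, body) => { doc.description = firstText(body); }],
    ['content', (doc, body) => { doc.content.push(...body); }],
    ['cta message', (doc, body) => { doc.cta.message = firstText(body); }],
    ['cta button', (doc, body) => { doc.cta.button = firstText(body); }],
  ]);

  // document := h1 section*
  function parseDocument() {
    const doc = {
      title: next().value,
      description: '',
      content: [],
      cta: { message: '', button: '' },
    };
    while (peek() && peek().key === 'h2') {
      const heading = String(next().value).trim().toLowerCase();
      const body = parseBody();
      const rule = sectionRules.get(heading);
      if (rule) rule(doc, body); // unknown sections are ignored in this sketch
    }
    return doc;
  }

  // input := document*
  const result = [];
  while (peek()) {
    if (peek().key === 'h1') result.push(parseDocument());
    else next(); // skip stray items that do not belong to any section
  }
  return result;
}

// Usage with a small made-up sample:
const sample = [{ h1: 'Title' }, { h2: 'Description' }, { p: 'Some text' }];
console.log(parseContent(sample));
```

Because each rule receives the whole body of its section, supporting a new `h2+p` or `h2+p+ul` combination should only require one more entry in `sectionRules`.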

Debug Diva · Answer 1 · 2022-06-22T12:57:26.800

As per my understanding, each set of objects in the content array follows the same format from top to bottom. If so, here is a solution I came up with. I know it can be optimized a bit further to make it more dynamic. Could you have a look and confirm whether it works as you expect? Then I will work on the refactoring and code optimization part.

Note: All the steps are described in comments within the code snippet below.

Demo :

    // Input Array
    const content = [
      { h1 : 'This is title number 1' },
      { h2 : 'Description' },
      { p  : 'Description unique content text' },
      { h2 : 'Content' },
      { p  : 'Content unique text here 1' },
      { ul : [ 
        'string value 1',
        'string value 2'
      ]},
      { p  : 'Content unique text here 2' },
      { h2 : 'CTA message' },
      { p  : 'CTA message unique content here' },
      { h2 : 'CTA button' },
      { p  : 'CTA button unique content here' },
      { p  : '' },
      { h1 : 'This is title number 2' },
      { h2 : 'Description' },
      { p  : 'Description unique content text' },
      { h2 : 'Content' },
      { p  : 'Content unique text here 1' },
      { h2 : 'CTA message' },
      { p  : 'CTA message unique content here' },
      { h2 : 'CTA button' },
      { p  : 'CTA button unique content here' },
      { p  : '' }
    ];

    // Variables
    const chunkEndIndexArr = []; // Ending index of the chunk for each set of objects.
    const chunkArr = []; // Nested arrays, one per chunk. For ex: [[], []]
    let startIndex = 0; // Start index for looping through the content array.
    let splittedStr = []; // Holds the split 'CTA message' heading; its first word ('cta') becomes the property name in the final result.

    // Collect the chunk boundaries: the empty paragraph { "p": "" } acts as a separator between the sets of objects.
    content.forEach(obj => {
      if (Object.hasOwn(obj, 'p') && !obj.p) {
        chunkEndIndexArr.push(content.indexOf(obj) - 1)
      }
    });

    // Build a separate array for each set of objects, which the next step relies on.
    chunkEndIndexArr.forEach((elem, index) => {
      const innerArr = [];
      for (let i = startIndex; i <= chunkEndIndexArr[index]; i++) {
        innerArr.push(content[i])
      }
      // Push each set of objects into a separate array.
      chunkArr.push(innerArr);
      // Reset the start index for the next set of records (skipping the separator).
      startIndex = chunkEndIndexArr[index] + 2
    });

    // This set of code is used to build the desired output from the chunked array of each set of objects.
    const res = chunkArr.map(chunk => {
      const innerObj = {};
      chunk.forEach(obj => {
        // Property assignment for Title
        if (Object.hasOwn(obj, 'h1')) {
          innerObj.title = obj.h1
        }
        // Property assignment for Description
        if (Object.hasOwn(obj, 'h2') && obj.h2 === 'Description') {
          innerObj[obj.h2.toLowerCase()] = Object.values(chunk[chunk.indexOf(obj) + 1])[0]
        }
        // Property assignment for Content
        if (Object.hasOwn(obj, 'h2') && obj.h2 === 'Content') {
          innerObj[obj.h2.toLowerCase()] = [];
          if (Object.hasOwn(chunk[chunk.indexOf(obj) + 1], 'p')) {
            innerObj[obj.h2.toLowerCase()].push(chunk[chunk.indexOf(obj) + 1])
          }
          if (Object.hasOwn(chunk[chunk.indexOf(obj) + 2], 'ul')) {
            innerObj[obj.h2.toLowerCase()].push(chunk[chunk.indexOf(obj) + 2])
          }
          if (Object.hasOwn(chunk[chunk.indexOf(obj) + 3], 'p') && chunk[chunk.indexOf(obj) + 3].p.includes('Content')) {
            innerObj[obj.h2.toLowerCase()].push(chunk[chunk.indexOf(obj) + 3])
          }
        }
        // Property assignment for CTA message.
        if (Object.hasOwn(obj, 'h2') && obj.h2.includes('CTA message')) {
          splittedStr = obj.h2.toLowerCase().split(' ');
          innerObj[splittedStr[0]] = {};
          innerObj[splittedStr[0]].message = Object.values(chunk[chunk.indexOf(obj) + 1])[0];
        }
        
        // Property assignment for CTA button.
        if (Object.hasOwn(obj, 'h2') && obj.h2.includes('CTA button')) {
          innerObj[splittedStr[0]].button = Object.values(chunk[chunk.indexOf(obj) + 1])[0];
        }
      });
      return innerObj;
    });

    document.getElementById("result").innerText = JSON.stringify(res, null, 2); // Final result

<pre id="result"></pre>

give me about 24 hours to play with the code and come back on that with a proper comment — Sergino, Jun 23 '22 at 15:31

score 2 · Answer 2 · edited Jun 22 '22 at 20:29

Here is a working solution:

const content = 
  [ { h1 : 'This is title number 1'          } 
  , { h2 : 'Description'                     }
  , { p  : 'Description unique content text' }
  , { h2 : 'Content'                         }
  , { p  : 'Content unique text here 1'      }
  , { ul : [ 
           'string value 1',
           'string value 2'
       ]                       
  , } 
  , { p  : 'Content unique text here 2'      }
  , { h2 : 'CTA message'                     }
  , { p  : 'CTA message unique content here' }
  , { h2 : 'CTA button'                      }
  , { p  : 'CTA button unique content here'  }
  , { p  : ''                                }
  , { h1 : 'This is title number 2'          } 
  , { h2 : 'Description'                     }
  , { p  : 'Description unique content text' }
  , { h2 : 'Content'                         }
  , { p  : 'Content unique text here 1'      }
  , { h2 : 'CTA message'                     }
  , { p  : 'CTA message unique content here' }
  , { h2 : 'CTA button'                      }
  , { p  : 'CTA button unique content here'  }
  , { p  : ''                                }
  ]


function getChunks (content){
  const chunks = {}
  let chunkIndex = -1
  content.forEach(entry=> {
    const entries = Object.entries(entry)[0]
    const key = entries[0]
    const value = entries[1]
    if(key=== "h1"){
      chunkIndex++
    }
    if(chunks[chunkIndex]){
      return chunks[chunkIndex].push({key, value})
    }
    return chunks[chunkIndex] = [{key, value}]
  })
  return Object.values(chunks)
}

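function formatChunk(chunk){
  // `interface` is a reserved word in strict mode (and in ES modules),
  // so the accumulator is named `result` instead.
  const result = {
    title: "",
    description: "",
    content: [],
    cta: { message:"", button: "" }
  }
  let section = "" // text of the most recent h2 heading
  chunk.forEach(({key, value})=> {
    if(key==="h1"){
      result.title = value
      return
    }
    if(key==="h2"){
      // remember the heading; it decides where the following items go
      section = value
      return
    }
    if(value === ""){
      return // skip the empty separator paragraph
    }
    if(section === "CTA button"){
      result.cta.button = value
      return
    }
    if(section === "CTA message"){
      result.cta.message = value
      return
    }
    if(section === "Description"){
      result.description = value
      return
    }
    if(key==="ul" || key==="p"){
      result.content.push({[key]:value})
    }
  })
  return result
}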

function parseObj(content){
  const chunks = getChunks(content)
  return chunks.map(chunk=> formatChunk(chunk))
}

console.log(parseObj(content))

Peter Seliger · Accepted Answer · 2022-06-24T14:38:12.533

The approach below uses a generically implemented reducer function that can be configured with ...

the item key that triggers the creation of a new structured content type.
an object-based lookup (map/index) whose key-specific functions either create or aggregate a content type.

From one of the OP's above comments ...

"... forgot to mention that algorithm should be extensible in case if input data will be changed slightly – sreginogemoh"

Though I wouldn't go so far as to call the approach a "top-down parser", as others already did, the parser analogy helps.

The advantage of the approach lies in the generically implemented reducer, which roughly takes the role of a main tokenizer by processing an array/list from top to bottom (or left to right).

Whenever the custom-provided property name (the keyword) matches the key of the sole entry of the currently processed item (token), the reducer creates a new structured data item. Non-matching item keys signal an aggregation task.

Both task types (creation and aggregation) have in common that they always have to be custom implemented and provided as methods of an object-based lookup/map/index.

The aggregation tasks can be manifold, depending on whether the sub content type to be merged is hinted at explicitly (e.g. by other specific entry keys) or not. What they have in common is that they are always passed the same argument signature of (predecessor, merger, key, value).

These four parameters provide exactly the information needed (no more, no less) to reliably aggregate any sub content type (based on key, value and, if necessary, predecessor) at the base/main content type which was passed as merger.

FYI ... In terms of the top-down parser analogy, note that by using the predecessor item/token one actually takes a top-down/lookbehind approach (whereas top-down parsing, from the little theory I know, is supposed to come with lookahead).

The reducer approach allows both adapting to other source items and creating other target structures by changing the properties of the initial value that gets passed ... newItemKey and aggregators ... accordingly.

The two-part solution of reducer and custom tasks is implemented so that source items are not mutated: structuredClone is actively used for more complex sub contents, whereas a task's argument signature passively guards against mutating source items.
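
As a small illustration of why cloning matters here, the following sketch compares sharing a nested array with cloning it. The names `cloneValue`, `sourceItem`, `shared` and `cloned` are made up for this example, and the JSON-based fallback is only an assumption for environments without `structuredClone`.

```javascript
// Use structuredClone where available, otherwise a JSON round trip
// (the fallback only works for JSON-compatible data).
const cloneValue = (typeof structuredClone === 'function')
  ? structuredClone
  : (value) => JSON.parse(JSON.stringify(value));

// Hypothetical source item with a nested array.
const sourceItem = { ul: ['string value 1', 'string value 2'] };

// Sharing the reference: the result points at the source's own array.
const shared = { ul: sourceItem.ul };
// Cloning: the result owns an independent copy of the array.
const cloned = { ul: cloneValue(sourceItem.ul) };

cloned.ul.push('added to the clone only');

console.log(sourceItem.ul.length);        // 2, the source is untouched
console.log(shared.ul === sourceItem.ul); // true, same array object
console.log(cloned.ul === sourceItem.ul); // false, independent copy
```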

// generically implemented and custom configurable reducer.
function createAndAggregateStructuredContent(
  { aggregators = {}, miscsAggregationKey = 'miscs', newItemKey, result = [] },
  item, itemIdx, itemList,
) {
  const [itemKey, itemValue] = Object.entries(item)[0];
  const createOrAggregateContentType =
    aggregators[itemKey] ?? aggregators[miscsAggregationKey];

  if ('function' === typeof createOrAggregateContentType) {
    if (itemKey === newItemKey) {
      // create and collect a new content type.
      result
        .push(
          createOrAggregateContentType(itemValue)
        );
    } else {
      // aggregate an existing content type.
      createOrAggregateContentType(
        itemList[itemIdx - 1], // - predecessor item from provided list.
        result.slice(-1)[0],   // - currently aggregated content type.
        itemKey,
        itemValue,
      );
    }
  }
  return { aggregators, miscsAggregationKey, newItemKey, result };
}

// poor man's fallback for environments
// which do not support `structuredClone`.
const cloneDataStructure = (
  ('function' === typeof structuredClone) && structuredClone ||
  (value => JSON.parse(JSON.stringify(value)))
);

// interface Content {
//   title: string;
//   description: string;
//   content: any[];
//   cta: {
//     message: string;
//     button: string;
//   }
// }

// object based lookup/map/index for both
// content-type creation and aggregation
// according to the OP's `Content` interface.
const aggregators = {
  // creation.
  h1: value => ({ title: String(value) }),

  // aggregation.
  h2: (predecessor, merger, key, value) => {
    key = value.trim().toLowerCase();
    if ((key === 'description') || (key === 'content')) {
      merger[key] = null;
    } else if ((/^cta\s+(?:message|button)$/).test(key)) {
      merger.cta ??= {};
    }
  },
  // aggregation.
  miscs: (predecessor, merger, key, value) => {
    const contentType = String(predecessor.h2)
      .trim().toLowerCase();
    const ctaType = (/^cta\s+(message|button)$/)
      .exec(contentType)?.[1] ?? null;

    if ((contentType === 'description') && (merger.description === null)) {

      merger.description = String(value);

    } else if ((ctaType !== null) && ('cta' in merger)) {

      Object.assign(merger.cta, { [ ctaType ]: String(value) });

    } else if (value) {
      // fallback ...
      // ... default handling of various/varying non empty content.
      (merger.content ??= []).push({ [ key ]: cloneDataStructure(value) });
    }
  },
};

const content = 
  [ { h1 : 'This is title number 1'          } 
  , { h2 : 'Description'                     }
  , { p  : 'Description unique content text' }
  , { h2 : 'Content'                         }
  , { p  : 'Content unique text here 1'      }
  , { ul : [ 
           'string value 1',
           'string value 2'
       ]                       
  , } 
  , { p  : 'Content unique text here 2'      }
  , { h2 : 'CTA message'                     }
  , { p  : 'CTA message unique content here' }
  , { h2 : 'CTA button'                      }
  , { p  : 'CTA button unique content here'  }
  , { p  : ''                                }
  , { h1 : 'This is title number 2'          } 
  , { h2 : 'Description'                     }
  , { p  : 'Description unique content text' }
  , { h2 : 'Content'                         }
  , { p  : 'Content unique text here 1'      }
  , { h2 : 'CTA message'                     }
  , { p  : 'CTA message unique content here' }
  , { h2 : 'CTA button'                      }
  , { p  : 'CTA button unique content here'  }
  , { p  : ''                                }
  ];
const structuredContent = content
  .reduce(
    createAndAggregateStructuredContent, {
      aggregators,
      newItemKey: 'h1',
      result: [],
    },
  ).result;

console.log({ structuredContent, content });
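
To illustrate the configurability described in this answer, here is a minimal sketch that targets a different input and output shape. Everything in it is hypothetical: `reduceItems` is a condensed restatement of the reducer idea (not the function above), and the `title`/`h3`/`ol` input shape, the `faqAggregators` object and the sample data are made up for illustration.

```javascript
// Condensed restatement of the reducer idea: `newItemKey` starts a new
// result item, every other key is delegated to a handler function.
function reduceItems(state, item, idx, list) {
  const [key, value] = Object.entries(item)[0];
  const handler = state.aggregators[key] ?? state.aggregators.miscs;

  if (typeof handler === 'function') {
    if (key === state.newItemKey) {
      // creation task: receives only the value.
      state.result.push(handler(value));
    } else if (state.result.length > 0) {
      // aggregation task: (predecessor, merger, key, value).
      const merger = state.result[state.result.length - 1];
      handler(list[idx - 1], merger, key, value);
    }
  }
  return state;
}

// Assumed new input shape: `title` starts an item, `h3` opens a section,
// and any other key is collected into the most recently opened section.
const faqAggregators = {
  title: (value) => ({ name: String(value), sections: [] }),
  h3: (predecessor, merger, key, value) => {
    merger.sections.push({ heading: String(value), items: [] });
  },
  miscs: (predecessor, merger, key, value) => {
    const current = merger.sections[merger.sections.length - 1];
    if (current && value) {
      // copy arrays so that the source items are not shared.
      current.items.push({ [key]: Array.isArray(value) ? [...value] : value });
    }
  },
};

// Made-up sample data.
const faqItems = [
  { title: 'Shipping' },
  { h3: 'Overview' },
  { p: 'We ship worldwide.' },
  { ol: ['Pick items', 'Pay'] },
  { title: 'Returns' },
  { h3: 'Policy' },
  { p: '30 days.' },
];

const faq = faqItems.reduce(reduceItems, {
  aggregators: faqAggregators,
  newItemKey: 'title',
  result: [],
}).result;

console.log(JSON.stringify(faq, null, 2));
```

Unlike the `miscs` handler above, this variant does not look at the predecessor; it appends to the most recently opened section, so several paragraphs can follow one heading.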

FYI ... questions and approaches similar to the very topic here ...