-2
let data = "<p>Size: 5 cm</p><p>Weight: 30 g</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p><p>No more fussing with rubber buckles.</p>";

My goal is to create an array of attributes, where "attributes" means size, weight, etc.

result = [{Size: "5 cm"}, {Weight: "30 g"}]

Please let me know how to script this using JavaScript.

Unknown
  • https://stackoverflow.com/a/1732454/1207539 – Matt Jun 29 '21 at 06:49
  • Without using the DOM parser, you can either use Regex (which if you read my link above you'll see that you might as well be cursing your soul), or maybe you could get really fancy with .split but I think it would be unreadable. – Matt Jun 29 '21 at 06:54
  • @Parthavi before asking for a working solution, please share what you have already tried in your question. – rags2riches-prog Jun 29 '21 at 07:01

3 Answers

2

Not sure if this is the most efficient way to do this but here is what I managed to do.

let data = "<p>Size: 5 cm</p><p>Weight: 30 g</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p><p>No more fussing with rubber buckles.</p>";

// regex to match content between the tags
const regex = /(?<=\>)(.*?)(?=\<)/g;

// found matches stored in array
let found = data.match(regex);

// final result will be stored here
let newData = {};

// removes empty strings
found = found.filter(item => item);

// if an item contains ":", split it and store the label/value pair in a dictionary
for (let i = 0; i < found.length; i++) {
  if (found[i].includes(":")) {
    let temp = found[i].split(':');
    newData[temp[0].trim()] = temp[1].trim();
  }
}

console.log(newData);
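
If you need the array shape from the question (one single-key object per attribute) rather than one dictionary, a minimal sketch along the same lines could look like this. The name `extractAttributes` is made up for illustration, and it assumes each attribute sits in its own `<p>` as `Label: value` with no nested tags.

```javascript
// Hypothetical helper (not part of the code above): returns an array of
// single-key objects such as [{Size: "5 cm"}, {Weight: "30 g"}].
function extractAttributes(html) {
  // Text between a ">" and the next "<"; match() returns null when nothing matches.
  const found = html.match(/(?<=>)[^<]+(?=<)/g) || [];
  const attributes = [];
  for (const text of found) {
    const colon = text.indexOf(":");
    if (colon === -1) continue; // not a "Label: value" paragraph
    const label = text.slice(0, colon).trim();
    const value = text.slice(colon + 1).trim();
    if (label && value) attributes.push({ [label]: value });
  }
  return attributes;
}

const data = "<p>Size: 5 cm</p><p>Weight: 30 g</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p>";
console.log(extractAttributes(data));
```

Slicing at the first colon, instead of using `split(':')`, keeps values that themselves contain a colon intact.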
ahsan
0

Here's a quick and simple approach that uses an array of metric names to look up each value:

let data = "<p>Size: 5 cm</p><p>Weight: 30 g</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p><p>No more fussing with rubber buckles.</p>";

let metrics = ['Size', 'Weight'];
// for each metric, take the text between its name and the next </p>, then merge into one object
const result = Object.assign({}, ...metrics.map(m => ({
  [m]: data.split(m)[1].split("</p>")[0].replace(':', '').trim()
})));
console.log(result);
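
Note that `data.split(m)[1]` is `undefined` when a metric name is missing from the string, so the chained `.split` throws. A slightly more defensive sketch of the same lookup idea, returning the array shape from the question, might be the following; `lookupMetrics` is a hypothetical name, and it assumes each metric appears as `Name:` inside a paragraph.

```javascript
// Hypothetical helper: looks up each metric name and skips the ones that are absent.
function lookupMetrics(html, metrics) {
  const result = [];
  for (const metric of metrics) {
    const start = html.indexOf(metric + ":");
    if (start === -1) continue; // metric not present in this description
    const valueStart = start + metric.length + 1; // skip the name and the colon
    const end = html.indexOf("</p>", valueStart);
    // If no closing tag follows, take everything up to the end of the string.
    const value = html.slice(valueStart, end === -1 ? undefined : end).trim();
    result.push({ [metric]: value });
  }
  return result;
}

const data = "<p>Size: 5 cm</p><p>Weight: 30 g</p><p>Allows you to collect your hair easily.</p>";
// "Color" is not in the string, so only Size and Weight are returned.
console.log(lookupMetrics(data, ["Size", "Weight", "Color"]));
```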
Kinglish
0

You could solve this with one regular expression that searches for all paragraphs containing "a label: digits unit".

The regex could be something like this: /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g
Test it here: https://regex101.com/r/Q3ng6U/1/

Explanation:

  • <p> searches for opening paragraph tags. If you think that they could have attributes, such as style, id, or class, then you could replace it with <p[^>]*> where [^>] means any char which is not ">" and the * means repeated zero or more times.

  • ([^:]+) is used to capture the label. It looks for any char which isn't the colon, repeated one or several times.

  • \s* means spaces, tabs, etc, zero or several times.

  • :\s* means the colon char followed by some optional spaces.

  • [\d.]+ means digits and dots, at least once. This is because you may have something like "1.3 m".

  • ([\d.]+\s+\w+) will capture the quantity and unit, but only if they are separated by one or several spaces. If you think you could have "20kg" instead of "20 kg", then replace \s+ with \s*. You may then need to split the value again to re-insert a space so that all your properties look the same.

  • <\/p> is the closing paragraph tag. The slash is escaped because it is used to delimit the beginning and ending of the regular expression.

  • the g flag at the end makes the regular expression search for all matches instead of just stopping on the first match.
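
As a rough illustration of the two variations mentioned in the bullets above (attributes on the `<p>` tag, and no space between quantity and unit), the pattern could be adapted as follows. Capturing the quantity and the unit in separate groups is my own choice here, so that the space can be re-inserted; the sample string and variable names are made up.

```javascript
// Variation of the pattern: <p[^>]*> allows attributes, \s* allows "20kg".
// Quantity and unit are captured separately so they can be re-joined with one space.
const looseRegex = /<p[^>]*>([^:]+)\s*:\s*([\d.]+)\s*(\w+)<\/p>/g;
const sample = '<p class="spec">Weight: 20kg</p><p>Size: 1.3 m</p>';

const normalized = [];
for (const m of sample.matchAll(looseRegex)) {
  // m[1] = label, m[2] = quantity, m[3] = unit
  normalized.push({ [m[1].trim()]: `${m[2]} ${m[3]}` });
}
console.log(normalized);
```

With this sample, both "20kg" and "1.3 m" come out in the same "quantity unit" form.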

Now, for the JavaScript code, you could do something like this:

const regex = /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;
const data = `<p>Packet width: 20 cm</p><p>Weight: 1.2 kg</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p><p>No more fussing with rubber buckles.</p>`;
let match;
let properties = [];

while ((match = regex.exec(data)) !== null) {
  // A label can be several words, which makes an awkward JS property name,
  // so we'll replace each run of consecutive non-word chars with an underscore.
  let label = match[1].replace(/\W+/g, '_').toLowerCase();
  // Create an object so that we can add the property from the corrected label.
  let entry = {};
  entry[label] = match[2];
  // Put this object in the array of properties found.
  properties.push(entry);
}

console.log(properties);

This would fill properties with this:

[
  {packet_width: "20 cm"},
  {weight: "1.2 kg"}
]
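
One caveat worth knowing: because `[^:]+` also matches `<` and `>`, a paragraph without a colon that comes before an attribute paragraph gets swallowed into the label. Excluding `<` from the label class is one possible fix. This is my own adjustment, not something the pattern above does, and the sample string is made up.

```javascript
const originalRegex = /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;
// Same pattern, except the label may not contain "<", so it cannot cross a tag.
const stricterRegex = /<p>([^:<]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;
const sampleHtml = "<p>Intro text.</p><p>Size: 5 cm</p>";

// Collect the captured label (group 1) of every match.
const labels = (re) => [...sampleHtml.matchAll(re)].map((m) => m[1]);

console.log(labels(originalRegex)); // the label spills across the first paragraph
console.log(labels(stricterRegex)); // the label is just "Size"
```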
Patrick Janser