You could solve this with a single regular expression that finds every paragraph containing "a label: digits unit".
The regex could be something like this: /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g
Test it here: https://regex101.com/r/Q3ng6U/1/
Explanation:
<p> searches for the opening paragraph tag. If the paragraphs may carry attributes such as style, id or class, replace it with <p[^>]*>, where [^>] means any character other than ">" and * means "repeated zero or more times".
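A quick check of the attribute-tolerant variant, as a sketch (the sample HTML is made up for illustration). One caveat: <p[^>]*> also accepts other tags that merely start with "p", such as <pre>; <p(?:\s[^>]*)?> is a stricter alternative if that matters.

```javascript
// The answer's regex with <p> replaced by <p[^>]*>, so attributes are allowed.
const withAttrs = /<p[^>]*>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;

// Hypothetical input: paragraphs carrying class and id attributes.
const html = '<p class="spec">Packet width: 20 cm</p><p id="w">Weight: 1.2 kg</p>';

// matchAll (g flag required) yields each match; keep only label and value.
const found = [...html.matchAll(withAttrs)].map(m => [m[1], m[2]]);
console.log(found); // [ [ 'Packet width', '20 cm' ], [ 'Weight', '1.2 kg' ] ]
```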
([^:]+) captures the label: one or more characters that are not a colon.
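Two side effects of ([^:]+) are worth knowing. It is greedy, so any space written before the colon ends up in the label. It also accepts "<" and ">", so if a paragraph without a colon comes before a matching one, the label can stretch across both. The sample data in the question puts the matching paragraphs first, so it is not affected. The [^:<]+ version below is my own suggestion, not part of the regex above (the regex101 link uses the original).

```javascript
// Same pattern as in the answer, without the g flag (one match is enough here).
const labelRe = /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/;

// 1) Greedy capture: the space before the colon stays in the label.
const spaced = '<p>Weight : 1.2 kg</p>'.match(labelRe);
console.log(JSON.stringify(spaced[1])); // "Weight "

// 2) [^:] also matches "<" and ">", so the label runs across the first paragraph.
const twoParas = '<p>Hello</p><p>Weight: 1.2 kg</p>';
console.log(twoParas.match(labelRe)[1]); // Hello</p><p>Weight

// Excluding "<" from the label keeps it inside a single paragraph.
const strictRe = /<p>([^:<]+)\s*:\s*([\d.]+\s+\w+)<\/p>/;
console.log(twoParas.match(strictRe)[1]); // Weight
```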
\s* means whitespace (spaces, tabs, line breaks, etc.) zero or more times.
:\s* means the colon itself, followed by optional whitespace.
[\d.]+ means one or more characters that are each a digit or a dot, so that decimal values such as "1.3 m" are accepted.
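A small check of what [\d.]+ accepts. Because it is just a set of allowed characters, it also accepts oddities like "1.2.3" or a lone "."; the \d+(?:\.\d+)? pattern here is a hypothetical stricter alternative, not something the answer uses.

```javascript
// [\d.]+ : one or more characters, each a digit or a literal dot
// (inside a character class the dot needs no escaping).
const looseRe = /^[\d.]+$/;
// Stricter: digits, optionally followed by a single dot and more digits.
const decimalRe = /^\d+(?:\.\d+)?$/;

const samples = ['20', '1.3', '1.2.3', '.'];
for (const s of samples) {
  console.log(s, looseRe.test(s), decimalRe.test(s));
}
// 20 true true
// 1.3 true true
// 1.2.3 true false
// . true false
```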
([\d.]+\s+\w+) captures the quantity and its unit, but only when they are separated by at least one whitespace character. If you may also get "20kg" instead of "20 kg", replace \s+ with \s*. You may then need to split the value again to re-insert a space, so that all your properties are formatted the same way.
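A sketch of the \s* variant with the space re-inserted afterwards (the sample string is hypothetical). One caveat: since \w also matches digits, \s* lets a unitless number such as "2024" match too (split as "202" + "4"), so a letters-only unit like [a-zA-Z]+ may be safer if that can occur in your data.

```javascript
// \s* instead of \s+: "20cm" and "20 cm" are both accepted.
const flexible = /<p>([^:]+)\s*:\s*([\d.]+\s*\w+)<\/p>/g;

const sample = '<p>Packet width: 20cm</p><p>Weight: 1.2 kg</p>';

const values = [...sample.matchAll(flexible)].map(m =>
  // Split number and unit again, then join them with exactly one space.
  m[2].replace(/^([\d.]+)\s*(\w+)$/, '$1 $2')
);
console.log(values); // [ '20 cm', '1.2 kg' ]
```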
<\/p> is the closing paragraph tag. The slash is escaped because, in a JavaScript regex literal, / marks the beginning and the end of the expression.
The g flag at the end makes the regular expression find all matches instead of stopping at the first one.
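To make the effect of g concrete, here is the same pattern with and without it. The last part shows why the while loop in the code below works: with g, exec() remembers where it stopped (lastIndex), so each call returns the next match and finally null.

```javascript
const text = '<p>Packet width: 20 cm</p><p>Weight: 1.2 kg</p>';

// Without g: match() stops at the first match and also returns its groups.
const firstOnly = text.match(/<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/);
console.log(firstOnly[1], '=', firstOnly[2]); // Packet width = 20 cm

// With g: match() returns every full match (capture groups are not included).
const every = text.match(/<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g);
console.log(every.length); // 2

// With g, repeated exec() calls walk through the matches one by one.
const stepper = /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;
console.log(stepper.exec(text)[1]); // Packet width
console.log(stepper.exec(text)[1]); // Weight
console.log(stepper.exec(text));    // null
```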
Now, for the JavaScript code, you could do something like this:
const regex = /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;
const data = `<p>Packet width: 20 cm</p><p>Weight: 1.2 kg</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p><p>No more fussing with rubber buckles.</p>`;
const properties = [];
let match;
// With the g flag, each exec() call continues after the previous match.
while ((match = regex.exec(data)) !== null) {
  // A label can be several words ("Packet width"), which cannot be used as a
  // property name with dot notation, so replace every run of consecutive
  // non-word chars with a single underscore. trim() first drops any space
  // captured before the colon.
  const label = match[1].trim().replace(/\W+/g, '_').toLowerCase();
  // Create an object so that we can add the property under the corrected label.
  const entry = {};
  entry[label] = match[2];
  // Put this object in the array of properties found.
  properties.push(entry);
}
console.log(properties);
This fills properties with:
[
{packet_width: "20 cm"},
{weight: "1.2 kg"}
]
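If a single object is more convenient than an array of one-key objects, a possible variant (my own suggestion, assuming a runtime with String.prototype.matchAll and Object.fromEntries, i.e. ES2020) builds it in one expression:

```javascript
// Same regex as in the answer (matchAll requires the g flag).
const propRe = /<p>([^:]+)\s*:\s*([\d.]+\s+\w+)<\/p>/g;
const source = '<p>Packet width: 20 cm</p><p>Weight: 1.2 kg</p>' +
  '<p>Allows you to collect your hair easily.</p>';

// Each match becomes a [key, value] pair; Object.fromEntries turns the
// list of pairs into one object.
const props = Object.fromEntries(
  [...source.matchAll(propRe)].map(([, label, value]) => [
    label.trim().replace(/\W+/g, '_').toLowerCase(),
    value,
  ])
);
console.log(props); // { packet_width: '20 cm', weight: '1.2 kg' }
```

One difference to keep in mind: if the same label appears twice, the later value overwrites the earlier one here, whereas the array version keeps both.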