Given input:
- string
"home/products/product_name_1/details/some_options"
Expected output:
- array
["home", "products", "product", "details", "some"]
- Note: ignore/exclude name, 1 and options (because each occurs after the 1st underscore of its path-segment).
Task:
- split URI by slash into a set of path-segments (words)
- (if the path-segment or word contains underscores) remove the part after first underscore
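The two steps of the task can be sketched without any regex, using plain string splitting. This is a minimal illustration only; the helper name firstWordsOfSegments is hypothetical and not part of the question.

```javascript
// Hypothetical helper (name chosen for illustration only).
function firstWordsOfSegments(path) {
  return path
    .split('/')                            // step 1: split the path into path-segments
    .map(segment => segment.split('_')[0]) // step 2: keep only the part before the first underscore
    .filter(word => word !== '');          // drop segments that end up empty
}

const taskResult = firstWordsOfSegments("home/products/product_name_1/details/some_options");
console.log(taskResult); // ["home", "products", "product", "details", "some"]
```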
Regex to match
With the regex \/|_\w+ you can match the URL-path separator (slash) and the excluded word-part (everything from the first underscore to the end of the path-segment).
Then use this regex
- either as separator to split the string into its parts (excluding the regex matches), e.g. in JS: split(/\/|_\w+/)
- or as search-pattern in replace to prepare a string that can be easily split, e.g. in JS: replaceAll(/\/|_\w+/g, ',') to obtain a CSV row, which can then be split by comma with split(',')
Beware: The regular-expression itself (flavor) and functions to apply it depend on your environment/regex-engine and script-/programming-language.
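To see what the pattern actually matches in JavaScript, one option is to list all matches with matchAll. This is a small sketch; the variable names are illustrative.

```javascript
const pattern = /\/|_\w+/g; // matchAll requires the global flag
const input = "home/products/product_name_1/details/some_options";

// Collect the matched text of every occurrence.
const matches = [...input.matchAll(pattern)].map(m => m[0]);
console.log(matches); // ["/", "/", "_name_1", "/", "/", "_options"]
```

Everything listed here is what gets cut out (when splitting) or replaced (when using replaceAll).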
Regex applied in JavaScript
split by regex
For example, in JavaScript use url.split(/\/|_\w*/), where:
- /pattern/ : everything between the slashes is the regex pattern
- \/ : an escaped slash (the URL-path separator)
- | : alternation, interpreted as a boolean OR
- _\w* : an underscore followed by zero or more (*) word characters (\w, i.e. a letter of the alphabet, a numeric digit or an underscore)
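The difference between \w+ (one or more) and \w* (zero or more) only shows up when a path-segment contains an underscore with nothing after it. A short sketch with a made-up input:

```javascript
const lone = "a/_/b"; // hypothetical input: one segment is just an underscore

const withPlus = lone.split(/\/|_\w+/); // "_" alone does not match _\w+, so it survives
const withStar = lone.split(/\/|_\w*/); // "_" alone matches _\w*, leaving empty strings

console.log(withPlus); // ["a", "_", "b"]
console.log(withStar); // ["a", "", "", "b"]
```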
However, url.split(/\/|_\w*/) also returns empty strings (the empty remainders left behind wherever an underscore part was cut off a path-segment). We can remove the empty strings with filter(s => s), where the predicate s => s is truthy only if the string is non-empty.
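A quick look at the intermediate result shows where the empty strings come from. As an aside, filter(Boolean) is an equivalent spelling of filter(s => s); the variable names below are illustrative.

```javascript
const rawParts = "home/products/product_name_1/details/some_options".split(/\/|_\w*/);
console.log(rawParts); // ["home", "products", "product", "", "details", "some", ""]

const viaArrow = rawParts.filter(s => s);    // keeps only truthy (non-empty) strings
const viaBoolean = rawParts.filter(Boolean); // same effect, using Boolean as the predicate
console.log(viaArrow); // ["home", "products", "product", "details", "some"]
```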
Demo to solve your task:
const url = "home/products/product_name_1/details/some_options";
// split on "/" or on "_" plus following word characters, then drop the empty strings
let firstWordsInSegments = url.split(/\/|_\w*/).filter(s => s);
console.log(firstWordsInSegments); // ["home", "products", "product", "details", "some"]
const urlDuplicate = "home/products/product_name_1/details/some_options/_/home";
console.log(urlDuplicate.split(/\/|_\w*/).filter(s => s)); // "home" occurs twice in the output array
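If the duplicates are unwanted (the question does not say so, this is an assumption), one possible way to drop them is to pass the array through a Set, which keeps the first occurrence of each value in insertion order:

```javascript
const withDuplicates = "home/products/product_name_1/details/some_options/_/home"
  .split(/\/|_\w*/)
  .filter(s => s);

// A Set stores each value once; spreading it yields an array again.
const uniqueWords = [...new Set(withDuplicates)];
console.log(uniqueWords); // ["home", "products", "product", "details", "some"]
```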
replace into CSV, then split and exclude (map, replace, filter)
The CSV containing the path-segments can be split by comma, and the resulting parts (path-segments) can be filtered or replaced to exclude unwanted sub-parts,
using:
- replaceAll to transform to CSV, or to strip unwanted parts by replacing them with an empty string. Note: the global flag (g) is required when calling replaceAll with a regex
- map to remove the unwanted part after the underscore from each path-segment
- filter(s => s) to filter out empty strings
const url = "home/products/product_name_1/details/some_options";
// step by step
let pathSegments = url.split('/');
console.log('pathSegments:', pathSegments);
let firstWordsInSegments = pathSegments.map(s => s.replaceAll(/_\w*/g,''));
console.log(firstWordsInSegments);
// replace to obtain CSV and then split
// (\w* rather than \w+, so that the lone "_" segment is replaced as well)
let csv = "home/products/product_name_1/details/some_options/_/home".replaceAll(/\/|_\w*/g, ',');
console.log('csv:', csv);
let parts = csv.split(',');
console.log('parts:', parts); // contains empty parts
let nonEmptyParts = parts.filter(s => s);
console.log('nonEmptyParts:', nonEmptyParts); // filtered out empty parts
Bonus Tip
Try your regex online (e.g. regex101 or regexplanet). See the demo on regex101.