To avoid false matches on titles that merely contain v{num} or c{num} inside a word, I think you want something like this:
(\bc\d+)|\bv\d+(c\d+)
will capture chapters, and
(\bv\d+)|\bc\d+(v\d+)
will capture volumes.
EDIT: To capture partial chapters like c2.5, simply replace \d+
with a slightly modified regex that matches floating-point numbers: (?:[0-9]*[.])?[0-9]+
Each pattern looks for a word boundary followed by the letter (c or v) and then digits, OR, in a case like v1c3, it looks for the other prefix (with its number) immediately followed by the part you want, which is captured in the second group.
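To see what the floating-point pattern from the edit accepts on its own, here is a small sketch that anchors it with ^ and $ (the anchors and the isFloat helper name are my additions for illustration, not part of the regexes above):

```javascript
// The number pattern from the edit, anchored so the whole string must match.
const FLOAT_ONLY = /^(?:[0-9]*[.])?[0-9]+$/;

// Hypothetical helper, only for this demonstration.
const isFloat = (s) => FLOAT_ONLY.test(s);

console.log(isFloat('19'));  // true: plain integer
console.log(isFloat('2.5')); // true: digits, dot, digits
console.log(isFloat('.5'));  // true: the leading digits are optional
console.log(isFloat('2.'));  // false: a dot must be followed by a digit
```

Without the anchors, a title like c2. would still match, but only the c2 part would be captured.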
Here are some examples:
const inputs = [
  'hello v2c19 lorem',
  'hello v2.5 c19 lorem',
  'hello c19 lorem',
  'v8 hello c19 lorem',
  'hello lorem c01',
  'novolume nav123',
  'hello noch123pter',
];

// Run the regex and return whichever capture group matched, or null.
const find = (str, regex) => {
  let res = null;
  const match = regex.exec(str);
  if (match) {
    // Group 1: standalone form (e.g. "c19"); group 2: joined form (e.g. the "c19" in "v2c19").
    res = match[1] || match[2];
  }
  return res;
};

// Integer or decimal number, e.g. "19" or "2.5".
const FLOAT = `(?:[0-9]*[.])?[0-9]+`;
const vRE = new RegExp(`(\\bv${FLOAT})|\\bc${FLOAT}(v${FLOAT})`);
const cRE = new RegExp(`(\\bc${FLOAT})|\\bv${FLOAT}(c${FLOAT})`);

const output = inputs.map((title) => {
  const chapter = find(title, cRE);
  const volume = find(title, vRE);
  return {
    title,
    chapter,
    volume
  };
});

console.log(output);
It's possible to combine these into a single regex that covers every combination (chapter only, volume only, chapter then volume, volume then chapter, and so on), but that gets confusing fast, and these two regexes are simple enough to do the job.
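For comparison, here is a rough sketch of what one combined pattern might look like. It is only an illustration under my own assumptions: the names NUM, combinedRE and parseCombined are made up for this example, and it only handles the two markers when they are adjacent or separated by whitespace.

```javascript
// Same number pattern as above: integer or decimal.
const NUM = '(?:[0-9]*[.])?[0-9]+';

// Groups 1 and 2: volume, optionally followed by a chapter.
// Groups 3 and 4: chapter, optionally followed by a volume.
const combinedRE = new RegExp(
  `\\b(v${NUM})(?:\\s*(c${NUM}))?|\\b(c${NUM})(?:\\s*(v${NUM}))?`
);

// Hypothetical helper: returns both parts from a single match.
const parseCombined = (title) => {
  const m = combinedRE.exec(title);
  if (!m) {
    return { volume: null, chapter: null };
  }
  return {
    volume: m[1] || m[4] || null,
    chapter: m[2] || m[3] || null,
  };
};

console.log(parseCombined('hello v2c19 lorem'));  // { volume: 'v2', chapter: 'c19' }
console.log(parseCombined('hello c19 lorem'));    // { volume: null, chapter: 'c19' }
console.log(parseCombined('v8 hello c19 lorem')); // { volume: 'v8', chapter: null }
```

Note the last case: because other words sit between v8 and c19, the single pattern stops after the volume and misses the chapter. Covering that as well needs even more alternation, which is the kind of complexity the two separate regexes avoid.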