You need to cancel any match that is preceded by a digit, or by a digit followed by a period.
Add (?<!\d)(?<!\d\.) either after or before the first lookbehind:
(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])(?<!\d)(?<!\d\.)(\d+(?:\.\d+)?m2)
The (?<!\d) is a negative lookbehind that fails the match if there is a digit immediately to the left of the current location, and (?<!\d\.) fails it when a digit followed by a dot appears right before it.
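To make the effect of the two digit lookbehinds concrete, here is a small sketch that runs the pattern with and without them on one of the sample strings from the list in this answer. Without them, the engine rejects "8" (it follows 庫), then retries at "9" (it follows "."), so a partial "9m2" slips through.

```javascript
// Sample string taken from the list used in the demos of this answer.
const sample = "うち2階車庫8.9m2";

// Only the Japanese-char lookbehind: "8" is rejected, but the engine
// retries at "9" (preceded by "."), so the partial "9m2" is matched.
const withoutDigitChecks = /(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])(\d+(?:\.\d+)?m2)/g;

// Adding (?<!\d) and (?<!\d\.) blocks matches that start mid-number.
const withDigitChecks = /(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])(?<!\d)(?<!\d\.)(\d+(?:\.\d+)?m2)/g;

console.log(sample.match(withoutDigitChecks)); // [ '9m2' ]
console.log(sample.match(withDigitChecks));    // null
```

Full numbers that are not preceded by a Japanese char, such as both values in "110.94m2・129.24m2", are still matched by the second regex.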
The \d+(?:\.\d+)? is a more precise pattern for numbers like 30 or 30.5678: one or more digits followed by an optional sequence of a . and one or more digits.
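A quick way to see what the number part accepts on its own is to anchor it and test a few strings. The ^ and $ anchors and the sample inputs below are added only for this illustration; they are not part of the answer's regex.

```javascript
// ^ and $ are added only so that whole strings are tested against the number pattern.
const numberOnly = /^\d+(?:\.\d+)?$/;
const inputs = ["30", "30.5678", "30.", ".5", "3.0.1"];

// Integers and decimals with digits on both sides of a single "." pass.
const results = inputs.map(s => numberOnly.test(s));
inputs.forEach((s, i) => console.log(s, results[i]));
// 30 true, 30.5678 true, 30. false, .5 false, 3.0.1 false
```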
NOTE that this regex only works in JS environments that support ES2018 lookbehind assertions (e.g. Chrome, Node). In older environments, you may capture an optional Japanese char into Group 1 and the number into Group 2, then check whether Group 1 matched: if it did, discard the match; otherwise, take Group 2.
The regex is
/([\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])?(\d+(?:\.\d+)?m2)/g
See usage example below.
JS ES2018+ demo:
const lst = ["110.94m2・129.24m2", "81.95m2(24.78坪)、うち2階車庫8.9m2", "80.93m2(登記)", "93.42m2・93.85m2(登記)", "81.82m2(実測)" , "81.82m2(実測)、うち1階車庫7.82m2", "90.11m2(実測)、うち1階車庫8.07m2"];
const regex = /(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])(?<!\d)(?<!\d\.)(\d+(?:\.\d+)?m2)/g;
lst.forEach(s =>
  console.log(s, '=>', s.match(regex))
);
console.log("Another approach:");
lst.forEach(s =>
  console.log(s, '=>', s.match(/(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2/gu))
);
JS legacy ES versions (no lookbehind support):
var lst = ["110.94m2・129.24m2", "81.95m2(24.78坪)、うち2階車庫8.9m2", "80.93m2(登記)", "93.42m2・93.85m2(登記)", "81.82m2(実測)" , "81.82m2(実測)、うち1階車庫7.82m2", "90.11m2(実測)、うち1階車庫8.07m2"];
var regex = /([\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])?(\d+(?:\.\d+)?m2)/g;
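for (var i = 0; i < lst.length; i++) {
  var m, res = [];
  // exec() with the /g flag walks through every match in the string
  while ((m = regex.exec(lst[i])) !== null) {
    // Group 1 is set only when a Japanese char precedes the number: skip those
    if (m[1] === undefined) {
      res.push(m[2]);
    }
  }
  console.log(lst[i], '=>', res);
}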
Variations
If you plan to match an int/float number followed by m2 only when it is preceded by whitespace or is at the start of the string, use
(?<!\S)\d+(?:\.\d+)?m2
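A minimal JS (ES2018+) sketch of the whitespace variant, using a made-up sample string. No digit lookbehinds are needed here: a digit or a period is itself a non-whitespace char, so (?<!\S) already rejects matches that start mid-number.

```javascript
// (?<!\S): the number must be at the start of the string or follow whitespace.
const wsOnly = /(?<!\S)\d+(?:\.\d+)?m2/g;

// Made-up sample: "7m2" follows "x", so it is skipped.
const wsSample = "12.5m2 area, x7m2, 30m2";
console.log(wsSample.match(wsOnly)); // [ '12.5m2', '30m2' ]
```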
If you plan to match it only when it is not preceded by any letter (nor by a digit, or a digit and a period), use
- PCRE, Java:
(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2
(also works in JS ES2018+ environments: /(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2/gu)
- Python (the re module does not support \p{L}, so [^\W_] stands in for "any letter or digit"):
(?<!\d\.)(?<![^\W_])\d+(?:\.\d+)?m2
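The JS form of the letter-based variant can be tried on a made-up string that mixes Latin and Japanese text. Since \p{L} covers any Unicode letter, it rejects numbers glued to Latin letters as well as to CJK chars, which the range-based regex at the top does not.

```javascript
// Fails if the number follows any Unicode letter, a digit, or a digit + ".".
const notAfterLetter = /(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2/gu;

// Made-up sample: "12m2" follows "a", "7.82m2" follows 庫; both are skipped.
const letterSample = "Area12m2, floor 8.9m2, 車庫7.82m2, (24.5m2)";
console.log(letterSample.match(notAfterLetter)); // [ '8.9m2', '24.5m2' ]
```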
Note that you may add a \b word boundary after the 2 in m2 to make sure it is followed by a non-word char or the end of the string.
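Here is a sketch of that trailing \b in JS, applied to the letter-based variant with a made-up sample. Keep in mind that in JS, \b only treats ASCII letters, digits and _ as word chars (even with the u flag), whereas Python's re treats Unicode letters as word chars, so a following Japanese char counts as a boundary in JS.

```javascript
// Same "not after a letter" pattern, without and with a trailing \b.
const noBoundary = /(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2/gu;
const withBoundary = /(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2\b/gu;

// Made-up sample: "12m2x" has a word char right after "m2".
const boundarySample = "8.9m2 ok, 12m2x, 30m2.";
console.log(boundarySample.match(noBoundary));   // [ '8.9m2', '12m2', '30m2' ]
console.log(boundarySample.match(withBoundary)); // [ '8.9m2', '30m2' ]
```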