
Expected Input/Output

  • Input: Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083,
  • Desired output: 5770083 (digits only). From this I will build: {"Movement Number": 5770083}

I believe I will need to run multiple regexes against each string, because I need to know the following:

  • Which title belongs to which value, i.e. movement no. = 5770083, etc.
  • Several different languages may be used for the same title, for example:
    • Movement number variations:
      • Movement no.
      • mouvement signés.Numérotée
      • no
      • MVT
      • jewels #
      • Werk-Nr.

Current regex: /movement no. ([^\s]+)/. With this regex, the trailing comma (,) is captured as well.

The match also needs to be case-insensitive.
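To make the problem concrete, here is a minimal sketch using one line of the English test string below; the helper name `extractMovementNumber` is hypothetical. `[^\s]+` stops only at whitespace, so it keeps the comma, whereas capturing `\d+` keeps only the digits, and the `i` flag makes the label match case-insensitive.

```javascript
// A line taken from the English test string
const sample = "Signed Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941";

// Original pattern: [^\s]+ stops only at whitespace, so the comma is captured too
const original = /movement no. ([^\s]+)/;
console.log(sample.match(original)[1]); // captured value: "5770083,"

// Hypothetical helper: capture digits only, escape the dot, ignore case
function extractMovementNumber(text) {
  const m = text.match(/movement no\. (\d+)/i);
  // Build the desired object, converting the captured digits to a number
  return m ? { "Movement Number": Number(m[1]) } : null;
}

console.log(extractMovementNumber(sample)); // result: { "Movement Number": 5770083 }
```

Returning `null` when nothing matches is a design choice for the sketch; the caller can then skip strings without a movement number.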

Test String

Longines. A very fine and rare stainless steel water-resistant chronograph wristwatch with black dial and original box\nSigned Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941\nCal. 13 ZN nickel-finished lever movement, 17 jewels, the black dial with Arabic numerals, outer railway five minute divisions and tachymetre scale, two subsidiary dials indicating constant seconds and 30 minutes register, in large circular water-resistant-type case with flat bezel, downturned lugs, screw back, two round chronograph buttons in the band, case and movement signed by maker, dial signed by maker and retailer\n37 mm. diam.

Test String (French)

MONTRE BRACELET D'HOMME CHRONOGRAPHE EN OR, PAR LONGINES\n\nDe forme ronde, le cadran noir à chiffres arabes, cadran auxiliaire pour les secondes à neuf heures et totalisateur de minutes à trois heures, mouvement mécanique 13 Z N, vers 1960, poids brut: 44.49 gr., monture en or jaune 18K (750)\n\nCadran Longines, mouvement no. 3872616, fond de boîte no. 5872616\nVeuillez noter que les bracelets de montre pouvant être en cuirs exotiques provenant d'espèces protégées, tels le crocodile, ils ne sont pas vendus avec les montre même s'ils sont exposés avec celles-ci. Christie's devra retirer et conserver ces bracelets avant leur collecte par les acheteur

Jamie Hutber

4 Answers


You can use

\b((?:Movement|mouvement) no\.|mouvement signés\.Numérotée|no|MVT|jewels #|Werk-Nr\.) (\d+)

https://regex101.com/r/thL0wt/1

Start at a word boundary, then, inside a capturing group, alternate between all the phrases that can appear before a number. After that, match a space and capture the digits in a second group. The label ends up in the first capturing group and the number in the second.

const input = `Longines. A very fine and rare stainless steel water-resistant chronograph wristwatch with black dial and original box\nSigned Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941\nCal. 13 ZN nickel-finished lever movement, 17 jewels, the black dial with Arabic numerals, outer railway five minute divisions and tachymetre scale, two subsidiary dials indicating constant seconds and 30 minutes register, in large circular water-resistant-type case with flat bezel, downturned lugs, screw back, two round chronograph buttons in the band, case and movement signed by maker, dial signed by maker and retailer\n37 mm. diam.

MONTRE BRACELET D'HOMME CHRONOGRAPHE EN OR, PAR LONGINES\n\nDe forme ronde, le cadran noir à chiffres arabes, cadran auxiliaire pour les secondes à neuf heures et totalisateur de minutes à trois heures, mouvement mécanique 13 Z N, vers 1960, poids brut: 44.49 gr., monture en or jaune 18K (750)\n\nCadran Longines, mouvement no. 3872616, fond de boîte no. 5872616\nVeuillez noter que les bracelets de montre pouvant être en cuirs exotiques provenant d'espèces protégées, tels le crocodile, ils ne sont pas vendus avec les montre même s'ils sont exposés avec celles-ci. Christie's devra retirer et conserver ces bracelets avant leur collecte par les acheteur`;
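const matches = {};
let match;
// g: find every match; i: ignore case
const pattern = /\b((?:Movement|mouvement) no\.|mouvement signés\.Numérotée|no|MVT|jewels #|Werk-Nr\.) (\d+)/gi;
// exec returns null once there are no more matches
while ((match = pattern.exec(input)) !== null) {
  // match[1] is the label, match[2] the digits after it
  matches[match[1]] = match[2];
  // or, if you only want a single object per match:
  // const obj = { [match[1]]: match[2] };
}
console.log(matches);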
CertainPerformance
  • Fantastic answer, thank you. A bonus-points question: I'm assuming multiple matches work exactly the same way, since I need the digits after each match. I also need to match the case number, so I would just add it to the regex and the while loop will pick it up. – Jamie Hutber Mar 13 '19 at 22:24
  • I clearly don't do enough regex lol but `exec` is very nice indeed sir. Thank you – Jamie Hutber Mar 13 '19 at 22:27
  • Yep, just add `case no\.` to the alternation list – CertainPerformance Mar 13 '19 at 22:27
  • I'll need to procedurally generate the pattern, but that won't be hard, though it confuses me as it's not a string. – Jamie Hutber Mar 13 '19 at 22:29
  • If you can't hard-code all the alternations, then use the RegExp constructor instead https://stackoverflow.com/questions/494035/how-do-you-use-a-variable-in-a-regular-expression – CertainPerformance Mar 13 '19 at 22:30
  • Yep, nice. Thank you again. This is going to be one very large string lol :) – Jamie Hutber Mar 13 '19 at 22:32
  • Last thing from me: I imagine I can't split this into easier-to-read text and put things on new lines without it affecting the regex? I guess I will need to use the RegExp constructor if I wish to do it this way. – Jamie Hutber Mar 13 '19 at 22:33
  • If you know the alternations in advance, you can put each on a new line by simulating the `x` modifier other engines have, [like this](https://stackoverflow.com/questions/15463257/commenting-regular-expressions#answer-53925033). If you don't know them in advance, and have to generate the phrases dynamically, then using the constructor is the only option. (make sure to [escape](https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript) special characters first). – CertainPerformance Mar 13 '19 at 22:55
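A sketch of the constructor approach discussed in these comments, assuming a hypothetical `labels` map and a hypothetical `escapeRegExp` helper (a built-in `RegExp.escape` is not available in every runtime, so special characters are escaped by hand). Each label is escaped, the alternation is assembled as a string and passed to `new RegExp`, and the matched label is mapped to a canonical key so every language variant lands under the same name, including `case no.` as suggested above.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical mapping from lowercase label variants to canonical output keys
const labels = {
  "movement no.": "Movement Number",
  "mouvement no.": "Movement Number",
  "werk-nr.": "Movement Number",
  "case no.": "Case Number",
};

// Longer labels first so a shorter label cannot shadow a longer one
const alternation = Object.keys(labels)
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");

// Built from a string, so the backslashes in \b and \d must be doubled
const pattern = new RegExp("\\b(" + alternation + ") (\\d+)", "gi");

function extractNumbers(text) {
  const result = {};
  // matchAll needs the g flag and iterates over every match
  for (const m of text.matchAll(pattern)) {
    // Look up the canonical key using the lowercased label that matched
    const key = labels[m[1].toLowerCase()];
    result[key] = Number(m[2]);
  }
  return result;
}

const text = "retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941";
console.log(extractNumbers(text)); // result: { "Movement Number": 5770083, "Case Number": 46 }
```

Keeping the labels in a plain object means new languages can be supported by adding entries, without touching the regex itself.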

For movement no. specifically, you'll want this regex to get rid of the comma:

movement no. ([^\s\W]+)

Regarding the languages, the only way I can think of is a set of if statements, each testing the appropriate term, unless the RegExp object allows for string substitution. Sorry for not being more help in that area.

  • And then I just loop through all the different possible options ` Movement no. mouvement signés.Numérotée no MVT jewels # Werk-Nr.` basically? – Jamie Hutber Mar 13 '19 at 22:16
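A minimal sketch of the loop-over-variants idea from this answer and the follow-up comment, assuming a hypothetical `variants` list and `escapeRegExp` helper. Instead of a chain of if statements, each variant becomes its own case-insensitive `RegExp` and is tried in turn, and the first match wins. It captures with `\w+`, which matches the same characters as `[^\s\W]+` (word characters are never whitespace), so the trailing comma is left out.

```javascript
// Hypothetical list of label variants from the question, checked in order
const variants = ["Movement no.", "mouvement no.", "mouvement signés.Numérotée", "MVT", "Werk-Nr."];

// Escape regex metacharacters such as "." so each label is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Try each variant in turn; \w+ stops before the comma, like [^\s\W]+
function findMovementNumber(text) {
  for (const variant of variants) {
    const re = new RegExp(escapeRegExp(variant) + " (\\w+)", "i");
    const m = text.match(re);
    if (m) {
      return m[1];
    }
  }
  return null; // no variant matched
}

console.log(findMovementNumber("ref. 22127, movement no. 5770083, case no. 46"));
```

Building one regex per variant keeps each case simple, at the cost of scanning the string once per variant; a single alternation does the same work in one pass.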

You are using the negated character class [^\s]+, which matches everything except whitespace. So if there's another character you don't want to match, e.g. the comma (,), add it to the class: [^\s,]+.

You can follow the same logic for any other character you don't want to match.
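A small sketch of this suggestion, assuming a line from the question's sample string: adding the comma to the negated class stops the capture before it, and the same idea extends to other characters (the extra `;` and `)` here are hypothetical examples).

```javascript
const sample = "Zurich, ref. 22127, movement no. 5770083, case no. 46";

// Anything except whitespace: the comma is included in the capture
const withComma = sample.match(/movement no\. ([^\s]+)/i)[1];
// Anything except whitespace or a comma: capture stops before the comma
const withoutComma = sample.match(/movement no\. ([^\s,]+)/i)[1];
// Same logic for more characters: also exclude ";" and ")"
const stricter = "movement no. 5770083);".match(/movement no\. ([^\s,;)]+)/i)[1];

console.log(withComma, withoutComma, stricter);
```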

Michał Turczyn
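var input = "Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083";
// (?<=...) is a lookbehind (ES2018+): the digits match only when preceded by "movement no. "
var output = input.match(/(?<=movement no\. )\d+/); // output[0] is "5770083"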
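The same lookbehind idea can collect every movement number from both test strings at once. A sketch, assuming a runtime with lookbehind (ES2018) and `String.prototype.matchAll` (ES2020); the `(?:movement|mouvement)` alternation inside the lookbehind is an addition for the French text, not part of the answer above.

```javascript
const english = "Signed Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941";
const french = "Cadran Longines, mouvement no. 3872616, fond de boîte no. 5872616";

// Lookbehind: match digits only when preceded by "movement no. " or "mouvement no. "
const pattern = /(?<=\b(?:movement|mouvement) no\. )\d+/gi;

// matchAll yields match arrays; m[0] is the matched digits (the lookbehind is not part of it)
const numbers = [english, french].flatMap((s) => [...s.matchAll(pattern)].map((m) => m[0]));

console.log(numbers);
```

Because the lookbehind is not consumed, `m[0]` already holds only the digits, so no capturing group is needed.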
oleedd