1

Look at this example:

string str = "January 19934";

The outcome should be

Jan 1993

I think I have the right regex for this case, ([A-z]{3}).*([\d]{4}), but I do not know what to do next.

How can I extract what I am looking for using regex? Is there a way to receive two variables, the first being the result of the first bracket, ([A-z]{3}), and the second being the result of the second bracket, ([\d]{4})?

Lightness Races in Orbit
Zesa Rex
  • Show your actual [MCVE] please. Include regex usage. And lookup "capturing groups". – Lightness Races in Orbit Feb 15 '17 at 16:15
  • I cannot edit just 1 character, but i guess the 19934 is a typo. – KOB Feb 15 '17 at 16:20
  • On a side note, where is the input coming from? If it's user-entered, it could be either 1993 or 1994 (or maybe 1934), humans are unpredictable. – zenzelezz Feb 15 '17 at 16:21
  • It only takes about five lines of code to write a parser that handles this directly. There's no need for regular expressions here. – Pete Becker Feb 15 '17 at 20:31
  • @PeteBecker I guess you are right, I wanted to know a solution using RegEx though – Zesa Rex Feb 17 '17 at 08:10
  • @KOB actually "19934" was correct, because "1994" would have been too easy to use RegEx on. Users sometimes mistype their input, and it is not bad to allow for that possibility in your code – Zesa Rex Feb 17 '17 at 08:11

2 Answers

3

Your regex contains a common typo: [A-z] matches more than just ASCII letters, because the range also covers the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII. Also, the greedy .* grabs the whole rest of the string, and backtracking then makes ([\d]{4}) match the last 4 digits. Use a lazy quantifier with the dot instead: .*?

Then use regex_search and concatenate the two group values:

#include <iostream>
#include <regex>
#include <sstream>  // needed for std::stringstream
#include <string>
using namespace std;

int main() {
    regex r("([A-Za-z]{3}).*?([0-9]{4})");
    string s("January 19934");
    smatch match;
    std::stringstream res("");
    if (regex_search(s, match, r)) {
        res << match.str(1) << " " << match.str(2);
    }
    cout << res.str();  // => Jan 1993
    return 0;
}


Pattern explanation:

  • ([A-Za-z]{3}) - Group 1: three ASCII letters
  • .*? - zero or more characters other than line breaks, as few as possible
  • ([0-9]{4}) - Group 2: 4 digits
Wiktor Stribiżew
  • The output isn't "Jan 9934"? – KOB Feb 15 '17 at 16:24
  • @KOB: *The Outcome should be `Jan 1993`* – Wiktor Stribiżew Feb 15 '17 at 16:25
  • Instead of a non-greedy quantifier, maybe it is better to use `\D*` rather than `.*?` – Slava Feb 15 '17 at 16:26
  • @Slava: It depends on what kind of strings are expected. I suggested the minimal fix. If there are other digits, say, 1 or 2 or 3 digit chunks before the year, using `\D*` will prevent the match. – Wiktor Stribiżew Feb 15 '17 at 16:27
  • @WiktorStribiżew your example is working, do you know why it is not working anymore when I use Userinput string as `s` ? No Output is being shown even though the String should be the same as "January 19934" – Zesa Rex Feb 17 '17 at 08:16
  • Are you sure it is `std::string`? Check the [`regex_search` signatures](http://en.cppreference.com/w/cpp/regex/regex_search) and check what type of data you pass to `regex_search`. – Wiktor Stribiżew Feb 17 '17 at 08:19
  • @WiktorStribiżew yes i am using `std::string s;` `cin >> s;` instead of `string s("January 19934");` , I have tried debugging to see what exactly is passed to regex_search with Watches in CodeBlocks, couldnt find anything though – Zesa Rex Feb 17 '17 at 09:48
  • Aha, that is unrelated to the current question and the answer is [already provided here](http://stackoverflow.com/questions/5838711/stdcin-in-input-with-spaces). – Wiktor Stribiżew Feb 17 '17 at 09:52
  • @WiktorStribiżew oh god I am so stupid, forgot that cin cuts off whitespaces with string, need to use getline, thank you! – Zesa Rex Feb 17 '17 at 11:04
2

This could work.

([A-Za-z]{3})([a-z ])+([\d]{4})

Note that the space after a-z is important: it lets the class match the space between the month and the year.

bobble bubble
Drako
  • Why the quantified group in middle? How about just `([A-Za-z]{3})[a-z ]+([\d]{4})` – bobble bubble Feb 15 '17 at 16:35
  • You are right @bobblebubble I might have some idea to make it group yesterday but today I even can not remember my reasoning so its sign - that was bad idea :) – Drako Feb 16 '17 at 10:16