33

I have a vector containing strings that follow the format text_number-number.

E.g.: Example_45-3

I only want the first number (45 in the example) and nothing else, which I am able to do with my current code:

std::vector<std::string> imgNumStrVec;
for(size_t i = 0; i < StrVec.size(); i++){
    std::vector<std::string> seglist;
    std::stringstream ss(StrVec[i]);
    std::string seg, seg2;
    while(std::getline(ss, seg, '_')) seglist.push_back(seg);
    std::stringstream ss2(seglist[1]);
    std::getline(ss2, seg2, '-');
    imgNumStrVec.push_back(seg2); 
}

Are there more streamlined and simpler ways of doing this, and if so, what are they?

I ask purely out of a desire to learn how to code better. At the end of the day, the code above does successfully extract just the first number, but it seems long-winded and roundabout.

fakeaccount
  • 10
    Did you think about regex? – senfen May 06 '15 at 10:22
  • @senfen I agree. If there aren't performance constraints (i.e. something like ten million times per second), just use a [one line regex](http://stackoverflow.com/a/30083146/52074). – Trevor Boyd Smith May 06 '15 at 17:04
  • 1
    @Trevor Boyd Smith, if we have to parse that a million times per second, streams are also a bad idea :) – senfen May 07 '15 at 06:29
  • Yay for solving homework questions... Next time look up [documentation](http://en.cppreference.com/w/cpp/string/basic_string). It's all there. – rr- May 07 '15 at 07:11
  • 2
    @rr- I'm teaching myself actually, so I'm sorry if this simple question has annoyed you, but as it's something I had no knowledge of prior to this week (shown by the long-winded code above), I felt I should ask. – fakeaccount May 07 '15 at 09:40

8 Answers

28

You can also use the built-in find_first_of and find_first_not_of to find the first "numberstring" in any string.

    std::string first_numberstring(std::string const & str)
    {
      char const* digits = "0123456789";
      std::size_t const n = str.find_first_of(digits);
      if (n != std::string::npos)
      {
        std::size_t const m = str.find_first_not_of(digits, n);
        return str.substr(n, m != std::string::npos ? m-n : m);
      }
      return std::string();
    }
user120242
Pixelchemist
  • But this would be less efficient. – Lingxi May 06 '15 at 10:45
  • 1
    @Lingxi: Compared to what? The `find("_")` + `find("-")` (with `"` or `'` in place) implies several things about the string: It must include `_` and `-` && `find("-") > find("_")`. My code does not restrict the layout of the string at all. – Pixelchemist May 06 '15 at 10:58
  • Granted this is a more general answer (and you can't really fault that), but Lingxi is correct in regards to the question in hand - "I have a `vector` containing `strings` that follow the format of `text_number-number`" – fakeaccount May 06 '15 at 11:00
  • @fakeaccount I guess that the performance difference between the two approaches would not be easily measurable without setting up a specific benchmark with a high number of consecutive executions. In my opinion this answer is a better match for the OP's "desire to learn how to code better", since I'd say that generic, clear and still efficient code is "better", and preferable to an implementation fitted to a specific corner case. – Pixelchemist May 06 '15 at 11:17
18

This should be more efficient than Ashot Khachatryan's solution. Note the use of '_' and '-' instead of "_" and "-", and also the starting position of the search for '-'.

inline std::string mid_num_str(const std::string& s) {
    std::string::size_type p  = s.find('_');
    std::string::size_type pp = s.find('-', p + 2); 
    return s.substr(p + 1, pp - p - 1);
}

If you need a number instead of a string, like what Alexandr Lapenkov's solution has done, you may also want to try the following:

inline long mid_num(const std::string& s) {
    return std::strtol(&s[s.find('_') + 1], nullptr, 10);
}
anatolyg
Lingxi
  • 3
    This solution only works for a very SPECIFIC pattern... i.e. the number MUST have a '_' character before it and a '-' character following the number. Strings like "asdfasdf 45 sdfsdf" or "X = 45, sdfsdf" would not work. – Trevor Boyd Smith May 06 '15 at 17:23
  • 4
    @TrevorBoydSmith yes, but according to the OP the data _will_ be in this pattern. – Baldrickk May 07 '15 at 09:00
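For inputs that might not follow the text_number-number layout, the two `find` results can be checked against `std::string::npos` before calling `substr`. The following is only a minimal sketch of that idea; the name `mid_num_str_checked` and the bool-plus-output-parameter interface are assumptions made for illustration and are not part of the answer above.

```cpp
#include <string>

// Hypothetical helper (not from the answer above): stores the text between
// the first '_' and the next '-' in `out` and returns true, or returns false
// if the string does not contain both delimiters in that order.
inline bool mid_num_str_checked(const std::string& s, std::string& out) {
    const std::string::size_type p = s.find('_');
    if (p == std::string::npos) return false;   // no '_' in the string

    const std::string::size_type pp = s.find('-', p + 1);
    if (pp == std::string::npos) return false;  // no '-' after the '_'

    out = s.substr(p + 1, pp - p - 1);
    return true;
}
```

Note that this sketch only checks the delimiters; it does not verify that the extracted text consists of digits.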
15

Updated for C++11

(Important note on compiler regex support: for GCC, you need version 4.9 or later. I tested this on g++ versions 4.9 [1] and 9.2, using the in-browser compiler on cppreference.com.)

Thanks to user @2b-t who found a bug in the C++11 code!

Here is the C++11 code:

#include <iostream>
#include <string>
#include <regex>

using std::cout;
using std::endl;

int main() {
    std::string input = "Example_45-3";
    std::string output = std::regex_replace(
        input,
        std::regex("[^0-9]*([0-9]+).*"),
        std::string("$1")
        );
    cout << input << endl;
    cout << output << endl;
}

Boost solution that only requires C++98

Minimal implementation example that works on many strings (not just strings of the form "text_45-text"):

#include <iostream>
#include <string>
using namespace std;
#include <boost/regex.hpp>

int main() {
    string input = "Example_45-3";
    string output = boost::regex_replace(
        input,
        boost::regex("[^0-9]*([0-9]+).*"),
        string("\\1")
        );
    cout << input << endl;
    cout << output << endl;
}

console output:

Example_45-3
45

Other example strings that this would work on:

  • "asdfasdf 45 sdfsdf"
  • "X = 45, sdfsdf"

For this example I used g++ on Linux with #include <boost/regex.hpp> and -lboost_regex. You could also use the C++11 <regex> header.

Feel free to edit my solution if you have a better regex.


Commentary:

If there aren't performance constraints, using regex is ideal for this sort of thing because you aren't reinventing the wheel (by writing a bunch of string parsing code, which takes time to write and fully test).

Additionally, if/when your strings become more complex or have more varied patterns, regex easily accommodates the complexity. (The question's example pattern is easy enough, but oftentimes a more complex pattern would take 10-100+ lines of code when a one-line regex would do the same.)


[1] Apparently, full support for C++11 <regex> was implemented and released in g++ version 4.9.x, on Jun 26, 2015. Hat tip to SO questions #1 and #2 for figuring out that the compiler version needs to be 4.9.x.

Trevor Boyd Smith
  • 1
    Other than specifying that for C++11 you include just `<regex>` and change `boost::` to `std::` (some people aren't going to know what [boost](http://www.boost.org/) is), pretty solid answer. – fakeaccount May 07 '15 at 09:31
  • Why, exactly, would you account for strings of a different format? If they're supposed to be in that format then I'd assume allowing strings of a different format would be an error. – Mdev May 07 '15 at 21:34
  • If you would like an explanation of the regex syntax I would be happy to provide such an explanation. – Trevor Boyd Smith May 09 '15 at 02:10
  • @matthew the original title seems to me IMO to capture the intent of the question ... I.e. the question to me seems more to be "generally how would you extract a number from a string?" Instead of "how would you extract a number from a string that uses ONLY this ONE specific pattern?". Given that the OP has accepted this solution, maybe my interpretation was correct and maybe the question could clarify whether it wants a general solution like mine... Or a specific solution like Lingxi solution. – Trevor Boyd Smith May 09 '15 at 02:17
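For readers who want the pattern spelled out: in "[^0-9]*([0-9]+).*", `[^0-9]*` consumes any leading non-digits, `([0-9]+)` captures the first run of digits as group 1, and `.*` consumes the rest, so replacing the whole match with group 1 leaves only the number. One thing to be aware of is that `std::regex_replace` returns the input unchanged when nothing matches. The sketch below is one possible variant that reports a failed match instead, using `std::regex_search`; the helper name `first_number_regex` and its interface are assumptions for illustration, not part of the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: stores the first run of digits found in `input` in
// `out` and returns true, or returns false if `input` contains no digit.
inline bool first_number_regex(const std::string& input, std::string& out) {
    // [0-9]+ : one or more consecutive digits. The regex object is built
    // once and reused across calls.
    static const std::regex digits("[0-9]+");

    std::smatch match;
    if (!std::regex_search(input, match, digits)) return false;

    out = match.str(0);  // str(0) is the whole matched text
    return true;
}
```

Called with "Example_45-3" this stores "45" in `out`; called with a string that has no digits it returns false and leaves `out` untouched.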
14

Check this out

std::string ex = "Example_45-3";
int num;
sscanf( ex.c_str(), "%*[^_]_%d", &num );
  • The OP want a `std::string`. – Lingxi May 06 '15 at 10:33
  • 2
    A C solution when the OP clearly seeks a C++ solution. – Peter May 06 '15 at 10:34
  • @Peter, it's not written that a C++ solution is required. This code compiles and runs as C++, and is faster than any other variant. – Alexander Lapenkov May 06 '15 at 10:35
  • Your idea is inspiring, anyway. Thumb up :-) But I don't think this method is any more efficient than the others. – Lingxi May 06 '15 at 10:37
  • This method is probably "more streamlined" like OP wanted, and maybe also "simpler" (depends on how much C experience you have) – anatolyg May 06 '15 at 10:42
  • @anatolyg thanks, mate. I also think that this is the simplest and one of the shortest solutions. – Alexander Lapenkov May 06 '15 at 10:43
  • @anatolyg To obtain a number instead of a string, this seems more streamlined: `long n = std::strtol(&s[s.find('_') + 1], nullptr, 10);` – Lingxi May 06 '15 at 10:55
  • 3
    @Alex - a post tagged as C++ and not C is a fair hint of wanting a C++ solution rather than C. And there is no basis for your claim that this is faster than alternatives. – Peter May 06 '15 at 11:39
  • @Peter so what, this is also a solution, tags don't matter, this works in C++ and this could help the author. Want more arguments? The answer is for the author, don't like it - don't use. – Alexander Lapenkov May 06 '15 at 12:14
  • 1
    @AlexandrLapenkov If you use `std::sscanf`, nobody will be able to object it isn't C++. – edmz May 06 '15 at 13:04
  • 2
    @black thanks for advice :) I think this is really very stupid here to shout "this is not c++!!!!". – Alexander Lapenkov May 06 '15 at 13:10
  • 2
    Has the major advantage that sscanf is already debugged by your compiler vendor - get the format expression right and your job is done. The other "C++" solutions (I agree that's a red herring) involve index manipulation/subtraction/off by {+1/-1/+2} fencepost errors, and will need to be debugged pretty well before you rely on it. Note that neither @lingxi's answer nor this one properly account for formatting errors in the string. This answer is fixed by checking the return value from sscanf, the other answer requires checking two values against std::string::npos before doing string::substr. – davidbak May 06 '15 at 16:51
  • It is equally true for C and C++ that compiler vendors debug their standard library implementations - not an advantage of C over C++ or vice versa. All of the responses to this question - including this one - have gotchas if the string is not formatted as expected, which can be addressed by better error checking and understanding the tools being used. Both C and C++ have their place, but using them effectively involves different mindsets. Applying a C mindset to C++ is just as bad in the long run as a C++ mindset in C. Recommending `sscanf()` (in `std` or not) is advocating a C mindset. – Peter May 08 '15 at 09:03
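The return-value check mentioned in the comments could look roughly like the sketch below. `std::sscanf` returns the number of values it converted and stored, so anything other than 1 here means the string did not match the format. The wrapper name `first_number_sscanf` and its interface are assumptions made for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical wrapper: stores the number in `out` and returns true only if
// sscanf reports that exactly one value was converted.
inline bool first_number_sscanf(const std::string& s, int& out) {
    int value = 0;
    // %*[^_] skips (without storing) a run of characters that are not '_',
    // then a literal '_' must follow, then %d reads the integer.
    if (std::sscanf(s.c_str(), "%*[^_]_%d", &value) != 1) return false;

    out = value;
    return true;
}
```

With "Example_45-3" this stores 45 in `out`; with a string that has no '_' or no number after it, the function returns false and `out` is left as it was.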
12

I can think of two ways of doing it:

  • Use regular expressions
  • Use an iterator to step through the string, and copy each consecutive digit to a temporary buffer. Break when it reaches an unreasonable length or on the first non-digit after a string of consecutive digits. Then you have a string of digits that you can easily convert.
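A minimal sketch of the second approach, under some assumptions: the helper name `first_digit_run`, the `max_len` parameter, and its default value are all made up for illustration, and the "temporary buffer" is simply a `std::string`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: steps through the string and copies the first run of
// consecutive digits into a buffer. Stops at the first non-digit after the
// run has started, or once `max_len` digits have been collected.
inline std::string first_digit_run(const std::string& s,
                                   std::size_t max_len = 18) {
    std::string digits;
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        if (std::isdigit(static_cast<unsigned char>(*it))) {
            if (digits.size() >= max_len) break;  // "unreasonable length"
            digits.push_back(*it);
        } else if (!digits.empty()) {
            break;  // first non-digit after a run of digits
        }
    }
    return digits;  // empty if the string contains no digits
}
```

The returned string can then be converted with, for example, `std::stoul`, after checking that it is not empty.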
Diogo Cunha
9
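std::string s = "Example_45-3";
// find() returns std::string::size_type (npos if not found), not int
std::string::size_type p1 = s.find("_");
std::string::size_type p2 = s.find("-");
std::string number = s.substr(p1 + 1, p2 - p1 - 1);  // "45"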
Ashot Khachatryan
9

The 'best' way to do this in C++11 and later is probably using regular expressions, which combine high expressiveness and high performance when the test is repeated often enough.

The following code demonstrates the basics. You should #include <regex> for it to work.

// The example inputs
std::vector<std::string> inputs {
    "Example_0-0", "Example_0-1", "Example_0-2", "Example_0-3", "Example_0-4",
    "Example_1-0", "Example_1-1", "Example_1-2", "Example_1-3", "Example_1-4"
};

// The regular expression. A lot of the cost is incurred when building the
// std::regex object, but when it's reused a lot that cost is amortised.
std::regex imgNumRegex { "^[^_]+_([[:digit:]]+)-([[:digit:]]+)$" };

for (const auto &input: inputs){
    // This will contain the match results. Parts of the regular expression
    // enclosed in parentheses will be stored here, so in this case: both numbers
    std::smatch matchResults;

    if (!std::regex_match(input, matchResults, imgNumRegex)) {
        // Handle failure to match
        abort();
    }

    // Note that the first match is in str(1). str(0) contains the whole string
    std::string theFirstNumber = matchResults.str(1);
    std::string theSecondNumber = matchResults.str(2);

    std::cout << "The input had numbers " << theFirstNumber;
    std::cout << " and " << theSecondNumber << std::endl;
}
Thierry
1

Using @Pixelchemist's answer and e.g. std::stoul:

bool getFirstNumber(std::string const & a_str, unsigned long & a_outVal)
{
    auto pos = a_str.find_first_of("0123456789");

    try
    {
        if (std::string::npos != pos)
        {
            a_outVal = std::stoul(a_str.substr(pos));

            return true;
        }
    }
    catch (...)
    {
        // handle conversion failure
        // ...
    }

    return false;
}
darkbit