I have a file that I have opened with std::ifstream. It contains a line of text that I want to parse:

<image source="tileset/grass-tiles-2-small.png" width="384" height="192"/>

Let's say I am interested in the "384" found after width="

I am at a loss as to how best to extract "384" from that line, since the number is not constant: it can differ from file to file.

void parseFile(const std::string &mfName)
{
    std::ifstream file(mfName);

    std::string line;


    if (file.is_open())
    {
        while (getline(file, line))
        {
            std::size_t found = line.find("width");

            if (found != std::string::npos)
            {
                std::cout << found << std::endl;
            }
        }
    }
    else
        std::cerr << "file failed to open" << std::endl;
} 

Could anyone give me a hint or a link to a good tutorial that covers this?

πάντα ῥεῖ
AkiraRei
  • Take a look at boost::regex or std::regex if you are using C++11 – mathematician1975 Mar 02 '14 at 19:02
  • Do you have fixed points? For example, do you know which line your information is on, or is the image name always the same? Do you already use additional libraries in your project, or should it work with plain C++ and the STL? – MatthiasB Mar 02 '14 at 19:03
  • If you use TR1, you can have a look at this link: **http://www.johndcook.com/cpp_regex.html** – user1767754 Mar 02 '14 at 19:04
  • Yes, I know or can easily find out which line my information is on, and I am using the C++11 standard library. – AkiraRei Mar 02 '14 at 19:42

4 Answers

1

This is your file:

<image source="tileset/grass-tiles-2-small.png" width="384" height="192"/>

And since all you're interested in is the width, we should first get the entire line:

if (std::getline(file, line))
{

Now we need to find width. We do that using the find() method:

    std::size_t pos = line.find("width");

The string inside find() is the value we want to look for.

Then we check whether find() actually found it:

    if (pos != std::string::npos)
    {

We need to put the rest of the line into a std::istringstream and parse out the data:

        std::istringstream iss(line.substr(pos));

The substr() call selects a subsequence of the string. pos is the position where we found "width", so substr(pos) returns everything from "width" to the end of the line. So far this is what is inside the stream:

 width="384" height="192"/>

Since we don't actually care about "width" but rather about the number inside the quotes, we have to ignore() everything up to and including the opening quote. That is done like this (std::numeric_limits requires the <limits> header):

        iss.ignore(std::numeric_limits<std::streamsize>::max(), '"');

Now we use the extractor to extract the integer:

        int width;

        if (iss >> width)
        {
            std::cout << "The width is " << width << std::endl;
        }

I hope this helps. Here's a full example of the program:

#include <iostream>
#include <fstream>
#include <limits>   // std::numeric_limits
#include <string>
#include <sstream>

void parseFile(const std::string& mfName)
{
    std::ifstream file(mfName);
    std::string line;

    if (std::getline(file, line))
    {
        auto pos = line.find("width");
        if (pos != std::string::npos)
        {
            std::istringstream iss(line.substr(pos));
            int width;

            if (iss.ignore(std::numeric_limits<std::streamsize>::max(), '"') &&
                iss >> width)
            {
                std::cout << "The width is " << width << std::endl;
            }
        }
    }
}
David G
  • I am trying to understand ignore(). After std::istringstream iss(line.substr(pos)); we have: width="384" height="192"/>, but after if (iss.ignore(std::numeric_limits<std::streamsize>::max(), '"') && iss >> width) we get the number. I thought ignore would, well, ignore all chars up to and including '"'. How come we end up with just a number and not the rest of: width="384" height="192"/> – AkiraRei Mar 05 '14 at 20:17
  • @user2299044 Inside the `if` statement, the `ignore()` call runs first. It will ignore (or rather "skip") all the characters up until and including `"`. When the `ignore()` call finishes, this is what will be the remaining content: `384" height="192"/>`. When `iss >> width` runs, it will extract the integer until it finds a non-integral character, namely the other `"`. – David G Mar 05 '14 at 20:30
  • Quite clever. Coming from Python, I expected to have to do manually in C++ what iss >> width does. – AkiraRei Mar 05 '14 at 21:32
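
To make the explanation in the comments above concrete, here is a minimal sketch that returns whatever is still unread after ignore() has skipped through the first quote. The helper name remainderAfterFirstQuote is made up for illustration and is not part of the answer's code.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: shows what is left in the stream once ignore()
// has discarded everything up to and including the first '"'.
std::string remainderAfterFirstQuote(const std::string& text)
{
    std::istringstream iss(text);

    // Skip characters until the delimiter '"' has been read and discarded.
    iss.ignore(std::numeric_limits<std::streamsize>::max(), '"');

    // Read the unread remainder of the line so it can be inspected.
    std::string rest;
    std::getline(iss, rest);
    return rest;
}
```

Called with width="384" height="192"/>, this returns 384" height="192"/>. An iss >> width at that point reads the digits 384 and stops at the closing quote, which is why only the number comes out.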
0

Parse the string with a regular expression. Since you are using C++11, include the <regex> header and call std::regex_search to look for a match. The results go into a std::smatch object, which is iterable and can also be indexed by capture group.

Reference: http://www.cplusplus.com/reference/regex/

Also see: Retrieving a regex search in C++
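
As a rough sketch of what that could look like with std::regex (the answer itself gives no code, so the function name findIntAttribute and the pattern are my own assumptions):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: looks for name="<digits>" in `line`.
// Returns true and stores the number in `value` when a match is found.
// Assumes `name` contains no regex metacharacters.
bool findIntAttribute(const std::string& line, const std::string& name, int& value)
{
    // \b keeps "width" from matching inside a longer name such as "linewidth".
    const std::regex pattern("\\b" + name + "=\"(\\d+)\"");
    std::smatch match;

    if (!std::regex_search(line, match, pattern))
        return false;

    // match[0] is the whole match; match[1] is the first capture group (the digits).
    // std::stoi throws std::out_of_range if the digits do not fit in an int.
    value = std::stoi(match[1].str());
    return true;
}
```

Usage would be along the lines of int w; if (findIntAttribute(line, "width", w)) { ... }. The same call with "height" picks out the other attribute.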

Community
Hidde
0

If I were you, I'd use an XML library (if this is actually XML). This is one of the things you certainly don't want to reinvent but reuse! :)

In the past, I've successfully used TinyXML for smaller projects. Or google "c++ xml library" for alternatives.

Christian Hackl
0

Using Boost.Regex, you can put something like the following in your function:

/* std::string line = "<image source= \
     \"tileset/grass-tiles-2-small.png\" width=\"384\" height=\"192\"/>";
*/

boost::regex expr ("width=\"(\\d+)\"");
boost::smatch matches;

if (boost::regex_search(line, matches, expr)) 
{
    std::cout << "match: " << matches[1] << std::endl;
}
P0W
  • Would using regex be the recommended way to parse a file that includes many lines like the one I have mentioned? – AkiraRei Mar 02 '14 at 19:30
  • @user2299044 First of all, I'd use a scripting language to do that, but if C++ is really needed here, `boost::regex` would be one of the preferred options. And yes, you can use it for many lines too; in your case it will be line by line. You can refer to any online regex documentation for help; it will be almost the same for Boost. – P0W Mar 02 '14 at 19:35
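
For the many-lines case asked about in the comments, a line-by-line loop might look roughly like this. It is only a sketch: it uses std::regex in place of boost::regex so that it builds without Boost (the regex_search and smatch interfaces are very similar for this purpose), and the function name collectWidths is made up for illustration.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the width="N" value from every line that has one.
// Only the first match on each line is considered.
std::vector<int> collectWidths(std::istream& in)
{
    // Construct the regex once, outside the loop, rather than once per line.
    const std::regex expr("width=\"(\\d+)\"");

    std::vector<int> widths;
    std::string line;
    std::smatch matches;

    while (std::getline(in, line))
    {
        if (std::regex_search(line, matches, expr))
            widths.push_back(std::stoi(matches[1].str()));
    }
    return widths;
}
```

Because it takes a std::istream&, the same function works with a std::ifstream opened on the file or with a std::istringstream for testing.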