0

This is probably a simple question, but I can't get it to work. I've searched and have tried all the suggestions people gave for the OR operator ||, but my code just won't properly use it.

So here's what I've got: This part of my code looks at a vector (currently around ~20,000 entries long, but later will be in the millions) and erases all elements that contain the keywords: " event", " /event", " rwgt", and " /rwgt". It works perfectly fine when I use four for loops, one for each keyword like so:

for (int j = 0; j< myvec.size()-1;j++) {

    if(myvec[j] == "  <event>") {    //erase all instances of "<event>"
        myvec.erase(myvec.begin()+j);
    }
}

for (int j = 0; j< myvec.size()-1; j++) {

    if(myvec[j] == "  </event>") {   //erase all instances of "</event>"
        myvec.erase(myvec.begin()+j);
    }
}

for (int j = 0; j< myvec.size()-1; j++) {

    if(myvec[j] == "  <rwgt>") {   //erase all instances of "<rwgt>"
        myvec.erase(myvec.begin()+j);
    }
}

for (int j = 0; j< myvec.size()-1; j++) {

    if(myvec[j] == "  </rwgt>") {   //erase all instances of "</rwgt>"
        myvec.erase(myvec.begin()+j);
    }
}

However this is getting fairly computationally expensive (it takes a few minutes right now with only 20,000 entries; I can't imagine it when we get to the millions!) So I wanted to combine all four keywords into one for loop using the || (OR) operator like so:

for (int j = 0; j< myvec.size()-1;j++) {

if(myvec[j] == "  <event>" || myvec[j]  == "  </event>" || myvec[j] == "  <rwgt>" || myvec[j] == "  </rwgt>") {  //erase all instances of "<event>"
    myvec.erase(myvec.begin()+j);
}
}

However this is taking even longer than the original four for loops, and it ends up giving me a segmentation fault (core dumped) error, which to my knowledge means the vector is now empty somehow.

Does anyone have any ideas on how to fix this?

Thanks in advance!

khfrekek
  • 211
  • 1
  • 5
  • 12
  • 2
    C++ have many nice [algorithmic function in the standard library](http://en.cppreference.com/w/cpp/algorithm), including a [function to remove entries from a range](http://en.cppreference.com/w/cpp/algorithm/remove). Use it with [the erase-remove idiom](https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom). – Some programmer dude Jun 16 '15 at 13:32
  • 3
    Also, if you are in fact parsing some XML, use a specialized parser. – Petr Jun 16 '15 at 13:34
  • 1
    Please note _your code as written is buggy and will skip some elements_. This is due to you erasing an item and then moving on. If, for example, you erase something at position '4' you'll then check position 5. However, when you erase position 4 the rest of the vector will move up putting something new in position 4 but you wont run the check on this... – Mike Vine Jun 16 '15 at 14:06

3 Answers3

3

You can use functor and std::remove_if() to do this.

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>


struct removeTag
{
    bool operator()(const std::string& tag) const
    {
        return tag == "  <event>" || tag == "  </event>" || tag == "  <rwgt>" || tag == "  </rwgt>";
    }
};

int main()
{
    std::vector<std::string> data = { "  <event>", "  </event>", "  <rwgt>", "  </rwgt>" };
    auto it = std::remove_if(data.begin(), data.end(), removeTag());
    data.erase(it, data.end());
    std::cout << data.size();
    std::cin.get();
    return 0;
}

Ouput:

0
Community
  • 1
  • 1
NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • Thank you for your input! I tried using the method you gave but my compiler didn't like it, it seems like I need to install c++11 in order for it to work. Guess that's what I'll be doing next! – khfrekek Jun 16 '15 at 14:50
  • Yes `std::vector data = { " ", " ", " ", " " };` requires C++11 as it uses an initializer list. – NathanOliver Jun 16 '15 at 14:51
2

Use erase-remove idiom.

myvec.erase(
    std::remove_if(myvec.begin(), myvec.end(), [](std::string const &e) {
      return e == " blabla" || e == "blasd";
    }), myvec.end());
Blaz Bratanic
  • 2,279
  • 12
  • 17
1

I tried your code, and it worked fine after I replaced

for(int j = 0; j < myvec.size()-1; j++)

with

for(int j = 0; j < myvec.size(); j++)

and

myvec.erase(myvec.begin() + j);

with

myvec.erase(myvec.begin() + j--);

My Testprogramm looked like this:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
   vector<string> myvec = {"<event>", "nichts", "</event>", "<rwgt>", "</rwgt>"};

   for (int j = 0; j < myvec.size(); j++) {

       if(myvec[j] == "<event>" || myvec[j] == "</event>" || 
             myvec[j] == "<rwgt>" || myvec[j] == "</rwgt>") {   

          myvec.erase(myvec.begin() + j--);
       }
   }

   for(int i = 0; i < myvec.size(); i++) {
      cout<<myvec[i]<<endl;
   }
   return 0;
}
Chris
  • 98
  • 5
  • @khfrekek This solution is *far* less efficient than the ones presented in the other two answers, which is why you're seeing such poor performance. Think (and learn) about what happens in which case, and don't use this code in practice. – bogdan Jun 16 '15 at 18:15