How to check if string is a proper subset of another string

Question

I want to check if a string is a strictly a subset of another string. For this end I used boost::contains and I compare the size of strings as follows:

#include <boost/algorithm/string.hpp>
#include <iostream>

using namespace std;
using namespace boost::algorithm;

int main()
{
  string str1 = "abc news";
  string str2 = "abc";
  //strim strings using boost
  trim(str1);
  trim(str2);
  //if str2 is a subset of str1 and its size is less than the size of str1 then it is strictly contained in str1
  if(contains(str1,str2) && (str2.size() < str1.size()))
  {
    cout <<"contains" << end;
  }
  return 0;
}

Is there a better way to solve this problem? Instead of also comparing the size of strings?

Example

ABC is a proper subset of ABC NEWS
ABC is not a proper subset of ABC

I would reverse the tests (i.e. check size *before* calling `contains()`), other than that, I don't think you can make it any better.. — Nim, May 04 '15 at 12:21

score 4 · Answer 1 · answered May 04 '15 at 12:24

4

I would use the following:

bool is_substr_of(const std::string& sub, const std::string& s) {
  return sub.size() < s.size() && s.find(sub) != s.npos;
}

This uses the standard library only, and does the size check first which is cheaper than s.find(sub) != s.npos.

answered May 04 '15 at 12:24

Lingxi

14,579
2
37
93

No idea about the downvote. This answer uses only the standard library and doesn't waste time comparing the strings. – Galik May 04 '15 at 12:38

score 3 · Accepted Answer · edited May 23 '17 at 12:28

3

You can just use == or != to compare the strings:

if(contains(str1, str2) && (str1 != str2))
    ...

If string contains a string and both are not equal, you have a real subset.

If this is better than your method is for you to decide. It is less typing and very clear (IMO), but probably a little bit slower if both strings are long and equal or both start with the same, long sequence.

Note: If you really care about performance, you might want to try the Boyer-Moore search and the Boyer-Moore-Horspool search. They are way faster than any trivial string search (as apparently used in the string search in stdlibc++, see here), I do not know if boost::contains uses them.

edited May 23 '17 at 12:28

Community

1
1

answered May 04 '15 at 12:14

Baum mit Augen

49,044
25
144
182

1

Comparing length would be more efficient. – tenfour May 04 '15 at 12:17
@tenfour Indeed. Most certainly does not matter for strings like "ABC News" though. – Baum mit Augen May 04 '15 at 12:18
1

But you can't know what the string will be. It could be the works of Shakespeare! :) – Galik May 04 '15 at 12:39
@Galik But whoever writes the code does know (or at lest know that he does not know). If he finds out by profiling that the cost of `!=` is significant, he can compare lengths as an optimization. If he finds comparing lengths clearer, he can go for it. – Baum mit Augen May 04 '15 at 12:42
@Galik Also even the longest Shakespear play (Hamlet) has only 30,557 words, comparing to this a couple of times would still be pretty fast. :) – Baum mit Augen May 04 '15 at 12:44
2

I don't see that as a reason for choosing a less efficient solution over the more efficient ones that have been presented here. – Galik May 04 '15 at 12:45
@Galik As I said, one might consider it clearer since *"proper subset"* is commonly defined a *"subset and not equal"* rather than *'subset and has less elements"* (the latter definition is false for infinite sets btw). I mentioned in my answer that the user needs to decide what fits his needs. – Baum mit Augen May 04 '15 at 12:51
There is no difference between "subset and not equal" and "subset and has different size". If they have different size they are definitely not equal. And if they have the same size and is a subset, then they are definitely equal. @BaummitAugen can you list a single real advantage that comparing strings has over comparing length? – tenfour May 04 '15 at 13:42
@tenfour There is no difference in the *finite* case. In the *infinite* case, there is: The natural numbers are a proper subset of the integers which is a proper subset of the rational numbers, but all those three sets have exactly the same size. This is obviously no problem on Computers because they are finite anyway. The advantage of my method is that it directly expresses the mathematical definition as code. It comes at the cost I mentioned. Now the reader may decide what is more important for him. – Baum mit Augen May 04 '15 at 13:48

score 1 · Answer 3 · edited May 23 '17 at 11:58

1

About Comparaison operations

TL;DR : Be sure about the format of what you're comparing.

Be wary of how you define strictly.

For example, you did not pointed out thoses issue is your question, but if i submit let's say :

 "ABC       " //IE whitespaces
 "ABC\n"

What is your take on it ? Do you accept it or not ? If you don't, you'll have to either trim or to clean your output before comparing - just a general note on comparaison operations -

Anyway, as Baum pointed out, you can either check equality of your strings using == or you can compare length (which is more efficient given that you first checked for substring) with either size() or length();

edited May 23 '17 at 11:58

Community

1
1

answered May 04 '15 at 12:19

Mekap

2,065
14
26

yes that's very important mekap. I had to mentioned it. I actually trimmed the strings before i do the comparison – Hani Goc May 04 '15 at 12:20
@HaniGoc Yes, i wrote this answer more for people who'll click on this page, to be careful about what operations to do before actually comparing. – Mekap May 04 '15 at 12:26

score 0 · Answer 4 · answered May 04 '15 at 12:26

0

another approach, using only the standard library:

#include <algorithm>
#include <string>
#include <iostream>

using namespace std;

int main()
{
  string str1 = "abc news";
  string str2 = "abc";
  if (str2 != str1
    && search(begin(str1), end(str1), 
              begin(str2), end(str2)) != end(str1))
  {
    cout <<"contains" << endl;
  }
  return 0;
}

answered May 04 '15 at 12:26

Richard Hodges

68,278
7
90
142

1

`std::string::find` seems neater. – Lingxi May 04 '15 at 12:28
It's another way, you're right. The advantage of std::search is that the searched-for string can be any sequence bounded by 2 iterators. – Richard Hodges May 04 '15 at 13:45

How to check if string is a proper subset of another string

Example

4 Answers4

About Comparaison operations