Bitwise comparisions

Question

#include <iostream>
using namespace std;

int main() {
    int n, a = 0xfffffff;
    cin >> n;
    while (n--) {
        string s;
        cin >> s;
        int c = 0;
        for (char ch : s)
            c |= 1 << (ch - 'a');
        a &= c;
    }
    cout << __builtin_popcount(a) << endl;
    return 0;
}

This code is to find if a character is present in all the inputted strings at least once. Can someone explain whats happening in this code. I am trying to learn bit wise operations in C++ and I am not able to understand whats going on here.

where did you get this code? you should give credits and if possible ask the author what was his intention — 463035818_is_not_an_ai, Jun 03 '16 at 19:03

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

The code is not to determine if a particular character is present in all strings.

It is to find the number of characters that are present in all strings.

Here's the breakdown of the code.

    int n, a = 0xfffffff;
    cin >> n;

n is the input parameter from the user which determines the number of strings. Let's say n is 3

    while (n--) {
        string s;
        cin >> s;
        int c = 0;
        for (char ch : s)
            c |= 1 << (ch - 'a');
        a &= c;
    }

This is the main part of the code..

you get n strings and for each string you store what characters it is composed of, in a bit array

For eg, let's says first string is "stack". you loop through the characters here

    for (char ch : s)
        c |= 1 << (ch - 'a');

c is initialized to 0 for every string.

In this example "stack", lets see what happens to c

c = c | 1 << ('s'-'a') => c = c | 1 << 18 (18th bit of c is set to 1)

c = c | 1 << ('t'-'a') => c = c | 1 << 19 (19th bit of c is set to 1)

c = c | 1 << ('a'-'a') => c = c | 1 << 0 (0th bit of c is set to 1)

c = c | 1 << ('c'-'a') => c = c | 1 << 2 (2nd bit of c is set to 1)

c = c | 1 << ('k'-'a') => c = c | 1 << 10 (10th bit of c is set to 1)

note that 1 << n means that 1 is left shifted by n digits. so 1 << 3 is = 0001 << 3 = binary 1000 = 2^3 = 8 (In case you did not understand shifting)

now c is a integer whose 0,2,10,18,19 bits are set to 1. a is initialized to all 1's before the loop.

a &= c; this gets rid of all 1's except 0,2,10,18 and 19.

we continue this for all strings. and at the end, a will have 1 bit set in the position which is occupied in all strings..

For eg, if string two is "aaaaa"

computing c reveals that c has only its 0th bit set.

a &= c; this gets rid of all 1's except 0 (ie., it sets 2,10,18 and 19th bits to 0, since they don't occur in "aaaaa")

and string 3 is "zzzzza", at the end, only 0th bit of a will be set

Hence, the number of characters that occurs in all these strings is 1

tadman · Answer 2 · 2016-06-03T19:12:56.217

2

There's not much going on in here, and breaking it down bit by bit reveals what it does:

#include <iostream>

using namespace std;

int main() {
  // Initialize a bitmask, here assumed to be 32-bits
  // which is probably "enough" for this case.
  int n, a = 0xfffffff;

  // Read in the number of strings to process
  cin >> n;

  // Assume n > 0
  while (n--) {
    string s;

    // Read in a string
    cin >> s;

    int c = 0;

    // For each character in this string
    for (char ch : s)
       // Turn on a bit on representing the character, where
       // 'a' is bit 0, 'b' is 1, etc.
       c |= 1 << (ch - 'a');

    // Apply this mask to a
    a &= c;
  }

  // Report on which bits are toggled
  cout << __builtin_popcount(a) << endl;

  return 0;
}

This is some seriously sloppy code on the whole. Any non lower case letters could cause undefined behaviour. Most compilers are 64-bit for modern machines so setting only 32 bits might be insufficient.

Note that when I use the word "assume" here I mean bad things will probably happen.

edited Jun 03 '16 at 19:12

answered Jun 03 '16 at 19:08

tadman

208,517
23
234
262

"Toggle a bit" `|=` doesn't *toggle*, it *sets* – kmdreko Jun 03 '16 at 19:11
I can change the phrasing, but I mean "toggles it on". – tadman Jun 03 '16 at 19:12
2

Strangely, there are only 7 f's in `a` initialization, not 8. Which means the mask only has 28 bits set. I guess the author of the code only needs 28 since there are only 26 letters in the alphabet? – KABoissonneault Jun 03 '16 at 19:12
@KABoissonneault That's a good observation. This whole chunk of code is in dire need of refactoring. I'm not sure why it doesn't start at zero instead of an arbitrary number of 1s. – tadman Jun 03 '16 at 19:13
2

Well there are 26 letters, so 28 is enough. – coyotte508 Jun 03 '16 at 19:13
@coyotte508 ß, þ, æ? Huge assumptions being made here. – tadman Jun 03 '16 at 19:16
@coyotte508 28 is enough, but isn't it too much? As far as I understand, if all 26 letters are found, it should report that 2 bits weren't unset – KABoissonneault Jun 03 '16 at 19:17
@KABoissonneault Yeah, I think `{` and `|` must be present in the string to flip them all to zero. – tadman Jun 03 '16 at 19:19
1

@KABoissonneault no, but if '{' or '|' are present in **all** strings, then the result will be more than expected, that is indeed a behavior error. – coyotte508 Jun 03 '16 at 19:19
@tadman: If it started at zero, it would always be zero. Also, there's no problem if `a` is 64 bit, although `int` is 32-bit on every 64-bit platform I know about. – Nick Matteo Jun 03 '16 at 19:20
1

@KABoissonneault: any extra 1's in `a`'s initial value disappear with `a &= c;`. – Nick Matteo Jun 03 '16 at 19:20
@Kundor upper case letters would cause negative shift which is undefined behavior: http://stackoverflow.com/questions/4945703/left-shifting-with-a-negative-shift-count – coyotte508 Jun 03 '16 at 19:21
@coyotte508: That's true, although I'm not sure why you're telling me. ;-) – Nick Matteo Jun 03 '16 at 19:22
@Kundor well, because of the "there's no problem", but yea, you were referring to something different, my bad! – coyotte508 Jun 03 '16 at 19:24
@kundor Ah, you're right, though there's [reason enough for confusion there](https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models). I still think using `uint32_t` would be a lot better than just `int` and leaving it up to the whims of the compiler. – tadman Jun 03 '16 at 19:25

score 1 · Answer 3 · answered Jun 03 '16 at 19:13

Let's first take a look at

int c = 0;
for (char ch : s)
    c |= 1 << (ch - 'a');

You represent the characters of your input string bitwise by using the variable c:

If character a occurs in the string, bit 0 in variable c is set to 1,
If character b occurs in the string, bit 1 in variable c is set to 1,
If character c occurs in the string, bit 2 in variable c is set to 1,
and so on.

All other bits in c are zero.

After you are finished with one string, the code

a &= c;

gets executed. This code sets a bit in variable a to 1 iff it was 1 before and the corresponding bit in c was 1 too. Then the function moves on the reading the next string and does the same again.

At the end of execution, exactly those bits of a are set to 1 that were 1 in all the c's in the while block.

Bitwise comparisions

3 Answers3