3
#include <iostream>
using namespace std;

int main() {
    int n, a = 0xfffffff;
    cin >> n;
    while (n--) {
        string s;
        cin >> s;
        int c = 0;
        for (char ch : s)
            c |= 1 << (ch - 'a');
        a &= c;
    }
    cout << __builtin_popcount(a) << endl;
    return 0;
}

This code is to find if a character is present in all the inputted strings at least once. Can someone explain whats happening in this code. I am trying to learn bit wise operations in C++ and I am not able to understand whats going on here.

Raktim Biswas
  • 4,011
  • 5
  • 27
  • 32
srip
  • 617
  • 1
  • 6
  • 18

3 Answers3

6

The code is not to determine if a particular character is present in all strings.

It is to find the number of characters that are present in all strings.

Here's the breakdown of the code.

    int n, a = 0xfffffff;
    cin >> n;

n is the input parameter from the user which determines the number of strings. Let's say n is 3

    while (n--) {
        string s;
        cin >> s;
        int c = 0;
        for (char ch : s)
            c |= 1 << (ch - 'a');
        a &= c;
    }

This is the main part of the code..

you get n strings and for each string you store what characters it is composed of, in a bit array

For eg, let's says first string is "stack". you loop through the characters here

    for (char ch : s)
        c |= 1 << (ch - 'a');

c is initialized to 0 for every string.

In this example "stack", lets see what happens to c

c = c | 1 << ('s'-'a') => c = c | 1 << 18 (18th bit of c is set to 1)

c = c | 1 << ('t'-'a') => c = c | 1 << 19 (19th bit of c is set to 1)

c = c | 1 << ('a'-'a') => c = c | 1 << 0 (0th bit of c is set to 1)

c = c | 1 << ('c'-'a') => c = c | 1 << 2 (2nd bit of c is set to 1)

c = c | 1 << ('k'-'a') => c = c | 1 << 10 (10th bit of c is set to 1)

note that 1 << n means that 1 is left shifted by n digits. so 1 << 3 is = 0001 << 3 = binary 1000 = 2^3 = 8 (In case you did not understand shifting)

now c is a integer whose 0,2,10,18,19 bits are set to 1. a is initialized to all 1's before the loop.

a &= c; this gets rid of all 1's except 0,2,10,18 and 19.

we continue this for all strings. and at the end, a will have 1 bit set in the position which is occupied in all strings..

For eg, if string two is "aaaaa"

computing c reveals that c has only its 0th bit set.

a &= c; this gets rid of all 1's except 0 (ie., it sets 2,10,18 and 19th bits to 0, since they don't occur in "aaaaa")

and string 3 is "zzzzza", at the end, only 0th bit of a will be set

Hence, the number of characters that occurs in all these strings is 1

Community
  • 1
  • 1
2

There's not much going on in here, and breaking it down bit by bit reveals what it does:

#include <iostream>

using namespace std;

int main() {
  // Initialize a bitmask, here assumed to be 32-bits
  // which is probably "enough" for this case.
  int n, a = 0xfffffff;

  // Read in the number of strings to process
  cin >> n;

  // Assume n > 0
  while (n--) {
    string s;

    // Read in a string
    cin >> s;

    int c = 0;

    // For each character in this string
    for (char ch : s)
       // Turn on a bit on representing the character, where
       // 'a' is bit 0, 'b' is 1, etc.
       c |= 1 << (ch - 'a');

    // Apply this mask to a
    a &= c;
  }

  // Report on which bits are toggled
  cout << __builtin_popcount(a) << endl;

  return 0;
}

This is some seriously sloppy code on the whole. Any non lower case letters could cause undefined behaviour. Most compilers are 64-bit for modern machines so setting only 32 bits might be insufficient.

Note that when I use the word "assume" here I mean bad things will probably happen.

tadman
  • 208,517
  • 23
  • 234
  • 262
  • "Toggle a bit" `|=` doesn't *toggle*, it *sets* – kmdreko Jun 03 '16 at 19:11
  • I can change the phrasing, but I mean "toggles it on". – tadman Jun 03 '16 at 19:12
  • 2
    Strangely, there are only 7 f's in `a` initialization, not 8. Which means the mask only has 28 bits set. I guess the author of the code only needs 28 since there are only 26 letters in the alphabet? – KABoissonneault Jun 03 '16 at 19:12
  • @KABoissonneault That's a good observation. This whole chunk of code is in dire need of refactoring. I'm not sure why it doesn't start at zero instead of an arbitrary number of 1s. – tadman Jun 03 '16 at 19:13
  • 2
    Well there are 26 letters, so 28 is enough. – coyotte508 Jun 03 '16 at 19:13
  • @coyotte508 ß, þ, æ? Huge assumptions being made here. – tadman Jun 03 '16 at 19:16
  • @coyotte508 28 is enough, but isn't it too much? As far as I understand, if all 26 letters are found, it should report that 2 bits weren't unset – KABoissonneault Jun 03 '16 at 19:17
  • @KABoissonneault Yeah, I think `{` and `|` must be present in the string to flip them all to zero. – tadman Jun 03 '16 at 19:19
  • 1
    @KABoissonneault no, but if '{' or '|' are present in **all** strings, then the result will be more than expected, that is indeed a behavior error. – coyotte508 Jun 03 '16 at 19:19
  • @tadman: If it started at zero, it would always be zero. Also, there's no problem if `a` is 64 bit, although `int` is 32-bit on every 64-bit platform I know about. – Nick Matteo Jun 03 '16 at 19:20
  • 1
    @KABoissonneault: any extra 1's in `a`'s initial value disappear with `a &= c;`. – Nick Matteo Jun 03 '16 at 19:20
  • @Kundor upper case letters would cause negative shift which is undefined behavior: http://stackoverflow.com/questions/4945703/left-shifting-with-a-negative-shift-count – coyotte508 Jun 03 '16 at 19:21
  • @coyotte508: That's true, although I'm not sure why you're telling me. ;-) – Nick Matteo Jun 03 '16 at 19:22
  • @Kundor well, because of the "there's no problem", but yea, you were referring to something different, my bad! – coyotte508 Jun 03 '16 at 19:24
  • @kundor Ah, you're right, though there's [reason enough for confusion there](https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models). I still think using `uint32_t` would be a lot better than just `int` and leaving it up to the whims of the compiler. – tadman Jun 03 '16 at 19:25
1

Let's first take a look at

int c = 0;
for (char ch : s)
    c |= 1 << (ch - 'a');

You represent the characters of your input string bitwise by using the variable c:

  • If character a occurs in the string, bit 0 in variable c is set to 1,
  • If character b occurs in the string, bit 1 in variable c is set to 1,
  • If character c occurs in the string, bit 2 in variable c is set to 1,
  • and so on.

All other bits in c are zero.

After you are finished with one string, the code

a &= c;

gets executed. This code sets a bit in variable a to 1 iff it was 1 before and the corresponding bit in c was 1 too. Then the function moves on the reading the next string and does the same again.

At the end of execution, exactly those bits of a are set to 1 that were 1 in all the c's in the while block.

Xaver
  • 1,035
  • 9
  • 17