12

I am making a program where one of the problems is that I need to do some analysis of the bit pattern in some integers.

Because of this I would like to be able to do something like this:

#Does **NOT** work:
num.each_bit do |i|
   #do something with i
end

I was able to make something that works, by doing:

num.to_s(2).each_char do |c|
   #do something with c as a char
end

This however does not have the performance I would like.

I have found that you can do this:

0.upto(num/2) do |i|
   #do something with n[i]
end

This have even worse performance than the each_char method

This loop is going to be executed millions of times, or more, so I would like it to be as fast as possible.

For reference, here is the entirety of the function

@@aHashMap = Hash.new(-1)

#The method finds the length of the longes continuous chain of ones, minus one 
#(101110 = 2, 11 = 1, 101010101 = 0, 10111110 = 4)

def afunc(n) 
if @@aHashMap[n] != -1
    return @@aHashMap[n]
end

num = 0
tempnum = 0
prev = false

(n.to_s(2)).each_char do |i|
    if i
        if prev
            tempnum += 1
            if tempnum > num
                num = tempnum
            end
        else
            prev = true
        end
    else
        prev = false
        tempnum = 0
    end
end

@@aHashMap[n] = num
return num
end
undur_gongor
  • 15,657
  • 5
  • 63
  • 75
Automatico
  • 12,420
  • 9
  • 82
  • 110
  • If you are going for performance, building a lookup table would probably be the right optimization in this case – J. Holmes Jun 06 '12 at 10:50
  • Declaring an `@@`-type variable is highly unusual. Do you have a good reason for doing it? – tadman Jun 06 '12 at 16:05
  • @tadman No, I do not have a very good reason for this. It's just something that stuck when I was making some static varables, and I have not bothered to do any refactoring yet. – Automatico Jun 07 '12 at 17:34
  • In most cases you should use standard `@` variables inside an instance of a class to keep things organized. `@@` are class variables. – tadman Jun 07 '12 at 17:36
  • I think I am missing something but why can't you just iterate through the bits like this: while n > 0 then val = (n & 1); n = n >> 1; puts val; end – Donato Jul 03 '15 at 00:03
  • @Donato, the length of the longest string of 1-bits is wanted. I don't see how your code does that. Perhaps you misunderstood the question. – Cary Swoveland Oct 08 '18 at 06:49
  • Your question seems to be quite popular, as evidenced by both the number of upvotes and the large number of answers, some of which are quite innovative. The problem is that your question is not clear, and can only be understood by reading through your code. Because of that some readers may have skipped the question and in future your question will not likely turn up in relevant searches. I suggest you edit to clarify the question... – Cary Swoveland Oct 09 '18 at 05:46
  • ...Starting with the title, I think you need something like, "Find the longest string of 1's in the binary representation of an integer". Then *begin* the question with more-or-less the same thing in different words, but being more specific, such as the following single-sentence paragraph: "Given a non-negative integer I wish to find a highly-efficient way to determine the length of the longest string of 1's in the binary representation of the number." The [rest is just gravy](https://www.usingenglish.com/reference/idioms/rest+is+gravy.html). – Cary Swoveland Oct 09 '18 at 05:54
  • @CarySwoveland My question is about how to look through bits, not to find a specific pattern. If I can look through the bits, I can do any kind of analysis of the pattern. You suggestion is to change the nature of the question, which I do not agree is beneficial in this case. – Automatico Oct 09 '18 at 13:57
  • Yet most of the answers address the specific problem I mentioned. One of those is @Stefan’s very clever answer. Does the fact that you selected it as being most helpful not tell the reader that that is the problem you want solved? – Cary Swoveland Oct 09 '18 at 15:38
  • It did solve my generic problem. It does the bitwise shift. I agree it is worded towards the specific example problem I was facing, but it is a very fast solution to the bit iteration which my original question ask about. – Automatico Oct 09 '18 at 15:43

8 Answers8

11

To determine the length of the longest sequence of consecutive 1's, this is more efficient:

def longest_one_chain(n)
  c = 0
  while n != 0
    n &= n >> 1
    c += 1
  end
  c
end

The method simply counts how many times you can "bitwise AND" the number with itself shifted 1 bit to the right until it is zero.

Example:

                 ______ <-- longest chain
    01011011100001111110011110101010 c=0
AND  0101101110000111111001111010101
        1001100000111110001110000000 c=1, 1’s deleted
AND      100110000011111000111000000
            100000011110000110000000 c=2, 11’s deleted
AND          10000001111000011000000
                    1110000010000000 c=3, 111’s deleted
AND                  111000001000000
                     110000000000000 c=4, 1111’s deleted
AND                   11000000000000
                      10000000000000 c=5, 11111’s deleted
AND                    1000000000000
                                   0 c=6, 111111’s deleted
Stefan
  • 109,145
  • 14
  • 143
  • 218
  • Brilliant! :D I found some other way of doing this as well, essentially doing recursive bit-shifting and counting. This is better though! – Automatico Jun 07 '12 at 17:23
  • `longest_one_chain -17`? – Cary Swoveland Oct 03 '18 at 21:05
  • @CarySwoveland interesting one! But what _is_ the longest 1-chain for negative numbers? I'd say you have to take the bit length into account to get a meaningful result. So for 8-bit you could pass 239 instead of -17, i.e. `[-17].pack('c').unpack1('C')`. – Stefan Oct 04 '18 at 10:50
  • I see `-17 & 255 #=> 239` as well. – Cary Swoveland Oct 05 '18 at 04:55
4

Ruby might not be a good choice for your project. The strength of ruby is not it's performance but that it lets you do things like:

n.to_s(2).scan(/1+/).sort.last.length - 1

instead of writing mountains of code. Really just about any other language is likely to perform better if you don't mind writing complex code (which you don't seem to).

pguardiario
  • 53,827
  • 19
  • 119
  • 159
3

Be aware that o and "0" all have a boolean value of true in ruby, so "if i" will not give the result you intended.

Converting each number to a string is of course something one should avoid.

Fixnum has a method [] to access bits of the number, so this has the chance to be faster.

If you have tried this with

0.upto(num/2) do |i|
   #do something with n[i]
end

then num/2 is probably much too big, so you loop much too often.

For 32 bit integers you should use

0.upto(31) do |i|
   if n[i] == 1
     ...
   end
end
undur_gongor
  • 15,657
  • 5
  • 63
  • 75
Meier
  • 3,858
  • 1
  • 17
  • 46
3

In Ruby, Integers (i.e. both Bignums and Fixnums) can already be indexed as if they were bit arrays. They are, however, not Enumerable.

But you can fix that, of course:

class Integer
  include Enumerable

  def each
    return to_enum unless block_given?      
    (size*8).times {|i| yield self[i] }
  end
end

A slightly less intrusive way might be to represent the Integer as an array:

class Integer
  def to_a
    Array.new(size*8, &method(:[]))
  end
end

Then you can use Ruby's nifty Enumerable methods:

0b10111110.chunk {|b| true if b == 1 }.map(&:last).max_by(&:size).size - 1

(Or 0b10111110.to_a.chunk … if you prefer the less intrusive method.)

If you are worried about performance, the execution engine you choose makes a big difference. Rubinius's or JRuby's optimizing compiler may be able to inline and optimize away many method calls that YARV's rather simple compiler can't, for example. YARV's special treatment of Fixnum may give it an advantage over MRI.

As you can see from the examples, I am a big fan of point-free style and functional programming. If you can prove via profiling that you have a bottleneck at a specific point in the code, you may need to replace it with a slightly less elegant or impure version, or you may want to hand-fuse the map and max_by.

class Integer
  def to_a
    Array.new(size*8) {|i| self[i] }
  end
end

0b10111110.chunk {|b| true if 1 == b }.map {|key, chunk| chunk.size }.max - 1

or

0b10111110.chunk {|b| true if 1 == b }.max_by {|key, chunk| chunk.size }.last.size - 1
Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
1

If you are looking for performance, then building a look-up table will probably be the most performant way. Especially if you are doing these in a tight loop:

class BitCounter
    def initialize
        @lookup_table = (0..65535).map { |d| count_bits(d) }
    end

    def count(val)
        a,b,c = @lookup_table[val & 65535]
        d,e,f = @lookup_table[val >> 16]
        [a,b,c+d,e,f].max
    end

private

    def count_bits(val)
        lsb = lsb_bits(val)
        msb = msb_bits(val)
        [lsb, inner_bits(val, lsb, msb), msb]
    end

    def lsb_bits(val)
        len = 0
        while (val & 1 == 1) do
            val >>= 1
            len += 1
        end
        len
    end

    def msb_bits(val)
        len = 0
        while (val & (1<<15) == (1<<15)) do
            val <<= 1
            len += 1
        end
        len
    end

    def inner_bits(val, lsb, msb)
        lens = []
        ndx = lsb

        len = 0
        (lsb+1..(15-msb)).each do |x|
            if ((val & (1<<x)) == 0)
                if(len > 0)
                    lens << len
                    len = 0
                end
            else
                len += 1
            end
        end
        lens.max || 0
    end
end

And then an example:

counter = BitCounter.new
p counter.count 0b01011011100001111110011110101010  // 6

This basically creates a loopup table for all 16 bit values, and then calculates the largest result from those cached values.

You could even combine the more expressive form of n.to_s(2).scan(/1+/).sort.last.length - 1 rather than doing bitwise logic in your table initialization, since it is no longer the bottleneck point -- although I would stick with bitwise math just for clarity of expression rather than string parsing. Each look up only costs 2 table look ups, one addition and a max

J. Holmes
  • 18,466
  • 5
  • 47
  • 52
  • This looks like a good, but complex solution. Will try it out when I get back to this. However, I have found that my problem needs to be spead up by a factor of 1000 or more, potentially millions of times, so I think I need to leave ruby for this project. But this ruby code is very good to have. – Automatico Jun 07 '12 at 17:37
  • I have the impression that the OP wants to determine the longest string of 1's in arbitrarily large integers. – Cary Swoveland Oct 08 '18 at 07:35
1

Sometimes using strings is the most obvious method, and the performance is tolerable:

def oneseq(n)
  n.to_s(2).split(/0+/).sort_by(&:length).last.to_s.length
end
tadman
  • 208,517
  • 23
  • 234
  • 262
  • Performance is critical in this small app, so I actually need to go over to c++ and some openCL or cuda solution. The problem at hand, I found, is way too large for ruby. – Automatico Jun 07 '12 at 17:22
  • 1
    If you need gigabytes per second levels of performance then you'll need a more C-based solution, but even more importantly, an algorithm that's good at teasing out sequences of 1s. Don't forget it's not too hard to embed C inside Ruby for performance-critical sections. For example: [rubyinline](http://www.zenspider.com/ZSS/Products/RubyInline/) – tadman Jun 07 '12 at 17:39
  • I know this is possible with ruby, but this particular problem requires multi threading. Heavy duty MT at that. Basicly figured that this needs GPU processing, or if I am very unlucky, supercomputer processing. In that case, I am in trouble. – Automatico Jun 07 '12 at 17:42
  • How many numbers are you testing here? Is it a one-shot deal or something you'll be doing constantly? Ripping through even a terabyte-sized pile of integers shouldn't require OpenCL, but should be fine in C if you partition your data-set and launch several processes independently. Threads are often nothing but trouble unless you are a veteran multi-threader. Don't forget there are SIMD instructions you can use that make ordinary C code stupidly fast if you can vectorize it. – tadman Jun 07 '12 at 17:46
  • This is sort of off-topic related to the actual question, but: I am essentially doing an algorithm to find a sum of a range. Only problem is that the range produces numbers so vastly great that I have problems comprehending the size of the problem. Not only are the numbers themselves great, but I got to do calculations on all the numbers preceeding it to figure out the output of this current number. Then I got to find the N-t occurance of this number, where N is in the same order of magnitude as the first. So, I have a problem is what I am saying. :p – Automatico Jun 07 '12 at 17:52
  • What you could do is experiment with algorithms in Ruby until you find one that performs optimally, then port it to whatever you need, C or otherwise. Iterating in C is not always very efficient. – tadman Jun 07 '12 at 19:12
  • Automatico, I suggest you post a question that describes your actual problem, making no assumptions about the approach that should be taken to solving it. You might be surprised by the innovative algorithms that are suggested. – Cary Swoveland Oct 07 '18 at 16:42
  • I concur with tadman's point that the choice of algorithm is key; deciding which language to use is secondary. You might find that Ruby is plenty fast with the right algorithm. – Cary Swoveland Oct 08 '18 at 06:55
0

Algorithm

One might consider using a binary search. I have implemented the following algorithm to determine the length of the longest string of 1-bits in a given non-negative integer n. The computational complexity is O(n), but for many values of n it will approach O(log2(n)).

The steps of the algorithm are below, but the reader many find them easier to follow by reading the following section ("Illustration") first.

  1. Set longest to 0.
  2. Set len equal to the number of bits in n (len = n.bit_length).
  3. If len <= 2, return the answer (0, 1 or 2).
  4. Determine the index mid of the middle bit of n (mid = len/2 or mid = len >> 1).
  5. Set ri = mid+1 and li = mid-1.
  6. If the value of the middle bit (n[mid]) is zero go to Step 10.
  7. (n[mid] = 1 to reach this step.) Determine the largest index li >= mid and the smallest index ri <= mid such that n[i] = 1 for all ri <= i <= li.
  8. Set longest = [longest, ri-li+1].max, li += 1 and ri -= 1.
  9. If li == len go to Step 10; else set ln equal to the number comprised of bits at indices li (least significant) to len-1 (most significant). Recursively set to n to ln and go to step 2. This returns the longest string of 1-bits in the number ln (cand). Set longest = [longest, cand].max.
  10. If ri < 0 go to Step 11; else set rn equal to the number comprised of bits at indices 0 (least significant) to ri (most significant). Recursively set to n to rn and go to step 2. This returns the longest string of 1-bits in thet number rn (cand). Setlongest = [longest, cand].max`.
  11. Return longest in the recursion.

Illustration

Suppose

n = 0b1010011011101011110
  #=> 341854

Then

len = n.bit_length
  #=> 19
mid = len >> 1
  #=> 9
n[mid]
  #=> 1
longest = 0

We may picture n as follows

101001101 1 101011110

where the middle bit 1 stands out. Since it is a 1, we look left and right for sequences of 1's. We obtain the following.

10100110 111 01011110

As we have found three 1's, we update longest.

longest = [longest, 3].max
  #=> [0, 3].max => 3

We must now examine 0b10100110 and 0b1011110 (the leading zero in the second number drops out).

For the left number we compute the following.

n = 0b10100110
len = n.bit_length
  #=> 8
mid = len >> 1
  #=> 4
n[mid]
  #=> 0

so we have

101 0 0110

(Note n[0] is the least-significant bit). We can rule out both 0b101 and 0b110, since both have 3 bits, which does not exceed the current valule of longest, which is 3.

Now we consider the right half,

n = 0b1011110
len = n.bit_length
  #=> 7
mid = len >> 1
  #=> 3
n[mid]
  #=>1

so we have

101 1 110

As n[mid] #=> 1, we look left and right for strings of 1's and obtain

10 1111 0

As we have discovered a sting of 4 1's, we set

longest = [longest, 4].max
  #=> [3, 4].max = 4

Lastly we see that the numbers of bits on the left (2) and on the right (1) are both less than 3, so we are finished, and return longest #=> 4. (The OP actually wants longest - 1 #=> 3, which we regard as a side-calculation.)

Code

def longest_run(n, longest=0)
  len = n.bit_length
  return [longest, (n & 1) + (n >> 1)].max if len <= 2
  mid = len >> 1
  ri = mid-1
  li = mid+1
  if n[mid] == 1
    until n[ri] == 0
      ri -= 1
    end
    until n[li] == 0
      li += 1
    end
    cand = li-ri-1
    longest = cand if cand > longest
  end
  if ri >= 0
    shift = ri+1
    m = n ^ ((n >> shift) << shift)
    if m.bit_length > longest 
      cand = longest_run(m, longest) 
      longest = cand if cand > longest
    end
  end
  if li <= len - 1
    m = n >> li
    if m.bit_length > longest 
      cand = longest_run(m, longest) 
      longest = cand if cand > longest
    end
  end
  longest
end

Examples

longest_run 341854
  #=> 4
longest_run 0b1011110011111000000111100011101111011110
  #=> 5
longest_run 0b101111001111100000011110001110111111011111
  #=> 6
longest_run 2**500_000-1
  #=> 500000
longest_run ("10"*100_000).to_i(2)
  #=> 1
Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100
  • For the number `n = 0b1001100111000` I may wish to spear out the last, say, `6` bits: `111000`. I have done this using [Integer#^](http://ruby-doc.org/core-2.5.1/Integer.html#method-i-5E) ("exclusive or"): `ri = 5; n ^ ((n >> ri+1) << ri+1) #=> 0b111000`. That works, but I'm thinking there must be a better way. I considered `n & (2**(ri+1)-1)`, but Ruby (MRI v2.5.1) seems to not like large powers of `2`: puts `2**100000000 #=> warning: in a**b, b may be too big`. Another way is `n & ("1"*(ri+1)).to_i(2)`. Can readers suggest the best way of extracting the last `m` bits of a positive integer? – Cary Swoveland Oct 08 '18 at 07:25
  • `num.digits(2).first(m)` might do it (in reversed order). – steenslag Oct 08 '18 at 13:36
  • @steenslag, that's interesting. (I didn't know `digits` could take a base as an argument.) I would still have to convert the array of zeros and ones back into an integer, of course. – Cary Swoveland Oct 09 '18 at 07:00
0

Here are a couple more straightforward methods (though I expect @Steven's answer and my other answer would be more efficient).

#1

def max_string_of_ones(n)
  max_length = 0
  cand = 0
  (0..n.bit_length).reduce(0) do |max_length, i|
    if n[i] == 1
      cand += 1
    else
      max_length = cand if cand > max_length
      cand = 0
    end
    max_length
  end
end

Note n[n.bit_length] #=> 0.

#2

This second method uses Ruby's flip-flop operator. In addition, I thought, "Wouldn't it be handy if Integer had an each_bit method that returned an enumerator?", so I added one.

class Integer
  def each_bit
    Enumerator.new do |yielder|
      if block_given?      
        bit_length.times { |i| yielder << yield(self[i]) }
      else
        bit_length.times { |i| yielder << self[i] }
      end
    end
  end
end

def max_string_of_ones(n)
  n.each_bit.slice_before { |b| true if b==0 .. b==1 }.max_by(&:size).size
end

max_string_of_ones(0b1100011101111011100)
  #=> 4

Note the following intermediate calculation.

def max_string_of_ones(n)
  n.each_bit.slice_before { |b| true if b==0 .. b==1 }
end

enum = max_string_of_ones(0b1100011101111011100)
  #=> #<Enumerator: #<Enumerator::Generator:0x00000000019a2f80>:each>
enum.to_a
  #=> [[0], [0], [1, 1, 1], [0], [1, 1, 1, 1], [0],
  #    [1, 1, 1], [0], [0], [0], [1, 1]]
Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100