I am looking for a way to replace all occurrences of 'A' with 1, 'T' with 2, 'C' with 8, and 'G' with 16 in a byte array. How can this be done?

maasha

2 Answers

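require "narray"

# Helper: copy this NArray into a new NArray of the given element type.
class NArray
  def cast(type)
    a = NArray.new(type,*self.shape)
    a[] = self
    a
  end
end

# 256-entry lookup table indexed by byte value; unmapped bytes stay 0.
conv = NArray.int(256)
atcg = NArray.to_na('ATCG', NArray::BYTE).cast(NArray::LINT)
conv[atcg] = [1,2,8,16]

# Convert the sequence to byte values and look each one up in the table.
seq_str = 'ABCDAGDE'
seq_ary = NArray.to_na(seq_str, NArray::BYTE).cast(NArray::LINT)

p conv[seq_ary]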
#=> NArray.int(8):
#   [ 1, 0, 8, 0, 1, 16, 0, 0 ]
masa16
  • Very nice. How do you reckon that compares speed-wise to tr? search = 'ACGTUMRWSYKVHDBN'; replace = [1, 2, 4, 8, 2, 5, 9, 3, 12, 6, 10, 13, 7, 11, 14, 15].pack("C*"); string.tr!(search, replace) – maasha Jan 06 '12 at 07:39
  • Test code (https://gist.github.com/1573753) shows tr is faster than NArray. If data is provided as a string, use String#tr. If it is in the context of numerical processing, NArray is applicable. – masa16 Jan 07 '12 at 04:23
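
For reference, the String#tr approach discussed in these comments could look roughly like the sketch below, applied to the mapping from the question. The variable names are illustrative, and the optional first pass that zeroes every other character is an assumption added here so the output matches the NArray result above; without it, tr leaves unmatched characters unchanged.

```ruby
# Mapping from the question: A => 1, T => 2, C => 8, G => 16.
search  = 'ATCG'
replace = [1, 2, 8, 16].pack('C*')   # one replacement byte per letter

seq = 'ABCDAGDE'

# Plain tr: characters outside 'ATCG' are left as they are.
coded = seq.tr(search, replace)
p coded.bytes

# Optional (assumption): first turn every non-ATCG character into a zero
# byte, so unmapped positions become 0 as in the NArray answer.
zeroed = seq.tr('^ATCG', "\0").tr(search, replace)
p zeroed.bytes
```

Use tr! instead of tr to modify the string in place, as in the comment above.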

Is this what you are looking for?

h = {'A' => 1, 'T' => 2, 'C' => 8, 'G' => 16}
a = ['A', 'B', 'C', 'D', 'A', 'G', 'D', 'E']

# Replace mapped characters; leave everything else unchanged.
result = a.map {|c| h.fetch(c, c) }
#=> [1, "B", 8, "D", 1, 16, "D", "E"]
basgys
  • Did you note that I am not shifting base of ABC, but specifically wanting to assign new values to ATCG? – maasha Jan 05 '12 at 12:37
  • Ok I completely messed up -_- – basgys Jan 05 '12 at 12:51
  • I think I get it... I changed the answer – basgys Jan 05 '12 at 12:58
  • Please check NArray -> http://narray.rubyforge.org/ - I would like to do this on NArray level. Otherwise I should think that tr would be a lot faster than your proposal. – maasha Jan 05 '12 at 13:01
  • I glanced at NArray API, but I have to admit that I'm not really good at matrix and stuff like that. But as you said, there is probably a better solution and I would be interested to see it just out of curiosity. Good luck :) – basgys Jan 09 '12 at 20:34
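
For readers without NArray, the lookup-table idea from the first answer can also be sketched in plain Ruby. This is only an illustration under stated assumptions: the names codes and table are hypothetical, and unmapped bytes are assumed to become 0, mirroring the NArray output. It is not claimed to be faster than tr.

```ruby
# Mapping from the question.
codes = { 'A' => 1, 'T' => 2, 'C' => 8, 'G' => 16 }

# 256-entry table indexed by byte value; unmapped bytes stay 0 (assumption).
table = Array.new(256, 0)
codes.each { |char, code| table[char.ord] = code }

seq    = 'ABCDAGDE'
result = seq.each_byte.map { |byte| table[byte] }
p result
```

Building the table once and indexing it per byte avoids a hash lookup for every character, which is the same design choice the NArray answer makes with conv.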