-2

I have a very large list of positions (DNA loci) and need to convert it to a sequence of binary values (0s and 1s).

Example:

Input:

[3,5] # positions 3 and 5

Output:

[0,0,1,0,1] # 1s only for third and fifth positions

The input list contains on the order of millions of positions, and the maximum position is 2.3 billion (the size of the DNA).

JayPeerachai
Mahdi Moqri

3 Answers

1

Use numpy.bincount:

import numpy as np
a = [3, 5]
b = np.bincount(a)  # array([0, 0, 0, 1, 0, 1]); index 0 is the unused zero position

You can ignore the zero-index value by slicing:

b = np.bincount(a)[1:]  # array([0, 0, 1, 0, 1])
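
At the scale mentioned in the question, memory per position starts to matter. np.bincount returns an array of platform integers (typically 8 bytes each on 64-bit builds), and it counts repeats, so a duplicated position would yield 2 rather than 1. One possible variant is to allocate a uint8 array and set the listed positions directly. The sketch below assumes 1-based positions; the helper name positions_to_mask and its length parameter are hypothetical, and nothing here has been benchmarked.

```python
import numpy as np

def positions_to_mask(positions, length=None):
    """Return a uint8 array with 1 at each 1-based position (hypothetical helper)."""
    idx = np.asarray(positions, dtype=np.int64)
    if length is None:
        # Default to the largest position seen; pass the genome size to pad with zeros.
        length = int(idx.max()) if idx.size else 0
    mask = np.zeros(length, dtype=np.uint8)  # one byte per position
    mask[idx - 1] = 1  # shift 1-based positions to 0-based indices
    return mask

print(positions_to_mask([3, 5]).tolist())  # [0, 0, 1, 0, 1]
```

If one byte per position is still too much, np.packbits(mask) packs eight positions into each byte.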
Jan Christoph Terasa
0

Standard python solution:

input_array = [3, 5]
output_array = [1 if i in input_array else 0 for i in range(1,max(input_array)+1)]

Output:

[0, 0, 1, 0, 1]
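
Note that `i in input_array` scans the whole list for every `i`, so the work grows with the product of the list length and the maximum position. A small change that should help is to test membership against a set, where lookups take constant time on average. This is a sketch of that idea; the name `wanted` is just an illustrative choice.

```python
input_array = [3, 5]
wanted = set(input_array)  # set membership is O(1) on average, list membership is O(n)
output_array = [1 if i in wanted else 0 for i in range(1, max(input_array) + 1)]
print(output_array)  # [0, 0, 1, 0, 1]
```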
Abhilash
  • Could you please check the time for this? – Mahdi Moqri Aug 23 '20 at 20:56
  • `python -m timeit -n 1000 "input_array = [i for i in range(0,1000,4)];output_array = [1 if i in input_array else 0 for i in range(1,max(input_array)+1)]"` gives: 1000 loops, best of 5: 2.31 msec per loop – Abhilash Aug 23 '20 at 21:13
  • I guess you can go with numpy bincount as mentioned by @Jan. For larger arrays, numpy works great. – Abhilash Aug 23 '20 at 21:17
0
arr = [3, 5]
print([1 if num + 1 in arr else 0 for num in range(arr[-1])])

Prints: [0, 0, 1, 0, 1]
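
The snippet above uses `arr[-1]` as the largest position, which only holds when the list is sorted in ascending order. A pure-Python sketch that drops that assumption is to take `max(arr)` and write into a zero-filled bytearray, which also avoids the repeated list scans. The name `bits` is illustrative, and positions are assumed to be 1-based.

```python
arr = [5, 3]  # deliberately unsorted to show the order does not matter
bits = bytearray(max(arr))  # zero-filled, one byte per position
for pos in arr:
    bits[pos - 1] = 1  # 1-based position -> 0-based index
print(list(bits))  # [0, 0, 1, 0, 1]
```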

Shimon Cohen