I have a data file (temp.dat) that consists of 3 columns and ~20k rows. It looks like this:
0 1 100.00
0 2 100.00
0 3 100.00
...
1 10 100.00
1 11 100.00
1 12 100.00
1 13 100.00
1 14 100.00
1 15 100.00
1 16 100.00
1 17 100.00
...
0 10 100.00
0 11 100.00
0 12 100.00
...
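I would like to count the values that satisfy the criteria in the code below: for each integer x from 0 to 35, count how often x appears in the first column plus the second column, and keep x if that total equals expected_count. I tried both map and a list comprehension, but both are incredibly slow; the list comprehension is only about a minute faster.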
import numpy as np

data = np.genfromtxt('temp.dat')
base1, base2, pct = data[:, 0], data[:, 1], data[:, 2]
expected_count = 10000
BASE_NAME = []
for x in range(0, 36):
    # count occurrences of x in each column, one element at a time in Python
    count1 = sum(map(lambda v: v == x, base1))
    count2 = sum(map(lambda v: v == x, base2))
    total_count = count1 + count2
    if total_count == expected_count:
        BASE_NAME.append(x)
total_base_name = len(BASE_NAME)
print(total_base_name)
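Both map and the list comprehension walk the arrays element by element in Python, which is likely where the time goes. A minimal sketch of the same loop with NumPy's vectorized comparison, assuming base1 and base2 are the NumPy arrays returned by genfromtxt: `base1 == x` builds a boolean array in one call and `np.count_nonzero` counts the True entries without a Python-level loop. The helper name `count_matching_bases` and the small sample arrays are hypothetical, for illustration only.

```python
import numpy as np

def count_matching_bases(base1, base2, expected_count, n_bases=36):
    """Hypothetical helper: return every x in range(n_bases) whose total
    number of occurrences across base1 and base2 equals expected_count."""
    matches = []
    for x in range(n_bases):
        # One vectorized comparison per column; counting happens inside NumPy
        total = np.count_nonzero(base1 == x) + np.count_nonzero(base2 == x)
        if total == expected_count:
            matches.append(x)
    return matches

# Made-up sample columns (not the real temp.dat), stored as floats like genfromtxt output
base1 = np.array([0, 0, 1, 1, 2], dtype=float)
base2 = np.array([1, 2, 3, 0, 3], dtype=float)
matches = count_matching_bases(base1, base2, expected_count=3)
print(len(matches))
```

The outer loop over 36 values stays, but each iteration is now a couple of array operations instead of ~20k Python calls per column.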
For the list comprehension version, the two counting lines become:
count1 = sum([v == x for v in base1])
count2 = sum([v == x for v in base2])
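An alternative sketch that drops the loop over x entirely uses `np.bincount` to tally every value in a single pass. This assumes both columns hold non-negative whole numbers (stored as floats by genfromtxt), which matches the sample rows shown but is not guaranteed for the full file. The helper name `bases_with_count` is hypothetical.

```python
import numpy as np

def bases_with_count(base1, base2, expected_count, n_bases=36):
    """Hypothetical helper: values in range(n_bases) that occur exactly
    expected_count times across both columns combined."""
    # genfromtxt yields floats; bincount needs non-negative integers
    values = np.concatenate([base1, base2]).astype(np.int64)
    # counts[x] = occurrences of x; minlength guarantees indices 0..n_bases-1 exist
    counts = np.bincount(values, minlength=n_bases)[:n_bases]
    return np.flatnonzero(counts == expected_count)

# Made-up sample columns (not the real temp.dat)
base1 = np.array([0, 0, 1, 1, 2], dtype=float)
base2 = np.array([1, 2, 3, 0, 3], dtype=float)
print(len(bases_with_count(base1, base2, expected_count=3)))
```

Here `len(...)` plays the role of `total_base_name` in the question's loop.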