Row counting with conditional range

Question

This is my first posted question, so please forgive me if I've entered my attempt incorrectly.

My goal: I am trying to count the number of rows that satisfy a conditional range. The individual array elements represent the times (in seconds) at which peaks occurred. Each row in the input data represents an active/firing cell. I want to calculate the number of active cells (rows) per minute (i.e. in successive 60-second windows).

My data: My input data (T) was imported from txt as an array of integers and had several 0s that I do not want counted in other operations. I have copied a subset of this data below.

My issue: I don't see anything wrong with my attempt (below), but since the array is fairly small, I'm able to manually check the truthiness of the output. For whatever reason, the True values begin on the 'correct' iteration, but then remain True (when they should be False) until another True occurs in the loop. After that, the output remains 'correctly' False. This is driving me crazy and I would greatly appreciate any help. The following attempt does not yet sum the rows; it only tries to return the correct arrangement of True/False values.

import numpy as np

T = T.astype(float)
T[T==0] = np.nan
for x in range(0, 1321, 60):
    RowSum = np.any(T>x, axis = 1) & np.any(T<x+60, axis = 1)
    print(RowSum)

Input data:

array([[ 111.,  184.,  221.,  344.,  366.,    0.,    0.,    0.,    0.,    0.,    0.],
       [ 408.,  518.,  972., 1165., 1186.,    0.,    0.,    0.,    0.,    0.,    0.],
       [ 208.,  432., 1290., 1321.,    0.,    0.,    0.,    0.,    0.,    0.,    0.],
       [ 553.,  684.,  713.,  888., 1012., 1108., 1134.,    0.,    0.,    0.,    0.],
       [ 285.,  552., 1159., 1183.,    0.,    0.,    0.,    0.,    0.,    0.,    0.],
       [ 304.,  812.,  852.,    0.,    0.,    0.,    0.,    0.,    0.,    0.,    0.]])

To better show your data, you can add a sample to the text of your question by copying and pasting the output of `print(T.__repr__())`, or a subset with, say, `print(T[:10].__repr__())`. — YXD, Mar 23 '15 at 21:47
You can probably use [`np.histogram`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html) if I understand the question correctly. See [this question](http://stackoverflow.com/questions/6986986/bin-size-in-matplotlib-histogram) on setting the bins. — YXD, Mar 23 '15 at 21:49
Mr. E, thank you for your quick reply. I have edited my post to include a subset of my data. np.histogram will return a histogram of all elements in my matrix from a flattened array. In my case, I need to count the number of rows which contain elements satisfying the conditions T>x and T<x+60. — bblack140, Mar 23 '15 at 23:39

ali_m · Accepted Answer · 2015-03-24T16:28:03.677

Mr E is right - np.histogram is probably the simplest way to do this:

import numpy as np

# array of spike times
t = np.array([[ 111,  184,  221,  344,  366,    0,    0,    0,    0,    0,    0],
              [ 408,  518,  972, 1165, 1186,    0,    0,    0,    0,    0,    0],
              [ 208,  432, 1290, 1321,    0,    0,    0,    0,    0,    0,    0],
              [ 553,  684,  713,  888, 1012, 1108, 1134,    0,    0,    0,    0],
              [ 285,  552, 1159, 1183,    0,    0,    0,    0,    0,    0,    0],
              [ 304,  812,  852,    0,    0,    0,    0,    0,    0,    0,    0]],
              dtype=float)  # the np.float alias was removed in NumPy 1.24

# 60 second time bins
bins = np.arange(0, t.max() + 60, 60)

# get the total number of spikes in each 60 second bin over all rows (cells). we 
# can treat t as 1D since we don't care which spike times correspond to which
# cell.
counts, edges = np.histogram(t[t != 0], bins)

print(bins)
# [    0.    60.   120.   180.   240.   300.   360.   420.   480.   540.
#    600.   660.   720.   780.   840.   900.   960.  1020.  1080.  1140.
#   1200.  1260.  1320.  1380.]

print(counts)
# [0 1 0 3 1 2 2 1 1 2 0 2 0 1 2 0 2 0 2 4 0 1 1]

So we have zero total spikes between 0 and 60 sec, one spike between 60 and 120 sec etc. By the way, I'd suggest you avoid using T as a variable name - it can cause confusion since in numpy .T is used to get the transpose of an array.

To get the spike counts per cell you'll need to loop over the rows of t:

cell_counts = np.empty((t.shape[0], bins.shape[0] - 1), int)  # np.int was removed in NumPy 1.24
for ii, row in enumerate(t):
    cell_counts[ii], edges = np.histogram(row[row != 0], bins)

print(cell_counts)
# [[0 1 0 2 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 2 0 0 0]
#  [0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1]
#  [0 0 0 0 0 0 0 0 0 1 0 2 0 0 1 0 1 0 2 0 0 0 0]
#  [0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 2 0 0 0]
#  [0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0]]
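
A loop-free variant is also possible. This is only a minimal sketch, under the assumption that the bins have a fixed 60 s width and start at 0, so that each spike's bin index is just its time floor-divided by 60. The names `bin_width`, `n_bins`, `row_idx`, `bin_idx` and `cell_counts_vec` are made up for illustration; `np.add.at` is used because it accumulates correctly when the same (row, bin) pair occurs more than once.

```python
import numpy as np

# same spike-time array as above, zero-padded on the right
t = np.array([[ 111,  184,  221,  344,  366,    0,    0,    0,    0,    0,    0],
              [ 408,  518,  972, 1165, 1186,    0,    0,    0,    0,    0,    0],
              [ 208,  432, 1290, 1321,    0,    0,    0,    0,    0,    0,    0],
              [ 553,  684,  713,  888, 1012, 1108, 1134,    0,    0,    0,    0],
              [ 285,  552, 1159, 1183,    0,    0,    0,    0,    0,    0,    0],
              [ 304,  812,  852,    0,    0,    0,    0,    0,    0,    0,    0]],
             dtype=float)

bin_width = 60                                  # assumed fixed bin width (seconds)
n_bins = int(t.max() // bin_width) + 1          # enough bins to hold the latest spike

mask = t != 0                                   # ignore the zero padding
row_idx = np.nonzero(mask)[0]                   # row (cell) of each real spike
bin_idx = (t[mask] // bin_width).astype(int)    # 60 s bin of each real spike

# unbuffered in-place add: repeated (row, bin) pairs are each counted
cell_counts_vec = np.zeros((t.shape[0], n_bins), dtype=int)
np.add.at(cell_counts_vec, (row_idx, bin_idx), 1)

print(cell_counts_vec)
```

For this data the result should agree with the per-row `np.histogram` loop. The two can differ when the largest spike time is an exact multiple of the bin width, because `np.histogram` treats its last bin as closed on the right.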

Update:

If I understand correctly, you want to know the total number of cells that spiked within each 60 sec time interval, regardless of the number of spikes that each cell emitted. A simple way to do this is to convert cell_counts to a boolean array (True wherever a cell fired at least once in that bin), then sum over the rows, i.e. along axis 0:

total_active_cells = (cell_counts > 0).sum(0)

print(total_active_cells)
# [0 1 0 2 1 2 2 1 1 2 0 1 0 1 2 0 2 0 1 2 0 1 1]
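
As for why the loop in the question stays True for too long: `np.any(T>x, axis=1) & np.any(T<x+60, axis=1)` asks whether a row has some element above `x` and some (possibly different) element below `x+60`. That holds for every window between a row's first and last spike. The two conditions need to be combined element-wise before `np.any` is applied. The sketch below illustrates this under a couple of assumptions of mine: it uses `>=` on the lower edge so that the windows match the half-open bins of `np.histogram`, and it masks the zero padding with `t != 0` rather than converting it to NaN. The names `buggy`, `fixed` and `active` are only illustrative.

```python
import numpy as np

# same spike-time array as above, zero-padded on the right
t = np.array([[ 111,  184,  221,  344,  366,    0,    0,    0,    0,    0,    0],
              [ 408,  518,  972, 1165, 1186,    0,    0,    0,    0,    0,    0],
              [ 208,  432, 1290, 1321,    0,    0,    0,    0,    0,    0,    0],
              [ 553,  684,  713,  888, 1012, 1108, 1134,    0,    0,    0,    0],
              [ 285,  552, 1159, 1183,    0,    0,    0,    0,    0,    0,    0],
              [ 304,  812,  852,    0,    0,    0,    0,    0,    0,    0,    0]],
             dtype=float)

# First cell, window starting at x = 120 s. It has no spike between 120 and 180 s.
row = t[0][t[0] != 0]                            # [111, 184, 221, 344, 366]
x = 120
buggy = np.any(row > x) & np.any(row < x + 60)   # True: 184 > 120 and 111 < 180
fixed = np.any((row > x) & (row < x + 60))       # False: no single spike in the window

# Same idea for all rows and all windows: combine the conditions per element,
# reduce along each row with np.any, then count the rows that are True.
edges = np.arange(0, t.max() + 60, 60)
active = np.array([
    np.any((t != 0) & (t >= lo) & (t < hi), axis=1).sum()
    for lo, hi in zip(edges[:-1], edges[1:])
])

print(active)
# [0 1 0 2 1 2 2 1 1 2 0 1 0 1 2 0 2 0 1 2 0 1 1]
```

This gives the same counts as `total_active_cells` above, at the cost of one pass over the array per window.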

ali_m, thank you for your reply and suggestions. They have helped me to understand np.histogram a little bit better. However, as you say, Mr. E's suggestion and your answer tell me that "we have zero total spikes between 0 and 60 sec, one spike between 60 and 120 sec etc." **I am not trying to count the number of spikes.** I am counting the number of active neurons (rows) during that time period. So the output for the provided subset of data should be [0 1 0 2 . . . because although there are 3 spikes between 180 and 240, there are only 2 active rows (one row contains 2 spikes in that time period). — bblack140, Mar 24 '15 at 15:49
In other words, I want to sum your 'cell_counts' array if all non_zero elements were taken to be one. No twos allowed. — bblack140, Mar 24 '15 at 15:52
@bblack140 I see. This is easy to obtain from `cell_counts` (see my update). — ali_m, Mar 24 '15 at 16:28
