I have a very large data set collected from Twitter. I am trying to figure out how to do the equivalent of the plain Python filtering below in NumPy. I am working in the Python interpreter.
>>> tweets = [['buhari si good'], ['atiku is great'], ['buhari nfd sdfa atiku'],
...           ['is nice man that buhari']]
>>> filter(lambda x: 'buhari' in x[0].lower(), tweets)
[['buhari si good'], ['buhari nfd sdfa atiku'], ['is nice man that buhari']]
I tried boolean indexing as shown below, but the resulting array was empty:
>>> import numpy as np
>>> tweet_arr = np.array([['buhari si good'], ['atiku is great'], ['buhari nfd sdfa atiku'], ['is nice man that buhari']])
>>> flat_tweets = tweet_arr[:, 0]
>>> flat_tweets
array(['buhari si good', 'atiku is great', 'buhari nfd sdfa atiku',
       'is nice man that buhari'], dtype='|S23')
>>> flat_tweets['buhari' in flat_tweets]
array([], shape=(0, 4), dtype='|S23')
I would like to know how to filter strings in a NumPy array the same way I was easily able to filter even numbers here:
>>> arr = np.arange(15).reshape((15, 1))
>>> arr
array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12],
       [13],
       [14]])
>>> arr[:][arr % 2 == 0]
array([ 0,  2,  4,  6,  8, 10, 12, 14])
Thanks
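A possible reading of the empty result: `'buhari' in flat_tweets` is an ordinary Python membership test, so it produces a single bool rather than one bool per element. No element equals `'buhari'` exactly, so it is `False`, and indexing with that selects nothing. The even-number filter works because `arr % 2 == 0` is evaluated element by element. A minimal sketch of an element-wise substring test, assuming `np.char.lower` and `np.char.find` suit the data (the names `mask` and `matches` are only illustrative):

```python
import numpy as np

tweet_arr = np.array([['buhari si good'], ['atiku is great'],
                      ['buhari nfd sdfa atiku'], ['is nice man that buhari']])
flat_tweets = tweet_arr[:, 0]

# np.char.find returns, for each element, the index of the first
# occurrence of the substring, or -1 when it is absent.
# Lowercasing first mirrors the x[0].lower() call in the list version.
mask = np.char.find(np.char.lower(flat_tweets), 'buhari') >= 0

# Boolean indexing with a per-element mask keeps only the matching tweets.
matches = flat_tweets[mask]
print(matches)

# The same mask also selects rows of the original 2-D array.
matching_rows = tweet_arr[mask]
```

The `np.char` functions still loop over the strings internally, so on a very large data set this may not be much faster than a list comprehension; it mainly keeps the result as an array that can be used as a mask.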