Example:

import numpy as np
row = ['12347', 'Van', '18/01/2017']
npvalues = np.array([['12345', 'Bus', '23/02/2017'], ['12346', 'Truck', '01/07/2017'], ['12347', 'Van', '18/01/2017']])
np.isin(row, npvalues)

Required output: [True, True, True]

ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

IanS
Suresh K

1 Answer

Convert the 'row' variable to an np.array instead of passing it as a list.

import numpy as np
row=['12347','Van','18/01/2017']
npvalues = np.array([ ['12345','Bus','23/02/2017'],['12346','Truck','01/07/2017'],['12347','Van','18/01/2017']  ])

row
Out[60]: ['12347', 'Van', '18/01/2017']

npvalues
Out[61]: 
array([['12345', 'Bus', '23/02/2017'],
       ['12346', 'Truck', '01/07/2017'],
       ['12347', 'Van', '18/01/2017']],
      dtype='<U10')

# Convert the list to an array instead
row = np.asarray(row)
np.isin(row, npvalues)
Out[63]: array([ True,  True,  True], dtype=bool)

Note: I was able to run your code as is and got the required answer.

row=['12347','Van','18/01/2017']
npvalues = np.array([ ['12345','Bus','23/02/2017'],['12346','Truck','01/07/2017'],['12347','Van','18/01/2017']  ])
np.isin(row, npvalues)
Out[64]: array([ True,  True,  True], dtype=bool)

Here is my version information:

import sys
sys.version
Out[71]: '3.6.4 |Anaconda, Inc.| (default, Mar 12 2018, 20:20:50) [MSC v.1900 64 bit (AMD64)]'
np.version.full_version
Out[67]: '1.13.3'
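
Since the 32-bit versus 64-bit question comes up in the comments, here is a minimal sketch for checking which kind of interpreter is running. It uses only the standard library; the variable names are illustrative and not from the original post.

```python
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit build, 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(pointer_bits, is_64bit)
```

A 32-bit build limits the address space of the process regardless of how much RAM is installed, which is one plausible reason the "array is too big" error appears only for larger inputs.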
emmet02
  • Thanks for your quick response. My code works fine on a small dataframe, but it raises the error when the count is larger than 5000. – Suresh K Mar 20 '18 at 10:05
  • Which array has > 5000 elements? Unfortunately, as you are using 32-bit Python, you will hit memory restrictions earlier. How much RAM is available to the process? – emmet02 Mar 20 '18 at 10:18
  • npvalues > 5000. I'm using 8 GB of RAM. – Suresh K Mar 20 '18 at 10:21
  • Do you have a 64-bit processor? If so, I would strongly advise uninstalling your 32-bit version of Python and installing the 64-bit version instead. https://stackoverflow.com/questions/18282867/python-32-bit-memory-limits-on-64bit-windows - This will increase the headroom of your process, but obviously there will still be a limit. – emmet02 Mar 20 '18 at 10:25
  • Is there any other way to check quickly in a dataframe? It is taking 50 sec to process 500 records. – Suresh K Mar 20 '18 at 11:28
  • There is an .isin() method on the pd.Series object that should be able to help you. If the data is already in a DataFrame, you ought to use the methods available on it. – emmet02 Mar 20 '18 at 14:55
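
The .isin() suggestion in the last comment could look roughly like the sketch below. The column names ("id", "vehicle", "date") and the wanted_ids lookup are hypothetical, since the question does not show the actual DataFrame. Note that this checks each value against its own column, whereas np.isin(row, npvalues) checks each value against every element of the array.

```python
import pandas as pd

# Hypothetical layout: the column names are assumptions, not from the question.
df = pd.DataFrame(
    [["12345", "Bus", "23/02/2017"],
     ["12346", "Truck", "01/07/2017"],
     ["12347", "Van", "18/01/2017"]],
    columns=["id", "vehicle", "date"],
)

row = ["12347", "Van", "18/01/2017"]

# For each value in `row`, check whether it occurs in the matching column.
per_column = [
    bool(df[col].isin([value]).any())
    for col, value in zip(df.columns, row)
]
print(per_column)

# Vectorised lookup of many values at once, without a Python-level loop per record.
wanted_ids = pd.Series(["12347", "99999"])  # hypothetical ids to look up
id_found = wanted_ids.isin(df["id"])
print(id_found.tolist())
```

Checking a whole Series in one isin() call, rather than calling np.isin once per row, is likely where most of the speed-up would come from, though that has not been measured here.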