1

I am converting R code to Python. I know that calling R from Python is an alternative, but I need the code converted to Python line by line.

I have a line in the R code which says
reps[-a] <- .Machine$integer.max
where reps is a vector and a is another vector containing some indices, e.g. a = [1, 2, 4].

How can I implement this line in Python?
From what I understand, it should assign the maximum integer value to every element of reps whose index is not 1, 2, or 4.

Thank you.

Moshee

3 Answers

2

The closest Python equivalent of R's atomic vector/matrix/array is the NumPy N-D array, and .Machine$integer.max corresponds to the maximum int32 value, np.iinfo(np.int32).max (2147483647).

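However, NumPy arrays have no direct counterpart to R's negative (exclusion) indexing; a negative index in NumPy counts from the end of the array instead. A longer route is therefore necessary, as shown by @mglison, using a boolean mask: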

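import numpy as np

a = np.array([1, 3, 4, 6, 79, 10, 56])
b = np.array([0, 1, 2])   # positions to leave unchanged

# Boolean mask that is True only at the positions listed in b
mask = np.full(a.shape, False)
mask[b] = True
# Overwrite every element the mask does NOT select
a[~mask] = np.iinfo(np.int32).max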

a
# array([         1,          3,          4, 2147483647, 2147483647,
#    2147483647, 2147483647])
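
One detail to keep in mind when porting line by line: R indices are 1-based, so an R vector a = c(1, 2, 4) refers to Python positions 0, 1, and 3. The following is a minimal sketch of the same boolean-mask approach with that shift applied. The function name assign_except_r and the sample data are hypothetical, and it assumes a holds the original 1-based R indices.

```python
import numpy as np

def assign_except_r(reps, r_indices, value):
    """Mimic R's `reps[-a] <- value`, where r_indices are 1-based R positions."""
    out = np.asarray(reps).copy()                # leave the caller's array untouched
    keep = np.asarray(r_indices, dtype=int) - 1  # shift 1-based R indices to 0-based
    mask = np.ones(out.shape, dtype=bool)        # True means "overwrite"
    mask[keep] = False                           # protect the excluded positions
    out[mask] = value
    return out

reps = np.array([10, 20, 30, 40, 50])
result = assign_except_r(reps, [1, 2, 4], np.iinfo(np.int32).max)
print(result)
```

If a has already been converted to 0-based positions, the subtraction should be dropped.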

Alternatively, Python's pandas Series is an extension of the numpy 1-D array but requires alignment of indexes:

import numpy as np
import pandas as pd

a = pd.Series([1, 3, 4, 6, 79, 10, 56])
b = pd.Series([0, 1, 2])

a[~a.index.isin(b)] = np.iinfo(np.int32).max

# 0             1
# 1             3
# 2             4
# 3    2147483647
# 4    2147483647
# 5    2147483647
# 6    2147483647
# dtype: int64
Parfait
  • For the numpy version, why is `a[3] != np.iinfo(np.int32).max` after the masking? – dubbbdan Jul 22 '19 at 19:21
  • @Parfait Hi, I would like the 6 in a to be masked as well, just as it is in the Series example. Is there any way to achieve that without using pandas? Thanks – Moshee Jul 22 '19 at 19:37
  • Whoops! Read too fast. Numpy arrays do not have a `not in` list property. So you will need to use a boolean mask. See edit. – Parfait Jul 22 '19 at 21:02
  • I like the 2013 reference :D! – dubbbdan Jul 22 '19 at 21:10
2

Another option is to create a boolean mask using np.ones with dtype=bool, set mask[a] = False, and then index reps with the mask:

import numpy as np

reps = np.random.randint(0,10, 20)
a = np.array([1,2,4])

mask = np.ones(reps.shape, dtype=bool)
mask[a] = False

reps[mask] = np.iinfo(np.int32).max

after which reps looks like this (the values kept at indices 1, 2, and 4 are random and will vary between runs):

array([2147483647,          4,          4, 2147483647,          0,
       2147483647, 2147483647, 2147483647, 2147483647, 2147483647,
       2147483647, 2147483647, 2147483647, 2147483647, 2147483647,
       2147483647, 2147483647, 2147483647, 2147483647, 2147483647])

or a one-liner variation of @akrun's answer (reps must be a NumPy array here; a plain Python list cannot be indexed with a list):

reps[list(set(range(len(reps))) - set(a))] = np.iinfo(np.int32).max
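
The same set difference can also be computed within NumPy. This is a sketch using np.setdiff1d, which returns the sorted values of its first argument that do not appear in the second. The sample data and the name others are only for illustration, and a is assumed to hold 0-based positions.

```python
import numpy as np

reps = np.arange(10)
a = np.array([1, 2, 4])   # 0-based positions to leave unchanged

# All positions of reps that are not listed in a
others = np.setdiff1d(np.arange(len(reps)), a)
reps[others] = np.iinfo(np.int32).max
print(reps)
```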
dubbbdan
  • Hi, the one liner gives an error saying, list indices must be integers or slices, not list – Moshee Jul 22 '19 at 20:28
  • Weird, I just re-ran it over here and it works fine. how about using [np.r_](https://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html): `reps[np.r_[list(set(range(len(reps))) - set(a))] ]= np.iinfo(np.int32).max` – dubbbdan Jul 22 '19 at 21:04
1

An option is

for i in set(range(len(a))) - set(b):
    a[i] = 1e5

a
#[1, 3, 4, 100000.0, 100000.0, 100000.0, 100000.0]

data

a = [1, 3, 4, 6, 79, 10, 56]
b = [0, 1, 2]
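
For plain Python lists, a list comprehension is another way to get the same result without indexing by a list. This is a sketch that assumes the int32 limit 2**31 - 1 is wanted to match .Machine$integer.max; the name INT_MAX is hypothetical.

```python
INT_MAX = 2**31 - 1   # same value as R's .Machine$integer.max

a = [1, 3, 4, 6, 79, 10, 56]
b = {0, 1, 2}         # 0-based positions to leave unchanged

# Keep the element when its position is in b, otherwise replace it
a = [x if i in b else INT_MAX for i, x in enumerate(a)]
print(a)
```

Using an integer here keeps the list free of floats, unlike assigning 1e5.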
Community
akrun