Converting from R to Python

Question

I am converting R code to Python. I know that calling R from Python is an alternative, but I need the code converted to Python line by line.

I have a line in the R code that says
reps[-a] <- .Machine$integer.max
where reps is a vector and a is another vector containing some indices, e.g. a = [1, 2, 4].

I want to know how to implement this line in Python.
From what I understand, it assigns the maximum integer value to every element of reps whose index is not 1, 2 or 4.

Thank you.
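A minimal sketch of what the R line does, assuming reps becomes a NumPy integer array. One caveat when porting: R indices are 1-based and Python indices are 0-based, so indices copied verbatim from R need 1 subtracted. The example data and the names a_r and others are hypothetical. np.setdiff1d is used here to list every position not in a, which is what R's negative index reps[-a] selects.

```python
import numpy as np

# R's .Machine$integer.max is 2147483647, the int32 maximum
INT_MAX = np.iinfo(np.int32).max

reps = np.array([10, 20, 30, 40, 50, 60])  # hypothetical example data
a_r = [1, 2, 4]          # indices as written in R (1-based)
a = np.array(a_r) - 1    # the same positions in Python (0-based): [0, 1, 3]

# Every position of reps that is NOT in a, i.e. what R's reps[-a] selects
others = np.setdiff1d(np.arange(reps.size), a)
reps[others] = INT_MAX

print(reps.tolist())  # [10, 20, 2147483647, 40, 2147483647, 2147483647]
```

If your a already holds 0-based Python indices (as in the answers below, which use b = [0, 1, 2]), skip the subtraction.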

If it is a column, then you may need to subset `df1.column.isin([1,2,4])` — akrun, Jul 22 '19 at 18:32
When I run the code below, I expect to get the array [1, 3, 4, 0, 0, 0, 0].
Instead I get the following output:

a = [1, 3, 4, 6, 79, 10, 56]
b = [0, 1, 2]
for i in b:
    a[~i] = 0
a
# [1, 3, 4, 6, 0, 0, 0]

I want to know why the 6 is not set to 0 as well. — Moshee, Jul 22 '19 at 18:47
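The 6 survives because ~ on a Python int is bitwise NOT, not "not in": ~0 == -1, ~1 == -2, ~2 == -3. The loop therefore writes to a[-1], a[-2] and a[-3], which are the last three elements, and position 3 (the 6) is never touched. A small sketch showing that, plus one plain-list way to get the expected result (the names a_loop, keep and a_fixed are just for illustration):

```python
b = [0, 1, 2]

# For a Python int, ~i is bitwise NOT, which equals -i - 1
print([~i for i in b])  # [-1, -2, -3]

# So the loop writes to a[-1], a[-2], a[-3]: the LAST three elements
a_loop = [1, 3, 4, 6, 79, 10, 56]
for i in b:
    a_loop[~i] = 0
print(a_loop)  # [1, 3, 4, 6, 0, 0, 0]

# To zero every position NOT in b, test membership instead
keep = set(b)
a_fixed = [x if i in keep else 0 for i, x in enumerate([1, 3, 4, 6, 79, 10, 56])]
print(a_fixed)  # [1, 3, 4, 0, 0, 0, 0]
```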

Parfait · Accepted Answer · score 2 · 2019-07-22T21:01:19.113

The closest Python equivalent of R's atomic vector/matrix/array is NumPy's N-D array, and .Machine$integer.max translates to the maximum value of NumPy's int32 dtype, np.iinfo(np.int32).max, which equals R's value of 2147483647.

However, NumPy arrays have no direct counterpart to R's negative indexing (in NumPy a negative index counts from the end instead), so a longer route is necessary, as shown by @mglison, using a boolean mask:

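import numpy as np

a = np.array([1, 3, 4, 6, 79, 10, 56])
b = np.array([0, 1, 2])

# True at the positions listed in b, False everywhere else
mask = np.full(a.shape, False)
mask[b] = True
# Overwrite every position NOT listed in b (the equivalent of R's a[-b])
a[~mask] = np.iinfo(np.int32).max

a
# array([         1,          3,          4, 2147483647, 2147483647,
#        2147483647, 2147483647])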
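As a variation on the mask above (not what the answer itself uses), np.isin can build the same boolean mask in one call by testing each position against b. The sketch also shows why ~ behaves differently here than in the plain-list loop from the question: on a bool array, ~ is elementwise logical NOT.

```python
import numpy as np

a = np.array([1, 3, 4, 6, 79, 10, 56])
b = np.array([0, 1, 2])

# True where a position of a appears in b
mask = np.isin(np.arange(a.size), b)
print(mask.tolist())     # [True, True, True, False, False, False, False]

# On a bool array, ~ is elementwise logical NOT (unlike ~ on a Python int)
print((~mask).tolist())  # [False, False, False, True, True, True, True]

a[~mask] = np.iinfo(np.int32).max
print(a.tolist())  # [1, 3, 4, 2147483647, 2147483647, 2147483647, 2147483647]
```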

Alternatively, a pandas Series extends the NumPy 1-D array with an index, so the complement can be selected by testing the index with isin (this relies on the index labels lining up with the positions in b):

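import numpy as np
import pandas as pd

a = pd.Series([1, 3, 4, 6, 79, 10, 56])
b = pd.Series([0, 1, 2])

# isin flags the index labels 0, 1, 2; ~ selects every other label
a[~a.index.isin(b)] = np.iinfo(np.int32).max

a
# 0             1
# 1             3
# 2             4
# 3    2147483647
# 4    2147483647
# 5    2147483647
# 6    2147483647
# dtype: int64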
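If you prefer not to modify a in place, Series.where is one alternative (a swap-in, not part of the answer above): it keeps values where the condition is True and substitutes other everywhere else, returning a new Series. Like the answer, this sketch assumes the default 0..n-1 index so that labels and positions coincide; the name result is illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 3, 4, 6, 79, 10, 56])
b = [0, 1, 2]

# Assumes the default RangeIndex, so index labels equal positions.
# where() keeps values where the condition holds and uses `other` elsewhere.
result = a.where(a.index.isin(b), other=np.iinfo(np.int32).max)
print(result.tolist())  # [1, 3, 4, 2147483647, 2147483647, 2147483647, 2147483647]
print(a.tolist())       # a itself is unchanged: [1, 3, 4, 6, 79, 10, 56]
```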

edited Jul 22 '19 at 21:01

answered Jul 22 '19 at 19:00

Parfait

for the numpy version, why is `a[3] != np.iinfo(np.int32).max` after the masking? – dubbbdan Jul 22 '19 at 19:21
@Parfait Hi, I would like the 6 in a to be masked as well, just like in the Series example. Is there any way to achieve that without using pandas? Thanks – Moshee Jul 22 '19 at 19:37
Whoops! Read too fast. NumPy arrays do not have a `not in` list operation, so you will need to use a boolean mask. See edit. – Parfait Jul 22 '19 at 21:02
I like the 2013 reference :D! – dubbbdan Jul 22 '19 at 21:10

dubbbdan · Answer 2 · score 2 · 2019-07-22T20:04:24.183

Another option is to create a boolean mask using np.ones with dtype=bool, set mask[a] = False, and then simply assign to reps through the mask:

import numpy as np

reps = np.random.randint(0, 10, 20)  # 20 random integers in [0, 10)
a = np.array([1, 2, 4])

# True everywhere except the positions listed in a
mask = np.ones(reps.shape, dtype=bool)
mask[a] = False

reps[mask] = np.iinfo(np.int32).max

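Printing reps then gives the following (the values kept at positions 1, 2 and 4 are random, so yours will differ):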

array([2147483647,          4,          4, 2147483647,          0,
       2147483647, 2147483647, 2147483647, 2147483647, 2147483647,
       2147483647, 2147483647, 2147483647, 2147483647, 2147483647,
       2147483647, 2147483647, 2147483647, 2147483647, 2147483647])

Or, as a one-liner variation of @akrun's answer:

reps[list(set(range(len(reps))) - set(a))] = np.iinfo(np.int32).max
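For reading rather than assigning, the direct NumPy counterpart of R's reps[-a] is np.delete, which returns a new array without the listed positions. It cannot appear on the left of =, so assignment still goes through a mask as above. A small deterministic sketch with made-up data:

```python
import numpy as np

reps = np.array([7, 4, 4, 9, 0, 2])  # made-up data instead of random values
a = np.array([1, 2, 4])

# np.delete returns a NEW array without the listed positions,
# which matches what R's reps[-a] gives when read
dropped = np.delete(reps, a)
print(dropped.tolist())  # [7, 9, 2]

# Assignment still needs a mask (or an index array)
mask = np.ones(reps.shape, dtype=bool)
mask[a] = False
reps[mask] = np.iinfo(np.int32).max
print(reps.tolist())  # [2147483647, 4, 4, 2147483647, 0, 2147483647]
```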

edited Jul 22 '19 at 20:04

answered Jul 22 '19 at 19:16

dubbbdan

Hi, the one-liner gives the error `list indices must be integers or slices, not list`. – Moshee Jul 22 '19 at 20:28
Weird, I just re-ran it over here and it works fine. How about using [np.r_](https://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html): `reps[np.r_[list(set(range(len(reps))) - set(a))]] = np.iinfo(np.int32).max` – dubbbdan Jul 22 '19 at 21:04

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

One option is:

for i in set(range(len(a))) - set(b):
    a[i] = 1e5

a
#[1, 3, 4, 100000.0, 100000.0, 100000.0, 100000.0]

data

a = [1, 3, 4, 6, 79, 10, 56]
b = [0, 1, 2]
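The loop above writes the float 1e5, which is why the output shows 100000.0. A sketch of the same loop with R's actual .Machine$integer.max value (2147483647, i.e. 2**31 - 1) as an int, so the list stays all-integer; the name INT_MAX is just for illustration:

```python
INT_MAX = 2**31 - 1  # 2147483647, the value of R's .Machine$integer.max

a = [1, 3, 4, 6, 79, 10, 56]
b = [0, 1, 2]

# Same set-difference loop, but assigning an int instead of 1e5
for i in set(range(len(a))) - set(b):
    a[i] = INT_MAX
print(a)  # [1, 3, 4, 2147483647, 2147483647, 2147483647, 2147483647]
```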

edited Jun 20 '20 at 09:12

Community

answered Jul 22 '19 at 18:56

akrun
