Is there a NumPy function to convert str values to int values based on list index?

Question

Is there a way to accomplish the code below without the for loop?

I'm assigning int values to strings based on their index position in the icing_types list.

import numpy as np
icing_types = [
    "none",
    "l-mixed",
    "l-rime",
    "l-clear",
    "m-mixed",
    "m-rime",
    "m-clear",
]
idx = np.array(icing_types)
# validtime = basetime + index hr
forecast_icing = [
    "none",
    "none",
    "l-rime",
    "m-rime"
]
arr = np.array([np.where(ice == idx) for ice in forecast_icing]).flatten()

Update:

I put together some benchmarks for two of the answers. Both worked pretty well, but as @chris mentioned, @hpaulj's solution was a fair bit faster.

To put this into context, I am parsing a large text file into a DataFrame. The file has 186 rows and 144 columns, and most of the data is numeric, except for icing, turbulence, and precip. So I needed a method to convert the str values to ints in order to determine the delta in a forecast with respect to time.
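Once the strings are mapped to ints, the change between consecutive forecast hours can be taken with np.diff. A minimal sketch, assuming the position in icing_types is the value that should be differenced (treating list order as a scale is an assumption, not something the file format defines):

```python
import numpy as np

icing_types = ["none", "l-mixed", "l-rime", "l-clear", "m-mixed", "m-rime", "m-clear"]
forecast_icing = ["none", "none", "l-rime", "m-rime"]

# Broadcast-compare every forecast value against every icing type;
# column index of each True is the value's position in icing_types
codes = np.where(np.array(forecast_icing)[:, None] == np.array(icing_types))[1]

# Change from one forecast hour to the next (assumes index order is meaningful)
delta = np.diff(codes)
print(codes, delta)
```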

df = pd.DataFrame()
dfa = np.array(df)

for i, k in enumerate(df.index):
    if k == '###UA SECTION###':  # skip ua section header
        continue

    lvl, key = self._make_lvl_key(k)  # key

    try:
        val = dfa[i].astype(float)

A ValueError is raised when the array cannot be cast to a float dtype.

The strings used for None are inconsistent throughout the file, so everything that should be treated as None is normalized to "none".

_get_wxidx is a simple method that returns the correct index for the string data being processed, based on its key. Keys are generally turbulence, icing, and precip.

    except ValueError:  # value error occurs when array values are not numerical
        # array of strings -> lowercase, stripped, string values
        aos = np.char.strip(np.char.lower(dfa[i].astype(str)))
        aos[aos == ''] = 'none'
        aos[aos == 'null'] = 'none'

        idx = self._get_wxidx(key)
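The normalization step in the except block can be folded into one helper that handles any number of None spellings at once. A sketch, where the function name normalize and the NONE_STRINGS list are hypothetical; np.isin replaces the separate aos == '' / aos == 'null' assignments:

```python
import numpy as np

# Hypothetical list of spellings treated as "no value" (after lowercasing); extend as needed
NONE_STRINGS = ["", "none", "null"]

def normalize(row):
    """Lowercase and strip a row of values, collapsing None-like strings to 'none'."""
    aos = np.char.strip(np.char.lower(np.asarray(row).astype(str)))
    # np.where (rather than in-place assignment) lets the result dtype grow,
    # so 'none' is not truncated when every input string is shorter than 4 chars
    return np.where(np.isin(aos, NONE_STRINGS), "none", aos)

print(normalize([" None", "NULL", "", "l-rime ", None]))
```

Note that a Python None in the input becomes the string 'None' after astype(str), so it is caught by the same check.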

Benchmarks:

In both methods, any value of 10 or greater is assumed to be a None-type value, in which case it is set to a negative value (-1).

Where method:

        def _where():
            a1 = np.where(aos[:, None] == idx)[1]
            a2 = np.where(a1 < 10, a1, -1)
            return a2
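One caveat with the where method: np.where(aos[:, None] == idx)[1] only returns positions that matched, so a value missing from idx is silently dropped and the output no longer lines up with aos. A sketch that keeps exactly one output per input, using argmax over the comparison matrix and -1 for "no match" (where_lookup is a hypothetical name):

```python
import numpy as np

def where_lookup(aos, idx, missing=-1):
    """Return the position of each element of aos in idx, or `missing` if absent."""
    aos = np.asarray(aos)
    idx = np.asarray(idx)
    matches = aos[:, None] == idx           # shape (len(aos), len(idx))
    first = matches.argmax(axis=1)          # first True per row (0 if the row has none)
    return np.where(matches.any(axis=1), first, missing)

idx = np.array(["none", "l-mixed", "l-rime"])
print(where_lookup(np.array(["none", "bogus", "l-rime"]), idx))  # [ 0 -1  2]
```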

Vector method:

        vget = np.vectorize({b: a for a, b in enumerate(idx)}.get)

        def _vector():
            a1 = vget(aos)
            a2 = np.where(a1 < 10, a1, -1)
            return a2
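With the vector method, dict.get returns None for any string not in idx, and np.vectorize infers its output dtype from the first call, so an unknown value can either raise an error or produce an object array. A sketch that supplies a default and pins the output type (the -1 default is an assumption, chosen to match the negative value used above):

```python
import numpy as np

idx = np.array(["none", "l-mixed", "l-rime"])
lookup = {name: i for i, name in enumerate(idx)}

# Default of -1 for unknown strings; otypes fixes the result dtype to int
vget = np.vectorize(lambda s: lookup.get(s, -1), otypes=[int])

print(vget(np.array(["l-rime", "bogus", "none"])))  # [ 2 -1  0]
```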

Running benchmarks

The __main__ function parses two separate text files. Within the benchmark, each function is run 1,000 times per file (2,000 times per function and 4,000 runs total), converting 324,000 rows, each with 144 columns of string data, into int values.

        # benchmark range
        _r = 1000
        # _where benchmark
        t1S = datetime.now()
        for i in range(_r):
            _where()
        t1E = datetime.now()
        t1D = (t1E-t1S).total_seconds()
        t1time.append(t1D)

        # _vector benchmark
        t2S = datetime.now()
        for i in range(_r):
            _vector()
        t2E = datetime.now()
        t2D = (t2E-t2S).total_seconds()
        t2time.append(t2D)

Results:

where benchmark
rows processed: 81000
1.245809 seconds

vector benchmark
rows processed: 81000
2.235406 seconds

where benchmark
rows processed: 81000
1.229645 seconds

vector benchmark
rows processed: 81000
2.223574 seconds

Chris · Answer 1 · 2021-11-07T21:58:45.993

Borrowing from this answer, you can build a dict that maps each icing type to its index and vectorize its get method. It's worth noting that the other answer here is faster.

import numpy as np
icing_types=[
    "none",
    "l-mixed",
    "l-rime",
    "l-clear",
    "m-mixed",
    "m-rime",
    "m-clear",
]


forecast_icing=np.array([
    "none", 
    "none", 
    "l-rime",
    "m-rime"
])

np.vectorize({b: a for a, b in enumerate(icing_types)}.get)(forecast_icing)  # array([0, 0, 2, 5])

score 0 · Answer 2 · answered Nov 07 '21 at 21:50

I don't believe there is, but I am not 100% sure.

Anyway, even if there were, you can get much better performance by using a mapping dict (icing_type -> idx):

icing_type_to_idx = {icing_type: idx for idx, icing_type in enumerate(icing_types)}
arr = np.array([icing_type_to_idx[ice] for ice in forecast_icing])

score 0 · Accepted Answer · answered Nov 08 '21 at 00:29

In [103]: arr = np.array(forecast_icing)
In [104]: idx,arr
Out[104]: 
(array(['none', 'l-mixed', 'l-rime', 'l-clear', 'm-mixed', 'm-rime',
        'm-clear'], dtype='<U7'),
 array(['none', 'none', 'l-rime', 'm-rime'], dtype='<U6'))

We can test the 2 arrays against each other with:

In [105]: arr[:,None]==idx
Out[105]: 
array([[ True, False, False, False, False, False, False],
       [ True, False, False, False, False, False, False],
       [False, False,  True, False, False, False, False],
       [False, False, False, False, False,  True, False]])

The indices of the True values are:

In [106]: np.where(_)
Out[106]: (array([0, 1, 2, 3]), array([0, 0, 2, 5]))

The second array gives the position in idx of each element of arr:

In [107]: _[1]
Out[107]: array([0, 0, 2, 5])

If you can guarantee one and only one match for each element, I don't think you need to do anything fancier.
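If one and only one match can't be guaranteed, or the inputs get large, a different technique avoids building the full (len(arr), len(idx)) comparison matrix: sort the categories once and binary-search them with np.searchsorted. This is a swapped-in approach rather than the one in this answer; a sketch, with lookup_sorted as a hypothetical name:

```python
import numpy as np

def lookup_sorted(values, categories, missing=-1):
    """Index of each value in `categories` via binary search; `missing` if absent.

    Assumes `categories` is non-empty and has no duplicates.
    """
    values = np.asarray(values)
    categories = np.asarray(categories)
    # Common string dtype so neither side is truncated when compared
    dt = np.result_type(values, categories)
    values, categories = values.astype(dt), categories.astype(dt)

    order = np.argsort(categories)               # original positions, in sorted order
    sorted_cats = categories[order]
    pos = np.searchsorted(sorted_cats, values)   # insertion point of each value
    pos = np.clip(pos, 0, len(sorted_cats) - 1)  # keep indices in bounds
    found = sorted_cats[pos] == values           # is the insertion point a real match?
    return np.where(found, order[pos], missing)

icing_types = ["none", "l-mixed", "l-rime", "l-clear", "m-mixed", "m-rime", "m-clear"]
print(lookup_sorted(["none", "none", "l-rime", "m-rime", "snow"], icing_types))
```

The comparison-matrix approach does len(arr) * len(idx) string comparisons; this one does roughly len(arr) * log(len(idx)), which only starts to matter when the category list is long.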

Right, I'll give that a shot. Based on some of the responses, I've been rethinking my approach; it may just be easier/faster to use a dict. It's worth noting that I'm using this method on forecast_icing, forecast_turbulence, and precip_type. The None values produced by the model are consistently inconsistent, some being 'None', 'NONE', 'null', '', … etc. I'll likely just assign a negative value to the various None strings. - Jason Leaver, Nov 08 '21 at 03:00
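Since the data already lives in a pandas DataFrame, pandas.Categorical is another option: its .codes attribute gives each value's position in the categories list and -1 for anything not in that list, which lines up with assigning a negative value to the None strings. A sketch, assuming "none" is deliberately left out of the categories (so the remaining types shift down by one compared with icing_types) and that values have already been lowercased and stripped:

```python
import pandas as pd

# "none" omitted on purpose so every None-like string maps to -1
categories = ["l-mixed", "l-rime", "l-clear", "m-mixed", "m-rime", "m-clear"]
forecast = ["none", "NULL", "l-rime", "m-rime"]

codes = pd.Categorical(forecast, categories=categories).codes
print(codes)
```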