Is there a way to accomplish the code below without the for loop?
I’m assigning int values to strings based on their index position.
import numpy as np
icing_types = [
    "none",
    "l-mixed",
    "l-rime",
    "l-clear",
    "m-mixed",
    "m-rime",
    "m-clear",
]
idx = np.array(icing_types)
# validtime = basetime + index hr
forecast_icing = [
    "none",
    "none",
    "l-rime",
    "m-rime",
]
arr = np.array([np.where(ice == idx) for ice in forecast_icing]).flatten()
Update:
I put together some benchmarks for two of the answers. Both worked pretty well, but as @chris mentioned, @hpaulj's solution was a fair bit faster.
To put this into context, I am parsing a large text file into a DataFrame. The file has 186 rows and 144 columns, and most of the data is in a numeric format, except for icing, turbulence, and precip. So I needed a method to convert the str value to an int to determine the delta in a forecast with respect to time.
df = pd.DataFrame()
dfa = np.array(df)
for i, k in enumerate(df.index):
    if k == '###UA SECTION###':  # skip ua section header
        continue
    lvl, key = self._make_lvl_key(k)  # key
    try:
        val = dfa[i].astype(float)
A ValueError exception occurs when the array cannot be cast to a float dtype.
The None string values are inconsistent throughout the file, so everything that would be considered as None is set as "none".
_get_wxidx is a simple method that returns the correct index for the string data being processed, based on its key value. Keys are generally turbulence, icing, and precip.
    except ValueError:  # value error occurs when array values are not numerical
        # array of strings -> lowercase, stripped, string values
        aos = np.char.strip(np.char.lower(dfa[i].astype(str)))
        aos[aos == ''] = 'none'
        aos[aos == 'null'] = 'none'
        idx = self._get_wxidx(key)
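For anyone who wants to try the fallback logic outside of my class, here is a minimal standalone sketch of the same idea. The function name `normalize_row` and the sample rows are made up for illustration and are not part of my parser; the object dtype is only an assumption that mimics what `np.array(df)` typically gives for mixed data.

```python
import numpy as np

def normalize_row(row):
    """Return floats if the row is numeric, else cleaned lowercase strings."""
    try:
        return row.astype(float)
    except ValueError:
        # Not numeric: lowercase, strip whitespace, unify empty/null markers.
        aos = np.char.strip(np.char.lower(row.astype(str)))
        aos[aos == ""] = "none"
        aos[aos == "null"] = "none"
        return aos

# Hypothetical sample rows (object dtype, as an assumption about the parsed data).
numeric = normalize_row(np.array(["1", "2.5", "3"], dtype=object))
text = normalize_row(np.array([" None", "L-Rime ", "", "NULL"], dtype=object))

print(numeric)  # [1.  2.5 3. ]
print(text)     # ['none' 'l-rime' 'none' 'none']
```

One thing to watch for: assigning 'none' into a fixed-width string array truncates it if every string in the row is shorter than four characters.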
Benchmarks:
In both methods, any index value of 10 or greater is assumed to be a None-type value, in which case it is set to a negative value (-1).
Where method:
def _where():
    a1 = np.where(aos[:, None] == idx)[1]
    a2 = np.where(a1 < 10, a1, -1)
    return a2
Vector method:
vget = np.vectorize({b: a for a, b in enumerate(idx)}.get)

def _vector():
    a1 = vget(aos)
    a2 = np.where(a1 < 10, a1, -1)
    return a2
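To show that the two methods produce the same indices, here is a small self-contained sketch using the icing labels from the top of the question. The names `where_result`, `lookup`, and `vector_result` are only for this illustration, and it assumes every value in `aos` appears exactly once in `idx`.

```python
import numpy as np

# Label table and a short forecast row, taken from the icing example above.
idx = np.array(["none", "l-mixed", "l-rime", "l-clear",
                "m-mixed", "m-rime", "m-clear"])
aos = np.array(["none", "none", "l-rime", "m-rime"])

# Where method: broadcast each value against every label, then keep
# the column (label) index of each match.
where_result = np.where(aos[:, None] == idx)[1]

# Vector method: map each label to its position with a dict lookup.
lookup = {label: pos for pos, label in enumerate(idx)}
vget = np.vectorize(lookup.get)
vector_result = vget(aos)

print(where_result)   # [0 0 2 5]
print(vector_result)  # [0 0 2 5]
```

Note that with the where method a value that is missing from `idx` produces no match at all, so the result comes back shorter than the input rather than flagged.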
Running benchmarks
The __main__ function parses two separate text files. Within the benchmark, each function was set to run 1,000 times per call. That comes to 2,000 runs of each function and 4,000 in total, converting 324,000 rows, each with 144 columns of string data, into int values.
# benchmark range
_r = 1000
# _where benchmark
t1S = datetime.now()
for i in range(_r):
    _where()
t1E = datetime.now()
t1D = (t1E-t1S).total_seconds()
t1time.append(t1D)
# _vector benchmark
t2S = datetime.now()
for i in range(_r):
    _vector()
t2E = datetime.now()
t2D = (t2E-t2S).total_seconds()
t2time.append(t2D)
results:
where benchmark
rows processed: 81000
1.245809 seconds
vector benchmark
rows processed: 81000
2.235406 seconds
where benchmark
rows processed: 81000
1.229645 seconds
vector benchmark
rows processed: 81000
2.2235739999999997 seconds