Removing nan from a string array within a dictionary

Question

I have a problem removing nan from an array within a dictionary. Here's the data that I'm working with:

{'test1': array(['a','b','c','d','e','f',nan,'g '], dtype=object)}

I have tried this code:

test_dict = {key: values for key, values in sdg_keywords_dict.items() if not isnan(values)}

Unfortunately, it raises the error "only size-1 arrays can be converted to Python scalars".

Think about how to write the code to process a single array (see the linked duplicate), and **then** write the comprehension to process all the dict values. — Karl Knechtel, Jul 23 '22 at 17:54
Duplicates don't have to be exact. The `x == x` based approaches in the linked duplicate will work; to use `np.isnan`, it is only necessary to add a type check first. Alternately, we may decide that we simply only want to keep the strings. But the point is to show the general technique of masking and slicing the source array. If the question isn't a duplicate, then it is ill posed (needs more focus) because it conflates two tasks: iterating over the dict values, and processing a single value. — Karl Knechtel, Jul 23 '22 at 18:06
Like I just finished explaining: 1) it only requires adding logic to check the type first; 2) you can instead use something like `v[v==v]` to process a given dict value `v`, and there is at least one answer there showing that approach. — Karl Knechtel, Jul 23 '22 at 18:09

constantstranger · Accepted Answer · 2022-07-23T18:19:20.477

Here is a simple way to do what you're asking:

test_dict = {key: np.array([v for v in values if v is not np.nan]) for key, values in sdg_keywords_dict.items()}

Output:

{'test1': array(['a', 'b', 'c', 'd', 'e', 'f', 'g '], dtype=object)}
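
One caveat: `v is not np.nan` is an identity check, so it only filters out the `np.nan` object itself and would keep a NaN created some other way (for example `float('nan')`). Below is a minimal sketch of a variant that checks the type first and then calls `np.isnan`, as suggested in the comments. The helper name `is_nan` and the sample data are my own, for illustration only:

```python
import numpy as np

def is_nan(value):
    """Return True only for float NaN values; strings are never NaN."""
    # Check the type first so np.isnan is never called on a str.
    return isinstance(value, float) and bool(np.isnan(value))

# Hypothetical sample data: one np.nan and one independently created NaN.
sdg_keywords_dict = {
    'test1': np.array(['a', 'b', np.nan, 'c', float('nan'), 'g '], dtype=object)
}

# Rebuild each array without its NaN entries, keeping dtype=object.
test_dict = {
    key: np.array([v for v in values if not is_nan(v)], dtype=object)
    for key, values in sdg_keywords_dict.items()
}

print(test_dict)
```

If your data only ever contains `np.nan` itself, the identity check above is enough; the type check just makes the filter less dependent on how the NaN was produced.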

Note that if your values were numeric, you could use np.isnan() for vectorized boolean indexing like this:

sdg_keywords_dict = {'test1': np.array([1,2,3,4,5,6,np.nan,7])}
test_dict = {key: values[~np.isnan(values)] for key, values in sdg_keywords_dict.items()}

Input: {'test1': array([ 1., 2., 3., 4., 5., 6., nan, 7.])}

Output: {'test1': array([1., 2., 3., 4., 5., 6., 7.])}

However, because your array contains elements of type str (and therefore has dtype=object), np.isnan() will raise an exception:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
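
If you still want the mask-and-slice style for a mixed array, one possible workaround is to build the boolean mask yourself, one element at a time, and then index with it. This is only a sketch; the variable names `values`, `nan_mask` and `cleaned` are illustrative and the sample array is a shortened version of yours:

```python
import numpy as np

values = np.array(['a', 'b', np.nan, 'g '], dtype=object)

# np.isnan has no implementation for object arrays, so this raises TypeError.
try:
    np.isnan(values)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# Build the mask element by element, checking the type before calling np.isnan.
nan_mask = np.array(
    [isinstance(v, float) and bool(np.isnan(v)) for v in values],
    dtype=bool,
)

# Boolean indexing keeps the original dtype=object.
cleaned = values[~nan_mask]
print(cleaned)
```

The same expression can be dropped into the dict comprehension in place of `values[~np.isnan(values)]` once the mask is computed per value.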
