I have a column that contains single elements as well as array-like elements, although the array-like ones don't look like arrays to me:

[image: screenshot of the column]

I need to extract the first value from each array-like element. First, I check whether the element is an array. If so, I extract the first element; if not, I return the value unchanged. I use this function:

def get_top_element(x):
    if isinstance(x, np.ndarray):
        return x[0]
    return x

df_no_outliers['brand'] = df_no_outliers['brand'].apply(get_top_element)

The following error is raised:

IndexError: index 0 is out of bounds for axis 0 with size 0

How can I make it work?

EDIT:

If I convert the column to a tuple, I get this:

tpl=tuple(df_no_outliers['brand'])
tpl

[image: output of `tpl`]

mustafa00
  • "First, I check if it's array. If so I extract first elemnt, if not I return default value." What do you think should be the result, if it's an *empty* array? – Karl Knechtel May 08 '23 at 18:58
  • It's certainly not `np.ndarray` as they would print `array([...])`. They may be lists or strings that look like lists. what do you get when you print `df.tail(5).to_dict()`? – Quang Hoang May 08 '23 at 18:59
  • `df['Bluesky'].str[0]` – mozway May 08 '23 at 18:59
  • @mozway `.str[0]` certainly doesn't give the expected answer for the case `return x`. – Quang Hoang May 08 '23 at 19:02
  • @Quang `pd.Series(['a', ['b', 'c']]).str[0]` gives `pd.Series(['a', 'b'])`, no? Sure it doesn't check specifically for a numpy array, but this might not be needed. – mozway May 08 '23 at 19:03
  • what would it give for `pd.Series(['ab','abc'])`? I'd think OP expects the same series back. – Quang Hoang May 08 '23 at 19:05
  • Fair point, but in this case slicing wouldn't be needed. Feel free to reopen if you want. OP can always use `return x[0] if len(x)>0 else x` to fix the error. – mozway May 08 '23 at 19:07
  • I don't know what the element type is, whether it's a list, a string, or something else. It looks strange and I can't slice it. – mustafa00 May 08 '23 at 19:09
  • @mozway it doesn't solve the issue. It returns the first character only from each element – mustafa00 May 08 '23 at 19:15
  • I edited my question and added code and result of converting column to `tuple`. It shows that `[...]` is an array. At least I think so – mustafa00 May 08 '23 at 19:31
  • Have you tested `return x[0] if len(x)>0 else x`? – mozway May 08 '23 at 19:32
  • The error is because, somewhere you might have had an empty numpy array `np.array([])`. – Quang Hoang May 08 '23 at 19:32
  • @mozway I answered you before, it returns only the very first character from the element – mustafa00 May 08 '23 at 19:34
  • Then your "strings" are instances of `np.ndarray`? Please provide a reproducible input. – mozway May 08 '23 at 19:40
  • Given the code and sample data in the OP, this issue is not reproducible: [see code](https://i.stack.imgur.com/vVvHV.png) – Trenton McKinney May 08 '23 at 19:44
  • @mustafa00, `if isinstance(x, np.ndarray) and x.size>0 :` should solve your *problem*. – Timeless May 08 '23 at 20:47
  • @Timeless it works! Thanks a lot:) I wonder why `x.size` solves the problem? – mustafa00 May 08 '23 at 21:01
  • You're welcome ;). The error you were getting was triggered by indexing `[0]` into the first empty *NumPy* array (*i.e.* `np.array([])`) held by the column `"brand"`. By adding `x.size>0`, you bypass those empty arrays (*i.e.* you avoid indexing them), because [`size`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.size.html) returns the number of elements in a *NumPy* array. – Timeless May 08 '23 at 21:07
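
For reference, the fix suggested in the comments can be sketched roughly as below. The sample data is hypothetical (the real contents of `brand` were only shown as screenshots), so treat this as an illustration of the `x.size > 0` guard rather than a reproduction of the original DataFrame.

```python
import numpy as np
import pandas as pd


def get_top_element(x):
    """Return the first item of a non-empty NumPy array, otherwise x unchanged."""
    # x.size is the number of elements, so empty arrays never reach x[0].
    if isinstance(x, np.ndarray) and x.size > 0:
        return x[0]
    return x


# Hypothetical stand-in for the "brand" column: a plain string,
# a non-empty array, and an empty array (the case that raised IndexError).
df = pd.DataFrame({
    "brand": [
        "acme",
        np.array(["foo", "bar"]),
        np.array([]),
    ]
})

df["brand"] = df["brand"].apply(get_top_element)
```

With this guard, empty arrays pass through unchanged instead of raising. If a placeholder such as `None` would be more useful for those rows, the final `return x` is the place to decide that.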
