How to drop elements from a series by using a Pandas for loop index as the index parameter for the drop function?

Question

I am attempting to run a loop that filters certain elements based on a condition and removes those that match, as shown below:

for index, value in enumerate(some_dataset.iloc):
    if min(some_dataset.iloc[index]) >= some_dataset.iloc[0].values[index]:
        dataset_filtered = some_dataset.drop(index=index)

However, the value passed to the index parameter in the variable index does not seem to behave as an integer. Instead, I receive the following error for the first value the loop attempts to drop:

KeyError: '[1] not found in axis'

Thinking it was a Series element, I attempted to cast it as an integer by setting index = index.astype(int) in the parameters for the drop() function, but in this case, it does seem to behave as an integer, producing the following error message:

AttributeError: 'int' object has no attribute 'astype'

To solve this problem, I looked at Anton Protopopov's answer to this question asked by jjjayn, but it did not help in my situation as specific elements were referenced in place of an iterating index.

For context, the if statement is in place to filter out any samples whose lowest values are at the 0th index (that is, where the min() value of a sample transect is equal to the value at index 0). Essentially, it would tell me that values in the sample only grow larger for increasing x, which here is wavelength. When I print a table to see which samples this applies to, the results are what I expect (100 nm wavelengths are the 0th index):

Sample     Value (100 nm)  Value (minima)  Min (λ)
#2         0.0050          0.0050          100
#3         0.0060          0.0060          100
#14        0.0025          0.0025          100
...

So, with these results printed, I don't think the condition is the issue. Indeed, the first index that should be getting dropped is also one that I'd expect to be dropped -- sample 2, which corresponds to [1], is getting passed, but I think the brackets are being passed along with it (at least, that's my guess). So in sum, the issue is that a single-element list/series [n] is being passed to the index parameter instead of the integer, n, which is what I want.

ifly6 · Answer 1 · 2023-02-17T02:36:56.777


Answer fully rewritten due to new information. See diff for previous versions.

I reproduced your error. I had initially thought it was something related to an indexing error, which the following code would have forced to occur.

import pandas as pd

df = pd.DataFrame({'A': range(0, 3)})
for i, r in enumerate(df.iloc):
    # df.iloc[0].values has one entry per column, so i = 1 is already out of range
    print(i, type(r), df.iloc[0].values[i])

This code will throw an IndexError, complaining that the value in i is greater than the number of columns present. But the error that you report, a KeyError, only occurs if you try something like this:

>>> df.drop(index=15)
KeyError: '[15] not found in axis'

This happens because drop with the index keyword does not drop by position (the 0 ... n that iloc uses) but by label, as loc does; labels can be in arbitrary order, have gaps, and so on. The integer 15 shows up as [15] only because the line that raises the error wraps the missing labels in a list: raise KeyError(f'{ list(labels[mask]) } not found in axis').
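To make the label-versus-position distinction concrete, here is a minimal sketch. The frame and its index labels (10, 20, 30) are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical frame whose index labels do not match the row positions.
df = pd.DataFrame({"A": [5, 6, 7]}, index=[10, 20, 30])

# Position 1 exists, so positional access works.
second_row = df.iloc[1]

# There is no *label* 1, so drop(index=1) raises a KeyError.
try:
    df.drop(index=1)
    error_message = None
except KeyError as exc:
    error_message = str(exc)

# Translating the position into its label first lets the drop succeed.
dropped = df.drop(index=df.index[1])
```

Here df.index[1] looks up the label stored at position 1, which is what drop expects.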

Use only one form of indexer. In this instance, I would use for i, row in df.iterrows() rather than enumerating over the df.iloc property.
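A minimal sketch of that suggestion follows. It assumes each row is one sample and each column one wavelength, and that a sample should be dropped when its minimum sits in the first column; the data, the sample labels and the name labels_to_drop are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per sample, one column per wavelength (nm).
some_dataset = pd.DataFrame(
    {
        100: [0.0090, 0.0050, 0.0060],
        200: [0.0040, 0.0070, 0.0080],
        300: [0.0060, 0.0090, 0.0100],
    },
    index=["#1", "#2", "#3"],
)

# iterrows() yields (label, row) pairs, so the label can go straight to drop().
labels_to_drop = []
for label, row in some_dataset.iterrows():
    # Assumed condition: the row's minimum is its first (100 nm) value.
    if row.min() >= row.iloc[0]:
        labels_to_drop.append(label)

# Drop all matching labels in one call.
dataset_filtered = some_dataset.drop(index=labels_to_drop)
```

Collecting the labels and dropping once also sidesteps a second problem in the question's loop, where dataset_filtered is reassigned from the unfiltered some_dataset on every match, so only the last drop would survive.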

edited Feb 17 '23 at 02:36

answered Feb 17 '23 at 01:25

ifly6


Thank you for the response! `some_dataset.iloc[0].values[index]` isn't a mistake (at least, I don't think it is) -- I am trying to filter out any samples whose lowest values are at the 0th index. When I print a table to see which samples this applies to, the results are what I expect. So, I don't think the condition is the issue. Indeed, the first index that should be getting dropped is also one that I'd expect to be dropped -- but `[n]` is being passed to index instead of an integer, `n`. – ttoshiro Feb 17 '23 at 01:54

Can you edit your question to include what you're trying to do and a short example of it? – ifly6 Feb 17 '23 at 01:57
Absolutely! I've added it now. – ttoshiro Feb 17 '23 at 02:28
@ttoshiro Major changes to my answer. It relates again to indexing. Check `set(range(0, len(some_dataset))) == set(some_dataset.index.values)`. – ifly6 Feb 17 '23 at 02:38
