
In my dataset, I have a column like this:

hist = ['A','FAT',nan,'TAH']

Then I should use a loop to find the cells that contain an 'A'. Here is my code:

    import numpy as np
    import pandas as pd
    import math
    from numpy import nan

    for rowId in np.arange(dt.shape[0]):
        for hist in np.arange(10):
            if math.isnan(dt.iloc[rowId,hist])!=True:
                if 'A' in dt.iloc[rowId,hist]:
                    print("A found in: "+str(dt.iloc[rowId,hist]))

On the line `if 'A' in dt.iloc[rowId,hist]`, when the value of `dt.iloc[rowId,hist]` is NaN, it fails with: TypeError: argument of type 'float' is not iterable

So I decided to add `if math.isnan(dt.iloc[rowId,hist])!=True:` as a guard. But this also fails, now on the string values, with the error below:

TypeError: must be real number, not str

How can I find the values that contain 'A'?

Jeff
  • "Then i should use a loop " Why do you *need* to use a loop? One of the advantages of using dataframes is that you usually don't need to use loops. – DeepSpace Jun 30 '19 at 11:18
  • Using loops on a dataframe is definitely not something you want to do. Typical case of an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Ask about your problem, not why your "solution" is not working. – Erfan Jun 30 '19 at 11:24
  • [Here's](https://stackoverflow.com/a/55557758/9081267) a good explanation of why we don't want to iterate over our dataframe. – Erfan Jun 30 '19 at 11:26

1 Answer


Instead of iterating over the cells, you can call `.str.contains` on the column:

>>> df
     0
0    A
1  FAT
2  NaN
3  TAH
>>> df[0].str.contains('A')
0    True
1    True
2     NaN
3    True
Name: 0, dtype: object
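
For anyone who wants to reproduce this, here is a minimal sketch of how such a frame could be built. The construction from a plain list is an assumption, since the question only shows the column values; `dtype=object` is set explicitly so the column is plain object dtype, as in the output above.

```python
import numpy as np
import pandas as pd

# Assumed construction: one column (labelled 0) holding the question's values.
df = pd.DataFrame(['A', 'FAT', np.nan, 'TAH'], dtype=object)

# Element-wise substring test. The missing value does not raise;
# it simply stays missing in the result.
mask = df[0].str.contains('A')
print(mask)
```

The point is that no type check is needed up front: `.str.contains` skips over the NaN instead of raising the `TypeError` seen in the question.
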

You can then, for example, filter the rows or obtain the matching indices:

>>> df[df[0].str.contains('A') == True]
     0
0    A
1  FAT
3  TAH
>>> df.index[df[0].str.contains('A') == True]
Int64Index([0, 1, 3], dtype='int64')

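Using `.notna()` in place of `== True` gives the same result here, but only because every non-missing value in this column contains 'A'. In general it keeps every row that is not NaN, including rows where the result is False, so it is not an equivalent test: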

>>> df[df[0].str.contains('A').notna()]
     0
0    A
1  FAT
3  TAH
>>> df.index[df[0].str.contains('A').notna()]
Int64Index([0, 1, 3], dtype='int64')
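
To see how `.notna()` and `== True` differ, here is a small sketch with one extra value, `'BOB'`, that contains no 'A'. The Series `s` and that extra value are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's values plus 'BOB', which has no 'A'.
s = pd.Series(['A', 'FAT', np.nan, 'TAH', 'BOB'], dtype=object)
hits = s.str.contains('A')

# .notna() only drops the missing entry, so the non-matching 'BOB' survives.
kept_by_notna = s[hits.notna()].tolist()

# == True keeps only the rows where the substring test actually succeeded.
kept_by_eq = s[hits == True].tolist()

print(kept_by_notna)
print(kept_by_eq)
```

So `.notna()` answers "is there a value at all?", not "does the value contain 'A'?".
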

Or handle the missing values inside `.str.contains()` itself by passing `na=False`, as @Erfan suggests:

>>> df[df[0].str.contains('A', na=False)]
     0
0    A
1  FAT
3  TAH
>>> df.index[df[0].str.contains('A', na=False)]
Int64Index([0, 1, 3], dtype='int64')
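
The question loops over several columns of `dt`, and the same call can be applied per column. A sketch, assuming every column involved holds only strings or NaN; the frame below and its column names `h0` and `h1` are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `dt` (two string columns).
dt = pd.DataFrame({
    'h0': ['A', 'FAT', np.nan, 'TAH'],
    'h1': ['XYZ', np.nan, 'BAR', 'Q'],
})

# One boolean mask per column; na=False turns missing cells into False.
mask = dt.apply(lambda col: col.str.contains('A', na=False))

# Collect the matching cells, column by column.
found = [value for col in dt.columns for value in dt.loc[mask[col], col]]
print(found)
```

Columns of another type (numbers, dates) have no `.str` accessor, so they would need to be excluded or converted first.
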

So you can print the values with:

for val in df[df[0].str.contains('A') == True][0]:
    print('A found in {}'.format(val))

This gives us:

>>> for val in df[df[0].str.contains('A') == True][0]:
...     print('A found in {}'.format(val))
... 
A found in A
A found in FAT
A found in TAH
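
If an explicit cell-by-cell loop is still wanted, as in the question, a type check avoids both errors: NaN is a float, so `'A' in value` fails on it, while `math.isnan` fails on strings. A sketch under the same assumption of a hypothetical two-column `dt`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `dt`.
dt = pd.DataFrame({
    'h0': ['A', 'FAT', np.nan, 'TAH'],
    'h1': ['XYZ', np.nan, 'BAR', 'Q'],
})

found = []
for row_id in range(dt.shape[0]):
    for col_id in range(dt.shape[1]):
        value = dt.iloc[row_id, col_id]
        # Only strings can be searched; NaN (a float) is skipped here.
        if isinstance(value, str) and 'A' in value:
            found.append(value)
            print("A found in: " + value)
```

This is mainly useful for understanding the original errors; the vectorised `.str.contains` call remains the simpler option.
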
Willem Van Onsem