4

I have a dataframe with a structure similar to the one below:

import pandas as pd

df = pd.DataFrame({'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
                   'age': [22, 23, 24, 25],
                   'document': [111, 222, 333, 4444]})

How can I filter the dataframe to return only the rows where the value in the document column has exactly 3 digits?

Costa.Gustavo
    I usually don't care which answer the OP accepts; however, you should not use apply in this situation, since it will slow down the whole process: https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – BENY May 31 '19 at 21:35

4 Answers

5

log

df.query('2 <= log10(document) < 3')

      name  age  document
0     John   22       111
1    Elvis   23       222
2  Gerrard   24       333

Expanded example

df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'age': [22, 23, 24, 25],
    'document': [11, 222, 999, 1000]
})

df

      name  age  document
0     John   22        11  # 2 digit number
1    Elvis   23       222  # 3 digit number
2  Gerrard   24       999  # 3 digit number | edge case
3    Pitty   25      1000  # 4 digit number | edge case

Let's get only 3 digit numbers

df.query('2 <= log10(document) < 3')

      name  age  document
1    Elvis   23       222
2  Gerrard   24       999
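
For an integer column, the log10 bounds above amount to a plain range check, 100 <= document <= 999, which also sidesteps taking the logarithm of zero or a negative value. A minimal sketch, assuming the column holds positive integers (the name three_digits is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'age': [22, 23, 24, 25],
    'document': [11, 222, 999, 1000]
})

# Positive 3-digit integers are exactly those in [100, 999].
# Series.between is inclusive on both ends by default.
three_digits = df[df['document'].between(100, 999)]
print(three_digits)
```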
piRSquared
3

Try floor division (//) by 1000: for a number with at most 3 digits, the quotient is 0

df[df.document//1000==0]
Out[474]: 
      name  age  document
0     John   22       111
1    Elvis   23       222
2  Gerrard   24       333
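
Note that document // 1000 == 0 holds for any non-negative value below 1000, so shorter numbers such as 11 pass as well. A hedged sketch of a stricter variant, assuming non-negative integers: dividing by 100 instead leaves a quotient between 1 and 9 only for 3-digit values (the names loose and strict are illustrative).

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'age': [22, 23, 24, 25],
    'document': [11, 222, 999, 1000]
})

# Original check: true for every value below 1000, including 11.
loose = df[df['document'] // 1000 == 0]

# Stricter check: the quotient by 100 is 1..9 only for values in [100, 999].
strict = df[(df['document'] // 100).between(1, 9)]

print(loose)
print(strict)
```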

Alternatively, convert the values to str and check the length

df[df.document.astype(str).str.len().eq(3)]
Out[476]: 
      name  age  document
0     John   22       111
1    Elvis   23       222
2  Gerrard   24       333

Update: if the column holds floats, keep only the part before the decimal point before counting

df[df.document.astype(str).str.split('.').str[0].str.len().eq(3)]
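
Splitting on '.' matters when the column holds floats, because the string form of 111.0 is '111.0', which has length 5. A small sketch with made-up float values (they are not from the question) showing the difference:

```python
import pandas as pd

# Hypothetical float-valued column, for illustration only.
df_float = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'document': [111.0, 22.5, 4444.0, 333.25]
})

as_text = df_float['document'].astype(str)

# Naive length counts the decimal point and the fractional digits too.
naive_len = as_text.str.len()

# Keep only the part before the decimal point, then count its digits.
int_part_len = as_text.str.split('.').str[0].str.len()

result = df_float[int_part_len.eq(3)]
print(result)
```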
BENY
1
df[df.document.apply(lambda x: len(str(x)) == 3)]
TYZ
0

This will work for positive and negative numbers in the document column:

df[df['document'].abs().astype(str).str.len() == 3]
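
To see why abs() is needed: the string form of -111 is '-111', so its length is 4 and a plain length check would skip it. A short sketch with made-up values that include negatives (the data is illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data with negative document numbers, for illustration only.
df_signed = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'document': [-111, 222, -4444, 33]
})

# Without abs(), the minus sign is counted as a character.
without_abs = df_signed[df_signed['document'].astype(str).str.len() == 3]

# With abs(), only the digits are counted.
with_abs = df_signed[df_signed['document'].abs().astype(str).str.len() == 3]

print(without_abs)
print(with_abs)
```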
jeschwar