I have a dataframe containing mined comments (and responses) that look like this:

Comment_ID | COMMENT
1.0          foo
1.1          re:foo
1.2          re:foo
2.0          foo

The comment ID indicates whether a comment is top-level or a response. Even numbers are always top-level comments (in this example, 1.0 and 2.0). I now want to extract only the top-level comments from my dataframe. How can I do this?

One solution I came up with was this

df = df[df['Comment_ID'] == 1.0]

but that yields only one row. I need something like this, but that actually works:

df = df[df['Comment_ID'] == 1.0:300.0]
thafra
  • `df[df['Comment_ID'].between(1,300)]` – Quang Hoang Nov 26 '20 at 19:45
  • Do you mean even numbers or whole numbers? Because only 2 is an even number, but 1 and 2 are whole numbers. – flyingdutchman Nov 26 '20 at 19:52
  • And if you mean whole numbers, what counts as a whole number: only 1.0, or also 1.0000038, for example? Rounded to one decimal, both are the same. – Ruli Nov 26 '20 at 19:54
  • Yeah I got that wrong, I meant whole numbers. The numbers are always whole in the sense "1.0", "45.0", "1845.0". The answer provided by mCoding did the trick. – thafra Nov 27 '20 at 08:58

2 Answers

When you need to apply a custom operation to a column, you can always use .apply. For example:

import pandas as pd

if __name__ == '__main__':
    df = pd.DataFrame(data={'Comment_ID':[1.0, 1.1, 1.2, 2.0], 'COMMENT':['foo','re:foo','re:foo','foo']})
    print(df)
    # Keep rows whose ID, written as a string, has '0' as the part after the decimal point
    mask = df['Comment_ID'].apply(lambda x: str(x).partition('.')[2]) == '0'
    print(df[mask])

prints

   Comment_ID COMMENT
0         1.0     foo
1         1.1  re:foo
2         1.2  re:foo
3         2.0     foo
   Comment_ID COMMENT
0         1.0     foo
3         2.0     foo

I'm assuming here that all your data are of the form number.digit as in your example. If there can be multiple digits you would have to do something slightly different.
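
If the IDs might carry more than one decimal digit, one option that avoids string handling is to test each value with Python's built-in float.is_integer. This is a minimal sketch, assuming Comment_ID is stored as a float column; the sample IDs with two decimal digits are made up for illustration and do not come from the question:

```python
import pandas as pd

# Hypothetical sample data: reply IDs with two decimal digits (not from the question).
df = pd.DataFrame({
    'Comment_ID': [1.0, 1.01, 1.25, 2.0, 10.0],
    'COMMENT': ['foo', 're:foo', 're:foo', 'foo', 'foo'],
})

# float.is_integer() is True exactly when the value has no fractional part.
mask = df['Comment_ID'].apply(float.is_integer)
top_level = df[mask]
print(top_level)
```

This should keep only the rows with IDs 1.0, 2.0 and 10.0, regardless of how many decimal digits the replies use.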

mCoding

As an alternative to @mCoding's answer, you could check whether a value is a whole number by other means, e.g. as described in this SO question. One good example, adapted to your question, is:

df[df["Comment_ID"] % 1 == 0]
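
Applied to the sample data from the question, the modulo check can be sketched as follows. The np.isclose variant at the end is an addition that is not part of the original suggestion; it covers the hypothetical case where the IDs come out of arithmetic and may carry tiny floating-point errors:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Comment_ID': [1.0, 1.1, 1.2, 2.0],
    'COMMENT': ['foo', 're:foo', 're:foo', 'foo'],
})

# A float is whole when its remainder after division by 1 is zero.
top_level = df[df['Comment_ID'] % 1 == 0]
print(top_level)

# Assumption: tolerate rounding noise by comparing each ID to its nearest integer.
tolerant = df[np.isclose(df['Comment_ID'], df['Comment_ID'].round())]
print(tolerant)
```

Because the check is vectorized, it avoids calling a Python function per row, which tends to matter on large frames.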
flyingdutchman