
I am trying to create a DataFrame that is a subset of the original, filtered on specific values in one column, but the result keeps excluding some of the data, specifically the rows with codes 59960, 59961, and 59962.

I have also confirmed with .unique() that the column contains the values I am filtering for.

Here is my code:

new_df = original_df[
    (original_df["Course Offering Code"] == 19191) |
    (original_df["Course Offering Code"] == 2201.20215) |
    (original_df["Course Offering Code"] == 2387.2205) |
    (original_df["Course Offering Code"] == 2388.20225) |
    (original_df["Course Offering Code"] == 59960.20211) |
    (original_df["Course Offering Code"] == 59961.20211) |
    (original_df["Course Offering Code"] == 59962.20211) |
    (original_df["Course Offering Code"] == 61199.20211) |
    (original_df["Course Offering Code"] == 61201.20211) |
    (original_df["Course Offering Code"] == 61202.20211)
]

Thank you!

Kevin
  • What does `original_df["Course Offering Code"].dtype` return? – BeRT2me Sep 27 '22 at 19:20
  • @BeRT2me It returns `dtype('float64')`. Note: I shortened the values here for simplicity, so to be clearer I'll update my post. – Kevin Sep 27 '22 at 19:35

2 Answers


This happens because equality comparisons on float64 values are not reliable: a stored value can differ from the literal you type by a tiny rounding error, in which case `==` returns False. This is a property of floating-point numbers in general, not of pandas specifically.

You will have to either round the values or use an approximate ("close") comparison. That said, course offering codes are just identifiers and probably do not need to be float64, since a code only has to be unique. You could instead convert the Course Offering Code column to str and select on the strings, which avoids these problems.
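
As a rough sketch of the "close comparison" option, the snippet below uses `np.isclose` with an absolute tolerance. The sample frame is a hypothetical stand-in for `original_df`, with one value deliberately perturbed to mimic a rounding error, and the tolerance of `1e-6` is an assumption that only makes sense if the real codes never differ by less than that.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for original_df; the second value carries a tiny
# error (1e-9) so that an exact == comparison against 59960.20211 fails.
original_df = pd.DataFrame(
    {"Course Offering Code": [19191.0, 59960.20211 + 1e-9, 59961.20211, 12345.20211]}
)

# Codes to keep (a shortened, illustrative list).
codes = [19191, 59960.20211, 59961.20211]

col = original_df["Course Offering Code"].to_numpy()

# Compare every row against every wanted code within an absolute tolerance.
# col[:, None] has shape (n_rows, 1), so the result has shape (n_rows, n_codes);
# a row is kept if it is close to at least one code.
mask = np.isclose(col[:, None], codes, rtol=0, atol=1e-6).any(axis=1)

new_df = original_df[mask]
print(new_df)
```

Here the perturbed row is still selected, whereas an exact `==` test on it would not match. If the codes can be loaded as text in the first place, comparing strings sidesteps the tolerance question entirely.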

s510

Try it like this instead...

codes = [19191, 2201, 2387, 59960, 59961, 59962, 61199, 61201, 61202]
new_df = original_df[original_df['Course Offering Code'].isin(codes)]
BeRT2me
  • Thanks for the simpler code, but the same issue remains. I have updated the post with the full course offering code values in case these are somehow causing the issue. – Kevin Sep 27 '22 at 19:40
  • Ah, it's a floating point issue. See [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – BeRT2me Sep 27 '22 at 19:52