I came across this piece of code in a book:

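import hashlib

def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
    ids = data[id_column]
    # Boolean Series: True for rows whose id falls in the test set
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))
    # ~ inverts the mask, so the first frame is the training set
    return data.loc[~in_test_set], data.loc[in_test_set]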

I have never seen loc[~<..>] before. I think I understand what it does, but I want to be sure. Also, does it work only in pandas, or in Python in general?

Henry Ecker
zzHQzz

1 Answer

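I saw some great comments above, but wanted to make sure this is clear for a beginner. ~ is Python's bitwise NOT operator: on integers it flips every bit, turning 1s into 0s and 0s into 1s (so ~5 is -6). It is not specific to pandas, but pandas and NumPy overload it so that on a boolean Series or array it inverts each element, turning True into False and False into True. In your example, ~in_test_set is the element-wise equivalent of not in_test_set, so data.loc[~in_test_set] selects the rows that are not in the test set. The advantage of ~ is that it works on a whole Series of values at once, whereas not produces only a single True or False and raises an error on a multi-element Series. See the Python wiki on bitwise operators.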
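To make this concrete, here is a minimal sketch of ~ on a boolean mask. The DataFrame and the mask values are made up for illustration and do not come from the book; only pandas' standard DataFrame, Series and .loc are used.

```python
import pandas as pd

# Hypothetical toy data, not from the book.
data = pd.DataFrame({"id": [1, 2, 3, 4], "value": [10, 20, 30, 40]})

# Boolean mask: True marks the rows that belong to the test set.
in_test_set = pd.Series([False, True, False, True], index=data.index)

# ~ inverts the mask element by element (True <-> False).
not_in_test_set = ~in_test_set

train = data.loc[not_in_test_set]  # rows where in_test_set is False
test = data.loc[in_test_set]       # rows where in_test_set is True

# On a plain Python integer, ~ is bitwise inversion: ~x == -x - 1.
plain_int_result = ~5
```

Here train holds the rows with id 1 and 3, test holds the rows with id 2 and 4, and plain_int_result is -6, which shows that the operator itself is plain Python while the element-wise behaviour comes from pandas.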

Polkaguy6000