Randomly draw a sample for 2 columns

Question

A well-known function for this in Python is random.sample().

However, my dataset consists of multiple columns, and I need the 'lat' and 'lng' coordinates to be sampled together. Since the two are related, I cannot call random.sample() on each column separately; that would give me random lat coordinates paired with non-corresponding lng coordinates.

What would be the most elegant solution for this?

Perhaps I should first make a third column that combines lat and lng, then sample it, and then split it back apart?

If so, how should I do this? The fact that lat and lng are both floats of different lengths doesn't make it easier. Probably by adding a '-' in between?

Aren't you really asking for a random *row* (which contains a lat and a lng)? — Scott Hunter, May 27 '22 at 15:27
Something like that, except that I don't need a single one; I need to sample a list. — Cornelis, May 27 '22 at 15:35

ForceBru · Accepted Answer · 2022-05-27T15:31:55.037


Essentially, you're talking about sampling an entire row which has values [lat_i, lng_i]. This leads to a very simple (but perhaps too verbose) solution:

import random
# dataset: a 2-D NumPy array with one (lat, lng) row per point
random_row_index = random.randint(0, len(dataset) - 1)
random_row = dataset[random_row_index, :]
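To draw several rows at once (as asked in the comments) instead of a single one, the same idea extends to sampling row indices without replacement. This is a minimal sketch, assuming `dataset` is a 2-D NumPy array whose two columns are lat and lng; the helper name `sample_rows` and the example coordinates are hypothetical.

```python
import random

import numpy as np


def sample_rows(dataset: np.ndarray, k: int) -> np.ndarray:
    """Return k distinct rows of a 2-D array, keeping each row's values together."""
    # random.sample picks k distinct row indices, so no row is drawn twice
    indices = random.sample(range(dataset.shape[0]), k)
    # Indexing with a list of row indices returns those whole rows
    return dataset[indices, :]


# Hypothetical example data: one (lat, lng) pair per row
coords = np.array([
    [52.37, 4.89],
    [48.86, 2.35],
    [51.51, -0.13],
    [40.71, -74.01],
])

print(sample_rows(coords, 2))
```

Because whole rows are selected, each lat in the result is always next to its own lng.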

If you have a Pandas dataframe, simply use DataFrame.sample.
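For the Pandas case, a short sketch of `DataFrame.sample` on just the coordinate columns might look like this. The DataFrame and its values are hypothetical; `n`, `random_state` and `replace` are real `DataFrame.sample` parameters.

```python
import pandas as pd

# Hypothetical example frame with an extra column besides lat/lng
df = pd.DataFrame({
    "name": ["Amsterdam", "Paris", "London", "New York"],
    "lat": [52.37, 48.86, 51.51, 40.71],
    "lng": [4.89, 2.35, -0.13, -74.01],
})

# Sample whole rows, so each lat stays paired with its own lng.
# n is the number of rows; random_state makes the draw reproducible.
sample = df[["lat", "lng"]].sample(n=2, random_state=42)
print(sample)
```

Passing `frac=0.1` instead of `n` samples 10% of the rows, and the default `replace=False` ensures no row is drawn twice. The original index is preserved, so sampled rows can be traced back to `df`.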

edited May 27 '22 at 15:31

answered May 27 '22 at 15:28

ForceBru


I do use Pandas, and df.sample seems to work perfectly! Thanks! What is the difference from random.sample(), though? Isn't df.sample() random as well? – Cornelis May 27 '22 at 15:42

@Cornelis, `df.sample` is Pandas-specific. You can sample your dataframe in fewer than 20 characters of code, while with `random.sample` you'd have to fiddle with random indices, like I show in my answer, which can look unnecessarily complicated. But indeed, `random.sample`, `numpy.random.choice` and `df.sample` essentially accomplish the same task, so they're similar. – ForceBru May 27 '22 at 15:51
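The question's merge-then-unmerge idea also works with the plain standard library if the combined "third column" holds tuples instead of '-'-joined strings, which sidesteps the problem of floats with different lengths. This is a minimal sketch, assuming the coordinates are available as two equal-length lists; the names `lats` and `lngs` and their values are hypothetical.

```python
import random

# Hypothetical parallel lists, as they might come out of two columns
lats = [52.37, 48.86, 51.51, 40.71]
lngs = [4.89, 2.35, -0.13, -74.01]

# "Merge": pair each lat with its lng as a tuple; no string formatting needed
pairs = list(zip(lats, lngs))

# random.sample draws k distinct tuples, so pairs are never split up
sampled = random.sample(pairs, k=2)

# "Unmerge" back into two separate sequences
sampled_lats, sampled_lngs = zip(*sampled)
print(sampled_lats, sampled_lngs)
```

For NumPy arrays, `np.random.default_rng().choice(len(arr), size=k, replace=False)` produces the equivalent set of distinct row indices.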

score 0 · Answer 2 · answered May 27 '22 at 15:32


That is what scikit-learn's train_test_split is made for: it shuffles and splits arrays row-wise, so corresponding values stay together. https://realpython.com/train-test-split-python-data/

from sklearn.model_selection import train_test_split
# x: features (e.g. the lat/lng columns), y: labels; rows are split together
x_train, x_test, y_train, y_test = train_test_split(x, y)
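For this question there is no label array, so one possible adaptation is to pass only the coordinate columns. This is a sketch under that assumption, with a hypothetical DataFrame. An integer `test_size` is interpreted as a row count, and `random_state` fixes the shuffle.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical example frame
df = pd.DataFrame({
    "lat": [52.37, 48.86, 51.51, 40.71],
    "lng": [4.89, 2.35, -0.13, -74.01],
})

# With a single input, train_test_split returns two pieces: the remaining rows
# and the "test" rows. Rows are kept intact, so lat/lng pairs stay together.
rest, sample = train_test_split(df[["lat", "lng"]], test_size=2, random_state=0)
print(sample)
```

This also hands back the rows that were not sampled, which can be useful if both parts are needed. If only the sample matters, `DataFrame.sample` does the same job with less machinery.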

answered May 27 '22 at 15:32

Nohman


Not sure what that does, but it looks interesting... I'll take a look to see how it works. Thanks – Cornelis May 27 '22 at 15:36
