Repeated random sampling and kurtosis on unbalanced sample

Question

I have an unbalanced dataset of people from liberal and conservative backgrounds rating an issue on a 1-7 scale. I would like to see how polarized the issue is.

The sample is heavily skewed towards liberals (70% of the sample). How can I repeatedly draw a balanced (50-50) sample in R and calculate kurtosis on each draw?

For example, I have a total of 50 conservatives. How do I randomly sample 50 liberals out of 150 repeatedly?

A sample dataframe below:

  political_ort   rating  
    liberal         1 
    liberal         6 
    conservative    5   
    conservative    3   
    liberal         7  
    liberal         3 
    liberal         1

Does this answer your question? [Sampling from a data.frame while controlling for a proportion \[stratified sampling\]](https://stackoverflow.com/questions/29360799/sampling-from-a-data-frame-while-controlling-for-a-proportion-stratified-sampli) — jared_mamrot, Jan 28 '21 at 23:46
Thanks! Not really. I'm looking to sample the same number of liberals as conservatives. So if there are 10 conservatives, I would like to sample 10 from 70 liberals repeatedly. — Yvonne, Jan 29 '21 at 01:45

jared_mamrot · Accepted Answer · 2021-01-29T02:35:10.827

What you're describing is termed 'undersampling'. Here is one method using tidyverse functions:

# Load library
library(tidyverse)

# Create some 'test' (fake) data
# (tibble() replaces the deprecated data_frame(); the column is named 'rating' as in the question)
sample_df <- tibble(id_number = 1:100,
                    political_ort = c(rep("liberal", 70),
                                      rep("conservative", 30)),
                    rating = sample(1:7, size = 100, replace = TRUE))

# Take the fake data
undersampled_df <- sample_df %>% 
# Group the data by category (liberal / conservative) to treat them separately
  group_by(political_ort) %>% 
# And randomly sample 30 rows from each category (liberal / conservative)
  sample_n(size = 30, replace = FALSE) %>%
# Because there are only 30 conservatives in total they are all included
# Finally, ungroup the data so it goes back to a 'vanilla' dataframe/tibble
  ungroup()
# You can see the id_numbers aren't in order anymore indicating the sampling was random

There is also the ROSE package that has a function ("ovun.sample") that can do this for you: https://www.rdocumentation.org/packages/ROSE/versions/0.0-3/topics/ovun.sample
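The code above draws a single balanced sample. To repeat the draw and compute kurtosis each time, as the question asks, something like the following base R sketch could work. It is only an illustration under a few assumptions: the data are fake, the helper names `kurtosis_moment` and `balanced_kurtosis` are made up for this example, kurtosis is taken to be the Pearson (non-excess) moment ratio m4 / m2^2, and 1000 repetitions is an arbitrary choice.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible

# Fake data shaped like the question's example: 150 liberals, 50 conservatives
sample_df <- data.frame(
  political_ort = c(rep("liberal", 150), rep("conservative", 50)),
  rating = sample(1:7, size = 200, replace = TRUE)
)

# Hypothetical helper: Pearson (non-excess) kurtosis, m4 / m2^2
kurtosis_moment <- function(x) {
  centered <- x - mean(x)
  mean(centered^4) / mean(centered^2)^2
}

# Hypothetical helper: draw one balanced sample and return its kurtosis
balanced_kurtosis <- function(df) {
  # Size of the smaller group (here, the 50 conservatives)
  n_min <- min(table(df$political_ort))
  # Split row indices by group, then sample n_min rows from each group
  idx_by_group <- split(seq_len(nrow(df)), df$political_ort)
  keep <- unlist(lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), n_min)]
  }))
  kurtosis_moment(df$rating[keep])
}

# Repeat the balanced draw many times (1000 is an assumed, arbitrary number)
n_reps <- 1000
kurt_values <- replicate(n_reps, balanced_kurtosis(sample_df))

# Summarise the distribution of kurtosis across the repeated samples
mean(kurt_values)
quantile(kurt_values, c(0.025, 0.975))
```

Because the smaller group is sampled at its full size without replacement, every conservative appears in each draw and only the liberal subset changes. Averaging over draws reduces the dependence on which liberals happened to be picked. If kurtosis is being used as a polarization measure, lower values would point towards a flatter or more bimodal rating distribution, though that interpretation depends on the data.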
