Questions tagged [rsample]

16 questions
3
votes
2 answers

Bootstrap resampling and tidy regression models with grouped/nested data

I am trying to estimate regression slopes and their confidence intervals using bootstrapping. I would like to do it for grouped data. I was following the example at this website (https://www.tidymodels.org/learn/statistics/bootstrap/), but I…
D Kincaid
  • 167
  • 1
  • 13
2
votes
1 answer

How to speed up the tidymodels bootstrapping with parallelization

I have the following code that performs bootstrapping and calculates the confidence interval. library(resample) library(broom) library(dplyr) library(purrr) library(tibble) lm_est <- function(split, ...) { lm(mpg ~ disp + hp, data =…
littleworth
  • 4,781
  • 6
  • 42
  • 76
1
vote
1 answer

Using Yardstick to calculate RMSE for aggregate of predictions per group

Sometimes I don't want to assess my models on their performance on predicting single observations, but rather I want to assess how a model performs for predictions in aggregate for groups. The group resampling tools in rsample, like group_vfold_cv,…
1
vote
1 answer

calculating bootstrap resampling for grouped variables

I have the following dataset to calculate standardized effect size for Soil_N and Soil_P for which I used the code below for each replicate. df <- tibble( Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2), Diversity =…
Amit
  • 37
  • 5
1
vote
1 answer

Does rsample::bootstraps store data rather than just row indices?

I'm trying to understand why the rsample::bootstraps function apparently stores the entire data set for each bootstrap sample. I was expecting the function would just store the dataset once, along with the bootstrap indices for each resample. In the…
Robert McDonald
  • 1,250
  • 1
  • 12
  • 20
1
vote
1 answer

Unnesting deep lists after applying the rolling_origin function from the rsample package

I have some data which looks like:
head:
  dfID date       group groupValues
1 df1  2020-03-01 grp1  0.175
2 df1  2020-03-01 grp2  0.150
3 df1  2020-03-01 grp3  0.0509
tail:
  dfID date …
user113156
  • 6,761
  • 5
  • 35
  • 81
0
votes
0 answers

Cannot Extract Information from glm model using tidy function from rsample package

I have been following the logistic regression chapter in Hands on Programming with R. At first all the code worked fine, but then I restarted my R session, and now when I run tidy(model1) it throws this error message. `Error in…
0
votes
1 answer

Compute Gini Index on a nested/rsplit object

I used the rsample::bootstraps function to create a nested object as follows: Sampled_Data=bootstraps(credit_data,times = 2,strata="Home",apparent = TRUE) What I get is as follows: splits id
WalliYo_
  • 173
  • 7
0
votes
1 answer

Match Each Winner with a Unique Prize

In a contest, each winner and prize is assigned a random integer [1, 9] called a "ticket" number and a unique "ID" number [1111, 9999]. Each winner receives a unique prize from a limited stock of prizes based on the winner's ticket number…
Tavaro Evanis
  • 180
  • 1
  • 11
0
votes
1 answer

Python: how to convert monthly employment data into annual (CSV, pandas)

I've been stuck on this problem for two days. Below is the csv file. df = pd.read_csv('/14100017.csv') df = pd.DataFrame(data) df.head() df_year = df.groupby('REF_DATE')['REF_DATE'].count() print(df_year) This is my code. Could you please tell me…
0
votes
1 answer

Select a proportionate stratified random sample, where stratification is based on sites and gender

I have three IDBs and this is the number of people registered from each:
Site   female  Male  Total
IDB_A  46      14    60
IDB_B  17      23    40
IDB_C  79      21    100
Total  142     58    200
And this is the sample I want to…
Saed Jama
  • 1
  • 1
0
votes
1 answer

How does gtsummary produce confidence intervals and standard error statistics for glm models? (Code Examples Included)

Want to preface this with heaps of appreciation for gtsummary -- wonderful package. After using tidymodels, GLM, and gtsummary for a while, I've been trying to understand gtsummary's computations for GLM model performance and confidence intervals. Can…
Triage
  • 21
  • 1
  • 3
0
votes
0 answers

set.seed() doesn't create identical outputs across different .rmd files?

I have two .Rmd files that reference the same dataset, but when I use set.seed(), I get different outputs: library(tidymodels) # load data and setup data("ames") ames_mod <- ames %>% select(First_Flr_SF, Sale_Price) %>% …
Mark Rieke
  • 306
  • 3
  • 13
0
votes
1 answer

How to proportionally split data using initial_split in R

I would like to proportionally split the data I have. For example, I have 100 rows and I want to randomly sample 1 row out of every two rows. Using tidymodels rsample, I assumed I would do the below. dat <- as_tibble(seq(1:100)) split <- initial_split(dat,…
S_Gill
  • 27
  • 3
0
votes
2 answers

Block Bootstrapping using Tidymodels

I have a monthly (Jan - Dec) data set for weather and crop yield. This data is collected for multiple years (2002 - 2019). My aim is to obtain a bootstrapped slope coefficient for the effect of temperature in each month on yield gap. In bootstrapping,…