I want to understand what is wrong with my syntax when I sample from my dataset. The dataset has 5000 rows, and I only want a random sample of 500 of them.
Reprex of the dataset (xdata):
AccountId Street City State ZipCode CloseFactorPct OpenFactorPct ZipIncome ZipDegree
1 455697 3919 Birkdale Ln Se Olympia WA 98501 0.75 1.40 67060 0.17879866
2 490095 29174 Wagon Rd Agoura Hills CA 91301 0.85 2.50 115125 0.21376952
3 427399 301a Franklin Ave Princeton NJ 8540 0.80 2.25 124954 0.50428200
4 470678 1461 Woodsview Way Macedon NY 14502 0.80 2.50 67780 0.13772373
5 424824 616 Locust Ave Las Animas CO 81054 0.80 2.25 31343 0.02021198
6 437343 13 New Oxford Rd Conway AR 72034 0.80 2.25 51435 0.15904222
TotalOwed
1 0.0
2 185.1
3 1645.0
4 0.0
5 0.0
6 0.0
My code:
sample2 <- xdata[sample(nrow(xdata), "500", replace=T), sample(ncol(xdata), 10, replace=T)]
head(sample2)
ZipIncome City ZipIncome.1 TotalOwed Street OpenFactorPct ZipHhIncome.2
14470 41866 Columbus 41866 841.31 792 Dennison Avenue 0.85 41866
23502 55221 El Paso 55221 0.00 12949 Eastbrook Drive Apt 53 0.70 55221
7370 93373 Saddle Brook 93373 570.38 229 S Boulevard 0.70 93373
31627 61830 Choudrant 61830 1156.28 153 Jones Street 0.70 61830
29840 39697 Beckley 39697 0.00 2109 S Kanawha St 0.75 39697
14938 91313 Bradenton 91313 0.00 5007 Serata Dr 0.85 91313
ZipIncome.3 ClosedFactorPct ZipIncome.4
14470 41866 0.95 41866
23502 55221 0.80 55221
7370 93373 1.20 93373
31627 61830 0.80 61830
29840 39697 0.80 39697
14938 91313 1.30 91313
The output I receive contains 4 duplicates of the ZipIncome column. Why does this happen? Can someone help me understand whether my syntax for pulling a random sample is incorrect, or whether I need to call set.seed()?
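To narrow it down, here is a minimal sketch of what seems to be going on. It assumes the duplicates come from the second sample() call, which draws column positions with replace=T, so the same column can be picked more than once and R then makes the repeated names unique by appending .1, .2 and so on. The data frame toy and its values are made up for illustration and only stand in for xdata.

```r
# Toy stand-in for xdata (hypothetical values; only the shape matters).
set.seed(42)  # makes the draw reproducible; it does not prevent duplicates
toy <- data.frame(
  AccountId = 1:5000,
  ZipIncome = round(runif(5000, 30000, 125000)),
  TotalOwed = round(runif(5000, 0, 2000), 2)
)

# Selecting the same column position more than once, which is what
# sample(ncol(xdata), 10, replace = TRUE) can do, repeats that column.
# R de-duplicates the names by appending .1, .2, ...
dup_cols <- toy[1:3, c(2, 2, 3, 2)]
print(names(dup_cols))

# Sample 500 distinct rows and keep every column exactly once:
# numeric size, and replace = FALSE (the default) so no row repeats.
rows <- sample(nrow(toy), 500)
sample2 <- toy[rows, ]
print(dim(sample2))
```

If that assumption holds, set.seed() would not be the cause: it only fixes which random rows are drawn so the result can be reproduced. Leaving the column index empty (or sampling columns without replacement) should keep each column once, and passing the size as the number 500 rather than the string "500" avoids relying on implicit conversion.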