
Hello everyone. I hope you are all happy, and I need your help for my own happiness.

I asked a similar question less than a day ago, but I am stuck on a similar error. Here is what I need to do: data_01 is a data frame with 2277 rows and 37 columns. My plan is to split data_01 into several data frames (and drop any data frame with fewer than 100 rows).

data_01_00<-data_01 #family 2277
data_01_01<-data_01_00 %>% filter(rowSums(data_01_00[,1:39])==1 & data_01_00[,1]==1)
data_01_02<-data_01_00 %>% filter(rowSums(data_01_00[,1:2])==2 & data_01_00[,2]==1)
data_01_03<-data_01_00 %>% filter(rowSums(data_01_00[,1:3])==2 & data_01_00[,3]==1)
data_01_05<-data_01_00 %>% filter(rowSums(data_01_00[,1:5])==2 & data_01_00[,5]==1)
data_01_06<-data_01_00 %>% filter(rowSums(data_01_00[,1:6])==2 & data_01_00[,6]==1)
data_01_08<-data_01_00 %>% filter(rowSums(data_01_00[,1:8])==2 & data_01_00[,8]==1)

Based on this pattern, I tried the following. I did not use a for loop because data_01_04 and data_01_07 are removed, so I decided to write a user-defined function.

family<- vector(mode = "list", length = 40)
family[1]<-list(data_01_00)
family[2]<-list(data_01_01)

testfunc<-function(i){
  family[i]<-data_01_00 %>% filter(paste0('rowSums(data_01_00[,1:',i,'])==2 & data_01_00[,',i,']==1'))
}

It failed. If nothing had gone wrong, I would have written the calls like this:

testfunc(3)
...
testfunc(8)

(Actually, the code should be divided into 39 parts.)

What should I do?

신유철
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. `paste()` will only return a character string. You should not attempt to "build" code that way since it will not be evaluated. – MrFlick Feb 23 '22 at 06:19
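
To illustrate the comment above, here is a minimal base R sketch. The small matrix `m` and the function names `bad` and `good` are made up for the illustration. It shows that `paste0()` yields a string rather than a logical condition, and that assigning to `family` inside a function only changes a local copy.

```r
# Toy 0/1 matrix standing in for data_01_00 (hypothetical values)
m <- matrix(c(1, 1, 0,
              0, 1, 1,
              1, 0, 1), nrow = 3, byrow = TRUE)

i <- 2
cond_string <- paste0("rowSums(m[, 1:", i, "]) == 2 & m[, ", i, "] == 1")
is.character(cond_string)   # TRUE: just text, never evaluated

# Writing the condition directly with the index i gives a logical vector
cond <- rowSums(m[, 1:i, drop = FALSE]) == 2 & m[, i] == 1
cond                        # TRUE FALSE FALSE

# Assignment inside a function modifies a local copy of `family`
family <- vector(mode = "list", length = 3)
bad <- function(i) {
  family[[i]] <- m[cond, , drop = FALSE]
}
bad(2)
is.null(family[[2]])        # TRUE: the global list is unchanged

# Returning the value and assigning outside the function works
good <- function(i) {
  m[rowSums(m[, 1:i, drop = FALSE]) == 2 & m[, i] == 1, , drop = FALSE]
}
family[[2]] <- good(2)
nrow(family[[2]])           # 1
```

So the index can go straight into the subsetting expression, and the function should return its result for the caller to store.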

1 Answer

If you are just filtering by column numbers, I don’t think there’s any need to complicate matters with paste0. You can use base R for this, and it will probably be faster than dplyr anyway. As a note, it’s always much easier to help if you provide sample data, since solutions can then be matched to your data. Below, I simulate some data for this problem. I've changed i to j, since j traditionally refers to columns and i to rows.

set.seed(1)
data_01_00 <- matrix(sample(0:1, size = 99, prob = c(0.90, 0.1), replace = TRUE), 
                     nrow = 2277, ncol = 39, byrow = TRUE)


testfunc <- function(j) {
  # j == 1 checks the whole row; any other j checks only columns 1..j
  cols <- if (j == 1) 1:39 else 1:j
  keep <- rowSums(data_01_00[, cols, drop = FALSE]) == 1 & data_01_00[, j] == 1
  # drop = FALSE keeps a one-row result as a matrix, so nrow() still works
  data_01_00[keep, , drop = FALSE]
}
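
As a quick check of what this filter selects, here is a small sketch on a toy matrix. It uses a hypothetical variant, `filter_family()`, that takes the data as an argument instead of reading the global `data_01_00`. The toy values are invented for the illustration.

```r
# Hypothetical variant of testfunc(): the data is passed in explicitly
filter_family <- function(dat, j) {
  # j == 1 checks the whole row; otherwise only columns 1..j
  cols <- if (j == 1) seq_len(ncol(dat)) else seq_len(j)
  keep <- rowSums(dat[, cols, drop = FALSE]) == 1 & dat[, j] == 1
  dat[keep, , drop = FALSE]
}

toy <- matrix(c(1, 0, 0,
                0, 1, 0,
                0, 1, 1,
                1, 1, 0), nrow = 4, byrow = TRUE)

filter_family(toy, 1)  # row 1 only
filter_family(toy, 2)  # rows 2 and 3
```

Passing the data in makes the function easy to test on small inputs like this before running it on the full data.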

It’s probably easiest to use lapply and collect all your data.frames into a list. I’ve removed 4 & 7 as you’ve done.

x <- 1:39
x <- x[!x %in% c(4, 7)]
mydat <- lapply(x, testfunc)

length(mydat)
#> [1] 37

That way you can easily filter out those with fewer than 100 rows if you want.

mydat <- lapply(1:length(mydat), function(x) if(nrow(mydat[[x]]) >= 100) mydat[[x]])
mydat <- mydat[lengths(mydat) > 0]

length(mydat)
#> [1] 6
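
One thing the filtering step loses is which column each remaining element came from. Here is a sketch of one way to keep track, using a toy list in place of mydat (the sizes and names are made up): name the list first, then filter with base R's `Filter()`.

```r
# Toy stand-in for mydat: matrices with different numbers of rows
sizes <- c(150L, 20L, 300L)
cols  <- c(1L, 2L, 3L)
toy_list <- lapply(sizes, function(n) matrix(0, nrow = n, ncol = 2))

# Label each element with the column it was built from, e.g. "data_01_03"
names(toy_list) <- sprintf("data_01_%02d", cols)

# Keep only elements with at least 100 rows; names survive the filter
kept <- Filter(function(d) nrow(d) >= 100, toy_list)
names(kept)
#> [1] "data_01_01" "data_01_03"
```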

However, you can produce the data.frames separately if that’s what you need.
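
If separate objects really are needed, one option is base R's `list2env()`, which turns a named list into individual variables. This is only a sketch with made-up data and names. It writes into a fresh environment so that nothing in the workspace is overwritten.

```r
# Named toy list standing in for the filtered results (hypothetical values)
toy_list <- list(data_01_01 = matrix(1, nrow = 2, ncol = 2),
                 data_01_03 = matrix(0, nrow = 3, ncol = 2))

# Use envir = globalenv() to create the objects in the workspace instead
env <- new.env()
invisible(list2env(toy_list, envir = env))

ls(env)
#> [1] "data_01_01" "data_01_03"
nrow(env$data_01_03)
#> [1] 3
```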

All this said, you could have used a loop in the same way I've used this function. 4 & 7 not being in the sequence isn't a limitation for the loop. You would just remove 4 & 7 from the sequence and loop over the sequence. It's probably better to use a function anyway for clarity and efficiency.
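
For completeness, a sketch of the loop version on a small simulated matrix (the name `dat`, its size, and the seed are arbitrary choices for the example). The sequence simply omits 4 and 7, and the loop walks over what is left.

```r
set.seed(1)
dat <- matrix(sample(0:1, size = 200, replace = TRUE), nrow = 20, ncol = 10)

idx <- setdiff(1:10, c(4, 7))   # column sequence without 4 and 7
out <- vector("list", length(idx))
names(out) <- idx

for (k in seq_along(idx)) {
  j <- idx[k]
  # j == 1 checks the whole row; otherwise only columns 1..j
  cols <- if (j == 1) seq_len(ncol(dat)) else seq_len(j)
  keep <- rowSums(dat[, cols, drop = FALSE]) == 1 & dat[, j] == 1
  out[[k]] <- dat[keep, , drop = FALSE]
}

length(out)
#> [1] 8
```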

TrainingPizza