
I am currently working in R, and this is a sample of a task I was assigned:

```r
POINA[1] <-sum(as.numeric(ifelse(  (datos1$c_res==1 | datos1$c_res==3) & (datos1$r_def==0) & (datos1$eda>=15 & datos1$eda<= 98) & datos1$emp_ppal==1 & datos1$ambito1 != 1 ,1,0)))
POINA[2] <-sum(as.numeric(ifelse(  (datos2$c_res==1 | datos2$c_res==3) & (datos2$r_def==0) & (datos2$eda>=15 & datos2$eda<= 98) & datos2$emp_ppal==1 & datos2$ambito1 != 1 ,1,0)))
POINA[3] <-sum(as.numeric(ifelse(  (datos3$c_res==1 | datos3$c_res==3) & (datos3$r_def==0) & (datos3$eda>=15 & datos3$eda<= 98) & datos3$emp_ppal==1 & datos3$ambito1 != 1 ,1,0)))
POINA[4] <-sum(as.numeric(ifelse(  (datos4$c_res==1 | datos4$c_res==3) & (datos4$r_def==0) & (datos4$eda>=15 & datos4$eda<= 98) & datos4$emp_ppal==1 & datos4$ambito1 != 1 ,1,0)))
POINA[5] <-sum(as.numeric(ifelse(  (datos5$c_res==1 | datos5$c_res==3) & (datos5$r_def==0) & (datos5$eda>=15 & datos5$eda<= 98) & datos5$emp_ppal==1 & datos5$ambito1 != 1 ,1,0)))
POINA[6] <-sum(as.numeric(ifelse(  (datos6$c_res==1 | datos6$c_res==3) & (datos6$r_def==0) & (datos6$eda>=15 & datos6$eda<= 98) & datos6$emp_ppal==1 & datos6$ambito1 != 1 ,1,0)))
POINA[7] <-sum(as.numeric(ifelse(  (datos7$c_res==1 | datos7$c_res==3) & (datos7$r_def==0) & (datos7$eda>=15 & datos7$eda<= 98) & datos7$emp_ppal==1 & datos7$ambito1 != 1 ,1,0)))
POINA[8] <-sum(as.numeric(ifelse(  (datos8$c_res==1 | datos8$c_res==3) & (datos8$r_def==0) & (datos8$eda>=15 & datos8$eda<= 98) & datos8$emp_ppal==1 & datos8$ambito1 != 1 ,1,0)))
POINA[9] <-sum(as.numeric(ifelse(  (datos9$c_res==1 | datos9$c_res==3) & (datos9$r_def==0) & (datos9$eda>=15 & datos9$eda<= 98) & datos9$emp_ppal==1 & datos9$ambito1 != 1 ,1,0)))
POINA[10] <-sum(as.numeric(ifelse(  (datos10$c_res==1 | datos10$c_res==3) & (datos10$r_def==0) & (datos10$eda>=15 & datos10$eda<= 98) & datos10$emp_ppal==1 & datos10$ambito1 != 1 ,1,0)))
```
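The ten lines differ only in which data frame they read, so the condition can be written once as a function. A minimal sketch, assuming every `datosN` has the columns used above; `count_poina` and the `toy` data frame are hypothetical names made up for illustration. Because `sum()` counts the `TRUE` values of a logical vector, the `ifelse(..., 1, 0)` and `as.numeric()` wrappers are not needed, and a row whose condition is `NA` still makes the total `NA`, just as in the original.

```r
# Hypothetical helper: counts the rows of one weekly data frame that meet
# the POINA criterion used in the question.
count_poina <- function(d) {
  keep <- with(d,
    (c_res == 1 | c_res == 3) &
      r_def == 0 &
      eda >= 15 & eda <= 98 &
      emp_ppal == 1 &
      ambito1 != 1
  )
  sum(keep)  # TRUE counts as 1, FALSE as 0
}

# Toy data (made up) with the same column names as datos1, datos2, ...
toy <- data.frame(
  c_res    = c(1, 3, 2, 1),
  r_def    = c(0, 0, 0, 1),
  eda      = c(20, 40, 30, 50),
  emp_ppal = c(1, 1, 1, 1),
  ambito1  = c(2, 2, 2, 2)
)

count_poina(toy)  # 2: only rows 1 and 2 qualify
```

With such a helper, each line shrinks to `POINA[1] <- count_poina(datos1)`, so only the data frame changes from line to line.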

I have several data frames which, for the sake of simplicity, are named "datos1" ... "datos120". These data frames are the results of telephone polls. Each data frame contains different individuals, and each poll corresponds to a specific week of the year. POINA is a numeric vector where each entry POINA[i] is the total number of surveyed people who meet the criteria above.
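Since the weekly frames already exist as separate objects named `datos1`, `datos2`, ..., base R's `mget()` can gather them into one named list without typing each name. A minimal sketch, assuming the objects were created in the environment where `mget()` is called (the global environment at top level); the three toy frames stand in for the real 120 and their contents are made up.

```r
# Toy stand-ins (made up) for the real weekly data frames datos1 ... datos120
datos1 <- data.frame(eda = c(20, 70))
datos2 <- data.frame(eda = c(15, 99, 40))
datos3 <- data.frame(eda = 33)

n_weeks <- 3  # would be 120 for the real data

# paste0() builds "datos1", "datos2", ...; mget() looks up each name
# and returns a named list of the corresponding objects
df_list <- mget(paste0("datos", seq_len(n_weeks)))

names(df_list)      # "datos1" "datos2" "datos3"
nrow(df_list[[2]])  # 3: double brackets return the data frame itself
```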

As can be seen, the criteria remain the same every week; only the data frame changes, since each week is a separate data frame (datos1 for POINA[1], datos2 for POINA[2], and so on).
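If the goal is to keep the existing objects and the `POINA[i]` indexing, a `for` loop can build each object name with `paste0()` and fetch it with `get()`. A minimal sketch on made-up toy frames; `n_weeks` would be 120 for the real data. This is the smallest change to the original code, although the comments below argue a list of data frames is the cleaner long-term structure.

```r
# Toy stand-ins (made up) for datos1 ... datos120
datos1 <- data.frame(c_res = c(1, 3), r_def = c(0, 0), eda = c(20, 40),
                     emp_ppal = c(1, 1), ambito1 = c(2, 1))
datos2 <- data.frame(c_res = c(1, 2, 3), r_def = c(0, 0, 0),
                     eda = c(16, 30, 98), emp_ppal = c(1, 1, 1),
                     ambito1 = c(2, 2, 2))

n_weeks <- 2               # 120 for the real data
POINA <- numeric(n_weeks)  # preallocate one entry per week

for (i in seq_len(n_weeks)) {
  d <- get(paste0("datos", i))  # fetches the object named "datos<i>"
  POINA[i] <- sum(
    (d$c_res == 1 | d$c_res == 3) & d$r_def == 0 &
      d$eda >= 15 & d$eda <= 98 & d$emp_ppal == 1 & d$ambito1 != 1
  )
}

POINA  # 1 2
```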

Is there a way to do this without writing out all 120 weeks one by one?

I have tried doing it manually, but there are just too many cases, so any help in making this more efficient would be deeply appreciated.

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jan 09 '23 at 19:20
  • It's usually better in these cases if you never created all those variables in the first place. You usually want a [list of data.frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) so you can apply functions over them. It's generally bad practice to have a bunch of global variables with indexes in their names; it makes them much harder to work with in R. So how did you create all those values in the first place? – MrFlick Jan 09 '23 at 19:37
  • OK, if I understand this correctly, I should make a list of length 120 where every entry is one data frame, e.g. df_list. To access "datos1" I would use df_list[1]. But after that, how do I work with the variables in this accessed data frame? @MrFlick – Fernando Torrero Jan 09 '23 at 21:24
  • I wonder if it would make more sense to use `todos_datos <- dplyr::bind_rows(sem01 = datos1, sem02 = datos2, .... .id = "sem")` so that all your data can be in one data frame, with a column `sem` denoting the week. Then you could do one operation on all the data at once, perhaps after `group_by(sem)` if the calculation for each week should be separate. – Jon Spring Jan 09 '23 at 22:22
  • @JonSpring I am not entirely sure this will work, but I will give it a try! Sounds promising! tyvm. – Fernando Torrero Jan 09 '23 at 23:22
  • Please provide enough code so others can better understand or reproduce the problem. – Community Jan 10 '23 at 13:27
  • Actually you need `df_list[[1]]` with double brackets to extract the data.frames. So you just use `df_list[[1]]` rather than `datos1`. Then you can use `lapply()` to map any function you want over the list. Your summary function could be rewritten as `function(x) with(x, sum(as.numeric(ifelse( (c_res==1 | c_res==3) & (r_def==0) & (eda>=15 & eda<= 98) & emp_ppal==1 & ambito1 != 1 ,1,0))))` for use with lapply. – MrFlick Jan 10 '23 at 14:19
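The two suggestions from the comments can be sketched together. With the weeks in a list, `vapply()` (a type-checked relative of `sapply()`) returns a plain numeric vector rather than the list that `lapply()` gives, so its result can serve directly as `POINA`. Jon Spring's idea of stacking every week into one data frame is shown here in base R, with `do.call(rbind, ...)` and `tapply()` swapped in for `dplyr::bind_rows()` and `group_by()`. Both are minimal sketches on made-up data; `qualifies`, `count_poina`, and the week column `sem` are illustrative names.

```r
# Condition from the comment above, split into a row-level test and a count;
# the ifelse()/as.numeric() wrapper is dropped because sum() counts TRUEs.
qualifies <- function(x) with(x,
  (c_res == 1 | c_res == 3) & r_def == 0 & eda >= 15 & eda <= 98 &
    emp_ppal == 1 & ambito1 != 1
)
count_poina <- function(x) sum(qualifies(x))

# Made-up list of weekly data frames (df_list[[1]] plays the role of datos1)
df_list <- list(
  data.frame(c_res = c(1, 3), r_def = c(0, 0), eda = c(20, 40),
             emp_ppal = c(1, 1), ambito1 = c(2, 1)),
  data.frame(c_res = c(1, 2, 3), r_def = c(0, 0, 0), eda = c(16, 30, 98),
             emp_ppal = c(1, 1, 1), ambito1 = c(2, 2, 2))
)

# Option 1: apply the count to each element; numeric(1) declares that
# every call must return a single number
POINA <- vapply(df_list, count_poina, numeric(1))
POINA  # 1 2

# Option 2: stack all weeks into one data frame with a week column 'sem',
# then count the qualifying rows within each week
todos_datos <- do.call(rbind, Map(function(d, w) data.frame(d, sem = w),
                                  df_list, seq_along(df_list)))
POINA2 <- tapply(qualifies(todos_datos), todos_datos$sem, sum)
```

`POINA2` is a one-dimensional array named by week number; `as.vector(POINA2)` drops the names if a plain vector is preferred.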

0 Answers