Year  Score 1  Score 2
2012       34       45
2012       41       46
2013       31       44
2013       44       33
2014       35       56
2014       42       21

I wrote this, but it gives me only the final year. I am a newbie and could not find an example similar to my case. Can someone help me?

chi
  • Your variable `abc` is overwritten each pass through the loop so only the final result is returned. – dcarlson Aug 17 '22 at 19:25
  • So how should I modify it? Can you help me, @dcarlson? – chi Aug 17 '22 at 19:27
  • I think you could do `library(dplyr); newdf %>% group_by(Year) %>% mutate(across(2:3, ~bestNormalize(., standardize=TRUE, quiet = TRUE)))` but I'm not familiar with the `bestNormalize` function or what it expects for input or what it outputs. – Jon Spring Aug 17 '22 at 19:40
  • This question seems similar, btw: https://stackoverflow.com/questions/62948741/data-transformation-with-the-package-bestnormalize-on-a-list-with-multiple-dat. Does that address your question? – Jon Spring Aug 17 '22 at 19:41
  • This is a common problem when using for loops: you are not accumulating intermediate results. My suggestion is to NOT use a for loop. I can't show you, as you don't supply any data. Use purrr or apply. – John Garland Aug 17 '22 at 19:43

2 Answers


If you want to use a loop, you will need to define `abc` as a matrix or data.frame and index it to store each set of results. It would be simpler to use `lapply` and `sapply`. I can't test this with `bestNormalize` because it does not work with the sample sizes in your example. First, provide reproducible data with `dput(newdf)` rather than a table:

newdf <- structure(list(Year = c(2012L, 2012L, 2013L, 2013L, 2014L, 2014L
), Score1 = c(34L, 41L, 31L, 44L, 35L, 42L), Score2 = c(45L, 
46L, 44L, 33L, 56L, 21L)), class = "data.frame", row.names = c(NA, 
-6L))

Then split into years:

df.splt <- split(newdf, newdf$Year)

Then use lapply:

df.lst <- lapply(df.splt, function(x) sapply(x[, -1], scale, center=FALSE, scale=TRUE))
df.lst
# $`2012`
#         Score1    Score2
# [1,] 0.6383359 0.6992942
# [2,] 0.7697580 0.7148340
# 
# $`2013`
#         Score1 Score2
# [1,] 0.5759535    0.8
# [2,] 0.8174824    0.6
# 
# $`2014`
#         Score1    Score2
# [1,] 0.6401844 0.9363292
# [2,] 0.7682213 0.3511234

The object df.lst is a list containing matrices for each set of results.
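For comparison, the loop-based route mentioned at the start of this answer (define `abc` as a matrix and index into it) might look roughly like the sketch below. It is only an illustration: `scale` again stands in for `bestNormalize`, and the names `rows`, `yr` and `col` are made up for this example.

```r
# Same sample data as above
newdf <- data.frame(
  Year   = c(2012L, 2012L, 2013L, 2013L, 2014L, 2014L),
  Score1 = c(34L, 41L, 31L, 44L, 35L, 42L),
  Score2 = c(45L, 46L, 44L, 33L, 56L, 21L)
)

# Pre-allocate one row per observation and one column per score
abc <- matrix(NA_real_, nrow = nrow(newdf), ncol = 2,
              dimnames = list(NULL, c("Score1", "Score2")))

for (yr in unique(newdf$Year)) {
  rows <- which(newdf$Year == yr)   # rows belonging to this year
  for (col in c("Score1", "Score2")) {
    # scale() is a stand-in; swap in your own transformation here
    abc[rows, col] <- as.numeric(scale(newdf[rows, col], center = FALSE, scale = TRUE))
  }
}

abc
```

Because each pass writes to its own rows of `abc`, earlier years are kept instead of being overwritten.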

dcarlson

If you want to do it with a for loop, you can.

Taking the values from @dcarlson

newdf <- structure(list(Year = c(2012L, 2012L, 2013L, 2013L, 2014L, 2014L),
Score1 = c(34L, 41L, 31L, 44L, 35L, 42L), 
Score2 = c(45L, 46L, 44L, 33L, 56L, 21L)), class = "data.frame", 
row.names = c(NA, -6L))

I am not familiar with `bestNormalize`, so as a stand-in I will just add the three values in the first row of each year's subset. The important thing is that you need some place to store your values, and that should be a list, as in

result <- list()

Now we can run a loop and append to that list whatever we have calculated:

for (i in 1:3) {
  cat(i, " - processing year ", i + 2011, "\n", sep = "")  # progress message, FYI
  tmp <- newdf[newdf$Year == i + 2011, ]       # keep only this year's rows
  abc <- sum(tmp[1, 1], tmp[1, 2], tmp[1, 3])  # placeholder: replace with your function
  result <- append(result, abc)                # accumulate results in a list
}

print(result)
str(result)

Because I only added three numbers, the result per year is a single number, so in my case `result` is a list of three sums.

You may want to add a call to `names` so that you remember which year produced which list entry:

result <- list()
for (i in 1:3) {
  tmp <- newdf[newdf$Year == i + 2011, ]       # keep only this year's rows
  abc <- sum(tmp[1, 1], tmp[1, 2], tmp[1, 3])  # placeholder: replace with your function
  names(abc) <- 2011 + i                       # label the entry with its year
  result <- append(result, abc)                # accumulate named results in a list
}

print(result)
str(result)
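
If your function returns more than one number per year, assigning by name with `[[` keeps each year's result together as one list entry. The following is only a sketch under that assumption: `scale` is a placeholder for your own function, and `yr` and `res` are names invented for this example.

```r
newdf <- data.frame(
  Year   = c(2012L, 2012L, 2013L, 2013L, 2014L, 2014L),
  Score1 = c(34L, 41L, 31L, 44L, 35L, 42L),
  Score2 = c(45L, 46L, 44L, 33L, 56L, 21L)
)

result <- list()
for (yr in unique(newdf$Year)) {
  tmp <- newdf[newdf$Year == yr, ]   # rows of this year only
  # Apply the placeholder to each score column of the filtered subset
  res <- sapply(tmp[, c("Score1", "Score2")], function(v) as.numeric(scale(v)))
  result[[as.character(yr)]] <- res  # one matrix per year, stored under the year's name
}

str(result)
```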
Bernhard
  • Glad it was helpful. Further down your R road, give `tapply` and `by` a chance. But as long as efficiency is not of the essence, you can always rely on your `for` skills. – Bernhard Aug 17 '22 at 20:43
  • What I realized is that the loop does not apply the function within each group. What I want is to first filter one year (2012), apply the function to each column, then move on to the next year (2013), and so on. The loop applies the function to the whole data set (2012-2014) and does not filter :( @Bernhard – chi Aug 17 '22 at 21:18
  • Put a `print(tmp)` anywhere in the loop and you will find that `tmp` is a "by year" filtered version of `newdf`. So yes, it filters. All three `result` entries are different, so they stem from different subsets of data. So yes, it filters. If you think it does not, you really should explain why you think that. – Bernhard Aug 18 '22 at 08:06
  • Because I checked it against a version that does not filter by year and normalizes each column across all years at once, and it gives the same result as the for loop. Since I am filtering by year, the number of data points decreases, so I should get different outcomes, but the result is exactly the same as the unfiltered one. That is why I thought it does not filter. @Bernhard – chi Aug 19 '22 at 02:31
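
One way to check whether grouping changes the outcome is to compare a pooled transformation with a per-year one on the sample data. The sketch below uses `tapply`, as suggested in the comments, with `scale` as a stand-in for `bestNormalize`; the names `pooled` and `per_year` are invented for this illustration.

```r
newdf <- data.frame(
  Year   = c(2012L, 2012L, 2013L, 2013L, 2014L, 2014L),
  Score1 = c(34L, 41L, 31L, 44L, 35L, 42L),
  Score2 = c(45L, 46L, 44L, 33L, 56L, 21L)
)

# Pooled: standardize Score1 once across all years
pooled <- as.numeric(scale(newdf$Score1))

# Per year: tapply() splits Score1 by Year and standardizes each piece separately.
# The data are already sorted by Year, so unlist() returns values in row order.
per_year <- tapply(newdf$Score1, newdf$Year, function(v) as.numeric(scale(v)))
per_year <- unlist(per_year, use.names = FALSE)

# TRUE only if grouping made no difference to the result
isTRUE(all.equal(pooled, per_year))
```

With this stand-in the two versions differ. If they agree in your own code, it may be worth checking that the function is called on the filtered subset (`tmp`) rather than on the full data frame.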