Issue with a loop: mean and min

Question

I'm implementing a loop in R. Let me simplify things as much as I can.

Suppose I have

#    country year     key potential
# 1      FRA 2010 FRA2010         0
# 2      FRA 2011 FRA2011         0
# 3      FRA 2012 FRA2012         0
# 4      FRA 2013 FRA2013         1
# 5      ITA 2010 ITA2010         1
# 6      ITA 2011 ITA2011         1
# 7      ITA 2012 ITA2012         0
# 8      ITA 2013 ITA2013         1
# 9      USA 2010 USA2010         0
# 10     USA 2011 USA2011         0
# 11     USA 2012 USA2012         1
# 12     USA 2013 USA2013         1

Then, I take the unique values satisfying potential=1

unique <- unique(df$key[df$potential == 1])

Then, I want the mean year for each country, computed over the rows where potential == 1. I also want the min year by country where potential == 1.

That's my attempt:

for (i in unique) {
  mean_year <- mean(df$year[df$key == i], na.rm = TRUE)
  date <- min(df$year[df$key == i], na.rm = TRUE)
}

The loop returns a single value each for mean_year and date. Instead, it should return one value per country for both mean_year and date.

For mean_year I should have: 2013 for FRA, 2011.33 for ITA, and 2012.5 for USA.

The same reasoning should occur for date.

data

df <- structure(list(country = c("FRA", "FRA", "FRA", "FRA", "ITA", 
"ITA", "ITA", "ITA", "USA", "USA", "USA", "USA"), year = c(2010L, 
2011L, 2012L, 2013L, 2010L, 2011L, 2012L, 2013L, 2010L, 2011L, 
2012L, 2013L), key = structure(1:12, levels = c("FRA2010", "FRA2011", 
"FRA2012", "FRA2013", "ITA2010", "ITA2011", "ITA2012", "ITA2013", 
"USA2010", "USA2011", "USA2012", "USA2013"), class = "factor"), 
    potential = c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1)), row.names = c(NA, 
-12L), class = "data.frame")

jay.sf · Answer 1 · 2023-08-16T10:29:33.363


Try this.

split(df, df$country) |> sapply(\(x) {
  c(min=min(x[x$potential == 1, 'year']), mean=mean(x[x$potential == 1, 'year']))
}) |> t()
#      min     mean
# FRA 2013 2013.000
# ITA 2010 2011.333
# USA 2012 2012.500
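
The same per-country summary can also be sketched by filtering first and then aggregating. This is only an illustrative base R variant, not part of the original answer; the data frame is rebuilt here without the key column, and the names sub_df, res_min, res_mean and res are assumptions chosen for this example.

```r
# Rebuild the example data (key column omitted, it is not needed here)
df <- data.frame(
  country = rep(c("FRA", "ITA", "USA"), each = 4),
  year = rep(2010:2013, times = 3),
  potential = c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1)
)

# Keep only the rows where potential == 1
sub_df <- df[df$potential == 1, ]

# One aggregate() call per statistic, grouped by country
res_min <- aggregate(year ~ country, data = sub_df, FUN = min)
res_mean <- aggregate(year ~ country, data = sub_df, FUN = mean)

# Combine both results; the suffixes give columns year_min and year_mean
res <- merge(res_min, res_mean, by = "country", suffixes = c("_min", "_mean"))
res
```

Filtering once up front keeps the potential == 1 condition in a single place instead of repeating it inside every summary function.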

edited Aug 16 '23 at 10:29

answered Aug 16 '23 at 10:27

jay.sf


Thanks for your reply. Unfortunately, I need to be able to implement the loop, because this is just a simplified example. I have a few different functions to run in the loop – Maximilian Aug 16 '23 at 10:28
@Maximilian Just updated. A loop doesn't sound like a good idea. Can you make a clearer example of what you want? – jay.sf Aug 16 '23 at 10:30
It is quite difficult to make an example. What I'm actually trying to do is move from Stata to R. Let me add the Stata code to the main question. – Maximilian Aug 16 '23 at 10:32
@Maximilian good decision to use free and open software ;) go ahead and try your best! – jay.sf Aug 16 '23 at 10:34
@Maximilian PS: You can use `dput(df)` to share your data. – jay.sf Aug 16 '23 at 10:35
The script is almost over, what I'm trying to work on is just the loop mentioned above – Maximilian Aug 16 '23 at 10:39

@Maximilian You can use slow loops in R, but you then miss the fact that R allows fast and easy vectorized operations. See [here](https://docs.ycrc.yale.edu/r-novice-gapminder/09-vectorization/). In the end there's always a loop, but it is implemented in much faster languages such as C. The sooner you get familiar with this, the better. Read the official docs I link in my profile. Cheers! – jay.sf Aug 16 '23 at 12:06
I made a new question which is much clearer than this one, with a proper example and attempt. Feel free to have a look: https://stackoverflow.com/questions/76914755/challenging-regression-to-make-maybe-with-a-loop-and-f-stat – Maximilian Aug 16 '23 at 15:52
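
The vectorized style recommended in the comments above could look roughly like the following sketch. It assumes base R's tapply() is an acceptable stand-in for the loop; the names sub_df, mean_year and min_year are chosen for illustration, and the data frame is rebuilt without the key column.

```r
# Rebuild the example data (key column omitted)
df <- data.frame(
  country = rep(c("FRA", "ITA", "USA"), each = 4),
  year = rep(2010:2013, times = 3),
  potential = c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1)
)

# Restrict to potential == 1 once
sub_df <- df[df$potential == 1, ]

# tapply() applies a function to year within each country,
# returning one named value per country instead of a single scalar
mean_year <- tapply(sub_df$year, sub_df$country, mean)
min_year <- tapply(sub_df$year, sub_df$country, min)

# Bind the two named vectors into a small matrix, one row per country
cbind(min = min_year, mean = mean_year)
```

The loop in the question assigns to the plain scalars mean_year and date on every iteration, so only the last iteration survives; here each result is a vector named by country.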

score 1 · Accepted Answer · answered Aug 16 '23 at 15:48

I wouldn't use a loop for such calculations, as the tidyverse is much more elegant; but I've provided this for-loop example as you were quite insistent, and I think it shows a reasonable approach to hand-crafted looping.

df <- structure(list(
  country = c(
    "FRA", "FRA", "FRA", "FRA", "ITA",
    "ITA", "ITA", "ITA", "USA", "USA", "USA", "USA"
  ), year = c(
    2010L,
    2011L, 2012L, 2013L, 2010L, 2011L, 2012L, 2013L, 2010L, 2011L,
    2012L, 2013L
  ), key = structure(1:12, levels = c(
    "FRA2010", "FRA2011",
    "FRA2012", "FRA2013", "ITA2010", "ITA2011", "ITA2012", "ITA2013",
    "USA2010", "USA2011", "USA2012", "USA2013"
  ), class = "factor"),
  potential = c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1)
), row.names = c(
  NA,
  -12L
), class = "data.frame")



# basic prep, you say you only want to consider potential == 1 
# so simply shorten the data and then no need to think on it more

(sub_df <- subset(df, df$potential == 1))


# your description says to loop over countries ; keys seem irrelevant
(countrycodes <- unique(sub_df$country))
(lc <- length(countrycodes))

# making an empty structure of the desired size to contain the results 
(res <- data.frame(
  country = character(lc),
  mean = numeric(lc),
  min = numeric(lc)
))

# the loop
for (i in seq_len(lc)) {
  ctry <- countrycodes[i]
  years <- sub_df$year[sub_df$country == ctry]

  res[i, ] <- data.frame(
    country = ctry,
    mean = mean(years),
    min = min(years)
  )
}

res
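
Since the question mentions running several different functions in the loop, the same pattern could be extended along these lines. This is only a sketch: the list funs and the extra max entry are hypothetical additions for illustration, and the data frame is rebuilt without the key column so the block runs on its own.

```r
# Rebuild the example data (key column omitted) and keep potential == 1
df <- data.frame(
  country = rep(c("FRA", "ITA", "USA"), each = 4),
  year = rep(2010:2013, times = 3),
  potential = c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1)
)
sub_df <- subset(df, potential == 1)

# Hypothetical named list of summary functions; add or swap entries as needed
funs <- list(mean = mean, min = min, max = max)

countrycodes <- unique(sub_df$country)
res <- data.frame(country = countrycodes)

# Outer loop over functions; vapply() visits each country in turn
for (nm in names(funs)) {
  res[[nm]] <- vapply(
    countrycodes,
    function(ctry) as.numeric(funs[[nm]](sub_df$year[sub_df$country == ctry])),
    numeric(1),
    USE.NAMES = FALSE
  )
}

res
```

Each function in funs becomes one column of res, so adding another statistic only requires another list entry, provided the function returns a single number.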

I've made a new question which is much clearer than this one, with a proper example and attempt. Feel free to have a look: https://stackoverflow.com/questions/76914755/challenging-regression-to-make-maybe-with-a-loop-and-f-stat — Maximilian, Aug 16 '23 at 15:53
I might look later; but perhaps you can learn from this example, and apply the lessons to your more detailed case — Nir Graham, Aug 16 '23 at 15:53
Btw, that's brilliant. I'll try to apply this procedure to my new question — Maximilian, Aug 16 '23 at 15:58
