
I was helped earlier with applying a weight while generating a frequency table. Now I have a question about the next step, but here I'm merely looking for a better solution than the clumsy work-around I have managed to come up with.

My dataframe now has another column containing country codes, and I want to generate my weighted frequency tables separately for each country.

The work-around is this:

library(descr)

country <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
var <- c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA)
wght <- c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
df <- cbind.data.frame(country, var, wght)

df1 <- subset(df, country == 1)
freq(df1$var, df1$wght)

df2 <- subset(df, country == 2)
freq(df2$var, df2$wght)

For this example it works just fine, but I have around thirty countries in my real data. Doing it this way is tedious and, more importantly, it means hacking my data into pieces, which may come back to bite me in later stages of the analysis (e.g. if I want to aggregate countries into regions and compare them). Is there a cleaner, less "invasive" way?

SpecialK201
    See https://stackoverflow.com/q/11562656/3358272. You might use `split`, `by`, `tapply`, for example. – r2evans Mar 17 '23 at 17:07
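A rough sketch of the `by` route mentioned in the comment above, assuming the example data from the question and that the `descr` package is installed. The `plot = FALSE` argument is only there to suppress the bar plot that `freq` can draw; the name `res` is just illustrative.

```r
library(descr)

# Example data from the question
df <- data.frame(
  country = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  var     = c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA),
  wght    = c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
)

# One weighted frequency table per country; df itself is left untouched
res <- by(df, df$country, function(x) freq(x$var, x$wght, plot = FALSE))
res
```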

1 Answer


One option is to use lapply after splitting the data by country. This approach stores the results in a list, with each element of the list representing a country (i.e., for your example data, it has length 2):

lapply(split(df, df$country), 
       function(x) descr::freq(x[,"var"], x[,"wght"]))

Output:

$`1`
x[, "var"] 
      Frequency Percent
1           2.3   38.98
2           2.7   45.76
3           0.9   15.25
Total       5.9  100.00

$`2`
x[, "var"] 
      Frequency Percent Valid Percent
1           0.8   15.09         22.86
3           2.7   50.94         77.14
NA's        1.8   33.96              
Total       5.3  100.00        100.00
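
Since `split` only creates temporary pieces, the original data frame stays whole, so the same pattern should carry over to any other grouping column. A minimal sketch, assuming the example data from the question; the `region` column and its labels are hypothetical and only stand in for whatever country-to-region mapping the real data would use:

```r
library(descr)

# Example data from the question
df <- data.frame(
  country = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  var     = c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA),
  wght    = c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
)

# Hypothetical region labels, purely for illustration
df$region <- ifelse(df$country == 1, "North", "South")

# Keep the per-country tables in a named list for later use
tabs_country <- lapply(split(df, df$country),
                       function(x) freq(x$var, x$wght, plot = FALSE))

# Same pattern, grouped by the hypothetical region column instead
tabs_region <- lapply(split(df, df$region),
                      function(x) freq(x$var, x$wght, plot = FALSE))

# Pull out a single table by its group label
tabs_country[["2"]]
```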
jpsmith
  • This works on the initial dataframe, but now I tried to transpose the solution to a different dataframe and I get an error message ("Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed In addition: Warning message: In Ops.factor(xi, xj) : ‘>’ not meaningful for factors"). I gather from Google that it has to do with unexpected NAs, and the end of the warning message suggests something's up with variable classes. But I have no NAs except in what corresponds to "var" in the example above, and the classes are the same as in the example that works. Any idea why this happens? – SpecialK201 Mar 21 '23 at 16:08
  • Just in case anyone is reading along encountering the same problem, it turned out to be because the dataset on which the code worked was a data.frame, while the second one was a tibble (tbl_df). I don't know whether there is a more elegant solution to the problem, but after running `df <- as.data.frame(df)` it did work. – SpecialK201 Mar 22 '23 at 11:41
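
A likely explanation for the tibble error above is that `x[, "var"]` on a tibble returns a one-column tibble rather than a plain vector, so `freq` receives a data frame. As a sketch (not tested against the asker's real data), extracting the columns with `[[` should behave the same for data frames and tibbles, which avoids the `as.data.frame` conversion. The example data are from the question; the conversion to a tibble is optional and only runs if the `tibble` package happens to be installed.

```r
library(descr)

# Example data from the question
df <- data.frame(
  country = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  var     = c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA),
  wght    = c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
)

# Optionally mimic the problematic case by converting to a tibble
if (requireNamespace("tibble", quietly = TRUE)) {
  df <- tibble::as_tibble(df)
}

# [[ always extracts the column as a plain vector, for data frames and tibbles
tabs <- lapply(split(df, df$country),
               function(x) freq(x[["var"]], x[["wght"]], plot = FALSE))
tabs
```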