
I have a dataset with the following structure:

  ID  ISCO EMPLOYMENT
   1    1       1
   2    3       0
   3    1       0
   4    7       1
   .    .       .
   .    .       .

I would like to create a new data frame with the occupational unemployment rate of each ISCO group. The EMPLOYMENT variable indicates employment status, with 1 meaning employed and 0 meaning unemployed.

The formula for each of the j ISCO groups would be:

(number of people unemployed in j)/(number of people unemployed in j + number of people employed in j) × 100.

But I don't know how to go about this in R. I thought about writing a loop, but it seems that in R it is preferable to use the apply() family of functions. Also, note that my ISCO groups are not a consecutive sequence: they are numbers from 1 to 99, and not all of them appear. For example, the ISCO variable might contain the values 3, 4 and 6, but not 5. I only need the calculation for the values that actually appear in the sample.

Could anybody help me out? Thanks

ffolkvar

1 Answer


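You can calculate the rate as the mean of a logical vector: EMPLOYMENT == 0 is TRUE for unemployed people and FALSE for employed people, so its mean within a group is the share of unemployed in that group.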
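To see why this matches the formula in the question, here is a minimal sketch for a single group. The data frame is an assumption built from the four sample rows shown in the question, and the names grp, rate_formula and rate_mean are made up for illustration.

```r
# Hypothetical sample data: the first four rows from the question
df <- data.frame(
  ID = 1:4,
  ISCO = c(1, 3, 1, 7),
  EMPLOYMENT = c(1, 0, 0, 1)
)

# Take one group (ISCO == 1) and apply the formula literally
grp <- df[df$ISCO == 1, ]
unemployed <- sum(grp$EMPLOYMENT == 0)
employed <- sum(grp$EMPLOYMENT == 1)
rate_formula <- unemployed / (unemployed + employed) * 100

# Same quantity via the mean of a logical vector (TRUE counts as 1, FALSE as 0)
rate_mean <- mean(grp$EMPLOYMENT == 0) * 100

# Both give 50 for this group: one unemployed out of two people
all.equal(rate_formula, rate_mean)
```

The grouped versions below do the same thing for every ISCO value at once.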

With dplyr :

library(dplyr)
df %>%
  group_by(ISCO) %>%
  summarise(unemployment = mean(EMPLOYMENT == 0) * 100)

In base R :

aggregate(EMPLOYMENT~ISCO, df, function(x) mean(x == 0) * 100)
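Regarding the gaps in the ISCO codes: grouping only produces rows for values that actually occur in the data, so nothing special is needed. A small sketch with made-up data (the values below are assumptions that mirror the 3, 4, 6 example from the question):

```r
# Hypothetical data: ISCO codes 3, 4 and 6 appear, 5 does not
df <- data.frame(
  ISCO = c(3, 3, 4, 6, 6, 6),
  EMPLOYMENT = c(1, 0, 1, 0, 0, 1)
)

# One row per ISCO code present in the data
res <- aggregate(EMPLOYMENT ~ ISCO, df, function(x) mean(x == 0) * 100)

# aggregate() keeps the original column name, so rename it to avoid confusion
names(res)[2] <- "unemployment"
res
```

The result has three rows (ISCO 3, 4 and 6) and no row for 5.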

With data.table :

library(data.table)
# Wrap the summary in .() so the result column is named; group with by = ISCO
setDT(df)[, .(unemployment = mean(EMPLOYMENT == 0) * 100), by = ISCO]
Ronak Shah