How to write a complex by-function in R

Question

My data set is:

Time      Sex   Weight  Time.midnight
0005       1    3837       5
0104       1    3334      64
0118       2    3554      78
0155       2    3838     115
0257       2    3625     177
0405       1    2208     245
0407       1    1745     247
0422       2    2846     262
0431       2    3166     271
0708       2    3520     428
0735       2    3380     455
0812       2    3294     492
0814       1    2576     494

It contains the time of birth, sex, and birth weight for babies born in one 24-hour period at a hospital. The variables are the following:

Time: time of birth recorded on the 24-hour clock / Sex: sex of the child (1 = girl, 2 = boy) / Weight: birth weight in grams / Time.midnight: number of minutes after midnight of each birth

Now I want to calculate the proportion of girls with a weight below 3 kg and compare it with the corresponding proportion for boys.

I wanted to use the by() function, but the command below returned an error.

by(Weight, Sex, length(which(Weight<3000)))

Could you please guide me?

R does not consider column names to be first class objects. You may want to wrap `with( db_name, ... )` around this although the logic of the code doesn't appear to correspond to my reading of your natural language description. — IRTFM, Jan 18 '19 at 02:26
@Lili.Y It's usually [not advisable to use `attach`](https://stackoverflow.com/questions/10067680/why-is-it-not-advisable-to-use-attach-in-r-and-what-should-i-use-instead). — Maurits Evers, Jan 18 '19 at 03:05
@Lili.Y: You are advised to include all setup and code that is needed to reproduce errors. `attach` calls are especially relevant. Error messages _should_ be reproduced IN FULL. — IRTFM, Jan 18 '19 at 03:14
Thanks for your advice, I'll definitely consider it in my code. — Lili.Y, Jan 18 '19 at 03:32

score 0 · Accepted Answer · answered Jan 18 '19 at 02:33

You can do the following:

by(df, df$Sex, function(x) sum(x$Weight < 3000) / length(x$Weight))
#df$Sex: 1
#[1] 0.6
#------------------------------------------------------------
#df$Sex: 2
#[1] 0.125
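One reading of why the call in the question failed, offered as a sketch rather than a diagnosis of the exact error: the third argument of by() has to be a function that by() calls once per group, whereas length(which(Weight<3000)) is evaluated up front to a single number. The example below rebuilds the relevant columns inline so that it runs on its own; the names under_3kg and failed are hypothetical and used only for illustration.

```r
# Sex and Weight columns copied from the question's data
df <- data.frame(
  Sex    = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1),
  Weight = c(3837, 3334, 3554, 3838, 3625, 2208, 1745,
             2846, 3166, 3520, 3380, 3294, 2576)
)

# Hypothetical helper: receives the rows of one group as a data frame
# and returns the share of weights below 3000 g
under_3kg <- function(group) sum(group$Weight < 3000) / nrow(group)

# by() splits df by Sex and calls under_3kg on each piece
res <- by(df, df$Sex, under_3kg)
print(res)

# Passing an already-computed number instead of a function fails;
# the error is caught here so the script keeps running
failed <- tryCatch(
  by(df$Weight, df$Sex, length(which(df$Weight < 3000))),
  error = function(e) conditionMessage(e)
)
print(failed)
```

Referring to columns as df$Weight and df$Sex also avoids the need for attach().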

Or, instead of by, you can use tapply, which returns the result as a named numeric array:

with(df, tapply(Weight, Sex, function(x) sum(x < 3000) / length(x)))
#    1     2
#0.600 0.125
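A small variation on the same idea, as a sketch: the mean of a logical vector is the proportion of TRUE values, so sum(x < 3000) / length(x) can be shortened to mean(x < 3000). The factor labels "girl" and "boy" follow the coding given in the question; the object names prop_tapply and prop_aggregate are assumptions made for this example.

```r
# Sex and Weight columns copied from the question's data
df <- data.frame(
  Sex    = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1),
  Weight = c(3837, 3334, 3554, 3838, 3625, 2208, 1745,
             2846, 3166, 3520, 3380, 3294, 2576)
)

# Label the sex codes (1 = girl, 2 = boy, as stated in the question)
df$Sex <- factor(df$Sex, levels = c(1, 2), labels = c("girl", "boy"))

# mean() of a logical vector gives the share of TRUE values
prop_tapply <- with(df, tapply(Weight, Sex, function(w) mean(w < 3000)))
print(prop_tapply)

# The same computation with aggregate(), which returns a data frame
prop_aggregate <- aggregate(Weight ~ Sex, data = df,
                            FUN = function(w) mean(w < 3000))
print(prop_aggregate)
```

Both calls give 0.6 for girls and 0.125 for boys, matching the results above.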

Sample data

df <- read.table(text =
    "Time      Sex   Weight  Time.midnight
0005       1    3837       5
0104       1    3334      64
0118       2    3554      78
0155       2    3838     115
0257       2    3625     177
0405       1    2208     245
0407       1    1745     247
0422       2    2846     262
0431       2    3166     271
0708       2    3520     428
0735       2    3380     455
0812       2    3294     492
0814       1    2576     494", header = TRUE)

Solved! Thank you. The correct answer is `by(mydata, Sex, function(x) sum(x$Weight < 3000) / length(Weight))` — Lili.Y, Jan 18 '19 at 03:26
