
My data set is:

Time      Sex   Weight  Time.midnight
0005       1    3837       5
0104       1    3334      64
0118       2    3554      78
0155       2    3838     115
0257       2    3625     177
0405       1    2208     245
0407       1    1745     247
0422       2    2846     262
0431       2    3166     271
0708       2    3520     428
0735       2    3380     455
0812       2    3294     492
0814       1    2576     494

It contains the time of birth, sex, and birth weight of babies born during one 24-hour period at a hospital. The variables are:

- Time: time of birth, recorded on the 24-hour clock
- Sex: sex of the child (1 = girl, 2 = boy)
- Weight: birth weight in grams
- Time.midnight: number of minutes after midnight at which each birth occurred

Now I want to calculate the proportion of girls with a birth weight below 3 kg (3000 g) and compare it with the corresponding proportion for boys.

I wanted to use the by() function, but the command below returned an error:

by(Weight, Sex, length(which(Weight<3000)))

Could you please guide me?

IRTFM
Lili.Y
    R does not consider column names to be first-class objects. You may want to wrap `with(db_name, ...)` around this, although the logic of the code doesn't appear to correspond to my reading of your natural-language description. – IRTFM Jan 18 '19 at 02:26
    @Lili.Y It's usually [not advisable to use `attach`](https://stackoverflow.com/questions/10067680/why-is-it-not-advisable-to-use-attach-in-r-and-what-should-i-use-instead). – Maurits Evers Jan 18 '19 at 03:05
  • @Lili.Y: You are advised to include all setup and code that is needed to reproduce errors. `attach` calls are especially relevant. Error messages _should_ be reproduced IN FULL. – IRTFM Jan 18 '19 at 03:14
  • Thanks for your advice, I'll definitely keep it in mind in my code. – Lili.Y Jan 18 '19 at 03:32

1 Answer


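Your by() call fails because by() expects a function as its third argument (FUN), while length(which(Weight<3000)) is just an expression that evaluates to one number for the whole data set. Instead, pass the whole data frame, split it by Sex, and compute the proportion inside a function that by() applies to each subset x: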

by(df, df$Sex, function(x) sum(x$Weight < 3000) / length(x$Weight))
#df$Sex: 1
#[1] 0.6
#------------------------------------------------------------
#df$Sex: 2
#[1] 0.125
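As a hand check against the table in the question: of the 5 girls (Sex = 1), three weigh under 3000 g (2208, 1745 and 2576 g); of the 8 boys (Sex = 2), only one does (2846 g). So the two proportions, and the difference between them, are:

$$
\hat{p}_{\text{girls}} = \frac{3}{5} = 0.6, \qquad \hat{p}_{\text{boys}} = \frac{1}{8} = 0.125, \qquad \hat{p}_{\text{girls}} - \hat{p}_{\text{boys}} = 0.475
$$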

Alternatively, instead of by() you can use tapply(), which returns a named numeric vector:

with(df, tapply(Weight, Sex, function(x) sum(x < 3000) / length(x)))
#    1     2
#0.600 0.125
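If a data frame result is more convenient than a named vector, aggregate() with a formula is another base-R option. This is a minimal sketch that re-enters only the Sex and Weight columns from the question (the name prop_under_3kg is just illustrative). It uses mean(w < 3000): the mean of a logical vector is the share of TRUE values, so it equals sum(w < 3000) / length(w).

```r
# Sex (1 = girl, 2 = boy) and birth weight in grams, copied from the question's table
df <- data.frame(
  Sex    = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1),
  Weight = c(3837, 3334, 3554, 3838, 3625, 2208, 1745,
             2846, 3166, 3520, 3380, 3294, 2576)
)

# One row per sex; the Weight column of the result holds the proportion under 3000 g.
# mean() of a logical vector is the fraction of TRUE values.
prop_under_3kg <- aggregate(Weight ~ Sex, data = df,
                            FUN = function(w) mean(w < 3000))
print(prop_under_3kg)
```

The output column keeps the name Weight even though it now holds proportions, so you may want to rename it before reporting.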

Sample data

df <- read.table(text =
    "Time      Sex   Weight  Time.midnight
0005       1    3837       5
0104       1    3334      64
0118       2    3554      78
0155       2    3838     115
0257       2    3625     177
0405       1    2208     245
0407       1    1745     247
0422       2    2846     262
0431       2    3166     271
0708       2    3520     428
0735       2    3380     455
0812       2    3294     492
0814       1    2576     494", header = TRUE)
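To see the counts behind the proportions and compare girls with boys side by side, a cross-tabulation is another option. This sketch assumes a data frame with the question's Sex and Weight columns (re-entered here; counts and props are illustrative names). table() counts each sex against the under-3 kg indicator, and prop.table(..., margin = 1) divides each row by its row total, so the TRUE column holds the proportion for each sex.

```r
# Sex (1 = girl, 2 = boy) and birth weight in grams, copied from the question's table
df <- data.frame(
  Sex    = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1),
  Weight = c(3837, 3334, 3554, 3838, 3625, 2208, 1745,
             2846, 3166, 3520, 3380, 3294, 2576)
)

# Rows: sex; columns: whether the baby weighs under 3000 g (FALSE / TRUE)
counts <- with(df, table(Sex, Under3kg = Weight < 3000))

# margin = 1 turns each row into proportions within that sex
props <- prop.table(counts, margin = 1)

print(counts)
print(props)
```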
Maurits Evers
  • Solved! Thank you. The correct answer is by(mydata, Sex, function(x) sum(x$Weight < 3000) / length(x$Weight)) – Lili.Y Jan 18 '19 at 03:26
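A note on scoping, since the thread mixes attach() with by(): inside the function passed to by(), both the numerator and the denominator should come from the subset x. A bare Weight is looked up outside the function (in the global environment, or on the search path after attach()), so it is the full 13-row column and the denominator no longer matches the group. The sketch below uses the question's data; the global Weight vector is a hypothetical stand-in for attach(df), and right/wrong are illustrative names.

```r
# Sex (1 = girl, 2 = boy) and birth weight in grams, copied from the question's table
df <- data.frame(
  Sex    = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1),
  Weight = c(3837, 3334, 3554, 3838, 3625, 2208, 1745,
             2846, 3166, 3520, 3380, 3294, 2576)
)

# Hypothetical stand-in for attach(df): a global vector holding all 13 weights
Weight <- df$Weight

# Correct: numerator and denominator both use the per-sex subset x
right <- by(df, df$Sex, function(x) sum(x$Weight < 3000) / length(x$Weight))

# Pitfall: bare Weight resolves to the global 13-element vector,
# so every group is divided by 13 instead of its own size
wrong <- by(df, df$Sex, function(x) sum(x$Weight < 3000) / length(Weight))

print(right)
print(wrong)
```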