I have 5 variables (age, date, ahe, female, bachelor) and would like to split the data by the column 'female', which takes the value 1 for females and 0 for males. I understand the function split() can do this for me with the code:

split(data_wage$ahe, data_wage$female)

but what I don't understand is how to use the two resulting groups once this is done. I want to draw a scatter plot of 'age' on 'ahe' twice, once for the females and once for the males. Any help would be greatly appreciated!

G. Cito
user2756399
  • Any [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be greatly appreciated! –  Feb 18 '15 at 03:10

3 Answers

split can be avoided for problems like these, particularly if you use tools like "lattice" or "ggplot2".

Here's a "lattice"-based approach:

## sample data
set.seed(1)
mydf <- data.frame(
  ahe = sample(100, 1000, TRUE),
  age = sample(18:60, 1000, TRUE),
  female = sample(c(0, 1), 1000, TRUE)
)

## Convert the female column to a factor
## Not necessary, but makes the output nicer
mydf$female <- factor(mydf$female, c(0, 1), c("male", "female"))

## Load the lattice package
library(lattice)

## Side by side
xyplot(ahe ~ age | female, data = mydf)

## all in one, with key
xyplot(ahe ~ age, groups = female, data = mydf, auto.key = TRUE)

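A rough "ggplot2" equivalent of the two displays, as a sketch only: it assumes the ggplot2 package is installed and rebuilds the same `mydf` sample data so the block runs on its own. The object names `p_facets` and `p_groups` are just illustrative.

```r
library(ggplot2)

## same sample data as in the lattice example
set.seed(1)
mydf <- data.frame(
  ahe = sample(100, 1000, TRUE),
  age = sample(18:60, 1000, TRUE),
  female = sample(c(0, 1), 1000, TRUE)
)
mydf$female <- factor(mydf$female, c(0, 1), c("male", "female"))

## Side by side: one panel per level of female
p_facets <- ggplot(mydf, aes(x = age, y = ahe)) +
  geom_point() +
  facet_wrap(~ female)
print(p_facets)

## all in one, colored by group (a legend is added automatically)
p_groups <- ggplot(mydf, aes(x = age, y = ahe, color = female)) +
  geom_point()
print(p_groups)
```

As with lattice, no call to split is needed; the grouping column is passed to the plotting function directly.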

A5C1D2H2I1M1N2O1R2T1

split() returns a list, in this case a list of two data.frames, one for Male and one for Female.

lapply(list,function) will apply a function to each element of a list, so, consider this code:

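splitList <- split(data_wage, data_wage$female)
par(mfrow = c(1, 2))  # two plotting panels, side by side
## invisible() just suppresses the list of NULLs that lapply would print
invisible(lapply(splitList, function(x) plot(age ~ ahe, data = x)))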

This will give you two scatter plots, side by side, one for men and one for women.
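To see what the list looks like and how to reach each group directly, here is a minimal sketch. The `data_wage` built here is made-up stand-in data (only the column names come from the question), and `males` and `females` are illustrative names. The list elements are named after the values of `female`, so they are accessed as `"0"` and `"1"`:

```r
## made-up stand-in for data_wage (assumption: only the column names match)
set.seed(1)
data_wage <- data.frame(
  age = sample(18:60, 200, TRUE),
  ahe = sample(100, 200, TRUE),
  female = sample(c(0, 1), 200, TRUE)
)

splitList <- split(data_wage, data_wage$female)

names(splitList)              # "0" "1"
males   <- splitList[["0"]]   # data.frame with only the rows where female == 0
females <- splitList[["1"]]   # data.frame with only the rows where female == 1

## each piece is an ordinary data.frame, so it can be plotted on its own
plot(age ~ ahe, data = females, main = "female")
```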

Mark
  • How would this work? You've split two columns, and are then plotting a column that doesn't exist in the split data. Also, I don't remember `plot` having a `data` argument. – A5C1D2H2I1M1N2O1R2T1 Feb 18 '15 at 03:56
  • @AnandaMahto thanks for catching the error in my split function - it should just be the whole data.frame for the first argument. `plot` doesn't have an explicit `data` argument but it accepts it. – Mark Feb 18 '15 at 14:06

Use "subset" to make a new data frame with only the records you want. Specify these with a logical condition such as data_wage$female==1:

data_wage_female <- subset(data_wage, data_wage$female == 1)

data_wage_male <- subset(data_wage, data_wage$female == 0)

## now you can plot females and males separately using these subsets:

## plots females with red symbols
plot(data_wage_female$age ~ data_wage_female$ahe, col = "red")

## plots males with blue symbols on the same scatter plot
points(data_wage_male$age ~ data_wage_male$ahe, col = "blue")
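One thing to watch with this approach: `plot()` sets the axis limits from the first group only, so any points from the second group that fall outside that range are silently not drawn. A minimal sketch of one way around it is to compute the limits from the full data first. The data below is made-up stand-in data (only the column names come from the question), and `x_lim` and `y_lim` are illustrative names:

```r
## made-up stand-in for data_wage (assumption: only the column names match)
set.seed(1)
data_wage <- data.frame(
  age = sample(18:60, 200, TRUE),
  ahe = sample(100, 200, TRUE),
  female = sample(c(0, 1), 200, TRUE)
)
data_wage_female <- subset(data_wage, female == 1)
data_wage_male   <- subset(data_wage, female == 0)

## axis limits that cover both groups
x_lim <- range(data_wage$ahe)
y_lim <- range(data_wage$age)

## females in red, with the shared limits
plot(data_wage_female$ahe, data_wage_female$age, col = "red",
     xlim = x_lim, ylim = y_lim, xlab = "ahe", ylab = "age")
## males in blue, added to the same plot
points(data_wage_male$ahe, data_wage_male$age, col = "blue")
legend("topright", legend = c("female", "male"),
       col = c("red", "blue"), pch = 1)
```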
shauryachats
kkraz