I'm very new to R and could not find a solution to my specific problem. I really hope you can help me.

I have the following data frame:

hid <- c('1','2','2','2','2','4','4','4','4','4','4')
syear <- c(2000,2001,2003,2003,2003,2000,2000,2001,2001,2002,2002)
employlvl <- c('Full-time','Part-time','Part-time','Unemployed','Unemployed','Full-time','Full-time','Full-time','Unemployed','Part-time', 'Full-time')
relHead <- c('Head','Head','Head','Partner','Child','Head','Partner','Head','Partner','Head','Partner')

df <- data.frame(hid,syear,employlvl,relHead)



| hid | syear |  Employment | Relation to Head of HH|
|-----|-------|-------------|-----------------------|
|  1  | 2000  |  Full-time  |         Head          |
|  2  | 2001  |  Part-time  |         Head          |
|  2  | 2003  |  Part-time  |         Head          |
|  2  | 2003  |  Unemployed |        Partner        |
|  2  | 2003  |  Unemployed |         Child         |
|  4  | 2000  |  Full-time  |         Head          |
|  4  | 2000  |  Full-time  |        Partner        |
|  4  | 2001  |  Full-time  |         Head          |
|  4  | 2001  |  Unemployed |        Partner        |
|  4  | 2002  |  Part-time  |         Head          |
|  4  | 2002  |  Full-time  |        Partner        |

I would like to create a new column containing the employment level of the Partner, filled in on the Head's row, if the values in hid (household identification number) and syear (survey year) are equal.

I hope to get the following output:

| hid | syear |  Employment | Relation to Head of HH| Employment Partner|
|-----|-------|-------------|-----------------------|-------------------|
|  1  | 2000  |  Full-time  |         Head          |        NA         |
|  2  | 2001  |  Part-time  |         Head          |        NA         |
|  2  | 2003  |  Part-time  |         Head          |    Unemployed     |
|  2  | 2003  |  Unemployed |        Partner        |        NA         |
|  2  | 2003  |  Unemployed |         Child         |        NA         |
|  4  | 2000  |  Full-time  |         Head          |     Full-time     |
|  4  | 2000  |  Full-time  |        Partner        |        NA         |
|  4  | 2001  |  Full-time  |         Head          |    Unemployed     |
|  4  | 2001  |  Unemployed |        Partner        |        NA         |
|  4  | 2002  |  Part-time  |         Head          |     Full-time     |
|  4  | 2002  |  Full-time  |        Partner        |        NA         |

Thank you so much in advance!

Manuel
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Aug 18 '17 at 12:07
  • Also what do you mean *if the values in hid (household identification number) and syear (survey year) are equal.*? – Sotos Aug 18 '17 at 12:09
  • First of all thanks for your quick response. By equal I mean that the new column should only consider those rows with matching hid and syear. I also added the code for the data frame. I hope this helps – Manuel Aug 18 '17 at 12:26

1 Answer

We could achieve this by using dplyr and tidyr. There are two steps.

Step 1: Find the hid and syear combinations that have more than one record. Keep those groups and drop the records with Child. Use spread to put the Head and Partner employment levels into separate columns, which creates a new data frame. Add a Relation column set to "Head" so that the result can be joined back onto the Head rows. dt2 is the output of this step.

Step 2: Use left_join to combine dt2 with the original data frame dt. dt3 is the final output.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  group_by(hid, syear) %>%
  filter(n() > 1) %>%
  filter(`Relation to Head of HH` != "Child") %>%
  spread(`Relation to Head of HH`, Employment) %>%
  mutate(Relation = "Head") %>%
  rename(`Employment Partner` = Partner) %>%
  select(-Head)

dt3 <- dt %>%
  left_join(dt2, by = c("hid", "syear", "Relation to Head of HH" = "Relation"))

Data:

library(dplyr)

# tibble() replaces the deprecated data_frame()
dt <- tibble(hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
             syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
             Employment = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                            "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                            "Full-time"),
             `Relation to Head of HH` = c("Head", "Head", "Head", "Partner", "Child", "Head",
                                          "Partner", "Head", "Partner", "Head", "Partner"))
www
  • I am glad it helps! Please accept my answer if it is useful. – www Aug 18 '17 at 12:41
  • Just one more question. When I try to run the code on my data, I get the following error: "Data source must be a dictionary". Do you know what the problem might be? – Manuel Aug 18 '17 at 15:31
  • I am not familiar with this error. A Google search for it leads me to some pages about issues associated with the `rlang` package. It might help to reinstall the `dplyr` package, or even install the `tidyverse` package, to keep things up to date. If you still cannot fix this, you could create a new post about this error. – www Aug 18 '17 at 15:37
  • Thanks for your advice, I will definitely create a new post. I think the problem might be that I have more columns than in the data set above. If I add another column to the small example data frame, I don't get the output I was hoping for. Do you know how to treat other columns to get the desired output? – Manuel Aug 22 '17 at 07:08