I have following R dplyr dataframe in df_pub (Science/Nature Publication Data)
Please note that there are same PMID (or paper) with contributing authors in each row (Authors info is not shown here).
I need to select and store publications (PMID) which has no email attached to it and store the last observation of it in data-frame.
Actually I want to remove all PMIDs having any email in any observation. I need to collect the Publications (PMIDs) which does not have an attached email, and then find the last author or last observation (usually she/he/xe are the group leader or PI, we'll contact them manually and request them to update their email).
So for the example above, the expected output will not contain PMID 22522932 because it has an email attached. For other PMIDs only the last row of each such PMID will be stored.
I started with this but then lost
df_pub %>%
group_by(pmid) %>%
filter(is.na(email)) # This does not do the expected