4

I can't find an exact answer to this problem, so I hope I'm not duplicating a question.

I have a dataframe as follows

groupid  col1  col2  col3  col4
   1      0     n     NA     2    
   1      NA    NA    2      2

What I'm trying to convey with this is that there are duplicate IDs where the total information is spread across both rows and I want to combine these rows to get all the information into one row. How do I go about this?

I've tried to play around with group_by and paste but that ends up making the data messier (getting 22 instead of 2 in col4 for example) and sum() does not work because some columns are strings and those that are not are categorical variables and summing them would change the information.

Is there something I can do to collapse the rows and leave consistent data unchanged while filling in NAs?

EDIT:

Sorry desired output is as follows:

groupid  col1  col2  col3  col4
   1      0     n     2     2
MokeEire
  • 638
  • 1
  • 8
  • 19
  • Is this helpful? https://stackoverflow.com/a/42567254/496803 Or even this one - https://stackoverflow.com/questions/33601267/merging-similar-rows-in-a-data-frame/33601491 – thelatemail Jul 19 '17 at 21:56
  • Oh that second link might work, I'll give it a try – MokeEire Jul 19 '17 at 22:05
  • https://stackoverflow.com/questions/28036294/collapsing-rows-where-some-are-all-na-others-are-disjoint-with-some-nas – CPak Jul 19 '17 at 22:24
  • 1
    Possible duplicate of [How can I remove all cells with "NA" value by columns](https://stackoverflow.com/questions/45376531/how-can-i-remove-all-cells-with-na-value-by-columns) – Uwe Aug 04 '17 at 22:23

3 Answers3

10

Is this what you want ? zoo+dplyr also check the link here

df %>%
    group_by(groupid) %>%
    mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))%>%filter(row_number()==n())


# A tibble: 1 x 5
# Groups:   groupid [1]
  groupid  col1  col2  col3  col4
    <int> <int> <chr> <int> <int>
1       1     0     n     2     2

EDIT1

without the filter , will give back whole dataframe.

    df %>%
        group_by(groupid) %>%
        mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))

# A tibble: 2 x 5
# Groups:   groupid [1]
  groupid  col1  col2  col3  col4
    <int> <int> <chr> <int> <int>
1       1     0     n    NA     2
2       1     0     n     2     2

filter here, just slice the last one, na.locf will carry on the previous not NA value, which mean the last row in your group is what you want.

Also base on @ thelatemail recommended. you can do the following , give back the same answer.

df %>% group_by(groupid) %>% summarise_all(funs(.[!is.na(.)][1]))

EDIT2

Assuming you have conflict and you want to show them all.

df <- read.table(text="groupid  col1  col2  col3  col4
   1      0     n     NA     2    
                 1      1    NA    2      2",
                 header=TRUE,stringsAsFactors=FALSE)
 df
  groupid col1 col2 col3 col4
1       1    0    n   NA    2
2       1    1(#)<NA>    2    2(#)
df %>%
    group_by(groupid) %>%
    summarise_all(funs(toString(unique(na.omit(.)))))#unique for duplicated like col4
  groupid  col1  col2  col3  col4
    <int> <chr> <chr> <chr> <chr>
1       1  0, 1     n     2   2
BENY
  • 317,841
  • 20
  • 164
  • 234
6

Another option with just dplyr is just to take the first non-NA value when available. You can do

dd <- read.table(text="groupid  col1  col2  col3  col4
1      0     n     NA     2    
1      NA    NA    2      2", header=T)

dd %>% 
  group_by(groupid) %>% 
  summarise_all(~first(na.omit(.)))
MrFlick
  • 195,160
  • 17
  • 277
  • 295
0

Would you be able to draw the desired output in this case? Converting data.frame into anothre type as.vector(), as.matrix() and grouping/factoring might help.

UPDATE: Finding a unique elements for each column and omitting NAs.

df<-data.frame(groupid=c(1,1), col1=c(0,NA), col2=c('n', NA), col3=c(NA,2),  col4=c(2,2)) # your input
out<-data.frame(df[1,]) # where the output is stored, duplicate retaining 1 row
for(i in 1:ncol(df)) out[,i]<-na.omit(unique(df[,i]))
print(out)
Miroslav Radojević
  • 487
  • 1
  • 5
  • 20
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/16766375) – loki Jul 20 '17 at 08:16
  • 1
    I did not have priviledges to post comment yesterday. This was as much as I could do. It was necessary to clarify the question and everything I did was ethically sound. – Miroslav Radojević Jul 20 '17 at 08:39
  • Then you yould do it now and remove this answer. – loki Jul 20 '17 at 08:42
  • The answer is updated now, no need to remove it. As I said, the commenting was not possible, apology for that. – Miroslav Radojević Jul 20 '17 at 09:59