0

I have a dataframe where some rows are follow-ups of other other rows (completing each other) which I would like to combine into one row. Take the following example

+-------+-------------+-----------+-----+---------+
| rowID | name        | address   | age | firstId |
+-------+-------------+-----------+-----+---------+
| 1     | Bert        |           | 60  |         |
+-------+-------------+-----------+-----+---------+
| 2     | Ernie       | Berlin    | 72  |         |
+-------+-------------+-----------+-----+---------+
| 3     | Bert Sesame | Amsterdam |     | 1       |
+-------+-------------+-----------+-----+---------+

The 3rd rowId refers back to the 1st RowId which in turn would make Bert 60 years old. Meanwhile the second row has no firstId (a row to follow-up on) and should be left as is.

Some columns can be filled in on both rows and I would like to take the row which has the firstID field filled in (thus the latest row). For example the 3rd row would have the name "Bert Sesame", i would like to use the name "bert sesame" in that case, the row with a firstId value.

The end dataframe would be

+-------+-------------+-----------+-----+---------+
| rowID | name        | address   | age | firstId |
+-------+-------------+-----------+-----+---------+
| 2     | Ernie       | Berlin    | 72  |         |
+-------+-------------+-----------+-----+---------+
| 3     | Bert Sesame | Amsterdam | 60  | 1       |
+-------+-------------+-----------+-----+---------+

How do I achieve this?

I have looked at questions such as this. Merge two rows in data.frame

But this refers to all rows by grouping them together. I only want to merge / combine rows which specifically refer to other rows.

Rien
  • 398
  • 2
  • 12
  • 37
  • 3
    Please provide a reproducible example, expected output and your attempt that failed – Sotos Nov 07 '18 at 08:41
  • @Sotos I believe I added as you requested – Rien Nov 07 '18 at 08:47
  • 2
    [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Nov 07 '18 at 08:51
  • I think you are looking for "fuzzy matching", what happens if you have "Berty" instead of "Bert", would you merge it with "Bert Sesame"? – zx8754 Nov 07 '18 at 09:02
  • @zx8754 no I only want to merge rows which the column firstId specifically refers to. The records always exist and have a ID to refer to. – Rien Nov 07 '18 at 09:06
  • Oh, I see, missed that column, yes you are right. Then look into merge, and fill NA and remove duplicated rows. – zx8754 Nov 07 '18 at 09:07

1 Answers1

0

Here is an attempt using tidyverse.

library(tidyverse)

df %>% 
 group_by(rowID_new = replace(rowID, firstId != '', firstId[firstId != ''])) %>% 
 mutate(age = replace(age, age == '', age[age != ''])) %>% 
 ungroup() %>% 
 filter(!rowID %in% firstId) %>% 
 select(-rowID_new)

which gives,

# A tibble: 2 x 5
  rowID name  address age   firstId
  <dbl> <chr> <chr>   <chr> <chr>  
1     2 B     E       72    ""     
2     3 C     F       60    1

EDIT: If you have multiple variables to fill, we can replace '' with NA and use fill, i.e.

df %>% 
 mutate_all(function(i) replace(i, i == '', NA)) %>% 
 group_by(rowID_new = replace(rowID, !is.na(firstId), firstId[!is.na(firstId)])) %>% 
 fill(-rowID, .direction = 'up') %>% #you might not need this[with .direction = 'up']
 fill(-rowID) %>% 
 ungroup() %>% 
 filter(!rowID %in% firstId)

which gives,

# A tibble: 2 x 6
  rowID name  address age   firstId rowID_new
  <dbl> <chr> <chr>   <chr> <chr>   <chr>    
1     3 C     F       60    1       1        
2     2 B     E       72    <NA>    2

EXAMPLE USED

structure(list(rowID = c(1, 2, 3), name = c("A", "B", "C"), address = c("D", 
"E", "F"), age = c("60", "72", ""), firstId = c("", "", "1")), class = "data.frame", row.names = c(NA, 
-3L))
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • Yes this works but this will only work for the one column "age", ive got 200 columns in total which need to be merged just like age here – Rien Nov 07 '18 at 10:25
  • 1
    @KrijnvanderBurg edited. Please let me know If that's ok – Sotos Nov 07 '18 at 10:51
  • 1
    Took me quite a lot of testing to verify it worked ok. its working perfectly! thank you very much – Rien Nov 07 '18 at 11:14