Reshaping columns of the same dataframe into one

Question

I have a df looking like this:

Department   ID   Category   Category.ID
    NA       NA      NA          NA
   Sales     101     2           4
   Sales     101     2           4
    NA       NA      NA          NA
   Sales     101     2           4
   Sales     101     2           4
    NA       NA      NA          NA
   Sales     101     2           4
   Sales     101     2           4

df = data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                ID = rep(c(NA, 101, 101), times = 3),
                Category.Department = rep(c(NA, 2, 2), times = 3),
                Category.ID = rep(c(NA, 4, 4), times = 3), stringsAsFactors = FALSE)

And I would like to have an output like this, where in only one column I can have the Department and ID and in another one, the Category. The NA in each column it is important to separate the groups.

New.Col   Category
  NA         NA
 Sales       2
  101        4
  NA         NA
 Sales       2
  101        4
  NA         NA
 Sales       2
  101        4

So far I tried with transpose, sapply and a function but it has not worked as I expected. Any suggestions in base?

is the output just an example? I can not see the logic behind that in the `merge` context. It looks like a formatting from wide to long format. — DSGym, Jun 07 '19 at 07:55
The `NA` is necessary to separate each group. I also just tried with `reshape` but it is not giving the output I need. — Biostatician, Jun 07 '19 at 08:07
I agree with the first comment. The logic behind your expected output is not clear. What are you trying to do here? Please provide some details/explanations. Why the particular order of rows in your expected output? — Maurits Evers, Jun 07 '19 at 08:09

Humpelstielzchen · Accepted Answer · 2019-06-07T09:01:08.143

1

Can't accept an accept without true expected output.


df$group <- rep(1:3, times = 3)

df2 <- reshape(df[df$group != 3,], direction = "long", varying = list(New.col = c(1,2), Category = c(3,4)),
               idvar = "id", v.names = c("New.col", "Category"))

df3 <- df2[order(df2$id),]

df3[!(df3$time == 1 & df3$group == 1), c(3,4)] 

    New.col Category
1.2    <NA>       NA
2.1   Sales        2
2.2     101        4
3.2    <NA>       NA
4.1   Sales        2
4.2     101        4
5.2    <NA>       NA
6.1   Sales        2
6.2     101        4

edited Jun 07 '19 at 09:01

answered Jun 07 '19 at 08:09

Humpelstielzchen

6,126
3
14
34

It works good... now I just need to find a way to delete the remaining data duplicates :-P Thanks!! – Biostatician Jun 07 '19 at 08:21
Which ones do you consider duplicates? – Humpelstielzchen Jun 07 '19 at 08:22
On your example, rows `3.1`, `3.2`, `6.1`, `6.2`, `9.1` & `9.2` since I just need the data once per group and not twice. – Biostatician Jun 07 '19 at 08:24
So, following your solutions, plus [this answer](https://stackoverflow.com/questions/7942519/deleting-every-n-th-row-in-a-dataframe) I got the desired output... Thank you!!! – Biostatician Jun 07 '19 at 08:43

score 0 · Answer 2 · answered Jun 07 '19 at 08:29

Here is a different approach than casting to long format, which relies in coalesce. In addition, I created a group variable and removed the NA rows as they will not serve a purpose in your analysis, i.e.

library(tidyverse)

df %>% 
 group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>% 
 mutate_at(vars(contains('ID')), funs(lag)) %>% 
 mutate_at(vars(contains('Department')), funs(lead)) %>% 
 mutate(new.col = coalesce(Department, as.character(ID)), 
        category = coalesce(Category.Department, Category.ID)) %>% 
 select(grp, new.col, category) %>% 
 distinct()

which gives,

# A tibble: 6 x 3
# Groups:   grp [3]
    grp new.col category
  <int> <chr>      <dbl>
1     1 Sales          2
2     1 101            4
3     2 Sales          2
4     2 101            4
5     3 Sales          2
6     3 101            4

Reshaping columns of the same dataframe into one

2 Answers2