1

I have a data frame which I need to transform. I need to change the rows into unique columns based on the value of a column.

ex:

The Input DataFrame

| column_1 | column_2 |
-----------------------
|   A      |     B    |
|   A      |     C    |
|   B      |     E    |
|   B      |     C    |
|   C      |     F    |
|   C      |     G    |

The Output DataFrame

| column_1 | column_2 | column_3 |
----------------------------------
|   A      |     B    |     C    |
|   B      |     E    |     C    |
|   C      |     F    |     G    |

The final DataFrame should have all the unique values in column_1 and the values from column_2 from input DataFrame will be added as new columns in new DataFrame i.e. Column_2 and Column_3.

I have tried to use reshape and melt packages in R but I am getting erroneous data frame.

PradhanKamal
  • 540
  • 4
  • 18

3 Answers3

2

We can use the dplyr and cSplit function from the splitstackshape. It will also work for cases when there are more than two values per group.

library(dplyr)
library(splitstackshape)
dt2 <- dt %>%
  group_by(column_1) %>%
  summarise(column_2 = toString(column_2)) %>%
  cSplit("column_2") %>%
  setNames(paste0("column_", 1:ncol(.)))

dt2
   column_1 column_2 column_3
1:        A        B        C
2:        B        E        C
3:        C        F        G

Data

dt <- data.frame(column_1 = c("A", "A", "B", "B", "C", "C"),
                 column_2 = c("B", "C", "E", "C", "F", "G"),
                 stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
1

Assuming there are always 2 lines for each value in column_1.

Extract the first line for each column_1 element in a first data.table, then the last line in a second data.table, finally merge them into a new data.table

library(data.table)

df <- data.frame(column_1=c("A","A","B","B","C","C"),column_2=c("B","C","E","C","F","G"))
df <- as.data.table(df)
setkey(df,column_1)
first_part <- df[J(unique(column_1)), mult = "first"]
second_part <- df[J(unique(column_1)), mult = "last"]
setnames(second_part,"column_2","column_3")

new_df <- merge(first_part,second_part, by="column_1")

   column_1 column_2 column_3
1:        A        B        C
2:        B        E        C
3:        C        F        G
fmarm
  • 4,209
  • 1
  • 17
  • 29
  • 2
    With `data.table`, you can do it more easily: `dcast(df, column_1 ~ rowid(column_1), value.var = 'column_2')` – mt1022 Sep 13 '17 at 07:56
0

Here's a short solution with dplyr and tidyr :

library(dplyr)
library(tidyr)
df %>% mutate(col = c("column_2","column_3")[duplicated(column_1)+1]) %>%
  spread(col,column_2)

#   column_1 column_2 column_3
# 1        A        B        C
# 2        B        E        C
# 3        C        F        G

And a general solution :

df <- data.frame(column_1 = c("A", "A", "B", "B", "C", "C","A","B","C"),
                 column_2 = c("B", "C", "E", "C", "F", "G","X","Y","Z"),
                 stringsAsFactors = FALSE)

df %>% group_by(column_1) %>%
  mutate(col=paste0("column_",row_number()+1)) %>%
  spread(col,column_2) %>% ungroup

# # A tibble: 3 x 4
#   column_1 column_2 column_3 column_4
# *    <chr>    <chr>    <chr>    <chr>
# 1        A        B        C        X
# 2        B        E        C        Y
# 3        C        F        G        Z
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167