0

Imagine I have the following dataframe:

df <- data.frame(c("A","A","B","B","B","C"), c("a1","a2","b1","b2","b3","c1"))

I would like to re-arrange so I can get a table like:

data.frame(c("**A**","a1","a2"), c("**B**","b1","b2","b3"), c("**C**","c1"))

How could I do it?

Nad Pat
  • 3,129
  • 3
  • 10
  • 20
  • 1
    Is your data frame `df` correct? try copying and running in a fresh instance of R. – Peter Mar 24 '22 at 16:40
  • Maybe check this post: https://stackoverflow.com/questions/20396582/order-a-mixed-vector-numbers-with-letters – Quinten Mar 24 '22 at 16:42
  • 2
    Sorry, I still think there is an issue with the initial source `df`. Try copying this in a fresh session of R `data.frame(c("A","A","B","B","B","C"), c("a1","a2","b1","b2","b3","c1"))`. The dataframes seem to be missing variable names. Unless that is what you intend. – Peter Mar 24 '22 at 16:46
  • the df names are not important here... the point here is start from a "tabular dataframe" (2 columns) and finish in a dataframe with the different values (b1,b2,b3) per unique value in the first column of tabular dataframe (B) – Alexandro H Mar 24 '22 at 17:41

2 Answers2

1

The short answer is you can not do this because each column in your desired output has a different length.

What you can do instead is produce a list with the desired structure, because each item in a list can have a different length:

library(dplyr)

df <- data.frame(c1 = c("A","A","B","B","B","C"), c2 = c("a1","a2","b1","b2","b3","c1"))

# get the distinct values from column 1
dist_values = df %>%
  select(c1) %>%
  distinct() %>%
  unlist(use.names = FALSE)

# function to fetch the matching values from column 2
get_components = function(dist_value){
  out = df %>%
    filter(c1 == dist_value) %>%
    select(c2) %>%
    unlist(use.names = FALSE)
  
  c(dist_value, out)
}

# apply fetch function to every distinct value
named_list = lapply(dist_values, get_components)

This produces the output:

named_list
[[1]]
[1] "A"  "a1" "a2"

[[2]]
[1] "B"  "b1" "b2" "b3"

[[3]]
[1] "C"  "c1"

However attempting to convert this to a data frame as.data.frame(named_list) gives the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 3, 4, 2

Hopefully the longer answer gives you a useful step to an alternative.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • Thank you! I think this makes sense, maybe once I have the named_list I could look for the "longer" sublist (in this case [[2]], with 4 elements) and copy row by row to a pre-made dataframe with that length (4). – Alexandro H Mar 24 '22 at 21:30
  • Yes, that would be a straightforward way to use the above to accomplish what you are looking for. – Simon.S.A. Mar 25 '22 at 00:38
0

I work with combinign unique values and if i read your comment right :

the point here is start from a "tabular dataframe" (2 columns) and finish in a dataframe with the different values (b1,b2,b3) per unique value in the first column of tabular dataframe (B) 

==== Alternative 1: You can merge in the same df of 2 columns with 3rd contianing unique values by grouping and summarizing.

s <- data.frame(V1=c("A","A","B","B","B","C"), V2=c("a1","a2","b1","b2","b3","c1"))

s%>%
  group_by(V1)%>%
  summarise(
    Values = paste0('**',V1,'**'," ",paste(unique(V2),collapse = ", "))
            )

yields :

V1    Values          
  <chr> <chr>           
1 A     **A** a1, a2    
2 A     **A** a1, a2    
3 B     **B** b1, b2, b3
4 B     **B** b1, b2, b3
5 B     **B** b1, b2, b3
6 C     **C** c1

You can remove duplicates with %>%unique() to get

V1    Values          
  <chr> <chr>           
1 A     **A** a1, a2    
2 B     **B** b1, b2, b3
3 C     **C** c1

Let me know if this is what you are looking for though Its exactly same input you posted and output is same as well (but in a single column)

anuanand
  • 400
  • 1
  • 9