How to take unique IDs and convert each row of another column into another column in R and Python

Question

My data has the form:

groups <- c("4","4.2","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1")
x <- data.frame(ID = c(rep("samp_1", 4), rep("samp_2", 5)), Group = groups)

How do I get this?:

ID       col_1   col_2   col_3   col_4     col_5
samp_1   4       4.2     4.2.1   4.2.1.1   NA
samp_2   1       1.2     1.2.1   1.2.1.2   1.2.1.2.1

Each column will be determined by the length of the string, so all data in column 4 will have length 4 (or length 7 including the dots).

I am looking for the most general solution to this (e.g. using loops; using as few packages as possible) because I need to implement this in both R and Python.

Re "both R and Python" - one question at a time please. Here's the canonical Q&A for `R`: [Transpose / reshape dataframe without “timevar” from long to wide format](https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format). — Henrik, May 30 '20 at 20:42
hmmmm you are providing the data.frame in R and expecting a solution in python? this is super weird — StupidWolf, May 30 '20 at 22:10
Yes, sorry, about the R/python thing: I merely provided the R code for the dataframe in order to make it reproducible. I simply wanted to demonstrate the structure of the data, but I should emphasise that since the structure of the data is so simple, I am seeking a general solution that I can implement in both because that is what I happen to be using. Sorry if this is taboo in Stackoverflow, I'm still trying to get the hang of the culture here. — Luther_Blissett, May 31 '20 at 11:25

akrun · Answer 1 · 2020-05-30T20:41:42.723

In R, we can create a column for new names with rowid (from data.table) and pivot to 'wide' format

library(dplyr)
library(data.table)
library(stringr)
x %>%
   mutate(name = str_c('col_', rowid(ID))) %>% 
   pivot_wider(names_from = name, values_from = Group)
# A tibble: 2 x 6
#  ID     col_1 col_2 col_3 col_4   col_5    
#  <chr>  <chr> <chr> <chr> <chr>   <chr>    
#1 samp_1 4     4.2   4.2.1 4.2.1.1 <NA>     
#2 samp_2 1     1.2   1.2.1 1.2.1.2 1.2.1.2.1

Or using data.table

library(data.table)
dcast(setDT(x), ID ~ paste0('col_', rowid(ID)), value.var = 'Group')
#       ID col_1 col_2 col_3   col_4     col_5
#1: samp_1     4   4.2 4.2.1 4.2.1.1      <NA>
#2: samp_2     1   1.2 1.2.1 1.2.1.2 1.2.1.2.1

Or using base R with reshape

reshape(transform(x, name = paste0('col_', ave(seq_along(ID), ID, 
    FUN = seq_along))), idvar = 'ID', direction = 'wide', timevar = 'name')

Shan R · Answer 2 · 2020-05-30T21:04:57.823

Excellent options by akrun. If the data is a bit out of order, you may want to try this:

x %>%
  mutate(temp = str_c('col_', str_count(Group, "\\."))) %>%
  pivot_wider(names_from = temp, values_from = Group) %>%
  select(ID, order(colnames(.)))

Data:

groups <- c("41.2","4","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1")
x <- data.frame(ID = c(rep("samp_1", 4), rep("samp_2", 5)), Group = groups)

Result:

# A tibble: 2 x 6
  ID     col_0 col_1 col_2 col_3   col_4    
  <chr>  <chr> <chr> <chr> <chr>   <chr>    
1 samp_1 4     41.2  4.2.1 4.2.1.1 NA       
2 samp_2 1     1.2   1.2.1 1.2.1.2 1.2.1.2.1

score 1 · Answer 3 · answered Jun 01 '20 at 19:51

You can try this in python:

import pandas as pd
import numpy as np
df= pd.DataFrame({'ID':np.repeat(["samp_1","samp_2"],[4,5]),
                 'groups':["4","4.2","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1"],})
df['entry']=df.groupby(['ID']).cumcount()+1

We provide like a number per group, and add it as entry column. Below we do pivoting like in R, with that column to provide the column names, and last we reset the index:

df.pivot(values='groups',columns='entry',index='ID').reset_index()

entry   ID  1   2   3   4   5
0   samp_1  4   4.2 4.2.1   4.2.1.1 NaN
1   samp_2  1   1.2 1.2.1   1.2.1.2 1.2.1.2.1

How to take unique IDs and convert each row of another column into another column in R and Python

3 Answers3