0

My data has the form:

groups <- c("4","4.2","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1")
x <- data.frame(ID = c(rep("samp_1", 4), rep("samp_2", 5)), Group = groups)

How do I get this?:

ID       col_1   col_2   col_3   col_4     col_5
samp_1   4       4.2     4.2.1   4.2.1.1   NA
samp_2   1       1.2     1.2.1   1.2.1.2   1.2.1.2.1

Each column will be determined by the length of the string, so all data in column 4 will have length 4 (or length 7 including the dots).

I am looking for the most general solution to this (e.g. using loops; using as few packages as possible) because I need to implement this in both R and Python.

Luther_Blissett
  • 327
  • 1
  • 6
  • 16
  • 1
    Re "both R and Python" - one question at a time please. Here's the canonical Q&A for `R`: [Transpose / reshape dataframe without “timevar” from long to wide format](https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format). – Henrik May 30 '20 at 20:42
  • hmmmm you are providing the data.frame in R and expecting a solution in python? this is super weird – StupidWolf May 30 '20 at 22:10
  • Yes, sorry, about the R/python thing: I merely provided the R code for the dataframe in order to make it reproducible. I simply wanted to demonstrate the structure of the data, but I should emphasise that since the structure of the data is so simple, I am seeking a general solution that I can implement in both because that is what I happen to be using. Sorry if this is taboo in Stackoverflow, I'm still trying to get the hang of the culture here. – Luther_Blissett May 31 '20 at 11:25

3 Answers3

2

In R, we can create a column for new names with rowid (from data.table) and pivot to 'wide' format

library(dplyr)
library(data.table)
library(stringr)
x %>%
   mutate(name = str_c('col_', rowid(ID))) %>% 
   pivot_wider(names_from = name, values_from = Group)
# A tibble: 2 x 6
#  ID     col_1 col_2 col_3 col_4   col_5    
#  <chr>  <chr> <chr> <chr> <chr>   <chr>    
#1 samp_1 4     4.2   4.2.1 4.2.1.1 <NA>     
#2 samp_2 1     1.2   1.2.1 1.2.1.2 1.2.1.2.1

Or using data.table

library(data.table)
dcast(setDT(x), ID ~ paste0('col_', rowid(ID)), value.var = 'Group')
#       ID col_1 col_2 col_3   col_4     col_5
#1: samp_1     4   4.2 4.2.1 4.2.1.1      <NA>
#2: samp_2     1   1.2 1.2.1 1.2.1.2 1.2.1.2.1

Or using base R with reshape

reshape(transform(x, name = paste0('col_', ave(seq_along(ID), ID, 
    FUN = seq_along))), idvar = 'ID', direction = 'wide', timevar = 'name')
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Excellent options by akrun. If the data is a bit out of order, you may want to try this:

x %>%
  mutate(temp = str_c('col_', str_count(Group, "\\."))) %>%
  pivot_wider(names_from = temp, values_from = Group) %>%
  select(ID, order(colnames(.)))

Data:

groups <- c("41.2","4","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1")
x <- data.frame(ID = c(rep("samp_1", 4), rep("samp_2", 5)), Group = groups)

Result:

# A tibble: 2 x 6
  ID     col_0 col_1 col_2 col_3   col_4    
  <chr>  <chr> <chr> <chr> <chr>   <chr>    
1 samp_1 4     41.2  4.2.1 4.2.1.1 NA       
2 samp_2 1     1.2   1.2.1 1.2.1.2 1.2.1.2.1
Shan R
  • 521
  • 4
  • 8
1

You can try this in python:

import pandas as pd
import numpy as np
df= pd.DataFrame({'ID':np.repeat(["samp_1","samp_2"],[4,5]),
                 'groups':["4","4.2","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1"],})
df['entry']=df.groupby(['ID']).cumcount()+1

We provide like a number per group, and add it as entry column. Below we do pivoting like in R, with that column to provide the column names, and last we reset the index:

df.pivot(values='groups',columns='entry',index='ID').reset_index()

entry   ID  1   2   3   4   5
0   samp_1  4   4.2 4.2.1   4.2.1.1 NaN
1   samp_2  1   1.2 1.2.1   1.2.1.2 1.2.1.2.1
StupidWolf
  • 45,075
  • 17
  • 40
  • 72