2

I have a table that looks like this: enter image description here

code to reproduce this dataframe in  R:
ID = c("0", "51", "7", "62", "1","10","5", "79", "62", "10","1","7")
mRNA = c("0", "0", "30", "1", "0", "14", "0", "1", "1", "16", "0", "0")
Centroid = c("d0","d0", "d0", "d0", "d1", "d1","d1", "d1", "d1", "d10", "d10", "d10")
df <- data.frame(ID,mRNA,Centroid)

I would like to reformat this data to look like this:

enter image description here

So far I have tried this:

r <- reshape(df, direction = "wide", idvar="Centroid", timevar="ID")

The output looks close to what I want but not there yet (I don't want it to rename the columns using the IDs).

enter image description here

I prefer solutions in R, but if you have suggestions in python I can also try them out. Any help would be very much appreciated!

2 Answers2

4

We can create a unique ID column for each Centroid and then reshape the data in wide format.

library(dplyr)

df %>%
  group_by(Centroid) %>%
  mutate(ID = paste0("mRNA_", row_number())) %>%
  tidyr::pivot_wider(names_from = ID, values_from = mRNA)

# Centroid mRNA_1 mRNA_2 mRNA_3 mRNA_4 mRNA_5
#  <fct>    <fct>  <fct>  <fct>  <fct>  <fct> 
#1 d0       0      0      30     1      NA    
#2 d1       0      14     0      1      1     
#3 d10      16     0      0      NA     NA    
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • @Amaranta_Remedios I don't know what you mean by `doesn't keep the order.` but as far as I can see, it gives the same expected output as shown in your post. – Ronak Shah Jan 25 '20 at 08:35
  • never mind, I got a with mixed up and the problem was in my previous code. Your code works perfectly, sorry about that. – Amaranta_Remedios Jan 25 '20 at 16:52
2

In case you ever need this in Python, this is a solution using pandas which has an analogous data structure to R, the DataFrame.

# setup
import pandas as pd

ID = [0, 51, 7, 62, 1, 10, 5, 79, 62, 10, 1, 7]
mRNA = [0, 0, 30, 1, 0, 14, 0, 1, 1, 16, 0, 0]
Centroid = ['d0', 'd0', 'd0', 'd0', 'd1', 'd1', 'd1', 'd1', 'd1', 'd10', 'd10', 'd10']
df = pd.DataFrame([ID,mRNA,Centroid])
df = df.transpose()
df.rename(columns={0:'ID',1:'mRNA',2:'Centroid'},inplace=True)

# transformation
df['mRNA_idx'] = 'mRNA_' + (df.groupby(['Centroid']).cumcount() + 1).astype(str)
df.pivot(index='Centroid',columns='mRNA_idx',values='mRNA')

# result
>>> df
mRNA_idx mRNA_1 mRNA_2 mRNA_3 mRNA_4 mRNA_5
Centroid                                   
d0            0      0     30      1    NaN
d1            0     14      0      1      1
d10          16      0      0    NaN    NaN
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223