0

A data frame(df) has multiple rows and one columns. I should convert one columns to multiple columns and remove 1: to N: .

head(df[1:3,])

[1] Q1        1     1: 0.009110   2:-0.002122   3:-0.005770   4:-0.016751   5: 0.003284   6:-0.082381              
[2] Q2        1     1: 0.018065   2:-0.033954   3:-0.033954   4: 0.005826   5:-0.033918   6:-0.034069   7:-0.030281   
[3] Q3        1     1: 0.058728   2: 0.003693   3:-0.008006   4: 0.035635   5: 0.039816   6: 0.040578              
20 Levels: Q1        1     1: 0.009110   2:-0.002122   3:-0.005770   4:-0.016751   5: 0.003284   6:-0.082381 ...

df<-read.csv("effect.txt",header = F,skip = 1)
df2 <- lapply(df, gsub, pattern="1:", replacement= "")
adele
  • 13
  • 4
  • 3
    It's not at all clear what the source data looks like. Please produce a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by using `dput` of your source data and including what your expected output looks like. – r2evans Jul 01 '18 at 12:34

1 Answers1

0

Here is a long way but it works.

#Read the data set
df <- read.table(text = "
                 'Q1        1     1: 0.009110   2:-0.002122   3:-0.005770   4:-0.016751   5: 0.003284   6:-0.082381'              
                 'Q2        1     1: 0.018065   2:-0.033954   3:-0.033954   4: 0.005826   5:-0.033918   6:-0.034069   7:-0.030281'   
                 'Q3        1     1: 0.058728   2: 0.003693   3:-0.008006   4: 0.035635   5: 0.039816   6: 0.040578 '             
                 ",header=F)

library(tidyr)
df[,1] <- gsub("[1-9]:",";",df[,1])  #replace any one digit number i.e. [1-9]  followed by ':' with ';'
df[,1] <- gsub("Q[1-9]        1     ;","",df[,1])   #replace any Q with one digit number then space one digit number then space then ';' e.g. "Q1        1     ;", "Q2        1     ;", "Q3        1     ;", ... etc with ""

max.length <- max(sapply(strsplit(df[,1],";"),length))   #find the length of each row to predifenied the number of columns required by `separate` 
df_clean <- separate(df,1, paste0("a",1:max.length),sep = ";",fill = "right")

df_clean %>% mutate_if(is.character,as.numeric) #change all character columns to numeric

        a1        a2        a3        a4        a5        a6        a7
1 0.009110 -0.002122 -0.005770 -0.016751  0.003284 -0.082381        NA
2 0.018065 -0.033954 -0.033954  0.005826 -0.033918 -0.034069 -0.030281
3 0.058728  0.003693 -0.008006  0.035635  0.039816  0.040578        NA

Update

gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;","","Q300      29       ;")
[1] ""

  • Q\\d{1,3} Q followed by a number contains 1-3 digits i.e. Q1, Q12, or Q123
  • \\s+ will match 1 or more whitespaces

    So now you can update

    df[,1] <- gsub("Q[1-9]        1     ;","",df[,1])
    

    by

    df[,1] <- gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;","",df[,1])
    
  • A. Suliman
    • 12,923
    • 5
    • 24
    • 37
    • Thank you for your fast reply. I run your code and got this error:Error in `$<-.data.frame`(`*tmp*`, "Q", value = character(0)) : replacement has 0 rows, data has 20.Can you explain more. – adele Jul 01 '18 at 17:52
    • I think as you do not have a column called Q. Now I update my answer to work with the first column regardless of its name. – A. Suliman Jul 01 '18 at 20:02
    • It was great. Thank you very much. Only when q is q10 to q300, it does not work and for example: q10 1 is placed in a1. And another matter, the digit after q not always one, sometimes is 2 to 29. I spent more than one months on this file conversion. Please consider this too. – adele Jul 02 '18 at 10:24
    • I am eternally grateful for everything you’ve taught me. I checked and changed it. – adele Jul 02 '18 at 11:41