
I have 22 Excel files saved as CSV (850 rows × 2 columns each). I loaded them into R with this code:

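setwd("D:/baseline")
# pattern is a regular expression; ignore.case = TRUE also matches ".CSV"
file_2day <- list.files(pattern = "\\.csv$", ignore.case = TRUE)
# read every file and stack the results into one data frame
d_2day <- do.call("rbind", sapply(file_2day, read.csv, simplify = FALSE))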
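One way to keep track of which file (W1, W10, ...) each row came from, without relying on row names, is to add an explicit id column while reading. This is a minimal sketch, assuming file names follow the T1_W1_base.CSV pattern described in the question. The demo folder, the two small files it writes, and the `file_id` column are hypothetical and exist only so the example runs on its own:

```r
# Sketch: read every CSV and tag each row with an id taken from its file name.
# Assumption: names follow the T1_W1_base.CSV pattern from the question;
# the demo folder and the "file_id" column are hypothetical.

# Write two small demo files so the example is self-contained
demo_dir <- file.path(tempdir(), "baseline_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(feature.name = c("3ddim", "2ddim", "mean"),
                     value = c(100, 80, 5)),
          file.path(demo_dir, "T1_W1_base.CSV"), row.names = FALSE)
write.csv(data.frame(feature.name = c("3ddim", "2ddim", "mean"),
                     value = c(90, 70, 3)),
          file.path(demo_dir, "T1_W10_base.CSV"), row.names = FALSE)

# "\\.csv$" is a regular expression; ignore.case also matches ".CSV"
files <- list.files(demo_dir, pattern = "\\.csv$", ignore.case = TRUE,
                    full.names = TRUE)

long <- do.call(rbind, lapply(files, function(f) {
  d <- read.csv(f)
  # "T1_W10_base.CSV" -> "W10" -> "w10"
  d$file_id <- tolower(sub("^[^_]+_([^_]+)_.*$", "\\1", basename(f)))
  d
}))
long
```

With a column like `file_id` in place, the reshaping step can use it directly instead of parsing row names.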

The files follow a naming pattern such as T1_W1_base.CSV, T1_W10_base.CSV, etc. Below is a sample of my data:

      feature.name value
w1.1         3ddim   100
w1.2         2ddim    80
w1.3          mean     5
w10.1        3ddim    90
w10.2        2ddim    70
w10.3         mean     3

I'd like to arrange my data like this:

Feature.name     3ddim   2ddim    mean 
w1               100       80       5
w10              90        70       3

In fact there are 850 features. Does anyone have suggestions for how to get this format?

HajarM
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. (Pictures of data aren't particularly helpful). And this type of operation is called "reshaping from long to wide". There are tons of questions like this answered already if you do a bit of searching. – MrFlick Feb 23 '18 at 22:47
  • By posting example data like this, you are expecting us to type approximately 600 keystrokes to help you. Please provide example data with `dput()` and consider the comment from @MrFlick above. – jay.sf Feb 23 '18 at 23:26
  • What is the name of your first column? Or are those the row names? – Onyambu Feb 24 '18 at 00:12
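Following the `dput()` suggestion in the comments, here is a minimal sketch of how the sample above could be shared (the object name `sample_df` is hypothetical). `dput()` prints R code that recreates the object, so readers can paste it instead of retyping the table:

```r
# Hypothetical object holding the sample shown in the question
sample_df <- data.frame(
  feature.name = c("3ddim", "2ddim", "mean", "3ddim", "2ddim", "mean"),
  value = c(100L, 80L, 5L, 90L, 70L, 3L),
  row.names = c("w1.1", "w1.2", "w1.3", "w10.1", "w10.2", "w10.3")
)

# Print code that rebuilds sample_df; this text is what would go in the post
dput(sample_df)

# The printed code evaluates back to an identical object
code <- paste(capture.output(dput(sample_df)), collapse = "\n")
rebuilt <- eval(parse(text = code))
```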

1 Answer


In your sample data I can see duplicate row names, which R doesn't allow in a data frame. However, when I went back through your post's history, I saw that your real data has distinct row names, so this won't be an issue.
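For reference, here is a minimal sketch of that restriction (the two-row data frame `df_dup` is made up for illustration). Assigning the same row name twice to a data frame stops with an error instead of silently keeping both:

```r
df_dup <- data.frame(feature_name = c("3ddim", "2ddim"), value = c(100, 80))

res <- tryCatch({
  # R also warns about the non-unique value; silence it so only the error remains
  suppressWarnings(rownames(df_dup) <- c("w1", "w1"))
  "assigned"
}, error = function(e) conditionMessage(e))

res  # the error message (about duplicate 'row.names'), not "assigned"
```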

Assumption: given this, I have modified the sample data below accordingly (based on the earlier sample data you posted as an image).

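library(dplyr)
library(tidyr)
library(tibble)

df %>%
  # turn the row names (w1.1, w1.2, ...) into a regular column
  rownames_to_column("rowname_col") %>%
  # strip the ".<n>" suffix so rows from the same file share an id (w1.1 -> w1)
  mutate(rowname_col = gsub("(\\S+)[.].*", "\\1", rowname_col)) %>%
  # one column per feature name, filled with the matching value
  spread(feature_name, value) %>%
  # give the id column the requested name
  rename(feature_name = rowname_col)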

Output is:

  feature_name 2ddim 3ddim mean
1           w1    80   100    5
2          w10    70    90    3
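`spread()` still works but is superseded in current tidyr. Here is a sketch of the same steps using its replacement, `pivot_wider()`, assuming tidyr 1.0.0 or later and the sample data from the question (rebuilt here so the block runs on its own). Note that `pivot_wider()` orders the new columns by first appearance rather than alphabetically, so `3ddim` comes before `2ddim`:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question, with the column named feature_name
df <- data.frame(feature_name = c("3ddim", "2ddim", "mean", "3ddim", "2ddim", "mean"),
                 value = c(100L, 80L, 5L, 90L, 70L, 3L),
                 row.names = c("w1.1", "w1.2", "w1.3", "w10.1", "w10.2", "w10.3"))

wide <- df %>%
  rownames_to_column("rowname_col") %>%
  # same regex as above: "w10.2" -> "w10"
  mutate(rowname_col = gsub("(\\S+)[.].*", "\\1", rowname_col)) %>%
  # one column per feature, filled from `value`
  pivot_wider(names_from = feature_name, values_from = value) %>%
  rename(feature_name = rowname_col)
wide
```

The result is a tibble with one row per id (w1, w10), and with 850 features it would simply have 850 value columns.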

Sample data:

df <- structure(list(feature_name = c("3ddim", "2ddim", "mean", "3ddim", 
"2ddim", "mean"), value = c(100L, 80L, 5L, 90L, 70L, 3L)), .Names = c("feature_name", 
"value"), class = "data.frame", row.names = c("w1.1", "w1.2", 
"w1.3", "w10.10", "w10.20", "w10.30"))

       feature_name value
w1.1          3ddim   100
w1.2          2ddim    80
w1.3           mean     5
w10.10        3ddim    90
w10.20        2ddim    70
w10.30         mean     3
Prem
  • Thanks for answering. – HajarM Feb 26 '18 at 14:15
  • Glad that it helped! – Prem Feb 26 '18 at 17:39
  • Hi, I would really appreciate it if you could explain this part of your code: `mutate(rowname_col = gsub("(\\S+)[.].*", "\\1", rowname_col)) %>%`. Thanks – HajarM Feb 27 '18 at 09:58
  • `mutate` creates or modifies columns; here it overwrites the `rowname_col` column that `rownames_to_column` just added to `df`. `gsub` performs pattern matching and replacement (for more info see `?gsub`): its 1st argument is the regex pattern, the 2nd is the replacement, and the 3rd is the character vector to operate on (to read more about regex, see [this tutorial](https://www.regular-expressions.info/tutorial.html)). In the pattern, `(\\S+)` captures one or more non-whitespace characters, `[.]` matches a literal dot and `.*` matches the rest, so `"w1.1"` becomes `"w1"`. The 2nd argument uses [back-referencing](https://www.regular-expressions.info/backref.html): `"\\1"` replaces the whole match with only the captured group. – Prem Feb 27 '18 at 10:40
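To see what that pattern does, you can run it on a few strings directly (the strings below are illustrative). Because `\\S+` is greedy, the captured group takes everything up to the last `.`, and `"\\1"` puts only that part back:

```r
ids <- c("w1.1", "w10.20", "a.b.c")
# (\\S+) : one or more non-space characters, captured as group 1 (greedy)
# [.]    : a literal dot
# .*     : whatever follows the dot
# "\\1"  : replace the whole match with just group 1
out <- gsub("(\\S+)[.].*", "\\1", ids)
out  # "w1"  "w10" "a.b"
```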