4

Originally I have this TSV file (sample):

name   type   qty   
cxfm   1C     0
d2     H50    2
g3g    1G     2
hb     E37    1
nlx    E45    4

I am using read.csv to read the data from a .tsv file, but I always get this output:

name   type   qty   
1      cxfm   1C     0
2      d2     H50    2
3      g3g    1G     2
4      hb     E37    1
5      nlx    E45    4

instead of getting this one:

       name   type   qty   
1      cxfm   1C     0
2      d2     H50    2
3      g3g    1G     2
4      hb     E37    1
5      nlx    E45    4

Any ideas why? This is what I am using to read the files:

file_list <- list.files()

for (file in file_list) {

  if (!exists("dataset")) {
    dataset <- read.table(file, header = TRUE, sep = "\t", row.names = NULL, blank.lines.skip = TRUE, fill = TRUE)
    names(dataset) <- c("rowID", names(dataset)[1:ncol(dataset)-1])
  }

  if (exists("dataset")) {
    temp_dataset <- read.table(file, header = TRUE, sep = "\t", row.names = NULL, blank.lines.skip = TRUE, fill = TRUE)
    names(temp_dataset) <- c("rowID", names(temp_dataset)[1:ncol(temp_dataset)-1])
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}

dataset <- unique(dataset)

write.table(dataset, file = "dataset.tsv", sep = "\t")
Chayma Atallah

2 Answers

3

There appears to be a missing column header in your source TSV file. One option would be to leave your read.csv() call as it is and simply adjust the names of the resulting data frame:

df <- read.csv(file,
               header = TRUE,
               sep = "\t",
               row.names = NULL,
               blank.lines.skip = TRUE,
               fill = TRUE,
               comment.char = "",
               quote = "",
               stringsAsFactors = FALSE)

names(df) <- c("rowID", names(df)[1:ncol(df)-1])
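The index expression `1:ncol(df)-1` is easy to misread. Below is a minimal sketch, using a hypothetical three-element vector of names rather than the asker's data, of how R parses it: the colon operator binds more tightly than subtraction, so the expression means `(1:ncol(df)) - 1`, and the resulting index 0 is silently dropped. Dropping the last element with a negative index should give the same names and may be easier to read.

```r
# Hypothetical column names standing in for names(df)
col_names <- c("name", "type", "qty")
n <- length(col_names)

idx <- 1:n - 1                  # parsed as (1:n) - 1, i.e. 0, 1, 2
kept <- col_names[idx]          # index 0 selects nothing, leaving the first n - 1 names
kept_explicit <- col_names[-n]  # same selection, stated directly

# Shift every name one position to the right and label the first column
shifted <- c("rowID", kept_explicit)
print(shifted)
```

The shifted vector has the same length as the original, which is what the `names<-` assignment requires.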
Tim Biegeleisen
  • I did that and now I have this error `Error in names(dataset) <- c("rowID", names(dataset)) : 'names' attribute [13] must be the same length as the vector [12]` – Chayma Atallah May 19 '16 at 07:41
  • Sorry, use this: `names(df) <- c("rowID", names(df)[1:ncol(df)-1])` – Tim Biegeleisen May 19 '16 at 07:42
  • thank you so much but now I have this `Error in match.names(clabs, names(xi)) : names do not match previous names` – Chayma Atallah May 19 '16 at 07:46
  • Then you must have some other problem in your R script. If you post the relevant code I might be able to help. – Tim Biegeleisen May 19 '16 at 07:47
  • Put this code in your question, not as a comment. What do you plan to do with each `dataset` data frame after reading it from file? This code snippet shows that you are overwriting each one, which doesn't make too much sense to me. – Tim Biegeleisen May 19 '16 at 07:52
  • I am sorry, I am still new to Stack Overflow. For now I am just merging my TSV files; afterwards I am going to merge the one dataset file of each folder together based on 4 common columns, so that I have one common data source to work with in my Java program (and I will have to add a few calculated columns to the general data source later on too) – Chayma Atallah May 19 '16 at 08:01
  • Your code actually looks OK. Are you _certain_ that the column headers are the same in _every_ file? – Tim Biegeleisen May 19 '16 at 08:09
  • Yes they are. The same column headers are used for all files; the data is just divided into a new file every time we reach a certain number of lines. – Chayma Atallah May 19 '16 at 08:18
  • I disagree with you. See [here](http://stackoverflow.com/questions/12019461/rbind-error-names-do-not-match-previous-names) for more information. – Tim Biegeleisen May 19 '16 at 08:25
  • Here is a quick fix you can try. Add `names(temp_dataset) <- names(dataset)` to the second `if` condition. – Tim Biegeleisen May 19 '16 at 08:30
  • No, it still doesn't take the last column into consideration :/ I don't know why :/ – Chayma Atallah May 19 '16 at 10:12
  • Do your files even have the same _number_ of columns? Did you check that? – Tim Biegeleisen May 19 '16 at 10:12
  • Yep I did. It is impossible for them not to have the same number of columns, because it is the same template every time: I just duplicate it and fill it with data again – Chayma Atallah May 19 '16 at 10:14
  • I sent you an email with samples – Chayma Atallah May 19 '16 at 11:37
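The `names do not match previous names` error discussed in these comments comes from `rbind()`, which matches data frame columns by name. Here is a small sketch with two made-up data frames (the column names `rowID`, `id`, and `name` are illustrative assumptions, not taken from the asker's files) showing the error and the name-alignment workaround suggested in the comments.

```r
# Two hypothetical data frames with the same number of columns but different names
first  <- data.frame(rowID = 1:2, name = c("cxfm", "d2"), stringsAsFactors = FALSE)
second <- data.frame(id = 3:4, name = c("g3g", "hb"), stringsAsFactors = FALSE)

# rbind() refuses to combine them; capture the error message instead of stopping
msg <- tryCatch(rbind(first, second), error = function(e) conditionMessage(e))
print(msg)

# Workaround: copy the names across by position.
# This is only safe when the columns really are in the same order.
names(second) <- names(first)
combined <- rbind(first, second)
print(combined)
```

Copying names by position hides the mismatch without explaining it, so it is probably worth comparing `names()` of each file to find where they differ.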
2

This is what I had to do to fix it: set row.names to FALSE in the write.table call:

write.table(dataset, file = "data.tsv", sep = "\t", row.names = FALSE)
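A likely reason this helps: by default `write.table()` writes the row names as an extra first field on every data line but adds no matching entry to the header line, so the header ends up one field shorter than the data. The sketch below, which uses a small made-up data frame and temporary files (both are assumptions for illustration), counts the tab-separated fields per line with and without `row.names = FALSE`.

```r
# Small hypothetical data frame modelled on the sample in the question
df <- data.frame(name = c("cxfm", "d2"),
                 type = c("1C", "H50"),
                 qty  = c(0, 2),
                 stringsAsFactors = FALSE)

with_rn    <- tempfile(fileext = ".tsv")
without_rn <- tempfile(fileext = ".tsv")

write.table(df, file = with_rn, sep = "\t")                        # default: row names written
write.table(df, file = without_rn, sep = "\t", row.names = FALSE)  # row names suppressed

# Count tab-separated fields on each line of a file
count_fields <- function(path) {
  sapply(strsplit(readLines(path), "\t", fixed = TRUE), length)
}

fields_with    <- count_fields(with_rn)     # header is one field shorter than the data lines
fields_without <- count_fields(without_rn)  # header and data lines have equal field counts

print(fields_with)
print(fields_without)
```

With the row names suppressed, the header and data lines line up, so reading the file back no longer needs the manual renaming step.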
Chayma Atallah