Fresh lettuce here so don't laugh at my questions:

Say I have a folder containing 40 individual .txt files and I would like to convert them to .csv format. The end product should be a new folder with 40 individual .csv files.

I have seen similar questions posted, along with their code. That code did run, but the resulting .csv files were nothing like the original .txt files: all the data were scrambled.

I want to keep the header and read all the data rows in each .txt file, so I made some cosmetic changes to the code. It still didn't run and returned this error: "Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'C:/Users/mli/Desktop/All TNFa controls in the training patients ctrl_S1018263__3S_TNFaCHx_333nM+0-1ugml_none.txt': Invalid argument"

My code is below:

directory <- "C:/Users/mli/Desktop/All TNFa controls in the training patients"
ndirectory <- "C:/Users/mli/Desktop/All TNFa controls in the training patients/CSV"
file_name <- list.files(directory, pattern = ".txt")
files.to.read <- paste(directory, file_name, sep="\t") 
files.to.write <- paste(ndirectory, paste0(sub(".txt","", file_name),".csv"), sep=",")
for (i in 1:length(files.to.read)) {
  temp <- (read.csv(files.to.read[i], sep="\t", header = T))
  write.csv(temp, file = files.to.write[i])
}
ML33M
  • Try adding "/" at the end of `directory` and `ndirectory`. From the error, it looks like a "/" is missing between the folder and the file name here: 'C:/Users/mli/Desktop/All TNFa controls in the training patients ctrl_S1018263__3S_TNFaCHx_333nM+0-1ugml_none.txt': Invalid argument. Something like `directory <- "C:/Users/mli/Desktop/All TNFa controls in the training patients/"` – DJV Jan 28 '20 at 15:33
  • The error is in the `paste` calls. Set `sep='/'` instead of `'\t'` or `','` at lines 4 and 5. There, `paste` is building the path name, so `sep` is whatever goes between the folder and the file name. – Gowachin Jan 28 '20 at 15:35
  • To save yourself from unnecessary headaches down the road, do not use spaces or special characters in folder names or file names. Use an underscore "_" (not a dash "-" or a dot ".") if you have to. – Tung Jan 28 '20 at 15:35
  • Sorry, I have tried it. The code still won't run and it reports the same error message. I think the "/" is actually there in the error message; somehow my copy and paste missed it. – ML33M Jan 28 '20 at 15:37
  • @ML33M: can also take a look at the `vroom` package for speed and versatility https://stackoverflow.com/questions/3397885/how-do-you-read-in-multiple-txt-files-into-r/48105838#48105838 – Tung Jan 28 '20 at 15:39
  • Thank you Tung. The file names are out of my hands, the file name input was tied to the person entering the name on the day. But I will try to make sure they find some simpler ways in the future – ML33M Jan 28 '20 at 15:39
  • Strange, after modifying your code it works for me. Maybe there is an issue in the read.csv line. Are you sure that tabs delimit your columns? Can you load a txt file with the RStudio Environment > Import Dataset tool to see which code it uses to load your dataset? – Gowachin Jan 28 '20 at 15:41
  • Hi Gowachin, your suggestion for lines 4 and 5 was awesome. I changed the "\t" and ',' to '/' (I kept `sep = '\t'` on line 7). The code runs and I have got the folder of .csv files. Only one problem: the output CSV has 2 extra columns. The first one has no header and its rows are just row-ID numbers. The last column is titled X.1 and is filled with "NA" in all rows. Any idea how to fix that? – ML33M Jan 28 '20 at 15:46
  • Add `row.names = FALSE` inside your write.csv call. For the last column, it's hard to know how to deal with it without having a look at the data. I bet that some rows have an extra tab or space in them... – Gowachin Jan 28 '20 at 15:50
  • Hi Gowachin, I hope I loaded the txt file correctly by using RStudio > File > Import Dataset > From Text (base) and then choosing the file from the folder. Below is the console dialogue: > `ctrl_S104259_TCL__3S_TNFaCHx_333nM+0.1ugml_none` <- read.delim("C:/Users/mli/Desktop/All TNFa controls in the training patients/ctrl_S104259_TCL__3S_TNFaCHx_333nM+0-1ugml_none.txt", header=FALSE) – ML33M Jan 28 '20 at 15:53
  • Hi Gowachin, the row.names= FALSE worked as you predicted. The last column of "NA" still persisted. I really appreciated your help, and I'm learning so much from all people's input. It is hard to see what might be wrong without seeing the data, but I think I can get away with this column, as my next step is sampling a portion of the data and I will subset a few specific columns by colnames to make a data frame for downstream processing. – ML33M Jan 28 '20 at 15:57
  • Yes, I think it was loaded correctly, but my hope was that it would show the `sep=` value... Can you open a file in a simple text editor to see which character is set between columns: tab, space, comma, or something else? – Gowachin Jan 28 '20 at 15:59
  • Sorry Gowachin for misunderstanding your request. I opened the .txt in Notepad, and it looks like the columns are separated by tabs or spaces. – ML33M Jan 28 '20 at 16:01
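
One possible cause of the all-NA last column, offered only as a guess since the actual data was never shown, is a trailing tab at the end of every line, header included. The sketch below builds a small made-up table with that problem, shows that `read.csv` then creates an extra column (R names it something like `X` or `X.1`, depending on the other headers), and drops any column that is entirely `NA`. The sample text and the `keep` variable are assumptions for illustration.

```r
# Made-up tab-delimited text in which every line ends with an extra tab.
txt <- "id\tvalue\t\n1\t0.5\t\n2\t0.7\t\n"

# The trailing tab is read as an empty third field on each line.
temp <- read.csv(text = txt, sep = "\t", header = TRUE)
before_names <- names(temp)  # includes an auto-generated name for the empty column

# Keep only the columns that contain at least one non-NA value.
keep <- colSums(!is.na(temp)) > 0
temp <- temp[, keep, drop = FALSE]
```

If the real files have a different cause (for example, stray spaces inside some rows), this filter would not be the right fix, so it is worth checking a file in a text editor first.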

1 Answer

When you paste your path together with the name of your file at lines 4 and 5 (the `files.to.read` and `files.to.write` lines), use `sep = "/"` so that the result is a valid path in a character string. The `sep` value is what `paste` puts between the strings it joins.

> paste('hello','world',sep=" ")
[1] "hello world"

> paste('hello','world',sep="_")
[1] "hello_world"

This is different from the `sep` value you need in `read.csv`, which defines the character that separates the columns inside your file.
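
As a rough sketch of the whole conversion with both kinds of `sep` in their proper place, something like the following should work. It assumes the input files are tab-delimited; the folders under `tempdir()` and the two sample files are made up so that the example runs on its own. `file.path()` is used as a shorthand for `paste(..., sep = "/")`.

```r
# Hypothetical folders under tempdir(); substitute the real input and output paths.
in_dir  <- file.path(tempdir(), "txt_in")
out_dir <- file.path(in_dir, "CSV")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Two small made-up tab-delimited files so the sketch is self-contained.
writeLines(c("id\tvalue", "1\t0.5", "2\t0.7"), file.path(in_dir, "sample_a.txt"))
writeLines(c("id\tvalue", "3\t1.5"), file.path(in_dir, "sample_b.txt"))

# "\\.txt$" matches only names ending in ".txt" (a bare "." matches any character).
file_name <- list.files(in_dir, pattern = "\\.txt$")

# Build the paths: folder and file name joined with "/".
files_to_read  <- file.path(in_dir, file_name)
files_to_write <- file.path(out_dir, sub("\\.txt$", ".csv", file_name))

for (i in seq_along(files_to_read)) {
  # Here sep = "\t" is the column delimiter inside each input file.
  temp <- read.csv(files_to_read[i], sep = "\t", header = TRUE)
  # row.names = FALSE avoids an extra unnamed column of row numbers.
  write.csv(temp, file = files_to_write[i], row.names = FALSE)
}
```

The output folder has to exist before `write.csv` is called, which is why the sketch creates it with `dir.create()` first.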

Gowachin