Convert txt file to csv and extract selected rows

Question

I have 500 files (saved in the same directory).

All these files have the same format, shown below. Excel recognizes three columns in them, but R reads only one column if I use x <- read.csv(xxx.txt, header=T). I guess there is a smarter way of doing this.

4077    40770     3.083 
4078    40780     2.985 
4079    40790     2.946 
4080    40800     3.010 
4081    40810     2.956 
4082    40820     3.080 
4083    40830     3.130 
4084    40840     3.167 
4085    40850     3.054

After loading all these files into R, I want to extract only the 0th, 100th, 200th, 300th, ..., 9000th rows and save them in another directory under the same file names, as 500 separate files.

Can this be done automatically with R?

score 2 · Answer 1 · answered Sep 15 '11 at 08:19


I think you would improve your code by using the read.table() function to import free-format (delimited) data files instead of read.csv(), which was written specifically for comma-delimited files. And as DWin pointed out, there is no row 0 in R. You could try it this way:

directory <- "your.work.directory"  # where your data files are;
                                    # the path format depends on your OS (Windows, Linux, macOS)
ndirectory <- "your.new.directory"  # where the extracted files go; must already exist
files <- dir(directory)
files.to.read <- file.path(directory, files)
files.to.write <- file.path(ndirectory, files)

for (i in seq_along(files.to.read))
{
    # Use header=FALSE if the files have no header line, as in the sample above.
    d <- read.table(files.to.read[i], header=TRUE)
    # R has no row 0, so take row 1 and then every 100th row up to 9000.
    temp <- d[c(1, seq(100, 9000, by=100)), ]
    write.table(temp, file=files.to.write[i],
                row.names=FALSE)
}
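
As a minimal sketch of why read.csv() sees one column while read.table() sees three, the following writes a few of the sample rows from the question to a temporary file and reads it both ways. The temporary file and the names `csv_try`, `tab_try`, `wanted` and `subset_rows` are placeholders for illustration only. The last step is an optional guard, not part of the answer above: it drops requested row numbers beyond the end of the file, which would otherwise come back as rows of NA.

```r
# Write a few of the sample rows to a temporary whitespace-delimited file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("4077    40770     3.083",
             "4078    40780     2.985",
             "4079    40790     2.946"), tmp)

# read.csv() splits on commas, so each whole line lands in a single column.
csv_try <- read.csv(tmp, header = FALSE)
ncol(csv_try)

# read.table() splits on any run of whitespace by default (sep = "").
tab_try <- read.table(tmp, header = FALSE)
ncol(tab_try)

# Keep only the requested rows that actually exist in this file.
wanted <- c(1, seq(100, 9000, by = 100))
subset_rows <- tab_try[wanted[wanted <= nrow(tab_try)], ]
```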

Hope this helps.

answered Sep 15 '11 at 08:19

Manuel Ramón


Ramon: thanks for your script. I've got the same task, but R returns an error: `Error in read.table(files.to.read[i], header = TRUE) : more columns than column names`. How can I fix it? – abc Jun 17 '12 at 11:09

This error is telling you that one or more columns in your data have no column name. You need to check all your files. You could add a `print(i)` and a `flush.console()` after the first `{` in the `for` statement to see which of your files has the problem. The problem may also lie in the field or decimal separators. You can define both in the read.table function: `read.table(files.to.read[i], header=TRUE, sep=";", dec=",")`. Hope this helps. – Manuel Ramón Jun 25 '12 at 19:03
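
An alternative to printing `i` inside the loop is to check the number of fields per line before reading anything. The sketch below uses `count.fields()` from base R for that. The helper name `check_fields` and the two temporary files are hypothetical, and it assumes whitespace-separated files; pass a different `sep` if yours use `;` or tabs.

```r
# Hypothetical helper: return the files whose lines do not all have the
# same number of whitespace-separated fields.
check_fields <- function(paths) {
  bad <- character(0)
  for (p in paths) {
    n <- count.fields(p, sep = "")
    if (length(unique(n)) > 1) {
      bad <- c(bad, p)
    }
  }
  bad
}

# Small demonstration with two temporary files.
good_file <- tempfile(fileext = ".txt")
bad_file  <- tempfile(fileext = ".txt")
writeLines(c("a b c", "1 2 3"), good_file)
writeLines(c("a", "1 2 3"), bad_file)   # header has fewer names than data columns

suspects <- check_fields(c(good_file, bad_file))
suspects
```

In the loop from the answer, this would be called once as `check_fields(files.to.read)` before the `for` statement.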
Thanks, I fixed the error :)! A little question: I want to extract a piece of text in each file and create new ones in a new category with the extracted text. How do I insert that into the for{} statement? – abc Jun 30 '12 at 20:27

Not sure if I understand your question. Do you want to extract text from each file name and set it as a new variable in the data frame? If so, you can use the `str_sub` function from the `stringr` package (see `example(str_sub)`). Add a new line like `temp$class <- str_sub(files[i], 1, 5)` after defining the `temp` data frame (the values 1 and 5 within `str_sub` are the start and end positions; see `?str_sub`). – Manuel Ramón Jul 03 '12 at 06:53
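
A small sketch of that idea, using base R `substr()` in place of `stringr::str_sub()` so that no extra package is needed; for positive positions both take the string, a start and an end. The file names and the contents of `temp` are made up for illustration.

```r
# Hypothetical file names; in the loop these would come from files[i].
files <- c("siteA_2011.txt", "siteB_2011.txt")

# A stand-in for the extracted rows of one file.
temp <- data.frame(x = 1:3, y = c(3.083, 2.985, 2.946))

# Characters 1 to 5 of the file name become a new column; the single
# value is recycled over all rows of temp.
i <- 1
temp$class <- substr(files[i], 1, 5)
temp
```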

score 1 · Answer 2 · edited May 23 '17 at 12:09


Loading:

Create a list of the file names, most easily done with list.files().
Loop through that list using read.table with either no sep argument or sep="\t" (there are no commas and no header in the file as it appears now).
Perhaps: dlist <- lapply(flist, read.table), then rbind these together with something like bigfile <- do.call(rbind, dlist)
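
The loading steps above can be sketched as follows. This is only an illustration under assumptions: the directory `rbind_demo`, the two small files and the name `dlist` are invented here, and the files are assumed to have no header line.

```r
# Build a small temporary directory with two whitespace-delimited files.
demo_dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("4077 40770 3.083", "4078 40780 2.985"),
           file.path(demo_dir, "a.txt"))
writeLines(c("4079 40790 2.946", "4080 40800 3.010"),
           file.path(demo_dir, "b.txt"))

# full.names = TRUE returns paths that read.table() can open directly.
flist <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file into a list of data frames, then stack them row-wise.
dlist <- lapply(flist, read.table, header = FALSE)
bigfile <- do.call(rbind, dlist)
dim(bigfile)
```

Note that `do.call(rbind, ...)` needs the list of data frames, not the vector of file names.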

There are lots of examples in the R-help archive and on SO of doing this:

Loop in R loading files

With R, loop over data frames, and assign appropriate names to objects created in the loop

How to read.table() multiple files into a single table in R?

Extracting (There is no zero row in R):

subextract <- bigfile[ seq(1, 9001 , by=100), ]
write.csv(subextract, file="smaller.csv")   # will have commas
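
Because R has no row 0, the requested rows 0, 100, 200, ... have to be mapped onto 1-based indices, and there is more than one reasonable way to do it. The short sketch below compares the shifted sequence used here with the alternative of taking row 1 followed by rows 100, 200, ..., 9000; the variable names are illustrative only.

```r
# Shift every requested row by one: 1, 101, 201, ..., 9001.
idx_shifted <- seq(1, 9001, by = 100)

# Replace only the missing row 0 by row 1: 1, 100, 200, ..., 9000.
idx_exact <- c(1, seq(100, 9000, by = 100))

# Both select the same number of rows but differ after the first one.
length(idx_shifted)
length(idx_exact)
head(idx_shifted, 3)
head(idx_exact, 3)
```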

edited May 23 '17 at 12:09

Community


answered Sep 14 '11 at 19:55

IRTFM


I think that there is something weird going on with the format of this answer. – adamleerich Sep 15 '11 at 01:33
@adamleerich anything in particular? – Roman Luštrik Sep 15 '11 at 12:20
I assumed he was downvoting me because I put an outline in a code box. Surely he wasn't complaining about the formatting of links to other SO answers, since that is hardcoded in the SO interface. – IRTFM Sep 15 '11 at 13:34
I assumed that it was a mistake that your "outline" is in a code box. You intended that? There must be a better way. Maybe a blockquote? That at least wouldn't have syntax highlighting. – adamleerich Sep 15 '11 at 14:36
The ability to create lists with subheadings is not well supported in the user interface. – IRTFM Sep 15 '11 at 17:19
