Read multiple text files

Question

I am reading the attached .txt file using the R code below. I have 2200 .txt files like this with different station IDs. I need to output only the years for which peak flow data are available, along with the peak flow. For example,

Year     Peak 
1929   4050 
1940   7000 
1958   4050 
...

Can somebody help me modify this code to achieve this?

My R code is shown below.

rm(list = ls(all = TRUE))
iPath <- 'C:/Desktop/flow_raw/Region-03/'
mydata <- read.table(paste0(iPath, "02053200-PeakFlow-uptoWY2015.txt"), sep = "\t", header = TRUE)
out <- mydata[c(3, 5)]  # keep columns 3 and 5

Possible duplicate of [How to read all files in one directory into R at once?](http://stackoverflow.com/questions/21382880/how-to-read-all-files-in-one-directory-into-r-at-once) — phiver, Nov 05 '15 at 20:26
http://stackoverflow.com/questions/5758084/loop-in-r-loading-files — jogo, Mar 23 '17 at 19:22

score 2 · Answer 1 · edited Nov 17 '15 at 00:30


I cannot see any attached file.

There are various options to accomplish the task.

library(plyr)   #you only need these packages if you follow my first Option
library(dplyr)

files <- dir("C:/Desktop/flow_raw/Region-03", 
             full.names = TRUE)


# OPT. 1: if you need a Data Frame
df <- lapply(files, function(x) 
      read.table(x, sep = '\t', header = FALSE)[c(3,5)]) %>% 
      plyr::ldply()    #the '.id' argument might be useful

# OPT. 2: if you need a list
listTxt <- lapply(files, function(x) 
           read.table(x, sep = '\t', header = FALSE)[c(3,5)])
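If you would rather avoid the plyr/dplyr dependencies, the same stacking can be done in base R. This is a minimal sketch, assuming (as in the code above) tab-separated files without a header row and the year and peak in columns 3 and 5. `read_peaks` and the column names `Year`, `Peak` and `File` are my own hypothetical choices, not something the files provide.

```r
# Hypothetical helper: read every .txt file in `dir_path` and stack
# columns 3 and 5 into one data frame, recording the source file.
# ASSUMPTION: tab-separated, no header row, Year/Peak in columns 3 and 5.
read_peaks <- function(dir_path) {
  files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.table(f, sep = "\t", header = FALSE)[c(3, 5)]
    names(d) <- c("Year", "Peak")
    d$File <- basename(f)  # similar role to the '.id' column of ldply
    d
  })
  do.call(rbind, pieces)  # one data frame with all files
}

# Usage: df <- read_peaks("C:/Desktop/flow_raw/Region-03")
```

Keeping the file name in its own column means you can still tell the stations apart after everything has been combined into one data frame.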

NB: If you need a faster reading function, take a look at `data.table::fread`.

answered Nov 05 '15 at 21:25 by Pasqui · edited Nov 17 '15 at 00:30 by user3408139

When I try to install the "plyr" package mentioned above in RStudio, I get the following error message: install.packages("plyr") also installing the dependency ‘Rcpp’ Packages which are only available in source form, and may need compilation of C/C++/Fortran: ‘Rcpp’ ‘plyr’ These will not be installed. Can somebody help me figure this out? Thanks – user3408139 Nov 12 '15 at 00:27
You need to install "Rcpp" first (and in general it is a good idea because Rcpp is now used by more than 500 R packages). If you are not able to install "Rcpp", please, paste the output of `sessionInfo()` here, I will try to help you. – Pasqui Nov 12 '15 at 12:34
I tried installing Rcpp, but it will not install. Here is the message I got: install.packages("Rcpp") Package which is only available in source form, and may need compilation of C/C++/Fortran: ‘Rcpp’ These will not be installed. – user3408139 Nov 12 '15 at 17:50
That's why I asked you to paste your `sessionInfo()` output here :) – Pasqui Nov 12 '15 at 20:35
NB: different system configurations require different installation steps – Pasqui Nov 12 '15 at 20:52
I was able to install all the packages, but when I view "df" it has all the records concatenated. How can I sort the flow records for each station? For example, I need to output one CSV file per station, named with the station ID and containing that station's flow records. – user3408139 Nov 13 '15 at 22:31
1. I would do all the group-by type of operations using dplyr, e.g. `sortedDf <- df %>% group_by(StationID) %>% arrange(desc(Peak))`, and then 2. write the files out, splitting by station ID: `lapply(split(sortedDf, sortedDf$StationID), function(y) write.csv(x = y[c(1,3)], file = paste0(y$StationID[1], '.csv')))` – Pasqui Nov 14 '15 at 15:42
p.s. You did not give us your `dput` output, so for the examples I am assuming this `dput(df)`: structure(list(Year = c(1929, 1929, 1929, 1929, 1929, 1930, 1931, 1934), StationID = c("WTF2", "WTF2", "WTF2", "WTF3", "WTF3", "WTF3", "YAAI", "YAAI"), Peak = c(500, 4050, 6000, 8000, 7623, 2134, 4578, 6348), .id = c("file1", "file1", "file1", "file2", "file2", "file2", "file3", "file3")), .Names = c("Year", "StationID", "Peak", ".id"), row.names = c(NA, 8L), class = c("tbl_df", "tbl", "data.frame")) – Pasqui Nov 14 '15 at 15:45
Please see the images above of the 'df' output and of the text file being read by the code. As you can see, it does not read the station ID (02053200 in the image above). How can I modify the R code to read the station ID, year, and flow (i.e., the 2nd, 3rd (year), and 4th columns)? – user3408139 Nov 16 '15 at 20:35
Hi, I used those columns in my example because that's what you gave us in your example code. If you need the 2nd, 3rd, and 4th, just edit the reading function to `read.table(x, sep = '\t', header = FALSE)[2:4]` – Pasqui Nov 16 '15 at 22:02
Hi, I made the changes, but it still does not read all three columns. Please see the image above. It only reads column 2, not columns 3 and 4. – user3408139 Nov 16 '15 at 23:48
Based only on the 4 lines of data that I can see in your screenshot, the following should work for you: `read.table(x, sep = '\t', header = FALSE, skip = 2)[2:4]`. – Pasqui Nov 17 '15 at 08:46
Could you please just attach one real file? – Pasqui Nov 20 '15 at 21:40
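Putting the suggestions from this comment thread together without plyr/dplyr gives the base-R sketch below. It assumes each file has two header lines to skip and that columns 2 to 4 hold the station ID, year and peak flow. Both points were inferred from the screenshot discussion, since no real file was attached. `read_station_file` and `write_station_csvs` are hypothetical helper names. Reading every column as character keeps the leading zero in IDs such as 02053200, which `read.table` would otherwise drop when it converts that column to a number.

```r
# Hypothetical helpers combining the steps discussed in the comments.
# ASSUMPTIONS: tab-separated files, 2 header lines to skip, and
# columns 2-4 = station ID, year, peak flow (not verified on a real file).

# Read one file. Every column is read as character so station IDs such
# as "02053200" keep their leading zero; only Peak is made numeric.
read_station_file <- function(f, skip = 2) {
  d <- read.table(f, sep = "\t", header = FALSE, skip = skip,
                  colClasses = "character")[2:4]
  names(d) <- c("StationID", "Year", "Peak")
  d$Peak <- as.numeric(d$Peak)
  d
}

# Write one CSV per station, named <StationID>.csv, holding only Year and
# Peak, with rows ordered by decreasing Peak. Returns the file paths.
write_station_csvs <- function(df, out_dir = ".") {
  by_station <- split(df, df$StationID)
  paths <- character(0)
  for (id in names(by_station)) {
    y <- by_station[[id]]
    y <- y[order(-y$Peak), c("Year", "Peak")]
    path <- file.path(out_dir, paste0(id, ".csv"))
    write.csv(y, file = path, row.names = FALSE)
    paths <- c(paths, path)
  }
  invisible(paths)
}

# Usage (writes the CSVs into the current working directory):
# files <- list.files("C:/Desktop/flow_raw/Region-03", full.names = TRUE)
# df <- do.call(rbind, lapply(files, read_station_file))
# write_station_csvs(df)
```

Dropping the row names in `write.csv` keeps each output file down to just the Year and Peak columns shown in the question.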

score 0 · Answer 2 · answered Nov 05 '15 at 20:29

If I am understanding your question correctly, you want to import 2200 text files at once. For some reason I can't see the attachment, but you should be able to read in the data using the function Corpus from the tm package.

In your case: (the path should lead to a folder where all the text files live)

TextCorpus <- Corpus(DirSource("C:/Desktop/flow_raw/Region-03"))
TextCorpus$content

You should then be able to subset these documents. I usually make a list of the documents' contents, so that you have a list of 2200 elements containing the original text.
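If you only need the raw text of each file, base R can build that list directly without installing tm. This is a minimal sketch. `read_all_texts` is a hypothetical helper name, and the `.txt` filter is an assumption about how the files are named.

```r
# Hypothetical helper: read every .txt file in a folder into a named list,
# one element per file, each a character vector with one string per line.
read_all_texts <- function(dir_path) {
  files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  setNames(lapply(files, readLines), basename(files))
}

# Usage: texts <- read_all_texts("C:/Desktop/flow_raw/Region-03")
#        texts[["02053200-PeakFlow-uptoWY2015.txt"]]
```

Naming the list elements after the files lets you pick out a station's text by its file name.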

I get the following error message, > install.packages("tm") also installing the dependency ‘slam’ Packages which are only available in source form, and may need compilation of C/C++/Fortran: ‘slam’ ‘tm’ These will not be installed — user3408139, Nov 12 '15 at 00:46
