9

I just started using R and am having trouble with the following task. I have approximately 130 language samples in separate plain-text files in my working directory. I would like to import them using scan and retain their file names. Specifically, I would like to do something like:

Patient01.txt <- scan("./Patient01.txt", what = "character")
Patient02.txt <- scan("./Patient02.txt", what = "character")
...
Patient130.txt <- scan("./Patient130.txt", what = "character")

Is there a way to use a command such as *apply to automate the process?

Jørgen R
Mike Ferguson

2 Answers

17

Here is one way to automate the process:

# list txt files with names of the form Patient*.txt
# (pattern is a regular expression, not a shell glob)
txt_files = list.files(pattern = '^Patient.*\\.txt$')

# read txt files into a list of data frames (assuming separator is a comma)
data_list = lapply(txt_files, read.table, sep = ",")

You can change the separator if you know what it is. It is convenient to keep the data as a list of data frames, since a list is easy to pass to vectorized operations such as lapply or to loop over later.
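As a minimal sketch of why a list is convenient, the following reads two small comma-separated files and counts the rows of each in one call. The temporary directory, the file contents, and the name `row_counts` are made up for illustration and are not part of the original setup.

```r
# Illustration only: create two tiny comma-separated files in a temp directory
tmp_dir <- file.path(tempdir(), "list_demo")
dir.create(tmp_dir, showWarnings = FALSE)
writeLines(c("1,2", "3,4"), file.path(tmp_dir, "Patient01.txt"))
writeLines("5,6", file.path(tmp_dir, "Patient02.txt"))

# full.names = TRUE returns paths that read.table can open from any directory
txt_files <- list.files(tmp_dir, pattern = "^Patient.*\\.txt$", full.names = TRUE)
data_list <- lapply(txt_files, read.table, sep = ",")

# One call applies the same function to every data frame in the list
row_counts <- sapply(data_list, nrow)
print(row_counts)
```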

Ramnath
  • Thank you for your response. I tried your recommendation, but unfortunately it didn't work. I think the problem is that the language samples are not in table format; they are free speech samples (e.g., "One day I went for a walk.... etc"). Ultimately, I would like to analyze the samples using some of the tools in the languageR package. To run the languageR functions, though, I first have to run the following command for each sample: `Patient130.txt <-scan("./Patient130.txt", what = "character")`. Any idea how I can run scan for all samples instead of read.table? – Mike Ferguson Mar 19 '11 at 22:44
  • 1
    @Mike Ferguson: try with `data_list = lapply(txt_files, scan, what = "character")` based on @Ramnath's suggestion. – daroczig Mar 19 '11 at 23:50
  • Did you try Mike's suggestion? lapply is just a wrapper to loop across a list. Effectively, you are doing `lapply(txt_files, function(file) scan(file, what = "character"))` – Ramnath Mar 20 '11 at 19:28
  • @Ramnath @daroczig That works pretty well. The only problem is that the labeling issue is not solved. When I type in "data_list" I can see the imported language samples, but they are labeled [[1]], [[2]], ..., [[103]]. If I want to proceed and analyze only selected language samples, it is not clear how to do that. – Mike Ferguson Mar 20 '11 at 21:19
  • @Mike Ferguson: try `lapply(txt_files, function(file) assign(file, scan(file, what = "character"), envir = .GlobalEnv))` – daroczig Mar 20 '11 at 23:44
  • @daroczig I apologize if I am not communicating well what I would like to do. When I try your code, here's what I get:

        > sample_files = list.files(".")
        > sample_files
        [1] "testFile.txt" "testFileena.txt"
        > sample_files = list.files(".")
        > sample_files
        [1] "testFile1.txt" "testFile2.txt"
        > data_list=lapply(sample_files, function(file) assign(file, scan(file, what = "character"), envir = .GlobalEnv))
        Read 3 items
        Read 2 items
        > data_list
        [[1]]
        [1] "first" "you" "have"
        [[2]]
        [1] "and" "then"

    As you can see, I cannot see which patient the sample corresponds to. Any help? – Mike Ferguson Mar 21 '11 at 02:21
  • @Mike Ferguson: with the above command you should get distinct variables in the global environment (see: `ls()`) named *Patient01.txt*, *Patient02.txt* and so on. Do not save the result in a list; just run the command so that it loops through the files. – daroczig Mar 21 '11 at 06:59
  • But you could also set the names of the list with the `names` function. E.g.: `names(data_list) <- txt_files` – daroczig Mar 21 '11 at 08:09
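Putting the comments above together, here is a minimal sketch that reads every sample with `scan` and labels each list element with its file name, so that selected samples can be pulled out by name. The temporary directory, the two sample texts, and the names `samples` and `selected` are assumptions made for illustration; with real data, `list.files()` would be run on the working directory instead.

```r
# Illustration only: two small free-text samples in a temp directory
tmp_dir <- file.path(tempdir(), "scan_demo")
dir.create(tmp_dir, showWarnings = FALSE)
writeLines("One day I went for a walk", file.path(tmp_dir, "Patient01.txt"))
writeLines("and then", file.path(tmp_dir, "Patient02.txt"))

# Bare file names, used both to build the paths and to label the list
txt_files <- list.files(tmp_dir, pattern = "^Patient.*\\.txt$")

# scan() splits each file into a character vector of words;
# quiet = TRUE suppresses the "Read n items" messages
samples <- lapply(file.path(tmp_dir, txt_files), scan,
                  what = "character", quiet = TRUE)
names(samples) <- txt_files

# Elements are now addressable by file name instead of [[1]], [[2]], ...
print(samples[["Patient02.txt"]])

# A subset of samples can be selected by name for later analysis
selected <- samples[c("Patient01.txt")]
```

Compared with `assign`, a named list keeps all samples in one object, which tends to be easier to pass to `lapply` when the same analysis is run on each sample.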
1
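# pattern is a regular expression, so the glob Patient*.txt is written as below
files <- list.files(pattern = '^Patient.*\\.txt$')
for (i in files) {
  # read one tab-separated file that has a header row
  x <- read.table(i, header = TRUE, comment.char = "A", sep = "\t")
  # create a variable named after the file, e.g. Patient01.txt
  assign(i, x)
}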
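
One caveat when listing the files: the `pattern` argument of `list.files` is a regular expression, not a shell-style wildcard, so a glob such as `Patient*.txt` may not match the intended names. The sketch below compares the two interpretations on a made-up vector of file names (`candidates`, `loose`, and `strict` are hypothetical names); `utils::glob2rx` converts a glob into the equivalent regular expression.

```r
# Hypothetical file names, used only to compare the two patterns
candidates <- c("Patient01.txt", "Patient.txt", "Patienttxt")

# Read as a regex, "Patient*.txt" means "Patien", zero or more "t",
# any single character, then "txt"
loose <- grepl("Patient*.txt", candidates)

# glob2rx() turns the wildcard into an anchored regular expression
strict_pattern <- utils::glob2rx("Patient*.txt")
strict <- grepl(strict_pattern, candidates)

print(strict_pattern)
print(data.frame(candidates, loose, strict))
```

The converted pattern can be passed directly, as in `list.files(pattern = glob2rx("Patient*.txt"))`.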