R file inputs and histogram

Question

I am a bit new to R and trying to learn but I am confused as to how to fix a problem that I have stumbled upon. I am trying to input multiple files so that I may make one histogram per file. The code works well, especially with just one file, but I have encountered a problem when I enter multiple files.

EDIT: Ending code

library("scales")
library("tcltk")
File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1))
Num.Files<-NROW(File.names)
dat <- lapply(File.names,read.table,header = TRUE)
names(dat) <- paste("f", 1:length(Num.Files), sep="")
tmp <- stack(lapply(dat,function(x) x[,14]))
require(ggplot2)
ggplot(tmp,aes(x = values)) + 
    facet_wrap(~ind) +
    geom_histogram(aes(y=..count../sum(..count..)))

There are many things wrong here (several of them you haven't encountered yet because they'll happen after the code that's generating your error) but for the moment, could you clarify how you intend to graph histograms for _more than one_ file? Is it supposed to be one histogram for the 14th column of each file, all together? — joran, Aug 09 '12 at 14:35
@joran I would like to have one histogram per file, which includes all the data from column 14 of that particular file — Stephopolis, Aug 09 '12 at 14:37

score 5 · Accepted Answer · answered Aug 09 '12 at 14:48

Well, here's something to get you started (but I can't be sure it will work exactly for you, since your code isn't reproducible):

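# Read every chosen file into a list of data frames
dat <- lapply(File.names,read.table,header = TRUE)
# One name per file; 1:length(Num.Files) would give only "f1", since Num.Files is a single number
names(dat) <- paste("f", seq_along(dat), sep="")

# Long format: 'values' holds column 14, 'ind' holds the file label
tmp <- stack(lapply(dat,function(x) x[,14]))

library(ggplot2)
ggplot(tmp,aes(x = values)) + 
    facet_wrap(~ind) +
    geom_histogram()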

Ditch everything you wrote after this line:

File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1))

and use the above code instead.
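Since the original files aren't available, here is a minimal reproducible sketch of the reshaping step, with simulated data standing in for the chosen files. The helper make_fake, the row counts and the 18-column layout are assumptions made up for illustration; only column 14 matters.

```r
# Simulated stand-ins for three files, each with 18 numeric columns
# (hypothetical data; the real files would come from tk_choose.files()).
set.seed(1)
make_fake <- function(n) as.data.frame(matrix(rnorm(n * 18), ncol = 18))
dat <- lapply(c(50, 80, 120), make_fake)

# seq_along(dat) yields one name per file: f1, f2, f3
names(dat) <- paste0("f", seq_along(dat))

# stack() turns the named list of column-14 vectors into a long data frame
# with columns `values` (the data) and `ind` (which file each value came from)
tmp <- stack(lapply(dat, function(x) x[, 14]))

# Number of rows contributed by each file
table(tmp$ind)
```

A data frame shaped like tmp is what the ggplot call expects: facet_wrap(~ind) draws one panel per level of ind, i.e. one histogram per file.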

A few other explanations (BlueTrin explained the first error):

for (i in Num.Files){
f<- read.table(File.names[i],header=TRUE)
}

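Because Num.Files is a single number, for (i in Num.Files) runs the body only once, with i equal to the number of files, so only the last file is read. Even if you looped over 1:Num.Files, each pass would overwrite the previous file, and you would still be left with only the last file stored in f.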
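A sketch of a loop that keeps every file, assuming a preallocated list. The temporary files, their names and their contents are invented here only so the example runs on its own:

```r
# Write two small throwaway files so the example is self-contained
paths <- file.path(tempdir(), c("a.txt", "b.txt"))
write.table(data.frame(x = 1:3, y = 4:6), paths[1], row.names = FALSE)
write.table(data.frame(x = 7:8, y = 9:10), paths[2], row.names = FALSE)

# One list slot per file; seq_along() visits every index, not just the last
files <- vector("list", length(paths))
for (i in seq_along(paths)) {
  files[[i]] <- read.table(paths[i], header = TRUE)
}

# Row count of each stored table
sapply(files, nrow)
```

The lapply call at the top of this answer does the same thing in one line.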

colnames(f) <- c(1:18)
histoCol <- c(f$'14')

You don't need the c() function here. Just 1:18 is sufficient. But numbers as column names are generally awkward, and should probably be avoided.

joran

Is there a better way to change the column names? As I mentioned in the question, the column names have a tendency to change (and unfortunately I cannot do anything about that) but are always in the same order. Switching the names to numbers was the first thing I thought of, but I would be willing to change that if there is a better programming practice. – Stephopolis Aug 09 '12 at 14:55
@Stephopolis Just prepend a character, like you were trying to do with the file variable names, i.e. `colnames(f) <- paste("X",1:18,sep = "")`. – joran Aug 09 '12 at 14:57
How would you get ggplots to show a histogram with percent on the y axis? I tried to modify your code like this: ggplot(tmp,aes(x = values)) + facet_wrap(~ind) + geom_histogram() + scale_y_continuous(labels=percent_format()) but it just added the % sign at the end of counts. – Stephopolis Aug 09 '12 at 18:31
That did the trick for the percent. Unfortunately though your code is not working particularly nicely when I attempt to use more than one file. I will edit the original question to include it. – Stephopolis Aug 09 '12 at 18:58
As I mentioned I am new to R, so it is hard to be particularly exacting. Do you have any suggestion on how to make my code more reproducible? – Stephopolis Aug 09 '12 at 20:01

BlueTrin · Answer 2 · 2012-08-09T14:57:50.040

f(Num.Files) <- paste("f", 1:length(Num.Files), sep = "") : could not find function "f<-"

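This specific error happens because you try to assign a string to the result of a function call. R reads f(Num.Files) <- value as a call to a replacement function named f<-, and no such function exists.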
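To illustrate, here is a small sketch of how R treats this kind of assignment. The replacement function `second<-` is hypothetical and defined only for this example; `names<-` is the real base R function behind names(x) <- value.

```r
# `names(x) <- value` is shorthand for x <- `names<-`(x, value)
x <- 1:3
names(x) <- c("a", "b", "c")
y <- `names<-`(1:3, c("a", "b", "c"))
identical(x, y)

# No function called `f<-` exists, so the same syntax with f() fails
Num.Files <- 3
msg <- tryCatch({
  f(Num.Files) <- paste0("f", 1:3)
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Defining a replacement function (hypothetical `second<-`) makes the syntax legal
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
v <- c(10, 20, 30)
second(v) <- 99
v
```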

This should load the values into a list:

library("lattice")
library("tcltk")
File.names <- tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1)
Num.Files <- NROW(File.names)

# Empty list that will hold one table per file
result_list <- list()
#f(Num.Files)<-paste("f", 1:length(Num.Files), sep="")

# seq_len(Num.Files) visits every file; "i in Num.Files" would only visit the last one
for (i in seq_len(Num.Files)) {
    full_path <- File.names[i]
    short_name <- basename(full_path)
    # Store each table under its short file name
    result_list[[short_name]] <- read.table(full_path, header=TRUE)
}

Once you run this program, you can type 'result_list$' without the quotes and press TAB for completion. Alternatively you can use result_list[[1]] for example to access the first table.

result_list is a variable of type list: a container that supports indexing by a label, which is the file name in this case. (I replaced the full file name with the short file name, as the full path is a bit ugly as a list label, but feel free to change it back.)
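A short sketch of the ways to index such a list, using invented tables and file names in place of real files:

```r
# A named list standing in for result_list (data invented for illustration)
result_list <- list()
result_list[["a.txt"]] <- data.frame(x = 1:3)
result_list[["b.txt"]] <- data.frame(x = 4:5)

# Labels, in insertion order
names(result_list)

# Access by position and by label
first <- result_list[[1]]
same  <- result_list[["a.txt"]]

# `$` needs backticks when the label is not a syntactic name
also  <- result_list$`a.txt`
```

All three access forms return the same data frame.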

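Note that f is not a reserved word in R, and a list named f would work just as well as result_list in the program above. What fails is the form f(Num.Files) <- ..., which R treats as a call to a replacement function named f<-. A more descriptive name than f still makes the code easier to follow.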

I hope this, together with the other answer, is enough to get you started!

I was trying to make a set of f variables so that, when I looped through read.table, I would have one file's worth of data stored in each variable. If that makes sense... — Stephopolis, Aug 09 '12 at 14:51
@Stephopolis: if your tables have the same columns (or the same number of rows) and you want to concatenate them, you may want to use rbind (or cbind). Use the first iteration to assign the table and then concatenate the rest onto it. If it is a single time series, you can start with result_list = c() and then use result_list = c(result_list, newelement) to append values to a vector (c() combines its arguments into a vector). — BlueTrin, Aug 09 '12 at 15:04
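A minimal sketch of both suggestions, assuming tables with identical columns. The data, the list names and the extra source column are invented for illustration:

```r
# Two tables with identical columns (invented data)
result_list <- list(
  a = data.frame(id = 1:2, value = c(0.1, 0.2)),
  b = data.frame(id = 3:5, value = c(0.3, 0.4, 0.5))
)

# Bind all tables row-wise in one call instead of growing inside a loop
combined <- do.call(rbind, result_list)
# Record which table each row came from
combined$source <- rep(names(result_list), sapply(result_list, nrow))

# Growing a plain vector with c(), as described in the comment above
values <- c()
for (tbl in result_list) {
  values <- c(values, tbl$value)
}
```

For more than a handful of files, the single do.call(rbind, ...) is usually preferable to appending inside a loop, which copies the accumulated data on every pass.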
