-2

I want to import multiple data in R and find the average of the third column of each files. I have shown example below.

I have imported multiple files in R using Ramnath's solution from Import multiple text files in R and assign them names from a predetermined list . The code I have used so far is as follows:

#Import mulitple text using following code: files with extension *.dat
txt_files =list.files(pattern='\\.dat$')
data_list=lapply(txt_files,read.table,sep="\t",header=T)

Used Nico's answer to change to data frame from R list to data frame

  # Change the list to dataframe
    hello <- as.data.frame(do.call(rbind,data_list))
    dim(hello)

# Using 12 files I got the following information
> dim(hello)
[1] 58536     1

Each file has 4878 number of rows. This is not what I am looking for. What above code did is merged all the data into one data frame based on rows.

I want it by columns and be able to calculate the average of third column from each file. I want to use the third column of each file and the find the array of average.

The sample of what I want is as follows:

File 1

Lat Long Value
10 12  15
12 13  16

File 2

 Lat Long Value
    10  12 11
    12 13  15  

Final File

Lat Long  Value
10 12  13
12 13  15.5

As you can see for the final file, the first two columns are same, only thing that is different is the third column which is the average of two values from two files. So, I want to use my data to change to the data frame similar to the final file as shown above.

Community
  • 1
  • 1
Jd Baba
  • 5,948
  • 18
  • 62
  • 96
  • Using that it created a data frame but all the data are stacked as rows. At first for each file I had 4878 rows, now I have 58536 rows. I want to find the average of corresponding third column for unique first (Lat) and second column (long). Now, I have 12 data for each lat and long. How can I take average of those values ? – Jd Baba Feb 04 '13 at 04:59
  • Your description and what your example shows are not consistent. Do you want the average value for each `value` column from each file, or the average for the unique combinations of `Lat` and `Long`? You aren't using nicos answer, but mareks..... – mnel Feb 04 '13 at 05:01
  • Given your new comment, your `dim` result doesn't appear to be consistent. What you are trying to do with this large data set is `aggregating`. This is perhaps the most commonly asked question on SO. Look at the function `aggregate`. or Search `[r][aggregate]` on SO – mnel Feb 04 '13 at 05:03

1 Answers1

2

Group by coordinates

Combining things by rows is all right, as long as you don't require your final list to be in any particular order, and don't have different rows with the same coordinates. In that case, you can simply use common coordinates to group rows, and then aggregate over them like this:

aggregate(Value ~ Lat + Lon, hello, mean)

Group by row numbers

If, on the other hand, you have duplicate coordinates, or want the final result to be in the same order as all the inputs, then you should extract the Value column from each data.frame and combine them into a matrix. Then you can compute the mean for each matrix row, and combine those means with the two coordinate columns of any input data frame. This whole approach relies heavily on the order of input data rows, i.e. on the row number of a given place being the same in all files. You could implement it like this:

mean_values <- apply(do.call(cbind, lapply(data_list, function(df) df$Value)), 1, mean)
cbind(data_list[[1]][1:2], Value=mean_values)

Trying this out

Here is an example session of what this looks like on my system:

> data_list <- list(File.1=data.frame(Lat=c(10,12),Lon=c(12,13),Value=c(15,16)),
                    File.2=data.frame(Lat=c(10,12),Lon=c(12,13),Value=c(11,15)))
> hello <- as.data.frame(do.call(rbind,data_list))
> dim(hello)
[1] 4 3
> str(hello)
'data.frame':   4 obs. of  3 variables:
 $ Lat  : num  10 12 10 12
 $ Lon  : num  12 13 12 13
 $ Value: num  15 16 11 15
> aggregate(Value ~ Lat + Lon, hello, mean)
  Lat Lon Value
1  10  12  13.0
2  12  13  15.5
> value_matrix <- do.call(cbind, lapply(data_list, function(df) df$Value))
> value_matrix
     File.1 File.2
[1,]     15     11
[2,]     16     15
> mean_values <- apply(value_matrix, 1, mean)
> cbind(data_list[[1]][1:2], Value=mean_values)
  Lat Lon Value
1  10  12  13.0
2  12  13  15.5

Only a single column?

As you only get a single column from reading your input files, according to your dim output, you should investigate that data frame using head or str to see what went wrong. Most likely, your columns aren't separated by tabs but by commas or spaces or some such. Notice that if you do not spcify sep, then any sequence of spaces and / or tabs will be used as a column separator. Read the documentation for read.table for details.

MvG
  • 57,380
  • 22
  • 148
  • 276
  • Hi MvG, thank you so much for your detailed answer. It seems I am having problem with converting the list to dataframe. The code I used above changes the list to data frame however, dimension doesn't seem to be right. I have actually 3 columns, but the changed dataframe only shows 1 column when I use dim(hello). Since, I have this problem I couldn't test your code. Thanks. – Jd Baba Feb 04 '13 at 07:47
  • wow that's very nice of you to show every details. I really appreciate your time and this nice information. Works perfect now. – Jd Baba Feb 04 '13 at 08:31