Reading data from CSV and reshaping it in R

Question

I have this data set covering each month from 1980 to 2004 (part of it is shown below), but I don't know how to read it from CSV and convert it to an array of the form data[lat,lon,time], in which time runs from 1 to (2004-1980)*12.

[Image: screenshot of part of the data set]

Please provide a [minimal, reproducible data set](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) (i.e. not a screen dump) together with the code you have tried. Thanks! — Henrik, Oct 08 '13 at 00:29
@Henrik: The data is available for download here: ulmo.ucmerced.edu/w_FireData.html. The file name is FedFire8004.zip. — SaZa, Oct 08 '13 at 01:05
@Ananda Mahto: Just one more question. Is it possible to convert those files, e.g. "Acres", from the original format to netCDF format? — SaZa, Oct 08 '13 at 04:22
@user2607526, I'm sorry, but I don't know much about the netcdf format. There are packages for dealing with the format, but I've never used them. That might be a new question in itself. — A5C1D2H2I1M1N2O1R2T1, Oct 08 '13 at 04:28
@Ananda Mahto: I can't say how much I appreciate your help. Thanks again. — SaZa, Oct 08 '13 at 04:36
@AnandaMahto: Is there a way that I can reshape the data frame or matrix to this form: .... lon1 lon2 ........ loni lat1 var11 var12 .... var1i lat2 .... lati.....................varii — SaZa, Oct 08 '13 at 20:05
@user2607526: A suggestion, read the answers at the question that Henrik linked to above and try to create a *minimal* example that reproduces your source data and the output you want to see. Writing something like `...lon1 lon2...` is not descriptive enough for us to really be able to give good advice. By the way, are you sure you aren't just looking for the matrices that are read in automatically when you use `load("fedfire8004.rda")` (that is the content found in `fedfire8004$acres` and `fedfire8004$fires`)? — A5C1D2H2I1M1N2O1R2T1, Oct 09 '13 at 04:57

score 2 · Accepted Answer · answered Oct 08 '13 at 02:15

The data are already there in an .rda data file, so reading it in is easy. Starting with a clean workspace, do the following:

load("fedfire8004.rda")
ls()                  ## What objects were read in?
# [1] "fedfire8004"
str(fedfire8004)      ## What does that object look like?
# List of 10
# $ lon  : num [1:24] -124 -124 -122 -122 -120 ...
# $ lat  : num [1:18] 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5 40.5 ...
# $ x    : num [1:25] -125 -124 -123 -122 -121 -120 -119 -118 -117 -116 ...
# $ y    : num [1:19] 31 32 33 34 35 36 37 38 39 40 ...
# $ year : int [1:300] 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
# $ month: int [1:300] 1 2 3 4 5 6 7 8 9 10 ...
# $ acres: num [1:24, 1:18, 1:300] NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "dimnames")=List of 3
# .. ..$ lon  : chr [1:24] "-124.5" "-123.5" "-122.5" "-121.5" ...
# .. ..$ lat  : chr [1:18] "31.5" "32.5" "33.5" "34.5" ...
# .. ..$ month: chr [1:300] "1980.1" "1980.2" "1980.3" "1980.4" ...
# $ fires: num [1:24, 1:18, 1:300] NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "dimnames")=List of 3
# .. ..$ lon  : chr [1:24] "-124.5" "-123.5" "-122.5" "-121.5" ...
# .. ..$ lat  : chr [1:18] "31.5" "32.5" "33.5" "34.5" ...
# .. ..$ month: chr [1:300] "1980.1" "1980.2" "1980.3" "1980.4" ...
# $ meta : chr "USFS, NPS, BLM, BIA total fires and acres on 1 degree monthly grid 1980-2004"
# $ cite : chr "Westerling, A.L., T.J. Brown, A. Gershunov, D.R. Cayan and M.D. Dettinger, 2003: Climate and Wildfire in the Western United Sta"| __truncated__

As you can see, the core data seem to be the `acres` and `fires` list items. It might be more convenient to reshape those into a long dataset. The most direct way to do this is probably `melt` from the "reshape2" package.

library(reshape2)
Acres <- melt(fedfire8004$acres)
Fires <- melt(fedfire8004$fires)

Let's view the first few and last few rows of each of these new objects.

head(Acres)
#      lon  lat  month value
# 1 -124.5 31.5 1980.1    NA
# 2 -123.5 31.5 1980.1    NA
# 3 -122.5 31.5 1980.1    NA
# 4 -121.5 31.5 1980.1    NA
# 5 -120.5 31.5 1980.1    NA
# 6 -119.5 31.5 1980.1    NA
tail(Acres)
#           lon  lat   month value
# 129595 -106.5 48.5 2004.12     0
# 129596 -105.5 48.5 2004.12     0
# 129597 -104.5 48.5 2004.12    71
# 129598 -103.5 48.5 2004.12    NA
# 129599 -102.5 48.5 2004.12    NA
# 129600 -101.5 48.5 2004.12    NA
head(Fires)
#      lon  lat  month value
# 1 -124.5 31.5 1980.1    NA
# 2 -123.5 31.5 1980.1    NA
# 3 -122.5 31.5 1980.1    NA
# 4 -121.5 31.5 1980.1    NA
# 5 -120.5 31.5 1980.1    NA
# 6 -119.5 31.5 1980.1    NA
tail(Fires)
#           lon  lat   month value
# 129595 -106.5 48.5 2004.12     0
# 129596 -105.5 48.5 2004.12     0
# 129597 -104.5 48.5 2004.12     2
# 129598 -103.5 48.5 2004.12    NA
# 129599 -102.5 48.5 2004.12    NA
# 129600 -101.5 48.5 2004.12    NA
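
If the goal is the data[lat, lon, time] layout asked for in the question, the arrays are already almost there: `str()` shows them stored as [lon, lat, month], so swapping the first two dimensions with `aperm()` should be enough. Here is a minimal sketch on a small made-up array; the objects `lon`, `lat`, `year`, `month`, `acres`, `data` and `step` are toy stand-ins, not the real data.

```r
# Toy stand-in for fedfire8004$acres (the real array is 24 x 18 x 300).
lon   <- c("-124.5", "-123.5", "-122.5")
lat   <- c("31.5", "32.5")
year  <- rep(1980:1981, each = 12)
month <- rep(1:12, times = 2)
acres <- array(seq_len(3 * 2 * 24), dim = c(3, 2, 24),
               dimnames = list(lon = lon, lat = lat,
                               month = paste(year, month, sep = ".")))

# Swap the first two dimensions: [lon, lat, month] -> [lat, lon, time].
data <- aperm(acres, c(2, 1, 3))
dim(data)

# Find a time step from the year and month vectors, then pull out
# the lat x lon slice for that step.
step <- which(year == 1981 & month == 1)
data[, , step]
```

With the real object, the equivalent call would be `aperm(fedfire8004$acres, c(2, 1, 3))`, which gives an 18 x 24 x 300 array. Looking up time steps through `fedfire8004$year` and `fedfire8004$month` avoids having to parse labels such as "1980.1".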

This is great. I didn't know how to work with an .rda file. Thank you so much. — SaZa, Oct 08 '13 at 04:08
@user2607526, no problem. `.rda` is one of the common extensions used for specifying the R data file format. — A5C1D2H2I1M1N2O1R2T1, Oct 08 '13 at 04:20

beroe · Answer 2 · 2013-10-08T00:47:19.063

You should (always) try to reorganize your data so that each column contains one type of information:

Year  Month  Lat  Lon  Value

A python script might be the best way to do this... Once you have it in this style, it will be easy to import and analyze in R.

I made a script that will reorganize your data for you... but it's not clear if it would be easy for you to run it. What system are you on?

Here is the script... the output is below...

#!/usr/bin/env python3
import csv

# Rows 1 and 2 of the input hold the lat and lon of each data column;
# every later row is: year, month, then one value per data column.
year, month, data = [], [], []
with open('originaldata.txt', newline='') as file_obj:
    for line_no, items in enumerate(csv.reader(file_obj, delimiter='\t')):
        if line_no == 0:
            lat = items[2:]
        elif line_no == 1:
            lon = items[2:]
        else:
            year.append(items[0])
            month.append(items[1])
            data.append(items[2:])

# Print the header, then one row per (data column, time step) pair.
print("Year", "Month", "Lat", "Lon", "Data", sep="\t")
for ind, (la, lo) in enumerate(zip(lat, lon)):
    for y, m, d in zip(year, month, data):
        print(y, m, la, lo, d[ind], sep="\t")

Output from the script:

Year  Month  Lat     Lon    Data
1980    1   31.5    -111.5  0
1980    2   31.5    -111.5  0
1980    3   31.5    -111.5  0
1980    4   31.5    -111.5  0
1980    5   31.5    -111.5  8.1
1980    6   31.5    -111.5  5.1
1980    7   31.5    -111.5  0
1980    8   31.5    -111.5  0
1980    9   31.5    -111.5  0
1980    10  31.5    -111.5  0
1980    11  31.5    -111.5  0
1980    12  31.5    -111.5  0
1981    1   31.5    -111.5  0
1981    2   31.5    -111.5  0
1981    3   31.5    -111.5  0
1981    4   31.5    -111.5  0
1981    5   31.5    -111.5  0
1981    6   31.5    -111.5  0
1981    7   31.5    -111.5  0
1981    8   31.5    -111.5  0
1981    9   31.5    -111.5  0
1981    10  31.5    -111.5  0
1981    11  31.5    -111.5  0
1981    12  31.5    -111.5  0
1980    1   31.5    -110.5  0
1980    2   31.5    -110.5  0
1980    3   31.5    -110.5  0
1980    4   31.5    -110.5  881
1980    5   31.5    -110.5  794.1
1980    6   31.5    -110.5  644.4
1980    7   31.5    -110.5  85.2
1980    8   31.5    -110.5  0.1
1980    9   31.5    -110.5  0
1980    10  31.5    -110.5  0
1980    11  31.5    -110.5  0
1980    12  31.5    -110.5  0
1981    1   31.5    -110.5  0
1981    2   31.5    -110.5  0
1981    3   31.5    -110.5  0
1981    4   31.5    -110.5  0
1981    5   31.5    -110.5  0
1981    6   31.5    -110.5  0
1981    7   31.5    -110.5  0
1981    8   31.5    -110.5  0
1981    9   31.5    -110.5  0
1981    10  31.5    -110.5  0
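
Once the data is in this long layout, one possible way to build the lat x lon x time array in R is `tapply()`. This is only a sketch on made-up values: the column names follow the header above, `Time` is a hypothetical helper column, and with a real file the `long` data frame would come from something like `read.delim()` instead of `expand.grid()`.

```r
# Made-up long table with the same columns as the script output.
long <- expand.grid(Month = 1:12, Year = 1980:1981,
                    Lat = c(31.5, 32.5), Lon = c(-111.5, -110.5))
long$Data <- seq_len(nrow(long))

# Running time index: 1 for Jan 1980, 13 for Jan 1981, and so on.
long$Time <- (long$Year - min(long$Year)) * 12 + long$Month

# tapply() fills an array indexed by [lat, lon, time];
# any combination with no row in the table becomes NA.
data <- with(long, tapply(Data, list(lat = Lat, lon = Lon, time = Time), sum))
dim(data)

# Values for one grid cell across all time steps.
data["31.5", "-111.5", ]
```

Each cell here has exactly one row per time step, so `sum` simply passes the value through; if a cell could have several rows per month, the choice of summary function would matter.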

I am not familiar with Python. Does it take too much time to do it with Python? I need this to be done by this Wednesday. — SaZa, Oct 08 '13 at 00:17
Well, you could copy-paste if it is a one-shot deal...but it might get a bit long considering the size of your data set. I'll edit my answer to show the format I was envisioning, if it is unclear. — beroe, Oct 08 '13 at 00:22
Yes, I can copy-paste, but I wanted to do that only if there was no easier way. — SaZa, Oct 08 '13 at 00:27
Don't feel rushed to accept an answer until your problem gets sorted out! ;^) With Windows it is a bit harder to explain in comments, but doable. If you want to post your raw data file somewhere, I can run it through the script and re-post the new format... — beroe, Oct 08 '13 at 00:49

score 0 · Answer 3 · answered Oct 08 '13 at 00:16

Loading is easy

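meaningful.name <- read.csv(file.choose(new = FALSE))
# Add the time index while this is still a data frame; `$<-` does not work on a matrix.
meaningful.name$time <- seq_len(nrow(meaningful.name))
meaningful.name <- as.matrix(meaningful.name)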

After that, I do not know what you are after. Can you please clarify?

Josh Guilbert


I can load the data, but I don't want to keep it in the original format. I want it to be in the format lat*lon*time. – SaZa Oct 08 '13 at 00:22
OK, maybe I didn't get what you said. Let me try it first. Thanks. – SaZa Oct 08 '13 at 00:35
Sorry I am still very confused at what you want to do. The data you have above has lat lon/month on the same column and just 1:9 below them followed by three columns with 31.5 at the top. What does lat, lon stand for and what are the other three columns for or does the sheet keep going? – Josh Guilbert Oct 08 '13 at 00:37
@JoshGuilbert the OP has some data (lat/lon) across the top of each data column, instead of associated with data values. See my answer below for what I think is the proper data format... – beroe Oct 08 '13 at 00:45
@beroe and @JoshGuilbert: The data is available for download here: http://ulmo.ucmerced.edu/w_FireData.html. The file name is FedFire8004.zip. The first part is lat, the second part is lon, and the rest is the data from 1980 to 2004 for that lat-lon. – SaZa Oct 08 '13 at 00:56
