I have a large data frame in R and I want to do some calculations grouped by row name. The data frame, shown partially below, contains 236 weather stations, each listed nine times because each station has nine separate forecast hours for each weather variable. It doesn't matter that the hours are not in their own columns, because I only want to sum the snowfall and average the wind and temperature. My final data frame should therefore collapse to 236 rows (one per station) with total snowfall, average wind speed, and average temperature.
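A minimal base-R sketch of that summary, assuming the data is a numeric matrix whose (duplicated) row names are the station labels. The object name `m` and the three-hour subsets of two stations are illustrative only, copied from the sample further down. `rowsum()` sums every column within each group of identical row names; dividing two of those sums by the number of rows per station turns them into means.

```r
# Illustrative subset: three forecast hours for each of two stations
m <- matrix(
  c(0.00000000, 4.5129950, 39.490030,
    0.00000000, 4.5047869, 36.087611,
    0.00000000, 5.0126637, 39.441394,
    0.00122047, 3.6517845, 34.109912,
    0.00000000, 3.6832448, 31.485904,
    0.00000000, 4.2819648, 35.502855),
  ncol = 3, byrow = TRUE,
  dimnames = list(rep(c("EET - Alabaster, AL", "DCU - Decatur, AL"), each = 3),
                  c("snowfall", "sfc.wind", "Tavg"))
)

# Column sums within each group of identical row names (groups come out sorted)
sums <- rowsum(m, group = rownames(m))
# Number of rows (forecast hours) per station, in the same order as `sums`
n <- rowsum(rep(1, nrow(m)), group = rownames(m))

summary_m <- cbind(
  snowfall = sums[, "snowfall"],          # total snowfall
  sfc.wind = sums[, "sfc.wind"] / n[, 1], # mean wind speed
  Tavg     = sums[, "Tavg"] / n[, 1]      # mean temperature
)
summary_m
```

Because `rowsum()` groups on the row names directly, this works on the matrix as-is, without first converting it to a data frame.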

I have tried several functions from the apply family and also tried dplyr, but dplyr doesn't work with row names. I'm also having trouble getting the data into a proper format (including an actual R data frame) so that calculations can use the row names and the weather variables together. I extracted the row names as a character vector and used cbind to attach them to my original data as a matrix, but that doesn't work either.
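A short illustration of why row names cause trouble here, assuming the data is held as a matrix (the objects `m` and `df` are hypothetical): a matrix accepts repeated row names, whereas assigning the same names to a data frame fails.

```r
# A matrix happily stores duplicated row names...
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("EET - Alabaster, AL", "EET - Alabaster, AL"),
                            c("snowfall", "Tavg")))
rownames(m)

# ...but a data frame refuses them: the assignment raises an error
df <- data.frame(snowfall = 1:2, Tavg = 3:4)
res <- tryCatch({
  suppressWarnings(rownames(df) <- rownames(m))  # hide the accompanying warning
  "assigned"
}, error = function(e) conditionMessage(e))
res             # the error message instead of "assigned"
rownames(df)    # still the automatic names "1", "2"
```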

Stuck here, any ideas?

Here is a small portion of my data frame. Copy-pasting here won't keep it readable, so I entered one station as best I could (the quotes are not in the actual data). I do not have image rights yet. Thanks.

                    snowfall      sfc.wind             Tavg
EET - Alabaster, AL      "0"     "5.606221"       "45.38081"

Edit: I was shown in the comments below how to paste my matrix properly; here it is.

Sean's answer below fixes my issue; it is the accepted answer.

                      snowfall  sfc.wind      Tavg
EET - Alabaster, AL  0.00000000 4.5129950 39.490030
EET - Alabaster, AL  0.00000000 4.5047869 36.087611
EET - Alabaster, AL  0.00000000 5.0126637 39.441394
EET - Alabaster, AL  0.00000000 5.0111759 45.682309
EET - Alabaster, AL  0.00000000 2.8716592 42.776499
EET - Alabaster, AL  0.00000000 2.7937856 37.322987
EET - Alabaster, AL  0.00000000 2.5351705 36.701948
EET - Alabaster, AL  0.00000000 1.9576756 34.456469
EET - Alabaster, AL  0.00000000 1.6846636 34.150641
BHM - Birmingham, AL 0.00000000 4.5466909 38.533949
BHM - Birmingham, AL 0.00000000 4.4607041 34.891818
BHM - Birmingham, AL 0.00000000 5.1888168 38.405422
BHM - Birmingham, AL 0.00000000 5.4596529 44.992042
BHM - Birmingham, AL 0.00000000 3.0826392 42.159321
BHM - Birmingham, AL 0.00000000 2.8546392 36.715275
BHM - Birmingham, AL 0.00000000 2.5729845 36.133261
BHM - Birmingham, AL 0.00000000 2.0355549 33.933232
BHM - Birmingham, AL 0.00000000 1.7289972 33.543341
DCU - Decatur, AL    0.00122047 3.6517845 34.109912
DCU - Decatur, AL    0.00000000 3.6832448 31.485904
DCU - Decatur, AL    0.00000000 4.2819648 35.502855
DCU - Decatur, AL    0.00000000 5.2777885 43.234060
DCU - Decatur, AL    0.00003937 3.0233904 40.613362
DCU - Decatur, AL    0.00003937 2.7680023 35.587844
DCU - Decatur, AL    0.00000000 2.0555607 34.899179
DCU - Decatur, AL    0.00000000 1.4499551 32.708740
DCU - Decatur, AL    0.00000000 1.2004947 32.616132
  • http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column might help – user20650 Jan 23 '15 at 23:11
  • Hi, to format the code you can either indent each line by four spaces or highlight all the code and click the curly braces `{}`. Also, a better way to share data is using `dput`, as in `dput(yourdata[1:10, ])` to share the first ten rows. Also, if you upload an image to an external hosting website and add the link here, someone will embed it for you. – user20650 Jan 23 '15 at 23:14
  • Try using `dput()` to give an example of your data. E.g. post the results of `dput(yourDataFrame[1:3, ])` to give the first few rows. It sounds like your data is not actually a data frame? – Sean Hughes Jan 23 '15 at 23:15
  • Quick comment on your one row of data: it looks like character rather than numeric. You can check this with `str(yourdata)` – user20650 Jan 23 '15 at 23:20
  • R data frames can't have duplicate row names, but matrices can, which leads me to suspect your data is a matrix. Try `df <- data.frame(yourMatrix)`, which will drop the row names. Add them as a column with `df <- cbind(rownames(yourMatrix), df)`. Then you will at least have a data frame with the row names as a column to manipulate. – Sean Hughes Jan 23 '15 at 23:23
  • Thank you Sean and user20650. Yes, I've been working with str() for quite a while trying to find a suitable format so that calculations can be made. Let me try your code here, Sean. – adamweather33 Jan 23 '15 at 23:27
  • You say you're having problems with repeated row names and you only show one row. That does not help people solve the problem – Rich Scriven Jan 23 '15 at 23:29
  • Once you get a data frame with the data and locations as columns, this code will generate the results that (I think) you want: `library(dplyr); df %>% group_by(nameOfWeatherStationsColumn) %>% summarize(snowfall = sum(snowfall), wind = mean(sfc.wind), temp = mean(Tavg))`. If your data columns are still character or factor, use `df$snowfall <- as.numeric(as.character(df$snowfall))` on each column to make them numeric. – Sean Hughes Jan 23 '15 at 23:45
  • Sean! that code worked perfectly! Thank you so much. One thing I did change though is the cbind code. I omitted the rownames sections and it worked fine as: df <- cbind( myrownames, df) – adamweather33 Jan 24 '15 at 00:35
  • Also, I'm happy that it works with dplyr. A good chunk of my previous code that creates this data frame is written with dplyr. The original data scrape uses the rNOMADS package, which can pull any of NOAA's Numerical Weather Prediction data, in case any of you are interested. – adamweather33 Jan 24 '15 at 00:39
  • @SeanHughes or Adam: it may be good to post an answer if this is solved. Adam, it may also be good to edit your question with more example data so that it is useful for future users and searches. bw – user20650 Jan 24 '15 at 00:46
  • Absolutely, good idea. Can you show me again how to post code properly? I am trying to put sample code inside braces, like {sample <- code}, and it doesn't work. Also, should I copy and paste the output of `dput(yourdata[1:10, ])` into a text box here? That didn't work either. I'd like to be able to post my data frame properly. – adamweather33 Jan 24 '15 at 01:17
  • The icons above an open editing box include one that looks like paired curly braces. Select your code block and then click that icon. – IRTFM Jan 24 '15 at 01:22
  • Hi Adam, as BondedDust says, you can just copy and paste the results from `dput` into your question. You then highlight this and look for the curly braces icon (just above the active question area). The same applies to code: just copy and paste it into your question, highlight it, and hit the curly braces icon. – user20650 Jan 24 '15 at 01:42
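A sketch of the `dput()` workflow described in these comments, using a small matrix built from the sample rows in the question (the names `m`, `pasted` and `copy` are illustrative). `dput()` prints R code that rebuilds the object, duplicated row names included, so pasting that text into a question lets others recreate the data exactly.

```r
m <- matrix(c(0, 5.0126637, 39.441394,
              0, 5.0111759, 45.682309,
              0, 2.8716592, 42.776499),
            ncol = 3, byrow = TRUE,
            dimnames = list(rep("EET - Alabaster, AL", 3),
                            c("snowfall", "sfc.wind", "Tavg")))

# Prints code that rebuilds the first two rows; this is the text to paste
dput(m[1:2, ])

# Simulate a reader pasting that text back into R
pasted <- paste(capture.output(dput(m[1:2, ])), collapse = "\n")
copy <- eval(parse(text = pasted))
identical(copy, m[1:2, ])
```

Unlike a copy of the printed table, the `dput()` text also preserves the storage type, so a reader can see at once whether the columns are character or numeric.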

2 Answers

Try using tapply, assuming that your matrix is named dat (its duplicated row names identify the stations):

tapply(dat[, "snowfall"], rownames(dat),
       function(x) sum(as.numeric(x), na.rm = TRUE))
#-----------
EET - Alabaster, AL    
                     0 
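Extending the same `tapply()` idea to all three variables in base R might look like the sketch below. It assumes `dat` is the station matrix; the four-row, two-station subset is illustrative only, with values copied from the question. Each column gets its own summary function, and since `tapply()` orders its results by the sorted station names, the three results line up row by row.

```r
dat <- matrix(
  c(0.00122047, 3.6517845, 34.109912,
    0.00003937, 3.0233904, 40.613362,
    0.00000000, 4.5466909, 38.533949,
    0.00000000, 4.4607041, 34.891818),
  ncol = 3, byrow = TRUE,
  dimnames = list(rep(c("DCU - Decatur, AL", "BHM - Birmingham, AL"), each = 2),
                  c("snowfall", "sfc.wind", "Tavg"))
)

station <- rownames(dat)
# as.numeric() also covers a matrix that is stored as character
snow <- tapply(as.numeric(dat[, "snowfall"]), station, sum,  na.rm = TRUE)
wind <- tapply(as.numeric(dat[, "sfc.wind"]), station, mean, na.rm = TRUE)
temp <- tapply(as.numeric(dat[, "Tavg"]),     station, mean, na.rm = TRUE)

# One row per station; all three results share the same sorted station order
result <- data.frame(station  = names(snow),
                     snowfall = as.vector(snow),
                     sfc.wind = as.vector(wind),
                     Tavg     = as.vector(temp),
                     row.names = NULL, stringsAsFactors = FALSE)
result
```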
IRTFM
  • Yep, that works well, but dplyr does all the variables in a few lines of code. I appreciate the post. Glad to see it works in base R as well. – adamweather33 Jan 24 '15 at 01:18

R data frames can't have duplicate row names, but matrices can. You need a data frame, so you can have data of different types in different columns. When you convert a matrix with duplicate row names to a data frame, the row names get dropped, so you need to add them back as a column.

df <- data.frame(yourMatrix) # convert to data frame, drop row names
df <- cbind(station = rownames(yourMatrix), df) # add row names as column 
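The same conversion can be sketched on a small character matrix like the one first pasted in the question (values copied from the question; `num_cols` is a hypothetical helper name). Passing `row.names = NULL` to `data.frame()` keeps it from using the duplicated row names, and the measurement columns are then converted to numeric through `as.character()`, as suggested in the comments, which also handles columns that were read in as factors.

```r
# Character matrix with duplicated station row names, as in the question
yourMatrix <- matrix(
  c("0", "5.606221", "45.38081",
    "0", "4.5129950", "39.490030"),
  ncol = 3, byrow = TRUE,
  dimnames = list(rep("EET - Alabaster, AL", 2),
                  c("snowfall", "sfc.wind", "Tavg"))
)

# Station names become an ordinary column; row.names = NULL gives plain 1, 2, ...
df <- data.frame(station = rownames(yourMatrix), yourMatrix,
                 row.names = NULL, stringsAsFactors = FALSE)

# Convert the measurement columns to numeric (works for character or factor)
num_cols <- c("snowfall", "sfc.wind", "Tavg")
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))
str(df)
```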

To apply operations to all rows with the same weather station, use dplyr.

library(dplyr)
df %>%
    group_by(station) %>%            # one group per weather station
    summarize(
        snowfall = sum(snowfall),    # total snowfall
        wind = mean(sfc.wind),       # average wind speed
        temp = mean(Tavg)            # average temperature
    )
Sean Hughes