-1

I have a data frame called "data", that has "date, month, discharge, and station" columns. Another data frame called "perc" that has "month, W1_Percentile, and B1_Percentile" columns. W1_Percentile and B1_Percentile are the monthly percentile values for each of the gauging stations. I want my final output to have columns same as in df(data) with an additional column for "Percentile" that will have the percentile values for the respective month and gauging station (percentile values of each gauging station for the respective months is stored in df(perc)). What steps should I follow?

Here is the sample of input data:

date <- as.Date(c('1950-03-12','1954-03-23','1991-06-27','1997-09-04','1991-06-27','1987-05-06','1987-05-29','1856-07-08','1993-06-04', '2001-09-19','2001-05-06','2001-05-27'))
month <- c('Mar','Mar','Jun','Sep','Jun','May','May','Jul','Jun','Sep','May','May')
disch <- c(125,1535,1654,154,4654,453,1654,145,423,433,438,6426)
station <- c('W1','W1','W1','W1','W1','W1','B1','B1','B1','B1','B1','B1')
data <- data.frame("Date"= date, "Month" = month,"Discharge"=disch,"station"=station)

      Date Month Discharge station
1  1950-03-12   Mar       125      W1
2  1954-03-23   Mar      1535      W1
3  1991-06-27   Jun      1654      W1
4  1997-09-04   Sep       154      W1
5  1991-06-27   Jun      4654      W1
6  1987-05-06   May       453      W1
7  1987-05-29   May      1654      B1
8  1856-07-08   Jul       145      B1
9  1993-06-04   Jun       423      B1
10 2001-09-19   Sep       433      B1
11 2001-05-06   May       438      B1
12 2001-05-27   May      6426      B1

Month <- c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
W1 <- c(106,313,531.40,164.10,40,23.39,18.30,24,16,16,12,34)
B1 <- c(1330,1550,1948,1880,1260,853.15,680.15,486.10,503,625,738,1070)
perc <- data.frame("Month"=Month,"W1_Percentile"=W1,"B1_Percentile"=B1)

 Month W1_Percentile B1_Percentile
1    Jan        106.00       1330.00
2    Feb        313.00       1550.00
3    Mar        531.40       1948.00
4    Apr        164.10       1880.00
5    May         40.00       1260.00
6    Jun         23.39        853.15
7    Jul         18.30        680.15
8    Aug         24.00        486.10
9    Sep         16.00        503.00
10   Oct         16.00        625.00
11   Nov         12.00        738.00
12   Dec         34.00       1070.00

This is how I want the final output to look like:

         Date Month Discharge station Percentile
1  1950-03-12   Mar       125      W1     531.40
2  1954-03-23   Mar      1535      W1     531.40
3  1991-06-27   Jun      1654      W1      23.39
4  1997-09-04   Sep       154      W1      16.00
5  1991-06-27   Jun      4654      W1      23.39
6  1987-05-06   May       453      W1      40.00
7  1987-05-29   May      1654      B1    1260.00
8  1856-07-08   Jul       145      B1     680.15
9  1993-06-04   Jun       423      B1     853.15
10 2001-09-19   Sep       433      B1     503.00
11 2001-05-06   May       438      B1    1260.00
12 2001-05-27   May      6426      B1    1260.00
Roger
  • 33
  • 4
  • You can literally just say `data$Percentile <- percent` – heds1 Jun 11 '19 at 21:52
  • Yes.. this question (even after the edit) does not make much sense. In `perc` there is not no data for A1, and your Percent data is sequentially same as the Percentile column of your expected output. – ghosh'. Jun 11 '19 at 21:54
  • @ghos-h : You're right, the question made no sense to me as well after re-reading it! Apologies for having several errors in the input data and not being clear with the question. I've edited it, hopefully it's much easier to understand now. – Roger Jun 12 '19 at 16:05
  • @Gregor : Apologies for the errors in the input data and for not being very clear with the question. I've edited the code and made it slightly cleaner now. Very new to coding, R, and SO... Slowly learning the basics. Once again, apologies for a clumsy presentation and hopefully I can get some solution. – Roger Jun 12 '19 at 16:16
  • @heds1: Thanks for the suggestion. However that's not exactly how I want the output to look like. Apologies for not being clear with the question and errors in the input data. I've edited the post. Hopefully, its more clear now. – Roger Jun 12 '19 at 16:28

1 Answers1

0

We need to first convert your perc data into a long format so that we have the columns we want to add to data, then it's a simple join:

library(tidyr)
library(dplyr)

# make the column names the same as the values in data
names(perc)[2:3] = c("W1", "B1")
# convert to long format
perc_long = gather(perc, key = "station", value = "percentile", W1, B1)

# join
left_join(data, perc_long)
# Joining, by = c("Month", "station")
#          Date Month Discharge station percentile
# 1  1950-03-12   Mar       125      W1     531.40
# 2  1954-03-23   Mar      1535      W1     531.40
# 3  1991-06-27   Jun      1654      W1      23.39
# 4  1997-09-04   Sep       154      W1      16.00
# 5  1991-06-27   Jun      4654      W1      23.39
# 6  1987-05-06   May       453      W1      40.00
# 7  1987-05-29   May      1654      B1    1260.00
# 8  1856-07-08   Jul       145      B1     680.15
# 9  1993-06-04   Jun       423      B1     853.15
# 10 2001-09-19   Sep       433      B1     503.00
# 11 2001-05-06   May       438      B1    1260.00
# 12 2001-05-27   May      6426      B1    1260.00

There are many ways to do these operations, it's essentially a combination of two R-FAQs. For additional reference see

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294