1

I have the following dataframes. I would like to match values in column X across the two data frames (values of X across the two columns are the same) and add the values in Cost column and create a new dataframe that contains column X, column lat, column long and then shows consolidated/summed up values as shown below

data1
        X    Cost    lat       long        
    1 RA123  60     -113.8       NA
    2 RA124  50     -113.8       NA
    3 RB131  40     -113.8       NA
    4 RB132  30     -113.8       NA
    5 RB133  20     -113.8       NA
    6 RC134  10     -113.8       NA
    7 RC135  No     -113.9       NA
    8 RD136         -113.7       NA

data2
   X     Cost    lat       long        
1 RA123  10    -113.8      NA
2 RA124  20    -113.8       NA
3 RB131  40    -113.8      NA
4 RB132  30    -113.8       NA
5 RB133  50    -113.8       NA
6 RC134  60    -113.8       NA
7 RC135  No     -113.9       NA
8 RD136         -113.7       NA

consolidated # this dataframe contains sum of cost from dataframe 1 and 2 but also has original columns

       X     Cost    lat       long        
    1 RA123  70     -113.8       NA
    2 RA124  70     -113.8       NA
    3 RB131  80     -113.8       NA
    4 RB132  60     -113.8       NA
    5 RB133  70     -113.8       NA
    6 RC134  70     -113.8       NA
    7 RC135  No     -113.9       NA
    8 RD136         -113.7       NA

Some values in "Cost" are text strings and some are blank/no values. I'd like to ignore text strings for now and where there is no value/empty cell, I'd like an NA.

Update: I used the following code to get the summed up values for Cost and retain all original columns and overlook text/string and NA values i.e. wherever there is a text string, the value shows NA. The empty cells where there was no value entered also shows NA:

Consolidated$Cost<- as.numeric(as.character(Consolidated$cost.x)) + as.numeric(as.character(Consolidated$cost.y))

Now, what I need to do be able to do is wherever there were text strings, I need to be able to display "TextValue" as the value as opposed to NA. Just to highlight, whatever the string value might be, it will be the same for any given value of X across data1 and data2 both.

Update: I've had to rename those rows in consolidated column as "TextValue" using:

Consolidated$Cost[is.na(Consolidated$Cost)] <- "TextValue" 

I was hoping to do so within the first step itself. But I guess this works too.

oivemaria
  • 453
  • 4
  • 20
  • Tell us more about column `X`, is it the same in `data1` and `data2`? – Julius Vainora Feb 09 '16 at 22:23
  • Oh yes, sorry. Column in X is the same. May not be in the same order but the X values will be same across the two columns – oivemaria Feb 09 '16 at 22:23
  • `data2$Cost <-data2$Cost + data1$Cost[match(data2$X, data1$X)]` – Stephen Henderson Feb 09 '16 at 22:26
  • I just ran this code and got an error and noted that there are some string values and empty cells in Cost column as well. I want string values to remain as it is and empty cells to display NA. Also, your code creates cost in the same dataset data2. I want a fresh dataframe that contains original columns and the cost column should be the summed up/consolidated value of the original 2 dataframes – oivemaria Feb 09 '16 at 22:36
  • The solution I found, not considering non numeric values, was: data2[data2$X %in% data1$X,]$Cost <- data2[data2$X %in% data1$X,]$Cost + data1[data1$X %in% data2$X,]$Cost – pnunes540 Sep 30 '17 at 17:53

1 Answers1

1

I think you could do this with a simple merge and addition. First to setup a subset of your data, with the aforemention empty string values:

data1 <- data.frame(X=c("RA123", "RA124", "RA131"),
                    COST=c("60", "50", ""),
                    lat=c(-113.8,-113.8, -113.8),
                    long = c("NA", "NA", "NA"))
data2 <- data.frame(X=c("RA123", "RA124", "RA131"),
                    COST=c("10", "20", "30"),
                    lat=c(-113.8,-113.8, -113.8),
                    long = c("NA", "NA", "NA"))

Then you can remove the empty string values from your data frames by doing:

data1 <-subset(data1,COST!="")

Then you can convert your factor COST column to a numeric as described in this SO post:

data1$COST <- as.numeric(as.character(data1$COST))
data2$COST <- as.numeric(as.character(data2$COST))

This will replace your empty string values with "NA" numeric values. Once in that form a numeric "NA" + any other numeric just equals another "NA". So you can continue on with the merge I originally suggested:

combined <- merge(data1,data2, by="X")
combined$COST <- combined$COST.x + combined$COST.y

From there you can drop columns using a subset(combined, select = -c(lat.x,long.x,...) ). You also can add the all=TRUE or all.x=TRUE or all.y=TRUE to change the behavior in terms of keeping the values from data1 or data2. See this SO question for more details on the merge function.

Community
  • 1
  • 1
JHowIX
  • 1,683
  • 1
  • 20
  • 38
  • Thanks. Merging is fine by challenge is when adding up cost values - some values in that column are text and some cells are blank. I tried is.numeric but it won't overlook text and wont put NA where the cells are empty – oivemaria Feb 09 '16 at 23:14
  • should be easily accomplished with some pre-prossing, see edits – JHowIX Feb 10 '16 at 00:31
  • Thanks - but what Im trying to get is to not sum those value of cost where there is a string or an NA but I still want those X values for which cost has a string or an NA to show up in the consolidated dataset. Subsetting removes those values which isn't what I want – oivemaria Feb 10 '16 at 04:25
  • see edits, please let me know if you have any problems – JHowIX Feb 12 '16 at 15:06
  • Thanks - i used part of your suggestion and merged the items but had to rename those rows in the consolidated column that had text values in the individual columns i.e. textvalue1 in C1 and textvalue 2 in C2, therefore consolidated column had NA. I renamed those values in consolidated column as "text value". Updated the code in description to show what i did – oivemaria Feb 13 '16 at 14:18