Plots from R Loop using two, different sized, dataframes

Question

I have two dataframes of different sizes: one has about 300 rows and the other about 30. The sizes vary depending on the input selected. I have written R code that plots the output of a loop for each input, but I can't figure out how to put all the iterations onto a single chart. There are numerous articles about multiple plots, but none has worked for my case so far: differently sized dataframes, with all iterations (each a different size) on one chart (not multiple charts on one page, but one chart on one page). Below is the code used to generate the individual charts. I just can't figure out how to get them all on the same chart.

WellS <- rep(WellSelect[i], length(EW))
WellC <- rep(WellSelect[i], length(X))
dfSurvey <- data.frame(Well = WellS, MD = MD, EW = EW, NS = NS, TVD = TVD)
dfCalc <- data.frame(Well = WellC, Perf = P, X = X, Y = Y, TVDp = TVDp)

The code above compiles calculations (not shown here) into the dataframes dfSurvey and dfCalc. Note that "WellSelect" is the primary variable that drives the input of raw data for the calculations. There could be anywhere from 2 to 4000+ unique "WellSelect" values, each with 2 dataframes of the sizes mentioned in the first sentence, all unique to that "WellSelect". Everything works except the chart mentioned above. I've tried to bind the dataframes, but I don't know how to do that with dataframes of different sizes.

pname <- WellSelect[i]
p <- ggplot() + geom_point(data = dfSurvey, aes(x=EW, y=NS), shape = 2, size = 2, color = "blue1") +
  geom_point(data = dfCalc, aes(x=X, y=Y), shape = 17, size = 5, color = "Chartreuse3") +
  ggtitle(pname)
ggsave(paste0(pname, ".png"), p)
print(p)

Note that "dfSurvey" is the larger dataframe and "dfCalc" is the smaller. I'd appreciate some guidance.

EDITED TO INCLUDE DATASET AND EXAMPLE PLOTS:

Here is the plot I currently get with the coding:

Each "WellSelect" currently generates its own plot

This is what I am trying to achieve:

Combined plot, note red triangles represent "dfCalc" and solid lines are "dfSurvey"

There are abbreviated example datasets for "DS" and "Perf" at these links:

DS (note that plotted variables are EW vs. NS): https://drive.google.com/open?id=0B5pFHCTpv6BWTUh3MWJoaVhaT0kxZzJFVWJ4QTFaM0Q5S29j

Perf: https://drive.google.com/open?id=0B5pFHCTpv6BWMjhLZnF3Zk9mM0hZaXYxLWVKUlBnWXlPQ0xB

I have included the entire breadth of the code below which should run using the files above with the culmination being the individual plots as shown previously. The code is not efficient, I know, but I am new to this so I just need something that works for now.

library(ggplot2)

DS <- read.csv(file = "DirectionalSurveys.csv")
Perf <- read.csv(file = "Perforation.csv")

colnames(DS) <- c("IDWELL", "API", "WellName", "Division", "MD", "INCL", "AZIM", "NS", "EW", "TVD", "DLS")
colnames(Perf) <- c("IDWELL", "API", "WellName", "County", "MidPerfMD", "MidPerfTVD")

WellSelect <- c("LINDA GREATHOUSE BRK 1", "LINDA GREATHOUSE BRK 3", "LINDA GREATHOUSE BRK 5", "LINDA GREATHOUSE BRK 205",
            "BARRY GREATHOUSE A 5", "BARRY GREATHOUSE A 10", "BARRY GREATHOUSE B 3")

for(i in seq_along(WellSelect)) {

    S <- DS$MD[DS$WellName == WellSelect[i]]
    P <- Perf$MidPerfMD[Perf$WellName == WellSelect[i]]
    INCL <- DS$INCL[DS$WellName == WellSelect[i]]
    AZIM <- DS$AZIM[DS$WellName == WellSelect[i]]
    NS <- DS$NS[DS$WellName == WellSelect[i]]
    EW <- DS$EW[DS$WellName == WellSelect[i]]
    TVD <- DS$TVD[DS$WellName == WellSelect[i]]

    #Subset to get the survey depths deeper than "P"
    resultGT <- outer(S, P, '>=')
    resultGT[resultGT == FALSE] <- 50
    rownames(resultGT) <- paste0(S)
    colnames(resultGT) <- paste0("P=", P)
    minGT <- as.numeric(rownames(resultGT)[apply(resultGT , 2, which.min)])

    #P is mid-perf MD for each stage, Deep is Survey depth below P, Shallow is Survey depth above P

    deep <- S[match(minGT, S)]
    shallow <- S[match(minGT, S) - 1]

    #Subset "DS" to WellSelect
    Sub1 <- DS[DS$WellName == WellSelect[i], ]

    #Subset Sub1 to get the Survey data
    Sub2 <- Sub1[ , 5]

    #Match deep and shallow to the Survey depths to get location in DS
    deepRow <- match(deep, Sub2)
    shallowRow <- match (shallow, Sub2)

    #Pull the other data for deep and shallow from DS
    deepData <- Sub1[deepRow, ]
    shallowData <- Sub1[shallowRow, ]

    #Calculate Survey Variables

    AA29 <- 2*3.1416/360
    AY <- shallowData[ , "INCL"] + ((P - shallowData[ , "MD"]) / (shallowData[ , "MD"] - deepData[ , "MD"]) * (shallowData[ , "INCL"] - deepData[ , "INCL"] ))
    AZ <- shallowData[ , "AZIM"] + ((P - shallowData[ , "MD"]) / (shallowData[ , "MD"] - deepData[ , "MD"]) * (shallowData[ , "AZIM"] - deepData[ , "AZIM"] ))
    BA <- 0.000001 + acos(cos(AY * AA29 - shallowData[ , "INCL"] * AA29) - sin(shallowData[ , "INCL"] * AA29) * sin(AY * AA29) * (1 - cos(shallowData[ , "AZIM"] * AA29 - AZ * AA29)))
    BB <- 2 / BA * (tan(BA / 2))

    ##NOTE:  "X" and "Y" below are the plotted variables for the red triangles shown on the plots previously##

    Y <- (P - shallowData[ , "MD"]) * ((sin(AY * AA29) * cos(AZ * AA29)) + (sin(shallowData[ , "INCL"] * AA29) * cos(shallowData[ , "AZIM"] * AA29))) / 2 * BB + shallowData[ , "NS"]
    X <- (P - shallowData[ , "MD"]) * ((sin(AY * AA29) * sin(AZ * AA29)) + (sin(shallowData[ , "INCL"] * AA29) * sin(shallowData[ , "AZIM"] * AA29))) / 2 * BB + shallowData[ , "EW"]
    TVDp <- (P - shallowData[ , "MD"]) * (cos(AY * AA29) + cos(shallowData[ , "INCL"] * AA29)) / 2 * BB + shallowData[ , "TVD"]

    #***********************************************************#
    #Calculations all done, now on to the graphing process......#
    #***********************************************************#

    #fill in "WellSelect to match length of dataframe
    WellS <- rep(WellSelect[i], length(EW))
    WellC <- rep(WellSelect[i], length(X))

    #build dataframes for plots        
    dfSurvey <- data.frame(Well = WellS, MD = S, EW = EW, NS = NS, TVD = TVD)
    dfCalc <- data.frame(Well = WellC, Perf = P, X = X, Y = Y, TVDp = TVDp)
    dfSurvey <- dfSurvey[order(dfSurvey$Well, dfSurvey$MD), ]
    dfCalc <- dfCalc[order(dfCalc$Well, dfCalc$Perf), ]

    ###WORKS!!!! but just coded to save each plot and not combine
    pname <- WellSelect[i]
    p <- ggplot() + geom_point(data = dfSurvey, aes(x=EW, y=NS), shape = 2, size = 2, color = "blue1") +
        geom_point(data = dfCalc, aes(x=X, y=Y), shape = 17, size = 5, color = "Chartreuse3") + ggtitle(pname)
    ggsave(paste0(pname, ".png"), p)
    print(p)
}

Hope this is useful. Please let me know if you need anything else. Thanks for the help!

Please create and include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with a section of your data so we can attempt to solve the problem — Jack Brookes, Mar 27 '18 at 22:22
Thanks for the reply. Please see the edit above which includes the complete code, pictures of current and desired output and hyperlinks to example datasets. — JDC, Mar 28 '18 at 13:45
Where in your plotting are there lines? You only plot points. — Parfait, Mar 28 '18 at 14:25
Points are the best format for "dfCalc". Line or point is acceptable for "dfSurvey". So I just left them both as points. Feel free to change "dfSurvey" to line if you wish — JDC, Mar 28 '18 at 14:32

Parfait · Accepted Answer · 2018-03-28T15:51:25.040

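Consider binding the per-well dataframes into single compiled dataframes (one for the survey data, one for the calculated data) and then using the group and colour aesthetics of ggplot to tell the wells apart. rbind only needs the columns to match, so the differing row counts are not a problem.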

Specifically, replace the for loop:

for(i in seq_along(WellSelect)) {
    ...
}

with an lapply call that builds a list of dataframes, and remove all the plotting lines (plotting is done once, later):

df_lists <- lapply(seq_along(WellSelect), function(i) {
    # ... same code

    # build dataframes for plots        
    dfSurvey <- data.frame(Well = WellS, MD = S, EW = EW, NS = NS, TVD = TVD)
    dfCalc <- data.frame(Well = WellC, Perf = P, X = X, Y = Y, TVDp = TVDp)
    dfSurvey <- dfSurvey[order(dfSurvey$Well, dfSurvey$MD), ]
    dfCalc <- dfCalc[order(dfCalc$Well, dfCalc$Perf), ]

    return(list(dfSurvey, dfCalc))   
})

# COMPILED DATAFRAMES
dfSurveyAll <- do.call(rbind, lapply(df_lists, "[[", 1))

dfCalcAll <- do.call(rbind, lapply(df_lists, "[[", 2))
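To see why the differing sizes do not matter here, this is a minimal sketch with made-up data. The well names, row counts and values are all hypothetical stand-ins, not the real survey files; only the lapply / do.call(rbind, ...) pattern is the same as above.

```r
# Toy stand-ins for the per-well results (hypothetical values, not the real data).
wells <- c("A 1", "B 2", "C 3")

df_lists <- lapply(seq_along(wells), function(i) {
  n_survey <- 4 + i   # survey table: a different row count for each well
  n_calc   <- i       # calc table: much smaller, also varies per well

  dfSurvey <- data.frame(Well = rep(wells[i], n_survey),
                         MD   = seq_len(n_survey) * 100,
                         EW   = seq_len(n_survey) * 10 * i,
                         NS   = seq_len(n_survey) * 5)
  dfCalc   <- data.frame(Well = rep(wells[i], n_calc),
                         Perf = seq_len(n_calc) * 150,
                         X    = seq_len(n_calc) * 12 * i,
                         Y    = seq_len(n_calc) * 6)
  list(dfSurvey, dfCalc)
})

# Stack like with like: element 1 of every pair, then element 2 of every pair.
dfSurveyAll <- do.call(rbind, lapply(df_lists, "[[", 1))
dfCalcAll   <- do.call(rbind, lapply(df_lists, "[[", 2))

nrow(dfSurveyAll)        # 5 + 6 + 7 = 18 rows
nrow(dfCalcAll)          # 1 + 2 + 3 = 6 rows
table(dfSurveyAll$Well)  # rows contributed by each well
```

The survey tables are only ever bound to other survey tables, and the calc tables to other calc tables, so the two kinds never need to have the same dimensions.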

Then build one single plot, mapping Well to the group and colour aesthetics:

p <- ggplot() + 
       geom_point(data = dfSurveyAll, aes(x = EW, y = NS, group = Well, colour = Well), 
                  shape = 2, size = 2) +
       geom_point(data = dfCalcAll, aes(x = X, y = Y, group = Well, colour = Well), 
                  shape = 17, size = 5) +
       # pname only existed inside the loop, so use a fixed title here
       ggtitle("All selected wells")
p
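If you would rather have the survey drawn as lines, as in your target picture, something like the sketch below might work. The data are invented and the names survey_all, calc_all and p_demo are placeholders for illustration. It swaps geom_point for geom_path on the survey layer, which connects rows in the order they appear, so it assumes each well's rows are already sorted by MD.

```r
library(ggplot2)

# Hypothetical combined tables, shaped like dfSurveyAll and dfCalcAll.
survey_all <- data.frame(
  Well = rep(c("A 1", "B 2"), times = c(6, 4)),
  EW   = c(0, 10, 20, 30, 40, 50,   0, -10, -20, -30),
  NS   = c(0,  5, 15, 30, 50, 75,   0,   8,  20,  36)
)
calc_all <- data.frame(
  Well = c("A 1", "A 1", "B 2"),
  X    = c(15, 35, -15),
  Y    = c(10, 40, 14)
)

p_demo <- ggplot() +
  # One path per well; Well is an unquoted column name inside aes()
  geom_path(data = survey_all,
            aes(x = EW, y = NS, group = Well, colour = Well)) +
  # Calculated points use the same colour mapping, so they match their well
  geom_point(data = calc_all,
             aes(x = X, y = Y, colour = Well),
             shape = 17, size = 5) +
  ggtitle("All wells (toy data)")

print(p_demo)
```

Note that the column name goes inside aes() without quotes. A quoted "Well" is treated as a constant string, so every row ends up in the same group and colour.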

You could also use by, since you are subsetting the DS dataframe by the WellName factor. The blocks below, inside the for loop:

for(i in seq_along(WellSelect)) {
    S <- DS$MD[DS$WellName == WellSelect[i]]
    P <- Perf$MidPerfMD[Perf$WellName == WellSelect[i]]
    INCL <- DS$INCL[DS$WellName == WellSelect[i]]
    AZIM <- DS$AZIM[DS$WellName == WellSelect[i]]
    NS <- DS$NS[DS$WellName == WellSelect[i]]
    EW <- DS$EW[DS$WellName == WellSelect[i]]
    TVD <- DS$TVD[DS$WellName == WellSelect[i]]
    ...
    Sub1 <- DS[DS$WellName == WellSelect[i], ]
    ...
    WellS <- rep(WellSelect[i], length(EW))
    WellC <- rep(WellSelect[i], length(X))
    ...
}

can be replaced with by, whose function argument, sub, receives the subset of DS for a single well. Perf is a separate dataframe, so it still has to be filtered by well name. Here, by returns a named list of inner lists of two dataframes, the same structure as the lapply above.

df_lists <- by(DS, DS$WellName, FUN=function(sub) {

    S <- sub$MD
    P <- Perf$MidPerfMD[Perf$WellName == sub$WellName[1]]
    INCL <- sub$INCL
    AZIM <- sub$AZIM
    NS <- sub$NS
    EW <- sub$EW
    TVD <- sub$TVD

    ...
    Sub1 <- sub
    ...
    WellS <- rep(sub$WellName[1], length(EW))
    WellC <- rep(sub$WellName[1], length(X))

    # build dataframes for plots        
    # ... same as lapply above

})
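As a rough, self-contained illustration of the by version: DS_demo and Perf_demo below are hypothetical toy tables, and the calculation step is left out entirely. The point is only that the result can be unpacked with the same lapply(..., "[[", n) calls as before.

```r
# Toy stand-ins for DS and Perf (hypothetical values).
DS_demo <- data.frame(
  WellName = rep(c("A 1", "B 2"), times = c(4, 3)),
  MD       = c(100, 200, 300, 400, 100, 200, 300),
  EW       = c(0, 10, 20, 30, 0, -5, -10),
  stringsAsFactors = FALSE
)
Perf_demo <- data.frame(
  WellName  = c("A 1", "A 1", "B 2"),
  MidPerfMD = c(150, 350, 250),
  stringsAsFactors = FALSE
)

# by() splits DS_demo by well and passes each piece to the function as `sub`.
res <- by(DS_demo, DS_demo$WellName, FUN = function(sub) {
  well <- sub$WellName[1]
  # Perf_demo is not split by by(), so filter it by hand
  P <- Perf_demo$MidPerfMD[Perf_demo$WellName == well]
  list(
    data.frame(Well = well, MD = sub$MD, EW = sub$EW),
    data.frame(Well = well, Perf = P)
  )
})

names(res)  # one entry per well

# Same unpacking as with the lapply version
survey_all <- do.call(rbind, lapply(res, "[[", 1))
calc_all   <- do.call(rbind, lapply(res, "[[", 2))
```

One difference from the lapply version: by processes every well present in DS, not only those listed in WellSelect, so you may want to subset DS to the selected wells first.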

thanks for the reply. I have tried binding the dataframes together but keep getting error message due to the difference in the dimensions of the dataframes, but I will try your proposed changes. Also I have attempted to use the "Apply" family numerous times to no avail. I do recognize that I am inexperienced with R and most likely my failures have been operator error! — JDC, Mar 28 '18 at 15:04
so I did the modifications you suggested with regard to the `lapply` and `rbind` functions. I had to tweak a couple of parentheses and remove the `colour = Well`, but it worked perfectly after that! I will try the remainder of your suggestion, but right now it is working great! Thanks a bunch for your quick assistance and spot-on feedback! — JDC, Mar 28 '18 at 15:43
ok I will check on the `aes()`. Thanks again @Parfait. PS - apparently I can't upvote yet since I'm new, but I did give you some clicks that apparently are recorded somewhere.... — JDC, Mar 28 '18 at 15:56
