1

I know that it might not look like it from this question, but I've actually been programming for over 20 years, but I'm new to R. I'm trying to move away from Excel and to automate creation of about 100 charts I currently do in Excel by hand. I've asked two previous questions about this: here and here. Those solutions work for those toy examples, but when I try the exact same code on my own full program, they behave very differently and I'm completely befuddled as to why. When I run the program below, the testplot.png file is just a plot of the line, without the stacked bar chart.

So here is my (full) code as cut down as I can make it. If anyone wants to critique my programming, go ahead. I know that the comments are light, but that's to try to shorten it for this post. Also, this does actually download the USDA PSD database which is about 20MB compressed and is 170MB uncompressed...sorry but I would love someone's help on this!

Edit, here are str() outputs of both 'full' data and 'toy' data. The toy data works, the full data doesn't.

> str(melteddata)
Classes ‘data.table’ and 'data.frame':  18 obs. of  3 variables:
 $ Year    : int  1 2 3 4 5 6 1 2 3 4 ...
 $ variable: Factor w/ 3 levels "stocks","exports",..: 1 1 1 1 1 1 2 2 2 2 ...
 $ Qty     : num  2 4 3 2 4 3 4 8 6 4 ...
 - attr(*, ".internal.selfref")=<externalptr> 
> str(SoySUHist)
Classes ‘data.table’ and 'data.frame':  159 obs. of  3 variables:
 $ Year    : int  1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ...
 $ variable: Factor w/ 3 levels "Stocks","DomCons",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Qty     : num  0.0297 0.0356 0.0901 0.1663 0.3268 ...
 - attr(*, ".internal.selfref")=<externalptr> 
> str(linedata)
Classes ‘data.table’ and 'data.frame':  6 obs. of  2 variables:
 $ Year: int  1 2 3 4 5 6
 $ Qty : num  15 16 15 16 15 16
 - attr(*, ".internal.selfref")=<externalptr> 
> str(SoyProd)
Classes ‘data.table’ and 'data.frame':  53 obs. of  2 variables:
 $ Year: int  1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ...
 $ Qty : num  701 846 928 976 1107 ...
 - attr(*, ".internal.selfref")=<externalptr> 
> 





library(data.table)
library(ggplot2)
library(ggthemes)
library(plyr)

toyplot <- function(plotdata,linedata){
  plotCExp <- ggplot(plotdata) +
    geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity") +
    geom_line(data=linedata, aes(x=Year,y=Qty)) # <---- comment out this line & the stack plot works

  ggsave(plotCExp,filename = "ggsavetest.png", width=7, height=5, units="in")

}

convertto <- function(value,crop,unit='BU'){
  if (unit=='BU' & ( crop=='WHEAT' | crop=='SOYBEANS')){
    value = value * 36.7437
  } 
  return(value)
}

# =====================================
# Download Data (Warning...large download!) 
# =====================================
system("curl https://apps.fas.usda.gov/psdonline/download/psd_alldata_csv.zip | funzip > DATA/psd.csv")
tmp <- fread("DATA/psd.csv")
PSD = data.table(tmp)
rm(tmp)
setkey(PSD,Country_Code,Commodity_Code,Attribute_ID)
tmp=unique(PSD[,.(Commodity_Description,Attribute_Description,Commodity_Code,Attribute_ID)])
tmp[order(Commodity_Description)]
names(PSD)[names(PSD) == "Market_Year"] = "Year"
names(PSD)[names(PSD) == "Value"] = "Qty"

PSDCmdtyAtt = unique(PSD[,.(Commodity_Code,Attribute_ID)])

# Soybean Production, Consumpion, Stocks/Use
SoyStocks = PSD[list("US",2222000,176),.(Year,Qty)]  # Ending Stocks
SoyExp    = PSD[list("US",2222000,88),.(Year,Qty)]  # Exports
SoyProd   = PSD[list("US",2222000,28),.(Year,Qty)]   # Total Production
SoyDmCons = PSD[list("US",2222000,125),.(Year,Qty)]  # Total Dom Consumption

SoyStocks$Qty = convertto(SoyStocks$Qty,"SOYBEANS","BU")/1000
SoyExp$Qty    = convertto(SoyExp$Qty,"SOYBEANS","BU")/1000
SoyProd$Qty   = convertto(SoyProd$Qty,"SOYBEANS","BU")/1000
SoyDmCons$Qty = convertto(SoyDmCons$Qty,"SOYBEANS","BU")/1000

# Stocks/Use
SoySUPlot <- SoyExp
names(SoySUPlot)[names(SoySUPlot) == "Qty"] = "Exports"
SoySUPlot$DomCons    = SoyDmCons$Qty
SoySUPlot$Stocks     = SoyStocks$Qty
SoySUHist <- melt(SoySUPlot,id.vars="Year")
SoySUHist$Qty = SoySUHist$value/1000
SoySUHist$value <- NULL
SoySUPlot$StocksUse = 100*SoySUPlot$Stocks/(SoySUPlot$DomCons+SoySUPlot$Exports)

SoySUPlot$Production = SoyProd$Qty/1000
SoySUHist$variable <- factor(SoySUHist$variable, levels = rev(levels(SoySUHist$variable)))
SoySUHist = arrange(SoySUHist,variable)

toyplot(SoySUHist,SoyProd)
Community
  • 1
  • 1
YourEconProf
  • 191
  • 2
  • 9
  • Learn to create a minimal reproducible example. That's a useful and important skill for a programmer. You are expected to provide one when asking a question at Stack Overflow. Downloading a huge dataset is not part of a minimal reproducible example for a question about plotting. – Roland Oct 16 '16 at 00:46
  • http://stackoverflow.com/a/5963610/1412059 – Roland Oct 16 '16 at 00:48
  • I did, if you look at both of the links above, there are two minimal, easily reproducible examples. However, when it is applied to the full dataset, those solutions don't work. Hence the attached, fully repoducible, but not minimal, solution. – YourEconProf Oct 16 '16 at 00:54
  • That's what I'm talking about. You should investigate what is different in your real data compared to your toy data. The str function is useful for this. – Roland Oct 16 '16 at 01:06
  • Do you have any suggestions on how to interpret the output of str()? Admittedly, I'm new to R, but these look pretty similar. (I'm posting output to main post) – YourEconProf Oct 16 '16 at 02:44
  • PROC CONTENTS from SAS, does provide all variable information, type etc. – n1tk Oct 16 '16 at 03:18
  • First str () does say that you have a dataframe with 3 variables and 18 observations, each variable type like for $year is int , $variable is factor, $qty is numwricetc. So str () does give structure of the data.frame, variable type data, what data does contain, if any missing etc. – n1tk Oct 16 '16 at 03:21

1 Answers1

2

All right, I'm feeling generous. Your example code contains a lot of fluff that should not be in a minimal reproducible example and your system call is not portable, but I had a look anyway.

The good news: Your code works as expected.

Let's plot only the bars:

ggplot(SoySUHist) +
  geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity")

plot 1

Now only the lines:

ggplot(SoySUHist) +
  geom_line(data=SoyProd, aes(x=Year,y=Qty))

plot 2

Now compare the scales of the y-axes. If you plot both together, the bars get plotted, but they are so small that you can't see them. You need to rescale:

ggplot(SoySUHist) +
  geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity") +
  geom_line(data=SoyProd, aes(x=Year,y=Qty/1000)) 

plot 3

Roland
  • 127,288
  • 10
  • 191
  • 288
  • Dammit. Yes I actually found that late last night before reading this today and am **really** embarrassed. But thank you so much. Post a Venmo and I'll send you coffee/beer money. – YourEconProf Oct 17 '16 at 15:43