
I have an iteration/scaling problem.

I have a data frame, geocoded, with information about 12 local areas (the LA column).

I can subset this data and write the results to multiple files:

    ## read data in from geocoded file
    geocoded1<-read.csv("S:/somestuff/geocoded 2015 - 2018.csv",na.strings=c(""," ","N/A"))
    geocoded<-subset(geocoded1,geocoded1$CONFIDENCE !="Discarded")

    #split geocoded data by LA 
    x <-split(geocoded,list(geocoded$LA),drop = TRUE,sep = "_")


    # Split geocoded data by LA and DISEASE
    y <- split(geocoded, list(geocoded$LA, geocoded$DISEASE), drop = TRUE, sep = "_")


    # Write one CSV per list element, named after the element
    invisible(lapply(names(x), function(name) write.csv(x[[name]], file = paste0("S:/some stuff/LA/", name, ".csv"))))
    invisible(lapply(names(y), function(name) write.csv(y[[name]], file = paste0("S:/some stuff/LAFinal/", name, ".csv"))))

I can also write the results of these splits to the global environment (do I need to?):

    # Write each data frame in x and y to the global environment as its own object
    list2env(x, envir = .GlobalEnv)
    list2env(y, envir = .GlobalEnv)
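On the "do I need to?" question: probably not. A minimal sketch, using a tiny made-up data frame in place of geocoded (the LA and DISEASE values are hypothetical), showing that the list returned by split() already gives you each data frame by name, so you can loop over it without copying anything into the global environment:

```r
# Hypothetical stand-in for `geocoded`
geocoded <- data.frame(
  LA      = c("A", "A", "B", "B", "C"),
  DISEASE = c("Mumps", "Mumps", "Mumps", "Measles", "Measles"),
  stringsAsFactors = FALSE
)

# Same idea as in the question: one list element per LA
x <- split(geocoded, geocoded$LA, drop = TRUE)

# No list2env() needed: each data frame is reachable by name
names(x)        # "A" "B" "C"
nrow(x[["B"]])  # 2

# ...and you can apply a function to every element directly
row_counts <- vapply(x, nrow, integer(1))
row_counts
```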

and I can plot each of these data frames as a stacked bar chart with facet_wrap:

    # Stacked bar plot with colours and legend
    library(ggplot2)
    bm <- ggplot(data = DATA, aes(x = MONTH, fill = FILL)) + geom_bar()
    bm + facet_wrap(~YEAR, ncol = 5)

And I could go through them manually (no, I can't: there are more than 100 of them!).

How can I plot the contents of x or y in the same way I wrote them to files? I used lapply there. Is there a way for lapply (or something similar) to say either "for every name in x, plot a stacked bar chart faceted by year" or "for every data frame in the global environment, plot a stacked bar chart faceted by year"?
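One way to do this, sketched under assumptions: keep the split list and use lapply() over names(x) exactly as for write.csv(), but build a ggplot for each element and save it with ggsave(). The data below is randomly generated, and the GENDER fill column and the output folder are assumptions standing in for your own:

```r
library(ggplot2)

# Hypothetical stand-in for the split list `x`: one data frame per LA
set.seed(1)
geocoded <- data.frame(
  LA     = sample(c("A", "B", "C"), 300, replace = TRUE),
  YEAR   = sample(2015:2018, 300, replace = TRUE),
  MONTH  = sample(1:12, 300, replace = TRUE),
  GENDER = sample(c("F", "M"), 300, replace = TRUE)
)
x <- split(geocoded, geocoded$LA, drop = TRUE)

# Output folder is an assumption; swap in your own path
out_dir <- file.path(tempdir(), "LA_plots")
dir.create(out_dir, showWarnings = FALSE)

# Build one faceted stacked bar chart per list element
plots <- lapply(names(x), function(name) {
  ggplot(x[[name]], aes(x = factor(MONTH), fill = GENDER)) +
    geom_bar() +
    facet_wrap(~YEAR, ncol = 4) +
    labs(title = name, x = "Month", y = "Cases")
})
names(plots) <- names(x)

# Save each plot, mirroring the write.csv() loop
invisible(lapply(names(plots), function(name) {
  ggsave(file.path(out_dir, paste0(name, ".png")), plots[[name]],
         width = 8, height = 4, dpi = 100)
}))
```

ggplot objects created inside lapply() are not drawn until they are printed or saved, which is why they are kept in a list and handed to ggsave().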

I was planning to plot a stacked bar chart faceted by year (there are 4 years), with the same y-axis scale in every chart, the month of the year on the x-axis, and the fill based on another column (gender, for example). I'd like to standardise the appearance of each plot so they all have transparent backgrounds.
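For the shared y scale and transparent backgrounds, one approach (a sketch with made-up data; plot_la() is a hypothetical helper and GENDER an assumed column) is to compute the tallest bar across all areas once, fix every chart's y range to it with coord_cartesian(), and reuse a single theme object:

```r
library(ggplot2)

# Hypothetical case-level data standing in for `geocoded` (one row per case)
set.seed(42)
geocoded <- data.frame(
  LA     = sample(c("A", "B"), 200, replace = TRUE),
  YEAR   = sample(2015:2018, 200, replace = TRUE),
  MONTH  = sample(1:12, 200, replace = TRUE),
  GENDER = sample(c("F", "M"), 200, replace = TRUE)
)

# Tallest stacked bar over every LA/YEAR/MONTH cell, shared by all charts
ymax <- max(table(geocoded$LA, geocoded$YEAR, geocoded$MONTH))

# One theme object reused everywhere: transparent panel, plot and legend
theme_transparent <- theme_minimal() +
  theme(
    panel.background  = element_rect(fill = "transparent", colour = NA),
    plot.background   = element_rect(fill = "transparent", colour = NA),
    legend.background = element_rect(fill = "transparent", colour = NA)
  )

# Hypothetical helper: the same chart recipe for any one LA's rows
plot_la <- function(df, title, ymax) {
  ggplot(df, aes(x = factor(MONTH, levels = 1:12), fill = GENDER)) +
    geom_bar() +
    scale_x_discrete(drop = FALSE) +       # keep empty months on the axis
    facet_wrap(~YEAR, ncol = 4) +
    coord_cartesian(ylim = c(0, ymax)) +   # identical y range in every chart
    labs(title = title, x = "Month", y = "Cases") +
    theme_transparent
}

out_dir <- file.path(tempdir(), "LA_transparent")  # folder is an assumption
dir.create(out_dir, showWarnings = FALSE)

by_la <- split(geocoded, geocoded$LA)
for (name in names(by_la)) {
  p <- plot_la(by_la[[name]], title = name, ymax = ymax)
  # bg = "transparent" keeps the saved PNG background transparent as well
  ggsave(file.path(out_dir, paste0(name, ".png")), p,
         width = 8, height = 4, dpi = 100, bg = "transparent")
}
```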

Thanks in advance.

Edit:

    # Stacked bar plot with colours and legend
    library(ggplot2)
    bm <- ggplot(data = LADISEASE1, aes(x = MONTH, fill = FILL)) + geom_bar()
    bm + facet_wrap(~YEAR, ncol = 5)

When I split by LA and DISEASE, I get up to 20 disease data frames per LA (empty combinations are dropped), so about 200 in total.

Edit again: using the data from the comment:

    DISEASE = c("Marco Polio","Marco Polio","Marco Polio","Marco Polio","Marco Polio",
                "Mumps","Mumps","Mumps","Mumps","Mumps",
                "Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox")
    YEAR = c(2011, 2012, 2013, 2014, 2015,
                2011, 2012, 2013, 2014, 2015,
                2011, 2012, 2013, 2014, 2015)
    MONTH =c(1,2,3,4,5,6,7,8,9,10,11,12,1,12)
    LA = c("A","B","C")

    VALUE = c(82,89,79,51,51,
              79,91,69,89,78,
              71,69,95,61,87)
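As posted, these vectors cannot go into one data frame: MONTH has 14 values while DISEASE, YEAR and VALUE have 15, so data.frame() would stop with an error about differing numbers of rows. A sketch of a consistent toy data set in long format, where expand.grid() builds every LA/DISEASE/YEAR/MONTH combination and the counts are made up:

```r
# Every combination of the (hypothetical) levels, one row each
toy <- expand.grid(
  LA      = c("A", "B", "C"),
  DISEASE = c("Marco Polio", "Mumps", "Chicky Pox"),
  YEAR    = 2015:2018,
  MONTH   = 1:12,
  stringsAsFactors = FALSE
)

set.seed(123)
toy$VALUE <- rpois(nrow(toy), lambda = 5)  # made-up counts

nrow(toy)  # 3 * 3 * 4 * 12 = 432
head(toy)
```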

What I can do with the single data frame is this:

    # Split geocoded data by LA
    LA <- split(geocoded, list(geocoded$LA), drop = TRUE, sep = "_")
    str(LA)

This splits the large data frame into the 12 areas plus a missing group.

I guess the problem I'm trying (and failing) to describe is how to create, for each LA, a panel of 20 timeline charts (one per disease) for each year. For example: area A, infections 1-20, for each year from 2015 to 2018.

Do I facet the charts on year and infection, or slice the data frame first and then facet the chart?
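One way to avoid most of the slicing, sketched with made-up data: split only by LA, and let facet_grid(DISEASE ~ YEAR) lay out one small chart per disease and year inside each LA's plot. Column names follow the question; GENDER as the fill is an assumption:

```r
library(ggplot2)

# Hypothetical case-level data: one row per case
set.seed(7)
cases <- data.frame(
  LA      = sample(c("A", "B"), 500, replace = TRUE),
  DISEASE = sample(c("Marco Polio", "Mumps", "Chicky Pox"), 500, replace = TRUE),
  YEAR    = sample(2015:2018, 500, replace = TRUE),
  MONTH   = sample(1:12, 500, replace = TRUE),
  GENDER  = sample(c("F", "M"), 500, replace = TRUE)
)

# One chart per LA; inside it, rows = disease, columns = year
la_plots <- lapply(split(cases, cases$LA), function(df) {
  ggplot(df, aes(x = factor(MONTH, levels = 1:12), fill = GENDER)) +
    geom_bar() +
    facet_grid(DISEASE ~ YEAR) +
    labs(title = paste("LA", df$LA[1]), x = "Month", y = "Cases")
})

print(la_plots[["A"]])
```

With around 20 diseases the grid gets tall, so give ggsave() a larger height, or try facet_wrap(~DISEASE) with factor(YEAR) mapped to fill if one panel per disease reads better.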

The example shown is great! It made me think I should do that too, so that a person could quickly see the number of cases per year.

It's so easy to slice a data frame into new ones that I got a bit carried away. All I really need is to work on the one data frame and output the charts as graphics I can paste into a document.
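To get the charts out as graphics for a document, one option (a sketch with made-up data; the file name is an assumption) is a multi-page PDF with one page per LA, printing each plot inside a for loop:

```r
library(ggplot2)

# Hypothetical case-level data: one row per case
set.seed(99)
cases <- data.frame(
  LA      = sample(c("A", "B", "C"), 400, replace = TRUE),
  DISEASE = sample(c("Marco Polio", "Mumps", "Chicky Pox"), 400, replace = TRUE),
  YEAR    = sample(2015:2018, 400, replace = TRUE),
  MONTH   = sample(1:12, 400, replace = TRUE)
)

pdf_file <- file.path(tempdir(), "all_LAs.pdf")  # output path is an assumption
pdf(pdf_file, width = 11, height = 8.5)
for (la in sort(unique(cases$LA))) {
  p <- ggplot(cases[cases$LA == la, ], aes(x = factor(MONTH, levels = 1:12))) +
    geom_bar() +
    facet_grid(DISEASE ~ YEAR) +
    labs(title = paste("LA", la), x = "Month", y = "Cases")
  print(p)  # inside a loop, ggplot objects must be printed explicitly
}
invisible(dev.off())
```

If individual images are easier to paste, ggsave() with a .png file name per LA does the same job.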

damo
  • The "DATA" dataframe is used in the ggplot method, but I don't see it defined anywhere in your prior code snippets. Can you please provide us with a small example of the data set you're trying to plot? Or are you saying you have 100 different data frames that you're trying to plot? – LetEpsilonBeLessThanZero Apr 26 '18 at 14:44
  • I think you need [this](https://stackoverflow.com/questions/34241954/saving-plots-within-lapply). Otherwise, revise your question. – Roman Apr 26 '18 at 14:46
  • I thought I could do that with lapply. OK. Let me try it all out! – damo Apr 26 '18 at 14:49
  • I would recommend against splitting this into 20 different dataframes. Why not just have one dataframe with a column for disease? This would make plotting with ggplot easy. – LetEpsilonBeLessThanZero Apr 26 '18 at 15:08
  • I'm almost certainly doing this the wrong way. I would like to generate a set of facet charts for each disease for the years 2015-2018, for each LA. I'm definitely not doing this the programmatic way, am I? I'm still stuck in an Excel mindset. Each LA should have 19 facet charts (1 for each disease category; one of the categories is "NA") showing a stacked bar chart for each year (2015-18). Is there a way to write that? – damo Apr 26 '18 at 15:13
  • Yeah, I'll write something up that I think will show you a much more efficient way of doing this. I'm going to have to create my own data set though, since I don't have yours. – LetEpsilonBeLessThanZero Apr 26 '18 at 15:16
  • I'll see if I can get that up tomorrow, I've got to get on with #dadding now. – damo Apr 26 '18 at 15:18

1 Answer


I would suggest not splitting into different data frames if possible. Instead, keep all the data in a single data frame and facet over the DISEASE variable to get a separate chart for each disease. The following code may give you an idea of another path to the end result you want:

    library(tidyverse)

    DISEASE = c("Marco Polio","Marco Polio","Marco Polio","Marco Polio","Marco Polio",
                "Mumps","Mumps","Mumps","Mumps","Mumps",
                "Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox")
    YEAR = c(2011, 2012, 2013, 2014, 2015,
                2011, 2012, 2013, 2014, 2015,
                2011, 2012, 2013, 2014, 2015)
    VALUE = c(82,89,79,51,51,
              79,91,69,89,78,
              71,69,95,61,87)

    DATA = data.frame(DISEASE, YEAR, VALUE)

    # One bar per year, one panel per disease
    plot = ggplot(DATA) +
      geom_bar(aes(x=YEAR, y=VALUE), stat="identity") +
      facet_grid(~DISEASE)

    print(plot)


LetEpsilonBeLessThanZero
  • Hmm. I wanted to plot month on the x-axis and have a panel of charts for a year. How would I do that? And how do I filter the data in the ggplot call so that == "2017" works? – damo Apr 29 '18 at 10:10
  • Something like this? https://stackoverflow.com/questions/18165578/subset-and-ggplot2 Am I even close? – damo Apr 29 '18 at 10:17
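A sketch of both requests using made-up monthly counts (the answer's DATA has no MONTH column, so the values here are hypothetical): month on the x-axis with one panel per year, and a 2017-only chart where the filtering happens on the data before it is passed to ggplot():

```r
library(ggplot2)

# Hypothetical monthly counts per disease and year
set.seed(2017)
DATA <- expand.grid(
  DISEASE = c("Marco Polio", "Mumps", "Chicky Pox"),
  YEAR    = 2015:2018,
  MONTH   = 1:12
)
DATA$VALUE <- rpois(nrow(DATA), lambda = 10)  # made-up counts

# Month on x, one panel per year, bars stacked by disease
p_all <- ggplot(DATA, aes(x = factor(MONTH), y = VALUE, fill = DISEASE)) +
  geom_col() +
  facet_wrap(~YEAR, ncol = 4) +
  labs(x = "Month", y = "Cases")

# Only 2017: subset the data, then plot it
p_2017 <- ggplot(subset(DATA, YEAR == 2017),
                 aes(x = factor(MONTH), y = VALUE, fill = DISEASE)) +
  geom_col() +
  labs(title = "2017", x = "Month", y = "Cases")

print(p_all)
print(p_2017)
```

Because YEAR is numeric here, the comparison is YEAR == 2017 rather than == "2017"; either works in R, since the number is coerced for the comparison, but the numeric form states the intent more plainly.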