I have my mortality data in two formats. One is the long (list) form you get from the Human Mortality Database, with Male, Female and Combined data all in columns. The other is split into separate Male and Female matrices, each holding just age, year and the mortality rate.

The first format is along the lines of

Year Age   Female     Male    Total  
1961  99     0.3       0.4     0.3  
1961  98     0.4       0.5     0.4  

etc.

The second format I separated to get data in the form of:

 Age 1961  1962  1963 .....  
  0  0.02  0.02  0.02 ...  
  1  0.002 0.002 0.002....  

etc.
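
For readers comparing the two layouts, here is a minimal sketch of how the long form can be turned into the wide Age-by-Year matrix and back again in base R. The data frame `long` and all of its values are invented for illustration; they are not taken from the Human Mortality Database.

```r
# Hypothetical long-format sample shaped like the first table (values invented)
long <- data.frame(
  Year   = rep(1961:1963, each = 3),
  Age    = rep(0:2, times = 3),
  Female = c(0.020, 0.002, 0.001,
             0.019, 0.002, 0.001,
             0.018, 0.002, 0.001)
)

# Long -> wide: one row per Age, one column per Year.
# Each Age/Year cell holds a single value, so mean() just returns it.
wide <- tapply(long$Female, list(Age = long$Age, Year = long$Year), mean)

# Wide -> long: as.table() keeps the dimnames, which become columns
back <- as.data.frame(as.table(wide), responseName = "Female",
                      stringsAsFactors = FALSE)
back$Age  <- as.integer(back$Age)   # dimnames come back as character
back$Year <- as.integer(back$Year)
```

The long form is what ggplot2-style plotting expects, while the wide matrix is what matrix-based functions such as heatmap.2 take as input.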

I would like to be able to plot a heatmap so I can look at the cohort effects etc.

I have tried various methods found by searching online but these aren't working for the way my data is presented. The heatmaps I've produced come out completely red. Can anyone help?

I've tried this:

rnames <- France[,1]   #assign labels in column 1 to "rnames"
mat_data <- data.matrix(France[,2:ncol(France)])
rownames(mat_data) <- rnames #assign row names
col_breaks = c(seq(-1,0,length=100),  # for red
  seq(0,0.8,length=100),              # for yellow
  seq(0.8,1,length=100))              # for green
my_palette <- colorRampPalette(c("red", "yellow", "green"))(n = 299)
png("location",    # create PNG for the heat map        
  width = 5*300,        # 5 x 300 pixels
  height = 5*300,
  res = 300,            # 300 pixels per inch
  pointsize = 8)        # smaller font size

heatmap.2(mat_data,
cellnote=mat_data,
main="Correlation",
notecol="black",
trace="none",
margins =c(12,9),
col=my_palette,
breaks=col_breaks,
dendrogram="row",
Colv="NA")
dev.off()

This creates a solid red heatmap with the years listed along the bottom, the word Age next to the years, and the actual ages listed along the y-axis. It also gives me an error message:

Error in seq.default(min.raw, max.raw, by = min(diff(breaks)/4)) : 
invalid (to - from)/by in seq(.)

Does anyone know of a better way of producing the heatmap or what I've done wrong here?

Emma
  • Please read up on [ask] and how to create a [reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). This includes (some) data, preferably the output of a call to dput, and your own code. – Heroka Nov 04 '15 at 15:35
  • @Heroka I wasn't sure what to include since all the code that I'd tried was really long and failed so much? – Emma Nov 04 '15 at 15:39
  • We need to see at least an example of the data, and what you've tried. If things are failing, we need to see where and what the error is. – TayTay Nov 04 '15 at 15:42
  • @Emma at least some data is needed. If it's a lot of code you've already tried you don't need to post all of it of course, but maybe the most recent try? – Heroka Nov 04 '15 at 15:45
  • @Heroka I've included my closest attempt now, I'll shortly include a sample of the data! Not sure what's allowed due to copyright laws etc. though – Emma Nov 04 '15 at 15:51
  • Which line is breaking it? R doesn't like one of your `seq` functions, but the three I see here work fine on my end. Is there one in the code that's not shown? – TayTay Nov 04 '15 at 15:54
  • @Emma you may also post simulated data, if copyright is a concern. If you do so, make sure that types/variable names match. – Heroka Nov 04 '15 at 15:55
  • And do you have an example of the output you have in mind? – Heroka Nov 04 '15 at 15:59
  • @Tgsmith61591 it's the `Colv="NA"` line, it takes my `seq` functions ok I think – Emma Nov 04 '15 at 16:00
  • @Heroka http://momentumpublishing.co.uk/weknow0.co.uk/wp/wp-content/uploads/2014/03/Heat-map-cohort-path.jpg similar to this – Emma Nov 04 '15 at 16:02
  • So you don't need the dendrogram your code currently produces? – Heroka Nov 04 '15 at 16:03
  • @EmmaThomas it's the `Colv="NA"` line because that's the end of the function call to `heatmap.2`, therefore the error is in one of the args passed to the function... one of the many, many args passed :-) – TayTay Nov 04 '15 at 16:08
  • @Heroka not really, more was just following the code at the time. I'll take that bit out! – Emma Nov 04 '15 at 16:15
  • @Heroka turns out I can't just remove it! what should that bit be instead? – Emma Nov 04 '15 at 16:16
  • @Tgsmith61591 so what should I do sorry? Pretty new to R so this is all confusing to me! – Emma Nov 04 '15 at 16:17
  • @Emma your expected output has an x-axis for year and a y-axis for age, so to me it looks quite different from the plot your code produces (with the columns) – Heroka Nov 04 '15 at 16:19
  • @EmmaThomas what is in your `France` frame? – TayTay Nov 04 '15 at 16:20
  • @Heroka I know! That's what the problem is! I have no idea how to do what I'm trying to get to! – Emma Nov 04 '15 at 16:24
  • @Tgsmith61591 the first table I gave, I've also tried it for the second table – Emma Nov 04 '15 at 16:25

2 Answers

Is this in any way helpful? I based it on what your data looks like, and generated some data to match. Then I started with a plot with 'year' on the x-axis and 'age' on the y-axis and a square (geom_tile) for each point. Those squares are coloured according to the 'total'. It doesn't have any polygons like the example you gave, but I think with your real data it would enable you to look for cohort effects.

#generate some data ranging from 0 to 0.1
set.seed(1000)
France <- expand.grid(Year=1961:2000,Age=20:98)
France$Female <- runif(nrow(France),0,0.05)
France$Male <- runif(nrow(France),0,0.05)
France$Total <- France$Male + France$Female


library(ggplot2)

p1 <- ggplot(France, aes(x=Year,y=Age,fill=Total)) + 
  geom_tile()+ 
  scale_fill_gradientn(colours=rainbow(10))
p1
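
A possible variation on the plot above, offered as a sketch rather than as part of the original answer: rainbow(10) begins and ends with reddish hues, and mortality rates are concentrated near zero, so colouring by the logarithm of the rate with a two-colour sequential gradient may separate the low values better. The colours "white" and "darkred" and the name `p2` are arbitrary choices, and the data are simulated as above.

```r
library(ggplot2)

# Simulated rates between 0 and 0.1 (invented data, same shape as above)
set.seed(1000)
France <- expand.grid(Year = 1961:2000, Age = 20:98)
France$Total <- runif(nrow(France), 0, 0.1)

# Colour by log10 of the rate so that small rates are spread over more
# of the colour range; a sequential gradient avoids red at both ends
p2 <- ggplot(France, aes(x = Year, y = Age, fill = log10(Total))) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkred",
                      name = "log10(Total)")
p2
```

With real data, any rate that is exactly zero would need handling before taking the logarithm, since log10(0) is -Inf.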


Heroka
  • ok, so I tried running the code for my data but it says `Error in eval(expr, envir, enclos) : object 'year' not found` , is the year coming from my data? I tried `France$year` and the same for age and total but it comes up with `Error in exists(name, envir = env, mode = mode) : argument "env" is missing, with no default`, do you know what this means? – Emma Nov 04 '15 at 16:42
  • It means I'm sloppy with capitalization. Will fix (I used variable names without capitals). – Heroka Nov 04 '15 at 16:44
  • oh but that is exactly what I'm wanting though! – Emma Nov 04 '15 at 16:45
  • that's working beautifully now, thank you! What are the options with the colour scheme, it's coming out very red due to most mortality rates being between 0 and 0.1! Guess there's not a lot you can do about that though. It seems that the 0-0.1 is red as well as above 0.45? – Emma Nov 04 '15 at 16:52
  • My mistake, I used a wrong call to `scale_fill_gradientn` based on another solution-idea (which was false). You can play around with it, use different colours. – Heroka Nov 04 '15 at 16:59
  • If it works for you, could you accept it? Keeps others from spending time on it. – Heroka Nov 04 '15 at 19:12

From the source code:

z <- seq(min.raw, max.raw, by=min(diff(breaks)/4))

The heatmap.2 code is internally calling the seq function and produces the error you're experiencing:

Error in seq.default(min.raw, max.raw, by = min(diff(breaks)/4)) : 
    invalid (to - from)/by in seq(.)

What are min.raw and max.raw, though? Scroll up a bit (line 640) and you'll see they are the min and max of the breaks arg you passed in (-1 and 1 respectively in this case). The by parameter in the internal seq call evaluates to 0, because your col_breaks repeats the values 0 and 0.8 where the three seq calls meet, so diff(breaks) contains zeros:

min(diff(breaks)/4)

In fact, you can replicate this error by calling seq with these parameters:

> seq(-1, 1, by=0)
Error in seq.default(-1, 1, by = 0) : invalid (to - from)/by in seq(.)
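
To see where the zero comes from, the following base-R sketch rebuilds the break vector from the question and inspects it. The names `dup` and `step` are only illustrative.

```r
# The break vector exactly as defined in the question
col_breaks <- c(seq(-1, 0, length = 100),   # ends at 0
                seq(0, 0.8, length = 100),  # starts at 0, ends at 0.8
                seq(0.8, 1, length = 100))  # starts at 0.8

# Values that occur more than once: the shared endpoints of the three pieces
dup <- col_breaks[duplicated(col_breaks)]
print(dup)

# This is the `by` value that heatmap.2 hands to seq(); a repeated
# break makes one of the differences zero, so the minimum is zero
step <- min(diff(col_breaks) / 4)
print(step)
```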

There are two implications here. First, you've uncovered a corner case that breaks that code, and this is a bug that should probably be reported on the GitHub repository (i.e., if this evaluates to 0, use some pre-defined by param). Second, you could use a uniform break parameter or just not define it. It is, after all, an optional parameter. From the documentation:

breaks
(optional) Either a numeric vector indicating the splitting points for binning x
into colors, or a integer number of break points to be used, in which case the break
points will be spaced equally between min(x) and max(x).

By leaving breaks blank or providing a single value, you shouldn't encounter this problem.
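
If custom breaks are still wanted, another option (a sketch, not something stated in the answer above) is to make sure the vector is strictly increasing, for example by dropping the repeated endpoints with unique(). The cut points 0.01 and 0.1 below are illustrative assumptions chosen for rates between 0 and 1; they would need adjusting to the real data.

```r
# Piecewise breaks over the 0..1 range of a mortality rate;
# unique() drops the endpoints shared by neighbouring pieces
col_breaks <- unique(c(seq(0,    0.01, length.out = 100),
                       seq(0.01, 0.1,  length.out = 100),
                       seq(0.1,  1,    length.out = 100)))

# heatmap.2 expects one fewer colour than there are breaks
my_palette <- colorRampPalette(c("red", "yellow", "green"))(length(col_breaks) - 1)

# The step that heatmap.2 derives from the breaks is now positive
min(diff(col_breaks) / 4) > 0
```

These two objects could then be passed as breaks=col_breaks and col=my_palette in the heatmap.2 call.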

TayTay
  • Thank you very much, when you say leaving `breaks` blank what do you mean exactly? do I set the `breaks=0` ,leave it blank or remove it? Or something entirely? – Emma Nov 04 '15 at 16:58
  • Just remove `breaks=col_breaks` from the call entirely. So it would be: `heatmap.2(mat_data, cellnote=mat_data, main="Correlation", notecol="black", trace="none", margins =c(12,9), col=my_palette, dendrogram="row", Colv="NA")` Give that a try and see what happens. – TayTay Nov 04 '15 at 17:00
  • There isn't any output anymore? I tried doing `p <- heatmap.2(mat_data, cellnote=mat_data, main="Correlation", notecol="black", trace="none", margins =c(12,9), col=my_palette, dendrogram="row", Colv="NA")` And then doing `p` but it just comes up with lots of numbers! – Emma Nov 04 '15 at 17:42