Thursday, June 23, 2011

Scatterplot with marginal histograms

The scatterplot is one of the most ubiquitous and useful graphics, and also one of the most basic. One of its shortcomings is that it can hide important aspects of the marginal distributions of the two variables. To address this weakness, you can add a histogram of each margin to the plot. We demonstrate using the SF-36 mental (MCS) and physical (PCS) component subscales in the HELP data set.

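scatterhist = function(x, y, xlab="", ylab=""){
 # 2x2 layout: zone 1 = scatterplot (bottom left), 2 = top histogram,
 # 3 = right histogram, 0 = empty top-right corner
 zones=matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
 layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
 # compute the bin counts without drawing anything
 xhist = hist(x, plot=FALSE)
 yhist = hist(y, plot=FALSE)
 # common count scale, so bar lengths are comparable across both margins
 top = max(c(xhist$counts, yhist$counts))
 par(mar=c(3,3,1,1))
 plot(x,y)
 # top margin: histogram of x, drawn as adjoining bars
 par(mar=c(0,3,1,1))
 barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
 # right margin: histogram of y, drawn sideways
 par(mar=c(3,0,1,1))
 barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
 # axis labels go in the outer margin, starting at the relative position of
 # each variable's mean within the scatterplot's 4/5 share of the device
 par(oma=c(3,3,0,0))
 mtext(xlab, side=1, line=1, outer=TRUE, adj=0, at=.8 * (mean(x) - min(x))/(max(x) - min(x)))
 mtext(ylab, side=2, line=1, outer=TRUE, adj=0, at=.8 * (mean(y) - min(y))/(max(y) - min(y)))
 }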

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
with(ds, scatterhist(mcs, pcs, xlab="MCS", ylab="PCS"))
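
To see what the function works with before anything is drawn, it can help to inspect the layout matrix and the histogram counts on their own. The following is a minimal sketch using simulated data; the variables x and y here are stand-ins generated with rnorm(), not the HELP data, and the means and standard deviations are arbitrary assumptions.

```r
# Simulated stand-ins for the two variables (assumption: not the HELP data)
set.seed(42)
x = rnorm(200, mean=50, sd=10)
y = rnorm(200, mean=45, sd=12)

# The layout matrix used by scatterhist(): each entry is the order in which
# that cell is drawn, and 0 leaves the cell empty
zones = matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
print(zones)
#      [,1] [,2]
# [1,]    2    0
# [2,]    1    3

# hist(plot=FALSE) returns the bins and counts without opening a plot
xhist = hist(x, plot=FALSE)
yhist = hist(y, plot=FALSE)

# The shared upper limit for both marginal bar plots
top = max(c(xhist$counts, yhist$counts))
print(top)
```

Because both bar plots use the same limit (top), a bar of a given length represents the same number of observations in either margin. Note that the bars are drawn from the counts alone, so they line up with the scatterplot axes only approximately.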
