Using R snippets written by other developers can be unendingly maddening. There are a variety of reasons for this, most of which boil down to a simple issue: most code is written so that it only runs on a system configured exactly like its author’s machine. Anyone who’s ever seen a line like this:
read.xls("C:/Users/MCaine/code/R/projecteuler/someotherdirectory/data.xls")
knows what I am talking about. To use this without modification, you must:
- Use Windows.
- Have exactly the directory structure specified by the address (which is highly unlikely, unless you were the one who wrote it).
- Have the gdata package, which provides read.xls, installed and loaded (which is both unlikely and difficult to know without already being a regular user of the package).
You can see how it would already be easier to just change the address to whatever works on your machine. For the first two problems, I’m afraid I have no easy solution. My preferred approach is to provide URLs, so that the code doesn’t depend on the directory structure of the user’s machine, but this obviously brings its own host of problems. However, I can take aim at the third issue.
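Before turning to that, the URL approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is assumed to be published as a CSV rather than an .xls file, the address is a placeholder and not a real file, and readSharedData is a hypothetical helper name.

```r
# Placeholder address (an assumption): substitute wherever the data is really hosted.
dataUrl <- "https://example.com/data.csv"

# Hypothetical helper: read.csv() accepts a URL in place of a local path,
# so the call no longer depends on anyone's directory layout.
readSharedData <- function(url) {
  read.csv(url, stringsAsFactors = FALSE)
}

# Usage, left commented out because the placeholder URL does not point at real data:
# shared <- readSharedData(dataUrl)
```

The trade-off is the host of problems mentioned above: the script now needs a network connection, and it breaks if the file is ever moved.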
Package management in R is a silly thing, both because it is so easy and because it is so easy to screw up. Most people who write R write it like analysts: they write exactly enough code to get the desired output on their machine, and leave it at that. When such R code is made public, it tends to be very difficult to use in actual replication. But there are some simple ways around that.
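getPackage <- function(pkg){
  # require() returns FALSE (with a warning) instead of an error when pkg is missing
  if(!require(pkg, character.only=TRUE)){
    # Install from the configured CRAN mirror, then attach; library() errors if that failed
    install.packages(pkg)
    library(pkg, character.only=TRUE)
  }
  return(TRUE)
}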
Consider this gist. Instead of calling library(package), which fails if the package is not installed on the user’s machine, getPackage(package) invokes the safer require function, which doesn’t fail when the local machine doesn’t have the requisite package. Instead, it returns FALSE, which triggers the R script to download the package from the user’s default CRAN mirror and then bring it into the user’s working session. If anything goes wrong after that, library throws an error, but it won’t do that for anything silly like the user not having a package that is a mere two commands away.
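The same pattern extends to a whole list of dependencies. A minimal sketch, assuming a hypothetical helper named getPackages (it is not part of the gist) and using requireNamespace to check availability before anything is attached:

```r
# Hypothetical helper (an assumption, not from the gist): handles several packages at once.
getPackages <- function(pkgs) {
  # TRUE for each package whose namespace can be loaded, FALSE otherwise.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!available]
  # Install only what could not be loaded, in a single call.
  if (length(absent) > 0) {
    install.packages(absent)
  }
  # Attach everything; library() stops with an error if a package still fails.
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(TRUE)
}

# Usage with packages that ship with R, so nothing needs to be installed:
getPackages(c("stats", "utils"))
```

Checking first and installing in one batch means install.packages is called at most once, which matters when a script has many dependencies.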
One note: because of the esoteric manner in which R treats package names, you must pass this function a string and not a bare package name; an unquoted name is evaluated as a variable, which doesn’t exist. If you’re not on top of your type-know-how, this means that
getPackage(plyr)
will fail. You should instead write
getPackage("plyr")
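If the quoting requirement is a nuisance, one possible workaround is to capture the argument unevaluated, which is roughly what library itself does for bare names. This is a sketch rather than a recommendation, and getPackageLoose is a hypothetical name introduced here for illustration:

```r
# Hypothetical variant (an assumption): accepts getPackageLoose(plyr) or getPackageLoose("plyr").
getPackageLoose <- function(pkg) {
  # substitute() captures the argument as typed, without evaluating it;
  # as.character() turns either a bare name or a string into a string.
  pkg <- as.character(substitute(pkg))
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
  # Return the resolved name so the caller can see what was loaded.
  invisible(pkg)
}

# Both forms work for a package that ships with R:
getPackageLoose(stats)
getPackageLoose("utils")
```

The cost of this convenience is that a variable holding a package name no longer works: getPackageLoose(myPkg) would look for a package literally called myPkg. The string-only version above avoids that ambiguity.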
Now, I’m sure there’s a very good reason hidden deep in R’s core that this is a bad way to do things, but it has saved me time and headaches. R is recognized as a difficult language both within the programming world (for its strange inconsistencies and its non-grown-up hacker culture) and outside of it (because programming is hard). I wish that more R functions were written in this defensive way, to lower the cognitive barriers to using R for the latter group.