R Functions

The R scripts that I upload may contain two functions which cannot be found in standard packages. I usually source an R script from the local directory together with main packages; the contents of this script are explained here.

First, for conveniently bringing together multiple plots in R, the multiplot function is provided by the website Cookbook for R, and used in my scripts.

Second, in my quantitative analyses, I use Amelia II for multiple imputation to deal with missing data, and Zelig for combined results from multiply imputed datasets. For faster calculation of model statistics as arithmetic average of statistics from each imputed dataset, I wrote a simple function to extract needed values:

z.modelstats <- function(x, n) {
 n=length(x$all)
 Rsq <- rep(0, n)
 for (i in 1:n) {
   Rsq[i] <- x$all[[i]]$r.squared
 }
 ARsq <- rep(0, n)
 for (i in 1:n) {
   ARsq[i] <- x$all[[i]]$adj.r.squared
 }
 RSE <- rep(0, n)
 for (i in 1:n) {
   RSE[i] <- x$all[[i]]$sigma
 }
 FStat <- rep(0, n)
 for (i in 1:n) {
   FStat[i] <- x$all[[i]]$fstatistic[[1]]
 }
 restab <- matrix(
   c("N", "R-sq", "Adj R-sq", "Res St Er", "F-stat", "DF", "p-value", "p-test", 
     x$all[[3]]$df[1] + x$all[[3]]$df[2], 
     round(mean(Rsq), 2), 
     round(mean(ARsq), 2), 
     round(mean(RSE), 2), 
     round(mean(FStat), 2), 
     paste(x$all[[3]]$fstatistic[[2]], x$all[[3]]$fstatistic[[3]], sep="; "), 
     round(pf(mean(FStat), x$all[[3]]$fstatistic[[2]], x$all[[3]]$fstatistic[[3]], 
              lower.tail=F), 3), 
   ifelse(pf(mean(FStat), x$all[[3]]$fstatistic[[2]], x$all[[3]]$fstatistic[[3]], 
             lower.tail=F)<0.001, "<0.001***", 
         ifelse(pf(mean(FStat), x$all[[3]]$fstatistic[[2]], 
                                x$all[[3]]$fstatistic[[3]], 
                   lower.tail=F)<0.01, "<0.01**", 
               ifelse(pf(mean(FStat), x$all[[3]]$fstatistic[[2]], 
                                      x$all[[3]]$fstatistic[[3]], 
                         lower.tail=F)<0.05, "<0.05*", 
                     ifelse(pf(mean(FStat), x$all[[3]]$fstatistic[[2]], 
                                            x$all[[3]]$fstatistic[[3]], 
                               lower.tail=F)<0.1, "<0.1.", ">0.1"))))), 
 nrow=8, ncol=2)
 return(restab)
}

The function z.modelstats takes the summary of a Zelig object as the primary argument x. Although there is no need to enter the second argument n for the number of imputed datasets, it can be used to calculate average statistics of the first n models. The function returns an 8×2 matrix which gives the number of observations, R-squared, adjusted R-squared, Residual standard error, F statistic, degrees of freedom for F-test, resulting p value, and its significance level. Please note that this function does not work with the latest version of Zelig; this demonstration is only meant to clarify the scripts provided in relevant pages.

Third, for OLS regression diagnostics, I generate partial regression plots (added variable plots). The function avPlots from the package car is widely used for this purpose. However, I want my plot design to be consistent throughout, and in general I use ggplot2. For this reason, I created a function for generating partial regression plots with a simple ggplot2 design:

gg.avPlot <- function (y, x, covs, data, weight) {
 
 p0 <- paste(covs, collapse=" + ")
 p1 <- paste("y", " ~ ", p0)
 p2 <- paste("x", " ~ ", p0)
 
 fit.y <- do.call("lm", list(formula=as.formula(p1), 
                             data=as.name("data"), 
                             weights=as.name("weight")))
 fit.x <- do.call("lm", list(formula=as.formula(p2), 
                             data=as.name("data"), 
                             weights=as.name("weight")))
 df <- data.frame(fit.y$res, fit.x$res, weight)
 
 require(ggplot2)
 ggplot(df, aes(x=fit.x.res, y=fit.y.res, weight=weight)) + 
   geom_point(size=3, alpha=0.33, colour="#5882FA") + 
   geom_point(data=df[c(91,196,255,497,654,692),], 
              aes(x=fit.x.res, y=fit.y.res), 
              colour="purple", size=3, shape=18) + 
   geom_smooth(method="lm", colour="black", fill="#848484", level=0.99, size=0.75) + 
   stat_smooth(method="loess", colour="red", se=F, size=0.5)
 
}

The function gg.avPlot takes five arguments. The argument y is the dependent variable of the main model under scrutiny, the argument x is the independent variable with which the partial regression is to be plotted, the argument covs takes a string vector of the names of all other covariates in the model, the argument data defines dataframe of the covariates, and the argument weight defines the weights. A few remarks are needed about the use of arguments:

  • First two arguments are not specified with respect the data, so they should be entered in the form dataframe$variable (if they are not independent vectors or attached to the environment). This is not necessary for other covariates, only their names should be entered in the form c(“cov1”, “cov2”, …).
  • If there are factors among independent variables to be plotted, they should be treated as numeric. If there are categorical variables, manually dummy coding and including other dummies with covariates will give the most accurate results.
  • If there are no weights in the model, the easiest solution is removing all references from the function before running it, since this possibility is not considered in the above code for the sake of simplicity and for the special needs of my analyses.

Also note that this formulation takes into account specific features of the data and the model that I analyse. For instance, the second geom_point highlights potentially influential cases with regard to the explanatory model. The colours and the theme are also my personal choice; these can be modified according to general ggplot procedures.