Thursday, August 19, 2010

apply & sweep : Column-wise & Row-wise operations in R

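How do you operate on the rows or columns of a matrix in R? apply and sweep are the main tools: apply computes a summary for each row or column, and sweep combines that summary back into the matrix. Here is an example of "autoscaling" each column (mean center, then divide by the standard deviation):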

[I verified the results in OpenOffice Calc (oocalc).]

# second argument (MARGIN) = 1 -> operate on each row
# second argument (MARGIN) = 2 -> operate on each column
m <- matrix(data=c(1,2,3,4,4.5,6,7,8,9,10,11,13), nrow=4, ncol=3)
m
#      [,1] [,2] [,3]
# [1,]    1  4.5    9
# [2,]    2  6.0   10
# [3,]    3  7.0   11
# [4,]    4  8.0   13

colmeans <- apply(m, 2, mean)  # the column-wise means
# mc = the column-wise mean centered data
mc <- sweep(m, 2, colmeans, "-")   # subtract is the default

col_stdev <- apply(m, 2, sd)  # column-wise standard deviations

mcstd <- sweep(mc, 2, col_stdev, "/")  # divide by standard deviations

mcstd
#            [,1]       [,2]      [,3]
# [1,] -1.1618950 -1.2558275 -1.024695
# [2,] -0.3872983 -0.2511655 -0.439155
# [3,]  0.3872983  0.4186092  0.146385
# [4,]  1.1618950  1.0883839  1.317465
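
As a cross-check, base R's scale() does the centering and scaling in one call. Here is a minimal sketch, assuming the same matrix m as above (the names manual and builtin are only for illustration):

```r
m <- matrix(data=c(1,2,3,4,4.5,6,7,8,9,10,11,13), nrow=4, ncol=3)

# manual autoscaling with apply and sweep, as in the post
manual <- sweep(sweep(m, 2, apply(m, 2, mean), "-"), 2, apply(m, 2, sd), "/")

# scale() centers each column on its mean and divides by its standard deviation
builtin <- scale(m, center=TRUE, scale=TRUE)

# scale() attaches "scaled:center" and "scaled:scale" attributes,
# so compare the numeric values only
all.equal(manual, builtin, check.attributes=FALSE)
```

The apply/sweep version is still worth knowing, because it works with any summary function and any operator, not just mean and sd.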

Thursday, August 5, 2010

Regression line, R^2 (the square of Pearson's correlation coefficient), slope, and y intercept in R

A basic feature in Excel and OpenOffice Calc is finding and plotting the regression line. Here is an example of doing this in R. It takes more work in R, but you trade that convenience for more power.

Here is how we get the data out:

x = c(1,2,3,4,5)
y = c(3,4,5,5.1,6.2)
pearsons_r = cor(x,y)
r_squared = pearsons_r^2
fit = lm(y~x) # notice the order of variables
y_intercept = as.numeric(coef(fit)[1])  # first coefficient is the intercept
slope = as.numeric(coef(fit)[2])        # second coefficient is the slope for x
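
For a simple linear regression with an intercept, the squared Pearson correlation should match the R^2 that summary() reports for the fit. Here is a small sketch to confirm this, using the same x and y (the names r_squared_from_cor and r_squared_from_fit are only for illustration):

```r
x <- c(1,2,3,4,5)
y <- c(3,4,5,5.1,6.2)
fit <- lm(y~x)

r_squared_from_cor <- cor(x,y)^2
r_squared_from_fit <- summary(fit)$r.squared

# TRUE if the two values agree to within numerical tolerance
all.equal(r_squared_from_cor, r_squared_from_fit)

# coef() returns a named vector with entries "(Intercept)" and "x"
coef(fit)
```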

Here is how we plot it:

plot(x,y)
abline(lm(y~x)) # again, notice the order
function_string = paste("f(x) = ", slope, "x + ", y_intercept,  sep="")
r_sq_string = paste("R^2 =", r_squared)
display_string = paste(function_string, r_sq_string, sep="\n")
mtext(display_string, side=3, adj=1)  # right-aligned in the top margin, outside the plot region
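
The pasted label prints the numbers at full precision, which can run long. Here is a sketch that rounds them first, assuming three significant digits is enough (the digit count is an arbitrary choice, not part of the original recipe):

```r
x <- c(1,2,3,4,5)
y <- c(3,4,5,5.1,6.2)
fit <- lm(y~x)

# round each value to three significant digits before building the label
slope <- signif(coef(fit)[["x"]], 3)
y_intercept <- signif(coef(fit)[["(Intercept)"]], 3)
r_squared <- signif(cor(x,y)^2, 3)

display_string <- paste0("f(x) = ", slope, "x + ", y_intercept, "\n",
                         "R^2 = ", r_squared)

plot(x,y)
abline(fit)
mtext(display_string, side=3, adj=1)  # right-aligned in the top margin
```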