Monday, January 28, 2008

data.frame in R (the key object)

I've been wading through R trying to figure out how it all works. It looks like the most important data structure is the data frame, so here I demo how to build a data.frame by hand that you can then play with. Yes, you can import data from a spreadsheet, but it helps to understand the underlying data structure first.

A data.frame is just a list of vectors or matrices where each has the same number of rows (think of the vectors as columns). It really is just the same as data in a spreadsheet: each column is a variable and each row is one observation.


In fact, when one reads in this type of info, one gets a data.frame back:

df = read.csv(".csv", quote="") # <-reads an unquoted csv file with headers
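To see that read.csv really does hand back a data.frame without needing a file on disk, here is a minimal sketch that parses CSV from a string via the text= argument. The csv_text contents and the name df_csv are made up for illustration.

```r
# Hypothetical inline CSV standing in for a real file
csv_text <- "height,weight,state
1,3,TX
0,2,WI
3,8,UT"

# text= makes read.csv parse the string instead of opening a file path
df_csv <- read.csv(text = csv_text, quote = "")

class(df_csv)  # the class of the object that came back
str(df_csv)    # one line per column, showing its type
```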

So, assuming we have some data like that, now let's make it in R:

> var1 <- c(1,0,3) # '<-' is the assignment operator (like a directional '=')
# 'c()' is the combine function (it builds a vector from its arguments)
> var2 <- c(3,2,8)
> var3 <- array( c(4,3,2), c(3) ) # a 1-d array with an explicit dimension of 3; it acts like a plain vector here

> fac1 <- c("hot", "cold", "cold")
> fac2 <- c("TX","WI","UT")
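The names fac1 and fac2 suggest these columns are meant to be factors (categorical variables). Whether data.frame() turns strings into factors on its own depends on the R version (the default changed in R 4.0.0), so here is a minimal sketch of building one explicitly with factor(). The name temp_factor is only for illustration.

```r
# Build a factor explicitly from a character vector
temp_factor <- factor(c("hot", "cold", "cold"))

levels(temp_factor)      # the distinct categories, sorted alphabetically
as.integer(temp_factor)  # the integer codes stored underneath
table(temp_factor)       # how many times each category occurs
```

Passing stringsAsFactors=TRUE to data.frame() is another way to get factor columns regardless of the version default.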

# alright, let's put all this stuff together into a data.frame:

> df <- data.frame(var1,var2,var3,fac1,fac2, row.names=NULL, check.rows=TRUE)

# creates this:
  var1 var2 var3 fac1 fac2
1    1    3    4  hot   TX
2    0    2    3 cold   WI
3    3    8    2 cold   UT
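The "list of columns" idea can be poked at directly. A minimal sketch, rebuilding a smaller version of the data.frame so the block runs on its own:

```r
var1 <- c(1, 0, 3)
var2 <- c(3, 2, 8)
fac1 <- c("hot", "cold", "cold")
df <- data.frame(var1, var2, fac1)

is.list(df)        # TRUE: a data.frame is a list under the hood
length(df)         # number of columns (the list elements)
nrow(df)           # number of rows, shared by every column
df$var1            # pull one column out as a plain vector
df[["fac1"]]       # same idea, using list-style indexing
sapply(df, class)  # the type of each column
```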

# want to see the names of the columns in the dataset:
> names(df)
[1] "var1" "var2" "var3" "fac1" "fac2"

# change the names:
> names(df) <- c("height", "weight", "nose.length", "temp", "state")
> df
  height weight nose.length temp state
1      1      3           4  hot    TX
2      0      2           3 cold    WI
3      3      8           2 cold    UT
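Since names(df) is just a character vector, you can also change a single name, and the column names then work for picking out rows and columns. A minimal sketch on a smaller data.frame (the new name "temperature" is only an example):

```r
df <- data.frame(height = c(1, 0, 3),
                 weight = c(3, 2, 8),
                 temp   = c("hot", "cold", "cold"))

# rename one column by matching its current name
names(df)[names(df) == "temp"] <- "temperature"
names(df)

# rows where temperature is "cold" (note the trailing comma: all columns)
cold_rows <- df[df$temperature == "cold", ]
cold_rows

# just two of the columns, selected by name
df[, c("height", "weight")]
```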

# so, now we have a data.frame, the preferred data structure in R. Let the magic begin:

> plot(df)

1 comment:

EGP said...

Thanks, it is helpful to know the why behind things.