Renzhi
April 18, 2017
Examples of Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (number of rows, number of columns).
m <- matrix(nrow = 2, ncol = 3) # create a matrix with 2 rows and 3 columns
m
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
dim(m) # get the dimensions
## [1] 2 3
attributes(m) # get the attributes of m
## $dim
## [1] 2 3
m <- matrix(1:6, nrow = 2, ncol = 3) # create a matrix with values
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
m[1,2] # get the element at first row and second column
## [1] 3
m <- 1:10 # start with a plain vector
m
##  [1]  1  2  3  4  5  6  7  8  9 10
dim(m) <- c(2, 5) # create a matrix directly from a vector by adding a dimension attribute
m
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
x <- 1:3
y <- 10:12
cbind(x, y) # create a matrix by column binding
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
rbind(x, y) # create a matrix by row binding
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
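The matrices above are all filled column by column, which is the default for matrix(). As a small sketch of the alternative, the byrow argument fills row by row instead. The name m2 is used only for this illustration.

```r
# matrix() fills column-wise by default; byrow = TRUE fills row-wise instead
m2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m2
dim(m2)    # still 2 rows and 3 columns
m2[1, 2]   # element in the first row, second column
t(m2)      # transposing swaps the dimensions to 3 x 2
```

With byrow = TRUE the first row holds 1, 2, 3, so m2[1, 2] is 2 rather than the 3 seen in the column-wise matrix above.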
Examples of Factors
Factors are used to represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label. Factors are important in statistical modeling and are treated specially by modeling functions like lm() and glm(). Using factors with labels is better than using integers because factors are self-describing. Having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.
x <- factor(c("yes", "yes", "no", "yes", "no"))
x
## [1] yes yes no  yes no 
## Levels: no yes
table(x)
## x
##  no yes 
##   2   3
unclass(x) # see the underlying integer representation of the factor
## [1] 2 2 1 2 1
## attr(,"levels")
## [1] "no"  "yes"
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no")) # the order of the levels can be set using the levels argument to factor()
x
## [1] yes yes no  yes no 
## Levels: yes no
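To make the "integer vector with labels" view concrete, the following sketch inspects a factor with explicitly ordered levels. The name f is hypothetical and chosen only for this example.

```r
# f is a hypothetical name; the data mirror the example above
f <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(f)       # the labels, in the order given to the levels argument
nlevels(f)      # number of distinct levels
as.integer(f)   # underlying integer codes: "yes" is 1 and "no" is 2 here
table(f)        # counts are reported in level order
```

Because "yes" is listed first, it receives code 1, which is the reverse of the default alphabetical ordering shown earlier.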
Examples of Missing values
Missing values are denoted by NA, or by NaN for undefined mathematical operations. is.na() is used to test whether objects are NA. is.nan() is used to test for NaN. A NaN value is also NA, but the converse is not true.
x <- c(1, 2, NA, 10, 3) ## create a vector with an NA in it
is.na(x)
## [1] FALSE FALSE TRUE FALSE FALSE
is.nan(x)
## [1] FALSE FALSE FALSE FALSE FALSE
x <- c(1, 2, NaN, NA, 4) ## now create a vector with both NA and NaN values
is.na(x)
## [1] FALSE FALSE TRUE TRUE FALSE
is.nan(x)
## [1] FALSE FALSE TRUE FALSE FALSE
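As a brief sketch of where NaN comes from and how missing values affect a summary, the example below uses an undefined division and the na.rm argument of mean(). The vector name v is an assumption made for this illustration.

```r
v <- c(1, 2, NaN, NA, 4)   # v is a name used only for this sketch
0 / 0                      # an undefined operation yields NaN
is.na(0 / 0)               # NaN counts as NA
is.nan(NA)                 # but NA is not NaN
mean(v)                    # missing values propagate into the result
mean(v, na.rm = TRUE)      # na.rm = TRUE drops both NA and NaN first
```

The last call averages only 1, 2, and 4, since na.rm = TRUE removes every element for which is.na() is TRUE.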
Examples of Data Frames
Data frames are used to store tabular data in R. Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows. Data frames have a special attribute called row.names, which holds information about each row of the data frame.
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE)) # create a data frame
x
##   foo   bar
## 1   1  TRUE
## 2   2  TRUE
## 3   3 FALSE
## 4   4 FALSE
nrow(x)
## [1] 4
ncol(x)
## [1] 2
data.matrix(x) # convert data frame to a matrix
##      foo bar
## [1,]   1   1
## [2,]   2   1
## [3,]   3   0
## [4,]   4   0
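The description of a data frame as a list of equal-length columns can be checked directly. The following is a minimal sketch; the name df is hypothetical and the data repeat the example above.

```r
# df is a hypothetical name for this sketch
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
is.list(df)         # a data frame is a list underneath
length(df)          # number of list elements, i.e. columns
sapply(df, class)   # each column keeps its own type
row.names(df)       # default row names are "1", "2", ...
df$foo              # a column can be extracted like a list element
```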
Examples of names
R objects can have names, which is very useful for writing readable code and self-describing objects.
x <- 1:3
names(x)
## NULL
names(x) <- c("New York", "Seattle", "Los Angeles") # set the names for vector x
x
##    New York     Seattle Los Angeles 
##           1           2           3
x <- list("Los Angeles" = 1, Boston = 2, London = 3) # lists can also have names
x
## $`Los Angeles`
## [1] 1
## 
## $Boston
## [1] 2
## 
## $London
## [1] 3
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d")) # matrices can have both row and column names
m
##   c d
## a 1 3
## b 2 4
colnames(m) <- c("h", "f") # set column names
rownames(m) <- c("x", "z") # set row names
m
##   h f
## x 1 3
## z 2 4
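One practical benefit of names is that elements can be selected by name rather than by position. The sketch below illustrates this; the objects cities and nm are hypothetical and exist only for this example.

```r
# cities and nm are hypothetical objects for this sketch
cities <- c("New York" = 1, "Seattle" = 2, "Los Angeles" = 3)
cities["Seattle"]                       # select an element by name
cities[c("Los Angeles", "New York")]    # names also control the order returned
nm <- matrix(1:4, nrow = 2, ncol = 2,
             dimnames = list(c("a", "b"), c("c", "d")))
nm["a", "d"]                            # row "a", column "d"
```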
Examples of subsetting operation
There are three operators that can be used to extract subsets of R objects.
• The [ operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.
• The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame.
• The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to those of [[.
x <- c("a", "b", "c", "c", "d", "a")
x[1] ## extract the first element
## [1] "a"
x[2] ## Extract the second element
## [1] "b"
x[1:4] ## extract multiple elements
## [1] "a" "b" "c" "c"
x[c(1, 3, 4)]
## [1] "a" "c" "c"
u <- x > "a"
x[u] ## extract the elements of a vector that satisfy a given condition
## [1] "b" "c" "c" "d"
x[x>"a"]
## [1] "b" "c" "c" "d"
x <- matrix(1:6, 2, 3)
x
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
x[1, 2] ## get row 1 column 2 element in matrix x
## [1] 3
x[2, 1]
## [1] 2
x[1, ] ## Extract the first row
## [1] 1 3 5
x[, 2] ## Extract the second column
## [1] 3 4
x[1, 2, drop = FALSE] ## drop = FALSE keeps the result as a 1 x 1 matrix instead of a vector
##      [,1]
## [1,]    3
x[1, ]
## [1] 1 3 5
x[1, , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    3    5
x <- list(foo = 1:4, bar = 0.6)
x
## $foo
## [1] 1 2 3 4
## 
## $bar
## [1] 0.6
x[[1]] ## get the first element of the list with [[
## [1] 1 2 3 4
x[["bar"]] ## get the element bar
## [1] 0.6
x$bar
## [1] 0.6
x <- list(foo = 1:4, bar = 0.6, baz = "hello") # create a list
name <- "foo"
x[[name]] ## computed index for "foo"
## [1] 1 2 3 4
x$foo ## get the element with name foo
## [1] 1 2 3 4
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) # create a nested list
x[[c(1, 3)]] ## get the 3rd element of the 1st element
## [1] 14
x[[1]][[3]] ## same as above
## [1] 14
x[[c(2, 1)]] ## 1st element of the 2nd element
## [1] 3.14
x <- list(aardvark = 1:5) ## create a new list
x$a ## partial matching of a list element name
## [1] 1 2 3 4 5
x[["a"]] ## by default, exact matching of a list element name
## NULL
x[["a", exact = FALSE]] ## partial matching of a list element name
## [1] 1 2 3 4 5
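To check the statements about return classes made at the start of this section, the sketch below compares the three operators on the same list. The name lst is hypothetical and used only here.

```r
# lst is a hypothetical list for this sketch
lst <- list(foo = 1:4, bar = 0.6)
class(lst["foo"])      # "list": single brackets keep the container
class(lst[["foo"]])    # "integer": double brackets return the element itself
class(lst$bar)         # "numeric": $ behaves like [[ with a literal name
lst[c("foo", "bar")]   # [ can select several elements at once
```

This is why [[ or $ is the usual choice when the element itself is needed, while [ is the choice when a smaller list should be returned.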
Examples of removing NA values
A common task in data analysis is removing missing values (NAs).
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)
## [1] FALSE FALSE TRUE FALSE TRUE FALSE
x[!bad] ## removing the NA values
## [1] 1 2 4 5
## create two vectors with missing values, then keep only the positions where neither vector has a missing value
x <- c(1, NA, 3, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
good
## [1] TRUE FALSE FALSE TRUE FALSE TRUE
x[good] # good cases in x
## [1] 1 4 5
y[good] # good cases in y
## [1] "a" "d" "f"
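The same idea extends to tabular data. As a minimal sketch, complete.cases() can be applied to a whole data frame, and na.omit() gives the same rows in one step. The data frame d and the logical vector ok are hypothetical names; the values repeat the two vectors above.

```r
# d and ok are hypothetical names for this sketch
d <- data.frame(x = c(1, NA, 3, 4, NA, 5),
                y = c("a", "b", NA, "d", NA, "f"))
ok <- complete.cases(d)   # TRUE for rows with no missing value in any column
d[ok, ]                   # keep only the complete rows
na.omit(d)                # drops the same rows in one step
nrow(d[ok, ])             # number of complete rows
```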
Examples of vectorized operations
Many operations in R are vectorized, meaning that operations occur in parallel in certain R objects. This allows you to write code that is efficient, concise, and easier to read than in non-vectorized languages.
x <- 1:4
y <- 6:9
z <- x + y
z
## [1]  7  9 11 13
x >= 2
## [1] FALSE TRUE TRUE TRUE
x-y
## [1] -5 -5 -5 -5
x*y
## [1] 6 14 24 36
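For comparison, here is a sketch of what the vectorized addition replaces: an explicit loop over the elements. The name z_loop is hypothetical; x and y are given the same values as above.

```r
x <- 1:4
y <- 6:9
z_loop <- integer(length(x))    # pre-allocate the result (hypothetical name)
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]      # add one pair of elements at a time
}
identical(z_loop, x + y)        # the loop and the vectorized form agree
x * 2                           # a length-1 value is recycled across x
```

The vectorized form x + y expresses the same computation in a single expression, without an index variable.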
x <- matrix(1:4, 2, 2)
x
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
y <- matrix(rep(10, 4), 2, 2)
y
##      [,1] [,2]
## [1,]   10   10
## [2,]   10   10
## element-wise multiplication
x * y
##      [,1] [,2]
## [1,]   10   30
## [2,]   20   40
## element-wise division
x / y
##      [,1] [,2]
## [1,]  0.1  0.3
## [2,]  0.2  0.4
## true matrix multiplication
x %*% y
##      [,1] [,2]
## [1,]   40   40
## [2,]   60   60
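To see how true matrix multiplication differs from the element-wise product, one entry of the result can be recomputed by hand as a row-by-column sum. This is only an illustrative sketch; the names a and b are hypothetical copies of the two matrices above.

```r
# a and b are hypothetical copies of the matrices above
a <- matrix(1:4, 2, 2)
b <- matrix(rep(10, 4), 2, 2)
sum(a[1, ] * b[, 1])   # row 1 of a times column 1 of b: 1*10 + 3*10
(a %*% b)[1, 1]        # the same entry taken from the matrix product
sum(a[2, ] * b[, 2])   # row 2 of a times column 2 of b: 2*10 + 4*10
(a %*% b)[2, 2]
```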