1 Working with vectors

1.1 Intro

Vectors are the building blocks of R. They are 1D arrays whose values all share one data type, so R coerces mixed values to a common type.

  • R is a 1-based array system so a vector starts at position 1
  • Items in a vector can have names but can also be referred to by position
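
The coercion mentioned above can be seen by mixing types inside c(). This is a minimal sketch; the variable names mixedNum and mixedChr are illustrative and not part of the original material.

```r
# Mixing types forces every element to the most flexible type present
mixedNum <- c(1, TRUE, FALSE)   # logical values become numeric
mixedChr <- c(1, "a", TRUE)     # everything becomes character
mixedNum
## [1] 1 1 0
class(mixedNum)
## [1] "numeric"
class(mixedChr)
## [1] "character"
```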

1.2 Create

You can create vectors using the c() function or via a sequence (:). Values in a vector can be given names - this can be useful when subsetting values.

handCrafted<-c(1,2,3,4)
seqCrafted<-1:4
named<-c(a=1,b=2,c=3,d=4)
named
## a b c d 
## 1 2 3 4

1.3 Filter

Alternatively called subsetting, we can filter a vector by position (positive to keep, negative to drop), by the name of a value, or by providing a boolean value for each item in the vector.

handCrafted[1]
## [1] 1
seqCrafted[-1]
## [1] 2 3 4
named["b"]
## b 
## 2
handCrafted[c(rep(TRUE,3),FALSE)]
## [1] 1 2 3
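
In practice the boolean vector is usually produced by a comparison rather than typed by hand. The sketch below assumes a made-up vector called scores to show the idea.

```r
# Assumed example vector, separate from the ones defined above
scores <- c(4, 8, 15, 16, 23, 42)
# The comparison yields one TRUE/FALSE per element
keep <- scores > 10
keep
## [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
# Use the logical vector to subset
scores[keep]
## [1] 15 16 23 42
# which() converts the logical vector to positions
which(keep)
## [1] 3 4 5 6
```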

1.4 Update

You can update one or more values in a vector by assigning the new values into the desired subset.

handCrafted
## [1] 1 2 3 4
handCrafted[2]<-99
handCrafted
## [1]  1 99  3  4
named
## a b c d 
## 1 2 3 4
named["a"]<-99
named
##  a  b  c  d 
## 99  2  3  4
# Delete by subsetting without value
seqCrafted
## [1] 1 2 3 4
seqCrafted<-seqCrafted[-4]
seqCrafted
## [1] 1 2 3
# Append by creating a vector combining the original and the additional values
handCrafted
## [1]  1 99  3  4
handCrafted<-c(handCrafted,5)
handCrafted
## [1]  1 99  3  4  5

1.5 Manipulate

You can manipulate a vector using functions. You can also overwrite any object you have created, which may change its mode, class, etc. R is dynamically typed, so these do not need to be specified and fixed up front.

mode(seqCrafted)
## [1] "numeric"
seqCrafted<-as.character(seqCrafted)
mode(seqCrafted)
## [1] "character"
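
Most functions and operators act on every element of a vector at once. The following is a small sketch with an illustrative vector x (an assumed name) showing this, plus a type conversion and its reverse.

```r
# Illustrative vector; the name is an assumption, not from the text above
x <- c(1, 4, 9)
# Functions and operators are applied element by element
sqrt(x)
## [1] 1 2 3
x * 2
## [1]  2  8 18
# Converting to character and back again round-trips these values
xChr <- as.character(x)
xChr
## [1] "1" "4" "9"
as.numeric(xChr)
## [1] 1 4 9
```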

1.6 Order

Ordering works in two steps: order() returns the positions of the values in sorted order, and subsetting with those positions produces a vector with the original values in their new locations.

preOrder<-sample(letters, 6)
preOrder
## [1] "n" "a" "x" "z" "s" "q"
# Get order the values should appear in to be alphabetised
order(preOrder)
## [1] 2 1 6 5 3 4
# Use it to sort a vector
ordered<-preOrder[order(preOrder)]
ordered
## [1] "a" "n" "q" "s" "x" "z"
# Alternatively, use the sort() function for brevity
sorted<-sort(preOrder)
sorted
## [1] "a" "n" "q" "s" "x" "z"
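
Because order() returns positions, it can also reorder one vector by the values of another, which sort() cannot do. A hedged sketch with two made-up parallel vectors, fruit and weight:

```r
# Two assumed parallel vectors: weight[i] belongs to fruit[i]
fruit <- c("pear", "plum", "kiwi")
weight <- c(180, 60, 75)
# Positions of weight from smallest to largest
order(weight)
## [1] 2 3 1
# Reorder a different vector using those positions
fruit[order(weight)]
## [1] "plum" "kiwi" "pear"
# decreasing = TRUE reverses the direction
fruit[order(weight, decreasing = TRUE)]
## [1] "pear" "kiwi" "plum"
```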

1.7 Metadata

You can extract various pieces of information about a vector.

names(handCrafted)
## NULL
names(named)
## [1] "a" "b" "c" "d"
dim(named)
## NULL
dimnames(named)
## NULL
length(named)
## [1] 4
class(named)
## [1] "numeric"
mode(named)
## [1] "numeric"
attributes(named)
## $names
## [1] "a" "b" "c" "d"
str(named)
##  Named num [1:4] 99 2 3 4
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"

1.8 Exercises

  1. Create a vector containing upper case and lower case variants of the alphabet
  2. Create a new vector with a random sample of your new letter vector
  3. Filter out any lowercase letters

1.9 Answers

#1
lets<-c(LETTERS,letters)
#2
lets<-sample(lets,50)
#3
lets<-lets[tolower(lets)!=lets]
lets
##  [1] "M" "R" "B" "J" "D" "Z" "K" "F" "N" "Y" "Q" "H" "E" "U" "C" "V" "T"
## [18] "O" "X" "S" "L" "I" "G" "P" "A"

2 Working with lists

2.1 Intro

Lists hold multiple objects together and form the basis of complex data objects like data.frames and model results.

2.2 Create

You can create lists using the list() function. Objects in a list can be given names.

basicList<-list(c(1,2,3,4),LETTERS[5:8], rnorm(5))
namedList<-list(p1=c(1,2,3,4),p2=LETTERS[5:8])

2.3 Filter

Alternatively called subsetting, we can filter a list by positive position, negative position, the name of an element, or a boolean value for each item in the list. These all return a list. To extract a single element itself, use list[["elementname"]] or list$elementname.

basicList[1]
## [[1]]
## [1] 1 2 3 4
basicList[-1]
## [[1]]
## [1] "E" "F" "G" "H"
## 
## [[2]]
## [1] -0.05155527  0.62289312 -0.31181880 -0.90420387 -1.40879209
namedList["p2"]
## $p2
## [1] "E" "F" "G" "H"
basicList[c(TRUE,FALSE)]
## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] -0.05155527  0.62289312 -0.31181880 -0.90420387 -1.40879209
basicList[[1]]
## [1] 1 2 3 4
basicList[[-1]]
## Error in basicList[[-1]]: attempt to select more than one element in get1index <real>
namedList[["p2"]]
## [1] "E" "F" "G" "H"
namedList$p2
## [1] "E" "F" "G" "H"
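# A vector inside [[ ]] indexes recursively: each TRUE is treated as 1, so this is basicList[[1]][[1]]
basicList[[c(TRUE,TRUE)]]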
## [1] 1
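
The difference between single and double brackets is easiest to see by checking the class of what comes back. A minimal sketch, assuming an illustrative list called shelf:

```r
# Assumed example list, independent of basicList and namedList
shelf <- list(nums = 1:3, lets = c("x", "y"))
# [ ] keeps the list wrapper, even for one element
class(shelf["nums"])
## [1] "list"
# [[ ]] and $ return the stored object itself
class(shelf[["nums"]])
## [1] "integer"
sum(shelf$nums)
## [1] 6
```

This matters whenever the result is passed to a function that expects a vector: sum(shelf["nums"]) fails because it receives a list, while sum(shelf[["nums"]]) works.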

2.4 Update

You can update one or more objects in a list by assigning the new values into the desired object using the same subsetting capabilities as noted in the Filter section.

basicList[[1]]<-8:12
basicList[1]
## [[1]]
## [1]  8  9 10 11 12
# Elements in a list can be removed by making them NULL
basicList[2]
## [[1]]
## [1] "E" "F" "G" "H"
basicList[2]<-NULL
basicList[2]
## [[1]]
## [1] -0.05155527  0.62289312 -0.31181880 -0.90420387 -1.40879209
# Append by assigning to the next free position in the list
basicList[[3]]<-LETTERS[5:8]
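
Appending can also be done with c(), as with vectors, but the new object needs wrapping in list() to stay in one piece. A sketch under that assumption, using a made-up list named todo:

```r
# Assumed example list with two elements
todo <- list(1:3, "a")
# Wrapping the new object in list() appends it as a single element
todo <- c(todo, list(c(TRUE, FALSE)))
length(todo)
## [1] 3
# Without list(), each value becomes its own element
length(c(todo, c(TRUE, FALSE)))
## [1] 5
```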

2.5 Manipulate

You can manipulate a list using functions, and also manipulate the objects stored in the list.

lapply(basicList,mode)
## [[1]]
## [1] "numeric"
## 
## [[2]]
## [1] "numeric"
## 
## [[3]]
## [1] "character"
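
lapply() always returns a list. When each result is a single value, sapply() can be more convenient because it simplifies the result to a vector. A short sketch with an illustrative list called parts:

```r
# Assumed example list with named elements
parts <- list(a = 1:4, b = letters[1:2], c = c(2.5, 3.5, 4.5))
# lapply() always returns a list
lens <- lapply(parts, length)
class(lens)
## [1] "list"
# sapply() simplifies to a vector when every result has length one
sapply(parts, length)
## a b c 
## 4 2 3 
# Apply a function to a subset of the elements only
sapply(parts[c("a", "c")], mean)
##   a   c 
## 2.5 3.5 
```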

2.6 Order

Ordering of objects in a list is rarely required but you can do it with the order() function.

unorderedList<-list(p2=c(1,2,3,4),p1=LETTERS[5:8])
unorderedList[order(names(unorderedList))]
## $p1
## [1] "E" "F" "G" "H"
## 
## $p2
## [1] 1 2 3 4

2.7 Metadata

You can extract various pieces of information about a list.

names(basicList)
## NULL
names(namedList)
## [1] "p1" "p2"
dim(namedList)
## NULL
dimnames(namedList)
## NULL
length(namedList)
## [1] 2
class(namedList)
## [1] "list"
mode(namedList)
## [1] "list"
attributes(namedList)
## $names
## [1] "p1" "p2"
str(namedList)
## List of 2
##  $ p1: num [1:4] 1 2 3 4
##  $ p2: chr [1:4] "E" "F" "G" "H"

2.8 Exercise

Create a linear regression model (lm()) for the iris dataset and extract the fitted.values element

2.9 Answers

irisLM<-lm(Sepal.Width~Sepal.Length, iris)
head(irisLM$fitted.values)
##        1        2        3        4        5        6 
## 3.103334 3.115711 3.128088 3.134277 3.109523 3.084769

3 Working with tables

3.1 Intro

Tables, or data.frames as the base structure is called in R, can hold multiple columns of different data types. Normally, data.table or dplyr would be taught to super-charge data.frames (see Steph’s extended session “Cut the R learning curve” for more on these), but to reduce dependencies and make it easier to transfer code between different systems, we’ll use just base R.

  • data.frames have a coordinate system like Excel, so df[ 1 , 2 ] selects the intersection of row 1 and column 2
  • Rows and columns can be referenced by name
  • A data.frame is actually a list that is presented like a table
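
The last point can be checked directly: list tools work on a data.frame, treating each column as one element. A minimal sketch, assuming a small made-up table called pets:

```r
# Assumed example table, not used elsewhere in this material
pets <- data.frame(name = c("Rex", "Tom"), legs = c(4, 4))
is.list(pets)
## [1] TRUE
# List-style access returns a column
pets[["legs"]]
## [1] 4 4
# length() counts columns, just as it counts the elements of a list
length(pets)
## [1] 2
```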

3.2 Create

You can create data.frames using the data.frame() function.

  • data.frame() will error if its inputs cannot be combined into a table, for example columns of mismatched lengths. Use as.data.frame() to coerce an existing object into a data.frame

df<-data.frame(a=1:4, b=LETTERS[5:8], c=rnorm(4),row.names = letters[9:12])
df
##   a b          c
## i 1 E  0.2731886
## j 2 F -0.8659110
## k 3 G  1.0832867
## l 4 H  0.7483340
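
As a sketch of the coercion route, as.data.frame() can turn a matrix into a data.frame, and data.frame() can be seen failing on columns whose lengths do not match. The names m, mDF and res are illustrative.

```r
# A matrix is coerced column by column; default names V1, V2, ... are assigned
m <- matrix(1:6, nrow = 2)
mDF <- as.data.frame(m)
names(mDF)
## [1] "V1" "V2" "V3"
dim(mDF)
## [1] 2 3
# data.frame() stops when column lengths cannot be recycled to match
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")
## [1] TRUE
```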

3.3 Filter

Alternatively called subsetting, we can filter a data.frame by positive position, negative position, the name of a row or column, or a boolean value for each row or column.

Filtering on a condition requires a boolean vector, usually built from a specific column. Columns can be treated as vectors by using df$colname.

df[1, ]
##   a b         c
## i 1 E 0.2731886
df[ ,1]
## [1] 1 2 3 4
df[1,1]
## [1] 1
df[-(3:4),]
##   a b          c
## i 1 E  0.2731886
## j 2 F -0.8659110
df[,"a"]
## [1] 1 2 3 4
df[ , c(TRUE, TRUE, FALSE)]
##   a b
## i 1 E
## j 2 F
## k 3 G
## l 4 H
df[df$a<4,]
##   a b          c
## i 1 E  0.2731886
## j 2 F -0.8659110
## k 3 G  1.0832867
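
Conditions can be combined with & (and) and | (or), and rows and columns can be filtered in the same call. A hedged sketch using a made-up stock table:

```r
# Assumed example table, independent of df above
stock <- data.frame(item = c("nut", "bolt", "gear", "cog"),
                    qty = c(10, 0, 5, 20),
                    price = c(0.1, 0.2, 4, 3))
# & keeps rows where both conditions hold
stock[stock$qty > 0 & stock$price < 1, ]
##   item qty price
## 1  nut  10   0.1
# Filter rows by condition and columns by name at once
stock[stock$price >= 3, c("item", "price")]
##   item price
## 3 gear     4
## 4  cog     3
```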

3.4 Update

You can update values in a data.frame by referencing their position using the same subsetting capabilities as noted in the Filter section.

df[1,1]<-2
df
##   a b          c
## i 2 E  0.2731886
## j 2 F -0.8659110
## k 3 G  1.0832867
## l 4 H  0.7483340
# Columns in a data.frame can be removed by making them NULL
df[,2]
## [1] E F G H
## Levels: E F G H
df[,2]<-NULL
df[,2]
## [1]  0.2731886 -0.8659110  1.0832867  0.7483340
# Rows in a data.frame can be removed by subsetting without them
df[2,]
##   a         c
## j 2 -0.865911
df<-df[-2,]
df[2,]
##   a        c
## k 3 1.083287
# Appends can happen in a variety of ways
superDF<-data.frame(df,d=5:7)
superDF
##   a         c d
## i 2 0.2731886 5
## k 3 1.0832867 6
## l 4 0.7483340 7
df$newcol<-5:7
df
##   a         c newcol
## i 2 0.2731886      5
## k 3 1.0832867      6
## l 4 0.7483340      7
df[4,]<-c(1,1,1)
df
##   a         c newcol
## i 2 0.2731886      5
## k 3 1.0832867      6
## l 4 0.7483340      7
## 4 1 1.0000000      1
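
Assigning a plain vector as a new row works above because every remaining column is numeric. When columns have different types, one option is to build the new row as a one-row data.frame and add it with rbind(). This is a sketch with an assumed staff table, not the only way to append.

```r
# Assumed example table with one character and one numeric column
staff <- data.frame(name = c("Ann", "Bob"), age = c(31, 45),
                    stringsAsFactors = FALSE)
# A one-row data.frame keeps each value in its own column type
newRow <- data.frame(name = "Cy", age = 28, stringsAsFactors = FALSE)
staff <- rbind(staff, newRow)
staff
##   name age
## 1  Ann  31
## 2  Bob  45
## 3   Cy  28
# cbind() adds columns in the same way
staff <- cbind(staff, remote = c(TRUE, FALSE, TRUE))
ncol(staff)
## [1] 3
```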

3.5 Order

Ordering of data.frames works by using order() to get the row positions in sorted order and then using these to return the data.frame rows in that order.

# Get order the values should appear in
order(df$c)
## [1] 1 3 4 2
# Use it to sort a table
ordered<-df[order(df$c),]
ordered
##   a         c newcol
## i 2 0.2731886      5
## l 4 0.7483340      7
## 4 1 1.0000000      1
## k 3 1.0832867      6
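
order() accepts several vectors, with later ones breaking ties in earlier ones, and negating a numeric column sorts it in descending order. A small sketch on an assumed table called runs:

```r
# Assumed example table
runs <- data.frame(team = c("b", "a", "b", "a"), score = c(3, 9, 7, 1))
# Sort by team ascending, then by score descending within each team
sortedRuns <- runs[order(runs$team, -runs$score), ]
sortedRuns
##   team score
## 2    a     9
## 4    a     1
## 3    b     7
## 1    b     3
```

The original row names travel with the rows, which makes it easy to see where each record came from.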

3.6 Metadata

You can extract various pieces of information about a data.frame.

names(df)
## [1] "a"      "c"      "newcol"
dim(df)
## [1] 4 3
dimnames(df)
## [[1]]
## [1] "i" "k" "l" "4"
## 
## [[2]]
## [1] "a"      "c"      "newcol"
length(df)
## [1] 3
class(df)
## [1] "data.frame"
mode(df)
## [1] "list"
attributes(df)
## $names
## [1] "a"      "c"      "newcol"
## 
## $row.names
## [1] "i" "k" "l" "4"
## 
## $class
## [1] "data.frame"
str(df)
## 'data.frame':    4 obs. of  3 variables:
##  $ a     : num  2 3 4 1
##  $ c     : num  0.273 1.083 0.748 1
##  $ newcol: num  5 6 7 1

3.7 Exercises

  1. Make a copy of the iris dataset to play with
  2. Add a column with the estimated area of the sepals
  3. Filter out any record with a petal width lower than average
  4. Sort by species name in descending order

3.8 Answers

#1 
myIris<-iris
#2
myIris$Sepal.Area<-myIris$Sepal.Width * myIris$Sepal.Length
#3
avg<-mean(myIris$Petal.Width)
myIris<-myIris[myIris$Petal.Width>=avg,]
#4
myIris<-myIris[order(myIris$Species,decreasing = TRUE),]
head(myIris)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species Sepal.Area
## 101          6.3         3.3          6.0         2.5 virginica      20.79
## 102          5.8         2.7          5.1         1.9 virginica      15.66
## 103          7.1         3.0          5.9         2.1 virginica      21.30
## 104          6.3         2.9          5.6         1.8 virginica      18.27
## 105          6.5         3.0          5.8         2.2 virginica      19.50
## 106          7.6         3.0          6.6         2.1 virginica      22.80