1 Working with vectors

1.1 Intro

Vectors are the building blocks of R. They are one-dimensional arrays whose values must all be of the same type, so R coerces mixed values to a common type.

R is a 1-based array system, so a vector starts at position 1.
Items in a vector can have names, but can also be referred to by position.
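
As a minimal sketch of the coercion rule (the object names `mixed` and `flags` are illustrative and not part of the original material): when values of different types are combined, R converts them all to the most flexible type present.

```r
# Mixing numbers, text and logicals: everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"

# Mixing logicals and numbers: TRUE/FALSE become 1/0
flags <- c(TRUE, FALSE, 3)
class(flags)   # "numeric"

# Positions start at 1, not 0
mixed[1]       # "1"
```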

1.2 Create

You can create vectors using the c() function or as a sequence with the : operator. Values in a vector can be given names, which can be useful when subsetting.

handCrafted<-c(1,2,3,4)
seqCrafted<-1:4
named<-c(a=1,b=2,c=3,d=4)
named

## a b c d 
## 1 2 3 4

1.3 Filter

Filtering, alternatively called subsetting, can be done with positive positions (values to keep), negative positions (values to drop), the name of a value, or a logical (TRUE/FALSE) value for each item in the vector.

handCrafted[1]

## [1] 1

seqCrafted[-1]

## [1] 2 3 4

named["b"]

## b 
## 2

handCrafted[c(rep(TRUE,3),FALSE)]

## [1] 1 2 3
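
In practice the logical values usually come from a comparison rather than being typed out. The following is a small sketch using a hypothetical vector `scores`:

```r
scores <- c(4, 8, 15, 16, 23, 42)   # hypothetical data

# A comparison returns one TRUE/FALSE per element...
keep <- scores > 10

# ...which can then be used as the filter
bigScores <- scores[keep]            # 15 16 23 42

# Several positions can be requested (or dropped) at once
firstTwo <- scores[c(1, 2)]          # 4 8
withoutEnds <- scores[-c(1, 6)]      # 8 15 16 23
```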

1.4 Update

You can update one or more values in a vector by assigning the new values into the desired subset.

handCrafted

## [1] 1 2 3 4

handCrafted[2]<-99
handCrafted

## [1]  1 99  3  4

named

## a b c d 
## 1 2 3 4

named["a"]<-99
named

##  a  b  c  d 
## 99  2  3  4

# Delete a value by subsetting without it
seqCrafted

## [1] 1 2 3 4

seqCrafted<-seqCrafted[-4]
seqCrafted

## [1] 1 2 3

# Append by creating a vector combining the original and the additional values
handCrafted

## [1]  1 99  3  4

appended<-c(handCrafted,5)
appended

## [1]  1 99  3  4  5

1.5 Manipulate

You can manipulate a vector using functions. You can also overwrite any object you have created; doing so may change its mode, class, etc., because R is loosely typed and does not require these to be specified and fixed up front.

mode(seqCrafted)

## [1] "numeric"

seqCrafted<-as.character(seqCrafted)
mode(seqCrafted)

## [1] "character"
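
Most functions and arithmetic operators in R are vectorised, meaning they act on every element at once. A brief sketch, using a hypothetical vector `temps`:

```r
temps <- c(18.4, 21.0, 19.25)   # hypothetical values in Celsius

# Arithmetic works element by element
tempsF <- temps * 9 / 5 + 32    # 65.12 69.80 66.65
rounded <- round(temps)         # 18 21 19

# Summary functions collapse the vector to a single value
avg <- mean(temps)

# Overwriting the object changes its type
temps <- as.character(temps)
class(temps)                    # "character"
```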

1.6 Order

Ordering works in two steps: order() returns the positions of the values in sorted order, and subsetting with those positions produces a vector with the original values in their new locations.

preOrder<-sample(letters, 6)
preOrder

## [1] "n" "a" "x" "z" "s" "q"

# Get the order the values should appear in to be alphabetised
order(preOrder)

## [1] 2 1 6 5 3 4

# Use it to sort a vector
ordered<-preOrder[order(preOrder)]
ordered

## [1] "a" "n" "q" "s" "x" "z"

# Alternatively, use the sort() function for brevity
sorted<-sort(preOrder)
sorted

## [1] "a" "n" "q" "s" "x" "z"
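
Because order() returns positions rather than values, it can also sort one vector by the contents of another. A small sketch with hypothetical vectors `people` and `ages`:

```r
people <- c("Ann", "Bob", "Cat")   # hypothetical data
ages   <- c(35, 22, 41)

# Sort the names by age, youngest first
byAge <- people[order(ages)]                            # "Bob" "Ann" "Cat"

# decreasing = TRUE reverses the direction
oldestFirst <- people[order(ages, decreasing = TRUE)]   # "Cat" "Ann" "Bob"

# sort() accepts the same argument
agesDesc <- sort(ages, decreasing = TRUE)               # 41 35 22
```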

1.7 Metadata

You can extract various pieces of information about a vector

names(handCrafted)

## NULL

names(named)

## [1] "a" "b" "c" "d"

dim(named)

## NULL

dimnames(named)

## NULL

length(named)

## [1] 4

class(named)

## [1] "numeric"

mode(named)

## [1] "numeric"

attributes(named)

## $names
## [1] "a" "b" "c" "d"

str(named)

##  Named num [1:4] 99 2 3 4
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"

1.8 Exercises

Create a vector containing upper case and lower case variants of the alphabet
Create a new vector with a random sample of your new letter vector
Filter out any lowercase letters

1.9 Answers

#1
lets<-c(LETTERS,letters)
#2
lets<-sample(lets,50)
#3
lets<-lets[tolower(lets)!=lets]
lets

##  [1] "M" "R" "B" "J" "D" "Z" "K" "F" "N" "Y" "Q" "H" "E" "U" "C" "V" "T"
## [18] "O" "X" "S" "L" "I" "G" "P" "A"

2 Working with lists

2.1 Intro

Lists hold multiple objects together and form the basis of complex data objects like data.frames and model results.

2.2 Create

You can create lists using the list() function. Objects in a list can be given names.

basicList<-list(c(1,2,3,4),LETTERS[5:8], rnorm(5))
namedList<-list(p1=c(1,2,3,4),p2=LETTERS[5:8])

2.3 Filter

Filtering, alternatively called subsetting, can be done on a list with positive positions, negative positions, the name of an element, or a logical (TRUE/FALSE) value for each item in the list. We can also use list[[ "elementname" ]] or list$elementname to extract a single element.

basicList[1]

## [[1]]
## [1] 1 2 3 4

basicList[-1]

## [[1]]
## [1] "E" "F" "G" "H"
## 
## [[2]]
## [1] -0.05155527  0.62289312 -0.31181880 -0.90420387 -1.40879209

namedList["p2"]

## $p2
## [1] "E" "F" "G" "H"

basicList[c(TRUE,FALSE)]

## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] -0.05155527  0.62289312 -0.31181880 -0.90420387 -1.40879209

basicList[[1]]

## [1] 1 2 3 4

basicList[[-1]]

## Error in basicList[[-1]]: attempt to select more than one element in get1index <real>

namedList[["p2"]]

## [1] "E" "F" "G" "H"

namedList$p2

## [1] "E" "F" "G" "H"

basicList[[c(TRUE,TRUE)]]

## [1] 1
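
The practical difference between single and double brackets is what comes back: a smaller list, or the element itself. A minimal sketch, using a hypothetical list `shelf`:

```r
shelf <- list(nums = 1:3, lets = c("x", "y"))   # hypothetical list

# Single brackets return a (smaller) list
class(shelf["nums"])     # "list"

# Double brackets and $ return the element itself
class(shelf[["nums"]])   # "integer"
class(shelf$lets)        # "character"

# So only [[ ]] or $ lets you work with the contents directly
total <- sum(shelf[["nums"]])   # 6
```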

2.4 Update

You can update one or more objects in a list by assigning the new values into the desired object using the same subsetting capabilities as noted in the Filter section.

basicList[[1]]<-8:12
basicList[1]

## [[1]]
## [1]  8  9 10 11 12

# Elements in a list can be removed by making them NULL
basicList[2]

## [[1]]
## [1] "E" "F" "G" "H"

basicList[2]<-NULL
basicList[2]

## [[1]]
## [1] -0.05155527  0.62289312 -0.31181880 -0.90420387 -1.40879209

# Append by assigning a new object to the next free position
basicList[[3]]<-LETTERS[5:8]
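
Two common ways of appending to a list can be sketched as follows (the list `parts` is hypothetical):

```r
parts <- list(1:2, "a")   # hypothetical list

# Combine the original with a new one-element list
parts <- c(parts, list(c(TRUE, FALSE)))
length(parts)             # 3

# Or assign to the position just past the end
parts[[length(parts) + 1]] <- "last"
length(parts)             # 4

# Wrapping the new value in list() keeps it as one element;
# c(parts, 1:3) would add three separate elements instead
spread <- c(parts, 1:3)
length(spread)            # 7
```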

2.5 Manipulate

You can manipulate a list using functions, and also manipulate the objects stored in the list.

lapply(basicList,mode)

## [[1]]
## [1] "numeric"
## 
## [[2]]
## [1] "numeric"
## 
## [[3]]
## [1] "character"
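
As a further sketch (with a hypothetical list `measurements`), sapply() is a close relative of lapply() that simplifies its result to a vector where possible, and objects inside a list can be changed by assigning back to them:

```r
measurements <- list(a = c(1, 2, 3), b = c(10, 20))   # hypothetical list

# lapply() always returns a list, one result per element
asList <- lapply(measurements, mean)

# sapply() simplifies the result to a named vector where it can
means <- sapply(measurements, mean)   # a = 2, b = 15

# Objects inside the list can be changed by assigning back to them
measurements$a <- measurements$a * 2
measurements$a                        # 2 4 6
```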

2.6 Order

Ordering of objects in a list is rarely required, but you can do it with the order() function, for example by ordering on the element names.

unorderedList<-list(p2=c(1,2,3,4),p1=LETTERS[5:8])
unorderedList[order(names(unorderedList))]

## $p1
## [1] "E" "F" "G" "H"
## 
## $p2
## [1] 1 2 3 4

2.7 Metadata

You can extract various pieces of information about a list

names(basicList)

## NULL

names(namedList)

## [1] "p1" "p2"

dim(namedList)

## NULL

dimnames(namedList)

## NULL

length(namedList)

## [1] 2

class(namedList)

## [1] "list"

mode(namedList)

## [1] "list"

attributes(namedList)

## $names
## [1] "p1" "p2"

str(namedList)

## List of 2
##  $ p1: num [1:4] 1 2 3 4
##  $ p2: chr [1:4] "E" "F" "G" "H"

2.8 Exercise

Create a linear regression model (lm()) for the iris dataset and extract the fitted.values element

2.9 Answers

irisLM<-lm(Sepal.Width~Sepal.Length, iris)
head(irisLM$fitted.values)

##        1        2        3        4        5        6 
## 3.103334 3.115711 3.128088 3.134277 3.109523 3.084769

3 Working with tables

3.1 Intro

Tables, or data.frames as the base structure is called in R, can hold multiple columns of different data types. Normally, data.table or dplyr would be taught to super-charge data.frames (see Steph’s extended session “Cut the R learning curve” for more on these), but to reduce dependencies and make it easier to transfer code between different systems, we’ll use just base R.

data.frames have a coordinate system like Excel, so df[ 1 , 2 ] selects the intersection of row 1 and column 2.
Rows and columns can be referenced by name.
A data.frame is actually a list that is presented like a table.
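
The list-like nature of a data.frame can be checked directly. A minimal sketch, using a hypothetical table `pets`:

```r
pets <- data.frame(name = c("Rex", "Tom"), legs = c(4, 4))   # hypothetical table

is.list(pets)        # TRUE
length(pets)         # 2: the number of columns, not rows

# List-style access returns a whole column
pets[["legs"]]       # 4 4
pets$legs            # 4 4

# Coordinate-style access returns a single cell
pets[2, "legs"]      # 4
```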

3.2 Create

You can create data.frames using the data.frame() function.

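data.frame() will raise an error if you try to store something it cannot turn into columns, such as vectors of incompatible lengths. Use as.data.frame() to coerce an existing object, such as a matrix or list, to a data.frame.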

df<-data.frame(a=1:4, b=LETTERS[5:8], c=rnorm(4),row.names = letters[9:12])
df

##   a b          c
## i 1 E  0.2731886
## j 2 F -0.8659110
## k 3 G  1.0832867
## l 4 H  0.7483340

3.3 Filter

Filtering, alternatively called subsetting, can be done on a data.frame with positive positions, negative positions, the name of a row or column, or a logical (TRUE/FALSE) value for each row or column.

Filtering on a condition requires a logical vector built from a specific column. Columns can be treated as vectors by using df$colname.

df[1, ]

##   a b         c
## i 1 E 0.2731886

df[ ,1]

## [1] 1 2 3 4

df[1,1]

## [1] 1

df[-(3:4),]

##   a b          c
## i 1 E  0.2731886
## j 2 F -0.8659110

df[,"a"]

## [1] 1 2 3 4

df[ , c(TRUE, TRUE, FALSE)]

##   a b
## i 1 E
## j 2 F
## k 3 G
## l 4 H

df[df$a<4,]

##   a b          c
## i 1 E  0.2731886
## j 2 F -0.8659110
## k 3 G  1.0832867
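
Conditions can be combined, and rows and columns can be filtered in the same step. The sketch below uses a hypothetical table `stock`; `stringsAsFactors = FALSE` is set only so the text column behaves the same in older and newer versions of R.

```r
stock <- data.frame(item = c("nut", "bolt", "gear", "cog"),
                    qty = c(5, 0, 12, 7),
                    price = c(0.1, 0.2, 3.5, 2.0),
                    stringsAsFactors = FALSE)   # hypothetical table

# Combine conditions with & (and) or | (or)
inRange <- stock[stock$qty > 0 & stock$price < 3, ]
inRange$item                          # "nut" "cog"

# Select rows and columns at the same time
wellStocked <- stock[stock$qty > 6, c("item", "qty")]

# A single column comes back as a vector; drop = FALSE keeps it a data.frame
class(stock[, "qty"])                 # "numeric"
class(stock[, "qty", drop = FALSE])   # "data.frame"
```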

3.4 Update

You can update values in a data.frame by referencing their position using the same subsetting capabilities as noted in the Filter section.

df[1,1]<-2
df

##   a b          c
## i 2 E  0.2731886
## j 2 F -0.8659110
## k 3 G  1.0832867
## l 4 H  0.7483340

# Columns in a data.frame can be removed by making them NULL
df[,2]

## [1] E F G H
## Levels: E F G H

df[,2]<-NULL
df[,2]

## [1]  0.2731886 -0.8659110  1.0832867  0.7483340

# Rows in a data.frame can be removed by subsetting without them
df[2,]

##   a         c
## j 2 -0.865911

df<-df[-2,]
df[2,]

##   a        c
## k 3 1.083287

# Appends can happen in a variety of ways
superDF<-data.frame(df,d=5:7)
superDF

##   a         c d
## i 2 0.2731886 5
## k 3 1.0832867 6
## l 4 0.7483340 7

df$newcol<-5:7
df

##   a         c newcol
## i 2 0.2731886      5
## k 3 1.0832867      6
## l 4 0.7483340      7

df[4,]<-c(1,1,1)
df

##   a         c newcol
## i 2 0.2731886      5
## k 3 1.0832867      6
## l 4 0.7483340      7
## 4 1 1.0000000      1
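
Two further base R options for appending are cbind() for columns and rbind() for rows. A small sketch, assuming a hypothetical table `base`:

```r
base <- data.frame(id = 1:2, score = c(0.5, 0.9))   # hypothetical table

# cbind() adds columns; the new column needs one value per row
wider <- cbind(base, passed = c(FALSE, TRUE))

# rbind() adds rows; the new row needs matching column names
longer <- rbind(base, data.frame(id = 3L, score = 0.7))

dim(wider)    # 2 3
dim(longer)   # 3 2
```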

3.5 Order

Ordering a data.frame works the same way as ordering a vector: order() returns the row positions in sorted order, and subsetting with those positions returns the rows in that order.

# Get the order the rows should appear in
order(df$c)

## [1] 1 3 4 2

# Use it to sort a table
ordered<-df[order(df$c),]
ordered

##   a         c newcol
## i 2 0.2731886      5
## l 4 0.7483340      7
## 4 1 1.0000000      1
## k 3 1.0832867      6
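
order() also accepts a decreasing argument and more than one sort key. A minimal sketch on a hypothetical table `results`:

```r
results <- data.frame(team = c("b", "a", "b", "a"),
                      points = c(3, 1, 2, 4),
                      stringsAsFactors = FALSE)   # hypothetical table

# Largest points first
topFirst <- results[order(results$points, decreasing = TRUE), ]
topFirst$points   # 4 3 2 1

# Sort by team, then by points within each team
sorted <- results[order(results$team, results$points), ]
sorted$points     # 1 4 2 3
```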

3.6 Metadata