Vectors are the building blocks of R. They are 1D arrays whose values all share a single data type, so mixing types in one vector coerces every value to the most flexible type present.
You can create vectors using the c() function or via a sequence (:). Values in a vector can be given names, which can be useful when subsetting values.
handCrafted<-c(1,2,3,4)
seqCrafted<-1:4
named<-c(a=1,b=2,c=3,d=4)
named
## a b c d
## 1 2 3 4
Filtering, also called subsetting, selects values from a vector by positive position, by negative position (which excludes that position), by name, or by providing a boolean value for each item in the vector.
handCrafted[1]
## [1] 1
seqCrafted[-1]
## [1] 2 3 4
named["b"]
## b
## 2
handCrafted[c(rep(TRUE,3),FALSE)]
## [1] 1 2 3
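In practice the boolean vector is usually produced by a comparison rather than typed by hand. The sketch below uses a made-up vector called scores to show this, along with selecting several positions at once.

```r
scores <- c(4, 8, 15, 16, 23, 42)   # hypothetical example data

isBig <- scores > 10                # one TRUE/FALSE per item
isBig
## [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

scores[isBig]                       # keep only the items flagged TRUE
scores[scores > 10 & scores < 40]   # combine conditions with & (and) or | (or)
scores[c(1, 3)]                     # several positions at once
```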
You can update one or more values in a vector by assigning the new values into the desired subset.
handCrafted
## [1] 1 2 3 4
handCrafted[2]<-99
handCrafted
## [1] 1 99 3 4
named
## a b c d
## 1 2 3 4
named["a"]<-99
named
## a b c d
## 99 2 3 4
# Delete by subsetting without value
seqCrafted
## [1] 1 2 3 4
seqCrafted<-seqCrafted[-4]
seqCrafted
## [1] 1 2 3
# Append by creating a vector combining the original and the additional values
seqCrafted
## [1] 1 2 3
seqCrafted<-c(seqCrafted,5)
seqCrafted
## [1] 1 2 3 5
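Combining with c() always adds at the end. Base R also provides append(), which takes an after argument for inserting elsewhere. The vector nums below is a made-up example.

```r
nums <- c(1, 2, 3)                       # hypothetical example vector

atEnd    <- c(nums, 5)                   # combine to add at the end
inMiddle <- append(nums, 99, after = 1)  # insert after the first position

atEnd
## [1] 1 2 3 5
inMiddle
## [1]  1 99  2  3
```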
You can manipulate a vector using functions. You can also overwrite any object you have created; doing so can change its mode, class, etc., because R is dynamically typed and doesn’t require such things to be specified and fixed up front.
mode(seqCrafted)
## [1] "numeric"
seqCrafted<-as.character(seqCrafted)
mode(seqCrafted)
## [1] "character"
Ordering works by getting the position numbers that would put the values of a vector in order, and then using those positions to produce a vector with the original values in their new locations.
preOrder<-sample(letters, 6)
preOrder
## [1] "n" "a" "x" "z" "s" "q"
# Get order the values should appear in to be alphabetised
order(preOrder)
## [1] 2 1 6 5 3 4
# Use it to sort a vector
ordered<-preOrder[order(preOrder)]
ordered
## [1] "a" "n" "q" "s" "x" "z"
# Alternatively, use the sort() function for brevity
sorted<-sort(preOrder)
sorted
## [1] "a" "n" "q" "s" "x" "z"
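Both order() and sort() accept a decreasing argument for reverse ordering. The sketch below uses a fixed vector (fixedLetters, a made-up name) instead of a random sample so that the result is reproducible.

```r
fixedLetters <- c("n", "a", "x", "z", "s", "q")

# Three equivalent ways to get reverse alphabetical order
viaSort  <- sort(fixedLetters, decreasing = TRUE)
viaOrder <- fixedLetters[order(fixedLetters, decreasing = TRUE)]
viaRev   <- rev(sort(fixedLetters))

viaSort
## [1] "z" "x" "s" "q" "n" "a"
```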
You can extract various pieces of information about a vector:
names(handCrafted)
## NULL
names(named)
## [1] "a" "b" "c" "d"
dim(named)
## NULL
dimnames(named)
## NULL
length(named)
## [1] 4
class(named)
## [1] "numeric"
mode(named)
## [1] "numeric"
attributes(named)
## $names
## [1] "a" "b" "c" "d"
str(named)
## Named num [1:4] 99 2 3 4
## - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
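# 1. Combine the upper- and lower-case letters into one vector
lets<-c(LETTERS,letters)
# 2. Take a random sample of 50 of them
lets<-sample(lets,50)
# 3. Keep only the upper-case letters
lets<-lets[tolower(lets)!=lets]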
lets
## [1] "M" "R" "B" "J" "D" "Z" "K" "F" "N" "Y" "Q" "H" "E" "U" "C" "V" "T"
## [18] "O" "X" "S" "L" "I" "G" "P" "A"
Lists hold multiple objects together and form the basis of complex data objects like data.frames and model results.
You can create lists using the list() function. Objects in a list can be given names.
basicList<-list(c(1,2,3,4),LETTERS[5:8], rnorm(5))
namedList<-list(p1=c(1,2,3,4),p2=LETTERS[5:8])
Filtering, also called subsetting, selects from a list by positive position, by negative position, by the name of an element, or by providing a boolean value for each item in the list. Each of these returns a list. We can also use list[[ "elementname" ]] or list$elementname to extract a single element itself.
basicList[1]
## [[1]]
## [1] 1 2 3 4
basicList[-1]
## [[1]]
## [1] "E" "F" "G" "H"
##
## [[2]]
## [1] -0.05155527 0.62289312 -0.31181880 -0.90420387 -1.40879209
namedList["p2"]
## $p2
## [1] "E" "F" "G" "H"
basicList[c(TRUE,FALSE)]
## [[1]]
## [1] 1 2 3 4
##
## [[2]]
## [1] -0.05155527 0.62289312 -0.31181880 -0.90420387 -1.40879209
basicList[[1]]
## [1] 1 2 3 4
basicList[[-1]]
## Error in basicList[[-1]]: attempt to select more than one element in get1index <real>
namedList[["p2"]]
## [1] "E" "F" "G" "H"
namedList$p2
## [1] "E" "F" "G" "H"
basicList[[c(TRUE,TRUE)]]
## [1] 1
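The difference between single and double brackets is easiest to see by checking the class of what comes back. This is a small sketch using a made-up list called exampleList.

```r
exampleList <- list(p1 = c(1, 2, 3, 4), p2 = LETTERS[5:8])

class(exampleList["p1"])     # single brackets keep the list wrapper
## [1] "list"
class(exampleList[["p1"]])   # double brackets return the element itself
## [1] "numeric"

# Because [[ ]] returns the stored vector, it can be subset directly
secondValue <- exampleList[["p1"]][2]
secondValue
## [1] 2
```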
You can update one or more objects in a list by assigning the new values into the desired object using the same subsetting capabilities as noted in the Filter section.
basicList[[1]]<-8:12
basicList[1]
## [[1]]
## [1] 8 9 10 11 12
# Elements in a list can be removed by making them NULL
basicList[2]
## [[1]]
## [1] "E" "F" "G" "H"
basicList[2]<-NULL
basicList[2]
## [[1]]
## [1] -0.05155527 0.62289312 -0.31181880 -0.90420387 -1.40879209
# Append by assigning a new object to the next unused position
basicList[[3]]<-LETTERS[5:8]
You can manipulate a list using functions, and also manipulate the objects stored in the list.
lapply(basicList,mode)
## [[1]]
## [1] "numeric"
##
## [[2]]
## [1] "numeric"
##
## [[3]]
## [1] "character"
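lapply() always returns a list. When each result is a single value, sapply() or vapply() can return a plain vector instead. The sketch below uses a made-up list called exampleList and also shows changing an object stored inside the list.

```r
exampleList <- list(8:12, c(0.5, 1.5), LETTERS[5:8])

lens  <- sapply(exampleList, length)             # simplified to a vector
modes <- vapply(exampleList, mode, character(1)) # result type stated up front

lens
## [1] 5 2 4
modes
## [1] "numeric"   "numeric"   "character"

# Manipulate an object stored in the list by assigning back into it
exampleList[[1]] <- exampleList[[1]] * 2
exampleList[[1]]
## [1] 16 18 20 22 24
```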
Ordering of objects in a list is rarely required, but you can do it with the order() function.
unorderedList<-list(p2=c(1,2,3,4),p1=LETTERS[5:8])
unorderedList[order(names(unorderedList))]
## $p1
## [1] "E" "F" "G" "H"
##
## $p2
## [1] 1 2 3 4
You can extract various pieces of information about a list:
names(basicList)
## NULL
names(namedList)
## [1] "p1" "p2"
dim(namedList)
## NULL
dimnames(namedList)
## NULL
length(namedList)
## [1] 2
class(namedList)
## [1] "list"
mode(namedList)
## [1] "list"
attributes(namedList)
## $names
## [1] "p1" "p2"
str(namedList)
## List of 2
## $ p1: num [1:4] 1 2 3 4
## $ p2: chr [1:4] "E" "F" "G" "H"
Create a linear regression model (lm()) for the iris dataset and extract the fitted.values element
irisLM<-lm(Sepal.Width~Sepal.Length, iris)
head(irisLM$fitted.values)
## 1 2 3 4 5 6
## 3.103334 3.115711 3.128088 3.134277 3.109523 3.084769
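Because a model result is a list, the same list tools apply to it. The sketch below refits the same model and looks at what it contains; the element names checked here are ones lm() is known to return, but names() will show the full set.

```r
irisLM <- lm(Sepal.Width ~ Sepal.Length, iris)

is.list(irisLM)                     # model objects are built on lists
## [1] TRUE
names(irisLM)                       # the elements stored in the model

slopeAndIntercept <- irisLM$coefficients       # extract with $ ...
firstResiduals    <- irisLM[["residuals"]][1:3] # ... or with [[ ]]
```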
Tables, or data.frames as the base structure is called in R, can hold multiple columns of different data types. Normally, data.table or dplyr would be taught to super-charge data.frames (see Steph’s extended session “Cut the R learning curve” for more on these) but to reduce dependencies and make it easier to transfer code between different systems, we’ll use just base R.
df[ 1 , 2 ] selects the intersection of row 1 and column 2.
You can create data.frames using the data.frame() function. data.frame() will error if you try storing something odd in it. Use as.data.frame() to coerce.
df<-data.frame(a=1:4, b=LETTERS[5:8], c=rnorm(4),row.names = letters[9:12])
df
## a b c
## i 1 E 0.2731886
## j 2 F -0.8659110
## k 3 G 1.0832867
## l 4 H 0.7483340
Filtering, also called subsetting, selects from a data.frame by positive position, by negative position, by the name of a row or column, or by providing a boolean value for each row or column. Rows go before the comma and columns after it.
Filtering on a condition requires a boolean vector built from a specific column. Columns can be treated as vectors by using df$colname
df[1, ]
## a b c
## i 1 E 0.2731886
df[ ,1]
## [1] 1 2 3 4
df[1,1]
## [1] 1
df[-(3:4),]
## a b c
## i 1 E 0.2731886
## j 2 F -0.8659110
df[,"a"]
## [1] 1 2 3 4
df[ , c(TRUE, TRUE, FALSE)]
## a b
## i 1 E
## j 2 F
## k 3 G
## l 4 H
df[df$a<4,]
## a b c
## i 1 E 0.2731886
## j 2 F -0.8659110
## k 3 G 1.0832867
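Conditions on several columns can be combined into a single boolean vector before filtering. The sketch below uses a made-up data.frame, exampleDF, with fixed values so the result does not depend on random numbers.

```r
exampleDF <- data.frame(a = 1:4,
                        b = LETTERS[5:8],
                        c = c(0.3, -0.9, 1.1, 0.7))

keep <- exampleDF$a < 4 & exampleDF$c > 0   # one TRUE/FALSE per row
keep
## [1]  TRUE FALSE  TRUE FALSE

filtered  <- exampleDF[keep, ]              # rows that pass both conditions
twoCols   <- exampleDF[keep, c("a", "c")]   # filter rows and pick columns together
byMembers <- exampleDF[exampleDF$b %in% c("E", "H"), ]  # match against a set of values
```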
You can update values in a data.frame by referencing their position using the same subsetting capabilities as noted in the Filter section.
df[1,1]<-2
df
## a b c
## i 2 E 0.2731886
## j 2 F -0.8659110
## k 3 G 1.0832867
## l 4 H 0.7483340
# Columns in a data.frame can be removed by making them NULL
df[,2]
## [1] E F G H
## Levels: E F G H
df[,2]<-NULL
df[,2]
## [1] 0.2731886 -0.8659110 1.0832867 0.7483340
# Rows in a data.frame can be removed by subsetting without them
df[2,]
## a c
## j 2 -0.865911
df<-df[-2,]
df[2,]
## a c
## k 3 1.083287
# Appends can happen in a variety of ways
superDF<-data.frame(df,d=5:7)
superDF
## a c d
## i 2 0.2731886 5
## k 3 1.0832867 6
## l 4 0.7483340 7
df$newcol<-5:7
df
## a c newcol
## i 2 0.2731886 5
## k 3 1.0832867 6
## l 4 0.7483340 7
df[4,]<-c(1,1,1)
df
## a c newcol
## i 2 0.2731886 5
## k 3 1.0832867 6
## l 4 0.7483340 7
## 4 1 1.0000000 1
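Assigning a plain vector as a new row works smoothly when every column holds the same type. As an alternative sketch, rbind() with a one-row data.frame keeps each column's own type, and cbind() adds columns. The names exampleDF, newRow and flag are made up for illustration.

```r
exampleDF <- data.frame(a = 1:3, c = c(0.3, 1.1, 0.7))

# Add a row: the new row is itself a data.frame with matching column names
newRow    <- data.frame(a = 4L, c = 1.0)
exampleDF <- rbind(exampleDF, newRow)

# Add a column: the new vector needs one value per row
exampleDF <- cbind(exampleDF, flag = c(TRUE, FALSE, TRUE, FALSE))

dim(exampleDF)
## [1] 4 3
```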
Ordering of data.frames works by getting the row positions in the desired order as a vector and then using these to return the data.frame rows in that order
# Get order the values should appear in
order(df$c)
## [1] 1 3 4 2
# Use it to sort a table
ordered<-df[order(df$c),]
ordered
## a c newcol
## i 2 0.2731886 5
## l 4 0.7483340 7
## 4 1 1.0000000 1
## k 3 1.0832867 6
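order() also accepts several columns, using later ones to break ties, and a decreasing argument. The sketch below uses a made-up data.frame, exampleDF, with fixed values.

```r
exampleDF <- data.frame(grp = c("b", "a", "b", "a"),
                        val = c(2, 5, 1, 3))

byGrpThenVal <- order(exampleDF$grp, exampleDF$val)      # ties in grp broken by val
byValDesc    <- order(exampleDF$val, decreasing = TRUE)  # largest val first
byGrpValDesc <- order(exampleDF$grp, -exampleDF$val)     # negate a numeric column to reverse it

byGrpThenVal
## [1] 4 2 3 1

exampleDF[byGrpThenVal, ]   # use the positions to return the rows in that order
```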
You can extract various pieces of information about a data.frame
names(df)
## [1] "a" "c" "newcol"
dim(df)
## [1] 4 3
dimnames(df)
## [[1]]
## [1] "i" "k" "l" "4"
##
## [[2]]
## [1] "a" "c" "newcol"
length(df)
## [1] 3
class(df)
## [1] "data.frame"
mode(df)
## [1] "list"
attributes(df)
## $names
## [1] "a" "c" "newcol"
##
## $row.names
## [1] "i" "k" "l" "4"
##
## $class
## [1] "data.frame"
str(df)
## 'data.frame': 4 obs. of 3 variables:
## $ a : num 2 3 4 1
## $ c : num 0.273 1.083 0.748 1
## $ newcol: num 5 6 7 1
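A few more base R helpers are often convenient alongside the functions above. This is a small sketch on a made-up data.frame called exampleDF.

```r
exampleDF <- data.frame(a = c(2, 3, 4, 1), c = c(0.27, 1.08, 0.75, 1))

nrow(exampleDF)   # number of rows
## [1] 4
ncol(exampleDF)   # number of columns, the same as length() for a data.frame
## [1] 2

colClasses <- sapply(exampleDF, class)   # the class of every column
colClasses

summary(exampleDF$a)   # quick numeric summary of one column
```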
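Make a copy of the iris dataset to play with
# 1. Copy the iris dataset
myIris<-iris
# 2. Add a Sepal.Area column
myIris$Sepal.Area<-myIris$Sepal.Width * myIris$Sepal.Length
# 3. Keep only rows with at least the average Petal.Width
avg<-mean(myIris$Petal.Width)
myIris<-myIris[myIris$Petal.Width>=avg,]
# 4. Sort by Species in descending order
myIris<-myIris[order(myIris$Species,decreasing = TRUE),]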
head(myIris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Area
## 101 6.3 3.3 6.0 2.5 virginica 20.79
## 102 5.8 2.7 5.1 1.9 virginica 15.66
## 103 7.1 3.0 5.9 2.1 virginica 21.30
## 104 6.3 2.9 5.6 1.8 virginica 18.27
## 105 6.5 3.0 5.8 2.2 virginica 19.50
## 106 7.6 3.0 6.6 2.1 virginica 22.80