Steph Locke
2017-04-20
Learn about R and do useful day-job stuff with it
RODBC
data.table
ggplot2
rmarkdown
and knitr
shiny
I’ve heard about this new thing called R. Apparently everyone’s using it so we need to. Make me something using it, by tomorrow!
I Google frantically and find out…
Eep, but I’m not a stats bod or a programmer, what will I do? Do my usual stuff, but swisher!
OK, I need to get this thing called R installed on my laptop!
install.packages(c(
"RODBC", # Database access
"data.table", # Table manipulation
"ggplot2", # Charting
"scales", # Scale manipulations
"rmarkdown", # Presentation writing
"knitr", # Presentation production
"shiny" # Reporting
))
I’ll report on global bike sales from our company’s database
# Getting data
library("RODBC")
# Manipulating data
library("data.table")
# Build a connection to the DB for reuse
# Driver names can vary eg
# {ODBC Driver 11 for SQL Server}
azure <- odbcDriverConnect(
"Driver={SQL Server Native Client 11.0};
Server=mhknbn2kdz.database.windows.net;
Database=AdventureWorks2012;
Uid=sqlfamily;
Pwd=sqlf@m1ly;")
Order <- data.table( sqlQuery( azure,
"SELECT * FROM [Sales].[SalesOrderHeader]"))
Territory<- data.table( sqlQuery( azure,
"SELECT * FROM [Sales].[SalesTerritory]"))
Region <- data.table( sqlQuery( azure,
"SELECT * FROM [Person].[CountryRegion]"))
Hmm, I need to put all the data into a “flat-file” view
DT[i, j, by]
DT[WHERE | JOIN | ORDER ,
SELECT | UPDATE ,
GROUP][...]
setkey
Y[X]
to join Y on to X# Specify the columns to join by
setkey(Order , TerritoryID)
setkey(Territory , TerritoryID)
# Reads as join Territory to the Order table
# Seperate table so we don't have to start again
OrderTerritory <- Territory[Order]
# Specify the columns to join by
setkey(OrderTerritory , CountryRegionCode)
setkey(Region , CountryRegionCode)
# Reads as join Region to the OrderTerritory table
# Will overwrite as it's our desired state
OrderTerritory <- Region[OrderTerritory]
OK, let’s produce some tables!
OrderTerritory[ , #Nothing in 1st position
.( Value = sum(TotalDue) ,
Volume = .N ), # Aggregations
by = Name # Group by, no alias
]
Name | Value | Volume |
---|---|---|
Australia | 5985718 | 3419 |
Canada | 8465891 | 2049 |
Germany | 2660008 | 1296 |
France | 4042225 | 1362 |
United Kingdom | 4119535 | 1610 |
United States | 35057232 | 5996 |
OrderTerritory[order(-TotalDue)[1:5],
# Picks first 5 rows of TotalDue desc
.( SalesOrderNumber, # multi-select
AccountNumber,
Region=Name, # Aliasing colum
TotalDue,
OrderDate
)
]
SalesOrderNumber | AccountNumber | Region | TotalDue | OrderDate |
---|---|---|---|---|
SO3737 | 10-4020-000072 | United States | 165028.7 | 2006-09-01 |
SO3711 | 10-4020-000239 | United States | 158056.5 | 2006-09-01 |
SO3697 | 10-4020-000024 | United States | 145741.9 | 2006-09-01 |
SO8164 | 10-4020-000546 | United States | 145454.4 | 2007-08-01 |
SO13492 | 10-4020-000085 | France | 137721.3 | 2007-11-01 |
OrderTerritory[ ,
.SD[which.max(TotalDue),
# .SD is subset of rows within group
# which.max() returns row
# no of largest val
.( SalesOrderNumber,
AccountNumber,
TotalDue,
OrderDate
)
],
by=.(Region=Name)
#Aliasing in groupung
]
Region | SalesOrderNumber | AccountNumber | TotalDue | OrderDate |
---|---|---|---|---|
Australia | SO9908 | 10-4020-000015 | 71729.86 | 2007-09-01 |
Canada | SO2949 | 10-4020-000227 | 135606.68 | 2006-07-01 |
Germany | SO11596 | 10-4020-000302 | 117506.12 | 2007-10-01 |
France | SO13492 | 10-4020-000085 | 137721.31 | 2007-11-01 |
United Kingdom | SO9915 | 10-4020-000502 | 130249.26 | 2007-09-01 |
United States | SO3737 | 10-4020-000072 | 165028.75 | 2006-09-01 |
I need to make some pretties!
ggplot2
as much easier than base graphicsoptiRum::multiplot()
library(ggplot2)
ggplot(OrderTerritory # dataset
,aes(x=Name, y=..count..))+ # values
geom_bar()+ # chart type
theme_minimal()+ # themeing
labs(x="Region",title="All time sales volumes")
library(scales)
ggplot(OrderTerritory
,aes(x=Name, y=TotalDue))+
geom_bar(stat="identity")+ # use value
theme_minimal()+
scale_y_continuous(label=dollar)+ # customisation
labs(x="Region",title="All time sales value")
ggplot(OrderTerritory
,aes(x=year(OrderDate), y=TotalDue,
colour=Name))+
geom_bar(stat="identity")+
theme_minimal()+
scale_y_continuous(label=dollar)+
facet_wrap(~Name,ncol = 2) + # trellis
xlab("Order Year")
I need a slide deck for the boss to take use!
rmarkdown
to produce SamplePresentation.Rmd
I need a report for people to play with
shiny
to produce interactive online reportslibrary("shiny")
#Run the shiny app (app.R) in the current folder
runApp(".")
Phew, he’s got something to show to the Board & a web page he can play with all day long.
We learnt along the way:
RODBC
data.table
ggplot2
rmarkdown
shiny
Not bad for a day’s work!
OK, some basics covered but where to go from here? (F
= Free, *
=Discount, HO
=Hands on)
F/* HO
F HO
F
F
F
This presentation is available on github.com/stephlocke/Rtraining. All the code is available for you to take a copy and play with to help you learn on the go.
If you have any questions, contact me!