Executing R from SQL Server requires the execution of a stored procedure called sp_execute_external_script
that takes three main parameters:
@language
This gets set to R but isn’t isn’t it an interesting parameter with lots of possibilities for the future?@script
This is the R code to be executed@input_data_1
This is a SQL statement that will produce the starting dataset in RUnfortunately, the return schema is not inherited by SQL Server so you (ideally) should provide an output specification via WITH RESULT SETS
library(RODBC)
#host<-"rmsftdp.northeurope.cloudapp.azure.com"
host<-"104.45.80.173"
db<-"AWDW16"
uname<-"Rdemo"
pwd<-"tugait"
connstring<-sprintf("DRIVER={ODBC Driver 11 for SQL Server};Server=%1$s;Database=%2$s;Uid=%3$s;Pwd=%4$s;",host,db,uname,pwd)
conn<-odbcDriverConnect(connstring)
# convenience function
sqlServer<-function(sql) sqlQuery(conn, sql)
sqlServer(
"EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'OutputDataSet <- InputDataSet'
, @input_data_1 = N'SELECT 1 as Col'
WITH RESULT SETS ((col int not null)) "
)
## col
## 1 1
R can come in handy when you want to do something but it’s really ugly in SQL to achieve.
For instance, if I had to produce monthly sales figures per reseller with months as columns, I’d have to use a dynamic PIVOT effect in SQL. In R the complexity is much reduced because pivoting is really simple in R
res<-sqlServer(
"
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
# My preferred tbl manip pkg
library(data.table)
setDT(InputDataSet)
# Pivot!
OutputDataSet <- dcast(
InputDataSet[
order(OrderDateKey)
,.(MonthTotal=sum(SalesAmount) )
,.(ResellerName,Month= paste0(CalendarYear,\"-\", MonthNumberOfYear)
) ]
, ResellerName ~ Month)
# Need to add our headers as an extra row to save worrying about specifying them
cols<-as.list(colnames(OutputDataSet))
OutputDataSet<-rbindlist(list(cols,OutputDataSet), use.names=FALSE, fill=FALSE)
'
, @input_data_1 = N'select
r.ResellerName,
f.OrderDateKey,
d.CalendarYear,
d.MonthNumberOfYear,
f.SalesAmount
from FactResellerSales f
inner join DimReseller r on f.ResellerKey=r.ResellerKey
inner join DimDate d on d.DateKey=f.OrderDateKey'
"
)
knitr::kable(head(res[,1:6]))
.1 | .2 | .3 | .4 | .5 | |
---|---|---|---|---|---|
ResellerName | 2010-12 | 2011-1 | 2011-10 | 2011-11 | 2011-12 |
A Bike Store | NA | 10993.3942 | 26119.0572 | NA | NA |
A Great Bicycle Company | NA | NA | NA | 1677.8356 | NA |
A Typical Bike Shop | NA | NA | NA | 29522.6946 | NA |
Acceptable Sales & Service | NA | 419.4589 | 838.9178 | NA | NA |
Accessories Network | NA | NA | NA | NA | NA |
lm
) model that predicts daily SalesAmount from daily sales volumes. Return the coefficients.res<-sqlServer(
"
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
mod<-lm(SalesAmount ~ Volume, InputDataSet)
OutputDataSet<-data.frame(coefficients(mod), names(coefficients(mod)))
', @input_data_1 = N'select
sum(f.SalesAmount) as SalesAmount,
count(*) as Volume
from FactResellerSales f
inner join DimReseller r on f.ResellerKey=r.ResellerKey
group by OrderDateKey'
"
)
res
## .1
## 1 395086.637 (Intercept)
## 2 1062.314 Volume