SQL Server and R Integration – Part 2 – Running R Code

May 27, 2017

In a previous post I covered setting up SQL Server for R language integration. Now that we are set up, we can put this functionality to use.

Simple R Function Call:
First we’ll get the average from a list of numbers retrieved from SQL Server using one of R’s built-in functions. First we’ll make a list of numbers.

drop table if exists dbo.RTest;
create table dbo.RTest (
Measure tinyint not null

insert into dbo.RTest(Measure) values (1);
insert into dbo.RTest(Measure) values (3);
insert into dbo.RTest(Measure) values (4);
insert into dbo.RTest(Measure) values (7);
insert into dbo.RTest(Measure) values (8);
insert into dbo.RTest(Measure) values (10);
go 10000

select count(*) as RecordCount from dbo.RTest;

This will give us a list of 60,000 numbers. Now it is easy enough to find the average of all of these numbers in T-SQL:

select avg(Measure) from dbo.RTest;

So a few things about R before we run some R code. R is case sensitive. We use When results are returned from R to SQL Server, they will need to be as a data frame.

Let’s use use R to find the average of the ‘Measure’ values in our test table. Here is the stored procedure call:

EXEC sp_execute_external_script  
  @script=N'OutputDataSet <-data.frame(mean(InputDataSet$Measure))',      
  @input_data_1=N'SELECT Measure FROM dbo.RTest'   
  WITH RESULT SETS (([Average] tinyint NULL)); 

The ‘sp_execute_external_script’ stored procedure is used to execute R code. In the @language parameter we specify ‘R’ since we want to run R code. In the @input_data_1 parameter, we specify the data set that we want to pass in, in this case we select all of the records from our 1 column table. The @script parameter is where we place the R code. Any data returned by the code in the @input_data_1 parameter is visible in our R code as ‘InputDataSet’ (We can use a different name by specifying in a different parameter, but we’ll go with the defaults to keep it simple).
Even though our input data set has only one attribute (Measure), it should be specified, InputDataSet$Measure. We use the R function mean to find the mean value of all of the Measure values. So even though it is a scalar value being returned, it needs to be converted to a data frame, which is assigned to the default output variable OutputDataSet.
For the output, we use RESULT SETS to specify the name, data type and nullability of each attribute in the data set. In this case, we get the value 5 as ‘Average’.

Installing R Packages:
One of the powerful things about R is all of the external libraries of functionality that are freely available to use. If we want to explore machine learning or advanced statistical functions, we’ll need to find and install these packages.
RStudio Support has a list of the most popular R packages.
Once we find a package that we’d like to use, we’ll need to download and install it. We’ll install stringr, which has regular expression and other string manipulation functionality.
Running install.packages in the sp_execute_external_script proc didn’t work for me, I got a message that the package wasn’t available.
SQL Server Central – Installing R packages in SQL Server R Services gives some other options to install packages.
I ended up using the R command line option to install stringr. Since the package was dependent on two other packages, they were retrieved and installed as well.

Other Links:
MSDN – Using R Code in Transact-SQL
Simple Talk

SQL Server and R Integration – Setup

February 27, 2017

A new feature in SQL Server 2016 is the ability to run R scripts within SSMS. For those not familiar with the R language, it is an open source language used to performed statistical computations, to make use of machine learning algorithms, and for data visualization.
In this post, I’ll go through a quick explanation on installing and setting up the services.

The R language comes with a great deal of built-in functionality, but its real power comes from the hundreds of packages available, created by the R community to extend the functionality of R. R is an interpreted language, using vectors and data frames (similar to tables) as the primary data structures.
R can connect to SQL Server to retrieve data, but there are a few drawbacks. For one, R works with data in memory, so your dataset is limited to the amount of memory available in your workspace. Also, R is single-threaded, so we can’t take advantage of distributing the workload. Working with R within SQL Server will allow us to use a larger dataset as well as take care of

The R components for SQL Server are installed from SQL Server setup. If the services weren’t installed with the initial installation of the database engine, you can go back and add the components.
One option is to install a stand alone R Server, that would be used as a separate instance to use in analyzing data using R. There is also an option to install in-database R services. This optional also installs an additional service, the SQL Server Launchpad, that allows integration with R. In my testing, I elected to work with the in-database component with my existing SQL Server instance.

Once the components have been installed, the ‘External Scripts Enabled’ setting will need to be set to true.

EXEC sp_configure 'external scripts enabled', 1;

We’ll also need to restart the database instance to have this setting take effect. We also need to make sure that the SQL Server Launchpad service is running.

We’ll make a call to the sp_execute_external_script stored procedure to make sure that R is properly set up and that we can make a call.
I found this sample code on MSDN that will allow us to test the R set-up without writing R code or retrieving data from tables right now.

EXEC sp_execute_external_script  
  @language =N'R',    
  @input_data_1 =N'SELECT 1 AS hello'    
  WITH RESULT SETS (([hello] int not null));    

If everything is setup correctly, we should get a single value of 1 returned with the column header ‘Hello’.

Next Steps:
In the next post, I’ll put together some actual R code and look at some of the things we can do with SQL Server data and R.

Additional Information:
SQL Server Central: Introduction to Microsoft R Services in SQL Server 2016
Syncfusion E-Book: R Succinctly – Introduction to the R language – Free, login required
Coursera – Data Science: Series of courses in Data Science, several deal with learning R.

Python Data Access Introduction

June 23, 2014

In researching some aspects of data science, I ran across this article Python Displacing R As The Programming Language For Data Science. Instead of using a domain specific language like R, people are preferring to use a general purpose language for the same functions, so I wanted to get familiar with the basics of Python.

Installing Python:
We need to install Python, as well as an editor or IDE for coding. JetBrains (the creators of ReSharper) have a Python IDE PyCharm, and they make a community edition available.
I went with the latest edition of Python (3.4.1), although the version 2 is still available.
Download Python 3.4.1
IDE – Jetbrains – PyCharm
I’m a big fan of using Chocolatey for installs, so the commands for the two needed components are:
cinst PyCharm-community
cinst python

Getting Started:
Python is an interpreted, dynamically typed and strongly typed language. It is case sensitive and everything is an object.
Comments are preceded with a #.
To get help on an object: help(object).
dir(object) will list all of an object’s methods.
On opening the IDE, we need to select a Python Interpreter – Browse to the intalled Python.exe.

Code Examples:
I wanted to run through some examples with working with data. For the first, I read data from a file (using the results from the 2013 Atlanta Falcons season) to calculate the average scores.
The data file is at 2013 Falcons Results

from statistics import mean
import csv

filePath = "E:/2013FalconsResults.txt"

falconsScores = []
opponentsScores = []

with open(filePath, "r") as f:
    reader = csv.DictReader(f, delimiter = "\t")
    for row in reader:
        print(row["WeekNumber"] + " : " + row["Opponent"])

print ("Falcons average score: " + str(mean(falconsScores)))
print ("Opponents average score: " + str(mean(opponentsScores)))		

For the 2nd example, we’ll connect to a database to get the same Falcons’ results and display them. The data file is available at 2013 Falcons Results, this can be imported into a ‘FalconsResults2013’ table in your database.

import pyodbc

connectionString = 'DRIVER={SQL Server};SERVER=ServerName;DATABASE=DatabaseName;Trusted_Connection=yes;'

db = pyodbc.connect(connectionString)
cursor = db.cursor()
cursor.execute ('select WeekNumber, GameDate, Opponent, Result, FalconsScore, OpponentScore, HomeGame from FalconsResults2013')
rs = cursor.fetchall()
for row in rs:
    print ('Week #' + str(row[0]) + ': Falcons ' + str(row[4]) + ' ' + row[2] + ' ' + str(row[5]))

Python Documentation:

Learn Python The Hard Way:

Python in 10 minutes:

R Programming Introduction

May 18, 2014

Here’s a quickstart tutorial in the R programming language. R is useful for working with statistics and producing charts and graphs of that data. R is a case-sensitive interpreted language.

First, download and install the R package. There is also plenty of documentation available at this site as well.
Then download and install the R Studio IDE.
If you use Chocolatey both R and R Studio are available as packages:
cinst R.Project
cinst R.Studio

R Studio:
Once we open R Studio we’ll see several windows:
Console – To enter commands and run code
Environment – To view variable values – Includes a Data Import button
History – List of all previous commands
Packages tab – To view installed and available packages (libraries) – Clicking on a library hyperlink will take you to a help file for that package.

Tutorial – Data Import:
We’ll run through an example where we import data and perform some calculations. In this example, I’m importing a text file with the results of the Atlanta Falcons 2013 season. The file is available at:
2013 Falcons Results

In the Console window, run the command:


To get the working directory. Copy the file there, or use:


To set a new working directory. Use forward slash / as path separator.

To read the data into a Data Frame (Data Table) variable, run:

games = read.delim("2013FalconsResults.txt", header = TRUE, sep = "\t", quote = "", dec = ".", fill = FALSE)

For the ‘Read’ parameters:
header = TRUE – First row contains column names
sep – Delimiter (In this case, tab delimited)
quote – Character used to denote strings
dec – symbol for decimal places
fill – If true, will pad strings to equal length

In the Environment tab, you’re able to expand the ‘games’ variable to see the values. The values are arranged by column, as indicated by the import file headers.
We can also run the command:


To write the values to the console window. Or to display only the values from a specified column, we run:


Which will display the values in the 5th column, the Falcons’ score from each game.

We’ll pick a column that we want to run some calculations on. First, we will select a column and write its values to a Vector variable (one dimensional array of the same data type)

scores = as.vector(as.matrix(games[5]))

Or, using some shortcuts, the same line is:

scores = c(t(games[5]))

We first have to convert the Data Frame into a Matrix and then to a Vector.
We could also create a vector with the c command:

a <- c(1,2,3,4,5)

To create a sequence of 5 numbers. <- is the assignment symbol.

With our scores, we can calculate the highest score:
The lowest score:
Or the mean score:

R Studio also has built-in functionality to visualize data.
plot(scores): Show all values from a vector plotted against the index (Week #).
hist(scores): Generated a Histogram.

R Studio allows plots to be saved as Image or PDF

Other Commands:
length(a) – to return # of elements in a
help(a) – to get help on command a
Functions: f <- function(z,y) { return (z-y) } : f(10, 7) – returns 3
data() – to get list of built-in datasets

Other links:
Intro To R – John Cook
R Tutorial
R Project Manuals