SQL Graph

September 19, 2017

One of the new feature in SQL Server 2017 (I’m using RC2) is the addition of graph database capabilities.

What is a Graph Database?
The Neo4j site has a good basic explanation on what a graph database is. Neo4j is the leading Graph database system available. The main concepts are nodes and edges. A node represents an entity and can have properties stored along with it. Edges represent relationships between nodes, and edges can also have properties as well.

Table Creation:
SQL Server implements Nodes and Edges as relational tables. As an example, I’ll create a Player table to store data on players for a football team. The only thing new here is the ‘As Node’ statement at the end of the table creation.

drop table if exists dbo.Player;

create table dbo.Player(
Team char(3) not null,
Number tinyint not null,
FirstName varchar(50),
LastName varchar(50),
Position char(2),
primary key(Team, Number)
) as Node;

insert into dbo.Player values ('ATL', 2, 'Matt', 'Ryan', 'QB');
insert into dbo.Player values ('ATL', 11, 'Julio', 'Jones', 'WR');
insert into dbo.Player values ('ATL', 21, 'Desmond', 'Trufant', 'CB');
insert into dbo.Player values ('ATL', 44, 'Vic', 'Beasley', 'LB');

select * from dbo.Player;

When we selection from this Node table, notice that a ‘Node Id’ column has been added. This stores JSON data with some information on each record, including a numeric Node Id for each record.

{"type":"node","schema":"dbo","table":"Player","id":0}

Edges are created the same way as a Node, with a ‘As Edge’ statement.

drop table if exists dbo.PlatoonsWith;

create table dbo.PlatoonsWith(
PlatoonName varchar(10) not null
) as Edge;

select * from dbo.PlatoonsWith;

As an Edge table, there are attributes added to the table when created. There is the Edge ID, From Id and To Id. The From and To hold the node IDs that are being related.
Once the tables have been created, they can be viewed in SSMS under ‘Tables’, under a ‘Graph Tables’ directory that is new for SQL Server 2017.

Creating Relationships:
We can create relationships between nodes by inserting records into the Edge table. We’ll insert the IDs for the two Nodes that we want to relate, plus any Edge properties. In this example, we’ll relate the two offensive players (Ryan and Jones).

declare @Node1 nvarchar(1000);
declare @Node2 nvarchar(1000);

select @Node1 = [$node_id_4FE94CB5420E440DA5CE8FE7781FA404]
from dbo.Player
where Team = 'ATL' and Number = 2;

select @Node2 = [$node_id_4FE94CB5420E440DA5CE8FE7781FA404]
from dbo.Player
where Team = 'ATL' and Number = 11;

insert into dbo.PlatoonsWith([$from_id_64FB3EBE4E3449D5A59D79FB0F8FA857], [$to_id_6A42816479484605BCD86836DFE8122B], PlatoonName)
values (@Node1, @Node2, 'Offense');

select * from dbo.PlatoonsWith;

Querying Graph Data:
The MATCH keyword was added to T-SQL to aid in querying graph data. The Node table can be joined to the Edge table without the usual JOIN..ON clause, instead using the old style Table1, Table2 syntax.
For an example, we’ll find the offensive player joined to Matt Ryan.

select p1.FirstName + ' ' + p1.LastName as Player1, p1.Position as Player1Position,
	pw.PlatoonName, p2.FirstName + ' ' + p2.LastName as Player2, p2.Position as Player2Position
from dbo.Player as p1, dbo.PlatoonsWith as pw, dbo.Player as p2
where match(p1-(pw)->p2)
	and p1.Number = 2
	and p2.Team = 'ATL';

Deleting Nodes:
There isn’t any sort of protection against deleting nodes. Even if an Edge record refers to a Node, you’re still allowed to delete that Node. The Edge record with the orphaned Node will be left in place.

Wrap Up:
A script with all of the SQL from this post is posted on GitHub.
The Graph functions are definitely useful, but hopefully some improvements will be made. I’m not sure why the auto-generated columns include the extra characters appended to the column names, that does make it a little harder to query, since you have to go retrieve the column name after the table is created. It would also be useful if there were some sort of visualization tool to view the Nodes and their relations.

Links:
SQL Graph Overview – Microsoft
SQL Graph – Niko Neugebauer

Advertisements

SQL Server 2017 – New T-SQL Functions

August 14, 2017

Here are some new T-SQL functions added to SQL Server 2017. I’m running the RC 2 version.

TRIM:
SQL Server has separate functions for removing spaces at the start (LTRIM) and end (RTRIM) of a string. 2017 adds the TRIM function to remove leading and trailing spaces together.

declare @Test varchar(20) = '  Test string   ';
select @Test as OriginalString, trim(@Test) as TrimmedString;

Result:
OriginalString TrimmedString
Test string Test string

TRIM

CONCAT_WS:
CONCAT_WS stands for concatenate with separator. The first argument is a string that will be used as the separator. Any number of string arguments can follow and these string will be combined into one string, with the designated separator between each value.

select concat_ws(',', 'First', 'Second', 'Third', 'Fourth');

Result:
First,Second,Third,Fourth

CONCAT_WS

TRANSLATE:
TRANSLATE can be thought of as an extension of REPLACE. We can define a list of characters that we want replaced in a string, and then a list of the characters to replace them with.

select translate('[Test string]', '[]', '()');

Result: (Test string)
TRANSLATE

STRING_AGG:
STRING_AGG is similar to CONCAT_WS, except it will concatenate values from a column instead of a list of values. The second argument is the string to use as the value separator. We can also use the WITHIN GROUP clause to specify the order of the items in the delimited list.

drop table if exists #Test;
create table #Test(Team varchar(20) not null primary key);
insert into #Test values('Falcons'), ('Saints'), ('Panthers'), ('Buccaneers');
select string_agg(Team, ':') from #Test;
select string_agg(Team, ':') within group(order by Team desc) from #Test;

Result:
Buccaneers:Falcons:Panthers:Saints
Saints:Panthers:Falcons:Buccaneers

STRING_AGG

Links:
SQL Server 2017 New Features
SQL Shack – New 2017 String Functions


Number To Text

June 25, 2017

I ran across a programming challenge question on Reddit, asking to create a program that will take a number and convert it to its text representation, so 11 to Eleven, 103 to One Hundred Three, etc.

I’ve posted a script on Github to accomplish this in T-SQL.

Previously, I had created a similar script to convert a Roman numeral to a number.


SSMS 2017 Table Properties

June 4, 2017

A few new things I’ve noticued snder table properties in SSMS 2016 and 2017.
Table Properties – Microsoft Docs

1) I noticed a new page for ‘Security Predicates’ and wasn’t quite sure what this displays. I finally figured out this refers to (Row Level Security. A security policy is created, which links a table to the function that will determine if a row will be displayed or not. As part of the policy a predicate is created, either a Filter (filter rows read) or a Block (block a write) predicate.
This information is also available by querying the sys.security_predicates view.

2) Under ‘General’, there is a ‘GraphDB’ section with two properties ‘Table is a node table’ and ‘Table is an edge table’. In the upcoming release of the SQL Server 2017 engine there will be graph database functionality added.
Microsoft Docs – SQL Graph
It looks like SQL Server will implement the graph database with separate tables for nodes and edges.


SQL Server and R Integration – Part 2 – Running R Code

May 27, 2017

In a previous post I covered setting up SQL Server for R language integration. Now that we are set up, we can put this functionality to use.

Simple R Function Call:
First we’ll get the average from a list of numbers retrieved from SQL Server using one of R’s built-in functions. First we’ll make a list of numbers.

drop table if exists dbo.RTest;
go
create table dbo.RTest (
Measure tinyint not null
);
go

insert into dbo.RTest(Measure) values (1);
insert into dbo.RTest(Measure) values (3);
insert into dbo.RTest(Measure) values (4);
insert into dbo.RTest(Measure) values (7);
insert into dbo.RTest(Measure) values (8);
insert into dbo.RTest(Measure) values (10);
go 10000

select count(*) as RecordCount from dbo.RTest;
go

This will give us a list of 60,000 numbers. Now it is easy enough to find the average of all of these numbers in T-SQL:

select avg(Measure) from dbo.RTest;

So a few things about R before we run some R code. R is case sensitive. We use When results are returned from R to SQL Server, they will need to be as a data frame.

Let’s use use R to find the average of the ‘Measure’ values in our test table. Here is the stored procedure call:

EXEC sp_execute_external_script  
  @language=N'R',    
  @script=N'OutputDataSet <-data.frame(mean(InputDataSet$Measure))',      
  @input_data_1=N'SELECT Measure FROM dbo.RTest'   
  WITH RESULT SETS (([Average] tinyint NULL)); 

The ‘sp_execute_external_script’ stored procedure is used to execute R code. In the @language parameter we specify ‘R’ since we want to run R code. In the @input_data_1 parameter, we specify the data set that we want to pass in, in this case we select all of the records from our 1 column table. The @script parameter is where we place the R code. Any data returned by the code in the @input_data_1 parameter is visible in our R code as ‘InputDataSet’ (We can use a different name by specifying in a different parameter, but we’ll go with the defaults to keep it simple).
Even though our input data set has only one attribute (Measure), it should be specified, InputDataSet$Measure. We use the R function mean to find the mean value of all of the Measure values. So even though it is a scalar value being returned, it needs to be converted to a data frame, which is assigned to the default output variable OutputDataSet.
For the output, we use RESULT SETS to specify the name, data type and nullability of each attribute in the data set. In this case, we get the value 5 as ‘Average’.

Installing R Packages:
One of the powerful things about R is all of the external libraries of functionality that are freely available to use. If we want to explore machine learning or advanced statistical functions, we’ll need to find and install these packages.
RStudio Support has a list of the most popular R packages.
Once we find a package that we’d like to use, we’ll need to download and install it. We’ll install stringr, which has regular expression and other string manipulation functionality.
Running install.packages in the sp_execute_external_script proc didn’t work for me, I got a message that the package wasn’t available.
SQL Server Central – Installing R packages in SQL Server R Services gives some other options to install packages.
I ended up using the R command line option to install stringr. Since the package was dependent on two other packages, they were retrieved and installed as well.

Other Links:
MSDN – Using R Code in Transact-SQL
Simple Talk


Calendar Table And Marking Holidays

March 15, 2017

A lot of Data Marts or reporting databases will include a Calendar or Dates table to list all days within a certain range, along with data on those days (Day of the week, quarter, etc.).
I’ve posted my script on GitHub that will generate these records.
At a previous job, we wanted to track days the office was closed so that we could measure how many business days certain functions took. Determining weekends was easy, but calculating holidays took a little more effort. You could just manually enter the holidays for each year, but it is possible to calculate the holidays for each year in one pass.
Some holidays are the same date every year, like Christmas.

UPDATE dbo.Calendar SET IsHoliday = 1 WHERE [Month] = 12 AND [Day] = 25;

Others are on a specific day of the week, like Labor Day being the first Monday in September (This script is marking US holidays).

UPDATE c
SET IsHoliday = 1
FROM dbo.Calendar as c
JOIN (
	SELECT DateKey,
		ROW_NUMBER() OVER (PARTITION BY [Year] ORDER BY DateKey) AS RowNumber
	FROM dbo.Calendar
	WHERE [Month] = 9
		AND [WeekDay] = 2
) AS z
ON z.DateKey = c.DateKey
	AND z.RowNumber = 1;

This statement will find the first Monday of each September for the range of dates in the table, and mark it as a holiday.
Another method I saw used was to look for a Monday in September where the day was between 1 and 7, since the first Monday of the month will always be in that range.
Of course, the hardest holiday to calculate is Easter. Luckily, I found an algorithm on the US Naval Observatory site to determine the date. I’ve included a function in the Calendar script to calculate the date.
Another thing to consider (which I didn’t include in this script) is if a holiday falls on a weekend, a business may choose the nearest weekday to observe that holiday.


BETWEEN

March 6, 2017

While writing a query recently, I made use of the BETWEEN operator, which will match all values within a specified range. Without thinking, I put the greater value first:
WHERE RecordId BETWEEN 100 and 90

I was somewhat surprised that no records were returned, I knew there should be matches. Reversing the order gave me the results I expected:
WHERE RecordID BETWEEN 90 and 100.

So running the query and getting the execution plan, I saw that the WHERE clause was transformed to:
WHERE RecordID >= 90 and RecordID <= 100

Seeing this, it becomes obvious why the order would matter.

MSDN – BETWEEN