SQL Graph

September 19, 2017

One of the new features in SQL Server 2017 (I’m using RC2) is the addition of graph database capabilities.

What is a Graph Database?
The Neo4j site has a good basic explanation of what a graph database is. Neo4j is the leading graph database system available. The main concepts are nodes and edges. A node represents an entity and can have properties stored along with it. Edges represent relationships between nodes, and edges can have properties as well.

Table Creation:
SQL Server implements Nodes and Edges as relational tables. As an example, I’ll create a Player table to store data on players for a football team. The only thing new here is the ‘As Node’ clause at the end of the table creation.

drop table if exists dbo.Player;

create table dbo.Player(
Team char(3) not null,
Number tinyint not null,
FirstName varchar(50),
LastName varchar(50),
Position char(2),
primary key(Team, Number)
) as Node;

insert into dbo.Player values ('ATL', 2, 'Matt', 'Ryan', 'QB');
insert into dbo.Player values ('ATL', 11, 'Julio', 'Jones', 'WR');
insert into dbo.Player values ('ATL', 21, 'Desmond', 'Trufant', 'CB');
insert into dbo.Player values ('ATL', 44, 'Vic', 'Beasley', 'LB');

select * from dbo.Player;

When we select from this Node table, notice that a node ID column has been added. This stores JSON data with some information on each record, including a numeric node ID for each record.

{"type":"node","schema":"dbo","table":"Player","id":0}

Edges are created the same way as a Node, with an ‘As Edge’ clause.

drop table if exists dbo.PlatoonsWith;

create table dbo.PlatoonsWith(
PlatoonName varchar(10) not null
) as Edge;

select * from dbo.PlatoonsWith;

An Edge table gets some auto-generated columns as well: an Edge ID, plus a From ID and a To ID. The From and To columns hold the node IDs that are being related.
Once the tables have been created, they can be viewed in SSMS under ‘Tables’, under a ‘Graph Tables’ directory that is new for SQL Server 2017.

Creating Relationships:
We can create relationships between nodes by inserting records into the Edge table. We’ll insert the IDs for the two Nodes that we want to relate, plus any Edge properties. In this example, we’ll relate the two offensive players (Ryan and Jones).

declare @Node1 nvarchar(1000);
declare @Node2 nvarchar(1000);

select @Node1 = [$node_id_4FE94CB5420E440DA5CE8FE7781FA404]
from dbo.Player
where Team = 'ATL' and Number = 2;

select @Node2 = [$node_id_4FE94CB5420E440DA5CE8FE7781FA404]
from dbo.Player
where Team = 'ATL' and Number = 11;

insert into dbo.PlatoonsWith([$from_id_64FB3EBE4E3449D5A59D79FB0F8FA857], [$to_id_6A42816479484605BCD86836DFE8122B], PlatoonName)
values (@Node1, @Node2, 'Offense');

select * from dbo.PlatoonsWith;
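The node IDs can also be looked up inline, without the variables. As a sketch (assuming the unsuffixed $node_id/$from_id/$to_id pseudo-column names resolve, which they should), we could relate the two defensive players (Trufant and Beasley) in one statement:

```sql
-- Sketch: relate the two defensive players using inline lookups.
-- Assumes the $from_id/$to_id/$node_id pseudo-columns (without the
-- GUID suffix) resolve against the graph tables created above.
insert into dbo.PlatoonsWith($from_id, $to_id, PlatoonName)
values (
    (select $node_id from dbo.Player where Team = 'ATL' and Number = 21),
    (select $node_id from dbo.Player where Team = 'ATL' and Number = 44),
    'Defense');
```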

Querying Graph Data:
The MATCH keyword was added to T-SQL to aid in querying graph data. The Node table is joined to the Edge table without the usual JOIN..ON clause, instead using the old-style Table1, Table2 syntax.
For an example, we’ll find the offensive player joined to Matt Ryan.

select p1.FirstName + ' ' + p1.LastName as Player1, p1.Position as Player1Position,
	pw.PlatoonName, p2.FirstName + ' ' + p2.LastName as Player2, p2.Position as Player2Position
from dbo.Player as p1, dbo.PlatoonsWith as pw, dbo.Player as p2
where match(p1-(pw)->p2)
	and p1.Number = 2
	and p2.Team = 'ATL';
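The arrow in MATCH can also point the other way, following the edge in reverse. A sketch against the same tables, finding who platoons with Julio Jones:

```sql
-- Sketch: reverse the arrow to follow the edge from p2 to p1.
-- The edge inserted earlier runs Ryan -> Jones, so with p1 = Jones
-- this should return Matt Ryan.
select p2.FirstName + ' ' + p2.LastName as Teammate, pw.PlatoonName
from dbo.Player as p1, dbo.PlatoonsWith as pw, dbo.Player as p2
where match(p1<-(pw)-p2)
	and p1.Number = 11;
```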

Deleting Nodes:
There isn’t any sort of protection against deleting nodes. Even if an Edge record refers to a Node, you’re still allowed to delete that Node; the Edge record referencing the now-deleted Node is left in place, orphaned.
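One way to spot orphaned Edge records is to look for From or To IDs with no matching Player node. A sketch, assuming the $edge_id/$from_id/$to_id and $node_id pseudo-columns resolve in these predicates:

```sql
-- Sketch: find edges whose From or To node no longer exists.
select pw.$edge_id
from dbo.PlatoonsWith as pw
where not exists (select 1 from dbo.Player as p where p.$node_id = pw.$from_id)
   or not exists (select 1 from dbo.Player as p where p.$node_id = pw.$to_id);
```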

Wrap Up:
A script with all of the SQL from this post is posted on GitHub.
The Graph functions are definitely useful, but hopefully some improvements will be made. I’m not sure why the auto-generated columns include the extra characters appended to the column names; that makes them a little harder to query, since you have to go retrieve the column name after the table is created. It would also be useful if there were some sort of visualization tool to view the Nodes and their relations.

Links:
SQL Graph Overview – Microsoft
SQL Graph – Niko Neugebauer


Links – Database Articles

September 18, 2017

Links to a few interesting database articles I’ve run across.

Coding Geek: How does a relational database work?: A good overview on relational database architecture.

Simple Talk: SQL Code Smells: List of issues to look for in writing queries and in database design.

Carnegie Mellon: Advanced Database Systems 15-721: Readings from Andy Pavlo’s database internals course – Lectures are also available on YouTube.


SQL Server 2017 – New T-SQL Functions

August 14, 2017

Here are some new T-SQL functions added to SQL Server 2017. I’m running the RC 2 version.

TRIM:
SQL Server has separate functions for removing spaces at the start (LTRIM) and end (RTRIM) of a string. 2017 adds the TRIM function to remove leading and trailing spaces together.

declare @Test varchar(20) = '  Test string   ';
select @Test as OriginalString, trim(@Test) as TrimmedString;

Result:
OriginalString TrimmedString
Test string Test string
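TRIM in 2017 also accepts an optional character list with a FROM clause, trimming those characters instead of spaces:

```sql
-- Trim brackets (rather than spaces) from both ends of the string.
select trim('[]' from '[Test string]');
-- Result: Test string
```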


CONCAT_WS:
CONCAT_WS stands for concatenate with separator. The first argument is a string that will be used as the separator. Any number of string arguments can follow, and these strings will be combined into one string, with the designated separator between each value.

select concat_ws(',', 'First', 'Second', 'Third', 'Fourth');

Result:
First,Second,Third,Fourth
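One handy behavior: CONCAT_WS skips NULL arguments rather than emitting an empty slot, so no doubled separators appear:

```sql
-- The NULL argument is skipped entirely, separator and all.
select concat_ws(',', 'First', NULL, 'Third');
-- Result: First,Third
```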


TRANSLATE:
TRANSLATE can be thought of as an extension of REPLACE. We can define a list of characters that we want replaced in a string, and then a list of the characters to replace them with.

select translate('[Test string]', '[]', '()');

Result: (Test string)
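The two character lists must be the same length, or TRANSLATE raises an error. The call above is equivalent to nesting REPLACE once per character:

```sql
-- Same result as the TRANSLATE example, written as nested REPLACEs.
select replace(replace('[Test string]', '[', '('), ']', ')');
-- Result: (Test string)
```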

STRING_AGG:
STRING_AGG is similar to CONCAT_WS, except it will concatenate values from a column instead of a list of values. The second argument is the string to use as the value separator. We can also use the WITHIN GROUP clause to specify the order of the items in the delimited list.

drop table if exists #Test;
create table #Test(Team varchar(20) not null primary key);
insert into #Test values('Falcons'), ('Saints'), ('Panthers'), ('Buccaneers');
select string_agg(Team, ':') from #Test;
select string_agg(Team, ':') within group(order by Team desc) from #Test;

Result:
Buccaneers:Falcons:Panthers:Saints
Saints:Panthers:Falcons:Buccaneers
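STRING_AGG also works per group with GROUP BY, producing one delimited list per group. A sketch, adding a made-up Division column for illustration:

```sql
-- Sketch: one delimited list per group. Division is a hypothetical
-- column added just for this example.
drop table if exists #Test2;
create table #Test2(
Team varchar(20) not null primary key,
Division varchar(10) not null
);
insert into #Test2 values('Falcons', 'South'), ('Saints', 'South'),
	('Eagles', 'East'), ('Giants', 'East');
select Division, string_agg(Team, ':') within group(order by Team) as Teams
from #Test2
group by Division;
```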


Links:
SQL Server 2017 New Features
SQL Shack – New 2017 String Functions


Azure Cosmos DB

July 17, 2017

Microsoft’s Azure Cosmos DB is a Multi-Model cloud database system, introduced in May 2017. DocumentDB served as Azure’s document database, but it is now part of Cosmos DB.

Set Up:
To set up an instance, we can either connect to Azure or we can use a local emulator to develop against to avoid using Azure credits. For this run through, we’ll connect to Azure.
To create a Cosmos instance in the Azure dashboard, click New, Databases, then select Azure Cosmos DB. After a unique ID is specified, we have to pick the database model to use. We can choose between Gremlin (graph), MongoDB (document), SQL (DocumentDB) or Table (key/value). For this example, I’ll go with Document DB.

Data Manipulation:
Once the instance has been created, the Azure dashboard will bring up the Quick Start page. From here we can create a Collection (the Document DB term for a table). This will create a database called ‘ToDoList’ with a collection named ‘Items’. From here we can download a sample app to connect to this Collection. This app will come already configured to connect to the correct URL, along with an authorization key. Running the solution will start a web app for a simple to-do list.
The Quick Start also links to some sample code and additional documentation.
Within the Azure dashboard, we can also bring up the Data Explorer, which is a graphical tool to view, create, and edit documents. We can also create new collections here.

Settings:
On the Azure dashboard, there are several settings that are of interest.
With ‘Replicate Data Globally’, we can create a read-only version of our instance in a different data center. For example, we could have the primary read-write instance in the US East data center, but also have a read-only instance in US West, or perhaps in a different country. This will allow us to distribute our data closer to our users, as well as give us a replicated data set in case we need to fail over to the backup instance.
Under ‘Default Consistency’, we can set the consistency level for our data. The levels range from Strong consistency, where any updates must be synchronously committed, to Eventual consistency, which gives the best performance but doesn’t guarantee that all copies of the data are up to date. The default setting is Session consistency, where a session will have strong consistency for its updates but eventual consistency for other sessions.
The ‘Keys’ page gives the connection strings necessary for our applications to connect to the database, either as read-write or as read-only.

Additional Information:
Azure – Cosmos DB Introduction
Cosmos DB – Getting Started
Syncfusion E-Book – Cosmos DB


Number To Text

June 25, 2017

I ran across a programming challenge question on Reddit, asking to create a program that will take a number and convert it to its text representation, so 11 to Eleven, 103 to One Hundred Three, etc.

I’ve posted a script on Github to accomplish this in T-SQL.

Previously, I had created a similar script to convert a Roman numeral to a number.


SSMS 2017 Table Properties

June 4, 2017

A few new things I’ve noticed under table properties in SSMS 2016 and 2017.
Table Properties – Microsoft Docs

1) I noticed a new page for ‘Security Predicates’ and wasn’t quite sure what this displays. I finally figured out this refers to Row-Level Security. A security policy is created, which links a table to the function that will determine if a row will be displayed or not. As part of the policy, a predicate is created: either a Filter predicate (filters rows on read) or a Block predicate (blocks a write).
This information is also available by querying the sys.security_predicates view.
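As a sketch of how those pieces fit together (the table, column, and function names here are made up for illustration), a filter predicate might be wired up like this:

```sql
-- Hypothetical example: filter dbo.Orders so each user only sees
-- rows where the (made-up) UserName column matches their login.
create function dbo.fn_FilterOrders(@UserName as sysname)
	returns table
	with schemabinding
as
	return select 1 as Result
	where @UserName = user_name();
go

-- The security policy links the predicate function to the table.
create security policy dbo.OrderFilterPolicy
	add filter predicate dbo.fn_FilterOrders(UserName) on dbo.Orders
	with (state = on);
```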

2) Under ‘General’, there is a ‘GraphDB’ section with two properties, ‘Table is a node table’ and ‘Table is an edge table’. The upcoming release of the SQL Server 2017 engine adds graph database functionality.
Microsoft Docs – SQL Graph
It looks like SQL Server will implement the graph database with separate tables for nodes and edges.


SQL Server and R Integration – Part 2 – Running R Code

May 27, 2017

In a previous post I covered setting up SQL Server for R language integration. Now that we are set up, we can put this functionality to use.

Simple R Function Call:
First we’ll get the average from a list of numbers retrieved from SQL Server, using one of R’s built-in functions. We’ll start by making a list of numbers.

drop table if exists dbo.RTest;
go
create table dbo.RTest (
Measure tinyint not null
);
go

insert into dbo.RTest(Measure) values (1);
insert into dbo.RTest(Measure) values (3);
insert into dbo.RTest(Measure) values (4);
insert into dbo.RTest(Measure) values (7);
insert into dbo.RTest(Measure) values (8);
insert into dbo.RTest(Measure) values (10);
go 10000

select count(*) as RecordCount from dbo.RTest;
go

This will give us a list of 60,000 numbers (the GO 10000 runs the six-row insert batch 10,000 times). Now it is easy enough to find the average of all of these numbers in T-SQL:

select avg(Measure) from dbo.RTest;

A few things about R before we run some R code: R is case sensitive, and when results are returned from R to SQL Server, they need to be in a data frame.

Let’s use R to find the average of the ‘Measure’ values in our test table. Here is the stored procedure call:

EXEC sp_execute_external_script  
  @language=N'R',    
  @script=N'OutputDataSet <-data.frame(mean(InputDataSet$Measure))',      
  @input_data_1=N'SELECT Measure FROM dbo.RTest'   
  WITH RESULT SETS (([Average] tinyint NULL)); 

The ‘sp_execute_external_script’ stored procedure is used to execute R code. In the @language parameter we specify ‘R’, since we want to run R code. In the @input_data_1 parameter we specify the data set to pass in; in this case we select all of the records from our one-column table. The @script parameter is where we place the R code. The data returned by the @input_data_1 query is visible in our R code as ‘InputDataSet’ (we can use a different name by specifying it in another parameter, but we’ll go with the defaults to keep it simple).
Even though our input data set has only one attribute (Measure), it still has to be referenced explicitly, as InputDataSet$Measure. We use the R function mean to find the mean of all of the Measure values. Even though a scalar value is being returned, it needs to be converted to a data frame, which is assigned to the default output variable OutputDataSet.
For the output, we use RESULT SETS to specify the name, data type and nullability of each attribute in the data set. In this case, we get the value 5 as ‘Average’.
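For reference, the default variable names can be changed with the @input_data_1_name and @output_data_1_name parameters. A sketch of the same query with both renamed:

```sql
-- Same averaging query, but with the input and output data sets
-- renamed from the InputDataSet/OutputDataSet defaults.
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'Averages <- data.frame(mean(Measures$Measure))',
  @input_data_1 = N'SELECT Measure FROM dbo.RTest',
  @input_data_1_name = N'Measures',
  @output_data_1_name = N'Averages'
  WITH RESULT SETS (([Average] tinyint NULL));
```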

Installing R Packages:
One of the powerful things about R is all of the external libraries of functionality that are freely available to use. If we want to explore machine learning or advanced statistical functions, we’ll need to find and install these packages.
RStudio Support has a list of the most popular R packages.
Once we find a package that we’d like to use, we’ll need to download and install it. We’ll install stringr, which has regular expression and other string manipulation functionality.
Running install.packages in the sp_execute_external_script proc didn’t work for me; I got a message that the package wasn’t available.
SQL Server Central – Installing R packages in SQL Server R Services gives some other options to install packages.
I ended up using the R command line to install stringr. Since the package depends on two other packages, they were retrieved and installed as well.
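Once installed, the package can be loaded inside the R script with library(). A sketch using stringr’s str_detect to flag values matching a pattern (assuming the install above succeeded):

```sql
-- Sketch: use stringr inside sp_execute_external_script to flag
-- which input words start with the letter S.
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'library(stringr);
OutputDataSet <- data.frame(InputDataSet$Word,
    str_detect(InputDataSet$Word, "^S"))',
  @input_data_1 = N'SELECT ''Saints'' AS Word UNION ALL SELECT ''Falcons'''
  WITH RESULT SETS (([Word] varchar(20) NULL, [StartsWithS] bit NULL));
```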

Other Links:
MSDN – Using R Code in Transact-SQL
Simple Talk