SQL Server And JSON

February 11, 2018

SQL Server 2016 added support for working with JSON. Although there isn’t a JSON data type, you can still output query results as JSON, and break down JSON into rows and columns.
This post will run through some basic examples of working with JSON. I’m using SQL Server 2017, although 2016 works as well.

First I’ll create some test data.

drop table if exists dbo.JsonTest;

create table dbo.JsonTest(
	FirstName varchar(20) null,
	LastName varchar(20) not null,
	StatusCode varchar(10) not null
);

insert into dbo.JsonTest(FirstName, LastName, StatusCode) values ('Mike', 'Smith', 'Inactive');
insert into dbo.JsonTest(FirstName, LastName, StatusCode) values ('Jane', 'Doe', 'Active');
insert into dbo.JsonTest(FirstName, LastName, StatusCode) values (NULL, 'Jones', 'Pending');

Next is returning relational data as JSON. Much like the FOR XML clause returns XML, the FOR JSON clause returns the selected data as a JSON string. AUTO will return a default structure, while PATH allows more control over the output, like naming the root node and returning NULL values instead of omitting them.

select FirstName, LastName, StatusCode from dbo.JsonTest FOR JSON AUTO;
select FirstName, LastName, StatusCode, 'Atlanta' as [Address.City], 'GA' as [Address.State] from dbo.JsonTest FOR JSON PATH, ROOT('People');

The second select also shows how to nest data, in this case in an Address node.
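For reference, the PATH query returns a string along these lines (FOR JSON omits NULL values by default, so the row with no FirstName simply drops that attribute):

{"People":[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive","Address":{"City":"Atlanta","State":"GA"}},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active","Address":{"City":"Atlanta","State":"GA"}},{"LastName":"Jones","StatusCode":"Pending","Address":{"City":"Atlanta","State":"GA"}}]}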

OPENJSON will return one row for each node in a JSON string.

declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"},{"LastName":"Jones","StatusCode":"Pending"}]';

select * from openjson(@JSON);
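Without a WITH clause, OPENJSON returns its default key/value/type columns, one row per array element, something like:

key	value	type
0	{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"}	5
1	{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"}	5
2	{"LastName":"Jones","StatusCode":"Pending"}	5

(A type of 5 indicates a JSON object.)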


With OPENJSON and a WITH clause, we can also parse JSON into relational rows and columns, provided that the column names match the JSON attribute names. If the names don’t match, NULLs are returned.

declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"},{"LastName":"Jones","StatusCode":"Pending"}]';

select * from openjson(@JSON)
with (
	FirstName varchar(20),
	LastName varchar(20),
	StatusCode varchar(10)
);

It is also possible to map a JSON attribute to a different column name in the output; in that case, we specify the JSON path of the value to match to.

declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"},{"LastName":"Jones","StatusCode":"Pending"}]';

select * from openjson(@JSON)
with (
	GivenName varchar(20) '$.FirstName',
	Surname varchar(20) '$.LastName',
	StatusCode varchar(10)
);
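This should return the renamed columns:

GivenName	Surname	StatusCode
Mike	Smith	Inactive
Jane	Doe	Active
NULL	Jones	Pending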

There are also a few JSON Functions available.
ISJSON will determine if a string is valid JSON or not.
JSON_VALUE will extract a scalar value.
JSON_QUERY will return a JSON fragment or an array of values.
By default, the JSON functions are in lax mode, which means that an invalid operation won’t raise an error; a NULL value is returned instead. Strict mode can be specified, in which case an invalid operation will raise an error.

declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive", "Language":["English","Spanish"]},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"}]';
declare @NotJson as varchar(4000) = 'Not a JSON string';

-- Return bit to determine if a string is valid JSON or not
SELECT ISJSON(@JSON);
SELECT ISJSON(@NotJson);

-- Get scalar value - 0 based array
SELECT JSON_VALUE(@JSON, '$[1].FirstName');

-- Return JSON fragment or array of values
SELECT JSON_QUERY(@JSON, '$[0].Language');

-- Default is lax mode - returns NULL on error
-- Strict will raise error

SELECT JSON_QUERY(@JSON, 'lax $[1].Language');
SELECT JSON_QUERY(@JSON, 'strict $[1].Language');

All of the SQL is also posted on GitHub.

Simple Talk has a good introduction to JSON functionality.
Microsoft Docs – OPENJSON
Microsoft Docs – JSON Functions


Decimal vs Numeric

January 22, 2018

When storing exact numeric values with decimal places in SQL Server, there are two data types to choose from: Decimal and Numeric. Both types allow precision and scale to be defined: precision is the maximum number of total digits allowed, and scale is the number of digits to the right of the decimal point (for example, decimal(6,2) allows values up to 9999.99). Both are part of the ANSI standard, and both appear to work the same way in SQL Server; the Microsoft documentation states that the two are synonyms. So why are there two data types for the same purpose?
In Joe Celko’s book “SQL For Smarties”, he mentions a slight distinction between the two:
“NUMERIC (p,s) specifies the exact precision and scale to be used. DECIMAL(p,s) specifies the exact scale, but the precision is implementation-defined to be equal or greater than the specified value.”
So there’s a slight difference in the ANSI Standard, but no difference in the SQL Server implementation.
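A quick sketch showing the two types used interchangeably in SQL Server:

declare @d decimal(5,2) = 123.45;
declare @n numeric(5,2) = 123.45;
select @d as DecimalValue, @n as NumericValue; -- identical precision, scale, and behavior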

I did find it interesting that when an error was raised assigning a value to a decimal variable, the error message referred to the numeric data type.

declare @Value decimal(2,1);
set @Value = 123.456;

Msg 8115, Level 16, State 8, Line 25
Arithmetic overflow error converting numeric to data type numeric.

I use decimal, since that term is more specific than numeric, which sounds like it could be any kind of number, even integers.

SQL Server 2017 – In-Database Python Setup

January 15, 2018

With SQL Server 2016, the ability to execute R code in SQL Server was added. SQL Server 2017 added functionality to run Python code as well. This functionality for either language can be installed as part of the SQL Server installation, or added to an existing instance.

My initial attempt at the Python installation didn’t go smoothly. Multiple installation attempts hung and made no progress. However, once I removed the existing Python 3 installation from the machine, the install was able to complete.
The Launchpad service is installed as part of the Python/R package as well. This service needs to be running in order to execute Python or R code. At first I created a new user for the service, but I ran into errors trying to execute code:
Msg 39021, Level 16, State 1, Line 55
Unable to launch runtime for ‘Python’ script. Please check the configuration of the ‘Python’ runtime.
Msg 39019, Level 16, State 2, Line 55
An external script error occurred:
Unable to launch the runtime. ErrorCode 0x80070490: 1168(Element not found.).

I ended up giving the service an administrator account to run under, which cleared up this issue.

The ‘External Scripts Enabled’ setting should be set to true in order to run Python/R code; RECONFIGURE makes the change take effect.

EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

I’ll create a small table to have some test data.

drop table if exists dbo.Team;
create table dbo.Team(Team varchar(20) not null primary key);
insert into dbo.Team values('Falcons'), ('Saints'), ('Panthers'), ('Buccaneers');

And then run a simple loop to read the team names from the table and write them to the results pane.

exec sp_execute_external_script
	@language = N'Python',
	@script = N'for x in InputDataSet.Team: print(x)',
	@input_data_1 = N'select Team from dbo.Team';

InputDataSet is the default name for the data set read in from the database. The sp_execute_external_script proc has a parameter to set a different name, if so desired.
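A minimal sketch using the @input_data_1_name parameter to rename the input data set to Teams:

exec sp_execute_external_script
	@language = N'Python',
	@script = N'for x in Teams.Team: print(x)',
	@input_data_1 = N'select Team from dbo.Team',
	@input_data_1_name = N'Teams';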

Additional Links:
Set up Python Machine Learning Services

SQL Server Tools List

December 31, 2017

Here is a collection of SQL Server tools I’ve run across. These are all free tools, some from Microsoft and some from the SQL Server community.

SQL Cop: Runs checks on a database to find potential issues, like tables missing primary keys, use of deprecated data types, etc.

Microsoft Assessment and Planning (MAP) Toolkit for SQL Server: Lists all SQL Server instances in an environment, including version and edition, as well as information on the host (OS, number of cores, etc.).

SQL Query Stress: Developed by Adam Machanic, this is a tool to load test queries. It can run up to 200 threads to generate a test load.

SQL Server Diagnostics: An add-in for SSMS (2016 or later). It can analyze memory dumps and give recommendations for issues. It also recommends Cumulative Updates containing fixes for the specific features in use on a given instance.

SQL Server Migration Assistant: Tools to move data into SQL Server from Access, Oracle, MySQL and other databases.

Database Experimentation Assistant: Testing program to identify potential issues when upgrading to a newer version of SQL Server.

Tiger Team GitHub: Various tools from the SQL Server engineering team. The SQL Performance Dashboard reports in particular are very useful.

DiskSpd: Tool for testing storage. Takes the place of the SQLIO utility.



Running Totals

November 28, 2017

The guys from the SQL Server Radio podcast posted a blog post from their episode on calculating running totals in SQL Server. Their script (link in the post) covers several different ways to calculate a running total in a query (the value in the current row added to the values from every row that came before it). The best method (in my view) is using the SUM function with the OVER clause; most of the other methods involve a cursor or some other messy approach.
I imagine most developers familiar with T-SQL have used the OVER clause with ROW_NUMBER or some other windowing function, but the OVER clause can also be used with aggregate functions like COUNT, SUM, or AVG.
An example:

create table [dbo].[RunningTotal](
	RecordId int identity(1,1) NOT NULL,
	GroupId tinyint NOT NULL,
	Amount decimal(6, 2) NOT NULL
);

insert into [dbo].[RunningTotal](GroupId, Amount) values (1, 12.34);
insert into [dbo].[RunningTotal](GroupId, Amount) values (1, 56.78);
insert into [dbo].[RunningTotal](GroupId, Amount) values (1, 55.66);
insert into [dbo].[RunningTotal](GroupId, Amount) values (2, 33.33);
insert into [dbo].[RunningTotal](GroupId, Amount) values (2, 22.10);
insert into [dbo].[RunningTotal](GroupId, Amount) values (2, 12.34);
insert into [dbo].[RunningTotal](GroupId, Amount) values (3, 98.76);

select GroupId, Amount, 
	sum(Amount) over(order by RecordId) as RunningSum,
	sum(Amount) over(partition by GroupId order by RecordId) as RunningSumByGroup
from dbo.RunningTotal;


This example gives a running total over the whole dataset, plus a running total by Group (GroupId).
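With the sample data above, the query returns:

GroupId	Amount	RunningSum	RunningSumByGroup
1	12.34	12.34	12.34
1	56.78	69.12	69.12
1	55.66	124.78	124.78
2	33.33	158.11	33.33
2	22.10	180.21	55.43
2	12.34	192.55	67.77
3	98.76	291.31	98.76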

Microsoft SQL Operations Studio

November 27, 2017

Microsoft has released a new database management tool called Microsoft SQL Operations Studio. The new studio runs on Windows, Linux and macOS, and is free and open source. It can manage on-premises SQL Servers as well as Azure instances. There’s no installer; you just unzip the files and run the EXE to start the studio.
SSMS users will be familiar with most features: Object Explorer, T-SQL code window with Intellisense, backup and restore, and so on.
There is also built-in support for Git source control. You can also create dashboard widgets, called Insight Widgets, to display performance and maintenance data. Two examples are provided: one displays the slowest queries and one shows the space used by tables. The performance widget runs off of Query Store data, so that feature must be enabled.
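Query Store can be turned on per database with a statement like the one below (MyDatabase is a placeholder for your own database name):

ALTER DATABASE MyDatabase SET QUERY_STORE = ON; -- MyDatabase is a placeholder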
The most interesting feature was the Integrated Terminal, which allows running command line tasks from within the studio, including the command prompt, PowerShell, and BCP.
Operations Studio is still well short of SSMS on features, although I did like the dashboards and the Integrated Terminal. The basics seem to be covered, and I would expect more features to be added over time. I imagine the main use case is non-Windows users, although a non-admin Windows user who just needs to query data may find the simpler tool useful.

Download SQL Operations Studio
Insight Widgets with examples
Integrated Terminal

Apache Spark On Azure HDInsight – Overview and Deployment

October 31, 2017

Apache Spark is an in-memory distributed data processing framework, an open source project of the Apache Software Foundation. Spark was created to work with large datasets utilizing multiple nodes. Developers can use Java, Python, Scala or R to interact with and process large datasets. Generally Spark is used as part of a Hadoop deployment, where it can query large datasets, process streaming data, or support data scientists using its machine learning libraries to analyze data.

Data Sets:
Previously Spark worked with Resilient Distributed Datasets (RDDs): read-only, in-memory datasets distributed across several nodes and fault tolerant.
DataSets, introduced in Spark 1.6, build on RDDs. A DataFrame is a DataSet with named columns; users of R will be familiar with data frames, which are used widely in that language.

Components:
Spark Core: The central component, handling task scheduling, memory management, etc.
Spark SQL: Works with structured data, allowing a mixture of SQL queries and data manipulation in DataFrames.
Spark Streaming: Works with data in streams (as opposed to batch processing of data).
MLlib: Machine learning library.
GraphX: Graph data processing.

There are several options for deployment: we can create a standalone cluster, download a VM from Hortonworks to deploy in VirtualBox, VMware or Docker, or set up a cluster in Azure, among other options. I chose to create a cluster in Azure.
To create the cluster in Azure, go to New, then Data + Analytics, then HDInsight. Here you run through a regular Azure setup: user name/password, resource group, etc. Once we get to Cluster Type, we can specify a Spark setup. I chose the latest version available (Spark 2.1.0) on the Standard tier.
On the next page we specify storage options. We can choose Azure Storage (blob) or Data Lake storage; I went with Azure Storage. If you don’t already have a storage account, you can set one up here. When naming the storage account, upper case letters aren’t allowed, only lower case letters and numbers.
Once you create the cluster, you are billed by the hour. The cluster I created came to $3.63 an hour, with 6 nodes and 40 cores. The deployment took about 15 minutes to complete.

In a future post I’ll run through a Spark demo job and look at some use cases.

Syncfusion E-book – Spark Succinctly
Introduction to Spark on HDInsight
Apache Spark