SQL Server And JSON

February 11, 2018

SQL Server 2016 added support for working with JSON. Although there isn’t a JSON datatype, there is still the ability to output query results to JSON, and to break down JSON into rows and columns.
This post will run through some basic examples of working with JSON. I’m using SQL Server 2017 although 2016 can be used as well.

First I’ll create some test data.

drop table if exists dbo.JsonTest;

create table dbo.JsonTest(
FirstName varchar(20) null,
LastName varchar(20) not null,
StatusCode varchar(10) not null
);
insert into dbo.JsonTest(FirstName, LastName, StatusCode) values ('Mike', 'Smith', 'Inactive');
insert into dbo.JsonTest(FirstName, LastName, StatusCode) values ('Jane', 'Doe', 'Active');
insert into dbo.JsonTest(FirstName, LastName, StatusCode) values (NULL, 'Jones', 'Pending');

Next is returning relational data as JSON. Much like the FOR XML clause returns XML, the FOR JSON clause returns the selected data in a JSON string. AUTO will return a default structure, where using PATH will allow more control on the output, like naming the root node and returning NULL values instead of omitting them.

select FirstName, LastName, StatusCode FROM dbo.JsonTest FOR JSON AUTO;
select FirstName, LastName, StatusCode, 'Atlanta' as [Address.City], 'GA' as [Address.State] FROM dbo.JsonTest FOR JSON PATH, ROOT('People');
select FirstName, LastName, StatusCode FROM dbo.JsonTest FOR JSON PATH, INCLUDE_NULL_VALUES, WITHOUT_ARRAY_WRAPPER;

The second select also shows how to nest data, in this case in an Address node.

OPENJSON will return one row for each node in a JSON string.

declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"},{"LastName":"Jones","StatusCode":"Pending"}]';

SELECT * FROM OPENJSON(@JSON);

With OPENJSON, we can also parse JSON into relational rows and columns, provided that the column name matches the JSON attribute name. If the names don’t match then NULLs are returned.

declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"},{"LastName":"Jones","StatusCode":"Pending"}]';

SELECT * FROM OPENJSON(@JSON)
WITH (
FirstName varchar(20),
LastName varchar(20),
StatusCode varchar(10)
);

It is also possible to map a JSON attribute name to a different name for the output, we’ll need to specify which value to match to.


declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive"},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"},{"LastName":"Jones","StatusCode":"Pending"}]';

SELECT * FROM OPENJSON(@JSON)
WITH (
GivenName varchar(20) '$.FirstName',
Surname varchar(20) '$.LastName',
StatusCode varchar(10)
);

There are also a few JSON Functions available.
ISJSON will determine if a string is valid JSON or not.
JSON_VALUE will extract a scalar value.
JSON_QUERY will return a JSON fragment or an array of values.
By default, the JSON functions are in Lax mode, which means that an error won’t be raised with an invalid operation, a NULL values will be returned instead. Strict mode can be specified, in which case an error will be raised with an invalid operation.


declare @JSON as varchar(4000) = '[{"FirstName":"Mike","LastName":"Smith","StatusCode":"Inactive", "Language":["English","Spanish"]},{"FirstName":"Jane","LastName":"Doe","StatusCode":"Active"}]';
declare @NotJson as varchar(4000) = 'Not a JSON string';

-- Return bit to determine if a string is valid JSON or not
SELECT ISJSON(@JSON);
SELECT ISJSON(@NotJson);

-- Get scalar value - 0 based array
SELECT JSON_VALUE(@JSON, '$[1].FirstName');

-- Return JSON fragment or array of values
SELECT JSON_QUERY(@JSON, '$[0].Language');

-- Default is lax mode - returns NULL on error
-- Strict will raise error

SELECT JSON_QUERY(@JSON, 'lax $[1].Language');
SELECT JSON_QUERY(@JSON, 'strict $[1].Language');

All of the SQL is also posted on GitHub.

Links:
Simple Talk has a good introduction to JSON functionality.
Microsoft Docs – OPENJSON
Microsoft Docs – JSON Functions

Advertisements

Running Totals

November 28, 2017

The guys from the SQL Server Radio podcast posted a blog post from their episode on calculating running totals in SQL Server. Their script (link in the post) covers several different ways to calculate a running total in a query (A total of a value in the current row added to each row that came before it). The best method (in my view) is using the SUM function with the OVER clause. A lot of the other methods involved a cursor or some other messy method.
I imagine most developers familiar with T-SQL have used the OVER clause with ROW_NUMBER or some other windowing function, but the OVER clause can also be used with aggregate functions, like COUNT or SUM or AVG.
An example:

create table [dbo].[RunningTotal](
	RecordId int identity(1,1) NOT NULL,
	GroupId tinyint NOT NULL,
	Amount decimal(6, 2) NOT NULL
);

insert into [dbo].[RunningTotal](GroupId, Amount) values (1, 12.34);
insert into [dbo].[RunningTotal](GroupId, Amount) values (1, 56.78);
insert into [dbo].[RunningTotal](GroupId, Amount) values (1, 55.66);
insert into [dbo].[RunningTotal](GroupId, Amount) values (2, 33.33);
insert into [dbo].[RunningTotal](GroupId, Amount) values (2, 22.10);
insert into [dbo].[RunningTotal](GroupId, Amount) values (2, 12.34);
insert into [dbo].[RunningTotal](GroupId, Amount) values (3, 98.76);

select GroupId, Amount, 
	sum(Amount) over(order by RecordId) as RunningSum,
	sum(Amount) over(partition by GroupId order by RecordId) as RunningSumByGroup
from dbo.RunningTotal;

go

This example gives a running total over the whole dataset, plus a running total by Group (GroupId).


Number To Text

June 25, 2017

I ran across a programming challenge question on Reddit, asking to create a program that will take a number and convert it to its text representation, so 11 to Eleven, 103 to One Hundred Three, etc.

I’ve posted a script on Github to accomplish this in T-SQL.

Previously, I had created a similar script to convert a Roman numeral to a number.


Calendar Table And Marking Holidays

March 15, 2017

A lot of Data Marts or reporting databases will include a Calendar or Dates table to list all days within a certain range, along with data on those days (Day of the week, quarter, etc.).
I’ve posted my script on GitHub that will generate these records.
At a previous job, we wanted to track days the office was closed so that we could measure how many business days certain functions took. Determining weekends was easy, but calculating holidays took a little more effort. You could just manually enter the holidays for each year, but it is possible to calculate the holidays for each year in one pass.
Some holidays are the same date every year, like Christmas.

UPDATE dbo.Calendar SET IsHoliday = 1 WHERE [Month] = 12 AND [Day] = 25;

Others are on a specific day of the week, like Labor Day being the first Monday in September (This script is marking US holidays).

UPDATE c
SET IsHoliday = 1
FROM dbo.Calendar as c
JOIN (
	SELECT DateKey,
		ROW_NUMBER() OVER (PARTITION BY [Year] ORDER BY DateKey) AS RowNumber
	FROM dbo.Calendar
	WHERE [Month] = 9
		AND [WeekDay] = 2
) AS z
ON z.DateKey = c.DateKey
	AND z.RowNumber = 1;

This statement will find the first Monday of each September for the range of dates in the table, and mark it as a holiday.
Another method I saw used was to look for a Monday in September where the day was between 1 and 7, since the first Monday of the month will always be in that range.
Of course, the hardest holiday to calculate is Easter. Luckily, I found an algorithm on the US Naval Observatory site to determine the date. I’ve included a function in the Calendar script to calculate the date.
Another thing to consider (which I didn’t include in this script) is if a holiday falls on a weekend, a business may choose the nearest weekday to observe that holiday.


BETWEEN

March 6, 2017

While writing a query recently, I made use of the BETWEEN operator, which will match all values within a specified range. Without thinking, I put the greater value first:
WHERE RecordId BETWEEN 100 and 90

I was somewhat surprised that no records were returned, I knew there should be matches. Reversing the order gave me the results I expected:
WHERE RecordID BETWEEN 90 and 100.

So running the query and getting the execution plan, I saw that the WHERE clause was transformed to:
WHERE RecordID >= 90 and RecordID <= 100

Seeing this, it becomes obvious why the order would matter.

MSDN – BETWEEN


Integers vs Strings For Joins

January 31, 2017

One recurring argument that comes up with databases is determining a primary key for a table. Should a natural key (which may be a string value) be used, or should a integer surrogate key be generated? In terms of query performance, the common wisdom is that integer keys will perform better. I’ve always been doubtful about that claim, I can see where if values were of different sizes then the performance would be different.
I ran across this post that compared string vs integer key values for query performance.
I wanted to perform my own test to compare the performance. I’ve posted a script on Github to build out some test tables. We end up with a Customer table with a little over 2 million records. We’ll use two different queries to join to a State table, one joining on an integer StateId value, the 2nd joining on a string StateCode. The StateId will be a smallint, and the StateCode a char(2) value, so that both columns are 2 bytes. The Customer table has a clustered index on the Customer ID, with nonclustered indexes on both the StateId and StateCode columns. The State table is a heap (it only has 62 records), since ordering by either StateId or StateCode may give an advantage to that type.
So with running the two queries, the first joining on StateId is 46% of the run time, with the StateCode string join as the other 54%. So the integer column key is faster in this case. However, on examining the query plan, Hash joins were used, so certainly we can get a more efficient plan.
I went back and added INCLUDE columns to the two Customer indexes, so that all of the returned attributes are in the index leaf level. Once I made this change, the query plan showed that the queries used Merge joins, and the query cost is the same for both queries.
So my conclusion is that there is no gain in query performance when joining on strings or integers, assuming that the values are of the same size. I need to do more research on Hash joins, but it appears that a string value hash may possibly be larger than a integer value hash, which would account for the difference in query performance.


Query Performance: PIVOT vs CASE

October 10, 2016

A discussion came up at my job on the most efficient way to return subtotals and totals in one row, something like:

Total A Total B Total C Grand Total
1 2 3 6

I had assumed that PIVOT would be the most straightforward method, but others preferred using CASE statements. So which method is more efficient?

I posted a script on GitHub to build a table with 6 million records to run both kinds of queries against. The results of each will be:

Red Blue White Total
3000000 2000000 1000000 6000000

The PIVOT query:

select [Red], [Blue], [White], ([Red] + [Blue] + [White]) as Total
from (
select RecordColor from dbo.TestPivotCase
) as a
pivot (
count(RecordColor) for RecordColor in ([Red], [White], [Blue])
)as p;

The CASE query:

select
  sum(case when RecordColor = 'Red' then 1 else 0 end) as [Red],
  sum(case when RecordColor = 'Blue' then 1 else 0 end) as [Blue],
  sum(case when RecordColor = 'White' then 1 else 0 end) as [White],
  count(*) as [Total]
from dbo.TestPivotCase;

The plans were almost exactly similar in cost, with the PIVOT with a slightly less cost (49.7 % vs. 50.3%).
The execution plans were very similar.
Case Plan:
caseplan

Pivot Plan:
pivotplan

So both methods have a similar execution plan and run time, with a slight edge to the PIVOT method.