Download Source (Login Required)

Background
I'm not one to do many puzzles by hand. The Rubik's cube was a puzzle that I couldn't let beat me, but aside from that, crossword puzzles, cryptoquotes, video games (with few exceptions), and Sudoku puzzles don't yield enough psychological reward for me to work so tediously on them. Sudoku did capture my interest as being an awesome database puzzle. Yes, they've been solved using SQL by others, but mine is the only solution I'm aware of that involves set matching and cross-row lookahead/lookbehinds in a single statement.
The other solutions that I've seen were either largely procedural or required lots of loops: http://www.vsj.co.uk/articles/display.asp?id=540 , http://www.sqlmag.com/Articles/ArticleID/47752/pg/2/2.html , http://technology.amis.nl/blog/?p=2066 -- this one does it in a single statement, but appears to utilize a proprietary declarative looping construct.
The only true other single statement solution that I've seen is on Joe Celko's site, solved by Richard Romley: http://www.celko.com/sudoku.txt (linked from http://www.celko.com/puzzles.htm -- Joe has a little introduction on it there. I think Richard's solution is absolutely brilliant.) On another side note, I also have three pages of fame in Joe's latest SQL Puzzles and Answers book, from pages 41 through 43 for my solution to his "Wages of Sin" puzzle.
Richard followed a vastly different approach from me; I haven't tested his solution, but I imagine that it would outperform mine given present-day RDBMS implementations. Whereas I followed a set-matching/elimination algorithm that utilized a set of computed combinations, Richard took the approach of an elimination-style algorithm; something I may have thought of in doing a 2x2 puzzle, but it never occurred to me for a 9x9 w/ 3x3 cells.
Breaking SQL Server and SQL Management Studio
SQL Server's query optimizer produced a 2600+ row execution plan for my query. SQL Management studio's Execution Plan visualizer navigation widget (the little map-like thing in the lower right-hand corner of the Execution Plan) is unable to handle it and causes GDI to crash.
Thankfully it doesn't crash Management Studio. Which, by the way, is my least favorite thing about SQL Server 2005. It's better than Enterprise Manager was in SQL Server 2000, but, aside from the text-editor features, is a huge step away from the niceties of Query Analyzer. The large memory footprint and lack of stability make it a poor tool for working on servers that are hard at work crunching data. Call me an elitist, but I believe that if you're going to be working with databases, you should do it through SQL. Using graphical tools for database design is like watching a movie instead of reading a book -- it forces you to use somebody else's interpretation of how things should be. Solving this type of problem using the GUI tools is absolutely impossible.
My machine is unable to run this as a single statement. At first, it would cause my CPU to overheat and my machine would just turn off. I had to break it down into multiple steps that perform the same action as the query's common-table-expressions (this step-based solution is included in the .zip file). Microsoft: this may be a useful query for testing and improving your optimizer. It's also useful for database benchmarking as it utilizes heavy CPU and IO operations.
What's in the .zip file
Disclaimer: All software is provided as-is. If you execute it and it causes your tempdb to grow to fill up your hard drive, overheats your CPU, et cetera, consider yourself warned. I make no warranties; run this stuff at your own risk.
- SudokuRender -- a small .Net 3.5 WPF application that's capable of rendering your Sudoku result sets in a Sudoku-style grid. It improves readibility. You may need to update the connectionstring in the source code. I wrote this as a "programmer's tool" to help me confirm the validity of my results.
- unpivoted.txt.pipe -- The pipe-delimitted file containing a handful of Sudoku problems.
- sudoku_master.sql -- This script creates the Sudoku database and populates the set-matching tables. It takes several minutes to run on my old P4 hyperthreaded CPU w/ 1GB RAM and a 10kRPM Western Digital Raptor hard drive. You will need to update the path for the unpivoted.txt.pipe file in this .sql file for the bulk insert to work.
- sudoku_solution.sql -- This is the query that my machine can't run, but it parses and I'm convinced that it would return the correct solution based on tests that I've performed; those tests are contained within sudoku_solution_step_by_step.sql
- sudoku_solution_step_by_step.sql -- This contains the same queries as used in sudoku_solution.sql, only broken out into individual steps. SQL Server seems to do a much better job digesting this script than sudoku_solution.sql
How it all works
I'm still writing this...be patient
Below I catalog the different common table expressions that are used within the problem:
Expression Name | What it represents |
Puzzle
|
The set of known values for the current Sudoku puzzle
|
RowSolutions
|
All valid state sets that match the known values in each row of the puzzle.
|
ColumnSolutions
|
All valid state sets that match the known values in each column of the puzzle.
|
CellSolutions
|
All valid state sets that match the known values in each 3x3 cell of the puzzle.
|
CompleteRowMatches
|
All valid state sets from the rows within the puzzle that are also compatible with the values in Column Solutions. In other words, RowSolutions may still contain a cell with a value that would conflict with the known value in another column. This expression eliminates those.
|
CompleteColumnMatches
|
All valid state sets from the columns within the puzzle, further limited by restrictions from the only valid allowable values from CompleteRowMatches.
|
MatchStateValues
|
The actual values associated with the sets computed in CompleteColumnMatches
|
CellMatchStateSets
|
Uses MatchStateValues to determine the statesets that work with the 3x3 cells
|
CellMatchStateSetValues
|
The remaining values, further limited
|
RowCellSets
|
The allowable combinations of 3, 3x3 cells in rows of three
|
ColCellSets
|
The allowable combinations of 3, 3x3 cells in columns of three
|
The final query of this is where everything comes together. The subquery that starts with "SELECT TOP 1 ..." would give us all possible solutions to the given Sudoku problem as a set of cell statesets. SELECT TOP 1 gives us one possible solution, which is all that we really care about. All of those self-joins determine set-compatibility. The outer query then gives us the distinct values in row-column-value sets.
This solution is quite unique in that it performs set aggregation across sets, utilizing a second degree of set logic. Instead of standard row-based relational operations, or comparison of rows within sets, this solution relies on comparisons of sets within sets. The first common-table expression returns sets of divisible sets, rather than performing a simple division operation based on known values within rows, it handles sets with missing values to determine compatibility.
SQL Source:
-- ## BEGIN DATABASE SCHEMA ## --
USE tempdb
GO
SET NOCOUNT ON
GO
IF DB_ID('Sudoku') IS NOT NULL BEGIN
ALTER DATABASE Sudoku SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Sudoku
END
GO
CREATE DATABASE Sudoku
GO
USE Sudoku
GO
ALTER DATABASE Sudoku SET RECOVERY SIMPLE
GO
CREATE TABLE StateSets
(
StateSet INT
NOT NULL
, Position TINYINT
NOT NULL
CHECK (Position BETWEEN 1 AND 9)
, Val TINYINT
NOT NULL
CHECK (Val BETWEEN 1 AND 9)
, PRIMARY KEY (StateSet, Position)
, UNIQUE(StateSet, Val)
)
GO
CREATE INDEX IX_Position_Val ON StateSets (Position, Val, StateSet)
GO
CREATE TABLE Puzzles
(
PuzzleGUID UNIQUEIDENTIFIER
, Row TINYINT
NOT NULL
CHECK (Row BETWEEN 1 AND 9)
, Col TINYINT
NOT NULL
CHECK (Col BETWEEN 1 AND 9)
, Cell AS (Row - 1) / 3 * 3 + (Col - 1) / 3 + 1 PERSISTED
, CellPosition AS ((Row - 1) % 3) * 3 + ((Col - 1) % 3 + 1) PERSISTED
, Val TINYINT
NOT NULL
CHECK (Val BETWEEN 1 AND 9)
, PRIMARY KEY (PuzzleGUID, Row, Col, Cell)
, UNIQUE (PuzzleGUID, Row, Val)
, UNIQUE (PuzzleGUID, Col, Val)
, UNIQUE (PuzzleGUID, Cell, Val)
, UNIQUE (PuzzleGUID, Row, Col)
)
GO
CREATE TABLE #SudokuLoad
(
PuzzleGUID UNIQUEIDENTIFIER
, Row INT
, Col INT
, Val INT
)
BULK INSERT #SudokuLoad
FROM 'D:\Sudoku\unpivoted.txt.pipe'
WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '|')
INSERT Puzzles (PuzzleGUID, Row, Col, Val)
SELECT * FROM #SudokuLoad
DROP TABLE #SudokuLoad
GO
-- nums = range(1,10)
DECLARE @States TABLE
(
StateSet INT
IDENTITY -- IDENTITY is fine here, table variable
PRIMARY KEY
, P1 INT
, P2 INT
, P3 INT
, P4 INT
, P5 INT
, P6 INT
, P7 INT
, P8 INT
, P9 INT
)
DECLARE @V TABLE
(
V INT
)
DECLARE @I INT
SET @I = 1
WHILE @I <= 9 BEGIN
INSERT @V
SELECT @I
SET @I = @I + 1
END
CREATE TABLE StateValues
(
StateValue TINYINT PRIMARY KEY
)
INSERT StateValues SELECT V FROM @V
INSERT @States(P1,P2,P3,P4,P5,P6,P7,P8,P9)
SELECT t1.V
, t2.V
, t3.V
, t4.V
, t5.V
, t6.V
, t7.V
, t8.V
, t9.V
FROM @V t1,@V t2,@V t3,@V t4,@V t5,@V t6,@V t7,@V t8,@V t9
--print '%s=%d' % (reduce (lambda a, b : '%s*%s'%(a, b), ['t%s.V'%n for n in nums]), reduce(lambda a, b:a*b, nums))
WHERE t1.V*t2.V*t3.V*t4.V*t5.V*t6.V*t7.V*t8.V*t9.V=362880
AND (t1.V+1)*(t2.V+1)*(t3.V+1)*(t4.V+1)*(t5.V+1)*(t6.V+1)*(t7.V+1)*(t8.V+1)*(t9.V+1)=3628800
ORDER BY t1.V, t2.V, t3.V, t4.V, t5.V, t6.V, t7.V, t8.V, t9.V
SELECT *
INTO #States
FROM @States
GO
INSERT StateSets
(
StateSet
, Position
, Val
)
SELECT StateSet
, REPLACE(Position, 'P', '')
, Val
FROM #states
UNPIVOT (Val FOR Position IN (P1, P2, P3, P4, P5, P6, P7, P8, P9))n
GO
-- ### BEGIN SOLUTION ### --
WITH Puzzle AS
(
SELECT Cell
, CellPosition
, Row
, Col
, Val
FROM Puzzles
WHERE PuzzleGUID LIKE '4877%'
)
, RowSolutions AS
(
SELECT RowStateSet
, Row
, Position
, Val
FROM (SELECT StateSet RowStateSet
, Row
FROM Puzzle
inner join StateSets
ON Position = Col
AND Puzzle.Val = StateSets.Val
inner join StateValues
ON Row = StateValue
GROUP BY StateSet, Row, Statevalue
HAVING COUNT(*) = (SELECT COUNT(*) FROM Puzzle WHERE Row = StateValue)
) a
INNER JOIN StateSets
ON RowStateSet = StateSet
) ,
ColumnSolutions AS
(
SELECT ColStateSet
, Col
, Position
, Val
FROM (SELECT StateSet ColStateSet
, Col
FROM Puzzle
inner join StateSets
ON Position = Row
AND Puzzle.Val = StateSets.Val
inner join StateValues
ON Col = StateValue
GROUP BY StateSet, Col, Statevalue
HAVING COUNT(*) = (SELECT COUNT(*) FROM Puzzle WHERE Col = StateValue)
) a
INNER JOIN StateSets
ON ColStateSet = StateSet
) ,
CellSolutions AS
(
SELECT CellStateSet
, Cell
, Position
, Val
FROM (SELECT StateSet CellStateSet
, Cell
FROM Puzzle
inner join StateSets
ON Position = CellPosition
AND Puzzle.Val = StateSets.Val
inner join StateValues
ON Cell = StateValue
GROUP BY StateSet, Cell, StateValue
HAVING COUNT(*) = (SELECT COUNT(*) FROM Puzzle WHERE Cell = StateValue)
) a
INNER JOIN StateSets
ON CellStateSet = StateSet
) ,
CompleteRowMatches AS
(
SELECT RowStateSet
, Row
FROM RowSolutions
INNER JOIN ColumnSolutions
ON Row = ColumnSolutions.Position
AND Col = RowSolutions.Position
AND RowSolutions.Val = ColumnSolutions.Val
GROUP BY RowStateSet, Row
HAVING COUNT(DISTINCT Col) = 9
) ,
CompleteColumnMatches AS
(
SELECT ColStateSet, Col
FROM RowSolutions
INNER JOIN ColumnSolutions
ON Row = ColumnSolutions.Position
AND Col = RowSolutions.Position
AND RowSolutions.Val = ColumnSolutions.Val
INNER JOIN CompleteRowMatches
ON RowSolutions.Row = CompleteRowMatches.Row
AND RowSolutions.RowStateSet = CompleteRowMatches.RowStateSet
GROUP BY ColStateSet, Col
HAVING COUNT(DISTINCT RowSolutions.Row) = 9
) ,
MatchStateValues AS
(
SELECT ColStateSet StateSet
, Col
, Position Row
, (Position - 1) / 3 * 3 + (Col - 1) / 3 + 1 Cell
, ((Position - 1) % 3) * 3 + ((Col - 1) % 3 + 1) CellPos
, Val
FROM CompleteColumnMatches
INNER JOIN StateSets
ON ColStateSet = StateSet
) ,
CellMatchStateSets AS
(
SELECT CellSolutions.CellStateSet
, CellSolutions.Cell
FROM MatchStateValues
INNER JOIN CellSolutions
ON MatchStateValues.Cell = CellSolutions.Cell
AND MatchStateValues.CellPos = CellSolutions.Position
AND MatchStateValues.Val = CellSolutions.Val
GROUP BY CellSolutions.CellStateSet, CellSolutions.Cell
HAVING COUNT(DISTINCT Position) = 9
) ,
CellMatchStateSetValues AS
(
SELECT StateSets.*
, (Position + 2) % 3 + ((cell + 2) % 3) * 3 + 1 Col
, (Position - 1) / 3 + ((cell - 1) / 3) * 3 + 1 Row
, Cell
FROM StateSets
INNER JOIN CellMatchStateSets
ON StateSet = CellStateSet
) ,
RowCellSets AS
(
SELECT a.CellStateSet ASet
, a.Cell ACell
, b.CellStateSet BSet
, b.Cell BCell
, c.CellStateSet CSet
, c.Cell CCell
FROM CellMatchStateSets a
, CellMatchStateSets b
, CellMatchStateSets c
WHERE a.Cell % 3 = 1
AND b.Cell % 3 = 2
AND c.Cell % 3 = 0
AND b.Cell BETWEEN a.Cell AND c.Cell
AND c.Cell - a.Cell = 2
AND EXISTS
(
SELECT *
FROM
(
SELECT Row
FROM CellMatchStateSetValues
WHERE (
StateSet = a.CellStateSet
AND Cell = a.Cell
)
OR
(
StateSet = b.CellStateSet
AND Cell = b.Cell
)
OR
(
StateSet = c.CellStateSet
AND Cell = c.Cell
)
GROUP BY Row
HAVING SUM(DISTINCT Val) = 45
) y
HAVING COUNT(DISTINCT Row) = 3
)
) ,
ColCellSets AS
(
SELECT a.CellStateSet ASet
, a.Cell ACell
, b.CellStateSet BSet
, b.Cell BCell
, c.CellStateSet CSet
, c.Cell CCell
FROM CellMatchStateSets a
, CellMatchStateSets b
, CellMatchStateSets c
WHERE a.Cell <= 3
AND c.Cell - b.Cell = 3
AND c.Cell - a.Cell = 6
AND EXISTS
(
SELECT *
FROM
(
SELECT Col
FROM CellMatchStateSetValues
WHERE (
StateSet = a.CellStateSet
AND Cell = a.Cell
)
OR
(
StateSet = b.CellStateSet
AND Cell = b.Cell
)
OR
(
StateSet = c.CellStateSet
AND Cell = c.Cell
)
GROUP BY Col
HAVING SUM(DISTINCT Val) = 45
) y
HAVING COUNT(DISTINCT Col) = 3
)
)
SELECT DISTINCT Row
, Col
, Val
FROM CellMatchStateSetValues
WHERE EXISTS (
SELECT DISTINCT *
FROM
(
SELECT TOP 1
cA.ASet Cell1
, cA.BSet Cell4
, cA.CSet Cell7
, rA.BSet Cell2
, rA.CSet Cell3
, cB.BSet Cell5
, cB.CSet Cell8
, rB.CSet Cell6
, rC.CSet Cell9
FROM ColCellSets cA
INNER JOIN RowCellSets rA
ON cA.ASet = rA.ASet
AND cA.ACell = 1
AND rA.ACell = 1
INNER JOIN ColCellSets cB
ON rA.BSet = cB.ASet
AND cB.ACell = 2
INNER JOIN RowCellSets rB
ON cA.BSet = rB.ASet
AND cB.BSet = rB.BSet
AND rB.ACell = 4
INNER JOIN ColCellSets cC
ON rA.CSet = cC.ASet
AND rB.CSet = cC.BSet
AND cC.ACell = 3
INNER JOIN RowCellSets rC
ON rC.ASet = cA.CSet
AND rC.BSet = cB.CSet
AND rC.CSet = cC.CSet
AND rC.ACell = 7
) s UNPIVOT (StateSet FOR Cell IN (Cell1, Cell2, Cell3, Cell4, Cell5, Cell6, Cell7, Cell8, Cell9)) x
WHERE CellMatchStateSetValues.StateSet = x.StateSet
AND RTRIM(x.Cell) = RTRIM('Cell' + CAST(CellMatchStateSetValues.Cell AS VARCHAR(10)))
)
GO