Monday, November 19, 2007

Sudoku Solver Program Idea

If I keep writing about Sudoku all the time, people will think I'm obsessed. Or boring. I'm not sure which is worse. Obsessed people have some possibility of being freakishly intriguing to a small audience that shares the same interest/hobby. Boring is just . . . well, boring.

I digress.

Tonight, I find myself preoccupied with the idea of creating a computer program that solves Sudoku puzzles-- including diabolicals. I haven't tested every single solver program in existence, obviously, but the ones I have seen don't appear to solve diabolical Sudoku puzzles. At first, this made sense-- the program is applying the various solving techniques (lone number, hidden pairs, etc.) and takes the puzzle as far as it can go with those techniques. Once the techniques stop producing results, the process has no choice but to stop.

But computers aren't humans, and vice versa.

Instead of trying to make the computer mimic human solving techniques and processes, it makes sense to capitalize on the unique strength of the computer's raw computational power. There are 81 cells in a typical Sudoku puzzle, with 9 possible variables in each cell-- that's 9 to the 81st power (or, um . . . 1.96627050476e+77 for those playing along at home). I know that seems like a lot, but with advances in processor speeds, distributed clients, and a smart approach, it isn't completely impossible. Right? Also, don't forget that the values of some of the squares are given at the start (28 given clues in yesterday's diabolical puzzle), so it's more like 9 to the 53 power (aka 3.75710212614e+50).

It's basically a multi-dimensional array, 9 by 9. It helps to think of a series of car tripometers when trying to visualize this-- except each tripometer has nine places, and they are stacked one right above the other. You increment the value of the last cell, and then you perform a series of 18 checksums-- one for each row, and one for each column. If the checksums each add up to 45, then you have a valid sudoku solution. If not, then you know there's at least one number that is wrong, you increment the number again and start over. Humans don't have the speed or patience to pull off such maneuvers-- but computers are fantastic at this sort of thing.

You could do it as a series of 81 nested for loops as well, I suppose, but that'd be crass and ugly. This reminds me of a program I wrote years ago that generated letter combinations for telephone numbers to help people find "vanity" numbers for their phones. Same concept, except you'd be running the process on nine digit numbers instead of seven.

If you really wanted to make it fast, I think you could create a "dictionary" of valid Sudoku puzzles-- in other words, run the program and let it generate all the possible valid combinations of numbers for all possible complete Sudoku puzzles (much like password cracking programs use dictionaries to assist their attacks). Then, all you'd need to do is run a match against the given clues and filter out all the solutions that don't match the values and positions of those clues, and-- voila! -- you have your solution. Of course, storing an array of that size (in RAM or hard drive space) might be problematic.

There must be something wrong in one of my assumptions. It can't possibly be *that* easy, or someone else would have already done it. The total number of possibilities (with 28 provided clues) multiplied by 18 checksum operations is 6.76278382705e+51. Of course, you wouldn't carry out all 18 checksums for every possible combination-- you'd stop running checksums as soon as the first one didn't total 45.

What language would you choose to write something like this in? Jeez, almost every programming language I know is web-oriented these days. That's certainly not too practical for this sort of operation. Maybe it's time I dusted off those old C programming skills and see if I can pull this off.

If anyone's interested in taking a shot at this with me, drop me a line.