Wednesday, October 10, 2007

Active Directory Tools

I just found one of the sweetest freeware tools for recovering deleted objects in Active Directory.

Quest also has other helpful freeware for Active Directory among other things.

SQL Magic Part 1 - Select Distinct Tricks

Select Distinct is possibly one of the most useful SQL tools, but also one of the most flawed. Select Distinct will give you only one column of data. Why is that? Shouldn't it be able to do a distinct selection on a target column and also return the rest of the row's columns? One would think so, but this is not the case.
If you need to grab all columns out of a table while doing a select distinct you can try something like this:
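The snippet that originally went here was lost somewhere along the way, but a sketch of the pattern looks like the correlated sub-select below. The table name dupes and its columns are assumptions for illustration, run here through Python's sqlite3 module:

```python
import sqlite3

# In-memory sample table; "dupes" and its columns are assumed names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dupes (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dupes VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")])

# All columns of every row whose id occurs exactly once in the table.
singles = conn.execute("""
    SELECT * FROM dupes d
    WHERE 1 = (SELECT COUNT(*) FROM dupes WHERE id = d.id)
""").fetchall()
print(singles)  # [(1, 'a')]

# Loosen the comparison to catch ids with 2 or fewer instances.
two_or_fewer = conn.execute("""
    SELECT * FROM dupes d
    WHERE 2 >= (SELECT COUNT(*) FROM dupes WHERE id = d.id)
""").fetchall()
print(len(two_or_fewer))  # 3
```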
This gives you all the records (and all columns) where there is only one instance of an item. You can adjust the 1 = (SELECT to a 2 >= (SELECT to get all items that have two or fewer instances of the data you are looking for.
You can do a multi-column distinct query using a sub-select for example:
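That example was lost too; here is a hedged sketch of one way to do it, picking one full row per distinct (id, member_key) pair with a correlated sub-select (table and column names are again assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dupes (id INTEGER, member_key TEXT, note TEXT)")
conn.executemany("INSERT INTO dupes VALUES (?, ?, ?)",
                 [(1, "A", "x"), (1, "A", "y"), (1, "B", "z"), (2, "A", "w")])

# One representative full row per distinct (id, member_key) pair,
# using the lowest rowid as the tiebreaker.
rows = conn.execute("""
    SELECT * FROM dupes d
    WHERE rowid = (SELECT MIN(rowid) FROM dupes
                   WHERE id = d.id AND member_key = d.member_key)
""").fetchall()
print(rows)  # one row for each of (1, 'A'), (1, 'B'), (2, 'A')
```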
Of course we could do even more fun stuff given another level of sub-select query, but SQL will not go more than one level deep in sub-selects here. If we could do a two-level sub-select we could return all columns for a given distinct query by appending a select * from dupes where 1 = (select distinct.... (select count(*)...))
Granted, we can write stored procedures to do some of this stuff programmatically, but the deficiencies in the Distinct function are significant and cost a great deal of time in work-arounds. If new SQL standards come out any time soon they should include a Distinct function that allows for a multi-column distinct specification AND wildcards, e.g. SELECT *, DISTINCT(ID, MEMBER_KEY) FROM... or perhaps better: SELECT * FROM DUPES HAVING DISTINCT(ID, MEMBER_KEY)

Sunday, October 07, 2007

Herding Cats

Managing developers is like herding cats. Developers are skittish, fickle, and smart enough to be dangerous. Every programmer thinks that he is the next Bill Gates, which is ultimately not too far out of the realm of possibility.

With this in mind how does one manage programmers through a restructuring or other major shift in development? The answer is non-obvious to those who have never done this before but surprisingly simple...
1) Always start by building trust. This means that you have to know what you're doing. It also means that you need to invest serious time in the project.
2) Don't drop buzzwords. Everyone hears something different when people drop buzzwords, and almost inevitably it leads to preconceptions about what is being discussed. The same goes for talking in broad generalities.
3) Keep discussions/meetings to the point and don't let them wander off topic.
4) Use fist-of-five decision making, or use a proper Decision Analysis process with requirements and a ranking cube.
5) Be smart: go Agile. It's a foregone conclusion that the old methods do not work. Agile methods represent 80% of the successful software development projects.
6) Put project management where it belongs... with a group representing each discipline within the division. Let them choose their own team to lead the project.
7) Stay out of project management... act as oversight; don't meddle.
8) Drive discipline-specific pride by finding creative ways to encourage the BA, QA, Dev, and PM teams to take pride in what they do and build personal skill and prestige.
9) Keep the team together whenever possible. Lay-offs and firings are almost always counter-productive.
10) Reward, reward, reward... and go back to step #1.

So if you are doing a major revamp of how your company does things... give these ideas a try.


Saturday, September 15, 2007

Moving MS Office to a new computer or hard drive

I recently had a hard drive fail. It lasted just long enough, however, to get the license key for MS Office and a few other handy programs out of the registry with this tool:

It's free and a must-have if you need to move software from one computer to another.

Tuesday, August 28, 2007

New Paper and Sample Code on Trust and Security

I have posted the paper on the security vulnerabilities found in Zone Alarm, Norton Antivirus, and other programs. The paper is called "In Certificates We Trust" and it has all the juicy details about how changing the system clock could cause a great many programs to stop working, even critical ones like anti-virus. With the vendors notified and the products fixed, the fun was over for a while, but it may be time to dust off this tidbit and see what it does now. Of course Microsoft was notified and said that it was not a security concern; however, in Vista they fixed this "non-issue", so does that mean it was really an issue?

At the time I created the paper the information was too sensitive to broadcast, but now that Vista handily keeps programs from changing the system time, I think it's time to let people try this out on their old systems just for fun and see if it works on any new programs.

The code and exe sample for the clock forwarding is here:

There is a program to test the clock/certificate issue and another (if you are affected by this problem) to watch for and correct wild clock changes that cause the issues mentioned in the paper.
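The sample code itself is in the download above, but as a minimal, hypothetical illustration of the core issue, here is what a naive validity check that trusts the system clock looks like, and how a clock jump breaks it (the dates are invented for illustration):

```python
from datetime import datetime

def cert_is_valid(not_before, not_after, now):
    """Naive validity check of the kind the paper targets:
    it trusts whatever the system clock reports."""
    return not_before <= now <= not_after

# Hypothetical certificate validity window.
not_before = datetime(2007, 1, 1)
not_after = datetime(2009, 1, 1)

# With an honest clock the certificate verifies fine...
print(cert_is_valid(not_before, not_after, datetime(2007, 8, 28)))  # True

# ...but wind the clock past the expiry date and the very same
# certificate (and any program that trusts it) stops working.
print(cert_is_valid(not_before, not_after, datetime(2012, 8, 28)))  # False
```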


Monday, August 27, 2007

Using Neural Nets to Compare Ultrasound Data

A while ago our company did some volunteer work for a company that needed to match ultrasound "image" data. The only sample data they had was a picture of the ultrasonic wave signature graphed on the computer. With a little ingenuity we were able to screen-grab the picture and split the sample into segments for a training data set and a testing data set. While using picture data of sound wave forms in a neural net is possible, it's not recommended. Even so, we were able to match at 99% confidence with 3 subjects.
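As a rough sketch of the segmenting step (all names and sizes here are illustrative, not the actual project code):

```python
def split_segments(samples, seg_len, train_ratio=0.75):
    """Chop a 1-D list of samples into fixed-length segments, then
    divide the segments into a training set and a testing set."""
    segments = [samples[i:i + seg_len]
                for i in range(0, len(samples) - seg_len + 1, seg_len)]
    cut = int(len(segments) * train_ratio)
    return segments[:cut], segments[cut:]

# A toy "waveform": 40 samples cut into 8-sample segments (5 total).
wave = list(range(40))
train, test = split_segments(wave, 8)
print(len(train), len(test))  # 3 2
```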

Given unlimited training time and a genetic learning algorithm applied to the output a net can "find" variations in the data and learn which ones are significant and which are not. With more up-front thinking and some math magic you can get your data into a fast-training layout with better results.

AI in DB Record Matching (MPI)

What is Fuzzy Logic?
Fuzzy logic is a term that describes the concept of "maybe", where things aren't always yes/no or 1/0. This third alternative of maybe is the area in which humans operate on a daily basis. We look to computers all the time for definitive information, and we either find what we're looking for or we don't. What fuzzy logic does is allow for the third alternative: just because something is not a definite Yes doesn't mean it's a No. With most 5GL languages now you have the ability to define custom variable types. Pascal has had that ability for 30 years, making Pascal a language before its time. With custom defined types you can return results to calling functions in a natural, readable way that makes sense in the context of fuzzy logic.
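As a minimal sketch of what such a custom result type might look like (the type, names, and thresholds are invented for illustration, not taken from the actual parser):

```python
from enum import Enum

class Truth(Enum):
    NO = 0
    MAYBE = 1
    YES = 2

def to_truth(confidence, no_below=0.35, yes_above=0.85):
    """Map a 0..1 confidence onto a three-valued result; the cut-off
    thresholds here are invented and would be tuned per data type."""
    if confidence >= yes_above:
        return Truth.YES
    if confidence <= no_below:
        return Truth.NO
    return Truth.MAYBE

print(to_truth(0.95))  # Truth.YES
print(to_truth(0.60))  # Truth.MAYBE
print(to_truth(0.10))  # Truth.NO
```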
I finished integrating a fuzzy logic parser that I wrote some years ago. All that remained was to put the cut-off thresholds in the main program, call the algorithms, and see if they were up to the job. It took a while to get the thresholds fine-tuned for the different types of data, but the algorithms are sound. With fuzzy logic parsing you can determine what type of data is contained in a field (like a phone number, SSN, DL number, or birth date) and you can compare two values and get a confidence level back. By adjusting the acceptable confidence window you can fine-tune your data matching.
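A hypothetical sketch of the field-type detection side, using simple regular expressions as stand-ins for the real parsing algorithms:

```python
import re

# Toy patterns only; a real parser tolerates far more formats.
FIELD_PATTERNS = {
    "ssn": re.compile(r"^\d{3}-?\d{2}-?\d{4}$"),
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "birth_date": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
}

def classify_field(value):
    """Guess what kind of data a field holds."""
    for name, pattern in FIELD_PATTERNS.items():
        if pattern.match(value.strip()):
            return name
    return "unknown"

print(classify_field("123-45-6789"))     # ssn
print(classify_field("(555) 867-5309"))  # phone
print(classify_field("8/27/2007"))       # birth_date
```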

I was able to quickly build a database record analyzer to match data in different tables and find duplicate records. The results are uncanny, and the more data elements you throw into the mix the better it does. It nailed name misspellings, date transpositions, and addresses that were written differently, like 7th Street vs. Seventh St. It even hits on similar-sounding names, getting very close to the performance of Soundex but without the high number of false positives.
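A hedged sketch of the comparison side, using Python's difflib as a stand-in for the actual fuzzy algorithms, with a toy normalization table for cases like 7th Street vs. Seventh St:

```python
import difflib

# Toy normalization table; the real matcher's rules are more extensive.
SUBSTITUTIONS = {"seventh": "7th", "street": "st", "st.": "st"}

def normalize(text):
    words = [SUBSTITUTIONS.get(w, w)
             for w in text.lower().replace(",", "").split()]
    return " ".join(words)

def match_confidence(a, b):
    """Confidence in [0, 1] that two field values mean the same thing."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(match_confidence("7th Street", "Seventh St"))          # 1.0
print(match_confidence("Jonathon Smith", "Jonathan Smith"))  # high, not 1.0
```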

It has been really fun to work on some of these projects that I designed years ago but never got the chance to implement. A little vision goes a long way.

How To Recover From Bad Update or Delete Queries in SQL Server 2005

It happened to me... the good old update query without the WHERE clause :( The end result was zeroing out hand-coded data that took me about 4 days to enter.
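The habit that prevents this in the first place, sketched here with sqlite3 (any database with transactions works the same way): run the update inside an open transaction, check the affected row count, and roll back if it's wrong:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO members VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, 300)])
conn.commit()

# The dreaded UPDATE with the WHERE clause forgotten:
cur = conn.execute("UPDATE members SET balance = 0")
print(cur.rowcount)  # 3 -- every row was hit, not just the one intended

# Nothing is committed yet, so the damage can still be rolled back.
conn.rollback()
balances = [r[0] for r in conn.execute(
    "SELECT balance FROM members ORDER BY id")]
print(balances)  # [100, 200, 300]
```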

After searching the web for a while I came up empty. Finally I thought of a different way to phrase the search and found another blog that mentioned 3 products. The only one that worked with SQL 2005 was "SQL Log Explorer". I tried the eval version and, after a couple of tries, got it to load the 54 GB log from my 104 GB database. Needless to say this took a while.

Once I narrowed the log entries to the appropriate table and date range, I found one of the entries from the offending query. All I needed to do at that point was right-click the entry and tell it to "undo". What resulted was the generation of about 50 pages of SQL statements that I fed into SQL Manager and voila! All my data was back.

SQL Log Explorer is an impressive product! I highly recommend it.

Also of note: the trial version of SLE does not let you work on just any db... it only lets you work on their DB and on the "pubs" db. As for me, I wasn't tied to my db name, so I renamed my DB to "pubs" and ran the program on it.

We have now placed an order for SLE just for those special times when one of us shows our human side.

Saturday, August 04, 2007

Prescience in neural network data

Neural network systems are excellent at predicting seemingly unpredictable data. To work with neural systems one must be aware of the concept of prescience.

Prescience is advanced knowledge of future events.

Neural systems require two sets of data: set A, which the net is trained on, and set B, which is used for prediction. In the real world this requires three sets of data: A, B, AND C. Data set C is the answer set. If predicting a future event you would need two pristine sets of past data and one pristine set of present data. The present data set provides validation for the net that is trained on the historical data.

With this in mind, you do not want to artificially introduce prescience into the neural system by allowing data (any data) from the C set to get mixed, even indirectly, into the A or B set. Doing so allows the net to "see" into the future by cheating off of the future data. It's kind of like the TV show where the guy gets the next day's paper every morning.

A real-world example: say you have cost data from 2005 and you are trying to predict costs for later years. If you run a database query that mistakenly pulls any 2006 data into the mix, whether seemingly important or not, you will get near-perfect results.
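The guard against this kind of leakage is simple: build every feature strictly from data dated before the prediction cutoff. A toy sketch (the records and cutoff are invented for illustration):

```python
from datetime import date

# Toy cost records spanning the training years and the target year.
records = [
    (date(2004, 6, 1), 90),
    (date(2005, 3, 1), 100),
    (date(2005, 9, 1), 110),
    (date(2006, 2, 1), 500),  # future data: must NOT leak into features
]

def average_cost(records, cutoff):
    """Average cost using only records dated strictly before the
    cutoff, guarding against accidental prescience."""
    past = [cost for day, cost in records if day < cutoff]
    return sum(past) / len(past)

# Feature built the right way: the 2006 record is excluded.
print(average_cost(records, date(2006, 1, 1)))  # 100.0

# The mistake described above: averaging across all years leaks 2006.
print(sum(cost for _, cost in records) / len(records))  # 200.0
```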

I recently worked on a project where, in the first data run, I had a column that pulled averages across multiple years of data. Even though this average did not reflect the current predictive year, it introduced an abnormally high prediction rate (98%). When you have a prediction rate that high you know that either your data set is contaminated or your program is actually psychic. In this case removing the value that provided the prescience brought the average down to the 72% range. After many other runs with many other forms of data, gathered using different methods, a range from the high 60s to 74% was common. After running the full data set through and allowing it to learn from the present data set as well, the number was up into the mid 80% range! This is still an amazingly good predictive ratio; for financial predictions anything above 51% is a winning number, and 60% and up is bank.

"Show me the money!"