Tuesday, August 28, 2007

New Paper and Sample Code on TrustSecurity

I have posted the paper on the security vulnerabilities found in ZoneAlarm, Norton Antivirus, and other programs. The paper is called "In Certificates We Trust," and it has all the juicy details about how changing the system clock could cause a great many programs to stop working, even critical ones like anti-virus. With the vendors notified and the products fixed, the fun was over for a while, but it may be time to dust off this tidbit and see what it does now. Of course, Microsoft was notified and said that it was not a security concern; however, in Vista they fixed this "non-issue," so does that mean it was really an issue?

At the time I created the paper, the information was too sensitive to broadcast, but now that Vista handily keeps programs from changing the system time, I think it's time to let people try this out on their old systems, just for fun, to see if it works on any new programs.
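The heart of the issue boils down to a certificate's validity window being compared against the local clock. A minimal sketch of that check (Python, using the standard library's date parsing for certificate timestamps; the function name is mine, not from the paper):

```python
import ssl

def cert_valid_at(not_before: str, not_after: str, now: float) -> bool:
    """Return True if `now` (epoch seconds) falls inside the certificate's
    validity window. Dates use the format found in certificates,
    e.g. 'Jan  1 00:00:00 2007 GMT'."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end

# A clock pushed past `not_after` makes a perfectly good cert "expire".
window = ("Jan  1 00:00:00 2007 GMT", "Jan  1 00:00:00 2009 GMT")
print(cert_valid_at(*window, now=ssl.cert_time_to_seconds("Jun  1 00:00:00 2008 GMT")))  # True
print(cert_valid_at(*window, now=ssl.cert_time_to_seconds("Jun  1 00:00:00 2010 GMT")))  # False
```

Roll the clock forward far enough and any program that trusts the local time for this check concludes its own certificates are no longer valid.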

The code and exe sample for the clock forwarding is here: http://www.trustsecurityconsulting.com/Downloads.html

There is a program to test the clock/certificate issue and another (if you are affected by this problem) that watches for and corrects wild clock changes that cause the issues mentioned in the paper.
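The watcher's core logic amounts to polling the clock and flagging any change that can't be explained by the poll interval. A sketch of that detection step (my own simplified version, not the shipped code):

```python
def clock_jump(prev: float, now: float, poll_interval: float, max_jump: float = 5.0):
    """Return the unexpected drift in seconds if the clock jumped more than
    `max_jump` beyond the expected poll interval (forward or backward),
    else None. A caller polls time.time() every `poll_interval` seconds
    and can log or correct any jump this reports."""
    drift = now - prev - poll_interval
    return drift if abs(drift) > max_jump else None

# Normal tick: one second elapsed for a one-second poll -- nothing to report.
print(clock_jump(1000.0, 1001.0, 1.0))  # None
# Someone yanked the clock an hour forward between polls.
print(clock_jump(1000.0, 4601.0, 1.0))  # 3600.0
```

A corrective watcher would subtract the reported drift back out (or re-sync from a time server) before the affected programs notice.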


Monday, August 27, 2007

Using Neural Nets to Compare Ultrasound Data

A while ago our company did some volunteer work for a company that needed to match ultrasound "image" data. The only sample data they had was a picture of the ultrasonic wave signature graphed on the computer. With a little ingenuity we were able to screen-grab the picture and split the sample into segments for a training data set and a testing data set. While using picture data of sound wave forms in a neural net is possible, it's not recommended. Even so, we were able to match at 99% confidence with 3 subjects.

Given unlimited training time and a genetic learning algorithm applied to the output, a net can "find" variations in the data and learn which ones are significant and which are not. With more up-front thinking and some math magic, you can get your data into a fast-training layout with better results.
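The train/test split itself is the easy part. A sketch of slicing a captured waveform into fixed-length segments and holding some out for testing (the segment length and hold-out scheme here are hypothetical; the original split wasn't published):

```python
def segment_split(samples, seg_len, test_every=4):
    """Slice a 1-D waveform into fixed-length segments and hold out
    every `test_every`-th segment as the testing set; the rest become
    the training set."""
    segs = [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]
    train = [s for i, s in enumerate(segs) if i % test_every != 0]
    test = [s for i, s in enumerate(segs) if i % test_every == 0]
    return train, test

# 40 samples, segments of 10: four segments, one held out for testing.
train, test = segment_split(list(range(40)), seg_len=10)
print(len(train), len(test))  # 3 1
```

Interleaving the held-out segments (rather than taking one contiguous chunk) keeps both sets representative of the whole signature.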

AI in DB Record Matching (MPI)

What is Fuzzy Logic?
Fuzzy logic is a term that describes the concept of "maybe," where things aren't always yes or no, 1 or 0. This third alternative of maybe is the area in which humans operate on a daily basis. We look to computers all the time for definitive information, and we either find what we're looking for or we don't. What fuzzy logic does is allow for the third alternative: just because something is not a definite yes doesn't mean it's a no. With most modern high-level languages you have the ability to define custom variable types. Pascal has had that ability for 30 years, making Pascal a language before its time. With custom-defined types you can return results to calling functions in a natural, readable way that makes sense in the context of fuzzy logic.
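The "maybe" idea becomes concrete once a comparison returns a confidence instead of a boolean. A sketch using Python's standard-library difflib as a stand-in for the real matching algorithms:

```python
from difflib import SequenceMatcher

def fuzzy_match(a: str, b: str) -> float:
    """Compare two strings and return a confidence in [0.0, 1.0]
    instead of a hard yes/no (0.0 = nothing in common, 1.0 = exact)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Apply a cut-off threshold to turn the fuzzy confidence back into
    a decision -- the threshold is the knob you tune per data type."""
    return fuzzy_match(a, b) >= threshold

print(fuzzy_match("Jonathan", "Johnathan"))  # high, but not 1.0
```

Widening or narrowing the threshold is exactly the "acceptable confidence window" adjustment: lower it and you catch more misspellings at the cost of more false positives.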
I finished integrating a fuzzy logic parser that I wrote some years ago. All that remained was to put the cut-off thresholds in the main program, call the algorithms, and see if they were up to the job. It took a while to get the thresholds fine-tuned for the different types of data, but the algorithms are sound. With fuzzy logic parsing you can determine what type of data is contained in a field (like a phone number, SSN, driver's license number, or birth date), and you can compare two values and get a confidence level back. By adjusting the acceptable confidence window you can fine-tune your data matching.

I was able to quickly build a database record analyzer to match data in different tables and to find duplicate records. The results are uncanny, and the more data elements you throw into the mix, the better it does. It nailed name misspellings, date transpositions, and addresses that were written differently, like 7th Street vs. Seventh St. It even hits on similar-sounding names, very close to the performance of Soundex but without the high number of false positives.
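The two pieces fit together like this: per-field type detection, then an overall record score built from per-field confidences. A sketch (the patterns, field names, and averaging scheme are illustrative stand-ins, not the production algorithms):

```python
import re
from difflib import SequenceMatcher

# Hypothetical per-field patterns for type detection (US formats).
FIELD_PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "ssn": re.compile(r"^\d{3}-?\d{2}-?\d{4}$"),
    "birth_date": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
}

def classify_field(value: str) -> str:
    """Guess what type of data a field holds; 'unknown' if nothing fits."""
    for name, pattern in FIELD_PATTERNS.items():
        if pattern.match(value.strip()):
            return name
    return "unknown"

def record_confidence(rec_a: dict, rec_b: dict) -> float:
    """Average fuzzy similarity across the fields two records share --
    the more data elements in the mix, the more stable the score."""
    shared = set(rec_a) & set(rec_b)
    if not shared:
        return 0.0
    scores = [SequenceMatcher(None, str(rec_a[f]).lower(),
                              str(rec_b[f]).lower()).ratio() for f in shared]
    return sum(scores) / len(scores)
```

A duplicate-finder then just flags record pairs whose combined confidence clears the tuned threshold.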

It has been really fun to work on some of these projects that I designed years ago but never got the chance to implement. A little vision goes a long way.

How To Recover From Bad Update or Delete Queries in SQL Server 2005

It happened to me... the good old UPDATE query without the WHERE clause :( The end result was zeroing out hand-coded data that took me about 4 days to enter.
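One habit that would have saved me: run destructive statements inside an explicit transaction and sanity-check the affected row count before committing. A sketch using Python's sqlite3 as a stand-in (the same pattern works in SQL Server with BEGIN TRAN / ROLLBACK; the table and guard limit here are made up for illustration):

```python
import sqlite3

def guarded_update(conn, sql, params=(), max_rows=100):
    """Run an UPDATE/DELETE inside a transaction and roll back if it
    touches more rows than expected -- a cheap guard against the
    classic missing-WHERE mistake."""
    cur = conn.cursor()
    cur.execute(sql, params)
    if cur.rowcount > max_rows:
        conn.rollback()
        raise RuntimeError(f"{cur.rowcount} rows affected; rolled back")
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, zip TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [(i, "00000") for i in range(500)])
conn.commit()

# Missing WHERE clause: would hit all 500 rows, so the guard rolls it back.
try:
    guarded_update(conn, "UPDATE patients SET zip = '99999'", max_rows=100)
except RuntimeError as e:
    print(e)
```

The data survives because nothing is committed until the row count looks sane.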

After searching the web for a while I came up empty. Finally I thought of a different way to word the search and found another blog that mentioned 3 products. The only one that worked with SQL 2005 was "SQL Log Explorer". I tried the eval version and, after a couple of tries, got it to load the 54GB log from my 104GB database. Needless to say, this took a while.

Once I narrowed the log entries to the appropriate table and date range, I found one of the entries from the offending query. All I needed to do at that point was right-click the entry and tell it to "undo". What resulted was the generation of about 50 pages of SQL statements that I fed into SQL Manager, and voila! All my data was back.

SQL Log Explorer is an impressive product! I highly recommend it.

Also of note: the trial version of SLE does not let you work on just any DB... it only lets you work on their DB and on the "pubs" DB. As for me, I wasn't tied to my DB name, so I renamed my DB to "pubs" and ran the program on it.

We have now placed an order for SLE just for those special times when one of us shows our human side.

Saturday, August 04, 2007

Prescience in neural network data

Neural network systems are excellent at predicting seemingly unpredictable data. To work with neural systems, one must be aware of the concept of prescience.

Prescience is advanced knowledge of future events.

Neural systems require two sets of data: set A, which the net is trained on, and set B, which is used for prediction. In the real world this requires three sets of data: A, B, and C. Data set C is the answer set. If predicting a future event, you would need two pristine sets of past data and one pristine set of present data. The present data set provides validation for the net that is trained on the historical data.

With this in mind, you do not want to artificially introduce prescience into the neural system by allowing data (any data) from the C set to get mixed, even indirectly, into the A or B set. What this does is allow the net to "see" into the future by cheating off of the future data. It's kind of like the TV show where the guy gets the next day's paper every morning.

A real-world example: you have cost data from 2005 and you are trying to predict future costs for other years. If you run a database query that mistakenly pulls any 2006 data into the mix, whether seemingly important or not, you will get near-perfect results.
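The cheapest defense is a hard cutoff applied to every training extract, so no future-period rows can sneak in. A sketch (the row layout and field names are hypothetical):

```python
def assert_no_leakage(rows, cutoff_year):
    """Raise if any training row is dated at or after the prediction
    period -- catching the 'mistakenly pulled 2006 data' bug up front."""
    leaked = [r for r in rows if r["year"] >= cutoff_year]
    if leaked:
        raise ValueError(
            f"{len(leaked)} row(s) from {cutoff_year} or later "
            "leaked into the training set")
    return rows

training = [{"year": 2004, "cost": 110.0}, {"year": 2005, "cost": 125.0}]
assert_no_leakage(training, cutoff_year=2006)  # clean -- passes through

bad = training + [{"year": 2006, "cost": 140.0}]
try:
    assert_no_leakage(bad, cutoff_year=2006)
except ValueError as e:
    print(e)
```

Run the guard on every extract, not just the obvious ones; derived columns like multi-year averages are exactly where the leak hides.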

I recently worked on a project where, in the first data run, I had a column that pulled averages across multiple years of data. Even though this average did not reflect the current predictive year, it introduced an abnormally high prediction rate (98%). When you have a prediction rate that high, you know that either your data set contains contaminated data or your program is actually psychic. In this case, removing the value that provided the prescience brought the average down to the 72% range. After many other runs, with data gathered using different methods, a range in the high 60s to 74% was common. After running the full data set through and allowing the net to learn from the present data set as well, the number was up into the mid-80% range! That is still an amazingly good predictive ratio, and for financial predictions anything above 51% is a winning number; 60% and up is bank.

"Show me the money!"