Saturday, August 04, 2007

Prescience in neural network data

Neural network systems are excellent at making predictions from data that looks unpredictable. To work with neural systems, one must be aware of the concept of prescience.

Prescience is advance knowledge of future events.

Neural systems require two sets of data. Set A is the data the net is trained on, and set B is the data used for prediction. In the real world this requires three sets of data: A, B, and C. Data set C is the answer set. If you are predicting a future event, you need two pristine sets of past data and one pristine set of present data. The present data set provides validation for the net that was trained on the historical data.

With this in mind, you do not want to artificially introduce prescience into the neural system by allowing any data from the C set to get mixed, even indirectly, into the A or B set. Doing so lets the net "see" into the future by cheating off the future data. It's like the TV show where the guy gets the next day's paper every morning.
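One way to keep the three sets apart is to split records by date before the net ever sees them. The sketch below is a minimal illustration, assuming each record carries a year, a tuple of input features, and a target value; the `Record` type and `split_abc` function are hypothetical names, and this is one reading of the A/B/C scheme rather than the author's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    year: int        # when the observation was made (hypothetical field)
    features: tuple  # inputs the net is allowed to see
    target: float    # the value to be predicted


def split_abc(records, present_year):
    """Split records into training set A, prediction inputs B, and answer set C.

    A: past records (year < present_year), features and targets both visible.
    B: present-year features only; the net predicts from these.
    C: present-year targets; held back and used only to score predictions.
    Records dated after present_year are dropped entirely.
    """
    past = [r for r in records if r.year < present_year]
    present = [r for r in records if r.year == present_year]
    a = [(r.features, r.target) for r in past]
    b = [r.features for r in present]
    c = [r.target for r in present]
    return a, b, c
```

The point of returning C separately is that nothing in the training or prediction path ever needs to touch it; it only comes back out when predictions for B are compared against it.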

A real-world example: you have cost data from 2005 and are trying to predict costs for later years. If a database query mistakenly pulls any 2006 data into the mix, whether it seems important or not, you will get near-perfect results.
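A cheap safeguard is to check every query result against the cutoff year before it goes anywhere near the net. This is a minimal sketch, assuming rows come back as dictionaries with a year column; the function name `check_no_future_rows` and the `year_key` parameter are hypothetical.

```python
def check_no_future_rows(rows, cutoff_year, year_key="year"):
    """Raise ValueError if any row is dated after cutoff_year.

    rows: list of dicts, e.g. the result of a database query.
    year_key: hypothetical name of the column holding each row's year.
    Returns the rows unchanged when they are all on or before the cutoff.
    """
    leaked = [row for row in rows if row[year_key] > cutoff_year]
    if leaked:
        raise ValueError(
            f"{len(leaked)} row(s) dated after {cutoff_year}: possible prescience"
        )
    return rows


rows = [{"year": 2005, "cost": 100.0}, {"year": 2006, "cost": 120.0}]
try:
    check_no_future_rows(rows, cutoff_year=2005)
except ValueError as err:
    print(err)  # 1 row(s) dated after 2005: possible prescience
```

Failing loudly is deliberate: a silently filtered query can hide the fact that the query itself was wrong.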

I recently worked on a project where, in the first data run, I had a column that pulled averages across multiple years of data. Even though this average did not reflect the current predictive year, it did introduce an abnormally high prediction rate (98%). When you have a prediction rate that high, you know that either your data set contains contaminated data or your program is actually psychic. In this case, removing the value that provided the prescience brought the prediction rate down to the 72% range.

After many other runs with many other forms of data gathered using different methods, a range from the high 60% to 74% was common. After running the full data set through and allowing the net to learn from the present data set as well, the number rose into the mid-80% range! This is still an amazing predictive ratio. For financial predictions, anything above 51% is a winning number, and 60% and up is bank.
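An averaged column leaks whenever it includes years later than the one being predicted. The sketch below contrasts an all-years average with a point-in-time average that only uses strictly earlier years. It assumes one cost value per year and assumes the contaminated column spanned future years; both function names are hypothetical, and the original post does not say exactly how its average was computed.

```python
def leaky_average(costs_by_year):
    """Average over ALL years, including later ones (this is the leak)."""
    overall = sum(costs_by_year.values()) / len(costs_by_year)
    return {year: overall for year in costs_by_year}


def point_in_time_average(costs_by_year):
    """For each year, average only the years strictly before it.

    The earliest year has no history, so its value is None.
    """
    result = {}
    running_total = 0.0
    count = 0
    for year in sorted(costs_by_year):
        result[year] = running_total / count if count else None
        running_total += costs_by_year[year]
        count += 1
    return result


costs = {2003: 90.0, 2004: 100.0, 2005: 110.0, 2006: 140.0}
print(leaky_average(costs)[2005])          # 110.0, already includes 2006
print(point_in_time_average(costs)[2005])  # 95.0, only 2003 and 2004
```

The leaky version hands the 2005 row a number that already "knows" about the 2006 jump, which is exactly the kind of indirect prescience that inflates a prediction rate.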

"Show me the money!"