What you really want to know about your trading system is how it will do in the future. Ideally, you want to know what its profits and drawdowns may be. Since you cannot foresee the future, a good option is to test your system on many data sets that "simulate" future market action. You can then simply average the results to get reasonable estimates of future profits and drawdowns. We emphasize, however, that it is difficult to get precise figures for either your profitability or your drawdowns.
It also helps to have good estimates of the average and standard deviation of the monthly equity changes. We looked at interval equity changes and their standard deviation when we studied equity curves; the standard deviation is useful for projecting future drawdowns.
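As a rough illustration, the average and standard deviation of monthly equity changes can be computed directly from a list of month-end equity values. This is a minimal sketch under stated assumptions, not code from the text: the function name `monthly_change_stats` and the equity figures are hypothetical, and it uses the sample (n - 1) standard deviation.

```python
import statistics

def monthly_change_stats(month_end_equity):
    """Return (average, sample standard deviation) of month-to-month equity changes.

    month_end_equity: hypothetical list of account equity values, one per month end.
    """
    if len(month_end_equity) < 3:
        raise ValueError("need at least three month-end values for two changes")
    # Change from each month end to the next.
    changes = [b - a for a, b in zip(month_end_equity, month_end_equity[1:])]
    return statistics.mean(changes), statistics.stdev(changes)

# Made-up month-end equity values in dollars, for illustration only.
equity = [10000, 10500, 10200, 11000, 10800, 11600]
avg, sd = monthly_change_stats(equity)
print(f"average monthly change: {avg:.2f}, std dev: {sd:.2f}")
```

Here the five monthly changes are +500, -300, +800, -200, and +800, so the average change is $320.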
As a trader, you also want to get a feel for how well the design philosophy of a particular system works on a variety of markets. Your comfort level with a system's design approach is even more valuable than performance numbers because you can implement a familiar system without hesitation. One way to gain this confidence is to test the system on many different data sets, of the type you are likely to come across in the future.
Remember that all computer testing occurs in a sterile, unemotional environment, with no great stakes riding on the outcome. In real trading you become emotionally involved with the system because real money is at stake, but you never feel this pressure when you test a system. One solution is to test the system on many data sets so you can experience, at least indirectly, many different market environments. You will then have a better understanding of how much system performance varies across those markets.
Note that your system-testing efforts tell you only how your system would have done in the past. Your results are hostage to the particular data set you use, so you would like to test dozens, even hundreds, of data sets that simulate future market action. However, since most active futures markets have traded for less than two decades, the amount of market data is limited. Add the complication that futures contracts expire, and your big challenge is to find enough data to test your system thoroughly. The more data sets you can use, the better off you are, from both a quantitative and a psychological perspective.
This chapter discusses a new method for generating an unlimited number of data sets from historical market data, data that "simulate" future market action. These synthetic data encapsulate knowledge about market volatility and trading patterns. You will see that once you can generate such data, you are free to test your system more thoroughly than was ever possible before. More important, you can "live through" many types of markets and build confidence in the system, which could be vital to your success.
Past Is Prolog: Sampling with Replacement
The idea of sampling with replacement is basically as follows. Imagine that you take 100 disks, number them from 1 to 100, and mix them up in a bag. You then shake the bag and pull out a disk with, say, #21 on it. Now, before pulling out a second disk, you have two options. You can put #21 back into the bag so that all 100 disks are in the bag again, or you can select another disk from the remaining 99 without putting #21 back.
If you put #21 back, there is a 1 in 100 chance that you will pull it out the second time. This is the process of sampling with replacement. The probability that you will get #21 the first time is 1 percent. The probability that you will get #21 twice in a row is 1 percent of 1 percent, or 0.01 percent. Thus, the probability of getting #21 three times in a row is 0.0001 percent, or once in 1,000,000 tries. Remember, just because the probability of getting #21 three times in a row is small does not mean it cannot happen.
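The two ways of drawing from the bag, and the arithmetic behind the 1-in-1,000,000 figure, can be sketched in a few lines of Python. This is an illustration only: `random.choices` draws with replacement and `random.sample` draws without replacement, while the helper `prob_same_disk` and the seed value are hypothetical choices made for this example.

```python
import random
from fractions import Fraction

disks = list(range(1, 101))  # 100 numbered disks in the bag
rng = random.Random(21)      # seeded so the run is repeatable

# Sampling WITH replacement: each disk goes back in the bag, so repeats are possible.
with_replacement = rng.choices(disks, k=3)

# Sampling WITHOUT replacement: a drawn disk stays out, so no repeats are possible.
without_replacement = rng.sample(disks, k=3)

def prob_same_disk(k, n_disks=100):
    """Exact probability of drawing one particular disk k times in a row, with replacement."""
    return Fraction(1, n_disks) ** k

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
for k in (1, 2, 3):
    p = prob_same_disk(k)
    print(f"{k} in a row: {float(p) * 100:.4f} percent (1 in {p.denominator:,})")
```

The last three lines printed give 1.0000, 0.0100, and 0.0001 percent, that is, 1 in 100, 1 in 10,000, and 1 in 1,000,000.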
An example will illustrate the idea behind sampling with replacement (see Table 8.1). Using the numbers from 1 to 10 as our "original" sample, we calculate its average (5.5) and standard deviation (3.03). We then use the sampling-with-replacement algorithm in Microsoft Excel 5.0 to generate 11 additional samples. If you study the samples for a minute, you will see that the same value often occurs more than once within a sample. Because the values are drawn at random from the original sample, each of the 11 samples is different. At the same time, we retain the "signature" of the original data set, as measured by the difference between its highest and lowest values.
We have also computed the average and standard deviation for each sample. These values range from 4.30 to 6.90 for the average and 1.95 to 3.60 for the standard deviation. Thus, each sample is only a rough estimate of the average and standard deviation of the original sample. However, when we calculate the average (5.72) and standard deviation (2.81) for all 11 samples, those values are closer to the statistics of the original sample. The more samples we generate, the better our estimates for the statistics of the original sample.
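The Table 8.1 experiment can be repeated in code. The following minimal Python sketch uses `random.choices` in place of the Excel 5.0 routine mentioned in the text, so the seeded samples (the seed is arbitrary) will not reproduce the table's figures. The helper name `bootstrap_samples` is hypothetical, and the text does not say exactly how its "all" figures were computed; this sketch pools all 110 resampled values.

```python
import random
import statistics

original = list(range(1, 11))  # the "original" sample: 1, 2, ..., 10

def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples resamples, each the same size as data, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]

samples = bootstrap_samples(original, n_samples=11, seed=8)

print(f"original: avg {statistics.mean(original):.2f}, "
      f"std dev {statistics.stdev(original):.2f}")
for i, s in enumerate(samples, start=1):
    print(f"#{i}: avg {statistics.mean(s):.2f}, std dev {statistics.stdev(s):.2f}")

# Pool every resampled value to get overall statistics across the 11 samples.
pooled = [x for s in samples for x in s]
print(f"all: avg {statistics.mean(pooled):.2f}, std dev {statistics.stdev(pooled):.2f}")
```

The first line printed reads avg 5.50, std dev 3.03, matching the original sample in the text. Raising `n_samples` lets you watch the pooled statistics settle toward those values.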
Applying the same principle to system testing, we could use sampling with replacement to generate synthetic data. The extra data would improve our estimates of the average and standard deviation of, say, the monthly equity changes. The new data would also allow true out-of-sample testing and widen the variety of market conditions to which the system is exposed.
Table 8.1 Illustrating sampling with replacement; note how the average and standard deviation of the 11 samples move toward the values for the original sample.
Original | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | #11 |
Average 5.50 5.60 4.60 6.40 5.80 6.10 6.90 5.50 6.40 5.20 6.10
Std dev 3.03 2.50 2.27 2.12 3.49 3.00 2.85 3.60 3.37 2.78 2.73
Average (all) 5.72
Std dev (all) 2.81
The idea of sampling with replacement leads to another statistical idea, called bootstrapping, in which you use sampling with replacement from the results of some experiment to develop the statistical distribution for the quantity of interest. For example, say you had the results of 200 trades from a trading system. You can use sampling with replacement to generate many different possible sets of trade outcomes, and then average each set to develop a distribution for future trading results. In the example above, the different values of the average and standard deviation from each sample give us a distribution for the average and standard deviation of the original sample.
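Here is a minimal sketch of that bootstrap in Python, assuming the 200 trade results are available as a list of dollar amounts. The trade list below is made-up placeholder data rather than results from any real system, and the function name `bootstrap_average_trade`, the number of resamples, and the 5th-to-95th-percentile summary are illustrative choices, not the book's.

```python
import random
import statistics

def bootstrap_average_trade(trades, n_resamples=1000, seed=None):
    """Bootstrap distribution of the average trade.

    Each resample draws len(trades) trades with replacement from the
    original list; the average of each resample is recorded.
    """
    rng = random.Random(seed)
    n = len(trades)
    return [statistics.fmean(rng.choices(trades, k=n)) for _ in range(n_resamples)]

# Placeholder data: 200 made-up trade results in dollars, NOT real system results.
gen = random.Random(1)
trades = [round(gen.gauss(50, 400), 2) for _ in range(200)]

averages = bootstrap_average_trade(trades, n_resamples=2000, seed=2)
cuts = statistics.quantiles(averages, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile

print(f"original average trade:             {statistics.fmean(trades):.2f}")
print(f"average of bootstrap averages:      {statistics.fmean(averages):.2f}")
print(f"5th to 95th percentile of averages: {cuts[0]:.2f} to {cuts[-1]:.2f}")
```

The list `averages` is the bootstrap distribution itself; a histogram of it shows how much the average trade could plausibly vary from one set of 200 trades to the next.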
You can revisit our discussion of the results for the 65sma-3cc system to look at the distribution of all trades. Those 2,400 tests led to a particular histogram or distribution of trades. We could use sampling with replacement from the 2,400 trades to develop other potential distributions or histograms, and try to estimate future performance.
One difficulty in using sampling with replacement is that we can select data only from within the original sample. Hence, you can "see" only those events that have occurred in the original sample. The procedure developed here tries to overcome this problem so you can create new price ranges and price patterns.
Data Scrambling: All the Synthetic Data You'll Ever Need
Let us see how we can use sampling with replacement on a continuous contract to generate other continuous contracts that encapsulate market information. Once we can replicate market data, we are on the way to freedom from data set limitations.
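The text goes on to develop its own scrambling procedure. As a generic illustration only, and not as that procedure, here is one simple way to apply sampling with replacement to a continuous contract: resample the daily close-to-close changes and add them back up from the first close. The function name `synthetic_closes` and the price series are hypothetical, and the sketch ignores opens, highs, lows, and contract rolls.

```python
import random

def synthetic_closes(closes, seed=None):
    """Return one synthetic close series of the same length as `closes`.

    Daily close-to-close changes are drawn with replacement and cumulated
    from the first actual close. Every daily move comes from the original
    series, but moves may repeat or be left out, and their order is new.
    """
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    rng = random.Random(seed)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    path = [closes[0]]
    for change in rng.choices(changes, k=len(changes)):
        path.append(path[-1] + change)
    return path

# Hypothetical closes, in whole ticks, standing in for a continuous contract.
closes = [100, 102, 101, 105, 104, 108, 107, 111, 110, 114]
for run in range(3):
    print(synthetic_closes(closes, seed=run))
```

Because changes rather than price levels are resampled, a synthetic series can move above the historical high or below the historical low, which is one way around the limitation noted earlier. With additive changes it can also go negative, so a real application would need to handle that case.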