Step 1: Collecting Training and Test Data for Pattern Recognition Using an MT5 Script

This is the first article in my series on developing neural network-based pattern recognition indicators for MetaTrader5. Here, you can find an introduction to the series. Here, you can find the second article, which describes training neural networks in Google Colab using the collected data. In the third article, I describe how you can use your model as an MT5 indicator with the DT-Box-Inference tool. In the following article, you can find instructions on installing and configuring DT-Box-Inference and the necessary MT5 files. Finally, here you can find the source code of DT-Box-Inference.

In this article, I will walk you through the process of collecting training and test data for training a price pattern recognition neural network. Data collection is the most important part of the entire chain. If the training and test data are of poor quality or poorly representative of the problem domain, even the most sophisticated neural network architectures will fail to achieve meaningful results. Therefore, ensuring the relevance and quality of the dataset is crucial in achieving effective neural network training outcomes.

Additionally, a neural network has the potential to ‘learn’ to recognize any type of pattern to the extent of completely ‘remembering’ the data. Even if the dataset contains noise or random values, the network can still learn from it; however, in this case, its predictions on new, unseen data will be completely useless. This leads to an important consideration: while the architecture of the model is important, it plays a secondary role to the quality of the data being used.

To streamline your efforts in pattern data collection, I have developed a MetaTrader5 script that automates all the routine tasks, allowing you to focus solely on marking patterns on a chart. Once you install DT-Box-Inference, you will find the script named ‘CollectPatternData’ available in your MT5 terminal under Expert Advisors -> pccom -> DL. To gain a better understanding of the process, I suggest you follow the described steps while reading.

The Pattern

Now, let’s begin with the pattern we will use throughout this series of articles. It can be described as a relatively non-volatile period followed by a sudden price rise, which is in turn followed by a price drop. The idea is that if we manage to identify the first two parts (the non-volatile period and the subsequent price rise), we can then enter a short position. I used the same pattern in my articles on developing profitable strategies using the DT-Box tool.

Collecting and preparing data is often the most time-consuming process in the neural network training workflow. Initially, you need to identify as many pattern occurrences on a chart as possible and then save them as arrays of bars. By design, neural networks process inputs through input layers with a fixed number of neurons. Since the number of bars (data points) in each pattern can vary, you need to pad or trim your sequences to ensure uniform length. Subsequently, you must apply normalization within each window to make the model scale-invariant. The next step involves dividing the collected datasets into training sets (data used to adjust neural network parameters during learning) and test sets (data used to validate the trained network on samples it has not seen). Oh, and remember that the errors in the process and bugs are just part of the fun!
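To make the normalization step more concrete, here is a minimal Python sketch of per-window scaling. It assumes min-max scaling of close prices to the range [0, 1]; this is an illustrative choice, not necessarily the method the CollectPatternData script uses, and the function name and price values are made up.

```python
def normalize_window(closes):
    """Min-max scale one window of prices to the range [0, 1].

    Assumption: min-max scaling is used here only for illustration.
    """
    lo, hi = min(closes), max(closes)
    if hi == lo:
        # A perfectly flat window has no range; map it to all zeros.
        return [0.0 for _ in closes]
    return [(c - lo) / (hi - lo) for c in closes]


# The same shape at two very different price levels (made-up numbers).
cheap = [1.0, 1.1, 1.5, 1.2]
expensive = [1000.0, 1100.0, 1500.0, 1200.0]

norm_cheap = normalize_window(cheap)
norm_expensive = normalize_window(expensive)

# After normalization both windows describe (almost exactly) the same curve,
# which is what lets a network recognize a shape regardless of price level.
for a, b in zip(norm_cheap, norm_expensive):
    print(round(a, 6), round(b, 6))
```

Because each window is scaled by its own minimum and maximum, the absolute price level disappears and only the shape of the move remains.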

Mark Down Patterns on a Chart

To simplify the process and allow you to focus on identifying trading patterns rather than getting bogged down in the routines of data engineering, I have developed an MT5 script that automates all aspects of training and test data preparation. All you need to do is mark your patterns on the chart using rectangle shapes. The script will then handle the collection, padding or trimming, normalization, and random division into training and test sets, before saving the datasets into files. Simply locate the desired pattern on the chart and overlay it with a red rectangle:

The script will scan the chart, identify rectangles, and save all the bars from the left to the right border of each rectangle.

We will follow the supervised learning technique, where the model is trained using labeled data. This means that for each input (an array of bars), we provide the corresponding desired output (label). Specifically, all the arrays of bars that represent our pattern will be labeled with ‘1’ (indicating that this dataset includes our pattern). However, it is also necessary to include arrays of bars that do not represent our pattern so that the neural network can learn to distinguish between our pattern and everything else.

While a red rectangle indicates a pattern and is labeled as ‘1’, a white rectangle represents a non-pattern, with arrays of bars collected from these white rectangles being labeled as ‘0’.
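The labeling convention can be sketched in a few lines of Python. This is only an illustration of the idea, not the script's MQL5 code: the `Rectangle` class, the `collect_samples` function, the color names, and the choice to ignore rectangles of any other color are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Rectangle:
    left: int    # timestamp of the left border
    right: int   # timestamp of the right border
    color: str   # color name, e.g. "red" or "white"


# Hypothetical mapping that mirrors the convention described above.
LABEL_BY_COLOR = {"red": 1, "white": 0}


def collect_samples(bars, rectangles):
    """Return (bars_inside_rectangle, label) pairs.

    bars: list of (timestamp, close) tuples sorted by timestamp.
    """
    samples = []
    for rect in rectangles:
        if rect.color not in LABEL_BY_COLOR:
            continue  # assumption: rectangles of other colors are skipped
        inside = [bar for bar in bars if rect.left <= bar[0] <= rect.right]
        if inside:
            samples.append((inside, LABEL_BY_COLOR[rect.color]))
    return samples


# Ten made-up bars and three rectangles drawn over them.
bars = [(t, 100.0 + t) for t in range(10)]
rectangles = [
    Rectangle(1, 4, "red"),    # a pattern
    Rectangle(6, 8, "white"),  # a non-pattern
    Rectangle(0, 2, "blue"),   # neither color, ignored in this sketch
]
samples = collect_samples(bars, rectangles)
for inside, label in samples:
    print(len(inside), "bars, label", label)
```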

Since we are going to use a simple feedforward densely connected neural network with just one hidden layer for our experiments, I recommend choosing the most visually distinct parts of the chart for labeling non-patterns. A simpler model is more effective at distinguishing patterns that look significantly different, whereas more complex neural networks are needed to differentiate between similar-looking patterns.

To change the color of a rectangle on the chart, first add a rectangle, then double-click on any of its sides to ‘select’ it. Right-clicking on the selected rectangle will open a context menu:

Click on ‘Properties of …’ in the menu and choose the ‘white’ color from the palette:

From now on, all new rectangles you add to the chart will be white.

Using this approach, mark a number of patterns that you want your neural network to detect with red rectangles, and approximately the same number of non-patterns with white rectangles. At this stage, you need to keep your dataset balanced, i.e., class 1 (the pattern) should account for about 50% of your dataset.

If the pattern you are tracking constitutes only, say, 10% of the dataset, you run the risk that the model will learn to predominantly predict the more frequent class, because doing so minimizes the error on such a dataset. In other words, if the model predicts that no bar sequence contains a pattern, it will be right about 90% of the time.
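The arithmetic behind this risk is easy to check. The short Python sketch below computes the accuracy of a trivial 'model' that always predicts the most frequent class; the function name and the sample counts are made up for illustration.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most frequent class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)


# 10 patterns and 90 non-patterns: the 10% case described above.
imbalanced = [1] * 10 + [0] * 90
# 50 patterns and 50 non-patterns: a balanced dataset.
balanced = [1] * 50 + [0] * 50

print(majority_baseline_accuracy(imbalanced))
print(majority_baseline_accuracy(balanced))
```

On the imbalanced set the do-nothing predictor already scores high accuracy without detecting a single pattern, while on the balanced set it is no better than a coin flip, so any accuracy above that level has to come from actually learning the pattern.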

If, for some reason, your dataset is naturally imbalanced, you will need to employ several strategies to effectively train your neural network. Here you can get an overview, and here you can gain some practical insights.
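One commonly used strategy is class weighting, where errors on the rare class count more during training. The Python sketch below computes inverse-frequency weights with the widely used heuristic n_samples / (n_classes * class_count). The function name is hypothetical, and this is just one option among several, not a recommendation specific to this series.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# 10 patterns and 90 non-patterns (made-up counts).
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to the loss. In Keras, a dictionary of this form can be passed to `model.fit` through its `class_weight` argument.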

Running the Script

Now it is time to collect all this data using the script I developed for you. Whether you followed the installation instructions or downloaded the MT5 files from my GitHub, in MetaTrader5 navigate to Expert Advisors -> pccom -> DL -> CollectPatternData and run the script on the chart with the patterns you’ve marked:

In the input parameters, you must specify colors for label ‘1’ (Pattern1Color parameter) and label ‘0’ (Pattern0Color parameter), and select a folder where your training and test data will be saved:

Once you click ‘OK’, the script will begin collecting the data, logging some information about the patterns in the process:

In this example, the script has collected 171 pattern datasets. I encourage you to mark at least 100 patterns of each class on a chart.

Since we need all patterns to contain the same number of bars, the script selects the pattern with the smallest size and trims all other patterns from the left side. For instance, in the picture above, you can see that the data collection script has defined the pattern size as 42 bars. This also means that our neural network’s input layer will consist of 42 neurons.
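The trimming rule can be sketched in Python as follows. This only illustrates the idea described above and is not the script's MQL5 code; the function name and the toy data are made up.

```python
def trim_to_shortest(patterns):
    """Trim every pattern from the left so all match the shortest one.

    Returns the trimmed patterns and the common size.
    """
    size = min(len(p) for p in patterns)
    # Keep the right-most `size` bars, i.e. drop the oldest bars on the left.
    trimmed = [p[len(p) - size:] for p in patterns]
    return trimmed, size


# Three toy patterns of different lengths; the numbers stand in for bars.
patterns = [
    [1, 2, 3, 4, 5],
    [10, 20, 30],
    [7, 8, 9, 10],
]
trimmed, size = trim_to_shortest(patterns)
print(size)
print(trimmed)
```

Trimming from the left keeps the most recent bars of each pattern, so the right edges of all marked rectangles stay aligned.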

After executing the script, in the destination folder, you will find two CSV files named according to the date and time at which the data was collected:

One file contains data for neural network training, while the other is for validation. Each line in the files represents a pattern, in our case consisting of 42 entries of [timestamp, OHLC], followed by the pattern class (0 or 1). The test data includes 20% of the samples from pattern class 1 and 20% from pattern class 0, ensuring that the frequency of positive pattern occurrence is balanced between the training and test data.
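A per-class 20% split like this is often called a stratified split. Here is a minimal Python sketch of the idea. The function name, the fixed random seed, and the rounding rule are assumptions made for this example; the script's exact procedure may differ.

```python
import random


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (features, label) samples so each class keeps the same test share."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    by_class = {}
    for sample in samples:
        by_class.setdefault(sample[1], []).append(sample)

    train, test = [], []
    for group in by_class.values():
        group = group[:]  # copy so the caller's data is left untouched
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Shuffle again so the classes are mixed within each set.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# 50 made-up samples of each class; the features are just placeholders.
samples = [([i], 1) for i in range(50)] + [([i], 0) for i in range(50, 100)]
train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting each class separately guarantees that the share of class 1 is the same in both sets, which a plain random split over the whole dataset only achieves approximately.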

Step One Complete

We’ve found and marked our patterns, added non-pattern data, unified the length of bar sequences in each pattern, and divided the collected data into training and validation datasets. And all of this was accomplished without writing a single line of code or manually manipulating the data.

Now, it’s time to move to the next step—designing and training your neural network using TensorFlow/Keras.

If you like the article, I will be glad to hear your feedback – pavel@pavelchigirev.com