The traditional interpretation of the RSI is that values of 70 or above indicate that a security is becoming overvalued or overbought and may be due for a trend reversal or price correction. An RSI value of 30 or below indicates an undervalued or oversold scenario.
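The RSI rule above can be sketched as follows. This is a minimal, non-smoothed variant (Wilder's original formulation applies exponential smoothing to the average gains and losses); the 14-period window is the conventional default, not something specified in this article.

```python
import numpy as np

def rsi(prices, period=14):
    """Relative Strength Index over the last `period` bars.

    Non-smoothed sketch: simple averages of gains and losses,
    RS = avg_gain / avg_loss, RSI = 100 - 100 / (1 + RS).
    """
    prices = np.asarray(prices, dtype=float)
    deltas = np.diff(prices)
    gains = np.where(deltas > 0, deltas, 0.0)
    losses = np.where(deltas < 0, -deltas, 0.0)
    avg_gain = gains[-period:].mean()
    avg_loss = losses[-period:].mean()
    if avg_loss == 0:            # only gains: fully overbought
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A series that only rises pins the RSI at 100 (overbought), while one that only falls pins it at 0 (oversold), matching the 70/30 interpretation above.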
Another method is to run RFE for each individual stock and select the most effective features by voting. Financial ratios of a listed company are used to represent growth ability, earning ability, solvency, etc. Each financial ratio consists of a set of technical indices; each time we add a technical index (or feature), another column is added to the data matrix, which lowers training efficiency and introduces redundancy. Including non-relevant or weakly relevant features in the training data also decreases the precision of classification. Some of the features output by RFE are percentages, while others are very large numbers, i.e., the RFE outputs are in different units. Thus, a feature pre-processing step is necessary before feeding the data into the PCA algorithm.
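The per-stock RFE voting idea can be sketched as below. This is an illustrative minimal RFE (recursively drop the feature with the smallest absolute least-squares weight), not the article's actual implementation; `stock_data`, `rfe_indices`, and `vote_features` are hypothetical names, and a real setup would likely use scikit-learn's `RFE`.

```python
from collections import Counter
import numpy as np

def rfe_indices(X, y, n_keep):
    """Minimal RFE: fit a least-squares linear model, drop the feature
    with the smallest absolute weight, repeat until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        remaining.pop(int(np.argmin(np.abs(w))))
    return remaining

def vote_features(stock_data, n_keep=10):
    """Run RFE per stock, then keep the features selected most often."""
    votes = Counter()
    for ticker, (X, y) in stock_data.items():
        votes.update(rfe_indices(X, y, n_keep))
    return [idx for idx, _ in votes.most_common(n_keep)]
```

Features that survive RFE across many stocks accumulate votes, so the final set reflects what is effective market-wide rather than for one ticker.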
Using Genetic Algorithms to Build Stock Trading Strategies
If normalization is performed before PCA, both the true positive rate and the true negative rate decrease by approximately 10%. This test also proved that the best feature pre-processing method for our feature set is max–min scaling. In the first step, we select all 29 effective features and train the NN model without performing PCA. This creates a baseline of accuracy and training time for comparison.
The picture below shows all the times that the model bought the share, along with the two SMA lines. As I said earlier, I will only be using the most basic framework for this program; if you want to try a more complex network, I have also constructed a few others. The two SMAs are then stacked with np.hstack so that the values sit side by side, column-wise.
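The SMA computation and the np.hstack step can be sketched like this; the window lengths (10 and 30) and the dummy price series are illustrative, not the article's settings.

```python
import numpy as np

def sma(prices, window):
    """Simple moving average via a cumulative-sum trick."""
    prices = np.asarray(prices, dtype=float)
    csum = np.cumsum(np.insert(prices, 0, 0.0))
    return (csum[window:] - csum[:-window]) / window

prices = np.linspace(100, 120, 60)           # dummy rising price series
fast, slow = sma(prices, 10), sma(prices, 30)
n = min(len(fast), len(slow))                # align lengths at the tail
features = np.hstack([fast[-n:, None], slow[-n:, None]])  # shape (n, 2)
```

Stacking the two SMAs as columns gives one row per day, which is the layout a classifier (or the crossover rule mentioned later) expects.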
Some data, such as the percentage fluctuation of a certain index, has been proven to be effective on stock performance. We believe that extracting new features from data and combining them with existing common technical indices will significantly benefit the existing and well-tested prediction models. One of the main weaknesses found in the related works is the limited data pre-processing mechanisms built and used. When selecting features, they list all the features mentioned in previous works, run them through a feature selection algorithm, and then select the best-voted features. Recognizing these behaviors often requires a pre-processing procedure over standard technical indices together with investment experience. The inspiration for the machine learning portion of the research stems from the paper
“Stock Price Prediction Using Neural Network with Hybridized Market Indicators” by Ayodele et al., published in the Journal of Computing.
We denote the feature selection dataset and the model testing dataset as DS_test_f and DS_test_m, respectively. The last part of our hybrid feature engineering algorithm serves optimization purposes. To reduce the scale of the training data matrix, we apply randomized principal component analysis (PCA) before deciding the features of the classification model. PCA reduces the dimensionality of the input data, and data pre-processing is mandatory before feeding the data into the LSTM layer.
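The randomized PCA step can be illustrated with a plain-NumPy sketch of the random-projection idea (project onto a small random subspace, orthonormalize, then take an exact SVD of the much smaller matrix). A production setup would more likely use scikit-learn's `PCA(svd_solver="randomized")`; the 54-to-30 shapes mirror the feature counts mentioned elsewhere in the article but are otherwise just example dimensions.

```python
import numpy as np

def randomized_pca(X, n_components, n_oversample=10, seed=0):
    """Reduce X to n_components columns via randomized projection + SVD."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                    # center the data
    # Random projection onto a slightly oversampled subspace
    P = rng.normal(size=(Xc.shape[1], n_components + n_oversample))
    Q, _ = np.linalg.qr(Xc @ P)                # orthonormal basis of the range
    # Exact SVD of the small projected matrix recovers principal directions
    _, _, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T                   # reduced representation

X = np.random.default_rng(1).normal(size=(200, 54))
X_reduced = randomized_pca(X, n_components=30)   # 54 features -> 30
```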
This might not appear very promising; however, the original positive rate was 24.7% and the negative rate was 22.5%. This means that if we randomly declared an observation a positive label, our probability of being right would be 24.7%. The normalize method preserves the relative frequencies of the terms and transforms the technical indices into the range [0, 1]. Polarize is a well-known method often used by real-world investors, who sometimes prefer to consider only whether a technical index value is above or below zero; we process some of the features with the polarize method in preparation for RFE. Max–min (or min–max) scaling is a transformation often used as an alternative to zero-mean, unit-variance scaling. Another well-known method is fluctuation percentage, where we transform each technical index's fluctuation percentage into the range [−1, 1].
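The four pre-processing transforms named above can be sketched as follows; the function names are illustrative, and the sum-based `normalize` assumes non-negative inputs (matching the "preserves relative frequencies" description).

```python
import numpy as np

def normalize(x):
    """Scale so entries sum to 1, preserving relative frequencies."""
    x = np.asarray(x, dtype=float)
    return x / np.abs(x).sum()

def polarize(x):
    """Keep only the sign: +1 if above zero, -1 otherwise."""
    return np.where(np.asarray(x, dtype=float) > 0, 1.0, -1.0)

def max_min(x):
    """Max-min scaling into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def fluctuation_pct(x):
    """Period-over-period change as a fraction, clipped to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.diff(x) / x[:-1], -1.0, 1.0)
```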
Either trading data for the past 300 days or a list of tickers can be provided; in the latter case, the function first fetches the historical data and then runs the models on it to produce the predictions. The table above shows the cross-validated performance of all the models, with the top four models being very close. This means that when the model predicts a certain outcome, it is right 49.2% of the time on average.
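The dual-input entry point described above might look like the sketch below. The names `predict`, `fetch_history`, and the model interface are hypothetical, stand-ins for whatever the article's actual code uses.

```python
def predict(model, data=None, tickers=None, fetch_history=None):
    """Predict from prepared data, or fetch ~300 days of history per ticker first."""
    if data is None:
        if tickers is None:
            raise ValueError("provide either `data` or `tickers`")
        # Fetch historical trading data for each ticker, then predict on it
        data = {t: fetch_history(t, days=300) for t in tickers}
    return {t: model.predict(rows) for t, rows in data.items()}
```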
Both the true positive precision and the true negative precision improved, by 7% and 10% respectively, which shows that our feature extension design is reasonably effective. In the implementation, we expanded 20 features into 54 features and retained the 30 most effective ones. The dataset was divided into two subsets, i.e., training and testing datasets. The test procedure had two parts: one testing dataset for feature selection and another for model testing.
Forecasting the stock market using LSTM: will it rise tomorrow?
Machine learning performs better in up markets because it uses momentum to its advantage, calculating the optimal weights to trade on in the market paired with the future direction. On the other hand, technical analysis performs much better at spotting potential drawdowns; especially when using so many different trading strategies, it is apparent that some work better than others in down markets. For future research, we would recommend examining similar methods over a longer time period, because the down market only had 48 observations, which might have decreased the usability of the results.
To get started, you’ll need a basic understanding of how machine learning works and what types of data it needs to make accurate predictions. Once you have this knowledge, you can begin collecting relevant data points, such as stock prices, volume levels, or news headlines, which will serve as inputs for your machine learning models. We would like to know how the feature selection method benefits the performance of prediction models. From the abundance of previous works, we can conclude that stock price data is embedded with a high level of noise and that there are correlations between features, which makes price prediction notoriously difficult. That is also the primary reason most of the previous works introduced feature engineering as an optimization module. In the related works, a thorough statistical analysis is often performed on a particular dataset to derive new features, rather than performing feature selection.
They also applied optimization of feature discretization, a technique similar to dimensionality reduction. The strength of their work is that they introduced GA to optimize the ANN. However, there are limitations: first, the number of input features and processing elements in the hidden layer is fixed at 12 and not adjustable. Another limitation lies in the learning process of the ANN, where the authors focused on only two factors in optimization. Still, they believed that GA has great potential for feature discretization optimization. Qiu and Song in  also presented a solution to predict the direction of the Japanese stock market based on an optimized artificial neural network model.
Comparative analysis between the fundamental and technical analysis of stocks
Though we did not see novelty in this work, we can still conclude that the genetic programming (GP) algorithm is accepted in the stock market research domain. To reinforce the validation strengths, it would be good to consider adding GP models into the evaluation when the model predicts a specific price. Jeon et al. in  performed research on a millisecond-interval big dataset, using pattern graph tracking to complete stock price prediction tasks. The dataset they used is millisecond-interval historical stock data from KOSCOM, from August 2014 to October 2014, of 10–15 GB capacity.
For model evaluation, we use the top model, i.e., the CatBoost classifier. The above is a simplistic back-test assuming no transaction costs and perfect execution of trades. Python offers a very convenient way of saving fitted models using the pickle package. The idea is to fit the model on the data and save the fitted model into a pickle file for each cluster. Hence, when predicting for a particular company, we use the model in the corresponding cluster’s pickle file to make our prediction. A simpler alternative would be to assume that all companies follow the same ML model and create one global model to predict returns for all companies.
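The per-cluster pickle workflow can be sketched as below. `MeanModel` is a toy stand-in for a fitted classifier such as CatBoost, and the file-naming scheme is an assumption for illustration.

```python
import os
import pickle
import tempfile

class MeanModel:
    """Toy stand-in for a fitted model (real code would pickle e.g. CatBoost)."""
    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self):
        return self.mean_

models_dir = tempfile.mkdtemp()
clusters = {0: [1.0, 2.0, 3.0], 1: [10.0, 20.0]}

# Fit and persist one model per cluster
for cid, y in clusters.items():
    with open(os.path.join(models_dir, f"cluster_{cid}.pkl"), "wb") as f:
        pickle.dump(MeanModel().fit(y), f)

def predict_for(cid):
    """At prediction time, load the model for the company's cluster."""
    with open(os.path.join(models_dir, f"cluster_{cid}.pkl"), "rb") as f:
        return pickle.load(f).predict()
```

Loading only the relevant cluster's file keeps prediction cheap even when many cluster-specific models exist.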
- We decompose the problem into predicting the trend and then the exact number.
- To avoid the problems of over- and under-fitting, cross-validation is used.
- This is because the classic SMA trading strategy is based on the intersection (crossover) of these two SMA lines.
- AI-driven trading systems can quickly identify patterns in the markets that may indicate potential profit opportunities, allowing you to capitalize on them before they disappear.
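The cross-validation mentioned in the list above can be sketched as a minimal k-fold loop; the `fit`/`score` callables are an assumed interface, not the article's code.

```python
def k_fold_score(X, y, fit, score, k=5):
    """Average validation score over k contiguous folds."""
    n = len(X)
    fold = n // k
    scores = []
    for i in range(k):
        lo = i * fold
        hi = (i + 1) * fold if i < k - 1 else n
        X_val, y_val = X[lo:hi], y[lo:hi]          # held-out fold
        X_tr = X[:lo] + X[hi:]                     # remaining data
        y_tr = y[:lo] + y[hi:]
        model = fit(X_tr, y_tr)
        scores.append(score(model, X_val, y_val))
    return sum(scores) / k
```

Because every observation is held out exactly once, the averaged score estimates out-of-sample performance and flags over- or under-fitting. (For time series, contiguous folds should really be replaced by a walk-forward split to avoid look-ahead bias.)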
Kara et al. in  also exploited ANN and SVM to predict the movement of a stock price index. The data set they used covers the period from January 2, 1997, to December 31, 2007, of the Istanbul Stock Exchange. The primary strength of this work is its detailed record of the parameter adjustment procedure: they explained how ANN and SVM work with stock market features and documented the tuning. Its weaknesses are that neither the technical indicators nor the model structure is novel, and the authors did not explain how their model performed better than the models in previous works.
Fischer and Krauss in  applied long short-term memory (LSTM) networks to financial market prediction. The dataset they used comprises S&P 500 index constituents from Thomson Reuters: they obtained all month-end constituent lists for the S&P 500 from Dec 1989 to Sep 2015, then consolidated the lists into a binary matrix to eliminate survivor bias.
Some critics of the EMH point to the psychological biases that investors exhibit under uncertainty, leading to irrational and unpredictable behavior. Nowadays there is no consensus about the EMH, and the debate is still ongoing. Let us assume that we are currently on 31st December 2018 and have created the model files. At the end of 2nd January, we have values for all the indicators, with which we can predict each stock’s movement. Hence, we put these values into our models and get, for each stock, the probability of a 1 (up movement) within the next 7 trading days.
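Scoring each stock by its probability of an up move could look like the sketch below. The function name and the `models`/`indicator_rows` structures are hypothetical; the model interface is assumed to expose `predict_proba` in the scikit-learn style, returning class probabilities with class 1 meaning "up within 7 trading days".

```python
def rank_by_up_probability(models, indicator_rows):
    """indicator_rows maps ticker -> (cluster_id, feature_row).
    Returns (ticker, P(up)) pairs sorted most-likely-up first."""
    probs = {}
    for ticker, (cid, row) in indicator_rows.items():
        # Probability of class 1: price up within the next 7 trading days
        probs[ticker] = models[cid].predict_proba([row])[0][1]
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
```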