Table of Contents

Part I - Introduction to Stock Market Prediction
1 Task 1 - Research Overview
Part II - Stock Price Forecasting and Model Evaluation
2 Task 2 - Statistical Analysis and Data Visualization
3 Task 3 - Stock Market Prediction Using Machine Learning

Part I - Introduction to Stock Market Prediction

1 Task 1 - Research Overview

Introduction

The main goal of my “Master of Research” thesis project is to develop a machine learning-based system for forecasting stock prices on the financial market. Making informed investment decisions requires predicting stock prices effectively in the complex and dynamic environment of the stock market. Although many conventional statistical approaches have been used to forecast stock prices, they frequently fail to capture the non-linear relationships between share prices and the variables that affect them. More advanced machine learning models are therefore needed to predict stock prices accurately. The software platform “R-Studio” is used to fulfil all of the project's objectives.

Aim

The aim of this research is to demonstrate the use of machine learning tools and data science to predict the direction of stock prices and the behaviour of the market.

Objectives

To determine the main variables that most affect share prices in the stock market
To construct machine learning models that predict stock prices on the financial market with a certain level of accuracy
To determine and implement the most useful machine learning models for forecasting stock prices on the share market
To determine ways to enhance the dependability and performance of the machine learning algorithm

Research Questions

What are the main variables that have the biggest effect on the price of stocks in the share market?
Can machine learning systems anticipate stock prices on the financial market with any degree of accuracy?
What machine learning methods are most useful for forecasting stock prices on the share market?
How might the machine learning-based algorithm's performance and dependability be improved?

A mixed-methods strategy is used to perform the study, combining “qualitative” and “quantitative” techniques. The “quantitative” approach employs machine learning techniques to create the stock price prediction model. The study's dataset is drawn from a financial sector database and includes data on share prices, financial metrics, news, and social media sentiment (Brückbauer, 2020). The “qualitative” approach involves interviewing financial professionals and investors to learn about the best machine learning strategies for predicting stock values.

Expected Outcome

The anticipated outcomes of this project are the identification of the most significant variables that influence share prices in the financial market and the development of a machine learning-based system that can predict stock prices effectively. The study's findings will also shed light on the best machine learning methods for forecasting stock values and on ways to improve the algorithm's efficiency and dependability (Carnes, 2020). Overall, this study has the potential to advance the field of stock market forecasting and to offer useful information to both researchers and practitioners of finance.

2. Review of ML Models and Data Science applications

In recent years, interest in applying machine learning and data science approaches to finance research has increased significantly. Numerous studies have used these techniques to examine risk management, stock market patterns, and price prediction.

One example is a study that uses deep learning methods to forecast stock prices. To predict the future price of Apple Inc.'s shares, the authors used an “LSTM” neural network (Fodor, 2020). The study found that the “LSTM” model beat conventional linear models, which made clear the potential of deep learning in finance research.

Another study investigated the connection between share prices and investor sentiment. Here, an “SVM” model is used to classify news articles as either negative or positive in sentiment.

A further study applied machine learning models to analyse the effect of stock liquidity on returns. The “Random Forest” model is used to investigate the link between “liquidity” and “returns” for stocks listed on the Chinese stock market. The study found that liquidity has a considerable impact on stock returns. Furthermore, the “RF” model outperformed conventional regression models in predicting returns.

Lastly, a study examining the effect of news sentiment on share market volatility is taken into account. The “BSTS” model is used to analyse the relationship between the share market's volatility and news sentiment. The “BSTS” model produced better predictions than conventional “time series” models.

All of the above studies demonstrate the capacity of machine learning models and data science in finance research (Inckle, 2020). They provide precise predictions along with insights into the behaviour of the financial market.

3. Prediction of Stock Price direction and movement

The first example is research that uses “SVM” to predict the direction of the stock market. The study aims to predict the price direction of the “KOSPI” using this model. Technical analysis indicators serve as the input features: stochastic oscillators, the relative strength index, and moving averages. The empirical results suggest that the model outperforms conventional statistical models, achieving an accuracy of 56%.

The second example uses “Deep Learning” with “LSTM” networks to predict the behaviour of financial markets. A “Deep Learning” model is used to predict the direction of the “Nikkei 225” index. The model is trained on both technical indicators and fundamental data, such as dividend yield and earnings (Kaufmann, 2020). The empirical results indicate that the model outperforms conventional statistical models, with a prediction accuracy of more than 58%.

4. Life Cycle of Data Science Project

Six major stages make up the “life cycle” of a data science project: business understanding, data understanding, data gathering, modelling, evaluation, and deployment. In the financial services industry, this “life cycle” can be applied to a number of functions, including risk management, fraud detection, and investment assessment.

The study taken as an example here used the data science project life cycle to create a model for evaluating credit risk (Niloy, 2022). The problem, the objective of the project, and its scope are all defined at the “business understanding” stage. During the data understanding stage, the investigators obtained information from various sources, including consumer reporting agencies and financial institutions. They subsequently employed data visualization and data exploration techniques to find associations and trends within the data.

Part II - Stock Price Forecasting and Model Evaluation

2 Task 2 - Statistical Analysis and Data Visualization

Figure 1: Calculating the descriptive Statistics for closing prices

The figure above shows the code that calculates the “descriptive statistics” for the closing prices. Other measures, such as skewness and kurtosis, are also determined (Oluwagbemiga, 2021). The program computes and outputs descriptive statistics for both the closing prices and the logarithmic returns. The statistics include summary statistics, skewness, kurtosis, and normality test p-values. The R functions summary, skewness, kurtosis, and shapiro.test are used in the code.

Figure 2: Output of the descriptive Statistics for closing prices

The descriptive statistics for the closing prices have been computed in “R-Studio”. The summary values obtained include the minimum, median, and maximum. The values of “skewness” and “kurtosis” are “0.039” and “2.191” respectively, and the normality test returns a value of “0.00285”. The descriptive statistics for the logarithmic returns have also been computed (Schoenfeld, 2020). The same measures are reported as for the closing prices: the skewness is “1.91”, the kurtosis is “17.64”, and the normality test value is “4.213223e-13”.
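
The kind of program described in this task can be sketched as follows. This is only an illustration under stated assumptions: the closing prices are synthetic, and `skewness`, `kurtosis`, and `describe` are hand-written helpers (moment-based, with non-excess kurtosis) that stand in for whichever package functions the original code used. Only `summary`, `diff`, `log`, and `shapiro.test` are base R functions.

```r
# Synthetic closing prices stand in for the real data (assumption).
set.seed(42)
close_prices <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Log returns: first difference of the log prices.
log_returns <- diff(log(close_prices))

# Moment-based skewness (hypothetical helper, not a package function).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based, non-excess kurtosis (a normal distribution gives about 3).
kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Collect the statistics named in the text for one numeric vector.
describe <- function(x) {
  list(
    summary   = summary(x),
    skewness  = skewness(x),
    kurtosis  = kurtosis(x),
    shapiro_p = shapiro.test(x)$p.value  # Shapiro-Wilk normality test p-value
  )
}

price_stats  <- describe(close_prices)
return_stats <- describe(log_returns)
print(price_stats)
print(return_stats)
```

A small Shapiro-Wilk p-value, such as the ones reported above, suggests that the series departs from a normal distribution.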

3 Task 3 - Stock Market Prediction Using Machine Learning

(a)

Figure 3: Input Code and the respective output for the 10 days average

This code calculates the 10-day simple moving average of the closing prices for the SHL AX data and saves the results in the "sma" variable. The SMA function takes two arguments: the price series and the number of periods to average over. The print() function is used to print the resulting values.
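
As a rough illustration of what a 10-day simple moving average computes, the sketch below uses only base R. The price vector is made up, and `simple_moving_average` is a hypothetical helper written for this example; it is not the SMA function used in the original code, although a trailing average of this kind is what such a function is generally expected to return.

```r
# Made-up closing prices (assumption, not the SHL AX data).
close_prices <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)

# Hypothetical helper: trailing simple moving average over n observations.
simple_moving_average <- function(x, n = 10) {
  # One-sided filter: each value averages the current and previous n - 1 points.
  # The first n - 1 results are NA because the window is not yet full.
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

sma <- simple_moving_average(close_prices, n = 10)
print(sma)
```

The first nine values are NA because fewer than ten prices are available at those positions.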

(b)

Figure 4: Input Code of Log Return

The figure above shows the input code for the log returns, which are computed using the diff function; the print function is then used to display them.

Figure 5: Output of Log returns

The figure above shows the printed “log returns” (Sousa et al., 2020). These values were obtained using “R-Studio”.
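
A minimal sketch of a log-return calculation with the diff function is given below. The prices are made-up values, and the variable names are chosen for this example rather than taken from the original code.

```r
# Made-up closing prices (assumption).
close_prices <- c(100, 102, 101, 105)

# Log return for day t is log(P_t) - log(P_{t-1}),
# i.e. the first difference of the log prices.
log_returns <- diff(log(close_prices))
print(log_returns)
```

One convenient property of log returns is that they add up over time: their sum equals the log of the last price divided by the first.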

(c)

Figure 6: Input and Output of Simple returns

This code calculates a stock's simple returns from its daily closing prices. It computes the differences between successive closing prices using the diff function and divides them by the previous day's closing price (Truman, 2021). The print function is then used to print the values, which are the stock's daily simple returns, to the console.
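
The calculation described above can be sketched in base R as follows. The prices are made-up values and the variable names are illustrative, not those of the original code.

```r
# Made-up closing prices (assumption).
close_prices <- c(100, 102, 101, 105)

# Simple return for day t is (P_t - P_{t-1}) / P_{t-1}.
# head(x, -1) drops the last price, leaving the previous-day prices.
simple_returns <- diff(close_prices) / head(close_prices, -1)
print(simple_returns)
```

Simple and log returns are linked by log(1 + simple return) = log return, so the two series are close to each other when daily price changes are small.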

(d)

Figure 7: Input and Output of EMA by price direction

This code uses the ggplot2 library in R to generate a boxplot. Using the data from the merged_df data frame, the 'Direction' column is mapped to the x-axis and the 'EMA' column to the y-axis. The geom_boxplot() function adds the boxplot layer to the plot. The plot's title, x-axis label, and y-axis label are all set by the labs() function (Ukpai, 2019). The code visualizes the relationship between the exponential moving average and the direction of the stock's price.
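
A boxplot of this kind could be produced roughly as sketched below, assuming the ggplot2 package is installed. The data are synthetic, the data frame (written here as `merged_df`) is rebuilt from scratch, and `ema` is a hypothetical helper that seeds the average with the first price, which is a simplification and may differ from the EMA used in the original code. The "Up"/"Down" labels for Direction are also an assumption.

```r
library(ggplot2)

# Synthetic closing prices (assumption).
set.seed(1)
close_prices <- 100 + cumsum(rnorm(120))

# Hypothetical helper: recursive EMA seeded with the first price.
ema <- function(x, n = 10) {
  alpha <- 2 / (n + 1)  # common smoothing factor for an n-period EMA
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# One row per day from day 2 onwards: EMA value and direction of the price move.
merged_df <- data.frame(
  EMA = ema(close_prices)[-1],
  Direction = factor(ifelse(diff(close_prices) > 0, "Up", "Down"))
)

# Direction on the x-axis, EMA on the y-axis, one box per direction.
p <- ggplot(merged_df, aes(x = Direction, y = EMA)) +
  geom_boxplot() +
  labs(title = "EMA by price direction", x = "Direction", y = "EMA")
print(p)
```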

(e)

Figure 8: Input and Output of Momentum by price direction

This code uses R's ggplot2 library to generate a box plot. The merged_df data frame, which contains the two variables Direction and Momentum, supplies the data for the visualization. The aes() function specifies which variables are displayed on the x and y axes. The box plot itself is produced by the geom_boxplot() function, with one box for each level of the Direction variable. The plot's title and labels are set by the labs() function. The resulting plot displays the distribution of the Momentum variable for each level of the Direction variable.

Figure 9: Histogram Plot on Closing Prices

The histogram is plotted for the closing prices. A histogram visualizes the overall distribution of numerical data (Xiao, 2022). The “closing price” is shown along the “x-axis” and the “frequency” along the “y-axis”.

(a)

Figure 10: Time Series plot Created over time

The figure above shows the “time series” plot created over “time” in “R-Studio”.

Figure 11: Box plot by moving average direction

This figure shows the box plot of two variables: the “simple moving average” and the “price direction” (Zhang, 2023).

Figure 12: Histogram visualization for the high attribute

This figure displays the “Histogram” plot for the “high attribute”. The plot was produced in “R-Studio”.

Figure 13: Scatter Plot visualization for the high attribute

The scatter plot has been produced for the “high attribute”. The “open” and “close” values are shown along the two axes.

Figure 14: Reducing Complexity of the dataset

The figure above shows how the complexity of the dataset has been reduced, together with the resulting data.

References

Brückbauer, F., 2020. Do financial market experts know their theory? New evidence from survey data. New Evidence From Survey Data, pp.20-092.

Carnes, C., 2020. Assessing the Financial Literacy of Texas Christian University Students.

Fodor, K., 2020. Using multivariate statistical methods for analysing financial literacy, as a possible appearance of social innovation. Theory, Methodology, Practice-Review of Business and Management, 16(01), pp.11-18.

Inckle, K., 2020. Poetry in motion: Qualitative analysis, I-poems and disabled cyclists. Methodological Innovations, 13(2), p.2059799120924980.

Kaufmann, M., 2020. Vocations, visions and vitalities of data analysis. An introduction. Information, Communication & Society, 23(14), pp.1981-1995.

Niloy, R.K., 2022. Financial Performance Analysis of Ranks Motors Limited.

Oluwagbemiga, O.E., 2021. The influence of IFRS adoption on the quality of financial reporting in Nigerian listed companies. In Advances in Pacific Basin Business, Economics and Finance (Vol. 9, pp. 137-160). Emerald Publishing Limited.

Schoenfeld, J., 2020. The invisible risk: Pandemics and the financial markets. Tuck School of Business Working Paper, (3567249).

Sousa, K.M.D., Pinhanez, M.D.M.S.F., Monte, P.A.D. and Diniz, J.A., 2020. Salary, financial autonomy and efficiency of healthcare systems in local governments. Applied Economics Letters, 27(2), pp.122-126.

Truman, E., 2021. Central Banks and the Global Financial Safety Net. Building Back a Better Global Financial Safety Net, Boston University: Global Development Policy Center, pp.23-33.

Ukpai, U.I., 2019. A critical realist approach to understanding the human resource management practices-organizational financial performance link: evidence from Nigeria's petroleum sector. Lancaster University (United Kingdom).

Xiao, Z., 2022. The Relation of The Shadow Banking System on The Financial System in China (Doctoral dissertation).

Zhang, Y., 2023. Using Google Trends to track the global interest in International Financial Reporting Standards: Evidence from big data. Intelligent Systems in Accounting, Finance and Management.