Price Prediction Using ARIMA Model of Monthly Closing Price of Bitcoin

Abstract


INTRODUCTION
Bitcoin is a phenomenal digital currency that has shaken economic foundations because it is decentralized and powered by its users, with no central authority or middleman. Bitcoin transactions are recorded and monitored by its users through a cryptographic network technology called blockchain. It was created in 2008 and first used in 2009, when it was launched as open-source software [1]. Today, bitcoin has attracted wide interest, not only from individuals, communities, and companies but also from countries [2]. Like a double-edged sword, bitcoin's price rises and falls with supply and demand as its user base grows, causing instability and uncertainty. Despite its volatility, bitcoin is still in an early phase and is a promising digital currency for the future. Therefore, in this study, we apply a time series analysis method to forecast the future price of bitcoin and to follow its price movement.

METHOD

2.1 Nonstationarity
Nonstationarity is a condition in which time series data do not have a constant mean, a constant variance over time, and a constant autocorrelation structure over time. Stationarity is formally tested with the Dickey-Fuller unit root test [3], which uses the null hypothesis H0: the data have a unit root (the time series is nonstationary) against the alternative hypothesis H1: the data have no unit root (the time series is stationary). H0 is rejected when the p-value is less than the significance level alpha = 0.05, in which case we conclude that the time series is stationary.
Nonstationary data are handled through the differencing process $(1 - B)^d$, where $B$ is the backshift operator and $d$ is the differencing order, while transforming the time series helps to reduce non-constant variance. The most common transformation method is the Box-Cox transformation [4]. The Box-Cox transformation depends on the parameter $\lambda$ and is defined as

$$
w_t =
\begin{cases}
\log(y_t), & \lambda = 0 \\
\dfrac{y_t^{\lambda} - 1}{\lambda}, & \text{otherwise}
\end{cases}
$$

where $y_t$ is the original observation and $w_t$ is the transformed observation. The estimated values of the parameter $\lambda$ are shown in Table 1.
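The transformation and differencing steps described above can be sketched in a few lines. This is a minimal illustration and not the code used in the study; the function names `box_cox` and `difference` and the sample prices are assumptions made for the example.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: log(y) when lam == 0, otherwise (y**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive observations")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def difference(values, d=1):
    """Apply the differencing operator (1 - B)^d by differencing d times."""
    out = list(values)
    for _ in range(d):
        # Each pass replaces the series with y_t - y_{t-1} and shortens it by one.
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Made-up prices for illustration only (not the data set used in the paper).
prices = [400.0, 450.0, 430.0, 520.0]
transformed = box_cox(prices, 0)         # log transform (lambda = 0)
first_diff = difference(transformed, 1)  # one round of differencing
```

Each round of differencing shortens the series by one observation, so a series of length n differenced d times has n - d values left.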

ARIMA Models
Autoregressive Integrated Moving Average (ARIMA) is a combination of the Autoregressive (AR) model, the Moving Average (MA) model, and Integration (I), which denotes the order of the differencing process [3]. The ARIMA model is suitable for nonstationary time series data that have gone through the differencing process [5]. It is generally denoted as ARIMA (p, d, q), where p, d, and q represent the orders of AR, I, and MA, respectively.
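For reference, an ARIMA (p, d, q) model is commonly written in backshift notation as shown below. This is a sketch in one standard textbook convention and is not necessarily the form the paper labels Equation 1; the symbols c (constant term) and ε_t (white-noise error), as well as the sign convention of the MA polynomial, are assumptions.

```latex
\phi(B)\,(1 - B)^d\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,
\qquad
\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q
```

With d = 1 and q = 0, for example, this reduces to an AR(p) model fitted to the first differences y_t − y_{t−1}.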

Model Identification
The general approach to model identification and forecasting for time series data [6] is as follows:
1. Plot the data and identify any unusual observations.

Model       ACF                          PACF
AR(p)       Dies down                    Cuts off after lag p
MA(q)       Cuts off after lag q         Dies down
ARMA(p,q)   Dies down after lag (q-p)    Dies down after lag (p-q)
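Judging whether an ACF "dies down" or "cuts off" starts from the sample autocorrelations. The sketch below computes them and flags lags outside the approximate 95% white-noise bound of ±1.96/√n; the function names and the use of that particular bound are assumptions for illustration, not the procedure used in the study.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the usual biased estimator."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def significant_lags(series, max_lag):
    """Lags (>= 1) whose autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]
```

If the significant lags stop abruptly after some lag q, the ACF is cutting off; if they shrink gradually, it is dying down.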

Parameter Estimation
Maximum Likelihood Estimation is used for parameter estimation because the method uses all the information in the time series data. Given a set of time series observations $y_1, y_2, \ldots, y_n$, the likelihood function $L$ is defined as a function of $\phi$, $\theta$, $\mu$, and $\sigma^2$ given the observations.
Without the assumption $E(y_t) = \mu = 0$, Equation 1 is rewritten to include the mean $\mu$; the likelihood function $L$ and then the log-likelihood function are written from that form.

2.5 Diagnostic Checking

Autocorrelation Test of The Residuals
Ideally, the residuals are independently distributed and exhibit no serial correlation. The Ljung-Box test [3] is a common method for testing the independence of the residuals, with the null hypothesis H0: the residuals are independently distributed, against the alternative hypothesis H1: the residuals are not independently distributed and exhibit serial correlation. The test is expected to fail to reject H0 at the chosen significance level, leading to the conclusion that the residuals are independently distributed.
The test statistic for the Ljung-Box test is as follows:

$$
Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^{2}}{n-k}
$$

where $n$ is the sample size and $\hat{\rho}_k$ is the sample autocorrelation at lag $k$ ($k = 1, \ldots, h$).
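As a rough sketch, the Ljung-Box statistic Q = n(n+2) Σ ρ̂_k² / (n − k), summed over lags k = 1, …, h, can be computed directly from the residuals. The function name `ljung_box_q` is an assumption for illustration; the p-value step (comparing Q with a chi-squared distribution) is omitted here and is normally left to a statistics package.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic over lags 1..h for a list of residuals."""
    n = len(residuals)
    if not 0 < h < n:
        raise ValueError("h must satisfy 0 < h < n")
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total
```

A large Q relative to the chi-squared reference distribution is evidence against the null hypothesis of independently distributed residuals.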

Normality of The Residuals
The residuals also need to be normally distributed. One of many ways to check this is to create a histogram, which counts the number of observations falling within given ranges. To satisfy the normality assumption, the histogram should be centered around zero and show a bell-shaped curve. A high frequency at the extremes of the histogram could indicate that the residuals are not normally distributed.

Model Selection
Given a collection of models, Akaike's Information Criterion (AIC) estimates the quality of each model relative to each of the other models. Thus, AIC provides a means for model selection. AIC estimates the relative amount of information lost by a given model: the less information a model loses, the higher the quality of that model. In estimating the amount of information lost by a model, AIC deals with the trade-off between the goodness of fit of the model and the simplicity of the model.
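The general form of AIC for an ARIMA model is written as [7]:

$$
\mathrm{AIC} = -2\log(L) + 2(p + q + k + 1)
$$

where $L$ is the likelihood of the data, $p + q + k + 1$ is the number of estimated parameters in the model, and $k = 1$ if $c \neq 0$ and $k = 0$ if $c = 0$, with $c$ the constant term of the model.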
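To illustrate how AIC ranks candidate models, the sketch below assumes the ARIMA form AIC = −2 log(L) + 2(p + q + k + 1) and picks the candidate with the lowest value. The function names, the candidate orders, and the log-likelihood values are all hypothetical and are not taken from the paper's Table 3.

```python
def arima_aic(log_likelihood, p, q, has_constant):
    """AIC = -2 log(L) + 2 (p + q + k + 1), with k = 1 if a constant is fitted."""
    k = 1 if has_constant else 0
    return -2.0 * log_likelihood + 2 * (p + q + k + 1)

def best_by_aic(candidates):
    """candidates maps (p, d, q) -> (log_likelihood, has_constant)."""
    scores = {
        order: arima_aic(loglik, order[0], order[2], const)
        for order, (loglik, const) in candidates.items()
    }
    # The model with the smallest AIC loses the least information.
    return min(scores, key=scores.get), scores

# Hypothetical log-likelihoods, for illustration only.
candidates = {
    (0, 1, 1): (-100.0, False),
    (2, 1, 2): (-99.0, False),
}
best_order, scores = best_by_aic(candidates)
```

Here the larger model fits slightly better but pays a penalty for its three extra parameters, so the simpler model is selected.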

Performance Indicators
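The performance of the models is measured using several accuracy measures:

$$
\mathrm{ME} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right), \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|
$$

$$
\mathrm{MPE} = \frac{100}{n}\sum_{t=1}^{n}\frac{y_t - \hat{y}_t}{y_t}, \qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$

where $y_t$ is the observed value at time t, $\hat{y}_t$ is the forecasted value at time t, and n is the number of observations.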
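A minimal sketch of how these indicators can be computed from observed and forecasted values is given below. It assumes the conventional definitions based on the error e_t = y_t − ŷ_t, with MPE and MAPE expressed in percent; the function name `forecast_accuracy` and the sample numbers are hypothetical.

```python
import math

def forecast_accuracy(actual, forecast):
    """Return ME, RMSE, MAE, MPE and MAPE for paired observed/forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]    # e_t = y_t - yhat_t
    pct = [100.0 * e / y for e, y in zip(errors, actual)]  # percentage errors
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
    }

# Made-up values for illustration only.
scores = forecast_accuracy([100.0, 200.0], [110.0, 190.0])
```

In this example the positive and negative errors cancel in ME, which is why the absolute and squared measures are reported alongside it.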

RESULT AND DISCUSSION
In this study, we used monthly bitcoin price data obtained from the website https://coinmarketcap.com, covering 01 January 2014 to 31 May 2022. At the end of October 2021, the bitcoin price closed at a peak of $61,318.96, while the lowest price in the last 2 years was recorded at the end of March 2020 at $6,438.64.

Data Stationarity
Figure 2 clearly shows that the transformed data have an upward trend, meaning the data have a non-constant mean. This is supported by the Augmented Dickey-Fuller (ADF) test conducted on the training data, which gave a p-value of 0.8684. Since the p-value is greater than alpha = 0.05, we fail to reject the null hypothesis and conclude that the training data are not stationary. Differencing was applied once to obtain a time series with a constant mean, as shown in Figure 3. The ADF test was then repeated and gave a p-value of 0.02411. Since this p-value is lower than alpha = 0.05, we can reject the null hypothesis.

ARIMA model identification
Figure 4 shows that neither the ACF nor the PACF plot exhibits a pattern resembling geometric decay or a sharp drop. The Ljung-Box test was then applied to check whether the differenced data are white noise (null hypothesis) or not white noise (alternative hypothesis). The test gave a p-value of 0.008118, which is smaller than alpha = 0.05, so we conclude that the differenced data are not white noise.
To identify the model candidates, "auto.arima()" was used with the arguments stepwise = FALSE, approximation = FALSE, seasonal = FALSE, and allowdrift = FALSE, and the AIC value was used to rank the candidates. The results are shown in Table 3. The white-noise test was carried out on the residuals of each model candidate using "Box.test()". The resulting p-values, shown in Table 4, indicate that the residuals of all model candidates are white noise. Figure 5 displays the residual plots for one of the model candidates, ARIMA (1,1,4). The residual plot shows that the variation of the residuals stays much the same across the historical data, so the residual variance can be treated as constant, while the ACF plot shows no significant correlation in the residual series. The histogram suggests that the residuals may be normal, although slightly heavy on the right side. Consequently, forecasts from the model candidates are expected to be reasonably good.

Forecasting
The top 2 model candidates from Table 3 were used to make predictions for the next 5 months. Figure 6 plots the forecasts against the test data (blue line). The test values rise in the first 3 months and fall in the following 2 months, with the test data in an overall downtrend. Both candidates also show a downtrend, but in different ways: ARIMA (1,1,0) decreases slowly and gradually, while ARIMA (1,1,4) decreases in a zig-zag pattern. For a better evaluation of forecast accuracy, Table 7 reports the results of the forecast accuracy test.

CONCLUSION
Model ARIMA (1,1,0) clearly performs best among the candidate models, with accuracy test results closest to 0. Although ARIMA (1,1,0) shows the same downtrend as the test data, Figure 6 clearly shows how poorly the model follows the price movement. Tracing the problem, the time series has non-constant variance and non-constant mean in the later part of the series compared with the earlier part. Recalling Figure 1, the price movements before 2017 are noticeably less dynamic than those from 2017 onwards, so the data may not be an accurate representation of how Bitcoin currently behaves.
In conclusion, ARIMA (1,1,0) is the best model according to the accuracy tests compared with the other model, but it clearly cannot be used as the main tool for predicting the bitcoin price because of its poor performance in predicting price movement. Further study is needed to address the characteristics of the Bitcoin price time series.

Table 1. Estimations on Box-Cox Transformation

5. Try your chosen model(s) and use the AIC to search for a better model.
6. Check the residuals from your chosen model by plotting the ACF of the residuals and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7. Once the residuals look like white noise, calculate forecasts.
The identification process using the ACF and PACF can follow the general pattern criteria shown in Table 2.

Table 2. General Pattern Criteria ACF and PACF

The performance indicators are Mean Error (ME), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Percentage Error (MPE), and Mean Absolute Percentage Error (MAPE). Ideally, the best accuracy is shown by values closest to zero.

Table 3. Model Candidates with the Lowest AIC Value

Table 7. Result of Forecast Accuracy Test