Bitcoin Price Prediction: 2024 Predictions with R and Machine Learning

Hello everyone, today we're going to talk about how to analyze Bitcoin price forecasts with R and machine learning. The price of Bitcoin is affected by many different factors in the market.

(Bitcoin price outlook: illustration of Bitcoin rising)

A common way to analyze time series data like this is to use an ARIMA model. Below, we'll walk through the steps from data preparation to visualization to model evaluation.

Data preparation and preprocessing

The first step is to get and clean the data you need for your analysis. We use Yahoo Finance to get daily price data for Bitcoin and convert it to time series data.

# Install and load the required packages
install.packages(c("forecast", "ggplot2", "quantmod"))
library(forecast) # Load the ARIMA modeling package
library(ggplot2) # Package for data visualization
library(quantmod) # Package for financial data collection

# Get Bitcoin data
getSymbols("BTC-USD", src = "yahoo") # Collect data from Yahoo Finance
btc_price <- Cl(get("BTC-USD")) # Extract only closing price data

Code commentary:

  1. install.packages(c("forecast", "ggplot2", "quantmod")): Install the required packages. forecast provides ARIMA modeling, ggplot2 is used for visualization, and quantmod is used to collect financial data.
  2. library(forecast): Load the forecast package.
  3. getSymbols("BTC-USD", src = "yahoo"): Get bitcoin data from Yahoo Finance.
  4. Cl(get("BTC-USD")): Extract only the Closing Price from the imported data and store it in the btc_price variable.
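Downloaded price series can contain missing values (NA) for some days, and the snippet above does not remove them. As a minimal sketch, assuming a small made-up price table in place of the downloaded data (the raw data frame and its values are hypothetical), this is one way to drop missing rows and confirm the dates are in order before modeling:

```r
# Hypothetical stand-in for downloaded daily closing prices (values are made up)
raw <- data.frame(
  date  = as.Date("2024-01-01") + 0:5,
  close = c(42000, NA, 43100, 42850, NA, 44020)
)

# Drop rows with a missing closing price
clean <- raw[!is.na(raw$close), ]

# Make sure observations are sorted by date
clean <- clean[order(clean$date), ]

# Report how many rows were removed
n_removed <- nrow(raw) - nrow(clean)
cat("Removed", n_removed, "rows with missing prices\n")
```

With the quantmod object from this post, na.omit(btc_price) serves the same purpose.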

Implementing the ARIMA Model

Now fit the data into an ARIMA model to forecast the future price of Bitcoin.

# Convert to time series data
btc_ts <- ts(btc_price, frequency = 365) # Convert to daily data

# Apply ARIMA model
model_fit <- auto.arima(btc_ts, seasonal = TRUE) # Automatically select best model

# 30-day price forecast
forecast_result <- forecast(model_fit, h = 30) # Perform a forecast for the next 30 days

Code commentary:

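  1. ts(btc_price, frequency = 365): Converts closing price data to daily time series data. frequency = 365 tells R that one seasonal cycle spans 365 observations, i.e., an annual cycle for daily data.
  2. auto.arima(btc_ts, seasonal = TRUE): Automatically searches for the ARIMA orders that best fit the data according to an information criterion (AICc by default). With seasonal = TRUE, seasonal terms are also considered.
  3. forecast(model_fit, h = 30): Forecasts the price of Bitcoin for the next 30 days using the fitted model stored in model_fit.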
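To see what the fit-then-forecast step does without downloading any data, here is a minimal sketch that uses only base R's arima() and predict(). The simulated series, the MA coefficients, and the fixed order (0,1,2) are assumptions for illustration. The order matches the one reported in the residual analysis later in this post, but nothing here reproduces what auto.arima() does internally.

```r
set.seed(42)  # make the simulated data reproducible

# Simulate 500 steps from an ARIMA(0,1,2) process (illustrative only)
sim <- arima.sim(model = list(order = c(0, 1, 2), ma = c(0.4, 0.2)), n = 500)

# Fit an ARIMA(0,1,2) model with base R; the order is fixed here, not searched
fit <- arima(sim, order = c(0, 1, 2))

# Forecast 30 steps ahead; predict() returns point forecasts and standard errors
pred <- predict(fit, n.ahead = 30)

# Approximate 95% prediction interval from the standard errors
lower <- pred$pred - 1.96 * pred$se
upper <- pred$pred + 1.96 * pred$se

head(cbind(forecast = pred$pred, lower = lower, upper = upper))
```

For an ARIMA(0,1,2) model without drift, the point forecast becomes flat after the second step while the interval keeps widening, which is worth keeping in mind when reading a 30-day forecast line.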

Visualize prediction results

We visualize the forecast results to give you an at-a-glance view of the Bitcoin price outlook. The right end of the graph below shows the forecast, with shaded bands indicating how far the price could plausibly move up or down.

# Visualize results with ggplot2
autoplot(forecast_result) +
  ggtitle("Bitcoin Price Forecast") +
  xlab("Date") +
  ylab("Price (USD)") +
  theme_minimal()

Code commentary:

  1. autoplot(forecast_result): Automatically visualize the forecast result.
  2. ggtitle(): Set the title of the graph.
  3. xlab() / ylab(): Set the labels for the x-axis and y-axis, respectively.
  4. theme_minimal(): Simplify the theme of the graph to make it cleaner.
(Bitcoin price outlook visualization graph)

Evaluate model performance

To evaluate the performance of the ARIMA model for Bitcoin price forecasting, we check the accuracy metrics and analyze the residuals.

# Evaluate accuracy
accuracy(forecast_result) # Check forecast accuracy

# Perform residual analysis
checkresiduals(model_fit) # Verify model fit

Code commentary:

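  1. accuracy(forecast_result): Reports accuracy metrics such as RMSE and MAE. When called without held-out test data, as here, the metrics are training-set measures, i.e., they describe how closely the model's one-step fitted values track the historical data.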
  2. checkresiduals(model_fit): Performs a residual analysis to validate that the model has fit the data well. Useful for verifying that patterns in time series data are well modeled.
(Accuracy evaluation console screen)
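Called on a forecast object alone, accuracy() reports training-set measures. To estimate how a model does on unseen data, one option is to hold out the most recent observations and forecast them. The following is a minimal sketch under stated assumptions: it uses a simulated random-walk series in place of the Bitcoin data, base R's arima() with a fixed order (0,1,2), and a 30-observation holdout. None of these choices come from the analysis in this post.

```r
set.seed(1)

# Simulated random-walk "price" series standing in for real data (illustrative)
price <- ts(20000 + cumsum(rnorm(400, mean = 5, sd = 200)))

h <- 30                                    # size of the holdout window
n <- length(price)
train <- window(price, end = n - h)        # first n - h observations
test  <- window(price, start = n - h + 1)  # last h observations

# Fit on the training part only, then forecast the holdout period
fit  <- arima(train, order = c(0, 1, 2))
pred <- predict(fit, n.ahead = h)$pred

# Out-of-sample error metrics (error = actual - forecast)
err  <- as.numeric(test) - as.numeric(pred)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- mean(abs(err / as.numeric(test))) * 100

round(c(RMSE = rmse, MAE = mae, MAPE = mape), 2)
```

With the forecast package, a forecast made from the training portion can be compared against the held-out values by passing them as the second argument to accuracy().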

Below is a more detailed explanation of the forecast accuracy data for the Bitcoin price prediction model shown above.

Let's interpret the output of accuracy(forecast_result). The function provides various metrics to evaluate the performance of the prediction model. Below is a description of each metric and an interpretation of its value:

1. ME (Mean Error)

  • Value: 0.03430013
  • Meaning: The average of the differences between the actual values and the predicted values. A value close to 0 indicates little systematic bias.
  • Interpretation: The value is close to zero relative to the price level, so there doesn't appear to be much bias in the prediction.

2. RMSE (Root Mean Square Error)

  • Value: 915.6077
  • Meaning: The square root of the average squared error between the predicted and actual values. It measures the size of the error in the same units as the data (USD here), and the lower the value, the more accurate the prediction.
  • Interpretation: The RMSE value is relatively high compared with the MAE, which suggests that some predictions miss the actual value by a wide margin.

3. MAE (Mean Absolute Error)

  • Value: 449.3615
  • Meaning: The mean of the absolute error between the predicted and actual values. MAE is less sensitive to outliers than RMSE.
  • Interpretation: The MAE is 449.36, about half the RMSE. MAE can never exceed RMSE, and a gap this large suggests that a few large errors (outliers) exist, while the typical error is considerably smaller.

4. MPE (Mean Percentage Error)

  • Value: -1.857804
  • Meaning: The average of the relative error between the predicted and actual values, expressed as a percentage. The forecast package computes the error as actual minus predicted, so a negative value indicates that the predicted values were, on average, higher than the actual values in percentage terms.
  • Interpretation: At about -1.86%, the model tends to estimate slightly higher than the actual value in relative terms.

5. MAPE (Mean Absolute Percentage Error)

  • Value: 3.660335
  • Meaning: The absolute value of the relative error between the predicted value and the actual value, averaged as a percentage. As a rough rule of thumb, a value of 10% or less is considered a very good prediction.
  • Interpretation: The MAPE is 3.66%, which shows that the fitted values track the data closely overall.

6. MASE (Mean Absolute Scaled Error)

  • Value: 0.03490917
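  • Meaning: The MAE of the model divided by the MAE of a baseline model (e.g., a simple naive model). A value less than 1 indicates that the model performs better than the baseline model.
  • Interpretation: The MASE is 0.035, which is very low, indicating that the model performs much better than the baseline. Note that because the series was declared with frequency = 365, the baseline used by the forecast package is the seasonal naive forecast (the value from one year earlier), which is a weak benchmark for Bitcoin, so this number likely overstates the advantage.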

7. ACF1 (First-order Autocorrelation of Residuals)

  • Value: 1.077496e-05 (i.e., 0.00001077)
  • Meaning: The autocorrelation coefficient of the residuals at lag 1; a value close to zero indicates that consecutive residuals are uncorrelated, as expected of white noise.
  • Interpretation: The ACF1 value is near zero, indicating no lag-1 autocorrelation in the residuals. This covers only the first lag, so autocorrelation at longer lags still has to be checked separately.
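The definitions above can be checked by hand. The sketch below computes each metric from a short made-up pair of vectors (the numbers are hypothetical), using the error convention error = actual - fitted. The MASE here is scaled by the one-step naive forecast, which is an assumption chosen for simplicity; a series declared with a seasonal frequency would be scaled by a seasonal naive forecast instead.

```r
# Hypothetical actual and fitted values (made up for illustration)
actual      <- c(100, 102, 101, 105, 107, 110)
fitted_vals <- c( 99, 103, 102, 103, 108, 108)

err <- actual - fitted_vals      # errors: 1, -1, -1, 2, -1, 2
pe  <- 100 * err / actual        # percentage errors

me   <- mean(err)                # Mean Error
rmse <- sqrt(mean(err^2))        # Root Mean Square Error
mae  <- mean(abs(err))           # Mean Absolute Error
mpe  <- mean(pe)                 # Mean Percentage Error
mape <- mean(abs(pe))            # Mean Absolute Percentage Error

# MASE: MAE divided by the in-sample MAE of a one-step naive forecast
naive_mae <- mean(abs(diff(actual)))
mase <- mae / naive_mae

# ACF1: lag-1 autocorrelation of the errors
acf1 <- acf(err, lag.max = 1, plot = FALSE)$acf[2]

round(c(ME = me, RMSE = rmse, MAE = mae, MPE = mpe,
        MAPE = mape, MASE = mase, ACF1 = acf1), 4)
```

Because RMSE squares the errors before averaging, the two errors of size 2 pull it above the MAE, which is the same effect discussed for the Bitcoin model above.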

Overall interpretation

  • Pros:
    • The ME and ACF1 values are near zero, indicating that the predictions show little bias and no lag-1 autocorrelation in the residuals.
    • It has a very low MAPE of 3.66%, which means it has a high prediction accuracy.
    • The MASE is far below 1, so the model performs very well against the baseline model.
  • Limitations:
    • The RMSE value is on the high side, indicating a possible impact from outliers.
    • Prediction performance may vary considerably between different periods of the data.

While the overall performance of the Bitcoin price forecast ARIMA model is rated as good, it may be desirable to improve the RMSE by analyzing periods of high volatility more closely or by removing or correcting outliers.

(Model fit validation analysis graph)

Here's a more detailed explanation of the Bitcoin price forecast model goodness-of-fit analysis graph shown above.

The attached figure shows the results of the residual analysis of the ARIMA(0,1,2) model. The interpretation of each panel is as follows:

1. Top panel: Residuals over time

  • While the residuals fluctuate around zero over time, we can see that their variance is increasing. This is heteroskedasticity, which violates the ARIMA model's assumption of constant variance.
  • While it's positive that we don't see a clear pattern, we need to address the increased variance to get a better model fit.

2. Bottom left panel: Autocorrelation function (ACF) of residuals

  • The ACF graph shows the autocorrelation of the residuals.
  • Ideally, all points should be within the blue confidence interval, showing that the residuals are pure white noise.
  • However, in this graph, we see that some of the spikes are beyond the confidence interval, indicating that there is still autocorrelation in the residuals, suggesting that the model may not have fully accounted for the temporal structure of the data.

3. Bottom right panel: Histogram of residuals

  • The histogram shows the distribution of the residuals. There's a sharp peak in the center, but extreme values (outliers) are visible at either end.
  • This indicates that the residuals do not follow a normal distribution, which violates another common assumption (normality of residuals) that the forecast's prediction intervals rely on.
  • These outliers can have a negative impact on model performance.
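One common response to variance that grows with the price level is to model the logarithm of the price instead. The sketch below is illustrative only: it simulates a series whose swings scale with its level (not real Bitcoin data), fits the same fixed ARIMA(0,1,2) order with base R's arima() to both the raw and the log series, and compares the residual spread in the second half against the first half. Box.test() with type = "Ljung-Box" then gives a numeric check for leftover autocorrelation. The simulation parameters, the half-split comparison, and the lag of 10 are all assumptions.

```r
set.seed(7)

# Simulated series with multiplicative noise: swings grow with the level
log_price <- log(1000) + cumsum(rnorm(600, mean = 0.005, sd = 0.02))
price <- ts(exp(log_price))

# Fit the same order to the raw and the log-transformed series
fit_raw <- arima(price, order = c(0, 1, 2))
fit_log <- arima(log(price), order = c(0, 1, 2))

# Ratio of residual standard deviation: second half vs. first half
spread_ratio <- function(res) {
  res  <- as.numeric(res)
  half <- length(res) %/% 2
  # Skip the first residual, which is affected by model initialization
  sd(res[(half + 1):length(res)]) / sd(res[2:half])
}

ratio_raw <- spread_ratio(residuals(fit_raw))  # well above 1 if variance grows
ratio_log <- spread_ratio(residuals(fit_log))  # near 1 if variance is stable

# Ljung-Box test on the log-model residuals (fitdf = 2 for the two MA terms)
lb <- Box.test(residuals(fit_log), lag = 10, type = "Ljung-Box", fitdf = 2)

c(ratio_raw = ratio_raw, ratio_log = ratio_log, p_value = lb$p.value)
```

If the model is fitted on log prices, the forecasts come out on the log scale and need exp() applied to bring them back to USD.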

Implications of ARIMA model results

Interpreting ARIMA model results: The forecast results show the overall trend of the Bitcoin price outlook. If the model fits well and the residuals are stable, the forecast can be considered reliable. As a caveat, external factors (market news, economic changes, etc.) are not reflected in the model, so the forecast should only be used as a guide.

Wrapping up

In this post, we used an ARIMA model to analyze and visualize the Bitcoin price outlook. Data-driven analysis using R and machine learning can be a powerful tool for forecasting future prices, but remember that there are many more factors to consider when making an actual investment.

Glossary

1. Autoregressive Integrated Moving Average (ARIMA)
Definition: a statistical model used to analyze time series data and predict future values.
Components:
AR (Autoregression): Values in past data influence current values.
I (Integrated): Differencing, an operation used to transform data into a stable (stationary) form.
MA (Moving Average): Utilizes past forecast errors to explain current values.
Use case: Often used to forecast chronologically observed data such as stock prices, bitcoin prices, temperature changes, etc.


2. Time Series Data
Definition: data collected in chronological order.
Characteristics: Patterns (trends, seasonality, etc.) in which data varies over time.
Examples: Bitcoin's daily price, monthly sales, annual temperature changes.
Purpose: To analyze the characteristics of data over time and predict the future.

3. Root Mean Square Error (RMSE)
Definition: A metric that measures the difference between a predicted value and an actual value.
How it's calculated: The square root of the mean of the squared prediction errors (actual value - predicted value).
What it means: The smaller the RMSE value, the better the predictive model fits the actual data.
Use cases: Used to evaluate the predictive accuracy of a model.

4. Mean Absolute Error (MAE)
Definition: The average of the absolute differences between the predicted and actual values.
Difference: RMSE squares the error and is therefore sensitive to large errors, whereas MAE treats all errors the same.
What it means: A useful metric for understanding how accurately a prediction model is working.

5. Residual Analysis
Definition: The process of analyzing the differences (residuals) between model predictions and actual values.
Purpose: To see how well the model explains the data, and to check for patterns.
How to analyze:
The mean of the residuals should be zero.
The residuals should be randomly distributed with no specific pattern.
Use cases: You can use residual analysis to evaluate the fit of your model and find areas for improvement.

6. Visualization
Definition: A method of graphically representing data to help people intuitively understand complex information.
Example:
Time series graphs: visualize changes in data over time.
Scatterplot: Shows the relationship between two variables.
Purpose: To easily identify and communicate patterns, trends, and correlations in data.

7. auto.arima()
Definition: A function in R's forecast package that automatically selects the orders (p, d, q) of an ARIMA model.
Benefits: Selects a well-fitting model automatically, without you having to set the orders yourself.
Use case: When you want to analyze time series data quickly and easily.

8. forecast()
Definition: R function that predicts future values based on an ARIMA model.
Usage: Generate n future data by specifying the forecast period (h = n).
Result: Returns a result with the predicted values and prediction intervals.
Applications: Bitcoin price, stock price, demand forecasting, etc.

9. autoplot()
Definition: A function that makes it easy to create visualizations in R.
Features: Automatic representation of data in the form of a graph.
Example: autoplot(forecast_result) displays forecast results as a line graph.
Pros: Intuitive and simple coding to implement visualizations.

10. Seasonality
Definition: The presence of periodically recurring patterns in data.
Example: Bitcoin price may repeat a pattern during certain periods (e.g., weekends, certain months).
Analytic purpose: To incorporate these patterns into models to improve forecast accuracy.
