Predicting ETH/USD Trade Signals with Random Forest

Apr 18, 2023


In this article, we’ll demonstrate how to create a machine learning model for predicting buy and sell signals in the ETH/USD market using a Random Forest algorithm. We’ll walk through the entire process, from importing libraries and data, to preprocessing, feature engineering, model training, hyperparameter tuning, and evaluation. Finally, we’ll test our model on live data and backtest the predictions using vectorbt to gauge its potential profitability.


By following this tutorial, you’ll gain a deeper understanding of the application of machine learning techniques in the field of cryptocurrency trading and learn how to leverage the power of Random Forest for predicting market movements. Whether you’re an experienced trader or a machine learning enthusiast, this article will provide valuable insights and practical guidance to help you build your own trading strategies.


Importing Libraries and Data


First, let’s import the necessary libraries and set up the API connection.


import pandas as pd
import pandas_ta as ta
import numpy as np
import vectorbt as vbt
import matplotlib.pyplot as plt
import ccxt

Initialize the ccxt exchange client and set the symbol and timeframe:


exchange = ccxt.binance()
symbol = 'ETH/USDT'
timeframe = '4h'

Fetching Historical Data


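We’ll request historical OHLCV data from the Binance exchange starting from November 2015. Binance only returns ETH/USDT candles from 2017-08-17 onward, so that is where the data begins.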


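# Fetch 4h OHLCV candles page by page (up to 1000 per request). The focus is on
# the 2018 and 2022 bear markets: the goal is a strategy that protects wealth
# when a bear market arrives.
from_ts = exchange.parse8601('2015-11-01 00:00:00')
ohlcv_list = []
ohlcv = exchange.fetch_ohlcv(symbol, timeframe, since=from_ts, limit=1000)
ohlcv_list.append(ohlcv)  # ohlcv_list[0] is the same list object as ohlcv
while True:
    # Continue from the last candle received. That candle is requested again,
    # so the first row of each new page may repeat the previous page's last row.
    from_ts = ohlcv[-1][0]
    new_ohlcv = exchange.fetch_ohlcv(symbol, timeframe, since=from_ts, limit=1000)
    ohlcv.extend(new_ohlcv)
    if len(new_ohlcv) != 1000:
        break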

ohlcv_list
    [[[1502942400000, 301.13, 307.96, 298.0, 307.96, 1561.95305],
      [1502956800000, 307.95, 312.0, 307.0, 308.95, 1177.71088],
      [1502971200000, 308.95, 310.51, 303.56, 307.06, 1882.05267],
      ...
      [1517284800000, 1137.96, 1178.92, 1130.08, 1174.96, 14730.37261],
      [1517299200000, 1174.05, 1186.85, 1156.0, 1169.89, 12918.61121],
      [1517313600000, 1167.16, 1175.0, 1101.0, 1112.09, 26607.01368],
      [1517328000000, 1112.5, 1133.01, 1051.0, 1112.11, 51675.82377],
      ...]]
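The pagination loop above can be sketched without network access. This is a minimal sketch rather than the ccxt API: `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for `exchange.fetch_ohlcv` under the assumption that `since` is inclusive. Advancing `since` by one bar past the last candle received avoids fetching the boundary candle twice.

```python
from typing import Callable, List

Candle = List[float]  # [timestamp_ms, open, high, low, close, volume]


def fetch_all(fetch_page: Callable[[int, int], List[Candle]],
              since_ms: int, step_ms: int, limit: int = 1000) -> List[Candle]:
    """Collect candles page by page, advancing past the last one seen."""
    candles: List[Candle] = []
    while True:
        page = fetch_page(since_ms, limit)
        if not page:
            break
        candles.extend(page)
        # Start the next request one bar after the last candle received,
        # so the boundary candle is not fetched twice.
        since_ms = int(page[-1][0]) + step_ms
        if len(page) < limit:
            break
    return candles


def fake_fetch_page(since_ms: int, limit: int) -> List[Candle]:
    """Hypothetical stand-in for exchange.fetch_ohlcv: 2,500 four-hour bars."""
    step, first, total = 14_400_000, 1_502_942_400_000, 2_500
    start = max(0, -(-(since_ms - first) // step))  # ceiling division
    stop = min(total, start + limit)
    return [[first + i * step, 1.0, 1.0, 1.0, 1.0, 1.0] for i in range(start, stop)]


all_candles = fetch_all(fake_fetch_page, since_ms=0, step_ms=14_400_000)
timestamps = [c[0] for c in all_candles]
```

With a real exchange, `step_ms` would be the timeframe length in milliseconds (14,400,000 for 4h), and the result could be passed to `pd.DataFrame` exactly as in the next step.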

Next, we’ll create a DataFrame with the fetched data.


ohlcv_list = [[item[0], item[1], item[2], item[3], item[4], item[5]] for item in ohlcv_list[0]]

# Create a DataFrame from the OHLCV list (2017-08-17 to 2023-03-17)
df = pd.DataFrame(ohlcv_list, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])

df

The DataFrame looks like this:


           timestamp     open     high      low    close        volume
0      1502942400000   301.13   307.96   298.00   307.96    1561.95305
1      1502956800000   307.95   312.00   307.00   308.95    1177.71088
2      1502971200000   308.95   310.51   303.56   307.06    1882.05267
3      1502985600000   307.74   312.18   298.21   301.60    1208.05192
4      1503000000000   301.60   310.85   299.01   302.00    1200.94182
...
12220  1678968000000  1660.37  1666.73  1635.96  1663.06  129192.79690
12221  1678982400000  1663.06  1691.42  1652.88  1679.64  113369.52090
12222  1678996800000  1679.64  1681.78  1652.53  1673.73   76922.01860
12223  1679011200000  1673.73  1725.00  1662.65  1708.62  166034.43070
12224  1679025600000  1708.62  1730.99  1696.37  1715.56  103495.68750

12225 rows × 6 columns


Preparing the Data


Now, we’ll prepare the data by adding timestamps and various technical indicators.


Convert the timestamp to a datetime and set it as the index:

df_action=df.copy()
df_action['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
df_action.reset_index(inplace=True)
df_action.set_index(df_action['timestamp'],inplace=True)
df_action

                     index            timestamp     open     high      low    close        volume
timestamp
2017-08-17 04:00:00      0  2017-08-17 04:00:00   301.13   307.96   298.00   307.96    1561.95305
2017-08-17 08:00:00      1  2017-08-17 08:00:00   307.95   312.00   307.00   308.95    1177.71088
2017-08-17 12:00:00      2  2017-08-17 12:00:00   308.95   310.51   303.56   307.06    1882.05267
2017-08-17 16:00:00      3  2017-08-17 16:00:00   307.74   312.18   298.21   301.60    1208.05192
2017-08-17 20:00:00      4  2017-08-17 20:00:00   301.60   310.85   299.01   302.00    1200.94182
...
2023-03-16 12:00:00  12220  2023-03-16 12:00:00  1660.37  1666.73  1635.96  1663.06  129192.79690
2023-03-16 16:00:00  12221  2023-03-16 16:00:00  1663.06  1691.42  1652.88  1679.64  113369.52090
2023-03-16 20:00:00  12222  2023-03-16 20:00:00  1679.64  1681.78  1652.53  1673.73   76922.01860
2023-03-17 00:00:00  12223  2023-03-17 00:00:00  1673.73  1725.00  1662.65  1708.62  166034.43070
2023-03-17 04:00:00  12224  2023-03-17 04:00:00  1708.62  1730.99  1696.37  1715.56  103495.68750

12225 rows × 7 columns


Add the MACD indicator (the MACD line, histogram, and signal line are appended as columns):


df_action.ta.macd(append=True)
df_action

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9
timestamp
2017-08-17 04:00:00      0  2017-08-17 04:00:00   301.13   307.96   298.00   307.96    1561.95305           NaN            NaN            NaN
2017-08-17 08:00:00      1  2017-08-17 08:00:00   307.95   312.00   307.00   308.95    1177.71088           NaN            NaN            NaN
2017-08-17 12:00:00      2  2017-08-17 12:00:00   308.95   310.51   303.56   307.06    1882.05267           NaN            NaN            NaN
2017-08-17 16:00:00      3  2017-08-17 16:00:00   307.74   312.18   298.21   301.60    1208.05192           NaN            NaN            NaN
2017-08-17 20:00:00      4  2017-08-17 20:00:00   301.60   310.85   299.01   302.00    1200.94182           NaN            NaN            NaN
...
2023-03-16 12:00:00  12220  2023-03-16 12:00:00  1660.37  1666.73  1635.96  1663.06  129192.79690     26.487664      -9.729147      36.216810
2023-03-16 16:00:00  12221  2023-03-16 16:00:00  1663.06  1691.42  1652.88  1679.64  113369.52090     26.074843      -8.113574      34.188417
2023-03-16 20:00:00  12222  2023-03-16 20:00:00  1679.64  1681.78  1652.53  1673.73   76922.01860     24.982806      -7.364488      32.347295
2023-03-17 00:00:00  12223  2023-03-17 00:00:00  1673.73  1725.00  1662.65  1708.62  166034.43070     26.625765      -4.577224      31.202989
2023-03-17 04:00:00  12224  2023-03-17 04:00:00  1708.62  1730.99  1696.37  1715.56  103495.68750     28.163175      -2.431851      30.595026

12225 rows × 10 columns


Create the MACD trend flag, which is True when the MACD line is above the signal line:


df_action['macd_trend']= df_action.MACD_12_26_9 > df_action.MACDs_12_26_9
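To make the three MACD columns less of a black box, here is a dependency-free sketch of the arithmetic behind a 12/26/9 MACD and the trend flag. The helper names `ema` and `macd` and the toy price series are assumptions for illustration. pandas_ta may seed its moving averages differently and leaves the warm-up rows as NaN, so early values will not match its output exactly.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as plain lists."""
    fast_ema = ema(close, fast)
    slow_ema = ema(close, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Toy series: 40 rising bars followed by 20 falling bars.
closes = [100 + i for i in range(40)] + [139 - 2 * i for i in range(1, 21)]
macd_line, signal_line, histogram = macd(closes)

# Same rule as df_action['macd_trend']: MACD line above the signal line.
macd_trend = [m > s for m, s in zip(macd_line, signal_line)]
```

On this toy series the flag is True at the end of the rising leg and False at the end of the falling leg, which is the kind of flip the later steps treat as a trade signal.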

Add RSI indicator


df_action.ta.rsi(append=True)

 timestamp
    2017-08-17 04:00:00          NaN
    2017-08-17 08:00:00          NaN
    2017-08-17 12:00:00          NaN
    2017-08-17 16:00:00          NaN
    2017-08-17 20:00:00          NaN
                             ...    
    2023-03-16 12:00:00    56.716544
    2023-03-16 16:00:00    59.590967
    2023-03-16 20:00:00    58.109595
    2023-03-17 00:00:00    63.826631
    2023-03-17 04:00:00    64.854113
    Name: RSI_14, Length: 12225, dtype: float64

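Add RSI labels to the DataFrame


RSI above 75 is flagged as overbought and RSI below 30 as oversold. RSI_trend is 1 when RSI is above 50 and -1 when it is below 50. Rows that match no condition stay NaN.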


df_action.loc[df_action['RSI_14']>75,'overbought']= 1
df_action.loc[df_action['RSI_14']<30,'oversold']= 1
df_action.loc[df_action['RSI_14']>50,'RSI_trend']= 1
df_action.loc[df_action['RSI_14']<50,'RSI_trend']=-1
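The labelling rules can be written as a small pure function, which makes the edge cases visible. `rsi_labels` is a hypothetical helper, not part of pandas_ta. It mirrors the four `.loc` assignments above, including the fact that an RSI of exactly 50 (or a NaN RSI) receives no trend label.

```python
import math


def rsi_labels(rsi):
    """Return (overbought, oversold, rsi_trend) for a single RSI value.

    Unset flags are NaN, mirroring the .loc assignments in the article.
    """
    overbought = 1.0 if rsi > 75 else math.nan
    oversold = 1.0 if rsi < 30 else math.nan
    if rsi > 50:
        trend = 1.0
    elif rsi < 50:
        trend = -1.0
    else:
        trend = math.nan  # exactly 50, or NaN, matches neither condition
    return overbought, oversold, trend


high = rsi_labels(80.0)
low = rsi_labels(25.0)
neutral = rsi_labels(50.0)
```

The NaN flags are why the later cleaning step fills missing values with 0 before training.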

Preview the DataFrame:


df_action

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend
timestamp
2017-08-17 04:00:00      0  2017-08-17 04:00:00   301.13   307.96   298.00   307.96    1561.95305           NaN            NaN            NaN       False        NaN         NaN       NaN        NaN
2017-08-17 08:00:00      1  2017-08-17 08:00:00   307.95   312.00   307.00   308.95    1177.71088           NaN            NaN            NaN       False        NaN         NaN       NaN        NaN
2017-08-17 12:00:00      2  2017-08-17 12:00:00   308.95   310.51   303.56   307.06    1882.05267           NaN            NaN            NaN       False        NaN         NaN       NaN        NaN
2017-08-17 16:00:00      3  2017-08-17 16:00:00   307.74   312.18   298.21   301.60    1208.05192           NaN            NaN            NaN       False        NaN         NaN       NaN        NaN
2017-08-17 20:00:00      4  2017-08-17 20:00:00   301.60   310.85   299.01   302.00    1200.94182           NaN            NaN            NaN       False        NaN         NaN       NaN        NaN
...
2023-03-16 12:00:00  12220  2023-03-16 12:00:00  1660.37  1666.73  1635.96  1663.06  129192.79690     26.487664      -9.729147      36.216810       False  56.716544         NaN       NaN        1.0
2023-03-16 16:00:00  12221  2023-03-16 16:00:00  1663.06  1691.42  1652.88  1679.64  113369.52090     26.074843      -8.113574      34.188417       False  59.590967         NaN       NaN        1.0
2023-03-16 20:00:00  12222  2023-03-16 20:00:00  1679.64  1681.78  1652.53  1673.73   76922.01860     24.982806      -7.364488      32.347295       False  58.109595         NaN       NaN        1.0
2023-03-17 00:00:00  12223  2023-03-17 00:00:00  1673.73  1725.00  1662.65  1708.62  166034.43070     26.625765      -4.577224      31.202989       False  63.826631         NaN       NaN        1.0
2023-03-17 04:00:00  12224  2023-03-17 04:00:00  1708.62  1730.99  1696.37  1715.56  103495.68750     28.163175      -2.431851      30.595026       False  64.854113         NaN       NaN        1.0

12225 rows × 15 columns


Creating the Machine Learning Model

First, we’ll create signals based on the MACD trend. A signal is any bar where macd_trend differs from its value on the previous bar, which we detect with shift(1). We then drop the first two detected rows with iloc[2:], because the MACD values at the start of the series are noisy.


Figure: MACD chart


The rows selected here are what we later preprocess, split into training and testing sets, and feed to the Random Forest model.


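# .copy() makes signaled an independent DataFrame, so adding columns later does not trigger a SettingWithCopyWarning
signaled = df_action[df_action['macd_trend'].shift(1) != df_action['macd_trend']].iloc[2:].copy()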
signaled

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend
timestamp
2017-08-26 00:00:00     53  2017-08-26 00:00:00   327.24   332.27   323.41   323.46     569.57801      5.218425      -0.144303       5.362728       False  53.515087         NaN       NaN        1.0
2017-08-27 16:00:00     63  2017-08-27 16:00:00   335.04   341.47   332.97   339.64     795.13177      5.146901       0.285160       4.861741        True  68.589301         NaN       NaN        1.0
2017-08-31 16:00:00     87  2017-08-31 16:00:00   384.38   387.39   383.13   384.45     878.88372     12.619776      -0.009218      12.628994       False  73.658395         NaN       NaN        1.0
2017-09-05 16:00:00    117  2017-09-05 16:00:00   290.07   321.52   285.94   318.26     677.80125    -17.420385       0.985630     -18.406015        True  48.964468         NaN       NaN       -1.0
2017-09-08 12:00:00    134  2017-09-08 12:00:00   330.10   331.57   272.00   294.05    2343.12862     -1.422169      -0.756305      -0.665864       False  36.802842         NaN       NaN       -1.0
...
2023-03-03 00:00:00  12139  2023-03-03 00:00:00  1647.86  1649.25  1544.39  1565.44  317798.34160     -2.231587      -5.079240       2.847653       False  34.504987         NaN       NaN       -1.0
2023-03-05 08:00:00  12153  2023-03-05 08:00:00  1568.61  1573.07  1564.03  1568.50   32527.56420    -16.383615       0.212968     -16.596583        True  38.888106         NaN       NaN       -1.0
2023-03-08 20:00:00  12174  2023-03-08 20:00:00  1552.72  1561.29  1523.61  1532.38  128310.87040     -9.161005      -0.786086      -8.374919       False  32.613518         NaN       NaN       -1.0
2023-03-11 16:00:00  12191  2023-03-11 16:00:00  1428.00  1457.04  1422.32  1449.22  180365.58850    -32.769032       1.859142     -34.628174        True  42.608536         NaN       NaN       -1.0
2023-03-15 12:00:00  12214  2023-03-15 12:00:00  1679.91  1698.29  1625.56  1634.04  313571.19570     46.865502      -3.715414      50.580916       False  52.346900         NaN       NaN        1.0

921 rows × 15 columns
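The flip detection can be illustrated in plain Python. `flip_indices` is a hypothetical helper and the boolean list is made up. It mimics comparing each flag with its shifted predecessor and then skipping the first two hits. In pandas the first row always counts as a hit, because shift(1) yields NaN there and NaN never equals the flag.

```python
def flip_indices(trend, skip=2):
    """Indices where trend[i] differs from trend[i - 1], minus the first `skip` hits."""
    hits = [0]  # the first row has no predecessor, so it always counts as a change
    hits += [i for i in range(1, len(trend)) if trend[i] != trend[i - 1]]
    return hits[skip:]


trend = [False, False, True, True, False, True, True, False]
signal_rows = flip_indices(trend)
```

Here the raw hits are rows 0, 2, 4, 5 and 7, so skipping two leaves rows 4, 5 and 7 as signals.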


Compute the return from each signal to the next one:


signaled['return']=signaled['close'].pct_change().shift(-1)
signaled

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend     return
timestamp
2017-08-26 00:00:00     53  2017-08-26 00:00:00   327.24   332.27   323.41   323.46     569.57801      5.218425      -0.144303       5.362728       False  53.515087         NaN       NaN        1.0   0.050022
2017-08-27 16:00:00     63  2017-08-27 16:00:00   335.04   341.47   332.97   339.64     795.13177      5.146901       0.285160       4.861741        True  68.589301         NaN       NaN        1.0   0.131934
2017-08-31 16:00:00     87  2017-08-31 16:00:00   384.38   387.39   383.13   384.45     878.88372     12.619776      -0.009218      12.628994       False  73.658395         NaN       NaN        1.0  -0.172168
2017-09-05 16:00:00    117  2017-09-05 16:00:00   290.07   321.52   285.94   318.26     677.80125    -17.420385       0.985630     -18.406015        True  48.964468         NaN       NaN       -1.0  -0.076070
2017-09-08 12:00:00    134  2017-09-08 12:00:00   330.10   331.57   272.00   294.05    2343.12862     -1.422169      -0.756305      -0.665864       False  36.802842         NaN       NaN       -1.0   0.026220
...
2023-03-03 00:00:00  12139  2023-03-03 00:00:00  1647.86  1649.25  1544.39  1565.44  317798.34160     -2.231587      -5.079240       2.847653       False  34.504987         NaN       NaN       -1.0   0.001955
2023-03-05 08:00:00  12153  2023-03-05 08:00:00  1568.61  1573.07  1564.03  1568.50   32527.56420    -16.383615       0.212968     -16.596583        True  38.888106         NaN       NaN       -1.0  -0.023028
2023-03-08 20:00:00  12174  2023-03-08 20:00:00  1552.72  1561.29  1523.61  1532.38  128310.87040     -9.161005      -0.786086      -8.374919       False  32.613518         NaN       NaN       -1.0  -0.054269
2023-03-11 16:00:00  12191  2023-03-11 16:00:00  1428.00  1457.04  1422.32  1449.22  180365.58850    -32.769032       1.859142     -34.628174        True  42.608536         NaN       NaN       -1.0   0.127531
2023-03-15 12:00:00  12214  2023-03-15 12:00:00  1679.91  1698.29  1625.56  1634.04  313571.19570     46.865502      -3.715414      50.580916       False  52.346900         NaN       NaN        1.0        NaN

921 rows × 16 columns
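As a worked check of `pct_change().shift(-1)`, the sketch below recomputes the first returns from the closing prices shown in the table and applies the profit/loss labelling used later. `forward_returns` and `label` are hypothetical helper names. The prices are the first five signal closes from the output above.

```python
import math


def forward_returns(closes):
    """Return from each signal's close to the next signal's close."""
    returns = [nxt / cur - 1.0 for cur, nxt in zip(closes, closes[1:])]
    returns.append(math.nan)  # the last signal has no successor yet
    return returns


def label(ret):
    """1 for a gain, -1 for a loss, 0 for exactly break-even."""
    if ret > 0:
        return 1
    if ret < 0:
        return -1
    return 0


closes = [323.46, 339.64, 384.45, 318.26, 294.05]
returns = forward_returns(closes)
labels = [label(r) for r in returns[:-1]]  # drop the trailing NaN first
```

The first value, 339.64 / 323.46 - 1, reproduces the 0.050022 shown in the return column, and the trailing NaN is the row that gets dropped in the cleaning step.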


Drop the last row, whose return is NaN because no later signal exists yet:



signaled=signaled.iloc[:-1]
signaled

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend     return
timestamp
2017-08-26 00:00:00     53  2017-08-26 00:00:00   327.24   332.27   323.41   323.46     569.57801      5.218425      -0.144303       5.362728       False  53.515087         NaN       NaN        1.0   0.050022
2017-08-27 16:00:00     63  2017-08-27 16:00:00   335.04   341.47   332.97   339.64     795.13177      5.146901       0.285160       4.861741        True  68.589301         NaN       NaN        1.0   0.131934
2017-08-31 16:00:00     87  2017-08-31 16:00:00   384.38   387.39   383.13   384.45     878.88372     12.619776      -0.009218      12.628994       False  73.658395         NaN       NaN        1.0  -0.172168
2017-09-05 16:00:00    117  2017-09-05 16:00:00   290.07   321.52   285.94   318.26     677.80125    -17.420385       0.985630     -18.406015        True  48.964468         NaN       NaN       -1.0  -0.076070
2017-09-08 12:00:00    134  2017-09-08 12:00:00   330.10   331.57   272.00   294.05    2343.12862     -1.422169      -0.756305      -0.665864       False  36.802842         NaN       NaN       -1.0   0.026220
...
2023-03-02 16:00:00  12137  2023-03-02 16:00:00  1628.31  1652.48  1621.45  1648.48   78007.64970      4.138191       0.061488       4.076702        True  53.188566         NaN       NaN        1.0  -0.050374
2023-03-03 00:00:00  12139  2023-03-03 00:00:00  1647.86  1649.25  1544.39  1565.44  317798.34160     -2.231587      -5.079240       2.847653       False  34.504987         NaN       NaN       -1.0   0.001955
2023-03-05 08:00:00  12153  2023-03-05 08:00:00  1568.61  1573.07  1564.03  1568.50   32527.56420    -16.383615       0.212968     -16.596583        True  38.888106         NaN       NaN       -1.0  -0.023028
2023-03-08 20:00:00  12174  2023-03-08 20:00:00  1552.72  1561.29  1523.61  1532.38  128310.87040     -9.161005      -0.786086      -8.374919       False  32.613518         NaN       NaN       -1.0  -0.054269
2023-03-11 16:00:00  12191  2023-03-11 16:00:00  1428.00  1457.04  1422.32  1449.22  180365.58850    -32.769032       1.859142     -34.628174        True  42.608536         NaN       NaN       -1.0   0.127531

920 rows × 16 columns


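# Keep only the columns needed for modelling; .copy() avoids a SettingWithCopyWarning when y is added
signaled_filter = signaled[['close','macd_trend','overbought','oversold','RSI_trend','MACDs_12_26_9','MACD_12_26_9','RSI_14','return']].copy()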

Label y: 1 for a profit, -1 for a loss, and 0 for break-even

signaled_filter.loc[signaled_filter['return']>0,'y']=1
signaled_filter.loc[signaled_filter['return']<0,'y']=-1
signaled_filter.loc[signaled_filter['return']==0,'y']=0

Replace the remaining NaN values, such as unset overbought and oversold flags, with 0:


signaled_filter = signaled_filter.fillna(0)
signaled_filter

                       close  macd_trend  overbought  oversold  RSI_trend  MACDs_12_26_9  MACD_12_26_9     RSI_14     return     y
timestamp
2017-08-26 00:00:00   323.46       False         0.0       0.0        1.0       5.362728      5.218425  53.515087   0.050022   1.0
2017-08-27 16:00:00   339.64        True         0.0       0.0        1.0       4.861741      5.146901  68.589301   0.131934   1.0
2017-08-31 16:00:00   384.45       False         0.0       0.0        1.0      12.628994     12.619776  73.658395  -0.172168  -1.0
2017-09-05 16:00:00   318.26        True         0.0       0.0       -1.0     -18.406015    -17.420385  48.964468  -0.076070  -1.0
2017-09-08 12:00:00   294.05       False         0.0       0.0       -1.0      -0.665864     -1.422169  36.802842   0.026220   1.0
...
2023-03-02 16:00:00  1648.48        True         0.0       0.0        1.0       4.076702      4.138191  53.188566  -0.050374  -1.0
2023-03-03 00:00:00  1565.44       False         0.0       0.0       -1.0       2.847653     -2.231587  34.504987   0.001955   1.0
2023-03-05 08:00:00  1568.50        True         0.0       0.0       -1.0     -16.596583    -16.383615  38.888106  -0.023028  -1.0
2023-03-08 20:00:00  1532.38       False         0.0       0.0       -1.0      -8.374919     -9.161005  32.613518  -0.054269  -1.0
2023-03-11 16:00:00  1449.22        True         0.0       0.0       -1.0     -34.628174    -32.769032  42.608536   0.127531   1.0

920 rows × 10 columns


Separate features and target


# Features used for training: ['macd_trend','overbought','oversold','RSI_trend','MACDs_12_26_9','MACD_12_26_9','RSI_14']
X = signaled_filter.iloc[:,1:-2]
y = signaled_filter.iloc[:,-1]

X

                     macd_trend  overbought  oversold  RSI_trend  MACDs_12_26_9  MACD_12_26_9     RSI_14
timestamp
2017-08-26 00:00:00       False         0.0       0.0        1.0       5.362728      5.218425  53.515087
2017-08-27 16:00:00        True         0.0       0.0        1.0       4.861741      5.146901  68.589301
2017-08-31 16:00:00       False         0.0       0.0        1.0      12.628994     12.619776  73.658395
2017-09-05 16:00:00        True         0.0       0.0       -1.0     -18.406015    -17.420385  48.964468
2017-09-08 12:00:00       False         0.0       0.0       -1.0      -0.665864     -1.422169  36.802842
...
2023-03-02 16:00:00        True         0.0       0.0        1.0       4.076702      4.138191  53.188566
2023-03-03 00:00:00       False         0.0       0.0       -1.0       2.847653     -2.231587  34.504987
2023-03-05 08:00:00        True         0.0       0.0       -1.0     -16.596583    -16.383615  38.888106
2023-03-08 20:00:00       False         0.0       0.0       -1.0      -8.374919     -9.161005  32.613518
2023-03-11 16:00:00        True         0.0       0.0       -1.0     -34.628174    -32.769032  42.608536

920 rows × 7 columns


Split data into training and testing sets

Of the 920 rows, we use the first 700 for training and the next 100 for testing. The remaining 120 rows are held back to check how the model behaves on data it has not seen.


X_train = X.iloc[:700]
y_train = y.iloc[:700]
X_test = X.iloc[700:800]
y_test = y.iloc[700:800]

Now, let’s import the Random Forest model and evaluate its performance.


Import RandomForestClassifier and the evaluation metrics, along with GridSearchCV and TimeSeriesSplit, which we use to search for the best hyperparameters:


from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestRegressor, 
                              RandomForestClassifier)
from sklearn.metrics import (mean_squared_error, 
                             r2_score, 
                             mean_absolute_error,
                             mean_absolute_percentage_error, 
                             accuracy_score, 
                             precision_score, 
                             recall_score, 
                             f1_score)
from sklearn.model_selection import (GridSearchCV, 
                                     TimeSeriesSplit)

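An alternative chronological split is available through train_test_split with shuffle=False (75% for training, 25% for testing). Running it would overwrite the manual split above. The results reported below, such as the 100-row test set, correspond to the manual split, so the call is shown commented out:


# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, shuffle=False)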

Hyperparameter Tuning with GridSearchCV

First, we define the Random Forest parameters for tuning with GridSearchCV:


rf_params = {
            'n_estimators': [100,200,300],  
            'max_depth': np.arange(3,6,1),   
            'min_samples_leaf': [1,2,5,10],  
            'bootstrap': [True, False], 
            'max_features': [1, 2],
}
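It helps to know how much work this grid implies before launching the search. The sketch below simply counts the combinations. The dictionary repeats the grid above with np.arange(3, 6, 1) written out as a list, and the fold count of 5 matches the TimeSeriesSplit setting used in the next step.

```python
from itertools import product

rf_params = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 4, 5],            # np.arange(3, 6, 1) written out
    "min_samples_leaf": [1, 2, 5, 10],
    "bootstrap": [True, False],
    "max_features": [1, 2],
}

# Every combination of the listed values is one candidate model.
combos = list(product(*rf_params.values()))
n_folds = 5
n_fits = len(combos) * n_folds
```

That is 144 candidates and 720 cross-validation fits, plus one final refit on the whole training set if GridSearchCV is left at its default of refitting the best model.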

Next, we use TimeSeriesSplit to split the data and obtain the best parameters using GridSearchCV:


# TimeSeriesSplit with n_splits=5 creates 5 chronological train/validation folds
tscv = TimeSeriesSplit(n_splits = 5)

# Grid search over rf_params, scored by cross-validated accuracy
rf_regr = GridSearchCV(RandomForestClassifier(random_state=0), rf_params, cv=tscv, n_jobs=-1)
grid_result = rf_regr.fit(X_train, y_train)

# Print the best cross-validation score and the parameters that produced it

print('Best Score: ', grid_result.best_score_)
print('Best Params: ', grid_result.best_params_)

Best Score:  0.5982758620689654
Best Params:  {'bootstrap': True, 'max_depth': 3, 'max_features': 2, 'min_samples_leaf': 2, 'n_estimators': 100}
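To see why TimeSeriesSplit suits this data, the sketch below reproduces its expanding-window layout in plain Python. `time_series_folds` is a hypothetical helper that mirrors, as far as we understand it, what scikit-learn does with default settings (no gap, no maximum training size). The figure of 700 training rows assumes the manual split described earlier.

```python
def time_series_folds(n_samples, n_splits=5):
    """Expanding-window folds: each validation block follows its training rows."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    folds = []
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = range(0, test_start)
        test_idx = range(test_start, test_start + test_size)
        folds.append((train_idx, test_idx))
    return folds


folds = time_series_folds(700)
train_sizes = [len(train) for train, _ in folds]
test_sizes = [len(test) for _, test in folds]
```

Each fold trains only on rows that come before its validation block, so the cross-validation score is never computed on data older than the training data.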


Creating the RandomForestClassifier

After obtaining the best parameters from GridSearchCV, we create the RandomForestClassifier:


clf = RandomForestClassifier(
    max_depth=grid_result.best_params_['max_depth'],
    n_estimators=grid_result.best_params_['n_estimators'],
    max_features=grid_result.best_params_['max_features'],
    min_samples_leaf=grid_result.best_params_['min_samples_leaf'],
    bootstrap=grid_result.best_params_['bootstrap'],  # also tuned by the grid search
    random_state=0)
clf.fit(X_train,y_train)

# output
RandomForestClassifier(max_depth=3, max_features=2, min_samples_leaf=2,random_state=0)

Feature Importance Analysis

We visualize the feature importance to understand the impact of each feature on the model:


Importance = pd.DataFrame({'feature importance':clf.feature_importances_*100}, index=X_train.columns)
Importance.sort_values('feature importance', axis=0, ascending=True).plot(kind='barh', color='r')
plt.xlabel('feature importance')

Figure: feature importance bar chart


Compare the predicted and actual labels in a DataFrame:


y_pred = clf.predict(X_test)
pd.DataFrame({'actual': y_test,
              'predict': y_pred})

                     actual  predict
timestamp
2021-11-04 08:00:00     1.0      1.0
2021-11-07 08:00:00     1.0     -1.0
2021-11-09 20:00:00    -1.0      1.0
2021-11-15 00:00:00    -1.0     -1.0
2021-11-15 20:00:00    -1.0      1.0
...
2022-05-13 04:00:00    -1.0     -1.0
2022-05-18 12:00:00     1.0      1.0
2022-05-19 16:00:00    -1.0     -1.0
2022-05-21 00:00:00     1.0      1.0
2022-05-21 04:00:00     1.0     -1.0

100 rows × 2 columns


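Plot a confusion matrix to see how accurate the model is:


# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)

Figure: confusion matrix plot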


from sklearn import metrics

Model Evaluation


We compare the predicted values with the actual values using a confusion matrix and calculate the accuracy, precision, recall, and F1 score:


print('Accuracy:',metrics.accuracy_score(y_test,y_pred))
print('Precision:',metrics.precision_score(y_test,y_pred))
print('Recall:',metrics.recall_score(y_test,y_pred))
print('F1 Score:',metrics.f1_score(y_test,y_pred))
    Accuracy: 0.64
    Precision: 0.6382978723404256
    Recall: 0.6122448979591837
    F1 Score: 0.625
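The four scores are tied together by the confusion matrix, and they can be cross-checked by hand. The counts below are not printed in the article. They are inferred from the reported scores on the 100-row test set (30 true positives, 17 false positives, 19 false negatives, 34 true negatives, with 1 as the positive class), so treat them as an assumption.

```python
# Confusion-matrix counts inferred from the reported scores (assumption).
tp, fp, fn, tn = 30, 17, 19, 34

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of the predicted gains, how many were gains
recall = tp / (tp + fn)      # of the actual gains, how many were caught
f1 = 2 * precision * recall / (precision + recall)
```

These counts reproduce the printed accuracy of 0.64, precision of about 0.638, recall of about 0.612, and F1 of 0.625.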

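Test on a larger slice of the data. Note that X.iloc[100:] overlaps rows 100 to 699 of the training set, so most of this slice is in-sample rather than unseen data.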

X_live = X.iloc[100:]
X_live

                     macd_trend  overbought  oversold  RSI_trend  MACDs_12_26_9  MACD_12_26_9     RSI_14
timestamp
2018-04-22 00:00:00       False         0.0       0.0        1.0      23.126038     22.931511  62.966387
2018-04-22 08:00:00        True         0.0       0.0        1.0      23.071536     23.234708  69.677206
2018-04-23 08:00:00       False         0.0       0.0        1.0      23.713086     23.642522  66.992778
2018-04-24 00:00:00        True         0.0       0.0        1.0      23.458020     23.950860  72.530878
2018-04-25 04:00:00       False         0.0       0.0        1.0      27.296406     25.439792  56.633962
...
2023-03-02 16:00:00        True         0.0       0.0        1.0       4.076702      4.138191  53.188566
2023-03-03 00:00:00       False         0.0       0.0       -1.0       2.847653     -2.231587  34.504987
2023-03-05 08:00:00        True         0.0       0.0       -1.0     -16.596583    -16.383615  38.888106
2023-03-08 20:00:00       False         0.0       0.0       -1.0      -8.374919     -9.161005  32.613518
2023-03-11 16:00:00        True         0.0       0.0       -1.0     -34.628174    -32.769032  42.608536

820 rows × 7 columns


test_live = signaled[100:].copy()
test_live

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend     return
timestamp
2018-04-22 00:00:00   1481  2018-04-22 00:00:00   604.87   606.00   589.00   600.14   17037.36423     22.931511      -0.194527      23.126038       False  62.966387         NaN       NaN        1.0   0.041657
2018-04-22 08:00:00   1483  2018-04-22 08:00:00   611.80   633.56   611.03   625.14   26165.67594     23.234708       0.163172      23.071536        True  69.677206         NaN       NaN        1.0   0.019548
2018-04-23 08:00:00   1489  2018-04-23 08:00:00   641.10   642.00   630.42   637.36   15391.27247     23.642522      -0.070564      23.713086       False  66.992778         NaN       NaN        1.0   0.049752
2018-04-24 00:00:00   1493  2018-04-24 00:00:00   644.58   673.20   643.76   669.07   31087.69471     23.950860       0.492840      23.458020        True  72.530878         NaN       NaN        1.0  -0.011763
2018-04-25 04:00:00   1500  2018-04-25 04:00:00   654.00   669.00   624.00   661.20   67160.89082     25.439792      -1.856614      27.296406       False  56.633962         NaN       NaN        1.0   0.028947
...
2023-03-02 16:00:00  12137  2023-03-02 16:00:00  1628.31  1652.48  1621.45  1648.48   78007.64970      4.138191       0.061488       4.076702        True  53.188566         NaN       NaN        1.0  -0.050374
2023-03-03 00:00:00  12139  2023-03-03 00:00:00  1647.86  1649.25  1544.39  1565.44  317798.34160     -2.231587      -5.079240       2.847653       False  34.504987         NaN       NaN       -1.0   0.001955
2023-03-05 08:00:00  12153  2023-03-05 08:00:00  1568.61  1573.07  1564.03  1568.50   32527.56420    -16.383615       0.212968     -16.596583        True  38.888106         NaN       NaN       -1.0  -0.023028
2023-03-08 20:00:00  12174  2023-03-08 20:00:00  1552.72  1561.29  1523.61  1532.38  128310.87040     -9.161005      -0.786086      -8.374919       False  32.613518         NaN       NaN       -1.0  -0.054269
2023-03-11 16:00:00  12191  2023-03-11 16:00:00  1428.00  1457.04  1422.32  1449.22  180365.58850    -32.769032       1.859142     -34.628174        True  42.608536         NaN       NaN       -1.0   0.127531

820 rows × 16 columns


test_live['signal']=clf.predict(X_live)
test_live

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend     return  signal
timestamp
2018-04-22 00:00:00   1481  2018-04-22 00:00:00   604.87   606.00   589.00   600.14   17037.36423     22.931511      -0.194527      23.126038       False  62.966387         NaN       NaN        1.0   0.041657     1.0
2018-04-22 08:00:00   1483  2018-04-22 08:00:00   611.80   633.56   611.03   625.14   26165.67594     23.234708       0.163172      23.071536        True  69.677206         NaN       NaN        1.0   0.019548    -1.0
2018-04-23 08:00:00   1489  2018-04-23 08:00:00   641.10   642.00   630.42   637.36   15391.27247     23.642522      -0.070564      23.713086       False  66.992778         NaN       NaN        1.0   0.049752     1.0
2018-04-24 00:00:00   1493  2018-04-24 00:00:00   644.58   673.20   643.76   669.07   31087.69471     23.950860       0.492840      23.458020        True  72.530878         NaN       NaN        1.0  -0.011763    -1.0
2018-04-25 04:00:00   1500  2018-04-25 04:00:00   654.00   669.00   624.00   661.20   67160.89082     25.439792      -1.856614      27.296406       False  56.633962         NaN       NaN        1.0   0.028947     1.0
...
2023-03-02 16:00:00  12137  2023-03-02 16:00:00  1628.31  1652.48  1621.45  1648.48   78007.64970      4.138191       0.061488       4.076702        True  53.188566         NaN       NaN        1.0  -0.050374    -1.0
2023-03-03 00:00:00  12139  2023-03-03 00:00:00  1647.86  1649.25  1544.39  1565.44  317798.34160     -2.231587      -5.079240       2.847653       False  34.504987         NaN       NaN       -1.0   0.001955     1.0
2023-03-05 08:00:00  12153  2023-03-05 08:00:00  1568.61  1573.07  1564.03  1568.50   32527.56420    -16.383615       0.212968     -16.596583        True  38.888106         NaN       NaN       -1.0  -0.023028    -1.0
2023-03-08 20:00:00  12174  2023-03-08 20:00:00  1552.72  1561.29  1523.61  1532.38  128310.87040     -9.161005      -0.786086      -8.374919       False  32.613518         NaN       NaN       -1.0  -0.054269     1.0
2023-03-11 16:00:00  12191  2023-03-11 16:00:00  1428.00  1457.04  1422.32  1449.22  180365.58850    -32.769032       1.859142     -34.628174        True  42.608536         NaN       NaN       -1.0   0.127531    -1.0

820 rows × 17 columns


Convert the model signal from 1/-1 to True/False, because it will be used for the vectorbt backtest:


test_live['signal'] = test_live['signal'] == 1
test_live

                     index            timestamp     open     high      low    close        volume  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  macd_trend     RSI_14  overbought  oversold  RSI_trend     return  signal
timestamp
2018-04-22 00:00:00   1481  2018-04-22 00:00:00   604.87   606.00   589.00   600.14   17037.36423     22.931511      -0.194527      23.126038       False  62.966387         NaN       NaN        1.0   0.041657    True
2018-04-22 08:00:00   1483  2018-04-22 08:00:00   611.80   633.56   611.03   625.14   26165.67594     23.234708       0.163172      23.071536        True  69.677206         NaN       NaN        1.0   0.019548   False
2018-04-23 08:00:00   1489  2018-04-23 08:00:00   641.10   642.00   630.42   637.36   15391.27247     23.642522      -0.070564      23.713086       False  66.992778         NaN       NaN        1.0   0.049752    True
2018-04-24 00:00:00   1493  2018-04-24 00:00:00   644.58   673.20   643.76   669.07   31087.69471     23.950860       0.492840      23.458020        True  72.530878         NaN       NaN        1.0  -0.011763   False
2018-04-25 04:00:00   1500  2018-04-25 04:00:00   654.00   669.00   624.00   661.20   67160.89082     25.439792      -1.856614      27.296406       False  56.633962         NaN       NaN        1.0   0.028947    True
...
2023-03-02 16:00:00  12137  2023-03-02 16:00:00  1628.31  1652.48  1621.45  1648.48   78007.64970      4.138191       0.061488       4.076702        True  53.188566         NaN       NaN        1.0  -0.050374   False
2023-03-03 00:00:00  12139  2023-03-03 00:00:00  1647.86  1649.25  1544.39  1565.44  317798.34160     -2.231587      -5.079240       2.847653       False  34.504987         NaN       NaN       -1.0   0.001955    True
2023-03-05 08:00:00  12153  2023-03-05 08:00:00  1568.61  1573.07  1564.03  1568.50   32527.56420    -16.383615       0.212968     -16.596583        True  38.888106         NaN       NaN       -1.0  -0.023028   False
2023-03-08 20:00:00  12174  2023-03-08 20:00:00  1552.72  1561.29  1523.61  1532.38  128310.87040     -9.161005      -0.786086      -8.374919       False  32.613518         NaN       NaN       -1.0  -0.054269    True
2023-03-11 16:00:00  12191  2023-03-11 16:00:00  1428.00  1457.04  1422.32  1449.22  180365.58850    -32.769032       1.859142     -34.628174        True  42.608536         NaN       NaN       -1.0   0.127531   False

820 rows × 17 columns


Testing the Model on Live Data


We test the model on live data to validate its performance and use vectorbt to backtest the results.


Filter the prediction data frame based on the live test data


time_predict =test_live.index[0]

df_predict=df.copy()

df_predict['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
df_predict.reset_index(inplace=True)
df_predict.set_index(df_predict['timestamp'],inplace=True)
df_predict

                     index            timestamp     open     high      low    close        volume
timestamp
2017-08-17 04:00:00      0  2017-08-17 04:00:00   301.13   307.96   298.00   307.96    1561.95305
2017-08-17 08:00:00      1  2017-08-17 08:00:00   307.95   312.00   307.00   308.95    1177.71088
2017-08-17 12:00:00      2  2017-08-17 12:00:00   308.95   310.51   303.56   307.06    1882.05267
2017-08-17 16:00:00      3  2017-08-17 16:00:00   307.74   312.18   298.21   301.60    1208.05192
2017-08-17 20:00:00      4  2017-08-17 20:00:00   301.60   310.85   299.01   302.00    1200.94182
...
2023-03-16 12:00:00  12220  2023-03-16 12:00:00  1660.37  1666.73  1635.96  1663.06  129192.79690
2023-03-16 16:00:00  12221  2023-03-16 16:00:00  1663.06  1691.42  1652.88  1679.64  113369.52090
2023-03-16 20:00:00  12222  2023-03-16 20:00:00  1679.64  1681.78  1652.53  1673.73   76922.01860
2023-03-17 00:00:00  12223  2023-03-17 00:00:00  1673.73  1725.00  1662.65  1708.62  166034.43070
2023-03-17 04:00:00  12224  2023-03-17 04:00:00  1708.62  1730.99  1696.37  1715.56  103495.68750

12225 rows × 7 columns


df_predict=df_predict.loc[time_predict:,:'close']
df_predict

                     index            timestamp     open     high      low    close
timestamp
2018-04-22 00:00:00   1481  2018-04-22 00:00:00   604.87   606.00   589.00   600.14
2018-04-22 04:00:00   1482  2018-04-22 04:00:00   600.22   612.49   599.00   611.80
2018-04-22 08:00:00   1483  2018-04-22 08:00:00   611.80   633.56   611.03   625.14
2018-04-22 12:00:00   1484  2018-04-22 12:00:00   625.20   641.00   614.31   634.46
2018-04-22 16:00:00   1485  2018-04-22 16:00:00   634.46   643.49   630.00   636.58
...
2023-03-16 12:00:00  12220  2023-03-16 12:00:00  1660.37  1666.73  1635.96  1663.06
2023-03-16 16:00:00  12221  2023-03-16 16:00:00  1663.06  1691.42  1652.88  1679.64
2023-03-16 20:00:00  12222  2023-03-16 20:00:00  1679.64  1681.78  1652.53  1673.73
2023-03-17 00:00:00  12223  2023-03-17 00:00:00  1673.73  1725.00  1662.65  1708.62
2023-03-17 04:00:00  12224  2023-03-17 04:00:00  1708.62  1730.99  1696.37  1715.56

10744 rows × 6 columns


test_live['signal']

timestamp
    2018-04-22 00:00:00     True
    2018-04-22 08:00:00    False
    2018-04-23 08:00:00     True
    2018-04-24 00:00:00    False
    2018-04-25 04:00:00     True
                           ...  
    2023-03-02 16:00:00    False
    2023-03-03 00:00:00     True
    2023-03-05 08:00:00    False
    2023-03-08 20:00:00     True
    2023-03-11 16:00:00    False
    Name: signal, Length: 820, dtype: bool

Add the signal from the machine learning model to the live data


df_predict = df_predict.join(test_live['signal'])
# Carry each signal forward until the next one (fillna(method='ffill') is deprecated in recent pandas)
df_predict = df_predict.ffill()

df_predict

                     index            timestamp     open     high      low    close  signal
timestamp
2018-04-22 00:00:00   1481  2018-04-22 00:00:00   604.87   606.00   589.00   600.14    True
2018-04-22 04:00:00   1482  2018-04-22 04:00:00   600.22   612.49   599.00   611.80    True
2018-04-22 08:00:00   1483  2018-04-22 08:00:00   611.80   633.56   611.03   625.14   False
2018-04-22 12:00:00   1484  2018-04-22 12:00:00   625.20   641.00   614.31   634.46   False
2018-04-22 16:00:00   1485  2018-04-22 16:00:00   634.46   643.49   630.00   636.58   False
...
2023-03-16 12:00:00  12220  2023-03-16 12:00:00  1660.37  1666.73  1635.96  1663.06   False
2023-03-16 16:00:00  12221  2023-03-16 16:00:00  1663.06  1691.42  1652.88  1679.64   False
2023-03-16 20:00:00  12222  2023-03-16 20:00:00  1679.64  1681.78  1652.53  1673.73   False
2023-03-17 00:00:00  12223  2023-03-17 00:00:00  1673.73  1725.00  1662.65  1708.62   False
2023-03-17 04:00:00  12224  2023-03-17 04:00:00  1708.62  1730.99  1696.37  1715.56   False

10744 rows × 7 columns
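The join followed by a forward fill spreads the sparse model signals over every 4h bar. A minimal pure-Python sketch of the idea follows. `forward_fill` is a hypothetical helper, and the five bar times with two signals are taken from the first rows of the table above.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


bar_times = ["00:00", "04:00", "08:00", "12:00", "16:00"]
sparse_signals = {"00:00": True, "08:00": False}  # only bars where the MACD trend flipped

# Bars without a model signal get None, like the NaN produced by the join.
joined = [sparse_signals.get(t) for t in bar_times]
signal = forward_fill(joined)
```

The result is True, True, False, False, False, matching the signal column for 2018-04-22 above: each prediction stays in force until the next flip.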


Create vectorbt signals


signal_vectorbt_ml_turning = df_predict.ta.tsignals(df_predict.signal,
                                           asbool=True,append=True)

signal_vectorbt_ml_turning.loc[signal_vectorbt_ml_turning['TS_Trades']!=0]

The resulting signals look like this (only bars where a trade occurs are shown):


                     TS_Trends  TS_Trades  TS_Entries  TS_Exits
timestamp
2018-04-22 08:00:00      False         -1       False      True
2018-04-23 08:00:00       True          1        True     False
2018-04-24 00:00:00      False         -1       False      True
2018-04-25 04:00:00       True          1        True     False
2018-04-27 08:00:00      False         -1       False      True
...
2023-03-02 16:00:00      False         -1       False      True
2023-03-03 00:00:00       True          1        True     False
2023-03-05 08:00:00      False         -1       False      True
2023-03-08 20:00:00       True          1        True     False
2023-03-11 16:00:00      False         -1       False      True

783 rows × 4 columns
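The four TS_ columns follow directly from the boolean trend. The sketch below approximates that derivation in plain Python: `trend_signals` is a hypothetical helper, and it assumes no trade or trend offset is applied, which appears to be the default behaviour of tsignals.

```python
def trend_signals(trend):
    """Derive (trades, entries, exits) from a boolean trend series."""
    as_int = [int(bool(t)) for t in trend]
    # A trade is the bar-to-bar change in the trend: +1 on False->True, -1 on True->False.
    trades = [0] + [cur - prev for prev, cur in zip(as_int, as_int[1:])]
    entries = [t > 0 for t in trades]
    exits = [t < 0 for t in trades]
    return trades, entries, exits


trend = [True, True, False, False, True]
trades, entries, exits = trend_signals(trend)
```

This matches the pattern in the table: a bar where the trend turns False has TS_Trades of -1 and TS_Exits True, and a bar where it turns True has TS_Trades of 1 and TS_Entries True.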


Use vectorbt Portfolio from_signals to backtest the strategy


port_ml_turning = vbt.Portfolio.from_signals(
    df_predict.open,
    entries=signal_vectorbt_ml_turning.TS_Exits,   # note: TS_Exits is passed as entries
    exits=signal_vectorbt_ml_turning.TS_Entries,   # and TS_Entries as exits
    freq='4h', init_cash=1000, size=0.1, fees=0.0002,
    direction=2, slippage=0.005,
)

# Plot the portfolio
port_ml_turning.plot().show()

Figure: portfolio plot


Display the portfolio statistics


port_ml_turning.stats()

    Start                               2018-04-22 00:00:00
    End                                 2023-03-17 04:00:00
    Period                                447 days 16:00:00
    Start Value                                      1000.0
    End Value                                   2802.131168
    Total Return [%]                             180.213117
    Benchmark Return [%]                         182.477227
    Max Gross Exposure [%]                        27.505487
    Total Fees Paid                               36.027332
    Max Drawdown [%]                              13.663183
    Max Drawdown Duration                  12 days 00:00:00
    Total Trades                                        783
    Total Closed Trades                                 782
    Total Open Trades                                     1
    Open Trade PnL                                27.319297
    Win Rate [%]                                  51.918159
    Best Trade [%]                                57.756017
    Worst Trade [%]                              -22.688176
    Avg Winning Trade [%]                          6.603421
    Avg Losing Trade [%]                          -2.416791
    Avg Winning Trade Duration    0 days 17:53:56.453201970
    Avg Losing Trade Duration     0 days 09:09:05.744680851
    Profit Factor                                  2.443637
    Expectancy                                      2.26958
    Sharpe Ratio                                   4.818487
    Calmar Ratio                                   9.636257
    Omega Ratio                                    1.234305
    Sortino Ratio                                  7.252599
    dtype: object
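One figure in these statistics is worth a sanity check. The reported Period of 447 days 16:00:00 is exactly 10,744 hours, which is one hour per bar, while the Start and End dates are about 1,790 days apart. The sketch below only redoes that arithmetic, using the 10,744 rows of df_predict and the dates printed above. We have not verified the cause, but it suggests the 4h frequency may not have been applied as intended, which would affect annualized ratios such as the Sharpe Ratio.

```python
from datetime import datetime, timedelta

start = datetime(2018, 4, 22, 0, 0)   # Start from the stats output
end = datetime(2023, 3, 17, 4, 0)     # End from the stats output
n_bars = 10744                        # rows in df_predict

calendar_span = end - start
period_if_4h = timedelta(hours=4) * n_bars
period_if_1h = timedelta(hours=1) * n_bars
```

Here `period_if_1h` equals the reported 447 days 16 hours, whereas `period_if_4h` is 1,790 days 16 hours, close to the calendar span.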


Conclusion

In conclusion, we have successfully built a Random Forest model for predicting buy and sell signals in the ETH/USD market. We used GridSearchCV for hyperparameter tuning and obtained the best parameters for our model. After fitting the model with the best parameters, we plotted the feature importances to understand the impact of each feature on the model.


We then evaluated the performance of our model using various metrics, including accuracy, precision, recall, and F1 score. On the 100-row test set, the model reached an accuracy of 0.64, precision of 0.638, recall of 0.612, and an F1 score of 0.625.


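To validate the performance of our model, we applied it to the signals from April 2018 onward and backtested its predictions with vectorbt. The backtest showed a total return of 180.21%, against a benchmark return of 182.48%, and a Sharpe Ratio of 4.82. Because this slice includes most of the rows the model was trained on, these figures are largely in-sample and probably overstate what the strategy would achieve on unseen data.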


However, it is essential to remember that past performance is not always indicative of future results. Further analysis and testing should be conducted to ensure the robustness and reliability of the model. Additionally, it is worth exploring other machine learning algorithms and techniques to improve the model’s performance and adapt it to changing market conditions.