# Bitcoin trading reinforcement learning

In this article, we are going to create deep reinforcement learning agents that learn to make money trading Bitcoin, using OpenAI's gym and the PPO agent from the stable-baselines library, a fork of OpenAI's baselines library.

We will first improve our model and engineer some better features for our agent to learn from, then we will use a technique called Bayesian optimization to zero in on the most profitable hyper-parameters. In simpler terms, Bayesian optimization is an efficient method for improving any black-box model.

It works by modeling the objective function you want to optimize using a surrogate function, or a distribution of surrogate functions. How does this apply to our Bitcoin trading bots? Essentially, we can use this technique to find the set of hyper-parameters that make our model the most profitable.

We are searching for a needle in a haystack, and Bayesian optimization is our magnet. The first thing we are going to do, before optimizing our hyper-parameters, is make a couple of improvements to the code we wrote in the last article. One improvement is switching to a recurrent (LSTM) policy, so that a fixed look-back window of past data no longer needs to be specified; instead, past context is inherently captured by the recurrent nature of the network. It was also pointed out to me on the last article that our data is not stationary, and therefore, any machine learning model is going to have a hard time predicting future values.

The bottom line is that our time series contains an obvious trend and seasonality, both of which impact our algorithm's ability to predict the time series accurately. Differencing is the process of subtracting the derivative (rate of return) at each time step from the value at that time step. This has the desired result of removing the trend in our case; however, the data still has a clear seasonality to it.

We can verify that the produced time series is stationary by running it through an Augmented Dickey-Fuller test. Doing this gives us a p-value of 0. In our case, we are going to be adding some common, yet insightful, technical indicators to our data set, as well as the output from the StatsModels SARIMAX prediction model. The technical indicators should add some relevant, though lagging, information to our data set, which will be complemented well by the forecasted data from our prediction model.

To choose our set of technical indicators, we are going to compare the correlation of all 32 indicators (58 features) available in the ta library. We can use pandas to find the correlation between each indicator of the same type (momentum, volume, trend, volatility), then select only the least correlated indicators from each type to use as features.
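That selection step is plain pandas once the indicators are computed (in the article, via the ta library); here is a minimal sketch over a few hypothetical stand-in columns:

```python
import numpy as np
import pandas as pd

# Stand-ins for indicator columns of one type; the real features would be
# the 58 columns produced by the ta library.
rng = np.random.default_rng(1)
features = pd.DataFrame(rng.normal(size=(300, 4)),
                        columns=["rsi", "tsi", "uo", "stoch"])

# Absolute pairwise correlations within the indicator type.
corr = features.corr().abs()

# Score each indicator by its mean correlation with the others, then keep
# the least-correlated half to limit noise in the observation space.
mean_corr = (corr.sum() - 1) / (len(corr) - 1)
selected = mean_corr.nsmallest(2).index.tolist()
```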

That way, we can get as much benefit out of these technical indicators as possible, without adding too much noise to our observation space. It turns out that the volatility indicators are all highly correlated, as well as a couple of the momentum indicators. Next we need to add our prediction model.

One might think our reward function from the previous article (i.e. rewarding unrealized profit) is the best we can do, but this is far from the truth. While our simple reward function from last time was able to profit, it produced volatile strategies that often led to stark losses in capital. To improve on this, we are going to need to consider other metrics to reward, besides simply unrealized profit.

While this strategy is great at rewarding increased returns, it fails to take into account the risk of producing those high returns. The most common risk-adjusted return metric is the Sharpe ratio. To maintain a high Sharpe ratio, an investment must have both high returns and low volatility (i.e. risk).

This metric has stood the test of time; however, it too is flawed for our purposes, as it penalizes upside volatility.

For Bitcoin, this can be problematic, as upside volatility (wild upward price movement) can often be quite profitable to be a part of. The Sortino ratio is very similar to the Sharpe ratio, except it only considers downside volatility as risk, rather than overall volatility.
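In symbols (standard textbook definitions, not formulas from the article), with $R_p$ the portfolio return, $R_f$ the risk-free rate, $\sigma_p$ the standard deviation of all returns, and $\sigma_d$ the downside deviation:

```latex
\text{Sharpe} = \frac{\mathbb{E}[R_p - R_f]}{\sigma_p}
\qquad
\text{Sortino} = \frac{\mathbb{E}[R_p - R_f]}{\sigma_d}
```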

As a result, this ratio does not penalize upside volatility. The second reward metric that we will be testing on this data set is the Calmar ratio. All of our metrics up to this point have failed to take drawdown into account. Drawdown is the measure of a specific loss in value to a portfolio, from peak to trough. Large drawdowns can be detrimental to successful trading strategies, as long periods of high returns can be quickly reversed by a sudden, large drawdown.
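Maximum drawdown is straightforward to compute from a net-worth series; a minimal NumPy sketch (the function name is ours, not from the article):

```python
import numpy as np

def max_drawdown(net_worth):
    # Running peak of the portfolio value up to each step.
    peaks = np.maximum.accumulate(net_worth)
    # Largest peak-to-trough loss, as a fraction of the peak.
    return np.max((peaks - net_worth) / peaks)

values = np.array([100.0, 120.0, 90.0, 110.0, 80.0])
mdd = max_drawdown(values)  # 0.3333...: the fall from the 120 peak to 80
```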

To encourage strategies that actively prevent large drawdowns, we can use a reward metric that specifically accounts for these losses in capital, such as the Calmar ratio. Our final metric, used heavily in the hedge fund industry, is the Omega ratio. On paper, the Omega ratio should be better than both the Sortino and Calmar ratios at measuring risk vs. reward. To find it, we need to calculate the probability distributions of a portfolio moving above or below a specific benchmark, and then take the ratio of the two.

The higher the ratio, the higher the probability of upside potential over downside potential. While writing the code for each of these reward metrics sounds really fun, I have opted to use the empyrical library to calculate them instead. Getting a ratio at each time step is as simple as providing the list of returns and benchmark returns for a time period to the corresponding empyrical function. Any great technician needs a great toolset. Instead of re-inventing the wheel, we are going to take advantage of the pain and suffering of the programmers that have come before us.
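For intuition, here is what those ratios boil down to per period (an unannualized sketch using one common convention for downside deviation; empyrical's actual implementations handle annualization and edge cases):

```python
import numpy as np

def sharpe(returns, risk_free=0.0):
    excess = returns - risk_free
    return excess.mean() / excess.std()

def sortino(returns, required_return=0.0):
    excess = returns - required_return
    # Downside deviation: only negative excess returns count as risk.
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return excess.mean() / downside

def omega(returns, threshold=0.0):
    # Ratio of total gains above the threshold to total losses below it.
    gains = np.sum(np.maximum(returns - threshold, 0.0))
    losses = np.sum(np.maximum(threshold - returns, 0.0))
    return gains / losses

returns = np.array([0.01, -0.02, 0.03, 0.01])
omega(returns)  # 2.5: 0.05 of gains against 0.02 of losses
```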

To do the optimization, we are going to use the Optuna library, which implements Tree-structured Parzen Estimators (TPEs) under the hood. TPEs are parallelizable, which allows us to take advantage of our GPU, dramatically decreasing our overall search time. Optimizing hyper-parameters with Optuna is fairly simple. A trial contains a specific configuration of hyper-parameters and its resulting cost from the objective function. Since our data is a time series, we cannot simply shuffle it into random train and test folds; we are left with taking a slice of the full data frame as the training set, from the beginning of the frame up to some arbitrary index, and using the rest of the data as the test set.

Next, since our environment is only set up to handle a single data frame, we will create two environments, one for the training data and one for the test data.

Now, training our model is as simple as creating an agent with our environment and calling model.learn(). Here, we are using TensorBoard so we can easily visualize our TensorFlow graph and view some quantitative metrics about our agents.

For example, here is a graph of the discounted rewards of many agents over the course of training. Wow, it looks like our agents are extremely profitable! It was at this point that I realized there was a bug in the environment… Here is the new rewards graph, after fixing that bug:

As you can see, a couple of our agents did well, and the rest traded themselves into bankruptcy. The agents that did well were able to 10x and even 60x their initial balance at best. However, we can do much better. In order to improve these results, we are going to need to optimize our hyper-parameters and train our agents for much longer. Time to break out the GPU and get to work!

In this article, we set out to create a profitable Bitcoin trading agent from scratch, using deep reinforcement learning.

We were able to accomplish the following:

- Built a visualization of our trading environment using Matplotlib.
- Trained and tested our agents using simple cross-validation.
- Tuned our agent slightly to achieve profitability.

Next time, we will improve on these algorithms through advanced feature engineering and Bayesian optimization to make sure our agents can consistently beat the market. Stay tuned for my next article, and long live Bitcoin!

It is important to understand that all of the research documented in this article is for educational purposes, and should not be taken as trading advice. You should not trade based on any algorithms or strategies defined in this article, as you are likely to lose your investment. Thanks for reading! As always, all of the code for this tutorial can be found on my GitHub. I can also be reached on Twitter at notadamking. You can also sponsor me on GitHub Sponsors or Patreon.

For this tutorial, we used the Kaggle data set produced by Zielak.


