A reinforcement learning approach to improve the performance of the Avellaneda-Stoikov market-making algorithm PLOS ONE
The Sharpe ratios of the models indicate that the stock price models with stochastic volatility based on a quadratic utility function produce more attractive portfolios than the other models. Model d is shown to have a Gaussian (normal) distribution, while the others are positively skewed. In the training phase we fit our two Alpha-AS models with data from a full day of trading. In this, the most time-consuming step of the backtest process, our algorithms learned from their trading environment which AS model parameter values to choose every five seconds of trading (in those 5 seconds; see Section 4.1.3). A second problem with Q-learning is that performance can be unstable.
Meanwhile, the other stock price models in Table 13 produce higher Sharpe ratios. In this part, we run the simulations under the quadratic utility function for all the models introduced here for comparison purposes, although they were defined with different utility criteria and solved under different settings in their original papers. As the risk-aversion parameter increases, the investor becomes more risk averse. Consequently, she will sell the assets at a lower price at positive inventory levels to reduce both the price risk and the liquidation risk.
Wireless ad hoc networks (WANETs) are infrastructureless networks used in applications such as habitat monitoring, military surveillance, and disaster relief. Data transmission is achieved through radio packet transfer, so it is prone to attacks such as eavesdropping and spoofing. Monitoring the communication links from secure points is an essential precaution against these attacks. Deploying monitors also provides a virtual backbone for multi-hop data transmission. However, adding secure points to a WANET can be costly in terms of price and time, so minimizing the number of secure points is of utmost importance.
Buy low, sell high: A high frequency trading perspective
All ratios are computed from Close P&L returns (Section 4.1.6), except P&L-to-MAP, for which the open P&L is used. Figures in bold are the best values among the five models for the corresponding test days. Figures for Alpha-AS 1 and 2 are given in green if their value is higher than that for the AS-Gen model for the same day. Figures in parentheses are the number of days the Alpha-AS model in question was second best only to the other Alpha-AS model (and therefore would have computed another overall ‘win’ had it competed alone against the baseline and AS-Gen models). We performed genetic search at the beginning of the experiment, aiming to obtain the values of the AS model parameters that yield the highest Sharpe ratio, working on the same orderbook data. At each training step the parameters of the prediction DQN are updated using gradient descent.
Gašperov and Kostanjčar tackle the problem by means of an ensemble of supervised learning models that provide predictive buy/sell signals as inputs to a DRL network trained with a genetic algorithm. The same authors have recently explored the use of a soft actor-critic RL algorithm in market making, to obtain a continuous action space of spread values. Comprehensive examinations of the use of RL in market making can be found in Gašperov et al. and Patel.
IEEE Transactions on Knowledge and Data Engineering
The original Avellaneda-Stoikov model was chosen as a starting point for our research. We plan to use such approximations in further tests with our RL approach. The performance of the Alpha-AS models in terms of the Sharpe, Sortino and P&L-to-MAP ratios was substantially superior to that of the Gen-AS model, which in turn was superior to that of the two standard baselines.
With these values, the AS model will determine the next reservation price and spread to use for the following orders. In other words, we do not entrust the entire order placement decision process to the RL algorithm, learning through blind trial and error. Rather, taking inspiration from Teleña, we mediate the order placement decisions through the AS model (our “avatar”, taking the term from ), leveraging its ability to provide quotes that maximize profit in the ideal case.
What is the order book liquidity/density (κ)
For instance, the model given by has a considerable Sharpe ratio and manages inventory with a lower standard deviation compared to the symmetric strategy. Besides, we further quantify the effects of a variety of model parameters on the bid and ask spreads and observe that the trader follows different strategies at positive and negative inventory levels. The strategy derived by the model, for instance, illustrates that as time approaches the terminal horizon, the optimal spreads converge to a fixed, constant value. Furthermore, when there are jumps in volatility, a higher profit can be obtained, but with a larger standard deviation. Stock price prediction and modeling have high economic value in the financial market. Due to the non-linearity and volatility of stock prices and the unique nature of financial transactions, it is essential for the prediction method to ensure high prediction performance and interpretability.
Increasing the number of training experiences may result in a decrease in performance; effectively, a loss of learning. To improve stability, a DQN stores its experiences in a replay buffer, in terms of the value function given by Eq , where now the Q-value estimates are not stored in a matrix but obtained as the outputs of the neural network, given the current state as its input. A policy function is then applied to decide the next action. The DQN then learns periodically, with batches of random samples drawn from the replay buffer, thus covering more of the state space, which accelerates the learning while diminishing the influence of single or of correlated experiences on the learning process. The cumulative profit resulting from a market maker’s operations comes from the successive execution of trades on both sides of the spread. This profit from the spread is endangered when the market maker’s buy and sell operations are not balanced overall in volume, since this will increase the dealer’s asset inventory.
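The replay buffer described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `ReplayBuffer`, the tuple layout and the `capacity` parameter are assumptions made for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past experiences (hypothetical helper for illustration)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque silently discards the oldest experience when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Record one interaction with the environment."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch without replacement.

        Sampling at random breaks the temporal correlation between
        consecutive experiences, which is the stabilising effect
        described in the text.
        """
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

In a training loop the agent would call `push` after every step and, once the buffer holds at least one batch, call `sample` periodically to obtain the experiences used for a gradient update.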
Learn how to use Avellaneda market making strategies
Are they scaled by some scaling parameter beforehand, and what data is this parameter estimated from? If not, how much data is lost by only using the price differences with absolute values smaller than 1? Also, if the market candle features are “divided by the open mid-price for the candle”, does this mean that all of those higher than the mid-price would be truncated to 1? The methodology might be more sound than this, but the text simply does not offer answers to these questions. Maximum drawdown registers the largest loss of portfolio value registered between any two points of a full day of trading. The performance results for the 30 days of testing of the two Alpha-AS models against the three baseline models are shown in Tables 2–5.
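The maximum drawdown definition above translates directly into a single pass over the day's portfolio values. The sketch below is an illustration under stated assumptions: the function name `max_drawdown` is hypothetical, and the drawdown is reported as an absolute loss rather than a percentage, since the text does not say which convention is used.

```python
def max_drawdown(portfolio_values):
    """Largest peak-to-trough loss in a sequence of portfolio values.

    Assumption: the result is an absolute amount (same units as the
    input), measured from a running peak to any later value.
    """
    values = list(portfolio_values)
    if not values:
        raise ValueError("portfolio_values must not be empty")

    peak = values[0]
    worst = 0.0
    for value in values:
        # Track the highest value seen so far.
        peak = max(peak, value)
        # The drawdown at this point is the fall from that peak.
        worst = max(worst, peak - value)
    return worst
```

For example, for the values 100, 120, 90, 110, 80 the running peak is 120 and the lowest later value is 80, so the function returns 40.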
For every day of data the number of ticks occurring in each 5-second interval had positively skewed, long-tailed distributions. The means of these thirty-two distributions ranged from 33 to 110 ticks per 5-second interval, the standard deviations from 21 to 67, the minimums from 0 to 20, the maximums from 233 to 1338, and the skew from 1.0 to 4.4. Reducing the number of features considered by the RL agent in turn dramatically reduces the number of states. This helps the algorithm learn and improves its performance by reducing latency and memory requirements.

@RRG Right, this makes sense that the market-maker can place quotes improving on the current midprice.
- Table 2 shows that one or the other of the two Alpha-AS models achieved better Sharpe ratios, that is, better risk-adjusted returns, than all three baseline models on 24 (12+12) of the 30 test days.
- By default, when you run create, we ask you to enter the basic parameters needed for a market-making bot.
- The AS-Gen model was a distant third, with 4 wins on Sharpe.
- This part intends to show the numerical experiments and the behaviour of the market maker under the results given in Sect.
- We calibrate the model to real limit order book data which we back-test.
MWCVC is a very suitable infrastructure for energy-efficient link monitoring and virtual backbone formation. In this paper, we propose a novel metaheuristic algorithm for MWCVC construction in WANETs. Our algorithm is a population-based iterated greedy approach that is very effective against graph theoretical problems. We explain the idea of the algorithm and illustrate its operation through sample examples.
The optimal bid and ask quotes are obtained from a set of formulas built around these parameters. These formulas prescribe the AS strategy for placing limit orders. The rationale behind the strategy is, in Avellaneda and Stoikov’s words, to perform a ‘balancing act between the dealer’s personal risk considerations and the market environment’ [ibid.]. This Avellaneda-Stoikov baseline model (Gen-AS) constitutes another original contribution, to our knowledge, in that its parameters are optimised using a genetic algorithm working on a day’s worth of data prior to the test data. The genetic algorithm selects the best-performing values found for the Gen-AS parameters on the corresponding day of data. This procedure helps establish AS parameter values that fit initial market conditions.
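The formulas themselves are not reproduced in this text. As a hedged sketch, the code below uses the closed-form approximations usually quoted from Avellaneda and Stoikov's paper: reservation price r = s − q·γ·σ²·(T − t) and total spread γ·σ²·(T − t) + (2/γ)·ln(1 + γ/κ). The function name `as_quotes` and its argument names are assumptions made for the example, and the quotes are placed symmetrically around the reservation price.

```python
import math


def as_quotes(mid_price, inventory, gamma, sigma, kappa, time_left):
    """Reservation price, spread, bid and ask under the AS approximations.

    mid_price : current mid-price s
    inventory : signed inventory q (positive means long)
    gamma     : risk aversion (> 0)
    sigma     : volatility of the mid-price
    kappa     : order book liquidity/density (> 0)
    time_left : remaining session time T - t
    """
    if gamma <= 0 or kappa <= 0:
        raise ValueError("gamma and kappa must be positive")

    # A long inventory pulls the reservation price below the mid-price,
    # which makes selling more likely than buying.
    reservation = mid_price - inventory * gamma * sigma ** 2 * time_left

    # Total bid-ask spread around the reservation price.
    spread = gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / kappa)

    bid = reservation - spread / 2.0
    ask = reservation + spread / 2.0
    return reservation, spread, bid, ask
```

With zero inventory the reservation price equals the mid-price; with a positive inventory both quotes shift downwards, which matches the inventory-reducing behaviour the strategy is designed for.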
After choosing the exchange and the pair you will trade, the next question is whether you want to let the bot calculate the risk factor and order book depth parameter. If you set this to false, you will be asked to enter both parameter values. The reasoning behind this parameter is that, as the trading session gets close to its end, the market maker wants to have an inventory position similar to the one he had when the trading session started.
To this approach, more specifically one based on deep reinforcement learning, we turn next. In order to analyze the experimental results, we evaluate the models that we have derived using different metrics. It is worth mentioning that the market maker modifies her qualitative behavior in various situations, e.g., changing inventory levels or utility functions. From our numerical results, we deduce that the jump effects and comparative statistics metrics provide traders with information for gaining expected profits.
Risk metrics and fine tuning of high frequency trading strategies. That is introduced by Avellaneda and Stoikov and handled by a quadratic approximation approach. While the other parameters are kept the same as in Table 1. Is the value function for the control problem and, moreover, the optimal controls are given by .
For mature markets, such as the U.S. and Europe, the real-time LOB is event-based and updates at high speed, at intervals ranging from milliseconds down to nanoseconds. The dataset from the Nasdaq Nordic stock market in Ntakaris et al. contains 100,000 events per stock per day, and the dataset from the London Stock Exchange in Zhang et al. contains 150,000. In contrast, exchanges in the Chinese A-share market publish the level II data, essentially a 10-level LOB, every three seconds on average, with 4500–5000 daily ticks. This snapshot data provides us with the opportunity to leverage the longer tick-time interval and make profits using machine learning algorithms. A wide variety of RL techniques have been developed to allow the agent to learn from the rewards it receives as a result of its successive interactions with the environment. A notable example is Google’s AlphaGo project, in which a deep reinforcement learning algorithm was given the rules of the game of Go, and it then taught itself to play so well that it defeated the human world champion.
For this purpose, we should obtain an appropriate solution to with the final condition and show that this solution verifies the value function . While the market maker wants to maximize her profit from the transactions over a finite time horizon, she also wants to keep her inventories under control and get rid of the remaining inventories at the final time T by the penalization terms. The role of a dealer in securities markets is to provide liquidity on the exchange by quoting bid and ask prices at which he is willing to buy and sell a specific quantity of assets. The Sharpe ratio is a measure of mean returns that penalises their volatility.
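The description of the Sharpe ratio above (mean returns penalised by their volatility) can be made concrete with a short sketch. The function name `sharpe_ratio`, the optional `risk_free` argument and the use of the sample standard deviation are assumptions for illustration; the text does not specify how the ratio is annualised or which volatility estimator is used.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation.

    Assumption: no annualisation is applied, and `risk_free` is the
    per-period risk-free return (zero by default).
    """
    if len(returns) < 2:
        raise ValueError("at least two returns are needed")

    excess = [r - risk_free for r in returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        # Constant returns have no volatility, so the ratio is undefined.
        return math.nan
    return statistics.mean(excess) / volatility
```

Two return series with the same mean are ranked by this measure according to how much they fluctuate: the steadier series receives the higher ratio.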
Meanwhile, interpretable results show that IIFI can effectively distinguish between important and redundant features via rating corresponding scores to each feature. As a byproduct of our interpretable methods, the scores over features can be used to further optimize the investment strategy. We have designed a market making agent that relies on the Avellaneda-Stoikov procedure to minimize inventory risk. The agent can also skew the bid and ask prices output by the Avellaneda-Stoikov procedure, tweaking them and, by so doing, potentially counteract the limitations of a static Avellaneda-Stoikov model by reacting to local market conditions. The agent learns to adapt its risk aversion and skew its bid and ask prices under varying market behaviour through reinforcement learning using two variants (Alpha-AS-1 and Alpha-AS-2) of a double DQN architecture. The central notion is that, by relying on a procedure developed to minimise inventory risk (the Avellaneda-Stoikov procedure) by way of prior knowledge, the RL agent can learn more quickly and effectively.
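The text names a double DQN architecture without spelling out its update target. As a rough sketch of the standard double DQN rule (not necessarily the exact variant used for Alpha-AS-1 and Alpha-AS-2), the prediction network selects the best next action and the target network evaluates it. The function name `double_dqn_target` and the plain-list representation of Q-values are assumptions for the example.

```python
def double_dqn_target(reward, next_q_prediction, next_q_target, discount, done):
    """Learning target for one transition under the double DQN rule.

    next_q_prediction : Q-values of the next state from the prediction network
    next_q_target     : Q-values of the next state from the target network
    discount          : discount factor applied to future value
    done              : True if the episode ended with this transition
    """
    if done:
        # No future value after a terminal transition.
        return reward

    # Action selection uses the prediction network...
    best_action = max(range(len(next_q_prediction)), key=next_q_prediction.__getitem__)
    # ...while action evaluation uses the target network, which reduces
    # the overestimation bias of taking a max over a single network.
    return reward + discount * next_q_target[best_action]
```

Decoupling selection from evaluation in this way is what distinguishes a double DQN from a plain DQN, where the same network would both choose and score the next action.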