Reasonableness and Correctness for Operational Value-at-Risk

Calculating the amount of regulatory capital to cover unexpected losses due to operational events in the upcoming year has caused problems because of difficulties in fitting probability distributions to data. It is consequently difficult to judge an appropriate level of capital that reflects the risk profile of a financial institution. We provide theoretical and empirical analyses to link the calculated capital to the sum of losses using appropriate statistical approximations. We conclude that, in order to reasonably reflect the associated risk, the capital should be approximately half the sum of losses, with a wide bound for the ratio of capital to sum.


Introduction
Every year, banks and other financial institutions are required to assess their reserves for unexpected adverse financial events for the following year. That assessment covers financial risk (due to credit and market events), and also non-financial risk. Operational risk (OpRisk) constitutes the major part of non-financial risk, and is the subject of this paper. OpRisk can be loosely defined as the risk of financial systems failure, and typically might constitute approximately 10% of total capital reserves. Reserved monies for potential upcoming adverse events cannot be used for lending. Hence, it is important that the amount of the reserve should be consistent with the risk profile of the bank. If it is too high, lending will be impaired, and the knock-on effect on supporting economic activity may be significant. If it is too low, a bank may not be able to fund unexpected losses, and could, in extreme circumstances, fail. The recent collapse of Credit Suisse is a prime example, and has been widely reported in the financial press. It has been attributed to lax internal risk controls coupled with scandals involving its senior management, which resulted in a loss of confidence from customers, and withdrawal of funds (Stuart, 2023). OpRisk is regulated by national supervisory banks using Basel regulations (BIS, 2006) as a basis.
Particular problems are associated with calculating operational risk reserves. They are discussed below. The purpose of this paper is to provide a simple decision rule to determine whether or not a calculated operational risk reserve is consistent with the data used to calculate it. Informally, "Is the OpRisk value correct?". The proposal is to find an approximate theoretical relationship between OpRisk reserve value and the sum of OpRisk losses.

Glossary of banking technical terms used in this paper
In this paper, the technical terms can be divided into those referring to banking practice, and common statistical terminology. The banking terms are listed below. The statistical terms are listed in section (3.1).
Operational Risk (OpRisk) is defined (BIS, 2006) as the risk of financial loss due to flawed or failed processes, policies, systems or events that disrupt business operations. It covers losses due to fraud, customer compensation, fines, physical damage and system outages.
Value-at-Risk (VaR) is a common financial risk metric. VaR is the maximum acceptable loss over a given time horizon within a given probability bound. The regulations (BIS, 2006) specify that a fixed probability 0.999 should be used, and we state this as 99.9% VaR.
Capital is the name given to the reserves that are held from year to year in order to be able to pay for OpRisk events that materialize. It can amount to billions of US dollars.
The most widely applicable method for calculating OpRisk VaR is the Loss Distribution Approach (LDA), originally formulated by Frachot et al. (2001).

Problems associated with calculating OpRisk capital
The LDA is a Monte Carlo simulation, incorporating a convolution of frequency and severity distributions, both of which have to be fitted to the data. The problem lies with the severity distribution. In most cases it is possible to find, for any one data set, multiple statistical distributions that fit the data. The resulting capitals can differ widely. Consequently, which distribution to choose is not a simple matter. One choice is the distribution with the optimal fit. Another is a distribution that has fewer outliers. Very often the selected capital is one that corresponds closely to capitals calculated in previous years, thereby stabilizing capital year-on-year.

Illustration of wide variation in VaR
The following example illustrates the problem. A sample of size 500 was generated from a LogNormal distribution, resulting in data that has the 'fat-tailed' characteristics that are typical of operational risk data. Figure (1) shows a histogram of the sample. A range of distributions was fitted to the data, and VaR was calculated for each. The results are shown in table (1), in decreasing order of fit suitability. The huge variation in VaR for the distributions shown is clear. It is a phenomenon that is very common in this context, and arises for two main reasons. First, distribution fitting can be imprecise. Iterative processes sometimes converge to non-optimal states, and small changes in parameter values can produce large changes in VaR. Second, sampling in the LDA Monte Carlo process can generate excessive sample values for some distributions, which are unrepresentative of the original data. Section (6) has a discussion of the VaR values for this case in the light of the model proposed in section (4). Some entries in table (1) are extraordinarily high (many billions, marked as "Reject: not credible"). They exceed the capitalization of most banks, and would be rejected immediately. However, there is, as yet, no objective criterion for doing so.

Literature Review
Basing VaR calculations directly on properties of the data has received little attention to date. Authors have concentrated, instead, on attempts to predict OpRisk losses using economic variates. Examples include Abdymomunov, Curti and Mihov (2020). In contrast, Berkowitz and O'Brien (2002) link OpRisk losses with corporate P&L. Chavez-Demoulin et al. (2016) have implemented an enhanced Generalized Additive Model (GAM) using economic co-variates as distribution hyper-parameters. Their model does use simple statistical properties of the data, such as Mean, Standard Deviation, Median, Third Quartile, Maximum, and Skewness. It is hard, with complex models such as GAM, to identify specific input features as drivers of output.
Specific properties of a set of largest losses, which are known to be principal determinants of VaR, are not considered (see Abdymomunov, Curti and Mihov, 2020). Two papers by Migueis (2016 and 2023) incorporate historic loss frequency and mean total loss into a quantile regression model. The intention is to test whether future VaR can be predicted using those historic feature values. Both papers report that those features, with balance sheet co-variates such as Market Capitalization, are significant predictors of future losses.
Machine Learning has been applied in the context of OpRisk, but rarely using statistical properties of the data directly. Pena et al. (2021) use a convolutional neural network enhanced by fuzzy cognitive maps, which makes it extremely difficult to link VaR to any particular feature. Total loss severity and annual frequency are used. Chen and Wen (2010) investigated prediction of OpRisk losses using operational control variables, such as fraud prevention measures.

Statistical terminology
The following is a list of the principal statistical terms, expressed informally, used for the analysis in this paper. Full statistical definitions may be found in, for example, Weisstein (2023).
Distribution (or Statistical Distribution): a description of the relative numbers of times each possible outcome will occur in a number of trials. In particular, the Normal distribution is the "bell curve".
Quantile: a value below which a specified proportion of a distribution lies. For example, the 99th percentile (the 0.99 quantile) is the value that exceeds 99% of all data values. In particular, the Median is the middle data value (i.e. the 50% quantile).
Order Statistic: if data are ordered by size, the nth Order Statistic is the nth in the list.
Goodness-of-Fit (GoF): a test to assess whether observed data are consistent with a proposed distribution, or whether the discrepancy is larger than could have occurred by chance.
The Central Limit Theorem (CLT) states that if samples are drawn from a data set, the distribution of the mean of those samples approximates a Normal distribution as the sample size increases.
Maximum Likelihood: a method for estimating distribution parameters by progressively improving parameter estimates, so as to maximize the likelihood of the observed data under the statistical model.

The Loss Distribution Approach - Usage
Consider a set of M losses L={x1, x2, …, xM}, assumed to be independently and identically distributed, and drawn from a fat-tailed severity distribution. Each loss is date-stamped. If the date range is t years, it is useful to use the annual loss frequency, n = M/t. A frequency distribution, usually Poisson or Negative Binomial, can also be assigned to L. The base LDA formulation is shown in Appendix A1. In its convolution stage, the single draw from the frequency distribution (step b(i)) is a random variable. That draw determines how many samples to draw from the severity distribution. The sum of draws from the severity distribution constitutes a single estimate of annual loss. After repeating multiple times (typically 1 million), the annual loss estimates can be ordered for easy extraction of the regulatory 99.9% quantile. In the analysis below, the variable single draw from the frequency distribution is replaced by a constant.
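The convolution stage described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation (which used R): it assumes a Poisson frequency distribution and a LogNormal severity distribution, and the function names `poisson_draw` and `lda_var`, the seed, and the reduced default number of iterations are all hypothetical choices made for this sketch.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the modest annual frequencies used here; exp(-lam)
    underflows for very large lam, so this is not a general-purpose sampler.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def lda_var(mu, sigma, annual_freq, n_sims=20000, quantile=0.999, seed=1):
    """Estimate VaR as a high quantile of simulated annual loss totals.

    Assumes (for illustration only) Poisson frequency and LogNormal severity.
    """
    rng = random.Random(seed)
    annual_losses = []
    for _ in range(n_sims):
        z = poisson_draw(rng, annual_freq)  # single draw from the frequency distribution
        total = sum(rng.lognormvariate(mu, sigma) for _ in range(z))  # z severity draws
        annual_losses.append(total)  # one estimate of annual loss
    annual_losses.sort()
    index = min(n_sims - 1, math.ceil(quantile * n_sims) - 1)
    return annual_losses[index]
```

A call such as `lda_var(10.0, 1.0, 20)` returns one Monte Carlo estimate of 99.9% VaR. With far fewer iterations than the 1 million quoted in the text, the estimate is noisy in the extreme tail, so the sketch is suitable only for exploring the mechanics.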

Central Limit Theorem VaR Approximation
The basis of the following analysis is the CLT. Its impact is to model sums of random variables by a single Normal distribution, from which a quantile can be extracted. In this analysis, it is assumed that the losses L are modelled by a LogNormal distribution with parameters μ and σ, denoted here by LN(μ, σ²).
In the Fixed Frequency LDA variant (Mitic, Cooper and Bloxham, 2020), samples of fixed size n are drawn, and the 99.9% quantile is extracted. Using the "base" LogNormal distribution, it is possible to calculate, separately, expressions for the expected value of the sum of all M losses, and for the expected value for the extracted quantile, which is used as an approximation for VaR. For calculations using the data only, the resulting VaR value is the Bootstrap VaR (Efron and Tibshirani, 1994), and is expected to be less than the corresponding value calculated from the distribution fitted to the data (the Distribution VaR). This is because sampling from the data excludes sample points that are greater than the largest empirical datum.
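The Bootstrap VaR idea can be illustrated with a short Python sketch. It assumes fixed-size resampling with replacement directly from the empirical losses; the function name `bootstrap_var` and its defaults are hypothetical and are not taken from the cited papers.

```python
import math
import random


def bootstrap_var(losses, n, n_sims=20000, quantile=0.999, seed=1):
    """Fixed-frequency bootstrap VaR: resample n losses with replacement,
    sum them, repeat, and return the requested quantile of the sums."""
    rng = random.Random(seed)
    sums = sorted(sum(rng.choices(losses, k=n)) for _ in range(n_sims))
    index = min(n_sims - 1, math.ceil(quantile * n_sims) - 1)
    return sums[index]
```

Because every resampled value is one of the observed losses, the result can never exceed n times the largest observed loss. A fitted severity distribution has no such cap, which is why the Distribution VaR is expected to be the larger of the two.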
Step (b) in the LDA then reduces to drawing samples of size n from the severity distribution. Since the sample size is constant, the CLT applies. Writing m and s² for the mean and variance of a single LN(μ, σ²) loss (equation (1)), the result of applying the CLT for samples of size n drawn with replacement is a Normal distribution, Sn, representing the sum of n losses (equation (2)).
$$m = e^{\mu + \sigma^2/2}, \qquad s^2 = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2} \qquad (1)$$
$$S_n \sim N\left(nm,\ ns^2\right) \qquad (2)$$
The 99.9% quantile of Sn now acts as an estimator of VaR, R, at 99.9% over a t-year horizon. From equations (1) and (2), with z = 3.09 for 99.9% confidence:
$$R = nm + z s \sqrt{n} \qquad (3)$$
Now consider the expected value of the M losses. That sum, S, can be modelled by a sum of t random variables, each of which is a sum of n LN(μ, σ²) random variables, Sn, from equation (2). The distribution and expected value of S are given in equation (4).
$$S \sim N\left(tnm,\ tns^2\right), \qquad E(S) = tnm \qquad (4)$$
The ratio ρ = R/E(S) follows immediately (equation (5)).
$$\rho = \frac{R}{E(S)} = \frac{1}{t}\left(1 + \frac{z\sqrt{e^{\sigma^2} - 1}}{\sqrt{n}}\right) \qquad (5)$$
It is also easy to show that, theoretically, the sum of losses exceeds VaR. Equation (6) shows the difference between E(S) and R. The expression on the right-hand side is positive provided that t > 1. Typically, t would be at least 5, as most banks have an OpRisk loss history of at least 5 years. A further result, equation (7), gives an expression for a lower bound for VaR by considering the condition that R exceeds a given multiple of E(S). The condition leads to an expression which shows that the multiple depends on the parameters t, n, z and σ. The dependency is examined in section (5).
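The CLT approximation can be checked numerically with the sketch below. It assumes the standard LogNormal mean and variance and a Normal approximation for the sum of n losses; the function names are hypothetical, and the formulas are this sketch's reading of the approximation rather than a transcription of the paper's equations.

```python
import math

Z_999 = 3.09  # standard Normal quantile for 99.9% confidence, as quoted in the text


def lognormal_moments(mu, sigma):
    """Mean and variance of a single LN(mu, sigma^2) loss."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    variance = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, variance


def clt_var(mu, sigma, n, z=Z_999):
    """99.9% quantile of the Normal approximation to the sum of n losses."""
    mean, variance = lognormal_moments(mu, sigma)
    return n * mean + z * math.sqrt(n * variance)


def expected_loss_sum(mu, sigma, n, t):
    """Expected sum of all losses over t years at n losses per year."""
    mean, _ = lognormal_moments(mu, sigma)
    return t * n * mean


def var_to_sum_ratio(mu, sigma, n, t, z=Z_999):
    """Ratio of approximate VaR to expected loss sum."""
    return clt_var(mu, sigma, n, z) / expected_loss_sum(mu, sigma, n, t)
```

Under these assumptions the ratio does not depend on μ, and it tends to 1/t as n grows. For example, `var_to_sum_ratio(10.0, 1.5, 100, 5)` is approximately 0.38.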

Scaling
Numerical results for the preceding analysis were based on a 5-year time window. If the time window is not 5 years, a scaling adjustment is needed. A simple scaling formula can be derived for large n, which is the general case. With that assumption, the ratio ρ = R/E(S) in equation (5) reduces to ρ ≈ 1/t (equation (8)). If t is then replaced by ct, where c is a positive scale factor, the ratio maps to 1/(ct). The implications are discussed in section (6).

Results
The main thrust of this paper is to develop the relationship between the expected loss sum and VaR in equation (5). The numerical results presented in this section serve as a sense check for that relationship.

Data and software
The loss data comprised approximately 500 data sets drawn from 'fat-tailed' distributions (listed in table (1)), with a nominal time span t = 5 years. The overriding principle for data generation was random selection of distribution, distribution parameters and sample size, followed by random mixing of the generated data. For each data set, a distribution was selected at random, with a bias towards distributions that we encounter most often: LogNormal, LogNormal Mixtures, LogNormal-Gamma Mixtures, Weibull, Tukey G&H and LogGamma. Distribution parameters were selected at random using uniform distributions based on appropriate parameter ranges. For example, the ranges for the LogNormal "μ" and "σ" parameters were [6, 15] and [0.5, 2.5] respectively. The sample sizes for all data sets were selected using uniform distributions with lower and upper bounds 100 and 500 respectively. In order to make each data set more realistic, pairs of data sets were randomly selected and merged, such that it would not be possible to definitively assign a distribution a priori to any given data set.
Each candidate distribution was fitted to each data set using Maximum Likelihood, and optimal GoF was determined using the TNA GoF test (Mitic, 2015). The TNA test is a formalization of a QQ-plot (a plot of predicted against actual empirical quantiles), and is robust with respect to sample sizes. For each data set, the loss sum was recorded, and 99.9% VaR was calculated using the LDA. All calculations were done using the R statistical language (https://www.r-project.org/).
Table (2) shows values of ρ = R/E(S) for a typical range of n and σ. The highlighted values show regions that are broadly consistent with a range (0.3, 0.7), centered on 0.5. This 'consistent' region can be characterized by "small σ with small n, or medium σ with medium n, or large σ with large n". The condition R < E(S) applies in most other parts of table (2). When σ is large, R exceeds E(S). In those cases, the value of R should be questioned.
Three features that are apparent in Figure (2) can be rationalized in terms of equations (5) and (7). They apply for typical annual frequencies, n ~ (50-250).
The upper VaR bound (blue, solid) corresponds approximately to σ > 2, with ρ = 1. Ordinates with ρ > 1 are common, but indicate unreliable VaR estimations as ρ increases.
The mid-line (blue, dashed) is the best linear fit, and corresponds approximately to σ ~ (1, 2), with a major concentration for ρ ~ (0.3, 0.7).
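A grid of ratio values of the kind tabulated in this section can be generated with the sketch below. It assumes the CLT-based ratio (1/t)(1 + z·sqrt(exp(σ²) − 1)/sqrt(n)) with z = 3.09 and t = 5. The grid points are arbitrary illustrative choices, not the entries of table (2), and the function names are hypothetical.

```python
import math


def rho(n, sigma, t=5, z=3.09):
    """CLT-based ratio of VaR to expected loss sum (assumed form)."""
    return (1.0 + z * math.sqrt(math.exp(sigma ** 2) - 1.0) / math.sqrt(n)) / t


def rho_grid(freqs, sigmas, t=5, low=0.3, high=0.7):
    """One row per sigma; each cell is (rounded ratio, inside (low, high))."""
    rows = []
    for sigma in sigmas:
        row = []
        for n in freqs:
            value = rho(n, sigma, t)
            row.append((round(value, 2), low < value < high))
        rows.append(row)
    return rows


if __name__ == "__main__":
    freqs = [50, 100, 150, 200, 250]   # illustrative annual frequencies
    sigmas = [0.5, 1.0, 1.5, 2.0, 2.5]  # illustrative LogNormal sigma values
    print("sigma" + "".join(f"{n:>9}" for n in freqs))
    for sigma, row in zip(sigmas, rho_grid(freqs, sigmas)):
        cells = "".join(f"{v:>8.2f}{'*' if ok else ' '}" for v, ok in row)
        print(f"{sigma:>5.1f}" + cells)
```

Cells marked with an asterisk fall inside the (0.3, 0.7) band. Under these assumptions, small σ gives ratios close to the floor 1/t, and large σ with small n gives ratios above 1.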

Scale factor for other data windows
The 'consistency' range (0.3, 0.7), centered on 0.5 and highlighted in table (2), depends on having used a 5-year data window. For windows of other lengths, the table (2) entries would have to be scaled to account for an increased data sum. A scale factor is introduced in section (6.1) for this purpose.

Discussion
This work is an attempt to resolve the problem of which distribution and VaR value to select if many choices are available. The solution proposed is to relate OpRisk VaR to a simple function of the original data: the Sum of losses. The underlying intuition comes from known properties of the LDA VaR calculation: VaR increases as loss frequency and severity increase. Further, combinations such as Large VaR/Small Sum, or Small VaR/Large Sum, do not make sense in practice since they would be inconsistent with the risk profile of the organization concerned. The theoretical treatment in section (4) confirms the empirical view.

Decision Rule
Given the empirical results of section (5), the following decision rule can be formulated. The results presented in section (5) rely on a 5-year data window. Therefore, a scale factor, c, has to be incorporated to account for other data windows. Using the approximation in equation (8), we can define a scaled ratio ρ′ = cρ, where ρ = R/E(S) is the ratio of VaR to the expected loss sum. The 'base' case is c = 1, which corresponds to the 5-year window. The value of the scale factor is 2 for a 10-year time window, 3 for a 15-year time window, etc. With empirical data, the expected loss sum, E(S), is the observed loss sum, S, so that ρ = R/S and ρ′ = cR/S. The decision rule takes the form of the choices in equation (9). It incorporates a de minimis value for cases where the calculated VaR is close to zero, based on the minimum VaR, R = (1/t)E(S) (case 1 in section (5.2) and equation (5)).
Referring to the 5-year time window, the 'ideal' value of ρ is approximately 0.5 (i.e. R ~ ½S). The value ½ should be relaxed within a wide range between 0.2 and 1. Values less than 0.2 can be rejected immediately. Values greater than 1 are harder to assess. In theory, VaR can be unbounded above, so some leeway in this case is permissible. Overall, the decision rule in equation (9) should be regarded as a guideline.
Looking again at the example in table (1), the decision rule does provide a clear guide. The sum of losses for the sample was $75.59m, and the time horizon was 5 years. Setting ρ = 1/2 gives a guide VaR value of $37.80m. The nearest VaR with a GoF pass to that value is $34.3m, using a Weibull distribution. The LogNormal Mix or Gamma Mix distributions also resulted in acceptable VaR values. One of them might be selected if, for example, it produced fewer outliers. If ρ = 1/t = 0.2, the minimum VaR value is $15.12m. The low VaR from the Tukey G&H distribution would therefore be rejected. All others would be rejected on the grounds that they are too high. Of those, the LogNormal and LogNormal Gamma Mix distributions are high, but credible. The LogGamma distribution is borderline credible (just below 10 times the loss sum, a somewhat arbitrary limit obtained from observation in practice). The others are not credible, and can now be rejected on an objective basis. It should be noted that although the LogNormal distribution is the best fit, its resulting VaR is not optimal.
Two remaining problems are what to do if no distributions pass the GoF test, or if no distributions with a GoF pass satisfy the criteria specified in the equation (9) decision rule. In both cases, a 'solution' is to select the distribution that expresses the minimum GoF fail, and is nearest to the nominal "Sum/2" value. Such cases are rare.
It is nearly always possible to amend data thresholds such that the decision rule can be applied successfully.
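The decision rule described in this section can be expressed as a small Python function. This is a sketch built from the thresholds stated in the prose (0.2, 1, and the "somewhat arbitrary" credibility limit of 10 times the loss sum), not a transcription of equation (9). The function names, the outcome labels, and the choice to apply the limit of 10 to the scaled ratio are assumptions of the sketch.

```python
def scaled_ratio(var, loss_sum, window_years, base_years=5.0):
    """Scaled ratio c * VaR / Sum, with c = window length / 5 years."""
    if loss_sum <= 0:
        raise ValueError("loss_sum must be positive")
    c = window_years / base_years
    return c * var / loss_sum


def assess_var(var, loss_sum, window_years, low=0.2, high=1.0, credible_limit=10.0):
    """Classify a calculated VaR against the loss sum (guideline only)."""
    ratio = scaled_ratio(var, loss_sum, window_years)
    if ratio < low:
        return "reject: too low"
    if ratio <= high:
        return "accept"
    if ratio <= credible_limit:
        return "high: review"  # above 1 is harder to assess; some leeway is permissible
    return "reject: not credible"
```

With the table (1) example (loss sum $75.59m, 5-year window), `assess_var(34.3, 75.59, 5)` returns "accept" for the Weibull VaR of $34.3m. Keeping the thresholds as parameters reflects the point that the rule is a guide rather than an absolute test.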

Advice for Practitioners
Using the method in practice is very simple using the scaled ratio ρ′ = cR/S. VaR is calculated, divided by the data sum, and multiplied by the scale factor c. The result can then be compared with the numerical values in the decision rule in equation (9). For example, a recent fraud capital calculation for a medium-to-large European bank resulted in VaR ~ €28.2m, and a data sum of €88.4m, with a 10-year data window. In this case c = 10/5 = 2. The scaled ratio 2×28.2/88.4 = 0.64 falls within the range (0.2, 1), and the measured VaR is deemed to be acceptable. Any measured VaR in excess of €44.2m would be rejected as excessive. The minimum acceptable VaR would be €8.84m.
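The acceptable VaR interval quoted in the example can be recovered by inverting the scaled ratio. The sketch below assumes the bounds 0.2 and 1 from the text and a scale factor c equal to the window length divided by 5 years; the function name `acceptable_var_range` is hypothetical.

```python
def acceptable_var_range(loss_sum, window_years, low=0.2, high=1.0, base_years=5.0):
    """Return (minimum, maximum) VaR for which low <= c * VaR / Sum <= high."""
    c = window_years / base_years
    return low * loss_sum / c, high * loss_sum / c


if __name__ == "__main__":
    # Figures from the worked example: data sum 88.4 (EUR m), 10-year window.
    minimum, maximum = acceptable_var_range(88.4, 10)
    print(f"acceptable VaR between {minimum:.2f} and {maximum:.2f} (EUR m)")
```

For the figures in the example this gives a minimum of about €8.84m and a maximum of about €44.2m, matching the values quoted above.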

Limitations of the method
The decision rule in equation (9) should be treated as a guide, rather than an absolute rule. It arises from approximations for the LDA VaR, and is formulated with a view to assessing whether or not a measured VaR is 'sensible'. If a measured VaR value is orders of magnitude greater than the data sum, it is clear that the severity distribution used was unrepresentative of the data. The decision rule is most useful if an empirical VaR value is 'large', but not 'too large'. Considering again the example in section (6.2), the proposed upper 'reasonable' VaR value was stated as €44.2m. Had the measured VaR been €50m (i.e. approximately 13% above the proposed limit), it would have been accepted. Had the measured VaR been €100m (i.e. more than twice the proposed limit), we would seek an alternative distribution and recalculate VaR.
A further limitation of the proposed decision rule is that it says nothing about an upper ceiling for VaR. In other words, how 'large' is 'too large'? This is a long-standing problem in OpRisk, and a paper on the subject is currently at the review stage (Mitic, 2024). The approach for that study was similar to the approach taken for this paper, and the form of the result is also similar: "Ceiling ~ 7⅓ × Annual Sum of Losses". The multiplier 7⅓ is fluid to a limited extent.

Funding Statement
This research received no external funding.
ii) Draw a sample of size Z, DZ, from D.
iii) Sum the losses in DZ to obtain ∑DZ, the estimate of annual loss.
c) Calculate the 99.9th percentile of the retained sums ∑DZ.