Estimation
Our goal is to estimate the parameter \(\theta\) of a copula, where \(X_1, ..., X_d\) are the random variables associated with the marginals. We denote by \(F_1, ..., F_d\) the cumulative distribution functions (CDFs) of these random variables and by \(f_1, ..., f_d\) their probability density functions (PDFs), when they exist. Most estimation methods involve the likelihood function \(L\), which is easily obtained from the copula's density \(c\):
\[ L(x_1, ..., x_d) = c\left(F_1(x_1), ..., F_d(x_d)\right) \prod_{j=1}^{d} f_j(x_j) \]
To fit the copulas, we use a dataset \(\mathbf{x}=\left \{(x_{i1}, ..., x_{id}) \right \}_{1 \leq i \leq n}\), and \(\mathcal{L}(\mathbf{x}_i)=\log L(\mathbf{x}_i)\) denotes the log-likelihood of the \(i\)-th observation \(\mathbf{x}_i = (x_{i1}, ..., x_{id})\).
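As a sketch of the likelihood above, the code below evaluates the dataset log-likelihood \(\sum_i \mathcal{L}(\mathbf{x}_i)\) in the bivariate case with NumPy and SciPy. It assumes a Clayton copula, whose density for \(\theta > 0\) is \(c(u, v) = (1+\theta)(uv)^{-\theta-1}(u^{-\theta} + v^{-\theta} - 1)^{-2-1/\theta}\), together with Gamma and exponential marginals whose hyper-parameters are fixed to illustrative values. The helpers clayton_log_density and log_likelihood are hypothetical names, not part of this library.

```python
import numpy as np
from scipy import stats


def clayton_log_density(u, v, theta):
    """Log-density of the bivariate Clayton copula, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta)
            - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def log_likelihood(x, theta, marg1, marg2):
    """Sum over rows of log c(F1(x1), F2(x2)) + log f1(x1) + log f2(x2)."""
    x1, x2 = x[:, 0], x[:, 1]
    u, v = marg1.cdf(x1), marg2.cdf(x2)  # map observations to the unit square
    return np.sum(clayton_log_density(u, v, theta)
                  + marg1.logpdf(x1) + marg2.logpdf(x2))


# Illustrative frozen marginals and a tiny hand-written dataset.
marg1 = stats.gamma(a=2.0, scale=1.2)
marg2 = stats.expon(scale=0.5)
x = np.array([[1.0, 0.3], [2.5, 0.8], [0.7, 0.1]])
ll = log_likelihood(x, theta=1.5, marg1=marg1, marg2=marg2)
print(ll)
```

For \(\theta = 1\) the formula gives \(c(0.5, 0.5) = 32/27\), a quick hand check of the density; as \(\theta \to 0\) the copula term vanishes and only the marginal log-densities remain.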
Maximum Likelihood Estimation (MLE)
estimation.mle(copula, X, marginals, hyper_param, hyper_param_start=None, hyper_param_bounds=None, theta_start=[0], theta_bounds=None, optimize_method='Nelder-Mead', bounded_optimize_method='SLSQP')
Computes the MLE on specified data.
Parameters: copula : Copula
The copula.
X : numpy array (of size n * copula dimension)
The data to fit.
marginals : numpy array
The marginal distributions. Use scipy.stats distributions, or any equivalent object providing pdf and cdf methods as in the rv_continuous class from scipy.stats.
hyper_param : numpy array
The hyper-parameters of each marginal distribution. Use None when a hyper-parameter is unknown and must be estimated.
hyper_param_start : numpy array
The starting values of the hyper-parameters during optimization. Must have the same dimension as hyper_param.
hyper_param_bounds : numpy array
Allowed values for each hyper-parameter.
theta_start : numpy array
Initial value of theta in the optimization algorithm.
theta_bounds : pair
Allowed values of theta.
optimize_method : str
The optimization method used in SciPy minimization when theta_bounds is not specified.
bounded_optimize_method : str
The optimization method used in SciPy minimization under constraints.
Returns: optimizeResult : OptimizeResult
The optimization result returned from SciPy.
estimatedHyperParams : numpy array
The estimated hyper-parameters.
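MLE maximizes the log-likelihood jointly over the copula's parameter and all hyper-parameters of the marginals. We suppose that \(X_j \sim f_j(\cdot\,; \beta_j)\), where \(\beta_j\) denotes the hyper-parameters of the \(j\)-th marginal, so that \(\mathcal{L}\) depends on \(\theta\) through \(c\) and on each \(\beta_j\) through \(F_j\) and \(f_j\). MLE therefore returns the copula's parameter and all estimated hyper-parameters at the same time. The estimates \(\hat{\theta}, \hat{\beta}_1, ..., \hat{\beta}_d\) are the solution of the following optimization problem:
\[ (\hat{\theta}, \hat{\beta}_1, ..., \hat{\beta}_d) = \underset{\theta, \beta_1, ..., \beta_d}{\arg\max} \sum_{i=1}^{n} \mathcal{L}(\mathbf{x}_i) \]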
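The joint optimization problem above can be sketched directly with scipy.optimize.minimize for the Gamma/exponential setting used in this section. This is an illustration under stated assumptions, not this library's implementation: the data are simulated from a bivariate Clayton copula with Marshall-Olkin sampling, the Gamma scale is fixed to 1.2, and positivity of \(\theta\), \(\alpha\) and the exponential scale is enforced by optimizing their logarithms with Nelder-Mead rather than by passing bounds. All function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def clayton_log_density(u, v, theta):
    """Log-density of the bivariate Clayton copula, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def neg_log_likelihood(params, x, gamma_scale=1.2):
    """Negative full log-likelihood; params = (log theta, log alpha, log scale)."""
    theta, alpha, scale = np.exp(params)  # log-reparameterization keeps all three > 0
    x1, x2 = x[:, 0], x[:, 1]
    eps = 1e-12  # keep the CDF values strictly inside (0, 1)
    u = np.clip(stats.gamma.cdf(x1, alpha, scale=gamma_scale), eps, 1 - eps)
    v = np.clip(stats.expon.cdf(x2, scale=scale), eps, 1 - eps)
    ll = (clayton_log_density(u, v, theta)
          + stats.gamma.logpdf(x1, alpha, scale=gamma_scale)
          + stats.expon.logpdf(x2, scale=scale))
    return -np.sum(ll)


def sample_clayton(n, theta, rng):
    """Marshall-Olkin sampling of a bivariate Clayton copula."""
    mix = rng.gamma(1.0 / theta, 1.0, size=(n, 1))
    e = rng.exponential(1.0, size=(n, 2))
    return (1.0 + e / mix) ** (-1.0 / theta)


# Synthetic data: Clayton(theta=2), Gamma(alpha=2, scale=1.2) and Exp(scale=0.5) marginals.
rng = np.random.default_rng(0)
uv = sample_clayton(2000, theta=2.0, rng=rng)
x = np.column_stack([stats.gamma.ppf(uv[:, 0], 2.0, scale=1.2),
                     stats.expon.ppf(uv[:, 1], scale=0.5)])

# Joint maximization over (theta, alpha, scale), starting from (1, 1, 1).
res = optimize.minimize(neg_log_likelihood, x0=np.zeros(3), args=(x,),
                        method="Nelder-Mead",
                        options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
theta_hat, alpha_hat, scale_hat = np.exp(res.x)
print(theta_hat, alpha_hat, scale_hat)
```

Because \(\theta\) and every unknown hyper-parameter share a single search space, each additional hyper-parameter adds one dimension to this optimization.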
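For instance, suppose that we want to fit a copula with the MLE method, where \(X_1 \sim Gamma(\alpha, 1.2)\) and \(X_2 \sim Exp(\lambda)\) with \(\alpha > 0\) and \(\lambda > 0\). We would then write the following code (SciPy parameterizes the exponential distribution by its scale \(1/\lambda\), so the hyper-parameter actually estimated is that scale):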
mle(copula, X, marginals=[ scipy.stats.gamma, scipy.stats.expon ], hyper_param=[ { 'a': None, 'scale': 1.2 }, { 'scale': None } ], hyper_param_bounds=[ [0, None], [0, None]])
Use None both to mark a hyper-parameter as unknown and to represent \(\pm \infty\) in hyper-parameter bounds. Here is a detailed example of fitting a Clayton copula with MLE:
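import scipy.stats
# ArchimedeanCopula is this library's Archimedean copula class (import it from the library's copula module)
# data: numpy array of shape (n, 2) holding the observations to fit
clayton = ArchimedeanCopula(family="clayton", dim=2)
boundAlpha = [0, None] # Greater than 0 (None stands for +infinity)
boundLambda = [0, None] # Bounds on the exponential's scale (1/lambda)
bounds = [ boundAlpha, boundLambda ]
paramX1 = { 'a': None, 'scale': 1.2 } # Hyper-parameters of Gamma: shape unknown, scale fixed to 1.2
paramX2 = { 'scale': None } # Hyper-parameters of Exp: scale unknown
hyperParams = [ paramX1, paramX2 ] # The hyper-parameters
gamma = scipy.stats.gamma # The Gamma distribution
expon = scipy.stats.expon # The Exponential distribution
# Fitting copula with MLE method and Gamma/Exp marginal distributions
clayton.fit(data, method='mle', marginals=[gamma, expon], hyper_param=hyperParams, hyper_param_bounds=bounds)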
Keep in mind that when there are many hyper-parameters, the computational cost can be extremely high, since all of them are optimized jointly with \(\theta\).
Inference Functions for Margins (IFM)
estimation.ifm(copula, X, marginals, hyper_param, hyper_param_start=None, hyper_param_bounds=None, theta_start=0, theta_bounds=None, optimize_method='Nelder-Mead', bounded_optimize_method='SLSQP')
Computes the IFM estimation on specified data.
Parameters: copula : Copula
The copula.
X : numpy array (of size n * copula dimension)
The data to fit.
marginals : numpy array
The marginal distributions. Use scipy.stats distributions, or any equivalent object providing pdf and cdf methods as in the rv_continuous class from scipy.stats.
hyper_param : numpy array
The hyper-parameters of each marginal distribution. Use None when a hyper-parameter is unknown and must be estimated.
hyper_param_start : numpy array
The starting values of the hyper-parameters during optimization. Must have the same dimension as hyper_param.
hyper_param_bounds : numpy array
Allowed values for each hyper-parameter.
theta_start : float
Initial value of theta in the optimization algorithm.
theta_bounds : pair
Allowed values of theta.
optimize_method : str
The optimization method used in SciPy minimization when theta_bounds is not specified.
bounded_optimize_method : str
The optimization method used in SciPy minimization under constraints.
Returns: optimizeResult : OptimizeResult
The optimization result returned from SciPy.
estimatedHyperParams : numpy array
The estimated hyper-parameters.
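The difference from the previous method is that the hyper-parameters are estimated independently of the copula's parameter: each \(\hat{\beta}_j\) maximizes the marginal log-likelihood \(\sum_{i=1}^{n} \log f_j(x_{ij}; \beta_j)\).
Then, the observations \(\mathbf{x}\) are transformed into uniform variables \(\mathbf{u}\) with \(u_{ij} = F_j(x_{ij}; \hat{\beta}_j)\).
Finally, as with MLE, the copula log-likelihood \(\sum_{i=1}^{n} \log c(u_{i1}, ..., u_{id}; \theta)\) is maximized with an optimization algorithm to estimate the parameter \(\theta\).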
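The three IFM steps can be sketched with SciPy on the Gamma/exponential example of this section. The sketch assumes a bivariate Clayton copula, simulated data, a Gamma scale fixed to 1.2 and an exponential location fixed to 0; it uses bounded scalar optimization (scipy.optimize.minimize_scalar) at each step instead of this library's own optimizer settings, and the helper name is hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def clayton_log_density(u, v, theta):
    """Log-density of the bivariate Clayton copula, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


# Synthetic data: Clayton(theta=2) sampled with the Marshall-Olkin algorithm,
# then mapped to Gamma(alpha=2, scale=1.2) and Exp(scale=0.5) marginals.
rng = np.random.default_rng(1)
n, theta_true = 2000, 2.0
mix = rng.gamma(1.0 / theta_true, 1.0, size=(n, 1))
uv = (1.0 + rng.exponential(1.0, size=(n, 2)) / mix) ** (-1.0 / theta_true)
x1 = stats.gamma.ppf(uv[:, 0], 2.0, scale=1.2)
x2 = stats.expon.ppf(uv[:, 1], scale=0.5)

# Step 1: estimate each marginal's hyper-parameters on its own.
res_alpha = optimize.minimize_scalar(
    lambda a: -np.sum(stats.gamma.logpdf(x1, a, scale=1.2)),
    bounds=(1e-3, 50.0), method="bounded")
alpha_hat = res_alpha.x
scale_hat = x2.mean()  # closed-form MLE of the exponential scale (loc fixed at 0)

# Step 2: transform the observations into uniform variables with the fitted CDFs.
eps = 1e-12  # keep u and v strictly inside (0, 1)
u = np.clip(stats.gamma.cdf(x1, alpha_hat, scale=1.2), eps, 1 - eps)
v = np.clip(stats.expon.cdf(x2, scale=scale_hat), eps, 1 - eps)

# Step 3: maximize the copula log-likelihood over theta only.
res_theta = optimize.minimize_scalar(
    lambda t: -np.sum(clayton_log_density(u, v, t)),
    bounds=(1e-3, 50.0), method="bounded")
theta_hat = res_theta.x
print(alpha_hat, scale_hat, theta_hat)
```

Each step is a low-dimensional optimization, which is what makes IFM cheaper than joint MLE when the marginals have many hyper-parameters.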
This method takes the same arguments as MLE and can be used through the fitting process:
clayton.fit(data, method='ifm', marginals=[gamma, expon], hyper_param=hyperParams, hyper_param_bounds=bounds)
Canonical Maximum Likelihood Estimation (CMLE)
estimation.cmle(log_lh, theta_start=0, theta_bounds=None, optimize_method='Nelder-Mead', bounded_optimize_method='SLSQP')
Computes the CMLE on a specified log-likelihood function.
Parameters: log_lh : Function
The log-likelihood.
theta_start : float
Initial value of theta in the optimization algorithm.
theta_bounds : pair
Allowed values of theta.
optimize_method : str
The optimization method used in SciPy minimization when theta_bounds is not specified.
bounded_optimize_method : str
The optimization method used in SciPy minimization under constraints.
Returns: OptimizeResult
The optimization result returned from SciPy.
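The parameter descriptions above state that optimize_method is used when theta_bounds is absent and bounded_optimize_method otherwise. The following minimal sketch shows that dispatch with scipy.optimize.minimize. The helper maximize_log_lh is hypothetical, it assumes log_lh takes a scalar \(\theta\), and it is exercised on a toy exponential log-likelihood, whose maximizer is \(1/\bar{x}\), so the result can be checked by hand.

```python
import numpy as np
from scipy import optimize


def maximize_log_lh(log_lh, theta_start=0.0, theta_bounds=None,
                    optimize_method="Nelder-Mead",
                    bounded_optimize_method="SLSQP"):
    """Hypothetical helper: maximize a scalar log-likelihood log_lh(theta).

    SciPy only minimizes, so the negative log-likelihood is passed to
    scipy.optimize.minimize. The bounded method is used only when
    theta_bounds (a (lower, upper) pair) is given.
    """
    def neg_log_lh(t):
        return -log_lh(t[0])

    x0 = np.array([theta_start], dtype=float)
    if theta_bounds is None:
        return optimize.minimize(neg_log_lh, x0, method=optimize_method)
    return optimize.minimize(neg_log_lh, x0, method=bounded_optimize_method,
                             bounds=[theta_bounds])


# Toy log-likelihood: i.i.d. exponential data with rate theta.
data = np.array([0.5, 1.0, 1.5, 2.0])


def exp_log_lh(theta):
    if theta <= 0:
        return -np.inf  # outside the parameter space
    return data.size * np.log(theta) - theta * data.sum()


res_free = maximize_log_lh(exp_log_lh, theta_start=1.0)
res_bounded = maximize_log_lh(exp_log_lh, theta_start=1.0, theta_bounds=(0.01, 10.0))
print(res_free.x[0], res_bounded.x[0])
```

With these data the mean is 1.25, so both calls should return \(\hat{\theta} \approx 0.8\).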
This semi-parametric method does not require specifying the marginal distributions of the copula. Instead of estimating hyper-parameters, the empirical CDF \(\hat{F}_j\) of each random variable is computed, and the observations \(\mathbf{x}\) are transformed into uniform variables \(\mathbf{u}\) with \(u_{ij} = \hat{F}_j(x_{ij})\).
The copula's parameter is then estimated by maximizing the copula log-likelihood on the transformed data.
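The two CMLE steps can be sketched with SciPy as follows. Rescaling the empirical CDF as rank/(n+1), so that no pseudo-observation equals 1, is a common convention assumed here rather than taken from this library; the bivariate Clayton density, the simulated data and the function names are illustrative.

```python
import numpy as np
from scipy import optimize, stats


def pseudo_observations(x):
    """Empirical CDF of each column, rescaled as rank / (n + 1) to stay inside (0, 1)."""
    return stats.rankdata(x, axis=0) / (x.shape[0] + 1.0)


def clayton_log_density(u, v, theta):
    """Log-density of the bivariate Clayton copula, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


# Clayton(theta=2) sample drawn with the Marshall-Olkin algorithm.
rng = np.random.default_rng(2)
n, theta_true = 2000, 2.0
mix = rng.gamma(1.0 / theta_true, 1.0, size=(n, 1))
uv = (1.0 + rng.exponential(1.0, size=(n, 2)) / mix) ** (-1.0 / theta_true)
# Arbitrary increasing transforms stand in for unknown marginals.
x = np.column_stack([np.log(uv[:, 0]), uv[:, 1] ** 3])

# Step 1: empirical CDFs; step 2: maximize the copula log-likelihood over theta.
u_hat = pseudo_observations(x)
res = optimize.minimize_scalar(
    lambda t: -np.sum(clayton_log_density(u_hat[:, 0], u_hat[:, 1], t)),
    bounds=(1e-3, 50.0), method="bounded")
theta_hat = res.x
print(theta_hat)
```

Ranks are unchanged by increasing transformations, so the pseudo-observations, and hence \(\hat{\theta}\), do not depend on the unknown marginals.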
Since CMLE does not require marginal distributions, it is the easiest method to use. For instance, with the Clayton copula:
clayton.fit(data, method='cmle')