Title: | Estimation and Testing for a Lognormal-Pareto Mixture |
---|---|
Description: | Estimates a lognormal-Pareto mixture by maximizing the profile likelihood function. A likelihood ratio test for discriminating between lognormal and Pareto tail is also implemented. See Bee, M. (2022) <doi:10.1007/s11634-022-00497-4>. |
Authors: | Marco Bee [aut, cre] |
Maintainer: | Marco Bee <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-02-13 20:21:52 UTC |
Source: | https://github.com/marco-bee/lnpar |
This function computes the density of a mixture of a lognormal and a Pareto r.v.
dLnormParMix(x, pi, mu, sigma, xmin, alpha)
dLnormParMix(x, pi, mu, sigma, xmin, alpha)
x |
non-negative numerical vector: values where the density has to be evaluated. |
pi |
scalar, 0 < p < 1: mixing weight. |
mu |
scalar: expected value of the lognormal distribution on the log scale. |
sigma |
positive scalar: standard deviation of the lognormal distribution on the log scale. |
xmin |
positive scalar: threshold. |
alpha |
positive scalar: Pareto shape parameter. |
Density of the lognormal-Pareto distribution evaluated at x.
mixDens <- dLnormParMix(5,.5,0,1,4,1.5)
mixDens <- dLnormParMix(5,.5,0,1,4,1.5)
This function evaluates the density of a Pareto r.v.s
dpareto(x, xmin, alpha)
dpareto(x, xmin, alpha)
x |
numerical vector (>xmin): values where the density has to be evaluated. |
xmin |
positive scalar: Pareto scale parameter. |
alpha |
positive scalar: Pareto shape parameter. |
Density of the Pareto distribution evaluated at x.
parDens <- dpareto(5,4,1.5)
parDens <- dpareto(5,4,1.5)
This function evaluates the log-likelihood function with respect to xmin for a mixture of a lognormal and a Pareto r.v., assuming to know the numerical values of all the other parameters.
ll_lnormparmix(x, pi, mu, sigma, alpha, y)
ll_lnormparmix(x, pi, mu, sigma, alpha, y)
x |
positive scalar: value of xmin where the function is evaluated. |
pi |
scalar, 0 < pi < 1: mixing weight. |
mu |
scalar: expected value of the lognormal distribution on the log scale. |
sigma |
positive scalar: standard deviation of the lognormal distribution on the log scale. |
alpha |
non-negative scalar: Pareto shape parameter. |
y |
(nx1) vector: random sample from the mixture. |
ll numerical value of the log-likelihood function.
y <- rLnormParMix(100,.5,0,1,4,1.5) llMix <- ll_lnormparmix(x,pi,mu,sigma,alpha,y(3,.5,0,1,1.5,y)
y <- rLnormParMix(100,.5,0,1,4,1.5) llMix <- ll_lnormparmix(x,pi,mu,sigma,alpha,y(3,.5,0,1,1.5,y)
This function fits a lognormal-Pareto mixture by means of the ECME algorithm.
LPfitEM(y, eps, maxiter, qxmin0 = 0.5)
LPfitEM(y, eps, maxiter, qxmin0 = 0.5)
y |
numerical vector: random sample from the mixture. |
eps |
non-negative scalar: tolerance for the stopping rule. |
maxiter |
non-negative integer: maximum number of iterations of the ECME algorithm. |
qxmin0 |
scalar, 0 < qxmin0 < 1: quantile level used for determining the starting value of xmin. Defaults to 0.5. |
Estimation of a lognormal-Pareto mixture via the ECME algorithm.
A list with the following elements:
pars: estimated parameters (p, alpha, mu, sigma, xmin).
loglik: maximized log-likelihood.
thRank: estimated rank of xmin.
niter: number of iterations.
postProb: matrix of posterior probabilities.
bootstd: bootstrap standard errors of the estimators.
ysim <- sort(rLnormParMix(100,.9,0,1,5,1)) mixFit <- LPfitEM(ysim,1e-10,1000)
ysim <- sort(rLnormParMix(100,.9,0,1,5,1)) mixFit <- LPfitEM(ysim,1e-10,1000)
This function fits a lognormal-Pareto mixture by maximizing the profile log-likelihood.
LPfitProf(y, minRank, nboot)
LPfitProf(y, minRank, nboot)
y |
numerical vector: random sample from the mixture. |
minRank |
integer: minimum possible rank of the threshold. |
nboot |
number of bootstrap replications used for estimating the standard errors. If omitted, no standard errors are computed. |
Estimation is implemented as in Bee (2022). As of standard errors, at each bootstrap replication the mixture is estimated with thresholds equal to ys(minRank), ys(minRank+1),..., ys(n), where n is the sample size and ys is the sample sorted in ascending order. The latter procedure is implemented via parallel computing. If the algorithm does not converge in 1000 iterations, a message is displayed.
A list with the following elements:
xmin: estimated threshold.
prior: estimated mixing weight.
postProb: matrix of posterior probabilities.
alpha: estimated Pareto shape parameter.
mu: estimated expectation of the lognormal distribution on the lognormal scale.
sigma: estimated standard deviation of the lognormal distribution on the lognormal scale.
loglik: maximized log-likelihood.
nit: number of iterations.
npareto: estimated number of Pareto observations.
bootstd: bootstrap standard errors of the estimators.
Bee M (2024). “On discriminating between lognormal and Pareto tail: an unsupervised mixture-based approach.” Advances in Data Analysis and Classification, 18, 251-269.
mixFit <- LPfitProf(TN2016,90,0)
mixFit <- LPfitProf(TN2016,90,0)
This function draws a bootstrap sample from the null (lognormal) distribution and computes the test for the null hypothesis of a pure lognormal distribution versus the alternative of a lognormal-Pareto mixture, where the parameters of the latter are estimated via maximum profile likelihood. To be only called from ParallelTest. Estimation unde rthe alternative is perfromed
LPtest(x, n, muNull, sigmaNull, minRank)
LPtest(x, n, muNull, sigmaNull, minRank)
x |
list: sequence of integers 1,...,K, where K is the mumber of datasets. Set x = 1 in case of a single dataset. |
n |
sample size. |
muNull |
lognormal expected value under the null hypothesis. |
sigmaNull |
lognormal standard deviation under the null hypothesis. |
minRank |
minimum possible rank of the threshold. |
A list with the following elements:
LR: observed value of the llr test.
Bee M (2024). “On discriminating between lognormal and Pareto tail: an unsupervised mixture-based approach.” Advances in Data Analysis and Classification, 18, 251-269.
n = 100 muNull = mean(log(TN2016)) sigmaNull = sd(log(TN2016)) minRank = 90 res = LPtest(1,n,muNull,sigmaNull,minRank)
n = 100 muNull = mean(log(TN2016)) sigmaNull = sd(log(TN2016)) minRank = 90 res = LPtest(1,n,muNull,sigmaNull,minRank)
This function draws a bootstrap sample from the null (lognormal) distribution and computes the test for the null hypothesis of a pure lognormal distribution versus the alternative of a lognormal-Pareto mixture, where the parameters of the latter are estimated by means of the ECME algorithm. To be only called from ParallelTestEM.
LPtestEM(x, n, muNull, sigmaNull)
LPtestEM(x, n, muNull, sigmaNull)
x |
list: sequence of integers 1,...,K, where K is the mumber of datasets. Set x = 1 in case of a single dataset. |
n |
sample size. |
muNull |
log-expectation value under the null hypothesis. |
sigmaNull |
log-standard deviation under the null hypothesis. |
A list with the following elements:
LR: observed value of the llr test.
n = 100 muNull = mean(log(TN2016)) sigmaNull = sd(log(TN2016)) res = LPtestEM(1,n,muNull,sigmaNull)
n = 100 muNull = mean(log(TN2016)) sigmaNull = sd(log(TN2016)) res = LPtestEM(1,n,muNull,sigmaNull)
This function draws a bootstrap sample and uses it to estimate the parameters of a lognormal-Pareto mixture distribution. Since this is typically called by LPfit, see the help of LPfit for examples.
MLEBoot(x, y, minRank, p0, alpha0, mu0, Psi0)
MLEBoot(x, y, minRank, p0, alpha0, mu0, Psi0)
x |
list: sequence of integers 1,...,K, where K is the mumber of datasets. Set x = 1 in case of a single dataset. |
y |
numerical vector: observed sample. |
minRank |
positive integer: minimum possible rank of the threshold. |
p0 |
(0<p0<1): starting value of the mixing weight. |
alpha0 |
non-negative scalar: starting value of the Pareto shape parameter. |
mu0 |
scalar: starting value of the log-expectation of the lognormal distribution on the log scale. |
Psi0 |
non-negative scalar: starting value of the log-variance of the lognormal distribution on the log scale. |
At each bootstrap replication, the mixture is estimated with thresholds equal to ys(minRank), ys(minRank+1),..., ys(n), where n is the sample size and ys is the sample in ascending order. The function is typically called by LPfit (see the example below).
Estimated parameters obtained from a bootstrap sample.
Bee, M. (2022), “On discriminating between lognormal and Pareto tail: a mixture-based approach”, Advances in Data Analysis and Classification, https://doi.org/10.1007/s11634-022-00497-4
This function estimates the parameters of a Pareto and a lognormal density, assuming a known threshold.
par_logn_mix_known(y, prior1, th, alpha, mu, sigma)
par_logn_mix_known(y, prior1, th, alpha, mu, sigma)
y |
non-negative numerical vector: random sample from the mixture. |
prior1 |
scalar (0<prior1<1): starting value of the prior probability. |
th |
positive scalar: threshold. |
alpha |
non-negative scalar: starting value of the Pareto shape parameter. |
mu |
scalar: starting value of the lognormal parameter mu. |
sigma |
positive scalar: starting value of the lognormal parameter sigma. |
A list with the following elements:
xmin: estimated threshold.
prior: estimated mixing weight.
post: matrix of posterior probabilities.
alpha: estimated Pareto shape parameter.
mu: estimated expectation of the lognormal distribution on the lognormal scale.
sigma: estimated standard deviation of the lognormal distribution on the lognormal scale.
loglik: maximized log-likelihood.
nit: number of iterations.
mixFit <- par_logn_mix_known(TN2016, .5, 4700, 3, 7, 1.2)
mixFit <- par_logn_mix_known(TN2016, .5, 4700, 3, 7, 1.2)
This function computes the bootstrap test for the null hypothesis of a pure lognormal distribution versus the alternative of a lognormal-Pareto mixture, where the parameters of the latter are estimated via maximum profile likelihood. Implemented via parallel computing.
ParallelTest(nboot, y, obsTest, minRank)
ParallelTest(nboot, y, obsTest, minRank)
nboot |
number of bootstrap replications. |
y |
observed data. |
obsTest |
value of the test statistics computed with the data under analysis. |
minRank |
minimum possible rank of the threshold. |
A list with the following elements:
LR: nboot simulated values of the llr test under the null hypothesis.
pval: p-value of the test.
minRank = 90 mixFit <- LPfit(TN2016,minRank,0) ell1 <- mixFit$loglik estNull <- c(mean(log(TN2016)),sd(log(TN2016))) ellNull <- sum(log(dlnorm(TN2016,estNull[1],estNull[2]))) obsTest <- 2*(ell1-ellNull) nboot = 2 TestRes = ParallelTest(nboot,TN2016,obsTest,minRank)
minRank = 90 mixFit <- LPfit(TN2016,minRank,0) ell1 <- mixFit$loglik estNull <- c(mean(log(TN2016)),sd(log(TN2016))) ellNull <- sum(log(dlnorm(TN2016,estNull[1],estNull[2]))) obsTest <- 2*(ell1-ellNull) nboot = 2 TestRes = ParallelTest(nboot,TN2016,obsTest,minRank)
This function computes the bootstrap test for the null hypothesis of a pure lognormal distribution versus the alternative of a lognormal-Pareto mixture, where the parameters of the latter are estimated by means of the ECME algorithm. likelihood. Implemented via parallel computing.
ParallelTestEM(nboot, y, obsTest)
ParallelTestEM(nboot, y, obsTest)
nboot |
number of bootstrap replications. |
y |
observed data. |
obsTest |
value of the test statistics computed with the data under analysis. |
A list with the following elements:
LR: nboot simulated values of the llr test under the null hypothesis.
pval: p-value of the test.
minRank = 90 mixFit <- LPfit(TN2016,minRank,0) ell1 <- mixFit$loglik estNull <- c(mean(log(TN2016)),sd(log(TN2016))) ellNull <- sum(log(dlnorm(TN2016,estNull[1],estNull[2]))) obsTest <- 2*(ell1-ellNull) nboot = 2 TestRes = ParallelTestEM(nboot,TN2016,obsTest)
minRank = 90 mixFit <- LPfit(TN2016,minRank,0) ell1 <- mixFit$loglik estNull <- c(mean(log(TN2016)),sd(log(TN2016))) ellNull <- sum(log(dlnorm(TN2016,estNull[1],estNull[2]))) obsTest <- 2*(ell1-ellNull) nboot = 2 TestRes = ParallelTestEM(nboot,TN2016,obsTest)
This function simulates random numbers for a mixture of a lognormal and a Pareto r.v.
rLnormParMix(n, pi, mu, sigma, xmin, alpha)
rLnormParMix(n, pi, mu, sigma, xmin, alpha)
n |
positive integer: number of simulated random numbers. |
pi |
scalar, 0 < pi < 1: mixing weight. |
mu |
scalar: expected value of the lognormal distribution on the log scale. |
sigma |
positive scalar: standard deviation of the lognormal distribution on the log scale. |
xmin |
positive scalar: threshold. |
alpha |
non-negative scalar: Pareto shape parameter. |
n iid random numbers from the lognormal-Pareto distribution.
ySim <- rLnormParMix(100,.5,0,1,4,1.5)
ySim <- rLnormParMix(100,.5,0,1,4,1.5)
This function simulates random numbers for a Pareto r.v.
rpareto(n, xmin, alpha)
rpareto(n, xmin, alpha)
n |
positive integer: number of simulated random numbers. |
xmin |
positive scalar: Pareto scale parameter. |
alpha |
non-negative scalar: Pareto shape parameter. |
n iid random numbers from the Pareto distribution.
ySim <- rpareto(5,4,1.5)
ySim <- rpareto(5,4,1.5)
A dataset containing the number of employees in year 2016 in all the firms of the Trento district in Northern Italy.
TN2016
TN2016
A numerical vector with 183 rows and 1 column.