Dr. Samira Zahmatkesh, Prof. Mohsen Mohammadzadeh,
Volume 7, Issue 4 (5-2021)
Abstract
Mr. Mehdi Amiri, Mr. Ahad Jamalizadeh, Mr. Salman Izadkhah,
Volume 8, Issue 1 (3-2022)
Abstract
In this paper, random vectors following the multivariate generalized hyperbolic (GH) distribution are compared using the Hessian stochastic order. This family includes classes of symmetric and asymmetric distributions that can capture different kurtosis behaviors in skewed and heavy-tailed data. By considering some closed convex cones and their duals, we derive necessary and sufficient conditions for several important applied stochastic orderings. The linear convex orderings are shown to be equivalent to a certain kind of Hessian ordering. Based on copulas generated by the GH distributions, it is shown that ordering the GH distributions in terms of their dependence structures corresponds to certain Hessian stochastic orderings being satisfied. The results are shown to be relevant to some insurance and economic applications.
Dr. Nasiri, Dr. Leila Nasiri,
Volume 8, Issue 1 (3-2022)
Abstract
Mohammad Moghadam, Prof. Mohsen Mohammadzadeh,
Volume 8, Issue 2 (6-2022)
Abstract
Introduction
Estimating the spatial hazard, that is, the probability of exceeding a given threshold, is an important issue in environmental studies and is used to control pollution levels and prevent damage from natural disasters. Risk zoning provides useful information to decision-makers; for example, in areas where the spatial hazard is high, zoning is used to design preventive policies that avoid adverse effects on the environment or harm to humans.
Generally, common methods for estimating spatial risk are designed for stationary random fields, and a parametric form is usually assumed for the distribution and variogram of the random field, whereas in practice these assumptions are sometimes unrealistic. Examples of such methods include indicator kriging, disjunctive kriging, geostatistical Markov chains, and simple kriging. In practice, using parametric spatial models can lead to unreliable results. In this paper, we use a nonparametric spatial model to estimate the unconditional probability, or spatial risk,
$$r_{c}(s_{0})=P\left(Z(s_{0})\geqslant c\right). \qquad (1)$$
Because the conditional distribution at points close to the observations has less variability than the unconditional probability, nonparametric spatial methods will be used to estimate the unconditional probability.
Material and methods
Let $\mathbf{Z}=\left(Z(s_1),\ldots,Z(s_n)\right)^{T}$ be an observation vector from the random field $\{Z(s);\ s\in D\subseteq\mathbb{R}^{d}\}$, which is decomposed as
$$Z(s)=\mu(s)+\varepsilon(s), \qquad (2)$$
where $\mu(s)$ is the trend and $\varepsilon(s)$ is the error term, a second-order stationary random field with zero mean and covariogram $C(h)=\mathrm{Cov}\left(\varepsilon(s),\varepsilon(s+h)\right)$. The local linear estimator of the trend is
$$\hat{\mu}_{H}(s)=\mathbf{e}_{1}^{T}\left(\mathbf{S}_{s}^{T}\mathbf{W}_{s}\mathbf{S}_{s}\right)^{-1}\mathbf{S}_{s}^{T}\mathbf{W}_{s}\mathbf{Z}\equiv\boldsymbol{\phi}^{T}(s)\mathbf{Z},$$
where $\mathbf{e}_{1}$ is a vector with 1 in the first entry and all other entries 0, $\mathbf{S}_{s}$ is a matrix whose $i$th row is $\left(1,(s_{i}-s)^{T}\right)$, $\mathbf{W}_{s}=\mathrm{diag}\left\{K_{H}(s_{1}-s),\ldots,K_{H}(s_{n}-s)\right\}$, $K_{H}(u)=|H|^{-1}K(H^{-1}u)$, $K$ is a multiplicative multivariate triweight kernel function, and $H$ is a nonsingular symmetric $d\times d$ bandwidth matrix. In this model, the bandwidth matrix is selected by a bias-corrected and estimated generalized cross-validation (CGCV) criterion.
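As an illustration of this estimator, the following Python sketch evaluates $\hat{\mu}_{H}(s_0)$ at a single location, using a multiplicative triweight kernel as one possible choice of $K$; the CGCV bandwidth selection is not shown and all names are illustrative, not the authors' implementation.

```python
import numpy as np

def triweight(U):
    """Multiplicative (product) triweight kernel evaluated at each row of U."""
    inside = np.clip(1.0 - U**2, 0.0, None)
    return np.prod((35.0 / 32.0) * inside**3, axis=1)

def local_linear_trend(s0, S, Z, H):
    """Local linear trend estimate mu_H(s0) = e1' (Ss'WsSs)^(-1) Ss'Ws Z.

    S : (n, d) sampling locations, Z : (n,) observations,
    H : (d, d) nonsingular symmetric bandwidth matrix
    (assumed large enough that some kernel weights are positive).
    """
    n, d = S.shape
    U = (S - s0) @ np.linalg.inv(H).T           # rows H^{-1}(s_i - s0)
    w = triweight(U) / abs(np.linalg.det(H))    # K_H(s_i - s0)
    Ss = np.column_stack([np.ones(n), S - s0])  # rows (1, (s_i - s0)')
    A = Ss.T * w                                # Ss' Ws
    beta = np.linalg.solve(A @ Ss, A @ Z)       # (Ss' Ws Ss)^{-1} Ss' Ws Z
    return beta[0]                              # e1' beta = trend estimate at s0
```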
From the nonparametric residuals $\hat{\varepsilon}(s)=Z(s)-\hat{\mu}(s)$, a local linear estimate of the variogram $2\gamma(\cdot)$ at a lag $u$ is obtained as the intercept $\hat{\alpha}$ in the solution of the least-squares problem
$$\min_{\alpha,\beta}\ \sum_{i<j}\left[\left(\hat{\varepsilon}_{i}-\hat{\varepsilon}_{j}\right)^{2}-\alpha-\beta^{T}\left(s_{i}-s_{j}-u\right)\right]^{2}K_{G}\left(s_{i}-s_{j}-u\right),$$
where $G$ is the corresponding bandwidth matrix, obtained by minimizing the cross-validation relative squared error of the semivariogram estimate.
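A corresponding sketch of the local linear variogram fit at a single lag $u$ is given below; the product triweight kernel and the summation over ordered pairs are simplifying assumptions, not necessarily the authors' exact choices.

```python
import numpy as np
from itertools import permutations

def local_linear_variogram(u, S, resid, G):
    """Local linear estimate of 2*gamma(u): the intercept alpha of the weighted
    least-squares problem above, with a product triweight kernel K_G."""
    d = S.shape[1]
    Ginv, detG = np.linalg.inv(G), abs(np.linalg.det(G))
    X, y, w = [], [], []
    for i, j in permutations(range(len(S)), 2):      # ordered pairs (i, j)
        lag = S[i] - S[j] - u                        # s_i - s_j - u
        k = np.prod((35 / 32) * np.clip(1 - (Ginv @ lag) ** 2, 0, None) ** 3) / detG
        if k > 0:                                    # keep pairs inside the kernel support
            X.append(np.concatenate(([1.0], lag)))   # regressors (1, (s_i - s_j - u)')
            y.append((resid[i] - resid[j]) ** 2)     # squared residual differences
            w.append(k)
    X, y, sw = np.array(X), np.array(y), np.sqrt(np.array(w))
    coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return coef[0]                                   # alpha-hat = estimate of 2*gamma(u)
```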
Algorithm 1: Semiparametric bootstrap
1. Obtain estimates of the error covariance matrix and of the covariance matrix of the nonparametric residuals.
2. Generate bootstrap samples by adding to the estimated spatial trend $\hat{\mu}_{H}(s)$ bootstrap errors generated as a spatially correlated set of errors.
3. Compute the kriging prediction $Z^{*}(s_{0})$ at each unsampled location $s_{0}$ from the bootstrap sample $Z^{*}(s_{1}),\ldots,Z^{*}(s_{n})$.
4. Repeat steps 2 and 3 a large number of times $B$, so that for each unsampled location $s_{0}$, $B$ bootstrap replications $Z^{*(1)}(s_{0}),\ldots,Z^{*(B)}(s_{0})$ are obtained.
5. Estimate the unconditional probability (1) of exceeding the threshold $c$ at location $s_{0}$ by the relative frequency of bootstrap replications that exceed $c$ (a minimal sketch of this loop follows the algorithm).
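The following Python sketch covers steps 2-5 only. It assumes the trend and covariance estimates from step 1 are already available, uses Gaussian bootstrap errors in place of resampled residuals, and uses simple kriging weights for the prediction at $s_0$; function and argument names are illustrative.

```python
import numpy as np

def bootstrap_exceedance_prob(mu_s, mu_s0, Sigma, sigma0, c, B=1000, rng=None):
    """Steps 2-5 of Algorithm 1 (sketch): estimate r_c(s0) = P(Z(s0) >= c).

    mu_s   : (n,) estimated trend at the sampled locations
    mu_s0  : estimated trend at the unsampled location s0
    Sigma  : (n, n) estimated covariance matrix of the errors at the data locations
    sigma0 : (n,) estimated covariances Cov(eps(s_i), eps(s0))
    """
    rng = rng or np.random.default_rng(0)
    n = len(mu_s)
    L = np.linalg.cholesky(Sigma)            # used to generate spatially correlated errors
    lam = np.linalg.solve(Sigma, sigma0)     # simple kriging weights for s0
    exceed = 0
    for _ in range(B):
        # step 2: bootstrap sample = estimated trend + correlated bootstrap errors
        eps_star = L @ rng.standard_normal(n)
        z_star = mu_s + eps_star
        # step 3: simple kriging prediction at s0 from the bootstrap sample
        z0_star = mu_s0 + lam @ (z_star - mu_s)
        # step 5: relative frequency of exceedances of the threshold c
        exceed += z0_star >= c
    return exceed / B
```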
Results and discussion
To analyze the practical behavior of the proposed methods, a simulation study was conducted under different scenarios. $N=150$ samples of size $n=16\times 16$ were generated on a regular grid in the unit square following model (2), with mean function
$$\mu(s)=2.5+\sin(2\pi x_{1})+4\left(x_{2}-0.5\right)^{2},$$
where $s=(x_{1},x_{2})$, and random errors normally distributed with zero mean and isotropic exponential covariogram
$$C(h)=0.04+2.01\left(1-\exp\left(-3\|h\|/0.5\right)\right),\qquad h\in\mathbb{R}^{2}.$$
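For concreteness, the sketch below generates one sample from this simulation design; reading the stated structure as a nugget of 0.04, partial sill of 2.01 and practical range 0.5 is our interpretation of the formula, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = np.meshgrid(np.linspace(0, 1, 16), np.linspace(0, 1, 16))
S = np.column_stack([x1.ravel(), x2.ravel()])              # the 16 x 16 grid locations
mu = 2.5 + np.sin(2 * np.pi * S[:, 0]) + 4 * (S[:, 1] - 0.5) ** 2   # trend mu(s)

h = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)  # pairwise distances
cov = 2.01 * np.exp(-3 * h / 0.5)                          # exponential decay, range 0.5
cov[np.diag_indices_from(cov)] += 0.04                     # nugget added on the diagonal
# one realization of model (2): trend plus correlated Gaussian errors
z = mu + np.linalg.cholesky(cov + 1e-10 * np.eye(len(S))) @ rng.standard_normal(len(S))
```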
To compare the nonparametric spatial estimates of the unconditional risk and the conditional risk with indicator kriging, we considered 7 missing observations at certain locations. The empirical spatial risk and its estimates are presented in Table 1. Indicator kriging overestimates and can yield spatial risk estimates larger than 1. Generally, the risks estimated with the unconditional and conditional methods are close to the empirical values.
Table 1. Empirical spatial risk and its estimates
Location        Empirical   Conditional   Indicator   Unconditional
(0.13, 0.00)      0.999        0.998         1.002         1.000
(0.87, 0.87)      0.300        0.351         0.230         0.388
(0.80, 0.20)      0.069        0.054         0.091         0.418
(0.94, 0.27)      0.317        0.347         0.091         0.481
(0.00, 0.47)      0.504        0.494         0.652         0.602
(0.74, 0.60)      0.011        0.057         0.006         0.024
(0.34, 0.60)      0.989        0.954         0.996         0.994
A spatial risk map is obtained for the mean maximum temperature of Iran at 364 stations in March 2018. Applying Algorithm 1, the final trend and semivariogram estimates are smoother than the pilot versions.
The conditional and unconditional spatial risks are estimated with 150 bootstrap replicates for the two threshold values 25 and 31 on a 75×75 grid. The unconditional risk estimate is smoother than the conditional one, because in the unconditional version the biased residuals are not used directly in the spatial prediction, whereas conditional risk estimation uses the original residuals and simple kriging.
Conclusion
The spatial risk was estimated with a nonparametric spatial method, in which the trend and the variability of the random field were modeled by local linear nonparametric estimators. In the simulation study, this method gave better results than indicator kriging. Owing to its flexibility, the nonparametric spatial method could also be applied to the construction of confidence or prediction intervals and to hypothesis testing.
Dr. Elham Basiri,
Volume 8, Issue 2 (6-2022)
Abstract
Introduction
Censored samples arise in a life-testing experiment whenever the experimenter does not observe the failure times of all units placed on the life test. In medical or industrial studies, researchers have to deal with censored data because they usually do not have sufficient time to observe the lifetimes of all subjects in the study. There are different types of censoring; the most common are the type I and type II censoring schemes, and progressive type II censoring is also one of the most important censoring methods.
One of the most common questions any statistician is asked is "How large a sample size do I need?". Researchers are often surprised to find out that the answer depends on a number of factors, and they have to give the statistician some information before they can get an answer. So far, different answers have been given to this question by considering different criteria.
The cost criterion is one of the criteria that has always been of interest to researchers, and many researchers have used it to determine the sample size under different censoring methods.
In some applications, such as clinical trials and quality control, it is almost impossible to have a fixed sample size all the time because some observations may be missing for various reasons. In other words, the sample size is a random variable.
Material and methods
In this paper, a cost function is introduced. Then, assuming that the sample size under progressive type II censoring is a random variable from the truncated binomial distribution, the optimal parameter of the sample size distribution is determined so that the introduced cost function does not exceed a pre-determined value. The exponential distribution is considered for the lifetimes of the observations. A simulation study is also provided to evaluate the obtained results. Finally, the conclusion of the article is presented.
Results and discussion
We have computed the values of the expected cost function under three different censoring schemes. The results show that, when the other components are fixed, the expected cost function is increasing in m and decreasing in θ, as expected. They also show that type II censoring leads to better results than the other censoring schemes; that is, it provides the minimum cost among the three schemes. Finally, assuming an upper bound for the cost function, the optimal parameter of the sample size distribution is obtained.
Conclusion
Determining the optimal sample size is one of the issues that has been studied by many researchers. In some cases, it is not possible for the sample size to be a fixed, pre-determined value; in other words, the sample size is a random variable. In this paper, assuming that the sample size under progressive type II censoring is a random variable from the truncated binomial distribution, the optimal parameter of the sample size distribution is determined. The criterion used in this research is the cost criterion: the optimal parameter of the sample size distribution is determined so that the value of the cost function is less than a specified, pre-determined value. The results of the paper show that type II censoring provides smaller values of the cost function. For all three censoring schemes, the cost function is increasing in m and decreasing in θ when the other components are fixed, as expected. As a result, the best-case scenario is to adopt the type II censoring scheme and to select smaller values of m, larger values of θ, and smaller values of the parameter of the sample size distribution.
Ahad Rahimpoor, Masoud Yarmohammadi,
Volume 8, Issue 4 (12-2022)
Abstract
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased parameter estimation and inaccurate prediction. Thus, detecting these points and dealing with them properly, especially in relation to modeling and parameter estimation of the VARMA model, is necessary.
By detecting outliers, their effects over time can be eliminated, yielding a modified series. Using this modified series, estimates of the VARMA model that are least affected by the outliers are obtained. Moreover, outlier detection is important for identifying external events over time; for example, by finding outliers in river water monitoring data, flood times can be identified.
Parameter estimation of the VAR model is less time-consuming than that of the VARMA model. Moreover, under the invertibility condition, VARMA models can be approximated by a VAR(p) model for large p. Therefore, we use this model to fit and investigate data generated from VARMA models contaminated by outliers.
Multivariate time series observations may be contaminated with different types of outliers. However, the effects of the different types of outliers differ between the multivariate and univariate cases, and the observations must be assessed by a multivariate approach. In this research, we use a genetic algorithm (GA) to develop a procedure for detecting different types of outliers (additive, innovation, level shift and temporary change outliers) in a multivariate time series. The GA searches for the outlier locations that minimize an Akaike-like information criterion (AIC), so that we simultaneously try to minimize the number of outliers and maximize the likelihood function.
The GA is a numerical optimization algorithm based on natural selection and natural genetics. It does not require strong assumptions to obtain the optimal value of a function and is able to search for the optimal solution in a space with several local optima; for example, if a function has several relative maxima, the GA can still find its absolute maximum.
To minimize a function, the GA first generates, at random or by choice, several candidate solutions; this set of solutions is called the initial population and each solution a chromosome. Then, using reproduction operators, chromosomes are combined (crossover) and mutated. If the function values of the newly produced chromosomes are lower than those of the previous chromosomes, these chromosomes are added to the initial population or replace the chromosomes with worse function values. This process is repeated until convergence occurs or the maximum number of iterations is reached.
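The following is a minimal, generic sketch of such a GA over binary chromosomes that mark candidate outlier positions; the selection, crossover and mutation operators, the default settings, and the `criterion` function are illustrative stand-ins rather than the implementation used in the paper.

```python
import numpy as np

def ga_minimize(criterion, n_genes, pop_size=50, n_gen=100, p_mut=0.02, rng=None):
    """Minimal genetic algorithm over binary chromosomes (1 = flagged outlier position).

    criterion : function mapping a 0/1 chromosome to the value to be minimized
                (e.g., an AIC-like criterion combining -2*log-likelihood with a
                penalty on the number of flagged outliers).
    """
    rng = rng or np.random.default_rng(0)
    pop = rng.integers(0, 2, size=(pop_size, n_genes))          # initial population
    fit = np.array([criterion(c) for c in pop])
    for _ in range(n_gen):
        # tournament selection of parents (lower criterion value wins)
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(fit[i] < fit[j], i, j)]
        # one-point crossover between consecutive parents
        children = parents.copy()
        for k, c in enumerate(rng.integers(1, n_genes, size=pop_size // 2)):
            children[2 * k, c:], children[2 * k + 1, c:] = (
                parents[2 * k + 1, c:].copy(), parents[2 * k, c:].copy())
        # mutation: flip each gene with a small probability
        flip = rng.random(children.shape) < p_mut
        children = np.where(flip, 1 - children, children)
        child_fit = np.array([criterion(c) for c in children])
        # elitism: keep the best pop_size chromosomes among parents and children
        allpop = np.vstack([pop, children])
        allfit = np.concatenate([fit, child_fit])
        keep = np.argsort(allfit)[:pop_size]
        pop, fit = allpop[keep], allfit[keep]
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```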
We also consider another outlier detection method, the Tsay, Peña and Pankratz (TPP) method. TPP uses test statistics based on the outlier sizes and the VAR parameters, and detects outliers in three stages. In stage I, outliers are detected one by one and their effects are removed; this is iterated until no further outlier is found. In stage II, the effects of the outliers detected in stage I are estimated simultaneously, and outliers with insignificant effects are discarded; the VAR parameters are then re-estimated from the modified series of this stage. In stage III, stages I and II are repeated with the new VAR parameter estimates.
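To illustrate the stage-I idea of iteratively detecting and removing one outlier at a time, the sketch below uses a generic residual-based Mahalanobis statistic, a fixed illustrative threshold, and a crude additive-outlier adjustment; it does not reproduce the actual TPP test statistics for the four outlier types.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def iterative_outlier_detection(y, p=2, threshold=4.0, max_iter=20):
    """Simplified stage-I-style loop: repeatedly fit a VAR(p), flag the time point
    with the largest squared Mahalanobis distance of the residuals, remove its
    effect as if it were an additive outlier, and stop when nothing exceeds the
    (illustrative) threshold."""
    y = np.asarray(y, dtype=float).copy()
    flagged = []
    for _ in range(max_iter):
        res = VAR(y).fit(p)                                   # re-estimate VAR parameters
        e = res.resid                                         # residuals (nobs - p rows)
        d2 = np.einsum("ti,ij,tj->t", e, np.linalg.inv(res.sigma_u), e)
        t = int(np.argmax(d2))
        if d2[t] < threshold:                                 # no significant outlier left
            break
        idx = t + p                                           # index in the original series
        flagged.append(idx)
        y[idx] = y[idx] - e[t]                                # replace by the fitted value
    return flagged, y
```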
In each iteration of TPP, an outlier is detected and its effect is removed from the series (the modified series); the parameters are then re-estimated from the modified series and the next outlier is detected using these estimates. This may lead to biased estimates and wrong detection of the next outlier point. In other words, in the TPP method one detected outlier may hide another outlier (masking), or may make a regular observation appear to be an outlier (swamping); the method also often misidentifies the type of the outliers. In contrast, in each iteration of the GA a random pattern of outliers is first generated for testing, and a temporary modified series is obtained by removing the effect of this pattern from the series; the parameters are then estimated and the detection of this pattern is tested. This reduces the effect of previously identified outliers on the full outlier pattern. In fact, if the random pattern correctly contains all the outliers, the effects of almost all of them are eliminated in the modified series. Therefore, using this temporary modified series, the GA obtains more accurate estimates and detects the outliers more accurately.
The simulation results confirm the validity of the GA method, and the percentage of correctly detected outliers is higher for the GA than for the TPP method, although the GA requires more computation time. Also, although the VAR model is used in both detection methods, the percentage of correct outlier detection for data generated from VARMA models is similar to that for the VAR model.
The gas furnace data were analyzed and modeled, and the GA and TPP methods were found to detect similar outliers. Fitting the VAR(6) model to these data shows that, relative to TPP, the error variance of the input gas series in the GA-modified data is reduced by 17%, and the error variance of the carbon dioxide series is reduced by 43%.
Mehran Naghizadeh Qomi, Zohreh Mahdizadeh,
Volume 9, Issue 2 (9-2023)
Abstract
Parameters of the linear regression model are usually estimated by traditional methods such as least squares. Sometimes the researcher has a guess about the unknown intercept parameter, called non-sample prior information. In this article, a preliminary test estimator for the intercept of the simple linear regression model based on non-sample prior information is introduced and its risk under the reflected normal loss function is investigated. The behavior of the shrinkage pretest estimator is also compared with that of the least-squares estimator using simulation, and the intervals in which the shrinkage pretest estimator has smaller risk than the least-squares estimator are presented. The results show that the shrinkage pretest estimator outperforms the least-squares estimator when the non-sample prior information is close to the true value. The optimal value of the significance level of the test is determined using the max-min method, and the proposed estimators are then compared on a real data set.
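As an illustration, the sketch below implements one common form of a shrinkage pretest estimator of the intercept, combining the least-squares estimate with the prior guess when a t-test does not reject it; the exact estimator, the choice of shrinkage coefficient and the reflected normal loss evaluation in the article may differ.

```python
import numpy as np
from scipy import stats

def shrinkage_pretest_intercept(x, y, theta0, alpha=0.05, d=0.5):
    """One common form of a shrinkage pretest estimator of the intercept (sketch).

    theta0 : non-sample prior guess for the intercept
    d      : shrinkage coefficient in [0, 1] (d = 0 gives the least-squares estimate
             when H0 is rejected and theta0 itself when it is not)
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / Sxx                 # least-squares slope
    b0 = ybar - b1 * xbar                                      # least-squares intercept
    s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)              # error variance estimate
    se_b0 = np.sqrt(s2 * (1.0 / n + xbar ** 2 / Sxx))          # standard error of b0
    t_stat = (b0 - theta0) / se_b0                             # t-test of H0: beta0 = theta0
    accept = abs(t_stat) <= stats.t.ppf(1 - alpha / 2, df=n - 2)
    # shrink toward the prior guess only when H0 is not rejected
    return d * theta0 + (1 - d) * b0 if accept else b0
```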
Masoud Yarmahammadi, Maryam Movahedifar,
Volume 9, Issue 3 (12-2023)
Abstract
Singular spectrum analysis (SSA) is a powerful new method in time series analysis. Because of its unique properties, such as requiring no assumptions about the stationarity of the time series or the normality of the residuals, this nonparametric method has caught the attention of many researchers in econometrics and time series analysis, and its applications are becoming increasingly widespread. The method can also be used for short time series. The main purpose of SSA is to decompose a time series into interpretable components such as trend, oscillatory components, and unstructured noise. The basis of SSA is the singular value decomposition of the trajectory matrix built from the time series. In the basic SSA method, the frequency with which the observations appear in the trajectory matrix differs across observations, so there may be errors in reconstructing and forecasting the time series, especially at the beginning and end of the series; this occurs because the magnitudes of the eigenvalues and eigenvectors, and consequently the reconstruction and forecasting of future values, depend directly on the trajectory matrix. The purpose of this paper is to improve the trajectory matrix of the SSA method so as to increase the accuracy of the reconstructed series and of the forecasts; the resulting method is called singular spectrum decomposition (SSD). In this paper, the SSA and SSD methods and their properties are briefly introduced, and the performance of the SSD method relative to the SSA method in time series reconstruction and forecasting is discussed for simulated and real data.
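A compact sketch of basic SSA (embedding, SVD of the trajectory matrix, and reconstruction by diagonal averaging) is given below; the window length `L` and the number of retained components `r` are user choices, and the function name is illustrative.

```python
import numpy as np

def ssa_reconstruct(x, L, r):
    """Basic SSA sketch: embed the series in an L x K trajectory matrix, take its SVD,
    keep the r leading components, and reconstruct by diagonal (Hankel) averaging."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[k:k + L] for k in range(K)])   # trajectory matrix (L x K)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)      # singular value decomposition
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]                      # rank-r approximation
    rec = np.zeros(N)                                     # diagonal averaging over
    counts = np.zeros(N)                                  # the antidiagonals of Xr
    for j in range(K):
        rec[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return rec / counts
```

For example, `ssa_reconstruct(x, L=12, r=2)` keeps the two leading components of a monthly series, which typically capture the trend and the dominant oscillation.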
Dr Hassan Esfandyarifar, Mr Karim Ahmadi Somaeh,
Volume 9, Issue 3 (12-2023)
Abstract
In this paper, the two-observational percentile, percentile and maximum likelihood estimators of the probability density function of the inverse Weibull random variable are studied. These estimators are then compared using simulation studies and a real data set.
Dr Mohammad Moradi, Elnaz Kasani,
Volume 10, Issue 2 (7-2024)
Abstract
Stratified sampling is one of the most widely used sampling designs. In some cases, it is up to the researcher to determine the boundaries of the strata, and in other cases the population is already stratified. The optimal stratification is the configuration of stratum boundaries for which the variance of the estimator of the population mean (or total) attains its lowest value. In traditional methods, the variance of the estimator is regarded as a function of the stratum boundaries of the response variable, and to reach the minimum variance, equations are obtained that are often solved by numerical methods. The first deficiency of this approach is that it does not take all auxiliary variables into account; for example, in estimating average income, stratifying the population by factors such as gender and job history can not only increase the efficiency of the estimator but also make the results easier to interpret and generalize. The second deficiency is that the equations are complex and do not have closed-form, interpretable solutions.
In this paper, we construct the optimal stratification based on a new criterion that combines the variance with a penalty for increasing the number of strata, so that the auxiliary variables that are important in forming a decision tree determine the stratum boundaries. The stratification process starts from the saturated tree and, by successive pruning until the root node is reached, the number of strata decreases; the optimal stratification is chosen according to the introduced combined criterion.
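A heavily simplified sketch of such a combined criterion is given below: it computes the variance of the stratified mean estimator under proportional allocation and adds a penalty proportional to the number of strata. The paper's exact criterion, penalty form and tree-pruning search are not reproduced; the penalty weight `lam` and all names are illustrative.

```python
import numpy as np

def penalized_stratification_criterion(y, strata, n_total, lam):
    """Combined criterion (sketch): Var of the stratified mean under proportional
    allocation plus a penalty lam per stratum.

    y      : response (or proxy) values for the population units
    strata : stratum label of each unit (e.g., the leaf of a pruned decision tree)
    """
    y, strata = np.asarray(y, float), np.asarray(strata)
    N = len(y)
    labels = np.unique(strata)
    var = 0.0
    for h in labels:
        yh = y[strata == h]
        if len(yh) < 2:                           # skip degenerate strata in this sketch
            continue
        Wh = len(yh) / N                          # stratum weight
        nh = max(2, round(n_total * Wh))          # proportional allocation
        var += Wh ** 2 * yh.var(ddof=1) / nh      # contribution to Var(stratified mean)
    return var + lam * len(labels)                # variance + penalty on number of strata

# Example use: among candidate stratifications produced by successive prunings of a
# regression tree, keep the one minimizing the criterion, e.g.
# best = min(candidates, key=lambda s: penalized_stratification_criterion(y, s, 200, 1e-4))
```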