Executive Summary

The theory of runs has been successfully applied in many fields of science and engineering, such as the analysis of DNA sequences, climatology, reliability theory, computer science and statistical testing. The first problem on runs was posed by de Moivre in 1738: "What is the probability of getting a run of length r or more in n trials?" The topic then became one of the most important in the literature, with significant developments by Simpson (1740), Laplace (1812), Todhunter (1865) and Marbe (1934), among others. More than two centuries after de Moivre, it was briefly discussed by Feller (1968). In probability and statistics, a run is defined in keeping with its meaning in common language, namely an uninterrupted sequence. The study of runs primarily aims to obtain the distributional properties of the number of runs of k consecutive successes (or failures) in a sequence of independent and identically distributed (iid) Bernoulli trials. In particular, the waiting-time distribution of runs plays a crucial role in several areas of science because of its many real-life applications. The theory has been developed for several types of runs, including overlapping and non-overlapping runs, among many others. It has also been extended to Markov-dependent trials, and one of the major challenges, its extension to non-iid trials, was addressed by Fu and Koutras (J. Amer. Statist. Assoc., 89, 1050-1058, 1994). Recently, the theory has been extended to multi-state trials by considering patterns such as "at least k1 consecutive 1's, followed by at least k2 consecutive 2's, ..., followed by at least km consecutive m's" under iid trials. In practice, the distributional properties of runs are difficult to study even under iid trials, and in several cases they are intractable for non-iid trials.
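As an illustration of de Moivre's question, the Python sketch below computes the probability of at least one run of r or more consecutive successes in n iid Bernoulli(p) trials. It treats the length of the current trailing run of successes as the state of a Markov chain and adds an absorbing state once a run of length r appears. This is in the spirit of the Markov chain embedding approach associated with Fu and Koutras, but it is only a minimal illustration under the iid assumption, not the method of this proposal; the function name `prob_run_at_least` is hypothetical.

```python
def prob_run_at_least(n: int, r: int, p: float) -> float:
    """Hypothetical helper: probability of at least one run of r or more
    consecutive successes in n iid Bernoulli(p) trials, computed by a
    Markov chain embedding (assumes iid trials)."""
    if r <= 0:
        return 1.0
    q = 1.0 - p
    # dist[j] = P(trailing run has length j and no run of length r yet), j < r
    dist = [0.0] * r
    dist[0] = 1.0
    absorbed = 0.0  # P(a run of length r has already occurred)
    for _ in range(n):
        new = [0.0] * r
        # a failure resets the trailing run length to 0
        new[0] = q * sum(dist)
        # a success extends the trailing run by one
        for j in range(r - 1):
            new[j + 1] = p * dist[j]
        # a success from state r - 1 completes a run of length r (absorption)
        absorbed += p * dist[r - 1]
        dist = new
    return absorbed
```

For example, `prob_run_at_least(3, 2, 0.5)` returns 0.375, matching the three sequences 011, 110 and 111 out of eight. The probability mass absorbed at step t equals the probability that the waiting time for the first run of length r is exactly t, so the same recursion also yields that waiting-time distribution.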
It is therefore of interest to find a tractable distribution that closely approximates the distribution of the pattern under study; this will also help to characterize the behaviour of runs in multi-state trials. In this proposal, the principal investigator (P.I.) identifies runs in four-state trials arising from DNA sequences (the four nucleotides A, C, G and T). The distributions of these runs will be approximated by suitable distributions via Stein's method, and the method will be further developed, if necessary, for the chosen approximating distribution. The theory will then be extended to the waiting-time distributions of the identified patterns, and approximation results will be obtained under Markov-dependent and non-iid trials. Finally, the P.I. will attempt to generalize the results to trials with more than four states. The P.I. is confident that the project will yield positive outcomes that will be appreciated by the scientific community.