Thursday, October 13

Statistics-I (395) - Autumn 2022 - Assignment 1


Q.1: a) Define the standard normal probability density function and the standard normal cumulative distribution function. Give the equation of the normal curve with mean 0 and standard deviation 1.

The single most important distribution in probability and statistics is the normal probability distribution. The density function of a normal probability distribution is bell-shaped and symmetric about the mean. The normal probability distribution was introduced by the French mathematician Abraham de Moivre in 1733. He used it to approximate probabilities associated with binomial random variables when n is large. This was later extended by Laplace to the central limit theorem (CLT), which is one of the most important results in probability. Carl Friedrich Gauss in 1809 used the normal distribution to solve the important statistical problem of combining observations. Because Gauss played such a prominent role in establishing the usefulness of the normal probability distribution, it is often called the Gaussian distribution. Gauss and Laplace noticed that measurement errors tend to follow a bell-shaped curve, a normal probability distribution. Today, the normal probability distribution arises repeatedly in diverse areas of application. For example, in biology, it has been observed that the normal probability distribution fits data on the heights and weights of human and animal populations, among others.

Dear Student,

This is a sample assignment. It is a straight copy-paste that other students also have. If you need to submit a university assignment, contact us to get a UNIQUE assignment:

0313-6483019

0334-6483019

0343-6244948

To stay updated on all university-related news, subscribe to our channel:

AIOU Hub

 

We should also mention here that almost all basic statistical inference is based on the normal probability distribution. The question that often arises is, when do we know that our data follow the normal distribution? To answer this question, we have specific statistical procedures that we study in later chapters, but at this point we can obtain some constructive indications of whether the data follow the normal distribution by using descriptive statistics. That is, if the histogram of our data can be capped with a bell-shaped curve (Fig. 3.2), if the stem-and-leaf diagram is fairly symmetrical with respect to its center, and/or by invoking the empirical rule “backward,” we can obtain a good indication of whether our data follow the normal probability distribution.

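The standard normal distribution (z distribution) is the normal distribution with a mean of 0 and a standard deviation of 1. Its probability density function, which is the equation of the normal curve with mean 0 and standard deviation 1, is

\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty

and its cumulative distribution function gives the area under this curve to the left of z:

\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt

Any value x from a normal distribution with mean μ and standard deviation σ can be converted to the standard normal distribution with the formula z = (x − μ)/σ. The value of z shows how many standard deviations x lies above (z > 0) or below (z < 0) the mean. For example, if a school pupil is 1.4 m tall and the mean height for pupils of the same age, sex and ethnicity is 1.2 m with a standard deviation of 0.4 m, then z = (1.4 − 1.2)/0.4 = 0.5, i.e. the pupil is half a standard deviation above the mean (the value at the centre of the curve).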
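A minimal Python sketch of these formulas, using only the standard library. The function names `z_score`, `std_normal_pdf` and `std_normal_cdf` are illustrative choices, not part of any library; the CDF is written with `math.erf` through the identity Φ(z) = ½[1 + erf(z/√2)].

```python
from math import erf, exp, pi, sqrt


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def std_normal_pdf(z):
    """Standard normal density: phi(z) = exp(-z**2 / 2) / sqrt(2 * pi)."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def std_normal_cdf(z):
    """Standard normal CDF Phi(z) = P(Z <= z), expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Pupil example: height 1.4 m, mean 1.2 m, standard deviation 0.4 m
z = z_score(1.4, 1.2, 0.4)
print(f"z = {z:.2f}")                        # half a standard deviation above the mean
print(f"phi(z) = {std_normal_pdf(z):.4f}")   # height of the standard normal curve at z
print(f"Phi(z) = {std_normal_cdf(z):.4f}")   # proportion below z, if heights are normal
```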

  

b) Find the values of X which correspond to standardized values of −2.05 and 0.86 for each of the following distributions.

             i)  X ~ N(62.3, 38)

             ii) X ~ N(μ, σ²)

             iii) X ~ N(a, b)      (10+10)
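Reversing the standardization z = (x − μ)/σ gives x = μ + zσ. The Python sketch below applies this to case i), assuming, as in the usual notation N(μ, σ²), that the second parameter is the variance, so σ = √38 ≈ 6.164. The helper name `x_from_z` is hypothetical.

```python
from math import sqrt


def x_from_z(z, mean, variance):
    """Undo z = (x - mean) / sd: x = mean + z * sd, where sd = sqrt(variance)."""
    return mean + z * sqrt(variance)


# Case i): X ~ N(62.3, 38), with 38 read as the variance sigma**2
for z in (-2.05, 0.86):
    print(f"z = {z:5.2f}  ->  x = {x_from_z(z, 62.3, 38):.2f}")
```

This gives x ≈ 49.66 for z = −2.05 and x ≈ 67.60 for z = 0.86. The same rearrangement answers the symbolic cases: for ii), x = μ − 2.05σ and x = μ + 0.86σ; for iii), reading b as the variance, x = a − 2.05√b and x = a + 0.86√b.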

Standard deviation in statistics, typically denoted by σ, is a measure of variation or dispersion (the extent to which a distribution is stretched or squeezed) among the values in a set of data. The lower the standard deviation, the closer the data points tend to be to the mean (or expected value), μ. Conversely, a higher standard deviation indicates that the values are spread over a wider range. As with other statistical concepts, standard deviation is used in many different situations, and so there are several different equations for it. In addition to expressing population variability, standard deviation is also used to measure the uncertainty of statistical results, such as the margin of error. The standard deviation of the sampling distribution of a statistic is called its standard error; for the sample mean it is the standard error of the mean.

Population Standard Deviation

The population standard deviation, the standard definition of σ, is used when an entire population can be measured, and is the square root of the variance of a given data set. In cases where every member of a population can be sampled, the following equation can be used to find the standard deviation of the entire population:

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}

Where

x_i is an individual value
μ is the mean/expected value
N is the total number of values

For those unfamiliar with summation notation, the equation above may seem daunting, but when addressed through its individual components, this summation is not particularly complicated. The i=1 in the summation indicates the starting index; for example, for the data set 1, 3, 4, 7, 8, x_1 = 1, x_2 = 3, and so on. Hence the summation notation simply means: compute (x_i − μ)² for each value from i = 1 through N, which in this case is 5 since there are 5 values in this data set, and add the results.

EX:           μ = (1+3+4+7+8) / 5 = 23/5 = 4.6
σ = √{[(1 − 4.6)² + (3 − 4.6)² + ... + (8 − 4.6)²]/5}
σ = √[(12.96 + 2.56 + 0.36 + 5.76 + 11.56)/5] = √(33.2/5) = √6.64 ≈ 2.577
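The same calculation in Python, as a quick check of the worked example. `statistics.pstdev` is the standard-library population standard deviation (dividing by N); the explicit expression mirrors the summation formula term by term.

```python
import statistics
from math import sqrt

data = [1, 3, 4, 7, 8]

# Population mean mu, then sigma straight from the formula: sqrt(sum((x_i - mu)^2) / N)
mu = sum(data) / len(data)
sigma_manual = sqrt(sum((x - mu) ** 2 for x in data) / len(data))

# Library version: pstdev divides by N (population SD); stdev would divide by N - 1
sigma_lib = statistics.pstdev(data)

print(mu, round(sigma_manual, 3), round(sigma_lib, 3))
```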

 

Q.2: a) What is the difference between the precision and the accuracy of a result? Explain with examples.

What is Accuracy?

Accuracy is defined as ‘the degree to which the result of a measurement conforms to the correct value or a standard’ and essentially refers to how close a measurement is to its agreed value.

What is Precision?

Precision is defined as ‘the quality of being exact’ and refers to how close two or more measurements are to each other, regardless of whether those measurements are accurate or not. Precise measurements are therefore not necessarily accurate.

 

What is the difference between Accuracy and Precision?

Accuracy and precision both describe the quality of measurements, but they are not the same. Accuracy reflects how close a measurement is to a known or accepted value, while precision reflects how reproducible measurements are, even if they are far from the accepted value. Measurements that are both precise and accurate are repeatable and very close to the true value.

 

Example of the difference between Accuracy and Precision…

The example of a dartboard is often used when talking about the difference between accuracy and precision.

Accurately hitting the target means you are close to the centre of the target, even if all the marks are on different sides of the centre. Precisely hitting a target means all the hits are closely spaced, even if they are very far from the centre of the target.
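The dartboard idea can be put in numbers: accuracy is judged by the bias (how far the average reading is from the true value) and precision by the spread of repeated readings (their standard deviation). The Python sketch below uses two hypothetical sets of weighings of a 100.0 g reference mass; the readings and the function name are invented purely for illustration.

```python
import statistics


def bias_and_spread(readings, true_value):
    """Return (bias, spread).

    bias   = mean reading - true value  -> small |bias| means accurate
    spread = sample standard deviation  -> small spread means precise
    """
    bias = statistics.fmean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread


TRUE_MASS = 100.0  # hypothetical reference mass in grams

precise_not_accurate = [102.1, 102.0, 102.2, 102.1]  # tightly grouped, off target
accurate_not_precise = [98.5, 101.6, 99.2, 100.7]    # centred on target, scattered

for name, readings in [("precise, not accurate", precise_not_accurate),
                       ("accurate, not precise", accurate_not_precise)]:
    bias, spread = bias_and_spread(readings, TRUE_MASS)
    print(f"{name}: bias = {bias:+.2f} g, spread = {spread:.2f} g")
```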

 


 

b) What are the two broad categories of errors in data collected by sample surveys? What are the methods for reducing sampling error?

The accuracy of a survey estimate refers to the closeness of the estimate to the true population value. Where there is a discrepancy between the value of the survey estimate and true population value, the difference between the two is referred to as the error of the survey estimate. The total error of the survey estimate results from the two types of error:

sampling error, which arises when only a part of the population is used to represent the whole population; and

non-sampling error, which can occur at any stage of a sample survey and can also occur in censuses. Sampling error can be measured mathematically, whereas measuring non-sampling error can be difficult.
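For a sample mean, sampling error is usually measured by the standard error σ/√n, which also points to the most direct way of reducing it: take a larger sample. A minimal Python sketch, assuming a hypothetical population standard deviation of 3:

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the sample mean for population SD sigma and sample size n."""
    return sigma / sqrt(n)


SIGMA = 3.0  # hypothetical population standard deviation

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(f"n = {n:3d}: SE = {standard_error(SIGMA, n):.3f}")
```

Because the standard error falls only with √n, halving the sampling error requires four times as many observations.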

It is important for a researcher to be aware of these errors, in particular non-sampling error, so that they can be either minimised or eliminated from the survey. An introduction to measuring sampling error and the effects of non-sampling error is provided in the following sections.

The target population may not be clearly defined through the use of imprecise definitions or concepts. The survey population may not reflect the target population due to an inadequate sampling frame and poor coverage rules. Problems with the frame include missing units, deaths, out-of-scope units and duplicates. These are discussed in detail in Frames and Population.

 

Non-Response Bias

Non-respondents may differ from respondents in relation to the attributes/variables being measured. Non-response can be total (none of the questions answered) or partial (some questions may be unanswered owing to memory problems, inability to answer, etc.). To improve response rates, care should be taken in designing the questionnaires, training of interviewers, assuring the respondent of confidentiality, motivating him/her to co-operate, and calling back at different times if having difficulties contacting the respondent. "Call-backs" are successful in reducing non-response but can be expensive for personal interviews. Non-response is covered in more detail in Non-Response.

 

Questionnaire problems

The content and wording of the questionnaire may be misleading and the layout of the questionnaire may make it difficult to accurately record responses. Questions should not be loaded, double-barrelled, misleading or ambiguous, and should be directly relevant to the objectives of the survey.

It is essential that questionnaires are tested on a sample of respondents before they are finalised to identify questionnaire flow and question wording problems, and allow sufficient time for improvements to be made to the questionnaire. The questionnaire should then be re-tested to ensure changes made do not introduce other problems. This is discussed in more detail in Questionnaire Design.

 

Respondent Bias

Refusals to answer questions, memory biases and inaccurate information because respondents believe they are protecting their personal interest and integrity may lead to a bias in the estimates. The way the respondent interprets the questionnaire and the wording of the answer the respondent gives can also cause inaccuracies. When designing the survey you should remember that uppermost in the respondent's mind will be protecting their own personal privacy, integrity and interests. Careful questionnaire design and effective questionnaire testing can overcome these problems to some extent.

Respondent bias is covered in more detail below.

 

Processing Errors

There are four stages in the processing of the data where errors may occur: data grooming, data capture, editing and estimation. Data grooming involves preliminary checking before entering the data onto the processing system in the capture stage. Inadequate checking and quality management at this stage can introduce data loss (where data is not entered into the system) and data duplication (where the same data is entered into the system more than once). Inappropriate edit checks and inaccurate weights in the estimation procedure can also introduce errors to the data. To minimise these errors, processing staff should be given adequate training and realistic workloads.

 

Misinterpretation of Results

This can occur if the researcher is not aware of certain factors that influence the characteristics under investigation. A researcher or any other user not involved in the collection stage of the data gathering may be unaware of trends built into the data due to the nature of the collection, such as its scope. (E.g., a survey that collected income as a data item, with a coverage and scope of all adult persons (i.e. 18 years or older), would be expected to produce a different estimate from that produced by the ABS Survey of Average Weekly Earnings (AWE), simply because AWE includes persons aged 16 and 17 years in its scope.) Researchers should carefully investigate the methodology used in any given survey.

 

Time Period Bias

This occurs when a survey is conducted during an unrepresentative time period. For example, if a survey aims to collect details on ice-cream sales, but only collects a week's worth of data during the hottest part of summer, it is unlikely to represent the average weekly sales of ice-cream for the year.

 

c) A random sample of 36 cases is drawn from a negatively skewed probability distribution with a mean of 2 and a standard deviation of 3. Find the mean and standard error of the sampling distribution of X̄ (the sample mean). (6+7+7)

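For a random sample of size n drawn from a population with mean μ and standard deviation σ, the sampling distribution of the sample mean X̄ has

Mean: μ_X̄ = μ

Standard error: σ_X̄ = σ/√n

These results do not depend on the shape of the population, so the negative skewness does not change them (and with n = 36 the central limit theorem makes the sampling distribution of X̄ approximately normal).

Here μ = 2, σ = 3 and n = 36, hence;

Mean of X̄ = μ = 2

SE = σ/√n = 3/√36 = 3/6 = 0.5

When σ is unknown, SE is estimated from the sample standard deviation S as SE = S/√n. For example, for the data 5, 10, 12, 15, 20:

Mean = (5+10+12+15+20)/5 = 62/5 = 12.4

S = √[Σ(x − x̄)²/(n − 1)] = √[(54.76 + 5.76 + 0.16 + 6.76 + 57.76)/4] = √(125.2/4) = √31.3 ≈ 5.59

SE = S/√n = 5.59/√5 ≈ 2.50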
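A short Python check of both calculations. The first part uses the population values given in the question (μ = 2, σ = 3, n = 36); the second estimates the standard error from the small illustrative data set 5, 10, 12, 15, 20 using the sample standard deviation with divisor n − 1, which is one common convention and is assumed here.

```python
import statistics
from math import sqrt

# Sampling distribution of the sample mean: mean = mu, standard error = sigma / sqrt(n)
mu, sigma, n = 2, 3, 36
mean_of_xbar = mu
se_of_xbar = sigma / sqrt(n)
print(f"mean of X-bar = {mean_of_xbar}, SE = {se_of_xbar}")

# When sigma is unknown, estimate SE from a sample: SE = S / sqrt(n)
data = [5, 10, 12, 15, 20]
s = statistics.stdev(data)        # sample SD with divisor n - 1
se_hat = s / sqrt(len(data))
print(f"mean = {statistics.fmean(data)}, S = {s:.2f}, SE = {se_hat:.2f}")
```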

 

Q.3: a) What is meant by estimation? Distinguish between a point estimate and an interval estimate. Why is an interval estimate more useful?

Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is derived from the best information available.[1] Typically, estimation involves "using the value of a statistic derived from a sample to estimate the value of a corresponding population parameter".[2] The sample provides information that can be projected, through various formal or informal processes, to determine a range most likely to describe the missing information. An estimate that turns out to be incorrect will be an overestimate if the estimate exceeds the actual result[3] and an underestimate if the estimate falls short of the actual result.

 

Point Estimation

Anna surveys 100 random people out of her town about building a dog park. This is known as a sample. A sample is a part of a population used to describe the whole group. Anna will use the data that she gathers from this sample to describe the population of the town. The data Anna gathers will be statistics.

A statistic is a characteristic of a sample used to infer information about the population. For example, Anna may include questions on her survey such as age and number of pets. If the mean age for her sample is 32, then 32 is a sample statistic. She might infer that the average age of the people in town is also 32.

A point estimate is a single value, a sample statistic, used to estimate a population parameter. The sample mean age of 32, for example, can be used as a point estimate of the mean age of the whole town.

Point estimation therefore gives a single value as the estimate of a population parameter. Let's discuss populations, parameters, and their relationship to point and interval estimation.

 

Interval Estimation

After Anna collected her data from the survey, she can now draw inferences from her sample statistics about the population. A population is all members of a specified group. In this case, the population is literally the population of a town. However, a population can mean many things. For example, if you are doing research about college students, then your population would be all college students. If your research is about high school athletes, then your population would be all high school athletes. It's mostly up to the researcher to define the population.

Once sample information is gathered, the researcher can make inferences about a parameter of the population. A parameter is a characteristic that describes a population, and a sample statistic can be used to estimate it. For example, Anna found that the households of the 100 people she surveyed own an average of two pets. This is a sample statistic. Anna can use it to estimate that households in the whole town also own an average of two pets: the true town-wide average is the population parameter, and the value 2 is her estimate of it. Because this estimate is a single value, it is a point estimate.

However, what about the people that own more than two pets or no pets? Can 100 people really show us the characteristics of an entire town? This is where interval estimation comes in handy. An interval estimate is a range of values, formed from a point estimate plus or minus a margin of error, within which the population parameter is expected to lie with a stated level of confidence. Because a sample never represents the population perfectly, an interval estimate is more useful than a point estimate: it gives a range rather than a single value and shows how precise the estimate is.

 

b) A poll is taken among the residents of a city and the surrounding county to determine the feasibility of a proposal to construct a civic center. If 2400 of 5000 city residents favor the proposal and 1200 of 2000 county residents favor it, find a 95% confidence interval for the true difference in the proportions favoring the proposal to construct the civic center. (10+10)
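A minimal Python sketch of the usual large-sample confidence interval for a difference of two proportions, (p̂₁ − p̂₂) ± z·√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), applied to this poll. It also illustrates the point/interval distinction from part a): p̂₁ − p̂₂ is the point estimate, and the ± term turns it into an interval estimate. The function name is hypothetical; `statistics.NormalDist` supplies the z value.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample (Wald) confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    point = p1 - p2
    return point, (point - z * se, point + z * se)


# City: 2400 of 5000 in favour; surrounding area: 1200 of 2000 in favour
point, (low, high) = diff_of_proportions_ci(2400, 5000, 1200, 2000)
print(f"point estimate = {point:.3f}, 95% CI = ({low:.4f}, {high:.4f})")
```

With p̂₁ = 0.48 and p̂₂ = 0.60 the interval is roughly −0.146 < p₁ − p₂ < −0.094. Because it lies entirely below zero, the data suggest that a smaller proportion of city residents than of residents of the surrounding area favour the proposal.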

Q.4: a) Why is the z-test usually inappropriate as a test statistic when the sample size is small?

The z-test statistic z = (x̄ − μ)/(σ/√n) follows the standard normal distribution only when the population standard deviation σ is known (and the population is normal), or when the sample is large enough for the central limit theorem to apply and for the sample standard deviation s to be a reliable substitute for σ. In most practical problems σ is unknown. With a small sample (commonly taken as n < 30), s varies considerably from sample to sample, so the statistic (x̄ − μ)/(s/√n) follows Student's t distribution with n − 1 degrees of freedom rather than the standard normal distribution. The t distribution has heavier tails, so using z critical values (for example ±1.96 at the 5% level) would reject a true null hypothesis more often than the stated significance level. For small samples from an approximately normal population, the t-test should therefore be used instead of the z-test.
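The effect can be seen in a small Monte Carlo sketch, written under stated assumptions: samples are drawn from a normal population whose true mean really is 0, the unknown σ is replaced by the sample SD s, and H0: μ = 0 is rejected whenever |z| > 1.96. A correct 5% test should reject about 5% of the time. The function name, the sample sizes and the number of repetitions are arbitrary choices for illustration.

```python
import random
from math import sqrt


def z_test_false_rejection_rate(n, reps=10_000, z_crit=1.96, seed=1):
    """Share of samples in which a nominal 5% two-sided z-test (using s for sigma)
    rejects the true null hypothesis mu = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD
        z = xbar / (s / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


for n in (5, 100):
    print(f"n = {n:3d}: false rejection rate = {z_test_false_rejection_rate(n):.3f}")
```

For n = 5 the rate should come out near 12%, in line with P(|T| > 1.96) for a t distribution with 4 degrees of freedom, while for n = 100 it should be close to the nominal 5%.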

 

b)  A manufacturer of house dresses sent out advertising by mail. He sent samples of material to each of two groups of 1000 women. For one group he enclosed white return envelopes and for the other group blue envelopes. He received orders from 9% and 12% of the groups respectively. Is it quite certain that the blue envelopes will help sales?

          Use α = 0.05.  (10+10)

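Hypothesis:

H0: p1 = p2 (the envelope colour has no effect on the proportion of orders)

H1: p1 ≠ p2

Given that,

n1 = 1000, p1 = 9% = 0.09 (white envelopes)

n2 = 1000, p2 = 12% = 0.12 (blue envelopes)

Pooled proportion:

\hat{p} = \frac{n_1p_1+n_2p_2}{n_1+n_2} = \frac{1000 \times 0.09+1000 \times 0.12}{1000+1000} = 0.105

Test statistic:

z = \frac{p_1 - p_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{0.09 - 0.12}{\sqrt{0.105 \times 0.895 \times \frac{2}{1000}}} \approx -2.19

P-value = 2*P(z < -2.19) = 2 * 0.0143 ≈ 0.029

Decision rule: If the p-value is less than the level of significance, reject H0; otherwise fail to reject it.

Here the p-value is less than 0.05, hence reject H0.

The proportions of orders differ significantly at the 5% level of significance, so we conclude that the colour of the envelope has an effect on sales. Since the order rate is higher for the blue envelopes, the blue envelopes appear to help sales (a one-tailed test of H1: p1 < p2 gives p ≈ 0.014, leading to the same conclusion).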
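A minimal Python sketch of the pooled two-proportion z-test used above, with the orders expressed as counts (9% of 1000 = 90 white-envelope orders, 12% of 1000 = 120 blue-envelope orders). The function name is hypothetical; it returns both the two-sided p-value and the one-sided p-value for H1: p1 < p2, since the question can be read either way.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 = p2. Returns (z, two-sided p, one-sided p for H1: p1 < p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    std_normal = NormalDist()
    p_two_sided = 2 * std_normal.cdf(-abs(z))
    p_less = std_normal.cdf(z)
    return z, p_two_sided, p_less


# White envelopes: 90 orders out of 1000; blue envelopes: 120 orders out of 1000
z, p_two, p_one = two_proportion_z_test(90, 1000, 120, 1000)
print(f"z = {z:.3f}, two-sided p = {p_two:.4f}, one-sided p = {p_one:.4f}")
```

Both p-values fall below α = 0.05.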

 

Q.5: a) An electric company claimed that at least 85% of the parts it supplied conformed to specifications. A sample of 400 parts was tested and 75 did not meet specifications. Can we accept the company's claim at the 0.05 level of significance?

Ho: p >= 0.85 (claim)

Ha: p < 0.85

-----

Parts meeting specifications: 400 − 75 = 325

p-hat = 325/400 = 0.8125

z = (0.8125 − 0.85)/sqrt[0.85*0.15/400] = −0.0375/0.01785

= −2.10

--------------

p-value = P(z < −2.10) ≈ 0.018

------

Conclusion: Since the p-value is less than the 0.05 level of significance, reject Ho; the company's claim that at least 85% of the parts conform to specifications cannot be accepted.

 

b) In a random sample of 1000 houses in a certain city, 618 own color TV sets. Is this sufficient evidence to conclude that 2/3 of the houses in this city have color TV sets? Use α = 0.02.        (10+10)

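Null hypothesis is:

H0 : p = 2/3

Alternative hypothesis is:

H1 : p ≠ 2/3

Hypothesized population proportion: p0 = 2/3 ≈ 0.667

Sample size: n = 1000, x = 618

Sample proportion: \hat{p} = x/n = 618/1000 = 0.618

The test statistic is

z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.618 - 2/3}{\sqrt{\frac{(2/3)(1/3)}{1000}}}

z ≈ −3.26

Test statistic = −3.26

For a two-tailed test at α = 0.02 the critical values are ±2.33, and the p-value is 2·P(Z < −3.26) ≈ 0.001.

Since |z| = 3.26 > 2.33 (equivalently, the p-value is less than 0.02), we reject the null hypothesis.

So there is not sufficient evidence that 2/3 of the houses in this city have color TV sets; the sample proportion of 0.618 indicates that the true proportion is lower than 2/3.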
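Both Q.5 tests are one-sample z-tests for a proportion, with the standard error computed from the hypothesized value p₀. A minimal Python sketch covering both; the function name and the `alternative` argument are illustrative choices, not a library API.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """z-test of H0: p = p0 using the normal approximation.

    alternative: "less" (H1: p < p0), "greater" (H1: p > p0) or "two-sided".
    """
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under H0 uses p0, not p_hat
    z = (p_hat - p0) / se
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * std_normal.cdf(-abs(z))
    return z, p_value


# Q.5(a): 400 - 75 = 325 conforming parts; H0: p >= 0.85 against H1: p < 0.85
z_a, p_a = one_proportion_z_test(325, 400, 0.85, alternative="less")

# Q.5(b): 618 of 1000 houses; H0: p = 2/3 against H1: p != 2/3
z_b, p_b = one_proportion_z_test(618, 1000, 2 / 3)

print(f"Q.5(a): z = {z_a:.2f}, p-value = {p_a:.4f}")
print(f"Q.5(b): z = {z_b:.2f}, p-value = {p_b:.4f}")
```

For Q.5(a) this gives z ≈ −2.10 with p ≈ 0.018 (below 0.05), and for Q.5(b) z ≈ −3.26 with a two-sided p ≈ 0.001 (below 0.02), so H0 is rejected in both cases.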
