AIOU Hub: Statistics-I (394) - Autumn 2022

Statistics-I (394)

Q.1: a) Explain in detail the main aspects of a statistical problem. (20)

Four things make a problem statistical: the way in which you ask the question, the role and nature of the data, the particular ways in which you examine the data, and the types of interpretations you make from the investigation. A statistics problem typically contains four components:

Dear Student,

This is a sample assignment. It is an exact copy that other students also have. If you need to submit an assignment to the university, contact us to get a UNIQUE assignment:

0313-6483019

0334-6483019

0343-6244948

To stay updated on all university-related news, subscribe to our channel:

AIOU Hub

1. Ask a Question

Asking a question gets the process started. It’s important to ask a question carefully, with an understanding of the data you will use to find your answer.

2. Collect Data

Collecting data to help answer the question is an important step in the process. You obtain data by measuring something, so your measurement methods must be chosen with care. Sampling is one way to collect data; experimentation is another.

3. Analyze Data

Data must be organized, summarized, and represented properly in order to provide good answers to statistical questions. Also, the data you collect usually vary (i.e., they are not all the same), and you will need to account for the sources of this variation.

4. Interpret Results

After you analyze your data, you must interpret it in order to provide an answer — or answers — to the original question.

This four-step process for solving statistical problems is the foundation of all the activities in this course. You will become increasingly familiar with this process as you investigate different statistical problems.

b) Define the following terms:

i) Population and sample

A population is the entire group that you want to draw conclusions about.

A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population.

In research, a population doesn’t always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organizations, countries, species, organisms, etc.

In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects or a hypothetical and potentially infinite group of objects conceived as a generalization from experience.

ii) Parameter and statistic

A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).

The goal of quantitative research is to understand characteristics of populations by finding parameters. In practice, it’s often too difficult, time-consuming or unfeasible to collect data from every member of a population. Instead, data is collected from samples.

With inferential statistics, we can use sample statistics to make educated guesses about population parameters.
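The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The population of 1,000 exam scores below is hypothetical (generated at random purely for illustration), as are the seed and the sample size of 50.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up exam scores (an assumption for illustration).
random.seed(394)
population = [random.randint(0, 100) for _ in range(1000)]

# Parameter: a number describing the whole population.
population_mean = statistics.mean(population)

# Statistic: the same quantity computed from a sample of 50 members.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inferential statistics tries to quantify.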

Q.2 a) What is a histogram? What are the steps which you take to make a histogram for continuous grouped data?

A histogram is a graphical representation of data points organized into user-specified ranges. Similar in appearance to a bar graph, the histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins.

Histograms are commonly used in statistics to demonstrate how many of a certain type of variable occur within a specific range.

For example, a census focused on the demography of a town may use a histogram to show how many people are between the ages of 0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, and 71-80.

Suppose the numerals along the vertical axis of this histogram represent thousands of people. To read the histogram, start with the horizontal axis: beginning on the left, there are approximately 500 people in the town who are from less than one year old to 10 years old, 4,000 people who are 11 to 20 years old, and so on.

Histograms can be customized in several ways by analysts. They can change the interval between buckets. In the example referenced above, there are eight buckets with an interval of ten. This could be changed to four buckets with an interval of 20.

Another way to customize a histogram is to redefine the y-axis. The most basic label used is the frequency of occurrences observed in the data. However, one could also use percentage of total or density instead.
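The steps for building a histogram from continuous data can be sketched in Python as: choose class boundaries, count the frequency in each class, and draw one bar per class. The ages below are hypothetical (not census data), and the text bars stand in for a drawn chart; in a real histogram the bars are adjacent, with no gaps between classes.

```python
# Hypothetical ages (made up for illustration; not census data).
ages = [3, 7, 12, 15, 18, 19, 22, 25, 31, 34, 38, 41, 47, 52, 58, 63, 66, 74, 79]

# Step 1: choose class boundaries. Here: eight classes of width 10.
edges = list(range(0, 81, 10))
classes = list(zip(edges, edges[1:]))  # (lower, upper) pairs

# Step 2: count the observations in each class, using lower <= x < upper
# so that every continuous value belongs to exactly one class.
frequencies = [sum(low <= age < high for age in ages) for low, high in classes]

# Step 3: draw one bar per class, with bar length proportional to the frequency.
for (low, high), freq in zip(classes, frequencies):
    print(f"{low:>2}-{high:<2} | {'#' * freq} ({freq})")
```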

b) The following data gives the record of a company’s savings over the years. Draw a bar diagram to represent it.

Year	1950	1951	1952	1953	1954	1955	1956	1957
Rs.(000)	1010	2050	3458	1980	2300	1295	1520	1070
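A simple bar diagram for this data can be sketched as text bars in Python. The years and savings come from the table above; the scale of one `#` per Rs. 100 thousand is an assumption made only for this sketch.

```python
# Savings data taken from the question (Rs. in thousands).
years = [1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957]
savings = [1010, 2050, 3458, 1980, 2300, 1295, 1520, 1070]

# Assumed scale for this sketch: one '#' per Rs. 100 thousand, rounded.
SCALE = 100

def bar(value, scale=SCALE):
    """Return a text bar whose length is proportional to value."""
    return "#" * round(value / scale)

# One bar per year, in chronological order, with the value printed beside it.
for year, value in zip(years, savings):
    print(f"{year} | {bar(value):<35} {value}")
```

On paper, the same diagram is drawn with years on the horizontal axis, savings on the vertical axis, and bars of equal width separated by equal gaps.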

c) Give the merits and demerits of arithmetic mean. (20)

Merits of Arithmetic Mean:

1) The arithmetic mean is rigidly defined by an algebraic formula.

2) It is easy to calculate and simple to understand.

3) It is based on all observations of the given data.

4) It is capable of being treated mathematically; hence it is widely used in statistical analysis.

5) The arithmetic mean can be computed even if the detailed distribution is not known, provided the sum of the observations and the number of observations are known.

6) It is least affected by fluctuations of sampling.

7) The mean can be calculated for every kind of quantitative data.

Demerits of Arithmetic Mean:

1) It can be determined neither by inspection nor by graphical location.

2) The arithmetic mean cannot be computed for qualitative data such as data on intelligence, honesty, and smoking habits.

3) It is strongly affected by extreme observations and hence does not adequately represent data containing some extreme values.

4) The arithmetic mean cannot be computed when class intervals have open ends.

5) If any one of the observations is missing, the mean cannot be calculated.
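The effect of extreme observations on the mean can be seen in a small Python sketch. The income figures are hypothetical and chosen only to show the effect.

```python
import statistics

# Hypothetical monthly incomes (Rs. 000), made up for illustration.
incomes = [20, 22, 25, 27, 30]
with_outlier = incomes + [500]  # add one extreme observation

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(incomes)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single extreme value moves the mean from 24.8 to 104, while the median only moves from 25 to 26.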

Q.3 a) Find mean for the following distribution, where D= X-18 (20)

D	-12	-8	-4	0	4	8	12	16
F	2	5	8	18	22	13	8	4

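Since D = X - 18, each value is X = D + 18:

X = 6, 10, 14, 18, 22, 26, 30, 34

ΣfX = (6*2)+(10*5)+(14*8)+(18*18)+(22*22)+(26*13)+(30*8)+(34*4)

= 1696

Total frequency = 80

Mean = 1696/80

= 21.2

Equivalently, by the short-cut method, ΣfD = 256, so Mean = 18 + 256/80 = 18 + 3.2 = 21.2.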
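Because D = X - 18, the original values are X = D + 18, and the mean can be found either directly from X or by the short-cut formula Mean = 18 + ΣfD/Σf. A minimal Python check of both routes, using the table from the question (the variable names are illustrative):

```python
# Deviations D = X - 18 and frequencies F, taken from the question.
D = [-12, -8, -4, 0, 4, 8, 12, 16]
F = [2, 5, 8, 18, 22, 13, 8, 4]
A = 18  # origin used in the coding D = X - A

total_f = sum(F)                            # sum of frequencies
sum_fd = sum(f * d for f, d in zip(F, D))   # sum of f * D

# Short-cut method: mean = A + (sum of fD) / (sum of f)
mean_short = A + sum_fd / total_f

# Direct method: recover X = D + A, then mean = (sum of fX) / (sum of f)
X = [d + A for d in D]
mean_direct = sum(f * x for f, x in zip(F, X)) / total_f

print(total_f, sum_fd, mean_short, mean_direct)
```

Both routes give a mean of 21.2.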

b) Reciprocals of x are given below;

0.0267, 0.0235, 0.0211, 0.0191, 0.0174, 0.0160, 0.0148

Calculate Harmonic mean of the data.

The harmonic mean, like the arithmetic mean and the geometric mean, is a type of average, a measure of central tendency.

The harmonic mean is calculated with the following formula:

$H = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}}$

where n is the total number of values and x_i (x_1, x_2, ..., x_n) are the individual numbers in the data set. The formula is equivalent to:

$H = \frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+\frac{1}{x_3}+\cdots+\frac{1}{x_n}}$

In words: The reciprocal of the arithmetic mean of the reciprocals.

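The values given in the question are already the reciprocals 1/x, so their sum is the denominator of the formula:

Σ(1/x) = 0.0267 + 0.0235 + 0.0211 + 0.0191 + 0.0174 + 0.0160 + 0.0148 = 0.1386

n = 7

Harmonic mean (H) = 7/0.1386 ≈ 50.51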
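Since the question supplies the reciprocals 1/x directly, the harmonic mean is n divided by their sum. A short Python check under that reading of the question:

```python
# Reciprocals of x, taken from the question (these are the 1/x values).
reciprocals = [0.0267, 0.0235, 0.0211, 0.0191, 0.0174, 0.0160, 0.0148]

n = len(reciprocals)
sum_recip = sum(reciprocals)   # sum of 1/x

# H = n / sum(1/x)
harmonic_mean = n / sum_recip

print(round(sum_recip, 4), round(harmonic_mean, 2))
```

The sum of the reciprocals is 0.1386, so H = 7/0.1386, which is about 50.51.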

c) Explain the factors which we consider in selection of suitable measure of central tendency.

Mean is generally considered the best measure of central tendency and the most frequently used one. However, there are some situations where the other measures of central tendency are preferred.

Median is preferred to mean when:

There are a few extreme scores in the distribution.

Some scores have undetermined values.

There is an open-ended distribution.

Data are measured on an ordinal scale.

Mode is the preferred measure when data are measured on a nominal scale. Geometric mean is the preferred measure of central tendency when data are measured on a logarithmic scale.

Q.4: a) What is meant by skewness and kurtosis? What aspects of the frequency curve are measured by them?

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.

Skewness

It is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in data distribution.

It differentiates extreme values in one versus the other tail. A symmetrical distribution will have a skewness of 0.

There are two types of Skewness: Positive and Negative

Positive Skewness is when the tail on the right side of the distribution is longer or fatter than the tail on the left side. The mean and median will be greater than the mode.

Negative Skewness is when the tail of the left side of the distribution is longer or fatter than the tail on the right side. The mean and median will be less than the mode.

So, when is the skewness too much?

The rule of thumb seems to be:

If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.

If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.

If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.
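This rule of thumb can be written as a small Python function. The function name is illustrative, and the treatment of the exact boundary values (for example, a skewness of exactly 0.5) is an assumption, since the rule does not say which band they belong to.

```python
def describe_skewness(skew):
    """Classify a skewness value using the rule of thumb above.

    Assumption: boundary values (+/-0.5 and +/-1) fall in the milder band.
    """
    if -0.5 <= skew <= 0.5:
        return "fairly symmetrical"
    direction = "positively" if skew > 0 else "negatively"
    if -1 <= skew <= 1:
        return f"moderately skewed ({direction})"
    return f"highly skewed ({direction})"

for value in (0.2, -0.8, 1.7):
    print(value, "->", describe_skewness(value))
```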

Example

Let us take a very common example of house prices. Suppose we have house values ranging from $100k to $1,000,000 with the average being $500,000.

If the peak of the distribution were left of the average value, the distribution would be positively skewed. It would mean that many houses were being sold for less than the average value, i.e. $500k. This could be for many reasons, but we are not going to interpret those reasons here.

If the peak of the distributed data were right of the average value, that would mean a negative skew: many houses were being sold for more than the average value.

Kurtosis

Kurtosis is all about the tails of the distribution, not the peakedness or flatness. It describes the extreme values in the tails, so it is effectively a measure of the outliers present in the distribution.

High kurtosis in a data set is an indicator that the data have heavy tails or outliers. If there is high kurtosis, we need to investigate why we have so many outliers. It can indicate many things, such as wrong data entry. Investigate!

Low kurtosis in a data set is an indicator that the data have light tails or a lack of outliers. If we get low kurtosis (too good to be true), we also need to investigate and trim the dataset of unwanted results.

Mesokurtic: This distribution has kurtosis statistic similar to that of the normal distribution. It means that the extreme values of the distribution are similar to that of a normal distribution characteristic. This definition is used so that the standard normal distribution has a kurtosis of three.

Leptokurtic (Kurtosis > 3): The distribution is longer and its tails are fatter. The peak is higher and sharper than Mesokurtic, which means that the data are heavy-tailed or have a profusion of outliers.

Outliers stretch the horizontal axis of the histogram graph, which makes the bulk of the data appear in a narrow (“skinny”) vertical range, thereby giving the “skinniness” of a leptokurtic distribution.

Platykurtic (Kurtosis < 3): The distribution is shorter and its tails are thinner than those of the normal distribution. The peak is lower and broader than Mesokurtic, which means that the data are light-tailed or lack outliers.

The reason is that the extreme values are less extreme than those of the normal distribution.
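Both measures can be computed from central moments. The sketch below uses the moment-based coefficients, skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2 (so that a normal distribution has kurtosis 3); other definitions exist, and the two small data sets are hypothetical.

```python
def central_moment(data, r):
    """r-th moment about the mean: average of (x - mean) ** r."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)

def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2 ** 2 (3 for a normal curve)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Hypothetical data sets, made up for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

The symmetric set has skewness 0 and kurtosis 1.7 (platykurtic), while the set with one large value in the right tail has positive skewness.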

b) What are moments about mean and about an arbitrary value? Give the relation between them.

The shape of any distribution can be described by its various ‘moments’. The first four are:

1) The mean, which indicates the central tendency of a distribution.

2) The second moment is the variance, which indicates the width or deviation.

3) The third moment is the skewness, which indicates any asymmetric ‘leaning’ to either left or right.

4) The fourth moment is the kurtosis, which indicates the degree of central ‘peakedness’ or, equivalently, the ‘fatness’ of the outer tails.

The moments are mathematically interrelated and related to other moments.

All have the same assumptions.

They provide the standard measures of skewness and kurtosis.

They provide sufficient information to reconstruct a frequency distribution function.
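The two kinds of moments and the relation between them can be stated as follows. The symbols are notation chosen here, not taken from the text above: A is the arbitrary value, $\mu'_r$ is the r-th moment about A, $\mu_r$ is the r-th moment about the mean, and $N = \sum f_i$.

$\mu'_r = \frac{1}{N}\sum f_i (x_i - A)^r, \qquad \mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$

Since $\bar{x} = A + \mu'_1$, expanding $(x_i - \bar{x})^r$ gives the standard relations:

$\mu_1 = 0$

$\mu_2 = \mu'_2 - (\mu'_1)^2$

$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$

$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$

Here $\mu_2$ is the variance, and $\mu_3$ and $\mu_4$ are the moments used in the coefficients of skewness and kurtosis.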

c) Compute median and mean deviation from median for the data given below: (20)

X	6	8	10	12	14	16	18	20	22
F	5	10	18	20	22	14	7	3	1

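Given data: the values of X with their frequencies F, so each value must be counted F times.

Total frequency N = 5+10+18+20+22+14+7+3+1 = 100

Cumulative frequencies: 5, 15, 33, 53, 75, 89, 96, 99, 100

Finding the median:

The values of X are already in ascending order: 6,8,10,12,14,16,18,20,22

Median = (N + 1)/2 th observation

= (100 + 1)/2

= 50.5th observation

The cumulative frequency first reaches 50.5 at X = 12 (cumulative frequency 53), so both the 50th and 51st observations are 12.

Thus, median M = 12

The absolute values of the respective deviations from the median, i.e., |xi − M| are:

|6-12|, |8-12|, |10-12|, |12-12|, |14-12|, |16-12|, |18-12|, |20-12|, |22-12|

= 6, 4, 2, 0, 2, 4, 6, 8, 10

As we know,

Mean deviation about the median = Σf|xi − M| / N

= (5*6 + 10*4 + 18*2 + 20*0 + 22*2 + 14*4 + 7*6 + 3*8 + 1*10)/100

= 282/100

= 2.82

Therefore, the mean deviation about the median for the given data is 2.82.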
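For a frequency distribution, each value of X counts F times, so the median is located through cumulative frequencies and the mean deviation is Σf|x − M| / Σf, with absolute (never signed) deviations. A minimal Python sketch with the data from the question (the function names are illustrative):

```python
# Values and frequencies taken from the question.
X = [6, 8, 10, 12, 14, 16, 18, 20, 22]
F = [5, 10, 18, 20, 22, 14, 7, 3, 1]

def kth_value(values, freqs, k):
    """Return the k-th observation (1-based) using cumulative frequencies."""
    cumulative = 0
    for value, freq in zip(values, freqs):
        cumulative += freq
        if cumulative >= k:
            return value
    raise ValueError("k exceeds the total frequency")

def grouped_median(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    n = sum(freqs)
    if n % 2 == 1:
        return kth_value(values, freqs, (n + 1) // 2)
    # Even total: average the two middle observations.
    return (kth_value(values, freqs, n // 2) + kth_value(values, freqs, n // 2 + 1)) / 2

def mean_deviation(values, freqs, center):
    """Frequency-weighted mean of the absolute deviations from center."""
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / sum(freqs)

median = grouped_median(X, F)
md = mean_deviation(X, F, median)
print(median, round(md, 2))
```

With these data the median is 12 and the mean deviation from the median is 2.82.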

Q.5: a) Define weighted and unweighted index number and explain why weighted Index numbers are preferred over unweighted index numbers.

In an unweighted index number, all commodities are given equal importance. In general, however, all commodities cannot be given equal importance. When commodities are not of equal importance, we assign a weight to each commodity according to its importance, and the index number computed from these weights is called a weighted index number. The weights can be production or consumption values. If ‘w’ is the weight attached to a commodity, the price index is computed from the weighted prices of the base year and the current year.

Let us consider the following notations,

p₁ - current year price

p₀ - base year price

q₁ - current year quantity

q₀ - base year quantity

where suffix ‘0’ represents base year and ‘1’ represents current year.
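With this notation, one common form of the weighted price index is the weighted aggregative index, P01 = (Σp₁w / Σp₀w) × 100; taking w = q₀ gives Laspeyres' index and w = q₁ gives Paasche's index. The simple (unweighted) aggregative index is (Σp₁ / Σp₀) × 100. The Python sketch below compares the two; the prices and quantities are hypothetical, and the function names are illustrative.

```python
# Hypothetical data for three commodities (made up for illustration).
p0 = [2.0, 5.0, 10.0]   # base year prices
p1 = [3.0, 5.5, 10.0]   # current year prices
q0 = [100, 20, 5]       # base year quantities, used here as weights

def simple_aggregative(p0, p1):
    """Unweighted index: every commodity's price counts equally."""
    return sum(p1) / sum(p0) * 100

def weighted_aggregative(p0, p1, w):
    """Weighted index: each price is multiplied by its commodity's weight."""
    numerator = sum(p * wi for p, wi in zip(p1, w))
    denominator = sum(p * wi for p, wi in zip(p0, w))
    return numerator / denominator * 100

unweighted = simple_aggregative(p0, p1)
laspeyres = weighted_aggregative(p0, p1, q0)   # w = q0
print(round(unweighted, 2), round(laspeyres, 2))
```

In this made-up example the first commodity is bought in the largest quantity and has the largest price rise, so the weighted index (about 131.43) is well above the unweighted one (about 108.82). This is the usual argument for preferring weighted index numbers: they reflect the relative importance of each commodity.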

Unweighted indexes are rare, as most indexes are based on market capitalizations, whereby companies with larger market caps are accorded higher index weights than companies with lower market caps.

The most prominent of the unweighted stock indexes is the S&P 500 Equal Weight Index (EWI), which is the unweighted version of the widely-used S&P 500 Index. The S&P 500 EWI includes the same constituents as the capitalization-weighted S&P 500 Index, but each of the 500 companies is allocated a fixed percentage weight of 0.2%.

Implications for Index Funds and ETFs

Passive fund managers construct index funds or exchange-traded funds (ETFs) based on leading indexes such as the S&P 500 Index, which is a weighted index.

Most choose to base their investment vehicles on market capitalization-weighted indexes, which means they must buy more of the stocks that are rising in value to match the index, or sell more of the stocks that are declining in value. This can create a circular situation of momentum where an increase in a stock's value leads to more buying of the stock, which will add to the upward pressure on the price. The reverse is also true on the downside.

An index fund or ETF structured on an unweighted index, on the other hand, sticks to equal allocations among the components of an index. In the case of the S&P 500 Equal Weight Index, the fund manager would periodically rebalance investment amounts so that each is 0.2% of the total.

Is Unweighted or Weighted Better?

One type of index isn't necessarily better than another; they just show different things. The weighted index shows performance typically by market capitalization, while the unweighted index reflects unweighted performance across the index's components.

One of the pitfalls of a weighted index is that returns will be based largely on the most heavily weighted components, and the smaller component returns may be hidden or have little effect. This could mean that most of the stocks in the S&P 500, for example, are actually declining even though the index is rising because the stocks with the most weight are rising while most of the stocks with little weight are falling.

The flip side of this argument is that smaller companies come and go, and therefore they shouldn't be given as much weight as the large companies with a much larger shareholder base.

An unweighted or equal weight index reflects how a whole pool of stocks is doing. It may be a better index for an investor who isn't investing in the most heavily weighted stocks of a weighted index, or is more interested in whether most stocks are moving higher or lower. The unweighted index does a better job of showing this than a weighted index.

In terms of performance, sometimes an unweighted index outperforms the weighted index, and other times the reverse is true. When deciding which is a better index to track or mimic, look at the performance and volatility of both to assess which is the better option.

b) Construct chain indices for the following years, taking 1940 as base. (20)

Item	1940	1941	1942	1943	1944
Wheat	2.80	3.40	3.60	4.00	4.20
Rice	2.95	3.60	2.90	2.75	2.75
Maize	3.10	3.50	3.40	4.50	3.70

Each link relative is the price of a year as a percentage of the previous year's price, and each chain index is the previous chain index multiplied by the current link relative and divided by 100 (1940 = 100).

Item	1940	1941	1942	1943	1944
Wheat	2.8	3.4	3.6	4	4.2
Link Relatives	100	121.4286	105.8824	111.1111	105
Chain Indices	100	121.4286	128.5714	142.8571	150
Rice	2.95	3.6	2.9	2.75	2.75
Link Relatives	100	122.0339	80.55556	94.82759	100
Chain Indices	100	122.0339	98.30508	93.22034	93.22034
Maize	3.1	3.5	3.4	4.5	3.7
Link Relatives	100	112.9032	97.14286	132.3529	82.22222
Chain Indices	100	112.9032	109.6774	145.1613	119.3548
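The chain-index calculation can be sketched in Python using the prices from the question. The per-item figures follow the table; the combined index at the end is an assumption, using the common textbook approach of averaging the link relatives of all items for each year before chaining.

```python
# Prices taken from the question (base year 1940).
years = [1940, 1941, 1942, 1943, 1944]
prices = {
    "Wheat": [2.80, 3.40, 3.60, 4.00, 4.20],
    "Rice":  [2.95, 3.60, 2.90, 2.75, 2.75],
    "Maize": [3.10, 3.50, 3.40, 4.50, 3.70],
}

def link_relatives(series):
    """Each year's price as a percentage of the previous year (base year = 100)."""
    return [100.0] + [cur / prev * 100 for prev, cur in zip(series, series[1:])]

def chain_indices(links):
    """Chain index = previous chain index * current link relative / 100."""
    chain = [100.0]
    for link in links[1:]:
        chain.append(chain[-1] * link / 100)
    return chain

# Per-item chain indices, as in the table.
per_item_chain = {item: chain_indices(link_relatives(s)) for item, s in prices.items()}

# Assumption: combine items by a simple average of link relatives per year.
all_links = [link_relatives(s) for s in prices.values()]
avg_links = [sum(column) / len(column) for column in zip(*all_links)]
combined_chain = chain_indices(avg_links)

for year, value in zip(years, combined_chain):
    print(year, round(value, 2))
```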


Thursday, October 13

Statistics-I (394) - Autumn 2022 - Assignment 1
