Statistics-I
(394)
Q.1: a) Explain in detail the main aspects of a statistical problem. (20)
Four things make a problem statistical: the way in which you ask the question, the role and nature of the data, the particular ways in which you examine the data, and the types of interpretations you make from the investigation. A statistics problem typically contains four components:
1. Ask a Question
Asking a question gets the process started. It’s important to ask a question carefully, with an understanding of the data you will use to find your answer.
2. Collect Data
Collecting data to help answer the question is an important step in the process. You obtain data by measuring something, so your measurement methods must be chosen with care. Sampling is one way to collect data; experimentation is another.
3. Analyze Data
Data must be organized, summarized, and represented properly in order to provide good answers to statistical questions. Also, the data you collect usually vary (i.e., they are not all the same), and you will need to account for the sources of this variation.
4. Interpret Results
After you analyze your data, you must interpret the results in order to provide an answer, or answers, to the original question.
This four-step process for solving statistical problems is the foundation of all the activities in this course. You will become increasingly familiar with this process as you investigate different statistical problems.
b) Define the following terms:
i) Population and sample
A population is the entire group that you want to draw conclusions about.
A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population.
In research, a population doesn’t always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organizations, countries, species, organisms, etc.
In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects or a hypothetical and potentially infinite group of objects conceived as a generalization from experience.
ii) Parameter and statistic
A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).
The goal of quantitative research is to understand characteristics of populations by finding parameters. In practice, it’s often too difficult, time-consuming or unfeasible to collect data from every member of a population. Instead, data is collected from samples.
With inferential statistics, we can use sample statistics to make educated guesses about population parameters.
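The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below (the integers 1 to 100) and the choice of every 10th member as the sample are assumptions made only for illustration; they are not part of the assignment.

```python
from statistics import mean

# Hypothetical population: the integers 1..100 (an assumption for illustration).
population = list(range(1, 101))

# Parameter: a number describing the whole population.
population_mean = mean(population)

# Statistic: the same kind of number, but computed from a sample.
# The sample here is every 10th member of the population.
sample = population[::10]
sample_mean = mean(sample)

print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", sample_mean)
```

The sample mean is close to, but not equal to, the population mean, which is why it is treated as an estimate of the parameter.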
Q.2 a) What is a histogram? What are the steps which you take to make histogram for continuous grouped data?
A histogram is a graphical representation of data points organized into user-specified ranges. Similar in appearance to a bar graph, the histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins.
Histograms are commonly used in statistics to demonstrate how many of a certain type of variable occur within a specific range.
For example, a census focused on the demography of a town may use a histogram to show how many people are between the ages of 0 - 10, 11 - 20, 21 - 30, 31 - 40, 41 - 50, 51 - 60, 61 - 70, and 71 - 80.
This histogram example would look similar to the chart below. Let's say the numerals along the vertical axis represent thousands of people. To read this histogram example, you can start with the horizontal axis and see that, beginning on the left, there are approximately 500 people in the town who are from less than one year old to 10 years old. There are 4,000 people in town who are 11 to 20 years old. And so on.
Histograms can be customized in several ways by analysts. They can change the interval between buckets. In the example referenced above, there are eight buckets with an interval of ten. This could be changed to four buckets with an interval of 20.
Another way to customize a histogram is to redefine the y-axis. The most basic label used is the frequency of occurrences observed in the data. However, one could also use percentage of total or density instead.
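For continuous grouped data, the usual steps are: choose class intervals that do not overlap, count the frequency of each class, mark the class boundaries on the horizontal axis, and draw adjacent rectangles (with no gaps) whose heights are proportional to the class frequencies when the class widths are equal. The following is a minimal sketch of the counting and drawing steps as a text histogram; the measurements, the starting boundary of 150, and the class width of 5 are hypothetical values chosen for illustration.

```python
# Hypothetical continuous measurements (e.g. heights in cm); not from the assignment.
data = [151.2, 153.8, 155.0, 157.4, 158.9, 160.1, 161.5, 162.3,
        163.0, 164.7, 165.2, 166.8, 168.4, 171.9, 174.5]

def class_frequencies(values, lower, width, n_classes):
    """Count values falling in the classes [lower, lower + width), and so on."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

lower, width, n_classes = 150, 5, 5  # assumed class layout
freqs = class_frequencies(data, lower, width, n_classes)

# One row per class; the bar length stands in for the rectangle height.
for i, f in enumerate(freqs):
    start = lower + i * width
    print(f"{start}-{start + width}: {'#' * f} ({f})")
```

Each value is counted in exactly one class because a class includes its lower boundary and excludes its upper boundary.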
b) The following data gives the record of a company’s savings over the years. Draw a bar diagram to represent it.
Year | 1950 | 1951 | 1952 | 1953 | 1954 | 1955 | 1956 | 1957
Rs. (000) | 1010 | 2050 | 3458 | 1980 | 2300 | 1295 | 1520 | 1070
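A bar diagram places the years on one axis and draws one bar per year with length proportional to the savings. The sketch below prints a horizontal text version using the figures from the table; the scale of one `#` per Rs. 100 thousand is an assumption chosen only so that the bars fit on a line.

```python
years = [1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957]
savings = [1010, 2050, 3458, 1980, 2300, 1295, 1520, 1070]  # Rs. (000), from the table

SCALE = 100  # assumption: one '#' represents Rs. 100 thousand

def bar(value, scale=SCALE):
    """Return a bar whose length is proportional to value."""
    return "#" * round(value / scale)

for year, value in zip(years, savings):
    print(f"{year} | {bar(value):<35} {value}")
```

The longest bar belongs to 1952 (Rs. 3458 thousand) and the shortest to 1950 (Rs. 1010 thousand).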
c) Give the merits and demerits of arithmetic mean. (20)
Merits of Arithmetic Mean:
1) The arithmetic mean is rigidly defined by an algebraic formula.
2) It is easy to calculate and simple to understand.
3) It is based on all observations of the given data.
4) It is capable of being treated mathematically; hence it is widely used in statistical analysis.
5) The arithmetic mean can be computed even if the detailed distribution is not known, provided the sum of the observations and the number of observations are known.
6) It is least affected by fluctuations of sampling.
7) It can be calculated for every kind of quantitative data.
Demerits of Arithmetic Mean:
1) It can be determined neither by inspection nor by graphical location.
2) The arithmetic mean cannot be computed for qualitative data such as intelligence, honesty, and smoking habits.
3) It is strongly affected by extreme observations and hence does not adequately represent data containing some extreme values.
4) The arithmetic mean cannot be computed when class intervals have open ends.
5) If any one of the observations is missing, the mean cannot be calculated.
Q.3 a) Find mean for the following distribution, where D = X - 18 (20)
D | -12 | -8 | -4 | 0 | 4 | 8 | 12 | 16
F | 2 | 5 | 8 | 18 | 22 | 13 | 8 | 4
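Since D = X - 18, we have X = D + 18:
X = 6, 10, 14, 18, 22, 26, 30, 34
ΣFX = (6*2)+(10*5)+(14*8)+(18*18)+(22*22)+(26*13)+(30*8)+(34*4)
= 1696
Total frequency ΣF = 80
Mean = 1696/80
= 21.2
Check by the short-cut method: ΣFD = -24 - 40 - 32 + 0 + 88 + 104 + 96 + 64 = 256, so Mean = 18 + 256/80 = 18 + 3.2 = 21.2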
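The arithmetic can be checked with a short Python sketch. It uses the D and F values from the table and the stated relation D = X - 18, so X is recovered as D + 18; the variable names are illustrative.

```python
d = [-12, -8, -4, 0, 4, 8, 12, 16]   # deviations D = X - 18, from the table
f = [2, 5, 8, 18, 22, 13, 8, 4]      # frequencies F, from the table
A = 18                               # assumed mean (origin of the deviations)

n = sum(f)
sum_fd = sum(fi * di for fi, di in zip(f, d))

# Short-cut method: mean = A + (sum of F*D) / (sum of F)
mean_shortcut = A + sum_fd / n

# Direct method on X = D + 18, as a cross-check
x = [di + A for di in d]
mean_direct = sum(fi * xi for fi, xi in zip(f, x)) / n

print(n, sum_fd, mean_shortcut, mean_direct)
```

Both methods give the same mean, because shifting every value by a constant shifts the mean by the same constant.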
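b) Reciprocals of x are given below:
0.0267, 0.0235, 0.0211, 0.0191, 0.0174, 0.0160, 0.0148
Calculate Harmonic mean of the data.
The harmonic mean, like the arithmetic mean and the geometric mean, is a type of average, a measure of central tendency.
The harmonic mean is calculated with the following formula:
\[ H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
where n is the total number of values and x_i (x_1, x_2, ..., x_n) are the individual numbers in the data set. The formula is equivalent to:
\[ H = \left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i} \right)^{-1} \]
In words: the reciprocal of the arithmetic mean of the reciprocals.
The values given in the question are already the reciprocals 1/x, so only their sum is needed:
Σ(1/x) = 0.0267 + 0.0235 + 0.0211 + 0.0191 + 0.0174 + 0.0160 + 0.0148 = 0.1386
n = 7
Harmonic mean (H) = 7/0.1386 ≈ 50.51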
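As a check, here is a minimal Python sketch that treats the seven given numbers as the reciprocals 1/x (as the question states) and divides n by their sum. The variable names are illustrative.

```python
# The question supplies the reciprocals 1/x directly.
reciprocals = [0.0267, 0.0235, 0.0211, 0.0191, 0.0174, 0.0160, 0.0148]

n = len(reciprocals)
sum_of_reciprocals = sum(reciprocals)

# Harmonic mean of x = n / sum(1/x)
harmonic_mean = n / sum_of_reciprocals

print(round(sum_of_reciprocals, 4))
print(round(harmonic_mean, 2))
```

Applying the harmonic mean formula to the reciprocals themselves (as if they were the x values) would instead give a number near 0.019, which is not the harmonic mean of x.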
c) Explain the factors which we consider in selection of suitable measure of central tendency.
Mean is generally considered the best measure of central tendency and the most frequently used one. However, there are some situations where the other measures of central tendency are preferred.
Median is preferred to mean when:
1) There are few extreme scores in the distribution.
2) Some scores have undetermined values.
3) There is an open ended distribution.
4) Data are measured in an ordinal scale.
Mode is the preferred measure when data are measured in a nominal scale. Geometric mean is the preferred measure of central tendency when data are measured in a logarithmic scale.
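The first factor, extreme scores, can be illustrated with Python's standard `statistics` module. The scores below are hypothetical; the point is only that one extreme value moves the mean a long way while the median barely changes.

```python
from statistics import mean, median

# Hypothetical test scores (an assumption for illustration).
scores = [10, 11, 12, 12, 13]
scores_with_outlier = scores + [200]  # one extreme score added

print(mean(scores), median(scores))
print(mean(scores_with_outlier), median(scores_with_outlier))
```

With the extreme score included, the mean no longer describes a typical score, so the median is the more suitable measure for such data.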
Q.4: a) What is meant by skewness and kurtosis? What aspects of the frequency curve are measured by them?
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.
Skewness
It is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in data distribution.
It differentiates extreme values in one versus the other tail. A symmetrical distribution will have a skewness of 0.
There are two types of skewness: positive and negative.
Positive skewness means the tail on the right side of the distribution is longer or fatter. The mean and median will be greater than the mode.
Negative skewness means the tail on the left side of the distribution is longer or fatter than the tail on the right side. The mean and median will be less than the mode.
So, when is the skewness too much? The rule of thumb seems to be:
If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.
If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.
Example
Let us take a very common example of house prices. Suppose we have house values ranging from $100,000 to $1,000,000 with the average being $500,000.
If the peak of the distribution were left of the average value, the distribution would be positively skewed. It would mean that many houses were being sold for less than the average value, i.e. $500,000. This could be for many reasons, but we are not going to interpret those reasons here.
If the peak of the distributed data were right of the average value, that would mean a negative skew. This would mean that many houses were being sold for more than the average value.
Kurtosis
Kurtosis is all about the tails of the distribution, not the peakedness or flatness. It describes how extreme the values in the tails are and is, in effect, a measure of the outliers present in the distribution.
High kurtosis in a data set is an indicator that the data have heavy tails or outliers. If the kurtosis is high, we need to investigate why there are so many outliers. The cause may be wrong data entry or something else, so investigate.
Low kurtosis in a data set is an indicator that the data have light tails or a lack of outliers. If the kurtosis is low (too good to be true), we also need to investigate and trim the dataset of unwanted results.
Mesokurtic (Kurtosis = 3): This distribution has a kurtosis statistic similar to that of the normal distribution. It means that the extreme values of the distribution are similar to those of a normal distribution. This definition is used so that the standard normal distribution has a kurtosis of three.
Leptokurtic (Kurtosis > 3): Distribution is longer, tails are fatter. The peak is typically higher and sharper than mesokurtic, which means that the data are heavy-tailed or have a profusion of outliers.
Outliers stretch the horizontal axis of the histogram graph, which makes the bulk of the data appear in a narrow (“skinny”) vertical range, thereby giving the “skinniness” of a leptokurtic distribution.
Platykurtic (Kurtosis < 3): Distribution is shorter, tails are thinner than the normal distribution. The peak is lower and broader than mesokurtic, which means that the data are light-tailed or lack outliers.
The reason for this is that the extreme values are less extreme than those of the normal distribution.
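The two measures described above can be computed from the moments about the mean. The sketch below assumes the moment-ratio definitions (skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, with divisor n), which is one common convention and is consistent with a normal distribution having kurtosis 3; the data sets and function names are hypothetical.

```python
def central_moment(values, k):
    """k-th moment about the mean, using divisor n."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** k for v in values) / n

def skewness(values):
    # Moment coefficient of skewness: m3 / m2^(3/2)
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Moment coefficient of kurtosis: m4 / m2^2 (a normal distribution gives 3)
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def describe_skewness(s):
    """Apply the rule of thumb quoted in the text."""
    if abs(s) <= 0.5:
        return "fairly symmetrical"
    if abs(s) <= 1:
        return "moderately skewed"
    return "highly skewed"

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_tailed = [1, 1, 1, 2, 10]    # hypothetical data with a long right tail

for data in (symmetric, right_tailed):
    s = skewness(data)
    print(data, round(s, 3), round(kurtosis(data), 3), describe_skewness(s))
```

The symmetric data set has skewness 0 and a kurtosis below 3 (platykurtic), while the data set with one large value has positive skewness greater than 1.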
b) What are moments about mean and about an arbitrary value? Give the relation between them.
The shape of any distribution can be described by its various ‘moments’. The first four are:
1) The first moment is the mean, which indicates the central tendency of a distribution.
2) The second moment about the mean is the variance, which indicates the width or deviation.
3) The third moment gives the skewness, which indicates any asymmetric ‘leaning’ to either left or right.
4) The fourth moment gives the kurtosis, which indicates the degree of central ‘peakedness’ or, equivalently, the ‘fatness’ of the outer tails.
Moments have the following properties:
1) They are mathematically interrelated and related to other moments.
2) All have the same assumptions.
3) They provide measures of skewness and kurtosis.
4) They provide sufficient information to reconstruct a frequency distribution function.
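To make the relation concrete: writing m'_r for the r-th moment about an arbitrary value A, that is Σ(x - A)^r / n, and m_r for the r-th moment about the mean, the standard relations are m_2 = m'_2 - (m'_1)^2, m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3, and m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4, while m_1 is always 0. The sketch below checks these relations numerically; the data values and the choice A = 5 are hypothetical.

```python
def moment_about(values, a, r):
    """r-th moment of the values about the point a, using divisor n."""
    return sum((v - a) ** r for v in values) / len(values)

def central_from_arbitrary(m1, m2, m3, m4):
    """Convert moments about an arbitrary value into moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

values = [2, 3, 7, 8, 10]  # hypothetical data
A = 5                      # hypothetical arbitrary origin

about_a = [moment_about(values, A, r) for r in (1, 2, 3, 4)]
converted = central_from_arbitrary(*about_a)

# Direct computation about the mean, for comparison
mean_value = sum(values) / len(values)
direct = tuple(moment_about(values, mean_value, r) for r in (2, 3, 4))

print(converted)
print(direct)
```

The converted moments match the directly computed ones, whichever value of A is chosen.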
c) Compute median and mean deviation from median for the data given below: (20)
X | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22
F | 5 | 10 | 18 | 20 | 22 | 14 | 7 | 3 | 1
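The data form a frequency distribution, so the frequencies F must be used.
Finding the median:
The values of X are already in ascending order: 6, 8, 10, 12, 14, 16, 18, 20, 22
Total frequency N = 5 + 10 + 18 + 20 + 22 + 14 + 7 + 3 + 1 = 100
Cumulative frequencies: 5, 15, 33, 53, 75, 89, 96, 99, 100
Median = value of the (N + 1)/2 th observation
= (100 + 1)/2
= 50.5th observation
The 50th and 51st observations both fall where the cumulative frequency first reaches 53, i.e. at X = 12.
Thus, median M = 12
The absolute values of the respective deviations from the median, i.e., |xi − M|, are:
|6-12|, |8-12|, |10-12|, |12-12|, |14-12|, |16-12|, |18-12|, |20-12|, |22-12|
= 6, 4, 2, 0, 2, 4, 6, 8, 10
As we know, mean deviation about the median = Σ F|xi − M| / N
= (5*6 + 10*4 + 18*2 + 20*0 + 22*2 + 14*4 + 7*6 + 3*8 + 1*10)/100
= 282/100
= 2.82
Therefore, the mean deviation about the median for the given data is 2.82.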
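The calculation can be checked with a short Python sketch that works directly from the X and F rows of the table. The function name is illustrative, and the median rule used is the usual one for a discrete frequency distribution (the average of the two middle observations when N is even).

```python
x = [6, 8, 10, 12, 14, 16, 18, 20, 22]   # values, from the table
f = [5, 10, 18, 20, 22, 14, 7, 3, 1]     # frequencies, from the table

def frequency_median(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    n = sum(freqs)
    # 1-based positions of the middle observation(s); equal when n is odd
    positions = [(n + 1) // 2, (n + 2) // 2]
    found = []
    for pos in positions:
        cumulative = 0
        for v, fr in zip(values, freqs):
            cumulative += fr
            if cumulative >= pos:
                found.append(v)
                break
    return sum(found) / 2

n = sum(f)
med = frequency_median(x, f)

# Mean deviation about the median: sum of F * |X - M| divided by N
mean_deviation = sum(fr * abs(v - med) for v, fr in zip(x, f)) / n

print(n, med, mean_deviation)
```

Absolute deviations are essential here: the signed deviations would partly cancel and understate the spread.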
Q.5: a) Define weighted and unweighted index number and explain why weighted Index numbers are preferred over unweighted index numbers.
An unweighted index number treats every commodity as equally important. In general, however, all commodities are not of equal importance, so we assign a weight to each commodity according to its importance, and the index number computed from these weights is called a weighted index number. The weights can be production or consumption values. Weighted index numbers are preferred because they reflect the relative importance of each commodity. If ‘w’ is the weight attached to a commodity, then the price index is given by
\[ P_{01} = \frac{\sum p_1 w}{\sum p_0 w} \times 100 \]
Let us consider the following notations:
p1 - current year price
p0 - base year price
q1 - current year quantity
q0 - base year quantity
where suffix ‘0’ represents base year and ‘1’ represents current year.
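The effect of weighting can be sketched with a small Python example. The prices and quantities below are hypothetical, the unweighted index is taken as the simple aggregate Σp1/Σp0 × 100, and the weights w are assumed to be the base-year quantities q0 (a Laspeyres-type choice); other weights could equally be used.

```python
# Hypothetical data for three commodities (assumptions for illustration).
p0 = [10.0, 20.0, 5.0]   # base year prices
p1 = [12.0, 22.0, 8.0]   # current year prices
q0 = [50, 10, 5]         # base year quantities, used here as the weights w

# Unweighted (simple aggregate) price index: every commodity counts equally.
unweighted_index = sum(p1) / sum(p0) * 100

# Weighted aggregate price index: sum(p1 * w) / sum(p0 * w) * 100
weighted_index = (sum(p * w for p, w in zip(p1, q0))
                  / sum(p * w for p, w in zip(p0, q0)) * 100)

print(round(unweighted_index, 2))
print(round(weighted_index, 2))
```

The two indices differ because the third commodity has the largest price rise but the smallest quantity, so it counts for less once weights are applied.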
Unweighted indexes are rare, as most indexes are based on market capitalizations, whereby companies with larger market caps are accorded higher index weights than companies with lower market caps.
The most prominent of the unweighted stock indexes is the S&P 500 Equal Weight Index (EWI), which is the unweighted version of the widely-used S&P 500 Index. The S&P 500 EWI includes the same constituents as the capitalization-weighted S&P 500 Index, but each of the 500 companies is allocated a fixed percentage weight of 0.2%.
Implications for Index Funds and ETFs
Passive fund managers construct index funds or exchange-traded funds (ETFs) based on leading indexes such as the S&P 500 Index, which is a weighted index.
Most choose to model their investment vehicles on market capitalization-weighted indexes, which means they must buy more of the stocks that are rising in value to match the index, or sell more of the stocks that are declining in value. This can create a circular situation of momentum where an increase in a stock's value leads to more buying of the stock, which will add to the upward pressure on the price. The reverse is also true on the downside.
An index fund or ETF structured on an unweighted index, on the other hand, sticks to equal allocations among the components of an index. In the case of the S&P 500 Equal Weight Index, the fund manager would periodically rebalance investment amounts so that each is 0.2% of the total.
Is Unweighted or Weighted Better?
One type of index isn't necessarily better than another; they just show different things. The weighted index shows performance typically by market capitalization, while the unweighted index reflects unweighted performance across the index's components.
One of the pitfalls of a weighted index is that returns will be based largely on the most heavily weighted components, and the smaller component returns may be hidden or have little effect. This could mean that most of the stocks in the S&P 500, for example, are actually declining even though the index is rising because the stocks with the most weight are rising while most of the stocks with little weight are falling.
The flip side of this argument is that smaller companies come and go, and therefore they shouldn't be given as much weight as the large companies with a much larger shareholder base.
An unweighted or equal weight index reflects how a whole pool of stocks is doing. It may be a better index for an investor who isn't investing in the most heavily weighted stocks of a weighted index, or is more interested in whether most stocks are moving higher or lower. The unweighted index does a better job of showing this than a weighted index.
In terms of performance, sometimes an unweighted index outperforms the weighted index, and other times the reverse is true. When deciding which is a better index to track or mimic, look at the performance and volatility of both to assess which is the better option.
b) Construct chain indices for the following years, taking 1940 as base. (20)
Item / Year | 1940 | 1941 | 1942 | 1943 | 1944
Wheat | 2.80 | 3.40 | 3.60 | 4.00 | 4.20
Rice | 2.95 | 3.60 | 2.90 | 2.75 | 2.75
Maize | 3.10 | 3.50 | 3.40 | 4.50 | 3.70
Link relative = (price of current year / price of preceding year) × 100, with the base year 1940 taken as 100. Chain index = (link relative of current year × chain index of preceding year) / 100.
Year | 1940 | 1941 | 1942 | 1943 | 1944
Wheat | 2.80 | 3.40 | 3.60 | 4.00 | 4.20
Link Relatives | 100 | 121.4286 | 105.8824 | 111.1111 | 105.0000
Chain Indices | 100 | 121.4286 | 128.5714 | 142.8571 | 150.0000
Rice | 2.95 | 3.60 | 2.90 | 2.75 | 2.75
Link Relatives | 100 | 122.0339 | 80.5556 | 94.8276 | 100.0000
Chain Indices | 100 | 122.0339 | 98.3051 | 93.2203 | 93.2203
Maize | 3.10 | 3.50 | 3.40 | 4.50 | 3.70
Link Relatives | 100 | 112.9032 | 97.1429 | 132.3529 | 82.2222
Chain Indices | 100 | 112.9032 | 109.6774 | 145.1613 | 119.3548
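The link relatives and chain indices in the table can be reproduced with a short Python sketch. It uses the prices from the question and computes the indices separately for each item, as the table does; the function names are illustrative.

```python
years = [1940, 1941, 1942, 1943, 1944]
prices = {
    "Wheat": [2.80, 3.40, 3.60, 4.00, 4.20],
    "Rice":  [2.95, 3.60, 2.90, 2.75, 2.75],
    "Maize": [3.10, 3.50, 3.40, 4.50, 3.70],
}

def link_relatives(series):
    """Each year's price as a percentage of the preceding year; base year = 100."""
    return [100.0] + [cur / prev * 100 for prev, cur in zip(series, series[1:])]

def chain_indices(links):
    """Chain each link relative onto the previous chain index; base year = 100."""
    chain = [100.0]
    for lr in links[1:]:
        chain.append(chain[-1] * lr / 100)
    return chain

results = {}
for item, series in prices.items():
    links = link_relatives(series)
    results[item] = chain_indices(links)
    print(item)
    print("  Link relatives:", [round(v, 4) for v in links])
    print("  Chain indices: ", [round(v, 4) for v in results[item]])
```

For a single item, each chain index equals that year's price as a percentage of the 1940 price, which gives a quick way to verify the table.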