Statistics-II (395)
Q. 1 a) What is the importance of the normal distribution in statistical theory? Describe its properties.
b) The mean inside diameter of a sample of 250 washers produced by a machine is 5.05 mm and the standard deviation is 0.05 mm. The purpose for which these washers are intended allows a maximum tolerance in the diameter of 4.95 mm to 5.10 mm; otherwise the washers are considered defective. Determine the percentage of defective washers produced by the machine, assuming the diameters are normally distributed.
Statistics-I (394)
Q. 1: a) Define
descriptive and inferential statistics and differentiate between them.
b) Define the following terms:
i) Population and sample ii) Parameter and statistic
iii) Quantitative variable iv) Qualitative variable
### Descriptive and Inferential Statistics
#### Descriptive Statistics:
Descriptive statistics involves the organization, analysis,
and presentation of data to provide a summary or description of its main
features. It helps in simplifying large amounts of data in a meaningful way.
Common measures in descriptive statistics include measures of central tendency
(mean, median, mode), measures of variability (range, variance, standard
deviation), and measures of distribution (skewness, kurtosis).
For example, if we have a dataset of the ages of a group of
people, descriptive statistics would help us understand the typical age (mean),
the age at which most people fall (mode), and how spread out the ages are
(standard deviation).
#### Inferential Statistics:
Inferential statistics involves using data from a sample to
make inferences or draw conclusions about a population. It uses probability
theory to make predictions or generalizations about a larger group based on a
smaller subset of that group. Inferential statistics includes hypothesis
testing, regression analysis, and analysis of variance.
Continuing with the age example, inferential statistics
might involve using the ages of a sample of people to make predictions or
inferences about the ages of the entire population from which the sample was
drawn.
**Differentiation
between Descriptive and Inferential Statistics:**
- **Purpose:**
- Descriptive statistics aim to summarize and describe the
main features of a dataset.
- Inferential statistics make inferences and predictions
about a population based on a sample of that population.
- **Example:**
- Descriptive: Calculating
the average age of a group of students.
- Inferential: Using the average age of a sample to make
predictions about the average age of all students in a school.
- **Data
Representation:**
- Descriptive statistics use charts, graphs, and summary
measures.
- Inferential statistics involve probability distributions
and confidence intervals.
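The contrast can be made concrete with a short Python sketch; the ages below are hypothetical illustrative values:

```python
import math
import statistics

# Hypothetical sample of ages (illustrative values, not from the text)
ages = [21, 23, 22, 25, 30, 22, 27, 24, 26, 22]

# Descriptive statistics: summarise the sample itself
mean_age = statistics.mean(ages)    # typical age
mode_age = statistics.mode(ages)    # most frequent age
sd_age = statistics.stdev(ages)     # spread of the ages

# Inferential statistics: use the sample to say something about the
# wider population, e.g. a rough 95% confidence interval for its mean
margin = 1.96 * sd_age / math.sqrt(len(ages))
ci = (mean_age - margin, mean_age + margin)
```

The first three figures merely describe the ten observed ages; the interval is an inference about the population from which they were drawn.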
### Definitions:
#### i) Population and Sample:
- **Population:**
- The population is the entire group that is the subject of
the study.
- Example: All students in a university.
- **Sample:**
- A sample is a subset of the population.
- Example: A group of 100 students selected from the entire
university.
#### ii) Parameter and Statistic:
- **Parameter:**
- A parameter is a numerical value that describes a
characteristic of a population.
- Example: The
average income of all households in a city.
- **Statistic:**
- A statistic is a numerical value that describes a
characteristic of a sample.
- Example: The
average income of a sample of 100 households in a city.
#### iii) Quantitative Variable:
- **Quantitative
Variable:**
- A quantitative variable is a type of variable that takes
numerical values and represents some kind of measurement.
- Example: Height, weight, income.
#### iv) Qualitative Variable:
- **Qualitative
Variable:**
- A qualitative variable is a type of variable that
represents categories or labels.
- Example: Gender, color, marital status.
Q. 2: a) Write
down the important points for drawing graphs.
b) What is a frequency distribution? How is it constructed?
c) Give the merits and demerits of
arithmetic mean and median. (6+7+7)
### a) Important Points for Drawing Graphs:
Drawing graphs is an essential aspect of data analysis,
providing visual representation for better understanding. Here are key points
for drawing graphs:
1. **Selecting the
Right Type of Graph:**
- Choose a graph type that suits the data and the message
you want to convey. Common types include bar graphs, line graphs, scatter
plots, and pie charts.
2. **Labeling Axes:**
- Clearly label the x-axis and y-axis with appropriate
variable names. Include units of measurement when applicable.
3. **Choosing Appropriate Scale:**
- Select a suitable scale for each axis to ensure that the
data fits well within the graph, avoiding crowding or excessive white space.
4. **Title and
Legend:**
- Provide a clear and concise title that summarizes the main
point of the graph. Include a legend if the graph includes multiple data
series.
5. **Color and
Style:**
- Use colors and styles thoughtfully to enhance clarity.
Ensure that colors are distinguishable for those with color vision
deficiencies.
6. **Data Accuracy:**
- Double-check data points to ensure accuracy. Mistakes in
data entry can lead to misleading graphs.
7. **Consistency:**
- Maintain consistency in formatting throughout the graph,
such as bar widths or line styles. This aids in clarity and interpretation.
8. **Highlighting Key
Points:**
- Emphasize important data points or trends using
annotations, arrows, or other visual cues.
9. **Data Source:**
- Include a note about the source of the data to establish
credibility and transparency.
10. **Audience
Consideration:**
- Consider the audience when designing graphs. Ensure that
the graph is understandable to both experts and non-experts.
### b) Frequency Distribution:
A frequency distribution is a table that displays the
distribution of a set of data. It shows the number of observations falling into
different intervals or categories. The construction involves several steps:
1. **Data
Collection:**
- Gather the raw data that you want to analyze.
2. **Determine the
Number of Classes:**
- Decide on the number of intervals or classes. Too few
classes may oversimplify, while too many can obscure patterns.
3. **Calculate the
Range:**
- Find the range of the data (difference between the maximum
and minimum values).
4. **Calculate Class
Width:**
- Determine the width of each class interval by dividing the
range by the number of classes. Round up to ensure all data points are
included.
5. **Set up the
Classes:**
- Establish the intervals using the class width. The classes
should be mutually exclusive and exhaustive, covering the entire range of data.
6. **Tally and
Count:**
- Tally the number of observations falling into each class
interval.
7. **Create Frequency
Table:**
- Construct a table with columns for classes and their
respective frequencies.
8. **Calculate
Cumulative Frequency:**
- Optionally, add a column for cumulative frequency, which
represents the total frequency up to a given class.
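The steps above can be sketched in Python; the data set and the choice of five classes are hypothetical:

```python
import math
from collections import Counter

# Hypothetical raw data (illustrative values, not from the text)
data = [12, 15, 21, 22, 25, 27, 31, 33, 34, 38, 41, 44, 47, 52, 55]

k = 5                                  # step 2: chosen number of classes
data_range = max(data) - min(data)     # step 3: range = 55 - 12 = 43
width = math.ceil(data_range / k)      # step 4: class width, rounded up to 9

# Steps 5-7: set up the classes and tally observations into them
freq = Counter((x - min(data)) // width for x in data)
table = {}
cumulative = 0
for i in range(k):
    lower = min(data) + i * width
    upper = lower + width - 1
    cumulative += freq[i]              # step 8: cumulative frequency
    table[f"{lower}-{upper}"] = (freq[i], cumulative)
```

Each table entry holds a class label with its frequency and cumulative frequency, e.g. `"12-20": (2, 2)`.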
### c) Merits and Demerits of Arithmetic Mean and Median:
#### Arithmetic Mean:
**Merits:**
1. **Sensitive to all
Values:**
- The mean considers all values in the dataset, making it
sensitive to changes in any observation.
2. **Balancing
Property:**
- The sum of deviations above the mean equals the sum of
deviations below the mean, maintaining balance.
3. **Useful in
Statistical Analysis:**
- The mean is often used in statistical analysis and various
mathematical calculations.
**Demerits:**
1. **Affected by
Extreme Values:**
- Outliers or extreme values can significantly impact the
mean, making it less representative of the central tendency.
2. **Not Appropriate
for Skewed Distributions:**
- In skewed distributions, the mean may not accurately
reflect the central location, as it is influenced by the skewness.
#### Median:
**Merits:**
1. **Not Sensitive to
Extreme Values:**
- The median is not influenced by extreme values or
outliers, making it a robust measure of central tendency.
2. **Appropriate for
Skewed Distributions:**
- It is suitable for describing the central tendency in
skewed distributions.
3. **Simple to
Understand:**
- The median is easy to understand and calculate, especially
for ordinal or interval data.
**Demerits:**
1. **Less Sensitive
to Small Changes:**
- The median may not reflect small changes in the dataset,
particularly when dealing with a large sample.
2. **Not Utilizing
All Data Points:**
- It does not use all the information in the dataset; it only
considers the middle value(s).
In conclusion, both the mean and median have their merits
and demerits. The choice between them depends on the nature of the data and the
specific goals of the analysis.
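The key trade-off, sensitivity of the mean versus robustness of the median to extreme values, can be seen numerically (illustrative values):

```python
import statistics

# Illustrative values: five observations, then the same data plus one outlier
values = [10, 12, 14, 16, 18]
with_outlier = values + [200]

# The mean shifts sharply when the outlier is added; the median barely moves
mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)
```

One extreme value triples the mean while moving the median by a single unit, which is exactly the behaviour described above.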
Q. 3:a) Define Histogram. Draw a Histogram for the following
frequency distribution:
X 32 37 42 47 52 57 62 67
f 3 17 28 47 54 31 14 4
b) Define measures of location. Explain properties of good
average.
c) Compute the Mean and mode for the following data; (15+5)
Classes 86-90 91-95 96-100 101-105 106-110 111-115
f 6 4 10 6 3 1
### a) Histogram:
**Definition:**
A histogram is a graphical representation of the
distribution of a dataset. It consists of a series of bars, each representing a
range of values, called a class interval. The height of each bar corresponds to
the frequency or relative frequency of the values within that interval.
**Drawing a Histogram for the Given Frequency Distribution:**

| X | 32 | 37 | 42 | 47 | 52 | 57 | 62 | 67 |
|---|----|----|----|----|----|----|----|----|
| f | 3 | 17 | 28 | 47 | 54 | 31 | 14 | 4 |
1. **Identify Class Intervals:**
- The given X values are class marks (midpoints) with a common width of 5, so the class boundaries are \(29.5-34.5, 34.5-39.5, \ldots, 64.5-69.5\).
2. **Draw Axes:**
- Draw horizontal and vertical axes. The horizontal axis
represents the class intervals, and the vertical axis represents frequency.
3. **Draw Bars:**
- For each class interval, draw a bar with a height
corresponding to the frequency of that interval.
![Histogram](https://i.imgur.com/vFC4WGJ.png)
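Since the X row gives class marks with a constant width of 5, the bar boundaries can be recovered by extending half a width on each side of every midpoint; a minimal sketch:

```python
# The X row gives class marks (midpoints); f gives the bar heights
midpoints = [32, 37, 42, 47, 52, 57, 62, 67]
freqs = [3, 17, 28, 47, 54, 31, 14, 4]

width = midpoints[1] - midpoints[0]   # constant class width of 5
# boundaries sit half a width either side of each midpoint
edges = [m - width / 2 for m in midpoints] + [midpoints[-1] + width / 2]
bars = list(zip(edges[:-1], edges[1:], freqs))
```

Each entry of `bars` is one histogram bar as (lower boundary, upper boundary, height); these tuples can be fed directly to any plotting library.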
### b) Measures of Location and Properties of Good Average:
**Measures of
Location:**
Measures of location are single values that describe the position or centre of a dataset, identifying a typical value around which the observations cluster. Common measures include:
1. **Mean (Arithmetic
Average):**
- The sum of all values divided by the number of values.
2. **Median:**
- The middle value in a dataset when it is arranged in
ascending or descending order.
3. **Mode:**
- The value that occurs most frequently in a dataset.
**Properties of Good
Average:**
1. **Uniqueness:**
- The average should be a unique value, providing a
representative measure for the entire dataset.
2. **Sensitivity to
Changes:**
- The average should be sensitive to changes in the dataset,
reflecting shifts in central tendency.
3. **Capable of Further Algebraic Treatment:**
- It should be possible to combine group averages: the average of a combined dataset should be computable from the averages and sizes of its parts, as the arithmetic mean can be via a weighted combination.
4. **Non-Bias:**
- The average should not be systematically too high or too
low; it should accurately represent the data.
5. **Ease of
Computation:**
- The average should be easy to compute and understand for
practical use.
### c) Mean and Mode Calculation:
Given Data:
| Classes | 86-90 | 91-95 | 96-100 | 101-105 | 106-110 | 111-115 |
|---------|-------|-------|--------|---------|---------|---------|
| f | 6 | 4 | 10 | 6 | 3 | 1 |
**Mean Calculation:**
\[ \text{Mean} = \frac{\sum (f \times \text{Midpoint})}{\sum f} \]
\[ \text{Midpoint} = \frac{\text{Lower Bound} + \text{Upper Bound}}{2} \]
\[ \text{Mean} = \frac{(6 \times 88) + (4 \times 93) + (10 \times 98) + (6 \times 103) + (3 \times 108) + (1 \times 113)}{6+4+10+6+3+1} \]
\[ \text{Mean} = \frac{528 + 372 + 980 + 618 + 324 + 113}{30} = \frac{2935}{30} = 97.83 \]
**Mode Calculation:**
The modal class is the class interval with the highest frequency: \(96-100\), with \(f = 10\). Applying the grouped-data mode formula with class boundaries \(95.5-100.5\) (\(l = 95.5\), \(h = 5\), \(f_m = 10\), \(f_1 = 4\) preceding, \(f_2 = 6\) following):
\[ \text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 95.5 + \frac{10 - 4}{20 - 4 - 6} \times 5 = 98.5 \]
In summary, the mean for the given data is approximately \(97.83\), and the mode is \(98.5\) (modal class \(96-100\)).
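A quick programmatic check of these figures:

```python
# Class marks and frequencies from the question
classes = ["86-90", "91-95", "96-100", "101-105", "106-110", "111-115"]
midpoints = [88, 93, 98, 103, 108, 113]
freqs = [6, 4, 10, 6, 3, 1]

mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)  # 2935 / 30
modal_class = classes[freqs.index(max(freqs))]                    # highest frequency
```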
Q. 4. a) Explain
the difference between absolute dispersion and relative dispersion.
b) Compute median and mean deviation from median for the
data given below:
X 6 8 10 12 14 16 18 20 22
f 5 10 18 20 22 14 7 3 1
c) What are moments about mean and about an arbitrary value?
Give the relation between them. (8+6+6)
### a) Absolute Dispersion vs. Relative Dispersion:
**Absolute
Dispersion:**
Absolute dispersion measures the spread or variability of a
dataset in its original units. It provides information about how much
individual data points differ from the central tendency. Common measures of
absolute dispersion include range, mean deviation, variance, and standard
deviation.
- **Range:** The
difference between the maximum and minimum values.
- **Mean Deviation:**
The average of the absolute differences between each data point and the mean.
- **Variance:**
The average of the squared differences between each data point and the mean.
- **Standard
Deviation:** The square root of the variance.
**Relative
Dispersion:**
Relative dispersion, on the other hand, expresses the spread
of data in terms of a ratio or percentage relative to a central value. It
allows for comparisons between datasets with different units or scales. The
coefficient of variation (CV) is a common measure of relative dispersion.
- **Coefficient of
Variation (CV):** The ratio of the standard deviation to the mean,
expressed as a percentage.
**Difference:**
- **Focus:**
- Absolute dispersion focuses on the spread of data in its
original units.
- Relative dispersion focuses on the spread of data relative
to a central value, allowing for comparison between datasets.
- **Units:**
- Absolute dispersion is expressed in the same units as the
original data.
- Relative dispersion is expressed as a ratio or percentage,
making it unitless and suitable for comparing datasets with different scales.
- **Use Cases:**
- Absolute dispersion is useful for understanding the
variability in the original data.
- Relative dispersion is useful when comparing the
variability of datasets with different means or scales.
### b) Median and Mean Deviation from Median:
Given Data:
\[ X \quad 6 \quad 8 \quad 10 \quad 12 \quad 14 \quad 16
\quad 18 \quad 20 \quad 22 \]
\[ f \quad 5 \quad 10 \quad 18 \quad 20 \quad 22 \quad 14
\quad 7 \quad 3 \quad 1 \]
**Median Calculation:**
With frequency data, the median is located from the cumulative frequencies. Here \(\sum f = 100\), and the cumulative frequencies are 5, 15, 33, 53, 75, 89, 96, 99, 100.
- The 50th and 51st observations are needed; both fall at \(X = 12\), since the cumulative frequency reaches 33 at \(X = 10\) and 53 at \(X = 12\).
- Therefore the median is \(12\).
**Mean Deviation from Median Calculation:**
\[ \text{Mean Deviation from Median} = \frac{\sum f_i\,|X_i - \text{Median}|}{\sum f_i} \]
The absolute deviations \(|X_i - 12|\) are \(6, 4, 2, 0, 2, 4, 6, 8, 10\), so
\[ \sum f_i\,|X_i - 12| = 5(6) + 10(4) + 18(2) + 20(0) + 22(2) + 14(4) + 7(6) + 3(8) + 1(10) = 282 \]
\[ \text{Mean Deviation from Median} = \frac{282}{100} = 2.82 \]
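These figures can be checked programmatically from the given frequencies:

```python
# X values and frequencies from the question
xs = [6, 8, 10, 12, 14, 16, 18, 20, 22]
fs = [5, 10, 18, 20, 22, 14, 7, 3, 1]

# expand into the full ordered data set (100 observations)
data = [x for x, f in zip(xs, fs) for _ in range(f)]
n = len(data)
median = (data[n // 2 - 1] + data[n // 2]) / 2   # average of 50th and 51st values

# mean deviation from the median, weighted by the frequencies
md = sum(f * abs(x - median) for x, f in zip(xs, fs)) / n
```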
### c) Moments about Mean and Arbitrary Value:
**Moments about Mean:**
Moments about the mean (central moments) are found by raising the deviation of each data point from the mean to a given power and averaging. The \(r\)-th moment about the mean is denoted by \(\mu_r\) and is calculated as:
\[ \mu_r = \frac{\sum f_i (X_i - \bar{X})^r}{N} \]
where \(r\) is the order of the moment, \(X_i\) is each data point, \(\bar{X}\) is the mean, \(f_i\) is the frequency of each data point, and \(N = \sum f_i\) is the total number of observations.
**Moments about an Arbitrary Value:**
Moments about an arbitrary value (raw moments) are found in the same way, but using deviations from a chosen origin \(a\) instead of the mean. The \(r\)-th moment about \(a\) is denoted by \(\mu'_r\) and is calculated as:
\[ \mu'_r = \frac{\sum f_i (X_i - a)^r}{N} \]
**Relation between Moments about Mean and Arbitrary Value:**
Writing \(X_i - \bar{X} = (X_i - a) - \mu'_1\), where \(\mu'_1 = \bar{X} - a\), and expanding by the binomial theorem gives the moments about the mean in terms of the moments about \(a\):
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu'_2 - (\mu'_1)^2 \]
\[ \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 \]
This relation allows moments about the mean to be computed from moments taken about any convenient origin, which often simplifies hand calculation with grouped data.
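The relation can be verified numerically on any data set; a short sketch with hypothetical values, checking the second and third central moments against moments taken about an arbitrary origin:

```python
# Hypothetical data set (illustrative values)
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n
a = 3  # an arbitrary origin

def moment_about_mean(r):
    """r-th central moment."""
    return sum((x - mean) ** r for x in data) / n

def moment_about(r, origin):
    """r-th moment about an arbitrary origin."""
    return sum((x - origin) ** r for x in data) / n

m1, m2, m3 = (moment_about(r, a) for r in (1, 2, 3))
mu2 = m2 - m1 ** 2                    # second central moment from raw moments
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3  # third central moment from raw moments
```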
Q. 5. a) Define
weighted and unweighted index number and explain why weighted
Index numbers are preferred over unweighted
index numbers.
b) Find chain index numbers (using G.M to
average the relatives) for the following data of prices, taking 1970 as the
base year. (8+12)
| Commodity | 1970 | 1971 | 1972 | 1973 | 1974 |
|-----------|------|------|------|------|------|
| A | 40 | 43 | 45 | 42 | 50 |
| B | 160 | 162 | 165 | 161 | 168 |
| C | 20 | 29 | 52 | 23 | 27 |
| D | 240 | 245 | 247 | 250 | 255 |
### a) Weighted and Unweighted Index Numbers:
**Definition:**
1. **Unweighted Index
Number:**
- An unweighted index number is a measure that does not take
into account the relative importance of different items in a group. It is a
simple average of the percentage changes in individual items.
\[ \text{Unweighted Index} = \left( \frac{\text{Sum of
Current Year Prices}}{\text{Sum of Base Year Prices}} \right) \times 100 \]
2. **Weighted Index
Number:**
- A weighted index number considers the importance or weight
of each item in the group. It reflects the significance of each item in the
overall index. The weights are often based on the importance of the items in
terms of their contribution to the total.
\[ \text{Weighted Index} = \left( \frac{\sum (W_i \times
P_{i, t})}{\sum (W_i \times P_{i, 0})} \right) \times 100 \]
where \(W_i\) is the
weight of the i-th item, \(P_{i, t}\) is the price of the i-th item in the
current year, and \(P_{i, 0}\) is the price of the i-th item in the base year.
**Why Weighted Index
Numbers are Preferred:**
1. **Reflecting
Importance:**
- Weighted index numbers reflect the relative importance of
different items. Items with higher weights have a more significant impact on
the overall index.
2. **Accurate
Representation:**
- In many cases, not all items in a group have the same
economic significance. Weighted index numbers provide a more accurate
representation of the true changes in the overall level.
3. **Dynamic
Nature:**
- Weighted indices can adapt to changes in the structure of
the economy or the consumption pattern by adjusting the weights.
4. **Avoiding
Misleading Conclusions:**
- Unweighted indices may provide misleading conclusions,
especially when items with different economic importance experience significant
price changes.
5. **Policy Decision
Support:**
- Weighted indices are more useful for policymakers as they
offer a more nuanced view of price changes, allowing for better-informed
decisions.
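The difference between the two approaches can be seen in a small sketch; the prices and weights below are hypothetical:

```python
# Hypothetical prices and weights (illustrative, not from the text)
base_prices    = [10, 50, 5]    # P_i0 for three items
current_prices = [12, 55, 10]   # P_it
weights        = [70, 25, 5]    # relative importance W_i

# Unweighted (simple aggregative) index
unweighted = sum(current_prices) / sum(base_prices) * 100

# Weighted aggregative index
weighted = (sum(w * p for w, p in zip(weights, current_prices))
            / sum(w * p for w, p in zip(weights, base_prices)) * 100)
```

The unweighted index is pulled up by the doubling of the least important item, while the weighted index stays closer to the price movement of the items that actually dominate expenditure.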
### b) Chain Index Numbers:
Given Data:
\[ \begin{array}{cccccc}
\text{Commodity} & 1970 & 1971 & 1972 & 1973 & 1974 \\
\hline
A & 40 & 43 & 45 & 42 & 50 \\
B & 160 & 162 & 165 & 161 & 168 \\
C & 20 & 29 & 52 & 23 & 27 \\
D & 240 & 245 & 247 & 250 & 255 \\
\end{array} \]
**Chain Index Numbers Calculation using Geometric Mean to Average the Relatives:**
1. **Calculate Link Relatives:**
- For each commodity, the link relative is the ratio of the current year's price to the previous year's price, expressed as a percentage:
\[ R_{i,t} = \frac{P_{i,t}}{P_{i,t-1}} \times 100 \]
2. **Average the Relatives Using the Geometric Mean:**
- For each year, take the geometric mean of the four commodities' link relatives to obtain that year's link index:
\[ L_t = \left( \prod_{i=1}^{4} R_{i,t} \right)^{1/4} \]
3. **Chain the Link Indices:**
- Starting from 100 in the base year 1970, multiply forward:
\[ C_{1970} = 100, \qquad C_t = C_{t-1} \times \frac{L_t}{100} \]
**Working:**

| Year | Link relatives (A, B, C, D) | G.M. link index \(L_t\) | Chain index \(C_t\) (1970 = 100) |
|------|-----------------------------|-------------------------|----------------------------------|
| 1970 | - | - | 100.00 |
| 1971 | 107.50, 101.25, 145.00, 102.08 | 112.66 | 112.66 |
| 1972 | 104.65, 101.85, 179.31, 100.82 | 117.82 | 132.74 |
| 1973 | 93.33, 97.58, 44.23, 101.21 | 79.91 | 106.07 |
| 1974 | 119.05, 104.35, 117.39, 102.00 | 110.44 | 117.14 |

These chain index numbers show the combined movement of the four commodity prices relative to the base year 1970. The geometric mean is the appropriate average for price relatives because it treats equal percentage rises and falls symmetrically and is less distorted by a single extreme relative (such as commodity C's jump in 1972).
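The whole computation takes only a few lines; a minimal Python sketch using the given prices:

```python
from math import prod

# Prices from the question, years 1970-1974
prices = {
    "A": [40, 43, 45, 42, 50],
    "B": [160, 162, 165, 161, 168],
    "C": [20, 29, 52, 23, 27],
    "D": [240, 245, 247, 250, 255],
}

chain = [100.0]                          # 1970 is the base year
for t in range(1, 5):
    # link relatives P_t / P_{t-1} for the four commodities
    relatives = [p[t] / p[t - 1] for p in prices.values()]
    # geometric mean of the relatives gives the link index for year t
    gm = prod(relatives) ** (1 / len(relatives))
    chain.append(chain[-1] * gm)
```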
**a) Importance of Normal Distribution in Statistical Theory and its Properties:**
The normal distribution, also known
as the Gaussian distribution or the bell curve, is of paramount importance in
statistical theory for several reasons.
1. **Ubiquity in Nature:**
Many natural phenomena exhibit a
distribution that closely approximates the normal distribution. This makes it a
fundamental concept in modeling and understanding various real-world processes.
2. **Central Limit Theorem:**
The normal distribution is a key component of
the Central Limit Theorem, which states that the distribution of the sum (or
average) of a large number of independent, identically distributed random
variables approaches a normal distribution, regardless of the original
distribution of the variables. This theorem underlies much of statistical
inference.
3. **Statistical Inference:**
In parametric statistical inference,
assumptions about the distribution of data are often made. The normal
distribution is particularly convenient because it is fully characterized by
its mean and standard deviation. This simplifies statistical analyses and
hypothesis testing.
4. **Z-Scores and Percentiles:**
The normal distribution is used to
calculate Z-scores, which represent the number of standard deviations a data
point is from the mean. Percentiles can also be easily derived, allowing for
comparisons across different datasets.
5. **Statistical Testing:**
Many statistical tests, such as
t-tests and ANOVA, assume normality in the underlying data. Deviations from
normality can impact the validity of these tests. Normal distribution provides
a reference distribution for these tests.
6. **Predictive Modeling:**
In predictive modeling and machine
learning, the assumption of normality is common. Algorithms often work more
efficiently and provide better results when the input data follows a normal
distribution.
**Properties of the Normal Distribution:**
1. **Symmetry:**
The normal distribution is
symmetric around its mean. This means that the probability of an observation
falling to the left or right of the mean is the same.
2. **Bell-Shaped Curve:**
The probability density function of the normal
distribution results in a bell-shaped curve. This bell shape is characterized
by a single peak at the mean.
3. **68-95-99.7 Rule (Empirical Rule):**
Approximately 68% of the data falls within one
standard deviation of the mean, 95% within two standard deviations, and 99.7%
within three standard deviations. This provides a quick way to assess the
spread of data.
4. **Mean, Median, and Mode Equality:**
In a perfectly normal distribution,
the mean, median, and mode are all equal and located at the center of the
distribution.
5. **Standardized Units (Z-Scores):**
The concept of Z-scores allows for
the comparison of values from different normal distributions. A Z-score
indicates how many standard deviations a data point is from the mean.
6. **Characterized by Mean and Standard Deviation:**
Unlike other distributions, a
normal distribution is completely characterized by its mean (μ) and standard
deviation (σ).
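The empirical rule in property 3 can be verified directly from the standard normal CDF via the error function:

```python
from math import erf, sqrt

def within(k):
    """P(|Z| <= k) for a standard normal variable, via the error function."""
    return erf(k / sqrt(2))

# coverage within 1, 2, and 3 standard deviations of the mean
coverage = [round(within(k), 3) for k in (1, 2, 3)]
```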
**b) Application of Normal Distribution in Quality Control:**
In the given context, the mean
inside diameter of washers and the standard deviation are crucial parameters in
ensuring quality control. The normal distribution can be employed to assess whether
the produced washers meet the required specifications.
The mean inside diameter of the
washers is 5.05mm, and the standard deviation is 0.05mm. The purpose allows a
maximum tolerance in the diameter between 4.95mm and 5.10mm, beyond which the
washers are considered defective.
Using the properties of the normal
distribution, we can calculate the probability of a washer being defective.
First, we need to standardize the tolerance limits using the Z-score formula:
\[ Z = \frac{{X - \mu}}{{\sigma}}
\]
where:
- \( X \) is the value (tolerance
limits),
- \( \mu \) is the mean,
- \( \sigma \) is the standard
deviation.
For the lower limit:
\[ Z_{\text{lower}} = \frac{4.95 - 5.05}{0.05} = -2.00 \]
For the upper limit:
\[ Z_{\text{upper}} = \frac{5.10 - 5.05}{0.05} = 1.00 \]
From the standard normal table, \(P(Z < -2) = 0.0228\) and \(P(Z > 1) = 1 - 0.8413 = 0.1587\). The probability of a washer being defective is the total probability outside the tolerance limits:
\[ P(\text{Defective}) = P(Z < Z_{\text{lower}}) + P(Z > Z_{\text{upper}}) = 0.0228 + 0.1587 = 0.1815 \]
That is, about 18.15% of the washers are expected to be defective, or roughly \(250 \times 0.1815 \approx 45\) washers in the sample of 250.
This probability gives valuable
insights into the quality control process, allowing manufacturers to assess and
improve their production processes. The normal distribution serves as a
powerful tool in understanding and managing variability in manufacturing,
ensuring that products meet specified standards.
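The calculation above can be reproduced programmatically; a minimal sketch using the figures from the question:

```python
from math import erf, sqrt

mean, sd = 5.05, 0.05           # mm, from the question
lower, upper = 4.95, 5.10       # tolerance limits in mm

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z_lower = (lower - mean) / sd   # -2.0
z_upper = (upper - mean) / sd   #  1.0

# total probability outside the tolerance limits
p_defective = phi(z_lower) + (1 - phi(z_upper))
expected_defective = round(250 * p_defective)   # out of the sample of 250
```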
In conclusion, the normal
distribution is a cornerstone of statistical theory, playing a crucial role in
various applications, including quality control. Its properties, such as
symmetry and the empirical rule, make it a versatile and widely applicable
concept in statistics. In the context of quality control, the normal
distribution facilitates the assessment of product specifications and the
probability of defects, enabling informed decision-making in manufacturing
processes.
Q. 2 a) A fair coin is tossed 50 times and the number of heads recorded is 27. The proportion of heads was, therefore, estimated to be 0.54. Answer the following.
i) Which figure is a parameter?
ii) Which figure is a statistic?
b) What are the two broad categories of errors in data collected by sample surveys? What are the methods for reducing sampling errors?
c) What is the finite population correction factor? When is it appropriately used in sampling applications and when can it, without too great undesirable consequences, be ignored? (6+7+7)
**a) i) Parameter and ii) Statistic:**
**i) Parameter:**
- In statistics, a parameter is a numerical characteristic of a population; it is a fixed value that describes some aspect of the population. Since the coin is stated to be fair, the true probability of heads on a single toss is \( p = 0.5 \). This figure, 0.5, is the parameter.
**ii) Statistic:**
- A statistic, on the other hand, is a numerical characteristic of a sample, calculated from the observed data. In the given scenario, the proportion of heads obtained from the 50 tosses, \( \hat{p} = 27/50 = 0.54 \), is the statistic.
The relationship between the parameter and the
statistic in this case is that \( \hat{p} \) (the observed proportion of heads)
is an estimate of \( p \) (the true probability of getting heads on a single
toss). The sample proportion serves as an estimate or approximation of the
population parameter.
**b) Two Broad Categories of Errors in Sample Surveys and Methods for
Reducing Sampling Errors:**
**i) Sampling Errors:**
- Sampling errors arise due to the variability between different samples
drawn from the same population. They are inherent in the process of sampling
and can lead to differences between the sample estimate and the true population
parameter.
**ii) Non-Sampling Errors:**
- Non-sampling errors, on the other hand, are errors not attributable to the act of sampling itself. They can arise at any stage of a survey, from causes such as data entry mistakes, non-response bias, measurement errors, and faulty questionnaire design.
**Methods for Reducing Sampling Errors:**
**i) Random Sampling:**
- Ensuring that every member of the population has an equal chance of
being included in the sample reduces selection bias and minimizes sampling
errors. Simple random sampling is one method that achieves this.
**ii) Increase Sample Size:**
- Larger sample sizes generally result in more reliable estimates. As
the sample size increases, the sample mean or proportion becomes a more
accurate reflection of the population mean or proportion.
**iii) Stratified Sampling:**
- This involves dividing the population into subgroups or strata based
on certain characteristics and then randomly sampling from each stratum. This
can help ensure representation from all relevant groups in the population.
**iv) Systematic Sampling:**
- Systematic sampling involves selecting every kth item from a list
after a random start. This method is useful when a complete list of the
population is available.
**v) Use of Probability Proportional to Size (PPS) Sampling:**
- In PPS sampling, the probability of selecting a particular unit is
directly proportional to its size. This is particularly useful when certain
elements of the population are much larger than others.
**c) Finite Population Correction Factor (FPC) and its Application:**
**Finite Population Correction Factor (FPC):**
- The finite population correction factor is a correction applied to the
standard error of a sample statistic when the sample is drawn without
replacement from a finite population. When the population is large compared to
the sample size, this correction becomes negligible.
**When to Use FPC:**
- The FPC is appropriately used when the sample forms a substantial fraction of the population (a common rule of thumb is a sampling fraction \(n/N\) above about 5%). It adjusts for the decreased variability that occurs when samples are drawn without replacement from finite populations.
**When to Ignore FPC:**
- The FPC can be ignored without too great undesirable consequences when
the population is very large compared to the sample size. As the population
size becomes significantly larger, the correction factor becomes close to 1,
indicating that the correction is negligible.
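A small numeric illustration (the population size, sample size, and standard deviation below are hypothetical):

```python
from math import sqrt

# Hypothetical sampling situation (illustrative numbers)
N, n, sigma = 500, 50, 10.0

se_infinite = sigma / sqrt(n)        # standard error ignoring the finite population
fpc = sqrt((N - n) / (N - 1))        # finite population correction factor
se_corrected = se_infinite * fpc     # corrected standard error
```

Here the sampling fraction is 10%, so the correction shrinks the standard error noticeably; with N in the millions, `fpc` would be essentially 1 and could be ignored.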
In summary, understanding the
distinction between parameters and statistics is crucial in statistical
analysis. Sampling errors and non-sampling errors are two broad categories of
errors in sample surveys, and various methods, such as random sampling and
increasing sample size, can be employed to reduce sampling errors. The finite
population correction factor is applied when dealing with finite populations to
adjust for the effects of sampling without replacement, and it becomes less
important as the population size increases relative to the sample size.
Q. 3 a) Explain what is meant by:
i. Confidence interval ii. Confidence limits
iii. Confidence coefficient iv. Statistical
estimation
b) A school wishes to estimate the
average weight of students in the sixth grade. A random sample of n=25 is
selected, and the sample mean is found to be \( \bar{x} = 100 \) lbs. The standard deviation of the population is known to be 15 lbs. Compute a 90% confidence interval for the population mean.
**a) Explanation of Terms:**
**i. Confidence Interval:**
- A confidence interval is a statistical tool used to estimate a range
within which the true value of a population parameter is likely to fall. It
provides a level of uncertainty associated with the estimate and is expressed
as a range with an associated level of confidence. For example, a 95%
confidence interval for the average height of a population might be [65 inches,
70 inches], indicating that we are 95% confident that the true average height falls
within this range.
**ii. Confidence Limits:**
- Confidence limits are the upper and lower bounds of a confidence
interval. They define the range within which the true population parameter is
expected to lie with a certain level of confidence. In the example of a
confidence interval [65 inches, 70 inches], 65 inches and 70 inches are the
confidence limits.
**iii. Confidence Coefficient:**
- The confidence coefficient is the probability that a randomly chosen
interval (from repeated sampling) will contain the true population parameter.
Commonly used confidence coefficients are 90%, 95%, and 99%. A 95% confidence
interval implies a confidence coefficient of 0.95.
**iv. Statistical Estimation:**
- Statistical estimation involves the process of using sample data to
make inferences about an unknown population parameter. This can include point
estimation, where a single value is used to estimate the parameter, and
interval estimation, where a range of values (confidence interval) is provided.
**b) Calculation of 90% Confidence Interval for the Population Mean:**
Given data:
- Sample size (\(n\)): 25
- Sample mean (\(\bar{x}\)): 100 lbs
- Population standard deviation (\(\sigma\)): 15 lbs
- Confidence level: 90%
**Formula for Confidence Interval (CI) for the Population Mean
(\(\mu\)):**
\[ CI = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}}
\right) \]
Where:
- \(\bar{x}\) is the sample mean.
- \(Z\) is the Z-score associated
with the desired confidence level.
- \(\sigma\) is the population
standard deviation.
- \(n\) is the sample size.
**Calculation:**
1. **Find the Z-Score for a 90% Confidence Level:**
- For a 90% confidence level, the critical value from the standard normal
table is \(Z = 1.645\).
2. **Plug Values into the Formula:**
\[ CI = 100 \pm 1.645 \left( \frac{15}{\sqrt{25}} \right) \]
3. **Calculate the Margin of Error:**
\[ \text{Margin of Error} = 1.645 \times \frac{15}{5} = 1.645 \times 3 = 4.935 \]
4. **Calculate the Confidence Interval:**
\[ CI = [100 - 4.935,\; 100 + 4.935] \]
5. **Final Result:**
\[ CI = [95.065\ \text{lbs},\; 104.935\ \text{lbs}] \]
The interpretation of this result
is that we are 90% confident that the true average weight of sixth-grade
students lies between 95.065 lbs and 104.935 lbs.
**Conclusion:**
Understanding concepts like
confidence intervals, confidence limits, confidence coefficients, and
statistical estimation is crucial for making meaningful inferences from sample
data. In the example provided, the calculation of a 90% confidence interval for
the average weight of sixth-grade students demonstrates how statistical
techniques can be applied to estimate population parameters with a specified
level of confidence.
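The interval in part (b) can be checked numerically. Below is a minimal sketch using only the Python standard library; the helper name `z_confidence_interval` is illustrative, and 1.645 is the usual two-sided 90% z critical value:

```python
import math

def z_confidence_interval(mean, sigma, n, z=1.645):
    """Confidence interval for a population mean with known sigma."""
    margin = z * sigma / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin

lower, upper = z_confidence_interval(mean=100, sigma=15, n=25)
print(f"90% CI: [{lower:.3f}, {upper:.3f}]")  # [95.065, 104.935]
```

Changing `z` to 1.96 or 2.576 gives the 95% and 99% intervals for the same data.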
Q. 4 a) Explain what is meant by:
i. Statistical hypothesis
ii. Test-statistic
iii. Significance level
iv. Test of significance
b) Explain how the null hypothesis
and alternative hypothesis are formulated?
**a) Explanation of Terms:**
**i. Statistical Hypothesis:**
- A statistical hypothesis is a statement or assumption about one or
more characteristics of a population. It is a conjecture or assertion that can
be tested using statistical methods. There are two types of hypotheses: the
null hypothesis (denoted as \(H_0\)) and the alternative hypothesis (denoted as
\(H_1\) or \(H_a\)). The null hypothesis typically represents a default or
status quo assumption, while the alternative hypothesis suggests a departure
from this assumption.
**ii. Test-Statistic:**
- The test-statistic is a numerical summary of a sample that is used in
hypothesis testing. It provides a basis for deciding whether to reject the null
hypothesis. The choice of the test-statistic depends on the specific hypothesis
test being conducted. Common examples include t-statistics, z-scores, and
F-statistics.
**iii. Significance Level:**
- The significance level, often denoted by \(\alpha\), is the
probability of rejecting the null hypothesis when it is true. It represents the
threshold for determining whether the evidence against the null hypothesis is
strong enough to warrant its rejection. Commonly used significance levels
include 0.05, 0.01, and 0.10.
**iv. Test of Significance:**
- A test of significance is a statistical procedure used to determine
whether the evidence from a sample is sufficient to reject the null hypothesis
in favor of the alternative hypothesis. It involves calculating a
test-statistic and comparing it to a critical value or p-value to make a
decision about the null hypothesis.
**b) Formulation of Null Hypothesis and Alternative Hypothesis:**
**i. Null Hypothesis (\(H_0\)):**
- The null hypothesis is a statement that there is no significant
difference or effect. It often represents a default assumption, a statement of
equality, or the absence of an effect. It is denoted by \(H_0\). For example,
if we are testing the average height of a population, the null hypothesis might
state that the average height is equal to a specific value, say \(\mu = 65\)
inches.
**ii. Alternative Hypothesis (\(H_1\) or \(H_a\)):**
- The alternative hypothesis is a statement that contradicts the null
hypothesis. It represents the researcher's claim or the presence of an effect.
It is denoted by \(H_1\) or \(H_a\). Building on the previous example, the
alternative hypothesis might state that the average height is not equal to 65
inches (\(\mu \neq 65\) inches), indicating a two-tailed test, or it might
state that the average height is greater than 65 inches (\(\mu > 65\)
inches), indicating a one-tailed test.
**Example:**
- Let's consider a specific example to illustrate the formulation of
null and alternative hypotheses. Suppose a researcher is investigating whether
a new drug has a significant effect on blood pressure. The null and alternative
hypotheses might be formulated as follows:
- Null Hypothesis (\(H_0\)): The new drug has no significant effect on
blood pressure.
\[ \text{Symbolically: } H_0: \mu_{\text{new}} = \mu_{\text{old}} \]
- Alternative Hypothesis (\(H_1\)): The new drug has a significant
effect on blood pressure.
\[ \text{Symbolically: } H_1: \mu_{\text{new}} \neq \mu_{\text{old}} \]
(indicating a two-tailed test, as we are considering both directions of
the effect)
In this example, \(\mu_{\text{new}}\) and \(\mu_{\text{old}}\) represent
the mean blood pressure for the new drug and the old treatment, respectively.
**Conclusion:**
- Formulating clear and precise null and alternative hypotheses is a
crucial step in hypothesis testing. The null hypothesis represents a statement
of no effect or no difference, while the alternative hypothesis represents the
researcher's claim. The choice between a one-tailed or two-tailed alternative
hypothesis depends on the nature of the research question and the
directionality of the effect being investigated. The hypotheses serve as the
foundation for conducting statistical tests and drawing conclusions based on
sample data.
Q. 5 a) Describe the procedure for testing the equality of means of two
normal populations for: (10+10)
i. Large sample ii. Small samples
b) The heights of six randomly
selected sailors are in inches: 62, 64, 67, 68, 70 and 71. Those of ten
randomly selected soldiers are 62, 63, 65, 66, 69, 69, 70, 71, 72, and 73.
Discuss in the light of these data that soldiers are on the average taller than
sailors. Assume that the heights are normally distributed.
**a) Procedure for Testing the Equality of Means of Two Normal
Populations:**
**i. Large Sample:**
**Assumptions:**
- With large samples, the central limit theorem makes the sampling
distribution of each sample mean approximately normal, so strict normality of
the populations is not critical.
- The population variances may be known or, if unknown, are well estimated
by the sample variances.
**Procedure:**
1. **Formulate Hypotheses:**
- Null Hypothesis (\(H_0\)): \(\mu_1 = \mu_2\) (No difference in means)
- Alternative Hypothesis (\(H_1\)): \(\mu_1 \neq \mu_2\) (Difference in
means)
2. **Choose Significance Level:**
- Select the desired significance level (\(\alpha\)).
3. **Collect and Prepare Data:**
Obtain large samples from both
populations.
4. **Calculate the Test Statistic:**
- For large samples, use the two-sample z-test:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
substituting the sample variances when the population variances are unknown.
5. **Determine Critical Value or P-Value:**
- Compare the test statistic to the critical value from the standard normal
distribution or calculate the p-value.
6. **Make a Decision:**
- If the p-value is less than the significance level, reject the null
hypothesis. Otherwise, fail to reject the null hypothesis.
**ii. Small Samples:**
**Assumptions:**
- Both populations are normally distributed.
- The variances of the two populations are assumed to be equal
(homogeneity of variances).
**Procedure:**
1. **Formulate Hypotheses:**
- Null Hypothesis (\(H_0\)): \(\mu_1 = \mu_2\) (No difference in means)
- Alternative Hypothesis (\(H_1\)): \(\mu_1 \neq \mu_2\) (Difference in
means)
2. **Choose Significance Level:**
- Select the desired significance level (\(\alpha\)).
3. **Collect and Prepare Data:**
- Obtain small samples from both populations.
4. **Calculate the Test Statistic:**
- Use the two-sample t-test for independent samples with the pooled variance:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
with \(n_1 + n_2 - 2\) degrees of freedom.
5. **Determine Critical Value or P-Value:**
- Compare the test statistic to the critical value from the t-distribution
or calculate the p-value.
6. **Make a Decision:**
- If the p-value is less than the significance level, reject the null
hypothesis. Otherwise, fail to reject the null hypothesis.
**b) Comparing Heights of Sailors and Soldiers:**
Given data:
- Heights of six sailors: 62, 64,
67, 68, 70, 71
- Heights of ten soldiers: 62, 63,
65, 66, 69, 69, 70, 71, 72, 73
**Procedure:**
1. **Formulate Hypotheses:**
- Null Hypothesis (\(H_0\)): \(\mu_{\text{sailors}} =
\mu_{\text{soldiers}}\) (No difference in means)
- Alternative Hypothesis (\(H_1\)): \(\mu_{\text{sailors}} <
\mu_{\text{soldiers}}\) (Soldiers are on average taller than sailors)
2. **Choose Significance Level:**
- Select the desired significance level (\(\alpha\)).
3. **Collect and Prepare Data:**
- Heights of sailors: \(n_1 = 6\), sample mean \(\bar{x}_1\), and sample
standard deviation \(s_1\).
- Heights of soldiers: \(n_2 = 10\), sample mean \(\bar{x}_2\), and sample
standard deviation \(s_2\).
4. **Calculate the Test Statistic:**
- From the data: \(\bar{x}_1 = 67\), \(s_1^2 = 12\) for the sailors, and
\(\bar{x}_2 = 68\), \(s_2^2 \approx 14.44\) for the soldiers. The pooled
variance is
\[ s_p^2 = \frac{(6-1)(12) + (10-1)(14.44)}{6 + 10 - 2} = \frac{190}{14} \approx 13.57 \]
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{67 - 68}{\sqrt{13.57\left(\frac{1}{6} + \frac{1}{10}\right)}} \approx -0.53 \]
5. **Determine Critical Value or P-Value:**
- For a one-tailed test at \(\alpha = 0.05\) with 14 degrees of freedom, the
critical value is \(t_{0.05,\,14} = 1.761\).
6. **Make a Decision:**
- Since \(|t| \approx 0.53 < 1.761\), we fail to reject the null hypothesis.
**Conclusion:**
- At the 5% significance level, these data do not provide sufficient
evidence to conclude that soldiers are on average taller than sailors. The
procedure involves formulating hypotheses, selecting a significance level,
calculating the test statistic, comparing it with the critical value or
p-value, and making a decision. Assumptions such as normality and equal
variances should be checked and addressed if violated.
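As a numerical check, the pooled two-sample t-test for the sailors and soldiers can be carried out with only the Python standard library; the one-tailed critical value 1.761 is taken from t-tables:

```python
import math
from statistics import mean, variance  # variance() is the sample variance

sailors  = [62, 64, 67, 68, 70, 71]
soldiers = [62, 63, 65, 66, 69, 69, 70, 71, 72, 73]

n1, n2 = len(sailors), len(soldiers)
# Pooled variance under the equal-variance assumption
sp2 = ((n1 - 1) * variance(sailors) + (n2 - 1) * variance(soldiers)) / (n1 + n2 - 2)
t = (mean(sailors) - mean(soldiers)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"t = {t:.3f} on {n1 + n2 - 2} df")  # t ≈ -0.53 on 14 df
# One-tailed critical value t(0.05, 14) ≈ 1.761 from tables:
# |t| < 1.761, so H0 is not rejected at the 5% level.
```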
Statistics-I (394)
Q. 1: a) Define
descriptive and inferential statistics and differentiate between them.
b) Define the following terms:
i) Population and sample ii)
Parameter and statistic
ii) Quantitative variable iv) Qualitative variable
Descriptive and
Inferential Statistics
Descriptive Statistics:
Descriptive statistics involves the organization, analysis,
and presentation of data to provide a summary or description of its main
features. It helps in simplifying large amounts of data in a meaningful way.
Common measures in descriptive statistics include measures of central tendency
(mean, median, mode), measures of variability (range, variance, standard
deviation), and measures of distribution (skewness, kurtosis).
For example, if we have a dataset of the ages of a group of
people, descriptive statistics would help us understand the typical age (mean),
the age at which most people fall (mode), and how spread out the ages are
(standard deviation).
#### Inferential
Statistics:
Inferential statistics involves using data from a sample to
make inferences or draw conclusions about a population. It uses probability
theory to make predictions or generalizations about a larger group based on a
smaller subset of that group. Inferential statistics includes hypothesis
testing, regression analysis, and analysis of variance.
Continuing with the age example, inferential statistics
might involve using the ages of a sample of people to make predictions or
inferences about the ages of the entire population from which the sample was
drawn.
**Differentiation
between Descriptive and Inferential Statistics:**
- **Purpose:**
- Descriptive statistics aim to summarize and describe the
main features of a dataset.
- Inferential statistics make inferences and predictions
about a population based on a sample of that population.
- **Example:**
- Descriptive: Calculating
the average age of a group of students.
- Inferential: Using the average age of a sample to make
predictions about the average age of all students in a school.
- **Data
Representation:**
- Descriptive statistics use charts, graphs, and summary
measures.
- Inferential statistics involve probability distributions
and confidence intervals.
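As a concrete illustration of the descriptive side, here is a short sketch using the Python standard library (the ages are a hypothetical sample, not data from the text):

```python
from statistics import mean, median, mode, stdev

ages = [19, 21, 21, 22, 23, 24, 24, 24, 27, 35]  # hypothetical sample

print("mean:", mean(ages))      # typical age
print("median:", median(ages))  # middle value
print("mode:", mode(ages))      # most common age
print("stdev:", stdev(ages))    # spread of the ages
```

Inferential statistics would go one step further, using a sample like this to estimate or test claims about the whole population's ages.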
### Definitions:
#### i) Population
and Sample:
- **Population:**
- The population is the entire group that is the subject of
the study.
- Example: All students in a university.
- **Sample:**
- A sample is a subset of the population.
- Example: A group of 100 students selected from the entire
university.
#### ii) Parameter
and Statistic:
- **Parameter:**
- A parameter is a numerical value that describes a
characteristic of a population.
- Example: The
average income of all households in a city.
- **Statistic:**
- A statistic is a numerical value that describes a
characteristic of a sample.
- Example: The
average income of a sample of 100 households in a city.
#### iii)
Quantitative Variable:
- **Quantitative
Variable:**
- A quantitative variable is a type of variable that takes
numerical values and represents some kind of measurement.
- Example: Height, weight, income.
#### iv) Qualitative
Variable:
- **Qualitative
Variable:**
- A qualitative variable is a type of variable that
represents categories or labels.
- Example: Gender, color, marital status.
Q. 2: a) Write
down the important points for drawing graphs.
b) What is a frequency distribution?
How is it constructed?
c) Give the merits and demerits of
arithmetic mean and median. (6+7+7)
### a) Important
Points for Drawing Graphs:
Drawing graphs is an essential aspect of data analysis,
providing visual representation for better understanding. Here are key points
for drawing graphs:
1. **Selecting the
Right Type of Graph:**
- Choose a graph type that suits the data and the message
you want to convey. Common types include bar graphs, line graphs, scatter
plots, and pie charts.
2. **Labeling Axes:**
- Clearly label the x-axis and y-axis with appropriate
variable names. Include units of measurement when applicable.
3. **Choosing an Appropriate Scale:**
- Select a suitable scale for each axis to ensure that the
data fits well within the graph, avoiding crowding or excessive white space.
4. **Title and
Legend:**
- Provide a clear and concise title that summarizes the main
point of the graph. Include a legend if the graph includes multiple data
series.
5. **Color and
Style:**
- Use colors and styles thoughtfully to enhance clarity.
Ensure that colors are distinguishable for those with color vision
deficiencies.
6. **Data Accuracy:**
- Double-check data points to ensure accuracy. Mistakes in
data entry can lead to misleading graphs.
7. **Consistency:**
- Maintain consistency in formatting throughout the graph,
such as bar widths or line styles. This aids in clarity and interpretation.
8. **Highlighting Key
Points:**
- Emphasize important data points or trends using
annotations, arrows, or other visual cues.
9. **Data Source:**
- Include a note about the source of the data to establish
credibility and transparency.
10. **Audience
Consideration:**
- Consider the audience when designing graphs. Ensure that
the graph is understandable to both experts and non-experts.
### b) Frequency
Distribution:
A frequency distribution is a table that displays the
distribution of a set of data. It shows the number of observations falling into
different intervals or categories. The construction involves several steps:
1. **Data
Collection:**
- Gather the raw data that you want to analyze.
2. **Determine the
Number of Classes:**
- Decide on the number of intervals or classes. Too few
classes may oversimplify, while too many can obscure patterns.
3. **Calculate the
Range:**
- Find the range of the data (difference between the maximum
and minimum values).
4. **Calculate Class
Width:**
- Determine the width of each class interval by dividing the
range by the number of classes. Round up to ensure all data points are
included.
5. **Set up the
Classes:**
- Establish the intervals using the class width. The classes
should be mutually exclusive and exhaustive, covering the entire range of data.
6. **Tally and
Count:**
- Tally the number of observations falling into each class
interval.
7. **Create Frequency
Table:**
- Construct a table with columns for classes and their
respective frequencies.
8. **Calculate
Cumulative Frequency:**
- Optionally, add a column for cumulative frequency, which
represents the total frequency up to a given class.
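The construction steps above can be sketched in code. This is a minimal illustration with the Python standard library; the data values and the choice of five classes are hypothetical:

```python
import math
from collections import Counter

data = [23, 45, 12, 67, 34, 56, 41, 29, 61, 38, 50, 18, 44, 70, 33]  # hypothetical
k = 5                                            # chosen number of classes
width = math.ceil((max(data) - min(data)) / k)   # class width, rounded up
start = min(data)

# Tally observations into classes [start, start+width), ...
freq = Counter((x - start) // width for x in data)

cumulative = 0
for i in range(k):
    lo = start + i * width
    cumulative += freq[i]
    print(f"{lo}-{lo + width - 1}: f={freq[i]}, cum={cumulative}")
```

Each printed row corresponds to one row of the frequency table, with the optional cumulative-frequency column included.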
### c) Merits and
Demerits of Arithmetic Mean and Median:
#### Arithmetic Mean:
**Merits:**
1. **Sensitive to all
Values:**
- The mean considers all values in the dataset, making it
sensitive to changes in any observation.
2. **Balancing
Property:**
- The sum of deviations above the mean equals the sum of
deviations below the mean, maintaining balance.
3. **Useful in
Statistical Analysis:**
- The mean is often used in statistical analysis and various
mathematical calculations.
**Demerits:**
1. **Affected by
Extreme Values:**
- Outliers or extreme values can significantly impact the
mean, making it less representative of the central tendency.
2. **Not Appropriate
for Skewed Distributions:**
- In skewed distributions, the mean may not accurately
reflect the central location, as it is influenced by the skewness.
#### Median:
**Merits:**
1. **Not Sensitive to
Extreme Values:**
- The median is not influenced by extreme values or
outliers, making it a robust measure of central tendency.
2. **Appropriate for
Skewed Distributions:**
- It is suitable for describing the central tendency in
skewed distributions.
3. **Simple to
Understand:**
- The median is easy to understand and calculate, especially
for ordinal or interval data.
**Demerits:**
1. **Less Sensitive
to Small Changes:**
- The median may not reflect small changes in the dataset,
particularly when dealing with a large sample.
2. **Not Utilizing
All Data Points:**
- It does not use all the information in the dataset; it only
considers the middle value(s).
In conclusion, both the mean and median have their merits
and demerits. The choice between them depends on the nature of the data and the
specific goals of the analysis.
Q. 3:a) Define Histogram. Draw a Histogram for the following
frequency distribution:
X 32 37 42 47 52 57 62 67
f 3 17 28 47 54 31 14 4
b) Define measures of location. Explain properties of good
average.
c) Compute the Mean and mode for the following data; (15+5)
Classes 86-90 91-95 96-100 101-105 106-110 111-115
f 6 4 10 6 3 1
### a) Histogram:
**Definition:**
A histogram is a graphical representation of the
distribution of a dataset. It consists of a series of bars, each representing a
range of values, called a class interval. The height of each bar corresponds to
the frequency or relative frequency of the values within that interval.
**Drawing a Histogram
for the Given Frequency Distribution:**
| X | 32 | 37 | 42 | 47 | 52 | 57 | 62 | 67 |
|------|----|----|----|----|----|----|----|----|
| f | 3 | 17 | 28 | 47 | 54 | 31 | 14 | 4 |
1. **Identify Class Intervals:**
- The given X values are equally spaced (width 5), so each can be treated as
the midpoint of a class interval (e.g., 29.5–34.5 for X = 32).
2. **Draw Axes:**
- Draw horizontal and vertical axes. The horizontal axis
represents the class intervals, and the vertical axis represents frequency.
3. **Draw Bars:**
- For each class interval, draw a bar with a height
corresponding to the frequency of that interval.
![Histogram](https://i.imgur.com/vFC4WGJ.png)
### b) Measures of Location and Properties of Good Average:
**Measures of
Location:**
Measures of location are statistical measures that describe
the position of a single value within a dataset. Common measures include:
1. **Mean (Arithmetic
Average):**
- The sum of all values divided by the number of values.
2. **Median:**
- The middle value in a dataset when it is arranged in
ascending or descending order.
3. **Mode:**
- The value that occurs most frequently in a dataset.
**Properties of Good
Average:**
1. **Uniqueness:**
- The average should be a unique value, providing a
representative measure for the entire dataset.
2. **Sensitivity to
Changes:**
- The average should be sensitive to changes in the dataset,
reflecting shifts in central tendency.
3. **Additivity:**
- The average of a combined dataset should be recoverable from the
averages of its parts, weighted by their sizes (a property the arithmetic mean
satisfies).
4. **Non-Bias:**
- The average should not be systematically too high or too
low; it should accurately represent the data.
5. **Ease of
Computation:**
- The average should be easy to compute and understand for
practical use.
### c) Mean and Mode
Calculation:
Given Data:
| Classes | 86-90 | 91-95 | 96-100 | 101-105 | 106-110 | 111-115 |
|---------|-------|-------|--------|---------|---------|---------|
| f | 6 | 4 | 10 | 6 | 3 | 1 |
**Mean Calculation:**
\[ \text{Mean} = \frac{\sum (f \times \text{Midpoint})}{\sum
f} \]
\[ \text{Midpoint} = \frac{\text{Lower Bound} + \text{Upper
Bound}}{2} \]
\[ \text{Mean} = \frac{(6 \times 88) + (4 \times 93) + (10
\times 98) + (6 \times 103) + (3 \times 108) + (1 \times 113)}{6+4+10+6+3+1} \]
\[ \text{Mean} = \frac{528 + 372 + 980 + 618 + 324 +
113}{30} \]
\[ \text{Mean} = \frac{2935}{30} \]
\[ \text{Mean} = 97.83 \]
**Mode Calculation:**
For grouped data, the modal class is the interval with the highest
frequency; here it is \(96-100\) with \(f = 10\). A point estimate of the mode
within this class is given by the grouped-data formula:
\[ \text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 95.5 + \frac{10 - 4}{2(10) - 4 - 6} \times 5 = 98.5 \]
where \(l\) is the lower class boundary, \(f_m\) the modal frequency, \(f_1\)
and \(f_2\) the frequencies of the preceding and following classes, and \(h\)
the class width.
In summary, the mean for the given data is approximately \(97.83\), the
modal class is \(96-100\), and the estimated mode is \(98.5\).
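The grouped mean can be verified with a short computation (Python standard library only):

```python
midpoints = [88, 93, 98, 103, 108, 113]   # midpoints of 86-90 ... 111-115
freqs     = [6, 4, 10, 6, 3, 1]

mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)
modal_class = midpoints[freqs.index(max(freqs))]  # midpoint of the modal class

print(f"mean = {mean:.2f}")                     # 97.83
print(f"modal class midpoint = {modal_class}")  # 98 (class 96-100)
```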
Q. 4. a) Explain
the difference between absolute dispersion and relative dispersion.
b) Compute median and mean deviation from median for the
data given below:
X 6 8 10 12 14 16 18 20 22
f 5 10 18 20 22 14 7 3 1
c) What are moments about mean and about an arbitrary value?
Give the relation between them. (8+6+6)
### a) Absolute
Dispersion vs. Relative Dispersion:
**Absolute
Dispersion:**
Absolute dispersion measures the spread or variability of a
dataset in its original units. It provides information about how much
individual data points differ from the central tendency. Common measures of
absolute dispersion include range, mean deviation, variance, and standard
deviation.
- **Range:** The
difference between the maximum and minimum values.
- **Mean Deviation:**
The average of the absolute differences between each data point and the mean.
- **Variance:**
The average of the squared differences between each data point and the mean.
- **Standard
Deviation:** The square root of the variance.
**Relative
Dispersion:**
Relative dispersion, on the other hand, expresses the spread
of data in terms of a ratio or percentage relative to a central value. It
allows for comparisons between datasets with different units or scales. The
coefficient of variation (CV) is a common measure of relative dispersion.
- **Coefficient of
Variation (CV):** The ratio of the standard deviation to the mean,
expressed as a percentage.
**Difference:**
- **Focus:**
- Absolute dispersion focuses on the spread of data in its
original units.
- Relative dispersion focuses on the spread of data relative
to a central value, allowing for comparison between datasets.
- **Units:**
- Absolute dispersion is expressed in the same units as the
original data.
- Relative dispersion is expressed as a ratio or percentage,
making it unitless and suitable for comparing datasets with different scales.
- **Use Cases:**
- Absolute dispersion is useful for understanding the
variability in the original data.
- Relative dispersion is useful when comparing the
variability of datasets with different means or scales.
### b) Median and
Mean Deviation from Median:
Given Data:
\[ X \quad 6 \quad 8 \quad 10 \quad 12 \quad 14 \quad 16
\quad 18 \quad 20 \quad 22 \]
\[ f \quad 5 \quad 10 \quad 18 \quad 20 \quad 22 \quad 14
\quad 7 \quad 3 \quad 1 \]
**Median Calculation:**
For a frequency distribution, the median is located using cumulative
frequencies, not by inspecting the distinct \(X\) values alone.
- Total frequency: \(N = \sum f = 100\).
- Cumulative frequencies: \(5, 15, 33, 53, 75, 89, 96, 99, 100\).
- The \(\frac{N}{2} = 50\)th observation falls at \(X = 12\) (the
cumulative frequency rises from 33 to 53 there), so the median is \(12\).
**Mean Deviation from Median Calculation:**
\[ \text{Mean Deviation from Median} = \frac{\sum f_i\,|X_i - \text{Median}|}{\sum f_i} \]
\[ = \frac{5(6) + 10(4) + 18(2) + 20(0) + 22(2) + 14(4) + 7(6) + 3(8) + 1(10)}{100} = \frac{282}{100} = 2.82 \]
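Taking the frequencies into account, the median and mean deviation for this distribution can be computed directly (Python standard library only):

```python
data_x = [6, 8, 10, 12, 14, 16, 18, 20, 22]
freq   = [5, 10, 18, 20, 22, 14, 7, 3, 1]

# Expand the frequency distribution and take the middle value(s)
expanded = [x for x, f in zip(data_x, freq) for _ in range(f)]
n = len(expanded)
median = (expanded[n // 2 - 1] + expanded[n // 2]) / 2  # n = 100 is even

# Mean deviation from the median, weighted by frequency
md = sum(f * abs(x - median) for x, f in zip(data_x, freq)) / n

print(f"median = {median}, mean deviation = {md}")
```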
### c) Moments about
Mean and Arbitrary Value:
**Moments about
Mean:**
Moments about the mean involve raising the difference
between each data point and the mean to a certain power and then calculating
the average. The \(r\)-th moment about the mean is denoted by \(\mu'_r\) and is
calculated as:
\[ \mu'_r = \frac{\sum (X_i - \bar{X})^r \times f_i}{N} \]
where \(r\) is the order of the moment, \(X_i\) is each data value,
\(\bar{X}\) is the mean, \(f_i\) is the frequency of each value, and
\(N = \sum f_i\) is the total frequency.
**Moments about an
Arbitrary Value:**
Moments about an
arbitrary value involve raising the difference between each data point and the
chosen value to a certain power and then calculating the average. The \(r\)-th
moment about an arbitrary value \(a\) is denoted by \(\mu_r\) and is calculated
as:
\[ \mu_r = \frac{\sum (X_i - a)^r \times f_i}{N} \]
**Relation between
Moments about Mean and Arbitrary Value:**
Writing \(X_i - a = (X_i - \bar{X}) + (\bar{X} - a)\) and expanding by the
binomial theorem gives the \(r\)-th moment about an arbitrary value \(a\) in
terms of the moments about the mean:
\[ \mu_r = \mu'_r + \binom{r}{1}(\bar{X} - a)\,\mu'_{r-1} + \binom{r}{2}(\bar{X} - a)^2\,\mu'_{r-2} + \cdots + (\bar{X} - a)^r \]
In this equation, \(\mu'_r\) is the \(r\)-th moment about the mean,
\(\mu_r\) is the \(r\)-th moment about the arbitrary value \(a\), and
\(\mu'_1 = 0\) by definition. For example, with \(r = 2\) the relation reduces
to \(\mu_2 = \mu'_2 + (\bar{X} - a)^2\).
This relation provides a way to compute moments about an
arbitrary value using moments about the mean and the difference between the
chosen value and the mean.
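The binomial relation \((X_i - a)^r = \sum_j \binom{r}{j}(\bar{X} - a)^j (X_i - \bar{X})^{r-j}\) can be verified numerically on a small hypothetical dataset:

```python
from math import comb
from statistics import fmean

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # hypothetical observations
a = 3.0                                  # arbitrary reference value
xbar = fmean(data)

def moment_about(point, r):
    """r-th moment of the data about the given point."""
    return fmean((x - point) ** r for x in data)

for r in range(1, 5):
    direct = moment_about(a, r)
    # Binomial expansion in terms of moments about the mean:
    expanded = sum(comb(r, j) * (xbar - a) ** j * moment_about(xbar, r - j)
                   for j in range(r + 1))
    assert abs(direct - expanded) < 1e-9
print("binomial relation verified for r = 1..4")
```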
Q. 5. a) Define
weighted and unweighted index number and explain why weighted
Index numbers are preferred over unweighted
index numbers.
b) Find chain index numbers (using G.M to
average the relatives) for the following data of prices, taking 1970 as the
base year. (8+12)
Commodities Years
1970 1971 1972 1973 1974
A 40 43 45 42 50
B 160 162 165 161 168
C 20 29 52 23 27
D 240 245 247 250 255
### a) Weighted and
Unweighted Index Numbers:
**Definition:**
1. **Unweighted Index
Number:**
- An unweighted index number does not take into account the relative
importance of different items in a group. It may be computed either as a simple
average of the individual price relatives or, as below, as a simple aggregative
index:
\[ \text{Unweighted Index} = \left( \frac{\text{Sum of
Current Year Prices}}{\text{Sum of Base Year Prices}} \right) \times 100 \]
2. **Weighted Index
Number:**
- A weighted index number considers the importance or weight
of each item in the group. It reflects the significance of each item in the
overall index. The weights are often based on the importance of the items in
terms of their contribution to the total.
\[ \text{Weighted Index} = \left( \frac{\sum (W_i \times
P_{i, t})}{\sum (W_i \times P_{i, 0})} \right) \times 100 \]
where \(W_i\) is the
weight of the i-th item, \(P_{i, t}\) is the price of the i-th item in the
current year, and \(P_{i, 0}\) is the price of the i-th item in the base year.
**Why Weighted Index
Numbers are Preferred:**
1. **Reflecting
Importance:**
- Weighted index numbers reflect the relative importance of
different items. Items with higher weights have a more significant impact on
the overall index.
2. **Accurate
Representation:**
- In many cases, not all items in a group have the same
economic significance. Weighted index numbers provide a more accurate
representation of the true changes in the overall level.
3. **Dynamic
Nature:**
- Weighted indices can adapt to changes in the structure of
the economy or the consumption pattern by adjusting the weights.
4. **Avoiding
Misleading Conclusions:**
- Unweighted indices may provide misleading conclusions,
especially when items with different economic importance experience significant
price changes.
5. **Policy Decision
Support:**
- Weighted indices are more useful for policymakers as they
offer a more nuanced view of price changes, allowing for better-informed
decisions.
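A small numerical sketch contrasts the two formulas above; the prices and weights are hypothetical:

```python
base_prices    = [10.0, 50.0, 2.0]    # P_{i,0}, hypothetical
current_prices = [12.0, 51.0, 4.0]    # P_{i,t}, hypothetical
weights        = [5.0, 1.0, 100.0]    # W_i, e.g. quantities consumed

# Simple aggregative (unweighted) index
unweighted = sum(current_prices) / sum(base_prices) * 100

# Weighted aggregative index
weighted = (sum(w * p for w, p in zip(weights, current_prices))
            / sum(w * p for w, p in zip(weights, base_prices)) * 100)

print(f"unweighted index: {unweighted:.1f}")
print(f"weighted index:   {weighted:.1f}")
# The heavily weighted cheap item (whose price doubled) dominates the
# weighted index, while the unweighted index is driven by the expensive item.
```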
### b) Chain Index
Numbers:
Given Data:
\[ \begin{array}{cccccc}
\text{Commodities} & \text{Years} & 1970 & 1971
& 1972 & 1973 & 1974 \\
\hline
A & & 40 & 43 & 45 & 42 & 50 \\
B & & 160 & 162 & 165 & 161 & 168 \\
C & & 20 & 29 & 52 & 23 & 27 \\
D & & 240 & 245 & 247 & 250 & 255 \\
\end{array} \]
**Chain Index Numbers
Calculation using the Geometric Mean to Average the Relatives:**
1. **Calculate Link Relatives:**
- For each commodity, the link relative for a year is the ratio of that
year's price to the previous year's price, expressed as a percentage:
\[ R_{i,t} = \frac{P_{i,t}}{P_{i,t-1}} \times 100 \]
2. **Average the Link Relatives Using the Geometric Mean:**
- For each year, average the link relatives of the four commodities with
the geometric mean:
\[ L_t = \left( \prod_{i=1}^{4} R_{i,t} \right)^{\frac{1}{4}} \]
3. **Chain the Averaged Relatives:**
- Take the base year 1970 as 100 and multiply forward:
\[ C_{1970} = 100, \qquad C_t = C_{t-1} \times \frac{L_t}{100} \]
**Calculations:**
\[ \begin{array}{ccccc}
\text{Year} & 1971 & 1972 & 1973 & 1974 \\
\hline
R_{A,t} & 107.50 & 104.65 & 93.33 & 119.05 \\
R_{B,t} & 101.25 & 101.85 & 97.58 & 104.35 \\
R_{C,t} & 145.00 & 179.31 & 44.23 & 117.39 \\
R_{D,t} & 102.08 & 100.82 & 101.21 & 102.00 \\
L_t \text{ (G.M.)} & 112.66 & 117.82 & 79.91 & 110.44 \\
\end{array} \]
**Results:**
\[ \begin{array}{cccccc}
\text{Year} & 1970 & 1971 & 1972 & 1973 & 1974 \\
\hline
\text{Chain Index} & 100 & 112.66 & 132.74 & 106.07 & 117.14 \\
\end{array} \]
These chain index numbers measure the change in the overall price level of
the four commodities relative to the base year 1970, with each year linked to
the preceding one. Averaging the relatives with the geometric mean treats equal
proportional rises and falls symmetrically, which is why it is preferred to the
arithmetic mean for averaging price relatives.
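The chain computation can be scripted. A minimal sketch (Python standard library only) that averages each year's link relatives with the geometric mean and chains them forward from a 1970 base of 100:

```python
import math

# Prices for commodities A-D over 1970-1974 (from the question)
prices = {
    "A": [40, 43, 45, 42, 50],
    "B": [160, 162, 165, 161, 168],
    "C": [20, 29, 52, 23, 27],
    "D": [240, 245, 247, 250, 255],
}

chain = [100.0]                       # base year 1970 = 100
for t in range(1, 5):
    links = [p[t] / p[t - 1] for p in prices.values()]  # link relatives
    gm = math.prod(links) ** (1 / len(links))           # geometric mean
    chain.append(chain[-1] * gm)

print([round(c, 2) for c in chain])   # chain indices for 1970-1974
```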