AIOU Hub: Statistics-I-1 (394) Autumn 2023

Statistics-I (394)

Q. 1: a) Define descriptive and inferential statistics and differentiate between them.

b) Define the following terms:

i) Population and sample ii) Parameter and statistic

iii) Quantitative variable iv) Qualitative variable


Dear Student,

This is a sample assignment. It is entirely copy-pasted and is also available to other students. If you need to submit an assignment to the university, contact us to obtain a UNIQUE assignment:

0313-6483019

0334-6483019

0343-6244948

To stay updated on all university-related news, subscribe to our channel:

AIOU Hub

### a) Descriptive and Inferential Statistics

#### Descriptive Statistics:

Descriptive statistics involves the organization, analysis, and presentation of data to provide a summary or description of its main features. It helps in simplifying large amounts of data in a meaningful way. Common measures in descriptive statistics include measures of central tendency (mean, median, mode), measures of variability (range, variance, standard deviation), and measures of distribution (skewness, kurtosis).

For example, if we have a dataset of the ages of a group of people, descriptive statistics would help us understand the typical age (mean), the most common age (mode), and how spread out the ages are (standard deviation).
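As a rough illustration of the age example, the following sketch computes the three summaries with Python's standard `statistics` module. The ages are made-up values, not data from the assignment.

```python
import statistics

# Hypothetical ages for illustration; not data from the assignment.
ages = [21, 22, 22, 23, 25, 22, 30, 27, 24, 22]

mean_age = statistics.mean(ages)     # typical age
mode_age = statistics.mode(ages)     # most common age
sd_age = statistics.pstdev(ages)     # spread, treating the list as the whole group

print(f"mean = {mean_age}, mode = {mode_age}, sd = {sd_age:.2f}")
```

Here `pstdev` is used because the list is treated as the complete group being described; `stdev` would be the choice if the list were a sample.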

#### Inferential Statistics:

Inferential statistics involves using data from a sample to make inferences or draw conclusions about a population. It uses probability theory to make predictions or generalizations about a larger group based on a smaller subset of that group. Inferential statistics includes hypothesis testing, regression analysis, and analysis of variance.

Continuing with the age example, inferential statistics might involve using the ages of a sample of people to make predictions or inferences about the ages of the entire population from which the sample was drawn.

**Differentiation between Descriptive and Inferential Statistics:**

- **Purpose:**

- Descriptive statistics aim to summarize and describe the main features of a dataset.

- Inferential statistics make inferences and predictions about a population based on a sample of that population.

- **Example:**

- Descriptive: Calculating the average age of a group of students.

- Inferential: Using the average age of a sample to make predictions about the average age of all students in a school.

- **Data Representation:**

- Descriptive statistics use charts, graphs, and summary measures.

- Inferential statistics involve probability distributions and confidence intervals.
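To make the inferential side concrete, here is a minimal sketch of a 95% confidence interval for a population mean, built from a small sample. The sample values are hypothetical, and the interval uses the normal approximation, which is an assumption; for a sample this small a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical sample of student ages (illustrative values only).
sample = [19, 21, 20, 22, 23, 20, 21, 24, 22, 20, 19, 21]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
std_error = sample_sd / math.sqrt(n)

# 95% normal-approximation interval (assumption: normal critical value).
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * std_error
upper = sample_mean + z * std_error

print(f"sample mean = {sample_mean:.2f}, 95% CI ~ ({lower:.2f}, {upper:.2f})")
```

The sample mean alone is a descriptive summary; the interval around it is the inferential statement about the wider population.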


### Definitions:

#### i) Population and Sample:

- **Population:**

- The population is the entire group that is the subject of the study.

- Example: All students in a university.

- **Sample:**

- A sample is a subset of the population.

- Example: A group of 100 students selected from the entire university.

#### ii) Parameter and Statistic:

- **Parameter:**

- A parameter is a numerical value that describes a characteristic of a population.

- Example: The average income of all households in a city.

- **Statistic:**

- A statistic is a numerical value that describes a characteristic of a sample.

- Example: The average income of a sample of 100 households in a city.
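The distinction can be sketched with simulated data. Everything below is hypothetical: the population is generated at random, and the income figures are arbitrary.

```python
import random
import statistics

random.seed(394)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly incomes of 10,000 households.
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]
parameter = statistics.mean(population)      # describes the whole population

sample = random.sample(population, 100)      # 100 households chosen at random
statistic = statistics.mean(sample)          # describes only the sample

print(f"parameter (population mean) = {parameter:.0f}")
print(f"statistic (sample mean)     = {statistic:.0f}")
```

The parameter is fixed for a given population, while the statistic changes from one sample to another.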

#### iii) Quantitative Variable:

- **Quantitative Variable:**

- A quantitative variable is a type of variable that takes numerical values and represents some kind of measurement.

- Example: Height, weight, income.

#### iv) Qualitative Variable:

- **Qualitative Variable:**

- A qualitative variable is a type of variable that represents categories or labels.

- Example: Gender, color, marital status.


Q. 2: a) Write down the important points for drawing graphs.

b) What is a frequency distribution? How is it constructed?

c) Give the merits and demerits of arithmetic mean and median. (6+7+7)


### a) Important Points for Drawing Graphs:

Drawing graphs is an essential aspect of data analysis, providing visual representation for better understanding. Here are key points for drawing graphs:

1. **Selecting the Right Type of Graph:**

- Choose a graph type that suits the data and the message you want to convey. Common types include bar graphs, line graphs, scatter plots, and pie charts.

2. **Labeling Axes:**

- Clearly label the x-axis and y-axis with appropriate variable names. Include units of measurement when applicable.

3. **Choosing Appropriate Scale:**

- Select a suitable scale for each axis to ensure that the data fits well within the graph, avoiding crowding or excessive white space.

4. **Title and Legend:**

- Provide a clear and concise title that summarizes the main point of the graph. Include a legend if the graph includes multiple data series.

5. **Color and Style:**

- Use colors and styles thoughtfully to enhance clarity. Ensure that colors are distinguishable for those with color vision deficiencies.

6. **Data Accuracy:**

- Double-check data points to ensure accuracy. Mistakes in data entry can lead to misleading graphs.

7. **Consistency:**

- Maintain consistency in formatting throughout the graph, such as bar widths or line styles. This aids in clarity and interpretation.

8. **Highlighting Key Points:**

- Emphasize important data points or trends using annotations, arrows, or other visual cues.

9. **Data Source:**

- Include a note about the source of the data to establish credibility and transparency.

10. **Audience Consideration:**

- Consider the audience when designing graphs. Ensure that the graph is understandable to both experts and non-experts.

### b) Frequency Distribution:

A frequency distribution is a table that organizes a set of data into classes (intervals or categories) and shows the number of observations, called the frequency, falling into each class. The construction involves several steps:

1. **Data Collection:**

- Gather the raw data that you want to analyze.

2. **Determine the Number of Classes:**

- Decide on the number of intervals or classes. Too few classes may oversimplify, while too many can obscure patterns.

3. **Calculate the Range:**

- Find the range of the data (difference between the maximum and minimum values).

4. **Calculate Class Width:**

- Determine the width of each class interval by dividing the range by the number of classes. Round up to ensure all data points are included.

5. **Set up the Classes:**

- Establish the intervals using the class width. The classes should be mutually exclusive and exhaustive, covering the entire range of data.

6. **Tally and Count:**

- Tally the number of observations falling into each class interval.

7. **Create Frequency Table:**

- Construct a table with columns for classes and their respective frequencies.

8. **Calculate Cumulative Frequency:**

- Optionally, add a column for cumulative frequency, which represents the total frequency up to a given class.
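The steps above can be sketched in a few lines of Python. The raw scores and the choice of five classes are assumptions made for illustration only.

```python
import math

# Hypothetical raw scores, used only to illustrate the steps.
data = [52, 67, 71, 58, 63, 75, 80, 49, 66, 70,
        61, 77, 55, 68, 73, 59, 64, 82, 69, 57]

num_classes = 5                                    # step 2 (chosen by the analyst)
data_range = max(data) - min(data)                 # step 3
width = math.ceil(data_range / num_classes)        # step 4, rounded up
if min(data) + width * num_classes <= max(data):   # make sure the maximum is covered
    width += 1

start = min(data)
table = []
cumulative = 0
for k in range(num_classes):                       # steps 5 to 8
    lower = start + k * width
    upper = lower + width - 1                      # inclusive upper limit for integer data
    freq = sum(lower <= x <= upper for x in data)
    cumulative += freq
    table.append((lower, upper, freq, cumulative))

for lower, upper, freq, cum in table:
    print(f"{lower}-{upper}: f = {freq}, cumulative f = {cum}")
```

The last cumulative frequency should equal the number of observations, which is a quick check that the classes are exhaustive and do not overlap.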

### c) Merits and Demerits of Arithmetic Mean and Median:

#### Arithmetic Mean:

**Merits:**

1. **Sensitive to all Values:**

- The mean considers all values in the dataset, making it sensitive to changes in any observation.

2. **Balancing Property:**

- The positive and negative deviations from the mean cancel out, so the sum of the deviations from the mean is always zero.

3. **Useful in Statistical Analysis:**

- The mean is often used in statistical analysis and various mathematical calculations.

**Demerits:**

1. **Affected by Extreme Values:**

- Outliers or extreme values can significantly impact the mean, making it less representative of the central tendency.

2. **Not Appropriate for Skewed Distributions:**

- In skewed distributions, the mean may not accurately reflect the central location, as it is influenced by the skewness.

#### Median:

**Merits:**

1. **Not Sensitive to Extreme Values:**

- The median is not influenced by extreme values or outliers, making it a robust measure of central tendency.

2. **Appropriate for Skewed Distributions:**

- It is suitable for describing the central tendency in skewed distributions.

3. **Simple to Understand:**

- The median is easy to understand and calculate, especially for ordinal or interval data.

**Demerits:**

1. **Less Sensitive to Small Changes:**

- The median may not reflect small changes in the dataset, particularly when dealing with a large sample.

2. **Not Utilizing All Data Points:**

- It does not use all the information in the dataset; it only considers the middle value(s).

In conclusion, both the mean and median have their merits and demerits. The choice between them depends on the nature of the data and the specific goals of the analysis.
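A small numerical sketch shows the contrast in sensitivity to extreme values. The incomes are hypothetical figures chosen so that one value is a clear outlier.

```python
import statistics

# Hypothetical monthly incomes (in thousands); 400 is an outlier.
incomes = [30, 32, 35, 36, 38, 40, 41]
with_outlier = incomes + [400]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"without outlier: mean = {mean_before}, median = {median_before}")
print(f"with outlier:    mean = {mean_after}, median = {median_after}")
```

Adding the single extreme value moves the mean from 36 to 81.5, while the median only moves from 36 to 37.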

Q. 3:a) Define Histogram. Draw a Histogram for the following frequency distribution:

X 32 37 42 47 52 57 62 67

f 3 17 28 47 54 31 14 4

b) Define measures of location. Explain properties of good average.

c) Compute the Mean and mode for the following data; (15+5)

Classes 86-90 91-95 96-100 101-105 106-110 111-115

f 6 4 10 6 3 1

### a) Histogram:

**Definition:**

A histogram is a graphical representation of the distribution of a dataset. It consists of a series of bars, each representing a range of values, called a class interval. The height of each bar corresponds to the frequency or relative frequency of the values within that interval.

**Drawing a Histogram for the Given Frequency Distribution:**

| X | 32 | 37 | 42 | 47 | 52 | 57 | 62 | 67 |
|------|----|----|----|----|----|----|----|----|
| f | 3 | 17 | 28 | 47 | 54 | 31 | 14 | 4 |

1. **Identify Class Intervals:**

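- The given X values are class marks (midpoints) spaced 5 units apart, so each class extends 2.5 units on either side of its midpoint. The class boundaries are therefore 29.5-34.5, 34.5-39.5, and so on up to 64.5-69.5.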

2. **Draw Axes:**

- Draw horizontal and vertical axes. The horizontal axis represents the class intervals, and the vertical axis represents frequency.

3. **Draw Bars:**

- For each class interval, draw a bar with a height corresponding to the frequency of that interval.

![Histogram](https://i.imgur.com/vFC4WGJ.png)
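The construction can also be sketched in code. The snippet below derives the class boundaries and prints a rough text version of the bars; it assumes the X values are class midpoints with equal spacing, and the text rendering is only a stand-in for a proper chart.

```python
# Class marks (X) and frequencies (f) from the question.
marks = [32, 37, 42, 47, 52, 57, 62, 67]
freqs = [3, 17, 28, 47, 54, 31, 14, 4]

# Assumption: X values are class midpoints, so each class extends half the
# common spacing on either side of its mark.
half_width = (marks[1] - marks[0]) / 2
boundaries = [(x - half_width, x + half_width) for x in marks]

for (lower, upper), f in zip(boundaries, freqs):
    # One '#' per 2 observations keeps the text bars short.
    print(f"{lower:5.1f}-{upper:5.1f} | {'#' * (f // 2)} {f}")
```

Because adjacent classes share a boundary, the bars of the histogram touch each other, unlike the bars of a bar chart.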

### b) Measures of Location and Properties of Good Average:

**Measures of Location:**

Measures of location, also called measures of central tendency or averages, are single values that describe the center or typical value of a dataset. Common measures include:

1. **Mean (Arithmetic Average):**

- The sum of all values divided by the number of values.

2. **Median:**

- The middle value in a dataset when it is arranged in ascending or descending order.

3. **Mode:**

- The value that occurs most frequently in a dataset.

**Properties of Good Average:**

1. **Uniqueness:**

- The average should be a unique value, providing a representative measure for the entire dataset.

2. **Sensitivity to Changes:**

- The average should be sensitive to changes in the dataset, reflecting shifts in central tendency.

3. **Additivity:**

- The average should lend itself to further algebraic treatment. For example, the mean of a combined dataset can be obtained from the means of its parts, weighted by their sizes.

4. **Non-Bias:**

- The average should not be systematically too high or too low; it should accurately represent the data.

5. **Ease of Computation:**

- The average should be easy to compute and understand for practical use.

### c) Mean and Mode Calculation:

Given Data:

| Classes | 86-90 | 91-95 | 96-100 | 101-105 | 106-110 | 111-115 |
|---------|-------|-------|--------|---------|---------|---------|
| f | 6 | 4 | 10 | 6 | 3 | 1 |

**Mean Calculation:**

\[ \text{Mean} = \frac{\sum (f \times \text{Midpoint})}{\sum f} \]

\[ \text{Midpoint} = \frac{\text{Lower Bound} + \text{Upper Bound}}{2} \]

\[ \text{Mean} = \frac{(6 \times 88) + (4 \times 93) + (10 \times 98) + (6 \times 103) + (3 \times 108) + (1 \times 113)}{6+4+10+6+3+1} \]

\[ \text{Mean} = \frac{528 + 372 + 980 + 618 + 324 + 113}{30} \]

\[ \text{Mean} = \frac{2935}{30} \]

\[ \text{Mean} = 97.83 \]

**Mode Calculation:**

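For grouped data, the class with the highest frequency is the modal class, and the mode is located within it by interpolation:

\[ \text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h \]

where \(l\) is the lower class boundary of the modal class, \(f_m\) is its frequency, \(f_1\) and \(f_2\) are the frequencies of the preceding and following classes, and \(h\) is the class width.

Here, the modal class is \(96-100\), with class boundaries \(95.5-100.5\), so \(l = 95.5\), \(f_m = 10\), \(f_1 = 4\), \(f_2 = 6\), and \(h = 5\).

\[ \text{Mode} = 95.5 + \frac{10 - 4}{(10 - 4) + (10 - 6)} \times 5 = 95.5 + 3 = 98.5 \]

In summary, the mean for the given data is approximately \(97.83\), and the mode is \(98.5\) (modal class \(96-100\)).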
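As a cross-check, the grouped mean and mode can be computed in a short script. The mode uses the usual interpolation formula for grouped data, and the lower class boundary is taken as the lower limit minus 0.5, which assumes the class limits are whole numbers one unit apart.

```python
# Class limits and frequencies from the question.
classes = [(86, 90), (91, 95), (96, 100), (101, 105), (106, 110), (111, 115)]
freqs = [6, 4, 10, 6, 3, 1]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

# Grouped-data mode: l + (fm - f1) / ((fm - f1) + (fm - f2)) * h
i = freqs.index(max(freqs))                      # position of the modal class
fm = freqs[i]
f1 = freqs[i - 1] if i > 0 else 0                # frequency before the modal class
f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # frequency after the modal class
lower_boundary = classes[i][0] - 0.5             # assumes limits are 1 unit apart
h = classes[i][1] - classes[i][0] + 1            # class width
mode = lower_boundary + (fm - f1) / ((fm - f1) + (fm - f2)) * h

print(f"mean = {mean:.2f}, modal class = {classes[i]}, mode = {mode:.2f}")
```

The script reports a mean of about 97.83 and a mode of 98.5, which lies inside the modal class 96-100.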

Q. 4. a) Explain the difference between absolute dispersion and relative dispersion.

b) Compute median and mean deviation from median for the data given below:

X 6 8 10 12 14 16 18 20 22

f 5 10 18 20 22 14 7 3 1

c) What are moments about mean and about an arbitrary value? Give the relation between them. (8+6+6)

### a) Absolute Dispersion vs. Relative Dispersion:

**Absolute Dispersion:**

Absolute dispersion measures the spread or variability of a dataset in its original units. It provides information about how much individual data points differ from the central tendency. Common measures of absolute dispersion include range, mean deviation, variance, and standard deviation.

- **Range:** The difference between the maximum and minimum values.

- **Mean Deviation:** The average of the absolute differences between each data point and the mean.

- **Variance:** The average of the squared differences between each data point and the mean.

- **Standard Deviation:** The square root of the variance.

**Relative Dispersion:**

Relative dispersion, on the other hand, expresses the spread of data in terms of a ratio or percentage relative to a central value. It allows for comparisons between datasets with different units or scales. The coefficient of variation (CV) is a common measure of relative dispersion.

- **Coefficient of Variation (CV):** The ratio of the standard deviation to the mean, expressed as a percentage.

**Difference:**

- **Focus:**

- Absolute dispersion focuses on the spread of data in its original units.

- Relative dispersion focuses on the spread of data relative to a central value, allowing for comparison between datasets.

- **Units:**

- Absolute dispersion is expressed in the same units as the original data.

- Relative dispersion is expressed as a ratio or percentage, making it unitless and suitable for comparing datasets with different scales.

- **Use Cases:**

- Absolute dispersion is useful for understanding the variability in the original data.

- Relative dispersion is useful when comparing the variability of datasets with different means or scales.
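The contrast can be sketched with two small datasets in different units. Both lists are hypothetical, and `coefficient_of_variation` is a helper defined here for illustration, not a standard-library function.

```python
import statistics

def coefficient_of_variation(values):
    """Return the standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Hypothetical data measured in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

sd_h = statistics.pstdev(heights_cm)        # absolute dispersion, in cm
sd_w = statistics.pstdev(weights_kg)        # absolute dispersion, in kg
cv_h = coefficient_of_variation(heights_cm) # relative dispersion, in percent
cv_w = coefficient_of_variation(weights_kg)

print(f"heights: sd = {sd_h:.2f} cm, CV = {cv_h:.2f}%")
print(f"weights: sd = {sd_w:.2f} kg, CV = {cv_w:.2f}%")
```

The two standard deviations cannot be compared directly because they are in centimetres and kilograms, whereas the two CV values are unit-free percentages and can be.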

### b) Median and Mean Deviation from Median:

Given Data:

| X | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 |
|---|---|---|----|----|----|----|----|----|----|
| f | 5 | 10 | 18 | 20 | 22 | 14 | 7 | 3 | 1 |

**Median Calculation:**

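Because each value of X occurs with a frequency, the median must be located from the cumulative frequencies (c.f.), not from the list of distinct X values.

| X | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 |
|---|---|---|----|----|----|----|----|----|----|
| f | 5 | 10 | 18 | 20 | 22 | 14 | 7 | 3 | 1 |
| c.f. | 5 | 15 | 33 | 53 | 75 | 89 | 96 | 99 | 100 |

- Here \(N = \sum f = 100\), so the median is the average of the 50th and 51st observations.

- The cumulative frequency first exceeds 50 at \(X = 12\) (c.f. \(= 53\)), so both observations equal 12 and the median is \(12\).

**Mean Deviation from Median Calculation:**

\[ \text{Mean Deviation from Median} = \frac{\sum f_i |X_i - \text{Median}|}{\sum f_i} \]

The absolute deviations \(|X_i - 12|\) are \(6, 4, 2, 0, 2, 4, 6, 8, 10\). Multiplying each by its frequency gives:

\[ \text{Mean Deviation from Median} = \frac{30 + 40 + 36 + 0 + 44 + 56 + 42 + 24 + 10}{100} \]

\[ \text{Mean Deviation from Median} = \frac{282}{100} = 2.82 \]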
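One way to check a frequency-weighted median and mean deviation is to expand the table into its individual observations and let the standard library find the middle value. This is a sketch of that check, not the hand method expected in an exam.

```python
import statistics

# Values and frequencies from the question.
xs = [6, 8, 10, 12, 14, 16, 18, 20, 22]
fs = [5, 10, 18, 20, 22, 14, 7, 3, 1]

# Expand the table into the 100 individual observations it represents.
observations = [x for x, f in zip(xs, fs) for _ in range(f)]

median = statistics.median(observations)
mean_dev = sum(f * abs(x - median) for x, f in zip(xs, fs)) / sum(fs)

print(f"N = {len(observations)}, median = {median}, mean deviation = {mean_dev:.2f}")
```

With the frequencies taken into account, the median is 12 and the mean deviation from the median is 2.82.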

### c) Moments about Mean and Arbitrary Value:

**Moments about Mean:**

Moments about the mean (central moments) are obtained by raising the deviation of each data point from the mean to a given power and averaging. The \(r\)-th moment about the mean is denoted by \(\mu_r\) and is calculated as:

\[ \mu_r = \frac{\sum f_i (X_i - \bar{X})^r}{N} \]

where \(r\) is the order of the moment, \(X_i\) is each data point, \(\bar{X}\) is the mean, \(f_i\) is the frequency of each data point, and \(N = \sum f_i\) is the total number of observations. In particular, \(\mu_1 = 0\) and \(\mu_2\) is the variance.

**Moments about an Arbitrary Value:**

Moments about an arbitrary value are obtained in the same way, except that deviations are taken from a chosen value \(a\) instead of the mean. The \(r\)-th moment about \(a\) is denoted by \(\mu'_r\) and is calculated as:

\[ \mu'_r = \frac{\sum f_i (X_i - a)^r}{N} \]

The first of these is \(\mu'_1 = \bar{X} - a\).

**Relation between Moments about Mean and Arbitrary Value:**

Writing \(X_i - \bar{X} = (X_i - a) - \mu'_1\) and expanding by the binomial theorem gives, with \(\mu'_0 = 1\):

\[ \mu_r = \mu'_r - \binom{r}{1} \mu'_{r-1} \mu'_1 + \binom{r}{2} \mu'_{r-2} (\mu'_1)^2 - \ldots + (-1)^r (\mu'_1)^r \]

For the first four moments this reduces to:

\[ \mu_1 = 0 \]

\[ \mu_2 = \mu'_2 - (\mu'_1)^2 \]

\[ \mu_3 = \mu'_3 - 3 \mu'_2 \mu'_1 + 2 (\mu'_1)^3 \]

\[ \mu_4 = \mu'_4 - 4 \mu'_3 \mu'_1 + 6 \mu'_2 (\mu'_1)^2 - 3 (\mu'_1)^4 \]

These relations allow the moments about the mean to be computed from the moments about any convenient value \(a\), which is usually easier arithmetically.
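The standard conversion from moments about an arbitrary value to moments about the mean can be verified numerically. The sketch below reuses the data of Q. 4(b) purely as an example, and the origin `a = 14` is an arbitrary choice; `moment` is a helper written for this illustration.

```python
# Data reused from Q. 4(b) purely as an example.
xs = [6, 8, 10, 12, 14, 16, 18, 20, 22]
fs = [5, 10, 18, 20, 22, 14, 7, 3, 1]
n = sum(fs)

def moment(r, origin):
    """r-th moment of the frequency distribution about the given origin."""
    return sum(f * (x - origin) ** r for x, f in zip(xs, fs)) / n

mean = moment(1, 0)
a = 14                                   # arbitrary origin chosen for illustration

raw = [moment(r, a) for r in range(5)]          # moments about a, orders 0..4
central = [moment(r, mean) for r in range(5)]   # moments about the mean

d = raw[1]                               # first moment about a equals mean - a
m2 = raw[2] - d ** 2
m3 = raw[3] - 3 * raw[2] * d + 2 * d ** 3
m4 = raw[4] - 4 * raw[3] * d + 6 * raw[2] * d ** 2 - 3 * d ** 4

print(central[2], m2)
print(central[3], m3)
print(central[4], m4)
```

Each printed pair should agree up to floating-point rounding: the first number is computed directly about the mean, the second is converted from the moments about `a`.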

Q. 5. a) Define weighted and unweighted index number and explain why weighted index numbers are preferred over unweighted index numbers.

b) Find chain index numbers (using G.M to average the relatives) for the following data of prices, taking 1970 as the base year. (8+12)

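| Commodities | 1970 | 1971 | 1972 | 1973 | 1974 |
|-------------|------|------|------|------|------|
| A | 40 | 43 | 45 | 42 | 50 |
| B | 160 | 162 | 165 | 161 | 168 |
| C | 20 | 29 | 52 | 23 | 27 |
| D | 240 | 245 | 247 | 250 | 255 |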

### a) Weighted and Unweighted Index Numbers:

**Definition:**

1. **Unweighted Index Number:**

- An unweighted index number is a measure that does not take into account the relative importance of different items in a group; every item is treated as equally important. In its simple aggregative form, it compares the total of current-year prices with the total of base-year prices:

\[ \text{Unweighted Index} = \left( \frac{\text{Sum of Current Year Prices}}{\text{Sum of Base Year Prices}} \right) \times 100 \]

2. **Weighted Index Number:**

- A weighted index number considers the importance or weight of each item in the group. It reflects the significance of each item in the overall index. The weights are often based on the importance of the items in terms of their contribution to the total.

\[ \text{Weighted Index} = \left( \frac{\sum (W_i \times P_{i, t})}{\sum (W_i \times P_{i, 0})} \right) \times 100 \]

where \(W_i\) is the weight of the i-th item, \(P_{i, t}\) is the price of the i-th item in the current year, and \(P_{i, 0}\) is the price of the i-th item in the base year.

**Why Weighted Index Numbers are Preferred:**

1. **Reflecting Importance:**

- Weighted index numbers reflect the relative importance of different items. Items with higher weights have a more significant impact on the overall index.

2. **Accurate Representation:**

- In many cases, not all items in a group have the same economic significance. Weighted index numbers provide a more accurate representation of the true changes in the overall level.

3. **Dynamic Nature:**

- Weighted indices can adapt to changes in the structure of the economy or the consumption pattern by adjusting the weights.

4. **Avoiding Misleading Conclusions:**

- Unweighted indices may provide misleading conclusions, especially when items with different economic importance experience significant price changes.

5. **Policy Decision Support:**

- Weighted indices are more useful for policymakers as they offer a more nuanced view of price changes, allowing for better-informed decisions.
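The two formulas can be compared on a toy basket. The item names, prices, and weights below are all hypothetical and chosen only to show how the weights change the result.

```python
# Hypothetical prices and weights for three items (not from the assignment).
base_prices = {"wheat": 50, "salt": 10, "fuel": 100}
current_prices = {"wheat": 55, "salt": 20, "fuel": 110}
weights = {"wheat": 60, "salt": 2, "fuel": 38}   # e.g. budget shares

# Simple aggregative (unweighted) index.
unweighted = sum(current_prices.values()) / sum(base_prices.values()) * 100

# Weighted aggregative index with fixed weights.
weighted = (
    sum(weights[k] * current_prices[k] for k in weights)
    / sum(weights[k] * base_prices[k] for k in weights)
    * 100
)

print(f"unweighted = {unweighted:.2f}, weighted = {weighted:.2f}")
```

In this made-up basket the price of salt doubles, but salt carries a very small weight, so the weighted index (about 110.26) rises less than the unweighted one (115.63).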

### b) Chain Index Numbers:

Given Data:

| Commodities | 1970 | 1971 | 1972 | 1973 | 1974 |
|-------------|------|------|------|------|------|
| A | 40 | 43 | 45 | 42 | 50 |
| B | 160 | 162 | 165 | 161 | 168 |
| C | 20 | 29 | 52 | 23 | 27 |
| D | 240 | 245 | 247 | 250 | 255 |

**Chain Index Numbers Calculation using Geometric Mean to Average the Relatives:**

1. **Calculate Link Relatives:**

- A link relative expresses each year's price as a percentage of the previous year's price.

\[ R_{i,t} = \frac{P_{i,t}}{P_{i,t-1}} \times 100 \]

2. **Calculate Geometric Mean (GM):**

- For each year, average the link relatives of the \(n = 4\) commodities using the geometric mean.

\[ GM_t = \left( \prod_{i=1}^{n} R_{i,t} \right)^{\frac{1}{n}} \]

3. **Calculate Chain Index Numbers:**

- Multiply the geometric mean for each year by the chain index of the previous year and divide by 100.

\[ C_t = \frac{GM_t \times C_{t-1}}{100}, \qquad C_{1970} = 100 \text{ (Base Year)} \]

**Link Relatives and Their Geometric Means:**

| Commodities | 1970 | 1971 | 1972 | 1973 | 1974 |
|-------------|------|------|------|------|------|
| A | 100 | 107.50 | 104.65 | 93.33 | 119.05 |
| B | 100 | 101.25 | 101.85 | 97.58 | 104.35 |
| C | 100 | 145.00 | 179.31 | 44.23 | 117.39 |
| D | 100 | 102.08 | 100.82 | 101.21 | 102.00 |
| G.M. | 100 | 112.66 | 117.82 | 79.91 | 110.44 |

**Results:**

\[ C_{1971} = \frac{112.66 \times 100}{100} = 112.66 \]

\[ C_{1972} = \frac{117.82 \times 112.66}{100} = 132.74 \]

\[ C_{1973} = \frac{79.91 \times 132.74}{100} = 106.07 \]

\[ C_{1974} = \frac{110.44 \times 106.07}{100} = 117.14 \]

| Year | 1970 | 1971 | 1972 | 1973 | 1974 |
|------|------|------|------|------|------|
| Chain index | 100 | 112.66 | 132.74 | 106.07 | 117.14 |

These chain index numbers show the combined price level of the four commodities in each year relative to the base year (1970). Because the geometric mean of relatives satisfies the circular test, the chained values agree with the geometric mean of the fixed-base price relatives; for example, for 1974, \(\sqrt[4]{125 \times 105 \times 135 \times 106.25} \approx 117.14\).
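The chain-index procedure with a geometric mean of link relatives can be sketched as follows. This is a minimal illustration using the prices from the question; the variable names are chosen here and are not part of the assignment.

```python
import math

# Prices from the question, one list per commodity for 1970..1974.
years = [1970, 1971, 1972, 1973, 1974]
prices = {
    "A": [40, 43, 45, 42, 50],
    "B": [160, 162, 165, 161, 168],
    "C": [20, 29, 52, 23, 27],
    "D": [240, 245, 247, 250, 255],
}

chain = [100.0]                                   # base year 1970
for t in range(1, len(years)):
    # Link relative: price as a percentage of the previous year's price.
    links = [p[t] / p[t - 1] * 100 for p in prices.values()]
    # Geometric mean of the link relatives across commodities.
    gm = math.exp(sum(math.log(r) for r in links) / len(links))
    chain.append(chain[-1] * gm / 100)

for year, index in zip(years, chain):
    print(year, round(index, 2))
```

The script gives chain indices of roughly 112.66, 132.74, 106.07, and 117.14 for 1971 to 1974, with 1970 = 100.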


Friday, November 17
