Sunday, April 21

Course: Business Statistics (1430) - Autumn 2023 - Assignment 1


Q. 1 (a) Differentiate between populations and samples, and describe some advantages of samples over populations.

(b) Why is a frequency distribution constructed? Explain the various steps involved in the construction of a frequency distribution.

(a) Populations and samples are two concepts commonly used in statistics:

1. **Populations**: A population refers to the entire group of individuals, events, or objects that are of interest to a researcher. It encompasses all possible observations that meet certain criteria. For example, if a researcher is studying the average height of all adult males in a country, the population would consist of every adult male in that country.


2. **Samples**: A sample is a subset of the population that is selected for study. It represents a smaller, manageable group from the larger population. Using the previous example, instead of measuring the height of every adult male in the country (which could be impractical or impossible), the researcher might select a sample of adult males from different regions to represent the entire population.

Advantages of samples over populations include:

- **Practicality**: It is often more feasible to collect data from a sample rather than the entire population, especially when the population is large or spread out.

- **Cost-effectiveness**: Conducting research on a sample can be more cost-effective in terms of resources, time, and money.

- **Accuracy**: With appropriate sampling techniques, a well-chosen sample can provide accurate estimates of population parameters.

- **Manageability**: Working with a smaller sample size allows for easier data collection, analysis, and interpretation.
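The accuracy point can be illustrated with a small simulation. This is a hypothetical sketch, not data from the assignment: it generates an artificial population of heights with Python's `random` module, draws a simple random sample of 400, and compares the two means. The population size, mean (170 cm) and standard deviation (7 cm) are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 100,000 simulated adult heights in cm
# (assumed mean 170, standard deviation 7). These are NOT real data.
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(170, 7) for _ in range(100_000)]

# Simple random sample of 400 heights, drawn without replacement.
sample = rng.sample(population, 400)

population_mean = statistics.fmean(population)  # the population parameter
sample_mean = statistics.fmean(sample)          # its estimate from the sample

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
```

Although only 0.4% of the population is measured, the sample mean typically lands within about a centimetre of the population mean, because its standard error here is roughly 7/√400 = 0.35 cm.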

 

(b) A frequency distribution is constructed to organize and summarize a set of data by displaying the number of times each value (or range of values) occurs within the dataset. It provides insights into the distribution, central tendency, and variability of the data. The steps involved in constructing a frequency distribution are as follows:

1. **Determine the range**: Identify the minimum and maximum values in the dataset; the range is the maximum minus the minimum.

2. **Determine the number of intervals or classes**: Decide on the number of intervals or classes to use in the frequency distribution. This depends on the range of values and the desired level of detail in the distribution.

3. **Calculate the interval width**: Divide the range by the number of intervals and round the result up to a convenient value, so that equal-width intervals cover every observation.

4. **Create intervals**: Based on the interval width, create non-overlapping intervals that cover the entire range of values in the dataset.

5. **Count frequencies**: Count the number of observations falling within each interval. This involves tallying or counting the occurrences of values within each interval.

6. **Construct the frequency table**: Organize the intervals and their corresponding frequencies into a table format, typically with two columns: one for the intervals and another for the frequencies.

7. **Optional: Create a histogram or other graphical representation**: To visually represent the frequency distribution, create a histogram or other appropriate graphical display. This helps in understanding the shape and characteristics of the distribution.
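Steps 1 to 6 can be sketched in a few lines of Python. This is a minimal illustration, assuming whole-number data so that each class can be written as an inclusive range such as 15-25; the function name `frequency_distribution` and the data are hypothetical. One deliberate difference from step 3: the width is computed from (range + 1) and rounded up, which guarantees that the last inclusive class reaches the maximum value.

```python
import math


def frequency_distribution(data, num_classes):
    """Return (lower, upper, frequency) for equal-width inclusive classes.

    Hypothetical helper for illustration; assumes integer-valued data.
    """
    # Step 1: determine the range.
    low, high = min(data), max(data)
    # Steps 2-3: width for the chosen number of classes, rounded up.
    # The +1 makes num_classes inclusive whole-number classes reach high.
    width = math.ceil((high - low + 1) / num_classes)
    table = []
    # Step 4: create non-overlapping classes starting at the minimum.
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        # Step 5: count the observations falling in this class.
        count = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, count))
    return table


# Step 6: print the frequency table.
data = [2, 3, 5, 7, 8, 8, 10, 12, 13, 15]  # hypothetical observations
table = frequency_distribution(data, 3)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

For these hypothetical observations the width is 5 and the table reads 2-6: 3, 7-11: 4 and 12-16: 3.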

 

Q. 2 A certain Transportation Commission is concerned about the speed motorists are driving on a section of the main highway. Here are the speeds of 45 motorists:

 15 32 45 46 42 39 68 47 18 31 48 49

56 52 39 48 69 61 44 42 38 52 55 58

62 58 48 56 58 48 47 52 37 64 29 55

38 29 62 49 69 18 61 55

Use these data to construct relative frequency distributions using 5 equal intervals and 11 equal intervals. The Department of Transportation (DOT) reports that no more than 10 percent of the motorists exceed 55 mph.

(a) Do the motorists follow the DOT's report about driving patterns? Which distribution did you use to answer this part?

(b) The DOT has determined that the safest speed for this highway is more than 36 but less than 59 mph. What percent of the motorists drive within this range? Which distribution did you use to answer this part?

To construct relative frequency distributions, we first need to determine the intervals for the data provided. Then we'll count the number of observations falling within each interval and calculate the relative frequencies. Finally, we'll answer the questions based on the constructed distributions.

Given the data:

15, 32, 45, 46, 42, 39, 68, 47, 18, 31, 48, 49, 56, 52, 39, 48, 69, 61, 44, 42, 38, 52, 55, 58, 62, 58, 48, 56, 58, 48, 47, 52, 37, 64, 29, 55, 38, 29, 62, 49, 69, 18, 61, 55

(a) To determine if the motorists follow the DOT's report about driving pattern (no more than 10% exceed 55 mph), we'll use the distribution with 5 equal intervals.

Interval width = (Maximum speed - Minimum speed) / Number of intervals

Interval width = (69 - 15) / 5 = 54 / 5 = 10.8

Rounding up to a convenient value gives an interval width of 11.

Using intervals of width 11:

- 15-25

- 26-36

- 37-47

- 48-58

- 59-69

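Now, let's count the number of motorists in each interval:

- 15-25: 3

- 26-36: 4

- 37-47: 12

- 48-58: 17

- 59-69: 8

These counts add up to 44, the number of speeds actually listed (the question states 45 motorists, but only 44 speeds are given), so the relative frequencies below use n = 44.

Now, calculate the relative frequency for each interval:

- 15-25: 3/44 ≈ 0.068

- 26-36: 4/44 ≈ 0.091

- 37-47: 12/44 ≈ 0.273

- 48-58: 17/44 ≈ 0.386

- 59-69: 8/44 ≈ 0.182

Now, let's check whether more than 10% of motorists exceed 55 mph. Every speed in the 59-69 interval exceeds 55 mph, and that interval alone has a relative frequency of about 0.182, which is greater than 0.10. The 48-58 interval also contains some speeds above 55 mph (56 and 58), so the true share is even higher: 13 of the 44 speeds, about 29.5%, exceed 55 mph. So the motorists do not follow the DOT's report. This part was answered with the 5-interval distribution.

(b) To determine the percent of motorists driving within the range of more than 36 but less than 59 mph, we'll use the same distribution.

Count the number of motorists in the intervals 37-47 and 48-58:

- 37-47: 12

- 48-58: 17

Add these counts together to find the total number of motorists driving within the specified range: 12 + 17 = 29.

Calculate the relative frequency for this range:

Relative frequency = 29/44 ≈ 0.659

So, approximately 65.9% of the motorists drive within the range of more than 36 but less than 59 mph. The 5-interval distribution answers this part exactly, because for whole-number speeds its interval limits (37 and 58) coincide with "more than 36 but less than 59 mph".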
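The question also asks for an 11-interval distribution, and because the raw speeds are available, both answers can be checked without grouping. The sketch below is a minimal illustration rather than a required method: it builds the 5- and 11-interval relative frequency distributions from the 44 listed speeds, assuming inclusive whole-number classes that start at the minimum speed with width ⌈(range + 1)/k⌉ (11 mph for 5 intervals, 5 mph for 11 intervals), then counts the speeds above 55 mph and strictly between 36 and 59 mph. The helper name `relative_frequency_table` is hypothetical.

```python
import math

# The 44 speeds (mph) listed in the question.
speeds = [15, 32, 45, 46, 42, 39, 68, 47, 18, 31, 48, 49,
          56, 52, 39, 48, 69, 61, 44, 42, 38, 52, 55, 58,
          62, 58, 48, 56, 58, 48, 47, 52, 37, 64, 29, 55,
          38, 29, 62, 49, 69, 18, 61, 55]
n = len(speeds)


def relative_frequency_table(data, num_classes):
    """Equal-width inclusive classes starting at min(data) (hypothetical helper)."""
    low, high = min(data), max(data)
    # Round up so that num_classes whole-number classes reach max(data).
    width = math.ceil((high - low + 1) / num_classes)
    rows = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        count = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, count, count / len(data)))
    return rows


# Relative frequency distributions with 5 and 11 equal intervals.
five = relative_frequency_table(speeds, 5)
eleven = relative_frequency_table(speeds, 11)
for k, rows in ((5, five), (11, eleven)):
    print(f"{k} intervals:")
    for lower, upper, count, rel in rows:
        print(f"  {lower}-{upper}: {count:2d}  {rel:.3f}")

# Direct checks on the raw (ungrouped) data.
over_55 = sum(x > 55 for x in speeds)          # part (a)
safe_range = sum(36 < x < 59 for x in speeds)  # part (b)
print(f"Exceed 55 mph: {over_55}/{n} = {over_55 / n:.1%}")
print(f"Between 36 and 59 mph: {safe_range}/{n} = {safe_range / n:.1%}")
```

With 11 intervals (15-19 up to 65-69) the counts are 3, 0, 2, 2, 5, 3, 10, 3, 8, 5 and 3. The direct counts are 13 of 44 speeds (about 29.5%) above 55 mph and 29 of 44 (about 65.9%) in the safe range.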

 

Q. 3 (a) Differentiate between a simple histogram and a relative frequency histogram. (10)

(b) Here is a frequency distribution of the lengths of phone calls made by 175 people during a Labor Day weekend. Construct a histogram and a frequency polygon for these data.

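| Length in Minutes | 1–7 | 8–14 | 15–21 | 22–28 | 29–35 | 36–42 | 43–49 | 50–56 |
|-------------------|-----|------|-------|-------|-------|-------|-------|-------|
| Frequency         | 45  | 32   | 34    | 22    | 16    | 12    | 9     | 5     |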

(a) The main difference between a simple histogram and a relative frequency histogram lies in how the data is represented:

1. **Simple Histogram**: In a simple histogram, the vertical axis represents the frequency or count of observations within each interval or bin, while the horizontal axis represents the intervals themselves. The heights of the bars correspond to the frequencies of the intervals. It shows the absolute distribution of data.

2. **Relative Frequency Histogram**: In a relative frequency histogram, the vertical axis represents the relative frequency or proportion of observations within each interval or bin, while the horizontal axis represents the intervals themselves. The heights of the bars correspond to the relative frequencies of the intervals, which are calculated by dividing the frequency of each interval by the total number of observations. It shows the distribution of data relative to the total sample size, providing insights into the proportion of observations in each interval.

In summary, while a simple histogram displays the absolute frequency of observations in each interval, a relative frequency histogram displays the proportion or relative frequency of observations in each interval relative to the total sample size.
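The difference in scale can be shown numerically with the phone-call frequencies from part (b). This short sketch divides each frequency by the total of 175 calls; since every bar is rescaled by the same factor, the relative frequency histogram has exactly the same shape as the simple histogram, with only the vertical axis changed.

```python
# Frequencies of phone-call lengths (minutes) from part (b).
classes = ["1-7", "8-14", "15-21", "22-28", "29-35", "36-42", "43-49", "50-56"]
freqs = [45, 32, 34, 22, 16, 12, 9, 5]

total = sum(freqs)                       # 175 calls
rel_freqs = [f / total for f in freqs]   # bar heights of a relative frequency histogram

for label, f, rf in zip(classes, freqs, rel_freqs):
    # Simple histogram height f versus relative frequency histogram height rf.
    print(f"{label:>6}: frequency {f:2d}, relative frequency {rf:.3f}")
```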

(b) To construct the histogram and frequency polygon for the given frequency distribution of phone call lengths, we first need to determine the intervals and their frequencies:

| Length in Minutes | Frequency |
|-------------------|-----------|
| 1–7               | 45        |
| 8–14              | 32        |
| 15–21             | 34        |
| 22–28             | 22        |
| 29–35             | 16        |
| 36–42             | 12        |
| 43–49             | 9         |
| 50–56             | 5         |

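Now, let's construct the histogram and frequency polygon using these data. For the histogram, the horizontal axis is marked with the class boundaries (each class limit extended by half a minute, since lengths are recorded in whole minutes), and adjacent bars with no gaps are drawn with heights equal to the frequencies. For the frequency polygon, each frequency is plotted above its class midpoint, the points are joined with straight lines, and the polygon is closed at zero frequency at the midpoints of the empty classes just below and just above the data (−3 and 60 minutes).

| Length in Minutes | Class Boundaries | Midpoint | Frequency |
|-------------------|------------------|----------|-----------|
| 1–7               | 0.5–7.5          | 4        | 45        |
| 8–14              | 7.5–14.5         | 11       | 32        |
| 15–21             | 14.5–21.5        | 18       | 34        |
| 22–28             | 21.5–28.5        | 25       | 22        |
| 29–35             | 28.5–35.5        | 32       | 16        |
| 36–42             | 35.5–42.5        | 39       | 12        |
| 43–49             | 42.5–49.5        | 46       | 9         |
| 50–56             | 49.5–56.5        | 53       | 5         |

Using these boundaries and midpoints, we can now draw the histogram and the frequency polygon.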
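The plotting coordinates can also be computed directly. This sketch assumes whole-minute class limits, so each boundary lies half a minute beyond a limit; it lists the class boundaries for the histogram bars and the midpoints for the frequency polygon (closed one class width beyond each end), and prints a rough text histogram. A plotting library would draw the same bars and points.

```python
# Class limits (minutes) and frequencies from part (b).
limits = [(1, 7), (8, 14), (15, 21), (22, 28), (29, 35), (36, 42), (43, 49), (50, 56)]
freqs = [45, 32, 34, 22, 16, 12, 9, 5]
width = 7  # every class spans 7 whole minutes

# Histogram: adjacent bars between class boundaries (limits -/+ 0.5).
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

# Frequency polygon: frequencies plotted at class midpoints, closed at zero
# frequency one class width before the first midpoint and after the last.
midpoints = [(lo + hi) / 2 for lo, hi in limits]
polygon = ([(midpoints[0] - width, 0)]
           + list(zip(midpoints, freqs))
           + [(midpoints[-1] + width, 0)])

for (b_lo, b_hi), f in zip(boundaries, freqs):
    print(f"{b_lo:5.1f}-{b_hi:5.1f} | {'#' * f}")  # one '#' per call
print("Polygon points:", polygon)
```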

 

Q. 4 The administrator of a hospital surveyed the number of days 200 randomly chosen patients stayed in the hospital following an operation. The data are:

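| Hospital stay in Days | 1–3 | 4–6 | 7–9 | 10–12 | 13–15 | 16–18 | 19–21 | 22–24 |
|-----------------------|-----|-----|-----|-------|-------|-------|-------|-------|
| Frequency             | 18  | 90  | 44  | 21    | 9     | 9     | 4     | 5     |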

(a) Calculate mean, standard deviation and coefficient of variation.

(b) According to Chebyshev's theorem, how many stays should be between 0 and 17 days? How many are actually in that interval?

(c) Because the distribution is roughly bell-shaped, how many stays can we expect between 0 and 17 days?

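(a) To calculate the mean, standard deviation, and coefficient of variation, we first find the midpoint (x) of each interval and then use these midpoints, weighted by the frequencies, to calculate the mean and standard deviation.

| Hospital stay in Days | Midpoint (x) | Frequency (f) | fx  | fx²  |
|-----------------------|--------------|---------------|-----|------|
| 1–3                   | 2            | 18            | 36  | 72   |
| 4–6                   | 5            | 90            | 450 | 2250 |
| 7–9                   | 8            | 44            | 352 | 2816 |
| 10–12                 | 11           | 21            | 231 | 2541 |
| 13–15                 | 14           | 9             | 126 | 1764 |
| 16–18                 | 17           | 9             | 153 | 2601 |
| 19–21                 | 20           | 4             | 80  | 1600 |
| 22–24                 | 23           | 5             | 115 | 2645 |

Total frequency (Σf) = 200

Total fx (Σfx) = 1543

Total fx² (Σfx²) = 16289

Mean (μ):

\[ \mu = \frac{\sum{fx}}{\sum{f}} = \frac{1543}{200} = 7.715 \]

Standard Deviation (σ):

\[ \sigma = \sqrt{\frac{\sum{(x - \mu)^2f}}{N}} = \sqrt{\frac{\sum{fx^2}}{N} - \mu^2} \]

\[ \sigma = \sqrt{\frac{16289}{200} - 7.715^2} = \sqrt{81.445 - 59.521} = \sqrt{21.924} \approx 4.682 \]

(Because the 200 patients are a sample, dividing by n − 1 = 199 instead gives s ≈ 4.694; the results below are essentially unchanged.)

Coefficient of Variation (CV):

\[ CV = \frac{\sigma}{\mu} \times 100 = \frac{4.682}{7.715} \times 100 \approx 60.7\% \]

(b) According to Chebyshev's theorem, at least \(1 - \frac{1}{{k^2}}\) of the data lie within k standard deviations of the mean. For example, for k = 2, at least \(1 - \frac{1}{2^2} = \frac{3}{4}\) or 75% of the data should lie within 2 standard deviations of the mean.

The interval from 0 to 17 days corresponds to about 2 standard deviations on either side of the mean:

\[ \mu - 2\sigma = 7.715 - 2(4.682) \approx -1.65, \qquad \mu + 2\sigma = 7.715 + 2(4.682) \approx 17.08 \]

Since a hospital stay cannot be negative, μ ± 2σ is effectively 0 to 17 days. By Chebyshev's theorem, at least 75% of the 200 stays, that is, at least 0.75 × 200 = 150 stays, should lie between 0 and 17 days.

How many are actually in the interval: the classes 1–3 through 13–15 contain 18 + 90 + 44 + 21 + 9 = 182 stays, and the 16–18 class (9 stays) straddles 17 days, so between 182 and 191 stays (91% to 95.5%) actually lie between 0 and 17 days, well above the Chebyshev minimum of 150.

(c) Because the distribution is roughly bell-shaped, we can use the empirical rule (also known as the 68-95-99.7 rule) as an approximation. According to this rule, approximately 68% of the data lie within one standard deviation of the mean, approximately 95% within two standard deviations, and approximately 99.7% within three standard deviations.

Since 0 to 17 days is approximately μ ± 2σ, we can expect about 95% of the stays in this interval, that is, about 0.95 × 200 = 190 stays.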
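The grouped-data calculations for all three parts can be checked with a short Python sketch. It uses each class midpoint as the representative value, divides by N for σ as in the formula above (and also reports the n − 1 version), applies Chebyshev's bound with k = 2, counts the stays from the grouped table (which gives a range, because the 16–18 class straddles 17 days), and applies the empirical rule's 95% figure. It is an illustration of the hand calculation, not a different method.

```python
import math

# Hospital-stay classes: (lower limit, upper limit, frequency) in days.
classes = [(1, 3, 18), (4, 6, 90), (7, 9, 44), (10, 12, 21),
           (13, 15, 9), (16, 18, 9), (19, 21, 4), (22, 24, 5)]

# (a) Grouped mean, standard deviation and coefficient of variation,
# using each class midpoint as the representative value.
n = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
mean = sum(f * x for x, f in mids) / n
sigma = math.sqrt(sum(f * (x - mean) ** 2 for x, f in mids) / n)  # divide by N
s = sigma * math.sqrt(n / (n - 1))                                 # n - 1 version
cv = sigma / mean * 100

# (b) Chebyshev with k = 2: at least 1 - 1/k^2 of the stays lie within
# mean +/- 2 sigma (about -1.65 to 17.08, i.e. effectively 0 to 17 days).
k = 2
lower, upper = mean - k * sigma, mean + k * sigma
chebyshev_min = (1 - 1 / k**2) * n

# Actual count between 0 and 17 days: classes ending by day 17 are certain;
# a class that straddles 17 days (16-18) can only give a range.
certain = sum(f for lo, hi, f in classes if hi <= 17)
straddling = sum(f for lo, hi, f in classes if lo <= 17 < hi)

# (c) Empirical rule for a bell-shaped distribution: about 95% within 2 sigma.
empirical = 0.95 * n

print(f"mean = {mean:.3f}, sigma = {sigma:.3f} (s = {s:.3f}), CV = {cv:.1f}%")
print(f"mean +/- 2 sigma = {lower:.2f} to {upper:.2f} days")
print(f"Chebyshev: at least {chebyshev_min:.0f} stays")
print(f"Actual: {certain} to {certain + straddling} stays")
print(f"Empirical rule: about {empirical:.0f} stays")
```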

 

Q. 5 (a) Differentiate between the following: (i) Type I and Type II errors, (ii) two-tailed and one-tailed tests of hypotheses, and (iii) hypothesis testing of means when the population standard deviation is known and when it is not known.

(b) For a sample of 60 women taken from a population of over 5000 enrolled in a weight-reducing program?

(a)  (i) **Type I and Type II errors**:

- **Type I Error**: This occurs when a null hypothesis is incorrectly rejected when it is actually true. In other words, it's a false positive, where the test wrongly concludes that there is a significant effect or difference when there isn't one.

- **Type II Error**: This occurs when a null hypothesis is incorrectly not rejected when it is actually false. In other words, it's a false negative, where the test fails to detect a significant effect or difference when there actually is one.
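The two error types can be seen in a simulation. This is a hypothetical sketch, not part of the assignment: it runs many two-tailed z-tests at α = 0.05 on simulated normal data with an assumed σ = 10 and n = 25, using Python's `statistics.NormalDist` for the critical value. When the null hypothesis (μ = 50) is true, each rejection is a Type I error; when the true mean is 54, each failure to reject is a Type II error. The helper `z_test_rejects` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 with known sigma (hypothetical helper)."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


rng = random.Random(1)  # fixed seed for a reproducible run
trials, n, sigma, mu0 = 10_000, 25, 10, 50

# H0 true (data drawn with mean 50): every rejection is a Type I error.
type_i_rate = sum(
    z_test_rejects([rng.gauss(50, sigma) for _ in range(n)], mu0, sigma)
    for _ in range(trials)
) / trials

# H0 false (data drawn with mean 54): every non-rejection is a Type II error.
type_ii_rate = sum(
    not z_test_rejects([rng.gauss(54, sigma) for _ in range(n)], mu0, sigma)
    for _ in range(trials)
) / trials

print(f"Estimated Type I error rate:  {type_i_rate:.3f} (alpha = 0.05)")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

The Type I error rate comes out close to α = 0.05, while the Type II error rate for this particular alternative is roughly 0.48; it would shrink with a larger sample or a larger true difference.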

(ii) **Two-tailed and one-tailed tests of hypotheses**:

- **Two-tailed Test**: This type of test is used to determine if there is a significant difference or effect in both directions from the hypothesized value. It checks if the observed data falls into either tail of the distribution. It's used when the research question is concerned with whether there is a difference, regardless of the direction.

- **One-tailed Test**: This type of test is used to determine if there is a significant difference or effect in only one direction from the hypothesized value. It checks if the observed data falls into one specific tail of the distribution. It's used when the research question specifies a particular direction of difference.
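The practical difference between the two tests shows up in the critical values and p-values. In this sketch, which uses Python's `statistics.NormalDist` and a hypothetical z statistic of 1.8, the same result is significant at α = 0.05 in a one-tailed (upper-tail) test but not in a two-tailed test, because the two-tailed test splits α between both tails.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal distribution (mean 0, sd 1)
z = 1.8                     # hypothetical observed test statistic
alpha = 0.05

# One-tailed (upper tail) test, H1: mu > mu0.
p_one = 1 - std_normal.cdf(z)
crit_one = std_normal.inv_cdf(1 - alpha)        # about 1.645

# Two-tailed test, H1: mu != mu0; alpha is split between both tails.
p_two = 2 * (1 - std_normal.cdf(abs(z)))
crit_two = std_normal.inv_cdf(1 - alpha / 2)    # about 1.960

print(f"One-tailed: p = {p_one:.4f}, critical z = {crit_one:.3f}, reject: {z > crit_one}")
print(f"Two-tailed: p = {p_two:.4f}, critical z = {crit_two:.3f}, reject: {abs(z) > crit_two}")
```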

(iii) **Hypothesis testing of means when the population standard deviation is known and not known**:

- **Known Population Standard Deviation**: When the population standard deviation (σ) is known, the z-test is typically used for hypothesis testing of means. Because σ is known, the standard error of the mean, σ/√n, is known exactly, and the test statistic z = (x̄ − μ₀)/(σ/√n) follows the standard normal distribution (exactly for a normal population, approximately for large samples).

- **Unknown Population Standard Deviation**: When the population standard deviation (σ) is unknown, the t-test is used for hypothesis testing of means. In this case, the sample standard deviation (s) is used as an estimate of σ, giving the statistic t = (x̄ − μ₀)/(s/√n), which is compared with the t-distribution with n − 1 degrees of freedom instead of the normal distribution.
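The two cases can be compared on the same numbers. In this hypothetical sketch, ten illustrative observations are tested against H₀: μ = 50, once assuming σ = 3 is known (z-test) and once estimating σ by s (t-test). The t critical value 2.262 (two-tailed α = 0.05, 9 degrees of freedom) is taken from a standard t-table, since Python's standard library has no t-distribution.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample (illustrative values only) and null-hypothesis mean.
sample = [52, 49, 55, 51, 53, 48, 54, 50, 56, 52]
mu0 = 50
n = len(sample)
x_bar = statistics.fmean(sample)

# Case 1: population standard deviation known (assumed sigma = 3) -> z-test.
sigma = 3
z = (x_bar - mu0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(0.975)   # two-tailed, alpha = 0.05

# Case 2: sigma unknown -> estimate it with s and use the t-distribution.
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / math.sqrt(n))
t_crit = 2.262                         # t-table value, df = n - 1 = 9

print(f"z = {z:.3f} vs critical {z_crit:.3f}: reject H0? {abs(z) > z_crit}")
print(f"t = {t:.3f} vs critical {t_crit:.3f}: reject H0? {abs(t) > t_crit}")
```

The t critical value (2.262) is larger than the z critical value (1.960): with σ estimated from only ten observations, stronger evidence is required to reject H₀.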

(b) The question as given is incomplete: it states only that a sample of 60 women was taken from a population of over 5000 enrolled in a weight-reducing program, without giving the sample results, the hypothesis to be tested, or the quantity to be found, so it cannot be answered as stated.