Course: Business Statistics (1430)
Q. 1 (a) Differentiate between populations
and samples, and describe some advantages
of samples over populations.
(b) Why is a frequency distribution
constructed? Explain the various steps
involved in the construction of a
frequency distribution.
(a) Populations and samples are two concepts commonly used in statistics:
1. **Populations**: A
population refers to the entire group of individuals, events, or objects that
are of interest to a researcher. It encompasses all possible observations that
meet certain criteria. For example, if a researcher is studying the average
height of all adult males in a country, the population would consist of every
adult male in that country.
2. **Samples**: A sample
is a subset of the population that is selected for study. It represents a
smaller, manageable group from the larger population. Using the previous
example, instead of measuring the height of every adult male in the country
(which could be impractical or impossible), the researcher might select a
sample of adult males from different regions to represent the entire
population.
Advantages of samples over populations
include:
- **Practicality**: It is
often more feasible to collect data from a sample rather than the entire
population, especially when the population is large or spread out.
- **Cost-effectiveness**:
Conducting research on a sample can be more cost-effective in terms of
resources, time, and money.
- **Accuracy**: With
appropriate sampling techniques, a well-chosen sample can provide accurate
estimates of population parameters.
- **Manageability**:
Working with a smaller sample size allows for easier data collection, analysis,
and interpretation.
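The accuracy point can be illustrated with a small simulation. This is a minimal sketch, assuming a hypothetical population of simulated heights (the numbers are generated for illustration and are not real measurements); it draws a simple random sample and compares its mean with the population mean.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 100,000 adult males.
# Simulated for illustration only; these are not real measurements.
rng = random.Random(42)
population = [rng.gauss(170, 7) for _ in range(100_000)]

# Simple random sample of 400 individuals, drawn without replacement.
sample = rng.sample(population, 400)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
print(f"Difference:      {abs(sample_mean - population_mean):.2f} cm")
```

Measuring 400 people instead of 100,000 is far cheaper, and with random selection the sample mean typically lands close to the population mean.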
(b) A
frequency distribution is constructed to organize and summarize a set of data
by displaying the number of times each value (or range of values) occurs within
the dataset. It provides insights into the distribution, central tendency, and
variability of the data. The steps involved in constructing a frequency
distribution are as follows:
1. **Determine the range**: Identify
the minimum and maximum values in the dataset to establish the range of values
that will be included in the frequency distribution.
2. **Determine the number of intervals or
classes**: Decide on the number of intervals or classes to use in the
frequency distribution. This depends on the range of values and the desired
level of detail in the distribution.
3. **Calculate the interval width**:
Divide the range of values by the number of intervals to determine the width of
each interval. Ensure that each interval is of equal width.
4. **Create intervals**: Based
on the interval width, create non-overlapping intervals that cover the entire
range of values in the dataset.
5. **Count frequencies**: Count
the number of observations falling within each interval. This involves tallying
or counting the occurrences of values within each interval.
6. **Construct the frequency table**:
Organize the intervals and their corresponding frequencies into a table format,
typically with two columns: one for the intervals and another for the
frequencies.
7. **Optional:
Create a histogram or other graphical representation**: To visually represent
the frequency distribution, create a histogram or other appropriate graphical
display. This helps in understanding the shape and characteristics of the
distribution.
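The steps above can be sketched in Python. This is a minimal illustration, assuming integer data and a class width of `range // classes + 1` (an assumption chosen so that the last class always covers the maximum); the function name and the sample scores are hypothetical.

```python
def frequency_distribution(data, num_classes):
    """Build equal-width classes for integer data and count each class."""
    low, high = min(data), max(data)             # Step 1: range
    width = (high - low) // num_classes + 1      # Steps 2-3: width, rounded up
    table = []
    for i in range(num_classes):                 # Step 4: non-overlapping classes
        lower = low + i * width
        upper = lower + width - 1
        count = sum(lower <= x <= upper for x in data)  # Step 5: tally
        table.append((lower, upper, count))
    return table                                 # Step 6: the frequency table


# Hypothetical data, used only to show the function in action.
scores = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 34, 39]
for lower, upper, count in frequency_distribution(scores, 3):
    print(f"{lower}-{upper}: {count}")
```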
Q. 2 A certain Transportation Commission
is concerned about the speed motorists are
driving on a section of the main highway.
Here are the speeds of 45 motorists:
15
32 45 46 42 39 68 47 18 31 48 49
56 52 39 48 69 61 44 42 38 52 55 58
62 58 48 56 58 48 47 52 37 64 29 55
38 29 62 49 69 18 61 55
Use these data to construct relative
frequency distributions using 5 equal intervals
and 11 equal intervals. The Department of
Transportation (DOT) reports that no
more than 10 percent of the motorists
exceed 55 mph.
(a) Do the motorists follow the DOT's
report about driving pattern? Which
distribution did you use to answer this
part?
(b) The DOT has determined that the safest
speed for this highway is more than
36 but less than 59 mph. What percent of
the motorists drive within this
range? Which distribution did you use to
answer this part?
To
construct relative frequency distributions, we first need to determine the
intervals for the data provided. Then we'll count the number of observations
falling within each interval and calculate the relative frequencies. Finally,
we'll answer the questions based on the constructed distributions.
Given the data:
15,
32, 45, 46, 42, 39, 68, 47, 18, 31, 48, 49, 56, 52, 39, 48, 69, 61, 44, 42, 38,
52, 55, 58, 62, 58, 48, 56, 58, 48, 47, 52, 37, 64, 29, 55, 38, 29, 62, 49, 69,
18, 61, 55
(a) To
determine if the motorists follow the DOT's report about driving pattern (no
more than 10% exceed 55 mph), we'll use the distribution with 5 equal
intervals.
Interval width = (Maximum speed - Minimum speed) / Number of intervals
Interval width = (69 - 15) / 5 = 54 / 5 = 10.8
Rounding up to the nearest convenient value gives an interval width of 11.
Using intervals of width 11:
- 15-25
- 26-36
- 37-47
- 48-58
- 59-69
Now, let's count the number of motorists
in each interval. The listing above contains 44 speeds (one short of the 45
stated in the question), so the counts below total 44:
- 15-25: 3
- 26-36: 4
- 37-47: 12
- 48-58: 17
- 59-69: 8
Now, calculate the relative frequency for
each interval:
- 15-25: 3/44 ≈ 0.068
- 26-36: 4/44 ≈ 0.091
- 37-47: 12/44 ≈ 0.273
- 48-58: 17/44 ≈ 0.386
- 59-69: 8/44 ≈ 0.182
Now, let's check if more than 10% of motorists exceed 55 mph. Every speed in
the interval 59-69 is above 55 mph, and its relative frequency is approximately
0.182, which is already greater than 0.1. The 48-58 interval also contains some
speeds above 55 mph, so the true share is higher still. So, more than 10% of
motorists exceed 55 mph according to this distribution, and the motorists do
not follow the DOT's report.
(b) To determine the percent of motorists driving within the range of more
than 36 but less than 59 mph, we'll use the same distribution, because the
intervals 37-47 and 48-58 together cover exactly this range.
Count the number of motorists in the
intervals 37-47 and 48-58:
- 37-47: 12
- 48-58: 17
Add these counts together to find the total number of motorists driving within
the specified range: 12 + 17 = 29.
Calculate the relative frequency for this
range:
Relative frequency = 29/44 ≈ 0.659
So, approximately 65.9% of the motorists drive within the range of more than
36 but less than 59 mph according to this distribution.
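The tallies can be checked directly from the listed speeds. This is a minimal Python sketch, assuming integer speeds and a class width of `range // classes + 1` (an assumption that gives widths of 11 and 5 for the 5-interval and 11-interval distributions); the function name is illustrative.

```python
# Speeds exactly as listed in the question (44 values appear in the listing).
speeds = [
    15, 32, 45, 46, 42, 39, 68, 47, 18, 31, 48, 49,
    56, 52, 39, 48, 69, 61, 44, 42, 38, 52, 55, 58,
    62, 58, 48, 56, 58, 48, 47, 52, 37, 64, 29, 55,
    38, 29, 62, 49, 69, 18, 61, 55,
]


def relative_frequencies(data, num_classes):
    """Return (lower, upper, count, relative frequency) per equal-width class."""
    low, high = min(data), max(data)
    width = (high - low) // num_classes + 1  # rounded up to cover the maximum
    rows = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        count = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, count, count / len(data)))
    return rows


five = relative_frequencies(speeds, 5)
eleven = relative_frequencies(speeds, 11)

for label, rows in (("5 intervals", five), ("11 intervals", eleven)):
    print(label)
    for lower, upper, count, rel in rows:
        print(f"  {lower}-{upper}: {count:2d}  {rel:.3f}")

# Exact shares computed from the raw values, for comparison with the tables.
share_over_55 = sum(x > 55 for x in speeds) / len(speeds)
share_37_to_58 = sum(36 < x < 59 for x in speeds) / len(speeds)
print(f"Share above 55 mph: {share_over_55:.3f}")
print(f"Share above 36 and below 59 mph: {share_37_to_58:.3f}")
```

Computing the exact shares from the raw values shows how much detail each grouping loses: neither set of intervals has a boundary at 55 mph, so the grouped tables can only bound the share of motorists above that speed.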
Q. 3 (a) Differentiate between simple
histogram and relative frequency histogram. (10)
(b)
Here is a frequency distribution of length of phone calls made by 175 people
during a Labor Day weekend. Construct
histogram and frequency polygon
for these data.
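| Length in Minutes | 1–7 | 8–14 | 15–21 | 22–28 | 29–35 | 36–42 | 43–49 | 50–56 |
|-------------------|-----|------|-------|-------|-------|-------|-------|-------|
| Frequency | 45 | 32 | 34 | 22 | 16 | 12 | 9 | 5 |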
(a) The main difference between a simple
histogram and a relative frequency histogram lies in how the data is
represented:
1. **Simple Histogram**: In a
simple histogram, the vertical axis represents the frequency or count of
observations within each interval or bin, while the horizontal axis represents
the intervals themselves. The heights of the bars correspond to the frequencies
of the intervals. It shows the absolute distribution of data.
2. **Relative Frequency Histogram**: In a
relative frequency histogram, the vertical axis represents the relative
frequency or proportion of observations within each interval or bin, while the
horizontal axis represents the intervals themselves. The heights of the bars
correspond to the relative frequencies of the intervals, which are calculated
by dividing the frequency of each interval by the total number of observations.
It shows the distribution of data relative to the total sample size, providing
insights into the proportion of observations in each interval.
In
summary, while a simple histogram displays the absolute frequency of
observations in each interval, a relative frequency histogram displays the
proportion or relative frequency of observations in each interval relative to
the total sample size.
(b) To construct the histogram and
frequency polygon for the given frequency distribution of phone call lengths,
we first need to determine the intervals and their frequencies:
| Length in Minutes | Frequency |
|-------------------|-----------|
| 1–7 | 45 |
| 8–14 | 32 |
| 15–21 | 34 |
| 22–28 | 22 |
| 29–35 | 16 |
| 36–42 | 12 |
| 43–49 | 9 |
| 50–56 | 5 |
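Now, let's construct the histogram and frequency polygon using these data. The
histogram bars are drawn over the class intervals, with heights equal to the
frequencies. The frequency polygon joins the points plotted at the class
midpoints (the average of each interval's lower and upper limits):
| Midpoint (Minutes) | Frequency |
|--------------------|-----------|
| 4 | 45 |
| 11 | 32 |
| 18 | 34 |
| 25 | 22 |
| 32 | 16 |
| 39 | 12 |
| 46 | 9 |
| 53 | 5 |
Now, we can create the histogram and frequency polygon using this data.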
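The plotting coordinates can be prepared as follows. This is a minimal Python sketch, assuming the common textbook convention of plotting the frequency polygon at class midpoints and closing it at zero one class width beyond each end; it also computes the relative frequencies that a relative frequency histogram would use, and prints a crude text histogram instead of relying on a plotting library.

```python
classes = [(1, 7), (8, 14), (15, 21), (22, 28),
           (29, 35), (36, 42), (43, 49), (50, 56)]
frequencies = [45, 32, 34, 22, 16, 12, 9, 5]

total = sum(frequencies)
midpoints = [(lo + hi) / 2 for lo, hi in classes]
relative = [f / total for f in frequencies]  # heights for a relative frequency histogram

# Frequency polygon: (midpoint, frequency) points, closed at zero
# one class width before the first and after the last midpoint.
width = classes[1][0] - classes[0][0]
polygon = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

# Crude text histogram: one '#' per two calls.
for (lo, hi), mid, f, r in zip(classes, midpoints, frequencies, relative):
    bar = "#" * round(f / 2)
    print(f"{lo:2d}-{hi:2d}  mid={mid:4.1f}  f={f:2d}  rel={r:.3f}  {bar}")
```

The simple and relative frequency histograms have the same shape; only the scale of the vertical axis differs.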
Q. 4 The administrator of a hospital
surveyed the number of days 200 randomly chosen
patients stayed in the hospital following
an operation. The data are:
| Hospital stay in Days | 1–3 | 4–6 | 7–9 | 10–12 | 13–15 | 16–18 | 19–21 | 22–24 |
|-----------------------|-----|-----|-----|-------|-------|-------|-------|-------|
| Frequency | 18 | 90 | 44 | 21 | 9 | 9 | 4 | 5 |
(a) Calculate mean, standard deviation and
coefficient of variation.
(b) According to Chebyshev's theorem, how
many stays should be between 0 and 17 days? How many are actually in that
interval?
(c) Because the distribution is roughly
bell-shaped, how many stays can we
expect between 0 and 17 days?
(a) To
calculate the mean, standard deviation, and coefficient of variation, we first
need to find the midpoint of each interval and then use these midpoints to
calculate the mean and standard deviation.
| Hospital stay in Days | Midpoint (x) | Frequency (f) | fx |
|-----------------------|--------------|---------------|----|
| 1–3 | 2 | 18 | 36 |
| 4–6 | 5 | 90 | 450 |
| 7–9 | 8 | 44 | 352 |
| 10–12 | 11 | 21 | 231 |
| 13–15 | 14 | 9 | 126 |
| 16–18 | 17 | 9 | 153 |
| 19–21 | 20 | 4 | 80 |
| 22–24 | 23 | 5 | 115 |
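Total frequency (Σf) = 200
Total fx (Σfx) = 36 + 450 + 352 + 231 + 126 + 153 + 80 + 115 = 1543
Mean (μ):
\[ \mu = \frac{\sum{fx}}{\sum{f}} = \frac{1543}{200} = 7.715 \]
Standard Deviation (σ):
\[ \sigma = \sqrt{\frac{\sum{(x - \mu)^2f}}{N}} = \sqrt{\frac{\sum{(x - 7.715)^2f}}{200}} = \sqrt{\frac{4384.755}{200}} \approx 4.68 \]
Coefficient of Variation (CV):
\[ CV = \frac{\sigma}{\mu} \times 100 = \frac{4.68}{7.715} \times 100 \approx 60.7\% \]
(b) According to Chebyshev's theorem, at least \(1 - \frac{1}{k^2}\) of the data
lie within k standard deviations of the mean. For k = 2, at least
\(1 - \frac{1}{2^2} = \frac{3}{4}\) or 75% of the data should lie within 2
standard deviations of the mean.
The interval between 0 and 17 days corresponds to the mean plus or minus 2
standard deviations:
\[ 7.715 \pm 2 \times 4.68 = 7.715 \pm 9.36 \]
This runs from about -1.65 to 17.08 days, which is 0 to 17 days in practice
because a stay cannot be negative. Chebyshev's theorem therefore says that at
least 75% of the 200 stays, that is at least 150 stays, should fall in this
interval. In the data, the classes 1–3 through 13–15 contain
18 + 90 + 44 + 21 + 9 = 182 stays, and the 16–18 class adds up to 9 more, so
between 182 and 191 stays are actually in the interval, comfortably above the
Chebyshev minimum.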
(c)
Since the distribution is roughly bell-shaped and follows a normal
distribution, we can use the empirical rule (also known as the 68-95-99.7 rule)
to estimate the percentage of data within certain ranges. According to this
rule, approximately 68% of the data lies within one standard deviation of the mean,
approximately 95% lies within two standard deviations, and approximately 99.7%
lies within three standard deviations.
So, between 0 and 17 days (roughly two standard deviations on either side of
the mean), we can expect approximately 95% of the stays, or about 190 of the
200 stays.
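The grouped-data calculations can be verified with a short script. This is a minimal Python sketch, assuming the class midpoints stand in for the individual stays and using the population formula (dividing by N) shown above; the variable names are illustrative.

```python
import math

midpoints = [2, 5, 8, 11, 14, 17, 20, 23]   # class midpoints (x)
frequencies = [18, 90, 44, 21, 9, 9, 4, 5]  # class frequencies (f)

n = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
mean = sum_fx / n

# Population variance of the grouped data: sum of f * (x - mean)^2, divided by N.
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
sd = math.sqrt(variance)
cv = sd / mean * 100

k = 2
low, high = mean - k * sd, mean + k * sd
chebyshev_min = (1 - 1 / k**2) * n  # at least this many stays within k sd
empirical = 0.95 * n                # bell-shaped expectation within 2 sd

print(f"N = {n}, sum fx = {sum_fx}, mean = {mean:.3f}")
print(f"sd = {sd:.3f}, CV = {cv:.1f}%")
print(f"mean +/- 2 sd: {low:.2f} to {high:.2f}")
print(f"Chebyshev minimum: {chebyshev_min:.0f}, empirical rule: {empirical:.0f}")
```

Dividing by N - 1 instead of N would give the sample standard deviation, which is slightly larger; the difference is small for N = 200.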
Q. 5 (a) Differentiate between the
following: (i) Type I and Type II errors, (ii) Two-tailed and one-tailed tests
of hypotheses and (iii) Hypothesis testing of
means when the population standard deviation
is known and not known.
(b) For a sample of 60 women taken from
population of over 5000 enrolled in a weight-reducing program?
(a)
(i) **Type I and Type II errors**:
- **Type I Error**: This
occurs when a null hypothesis is incorrectly rejected when it is actually true.
In other words, it's a false positive, where the test wrongly concludes that
there is a significant effect or difference when there isn't one.
- **Type II Error**: This
occurs when a null hypothesis is incorrectly not rejected when it is actually
false. In other words, it's a false negative, where the test fails to detect a
significant effect or difference when there actually is one.
(ii) **Two-tailed and one-tailed tests of
hypotheses**:
- **Two-tailed Test**: This
type of test is used to determine if there is a significant difference or
effect in both directions from the hypothesized value. It checks if the
observed data falls into either tail of the distribution. It's used when the
research question is concerned with whether there is a difference, regardless
of the direction.
- **One-tailed Test**: This
type of test is used to determine if there is a significant difference or
effect in only one direction from the hypothesized value. It checks if the
observed data falls into one specific tail of the distribution. It's used when
the research question specifies a particular direction of difference.
(iii) **Hypothesis testing of means when
the population standard deviation is known and not known**:
- **Known Population Standard Deviation**: When
the population standard deviation (σ) is known, the z-test is typically used
for hypothesis testing of means. This is because the population standard
deviation allows for accurate calculation of the standard error of the mean,
which is necessary for calculating the test statistic.
- **Unknown Population Standard
Deviation**: When the population standard deviation (σ) is
unknown, the t-test is used for hypothesis testing of means. In this case, the
sample standard deviation (s) is used as an estimate of the population standard
deviation, and the t-distribution is used instead of the normal distribution.
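The difference between one-tailed and two-tailed tests can be seen in a small z-test calculation. This is a minimal Python sketch, assuming hypothetical figures (the hypothesized mean, standard deviation, sample size and sample mean below are invented for illustration and do not come from the assignment) and a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical figures, for illustration only.
mu0 = 50.0          # hypothesized population mean (H0: mu = mu0)
sigma = 8.0         # known population standard deviation
n = 64              # sample size
sample_mean = 52.5  # observed sample mean
alpha = 0.05        # significance level = probability of a Type I error

standard_error = sigma / sqrt(n)
z = (sample_mean - mu0) / standard_error

# One-tailed test (H1: mu > mu0): area in the upper tail only.
p_one_tailed = 1 - NormalDist().cdf(z)
# Two-tailed test (H1: mu != mu0): area in both tails.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}")
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

When σ is unknown, the same statistic is computed with the sample standard deviation s in place of σ and compared with a t-distribution with n - 1 degrees of freedom. Python's standard library has no t-distribution, so that case would need a statistical table or an external package.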
(b) The question as given is incomplete: it describes a sample of 60 women
from a population of over 5000 enrolled in a weight-reducing program but does
not state what is to be tested or estimated, so it cannot be answered without
the remaining information.