1. Eighteen SJSU students were surveyed about their opinions on two teaching evaluations: ratemyprofessor.com and SOTES (Student Opinion of Teaching Effectiveness). They indicated the level of effectiveness on a 7-point scale, with “1” being “not very effective” and “7” being “very effective”. The rating scores are shown below by teaching evaluation.

Observation   SOTES   ratemyprofessor.com
1             4       7
2             6       6
3             5       4
4             6       5
5             7       3
6             5       4
7             7       6
8             4       2
9             5
10            3

A) Which teaching evaluation is more effective? Show all the calculations (1 pt).

B) On which teaching evaluation do students have more diverse opinions? Use all the observations and show all the calculations (3 pts).

2. Ms. Green is responsible for recruiting international students for a summer internship at her company. Her recruiting job is almost done, but she needs to pick one last student from three candidates. They are all very competitive, and it seems that the only way to rank them is by their college GPAs. Ming is from China and his score is 91.8 out of 100. Leif is from Norway and his score is 4.21 out of 5. Jason is from the U.S. and his score is 3.55 out of 4. Luckily, Ms. Green also has the mean and standard deviation of college GPAs from each country (see table). If you were Ms. Green, which candidate would you choose for the summer internship? Show all the calculations (2 pts).

Country   Mean   SD     Candidate’s Score
China     82.7   8.9    91.8 (Ming)
Norway    2.46   1.63   4.21 (Leif)
US        2.24   1.29   3.55 (Jason)

3. Suppose a fast-food restaurant wishes to estimate average sales volume for a new menu item. The restaurant has analyzed the sales of the item at a similar outlet and observed the following results:

X = 715 (mean daily sales)
S = 114 (standard deviation of sample)
n = 36 (sample size)

The restaurant manager wants to know into what range the mean daily sales should fall 95 percent of the time. Perform this calculation (2 pts).

4. A group of marketing researchers is studying expenditure on dining out. They want a 95 percent confidence level (Z) and will accept a magnitude of error (E) of less than $2.75. The estimate of the standard deviation is $15.80, based on their pilot study. What is the calculated sample size if they want to run a survey? Perform this calculation (2 pts).

ch_13_big_data_basics_with_notes.ppt

z_table.pdf


Ch 13

Big Data Basics: Describing

Samples and Populations

Dr. Jing Zhang

BUS 138 Marketing Research

LEARNING OUTCOMES

1. Use basic descriptive statistics to analyze

data and make basic inferences about

population metrics.

2. Distinguish among the concepts of population,

sample, and sampling distributions

3. Explain the central-limit theorem

4. Use confidence intervals to express inferences

about population characteristics.

5. Understand the major issues in specifying

sample size.


Introduction

• All the statistics in this chapter are univariate in

the sense that only one variable is involved

• This chapter provides a good background for

understanding equations related to sample size

requirements


Descriptive Statistics and

Basic Inferences

• Raw data are simply numbers and words – with little

meaning

• The most basic statistical tools for summarizing

information from data include:

• Frequency distributions

• Proportions

• Measures of central tendency and dispersion

• Metrics provide a means of comparison


• Metrics: A summary number that allows analysts to

compare the characteristics of a sample with some

population benchmark, characteristics of another

sample, or some other critical values.

• Inferential statistics: A summary representation of data

from a sample that allows us to understand (i.e., infer

from sample to population) an entire population.

• Two applications of statistics:

• To describe characteristics of the population or sample

and

• To generalize from a sample to a population


Descriptive Statistics and

Basic Inferences (cont’d.)

• Sample statistics: Summary measures about variables

computed using only data taken from a sample.

• English letters to denote sample statistics (e.g., X or

S)

• Population parameters: Summary characteristics of

information describing the properties of a population.

• Greek lowercase letters to denote population

parameters (e.g., μ or σ)


What Are Sample Statistics and

Population Parameters?

Frequency Distributions

• Constructing a frequency table or frequency

distribution is one of the most common means of

summarizing a set of data

• The frequency of a value is the number of times a

particular value of a variable occurs

• A distribution of relative frequency, or a percentage

distribution, is developed by dividing the frequency

of each value by the total number of observations,

and multiplying the result by 100

• Probability is the long-run relative frequency with

which an event will occur
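The frequency and percentage calculations described above can be sketched in a few lines of Python. This is only an illustration: the deposit categories and the ten observations are hypothetical and are not the data behind Exhibits 13.1 to 13.3, and `frequency_table` is a helper name invented for this sketch.

```python
from collections import Counter

# Hypothetical deposit-size categories for ten bank customers (illustrative only).
deposits = [
    "under $3,000", "$3,000-$4,999", "under $3,000", "$5,000-$9,999",
    "$3,000-$4,999", "under $3,000", "$10,000 or more", "$3,000-$4,999",
    "under $3,000", "$5,000-$9,999",
]

def frequency_table(values):
    """Return {value: (frequency, percentage)} for a list of observations."""
    counts = Counter(values)          # frequency: how often each value occurs
    total = len(values)               # total number of observations
    # Percentage = frequency divided by total observations, multiplied by 100.
    return {value: (count, 100 * count / total) for value, count in counts.items()}

table = frequency_table(deposits)
for value, (count, pct) in table.items():
    print(f"{value:<16} {count:>3} {pct:>6.1f}%")
```

The percentages always sum to 100, which is a quick check that no observation was dropped.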


EXHIBIT 13.1 Frequency Distribution of Deposits

EXHIBIT 13.2 Percentage Distribution of Deposits


Relative Frequency

EXHIBIT 13.3 Probability Distribution of Deposits


Long-term relative frequency

Proportions

• A proportion indicates the percentage of

population elements that meet some criteria for

membership in a category.

• May be expressed as a percentage, a fraction, or

a decimal number


Top-Box/Bottom-Box Scores

• A top box score generally refers to the portion of

respondents who choose the most favorable choice in a

multiple-choice question usually dealing with customer

opinions.

• The portion that would highly recommend a business to

others or the portion expressing the highest likelihood of

doing business again

• The logic is that respondents who choose the most

extreme response are really quite unique compared to

the others


Top-Box/Bottom-Box Scores (cont’d.)

[Bar chart: percentage of participants by agreement level (SD, D, N, A, SA) for the statement “I will highly recommend Kindle to my friends.” Vertical axis: %, from 0 to 30.]

What is the top box score?

Top-Box/Bottom-Box Scores (cont’d.)

• Managers should examine the bottom-box score –

the portion of respondents who choose the least

favorable response to some question about

customer opinion

• More diagnostic of customer problems

• Often signals a need for some managerial reaction

• Net Promoter Score (NPS) reveals the frequency of

promoters (top-box) and detractors (bottom-box).
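As a rough sketch, top-box and bottom-box scores are just the percentage of respondents in the most and least favorable categories. The twenty responses below are hypothetical (they are not the values from the Kindle chart), and `box_score` is a helper name made up for this example.

```python
from collections import Counter

# Hypothetical answers to "I will highly recommend Kindle to my friends."
# SD = strongly disagree, D = disagree, N = neutral, A = agree, SA = strongly agree.
responses = [
    "SA", "A", "N", "SA", "D", "A", "SA", "SD", "A", "N",
    "SA", "A", "D", "SA", "N", "A", "SD", "A", "SA", "N",
]

def box_score(answers, category):
    """Percentage of respondents who chose the given category."""
    return 100 * Counter(answers)[category] / len(answers)

top_box = box_score(responses, "SA")     # most favorable choice
bottom_box = box_score(responses, "SD")  # least favorable choice
print(f"Top-box: {top_box:.1f}%  Bottom-box: {bottom_box:.1f}%")
```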


Central Tendency Metrics: The Mean

• Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$


Central Tendency Metrics: The Median

• The midpoint of the distribution, or the 50th

percentile

• The value below which half the values in the sample

fall

• A better measure of central tendency in the

presence of extreme values or outliers


Central Tendency Metrics: The Mode

• The measure of central tendency that merely

identifies the value that occurs most often

• Determined by listing each possible value and noting

the number of times each value occurs

• Used for data that is less than interval, with one

large peak
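The three central tendency metrics can be compared with Python's standard `statistics` module. The spending figures below are hypothetical; the point of the sketch is to show why the median holds up better than the mean when an extreme value is present.

```python
import statistics

# Hypothetical dollar amounts spent by six customers (illustrative only).
spending = [20, 25, 25, 30, 35, 40]

mean = statistics.mean(spending)      # sum of values divided by n
median = statistics.median(spending)  # 50th percentile
mode = statistics.mode(spending)      # value that occurs most often
print(f"mean={mean:.2f}  median={median}  mode={mode}")

# Add one extreme value: the mean jumps, the median barely moves.
with_outlier = spending + [500]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
print(f"with outlier: mean={mean_out:.2f}  median={median_out}")
```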


Dispersion Metrics

• Accurate analysis of data also requires knowing the

tendency of observations to depart from the central

tendency

• Another way to summarize the data is to calculate

the dispersion of the data, or how the observations

vary from the mean


EXHIBIT 13.5 Sales Levels for Two Products with Identical Average Sales

Dispersion Metrics: The Range

• The simplest measure of dispersion – the distance

between the smallest and largest values of a

frequency distribution

• Does not take into account all the observations

• Indicates the extreme values of the distribution

• In a skinny distribution, values are a short distance

from the mean; in a fat distribution values are

spread out (see Exhibit 13.6 on the next slide).


EXHIBIT 13.6 Low Dispersion versus High Dispersion

Dispersion Metrics: Deviation Scores

• A deviation of any observation from the mean can

be calculated by subtracting the mean from that

observation

• In Exhibit 13.5: January Product A: d= 196 – 200 = -4;

Product B: d=150-200 = -50


Dispersion Metrics:

Why Use the Standard Deviation?

• It is perhaps the most valuable index of spread, or

dispersion

• Variance – useful for describing the sample

variability; it equals zero if and only if each and

every observation in the distribution is the same as

the mean


Why Use The Standard Deviation?

(cont’d.)

• Standard deviation

• The square root of the variance for distribution is

called the standard deviation

• It is in the original measurement units (e.g., $)

rather than in squared units (e.g., $^2)

• S is the symbol for the sample standard deviation

Standard Deviation = $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
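A minimal sketch of the calculation, step by step: compute the mean, the deviation scores, the variance, and then its square root. It assumes the sample formula with n − 1 in the denominator, and the eight values are illustrative rather than taken from Exhibit 13.7.

```python
import math
import statistics

# Illustrative number of sales calls per day for eight salespeople.
calls = [4, 3, 2, 5, 3, 3, 1, 5]

n = len(calls)
mean = sum(calls) / n                       # central tendency
deviations = [x - mean for x in calls]      # deviation scores (sum to zero)
variance = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance
std_dev = math.sqrt(variance)               # back in the original units

print(f"mean={mean:.2f}  variance={variance:.4f}  S={std_dev:.4f}")
# statistics.stdev uses the same n - 1 formula, so it serves as a cross-check.
print(f"statistics.stdev: {statistics.stdev(calls):.4f}")
```

Because the deviation scores always sum to zero, they have to be squared before averaging; taking the square root afterwards returns the result to the original measurement units.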


EXHIBIT 13.7

Calculating a Standard Deviation: Number of Sales Calls per

Day for Eight Salespeople

Video: How to Calculate Standard Deviation?


The Normal Distribution

• Normal Distribution

• A symmetrical, bell-shaped distribution (normal

curve) that describes the expected probability

distribution of many chance occurrences.

• IQ scores, SAT scores, shoe size, quarterly

revenue.

• Standardized Normal Distribution

• A purely theoretical probability distribution that

reflects a specific normal curve for the

standardized value, z.


EXHIBIT 13.8 Normal Distribution: Distribution of

Intelligence Quotient (IQ) Scores

• The graph is symmetrical about the mean.

• Curve is always above horizontal axis.

• Mean, mode, and median converge on the

same point.

EXHIBIT 13.8 Normal Distribution: Distribution of Intelligence

Quotient (IQ) Scores

What about 95%?

• The total area under the curve equals 100%.

• 68.3% = ±1 SD of the mean. [xx, xxx]

• 95.4% = ±2 SDs of the mean. [xx, xxx]

• 99.7% = ±3 SDs of the mean. [xx, xxx]
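The areas quoted above, and the answer to “What about 95%?”, can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch on the standardized normal curve (mean 0, SD 1); `area_within` is a helper name introduced here.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Share of the area under the curve within +/- k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {100 * area_within(k):.1f}%")

# For exactly 95%, leave 2.5% in each tail and look up the matching z value.
z_95 = std_normal.inv_cdf(0.975)
print(f"95% of the area lies within +/- {z_95:.2f} SDs")
```

The last line explains why 1.96, rather than 2, is the Z value used for a 95 percent confidence level.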


EXHIBIT 13.9

Standardized Normal Distribution


The Normal Distribution (cont’d)

• Characteristics of a Standardized Normal Distribution

1. It is symmetrical about its mean.

2. The mean identifies the normal curve’s highest

point (the mode) and the vertical line about which

this normal curve is symmetrical.

3. The normal curve has an infinite number of cases

(it is a continuous distribution), and the area under

the curve has a probability density equal to 1.0.

4. The standardized normal distribution has a mean

of 0 and a standard deviation of 1.


Standardized Normal Table: Area under Half of the Normal Curve


EXHIBIT 13.10

Http://www.mathisfun.com/data/standard-normal-distribution-table.html

• The standardized normal distribution is extremely

valuable because we can translate or transform any

normal variable, X, into the standardized value, Z

• This has many pragmatic implications for the

marketing researcher

• A typical standardized normal table allows us to

evaluate the probability of the occurrence of certain

events without any difficulty


The Standardized Normal

Distribution and Z-Scores

Computing Z Scores

• Subtract the mean from the value to be

transformed, and divide by the standard deviation

(all expressed in original units)

• In the formula, note that σ, the population standard

deviation, is used for calculation:

$Z = \frac{X - \mu}{\sigma}$

where μ is the hypothesized or expected value of the

mean
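As an illustration, the Z formula puts scores measured on different scales onto one common scale. The sketch below applies it to the three GPA figures from question 2, treating each country's mean and SD as the population values; `z_score` and the dictionary layout are choices made for this example only.

```python
def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

# (candidate's score, country mean, country SD) from the table in question 2.
candidates = {
    "Ming": (91.8, 82.7, 8.9),
    "Leif": (4.21, 2.46, 1.63),
    "Jason": (3.55, 2.24, 1.29),
}

scores = {name: z_score(*values) for name, values in candidates.items()}
for name, z in scores.items():
    print(f"{name}: Z = {z:.3f}")

# The candidate furthest above his own country's mean, in SD units.
best = max(scores, key=scores.get)
print("Highest standardized score:", best)
```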


Population Distribution and

Sample Distribution


Three Important Distributions


Sampling Distribution (cont’d.)


Sampling Distribution

• Defined as a theoretical probability distribution that

shows the functional relation between the possible

values of some summary characteristic of n cases

drawn at random and the probability associated

with each value over all possible samples of size n

from a particular population.

• In a nutshell, a sampling distribution is a portrayal of the

means of all possible samples of a given size.


Sampling Distribution: Example

• Study the dollar amount that kids in the U.S. spend on their most recent toy.

• Randomly choose a sample of 20 kids and ask how much they spent on their most recent toy.

• Then randomly sample another 20 kids and record the same information.

• Do this a total of 6 times. The results are displayed in the table on the next slide.


Sampling Distribution: Example (Cont’d)

Sample ID   Sample Size   Average of Dollar Amount
#1          20            26.8
#2          20            25.4
#3          20            27.5
#4          20            32.6
#5          20            30.1
#6          20            23.8

• Each sample has its own mean value, and each value is different

• Continue this experiment by selecting and measuring more samples

and observe the pattern of sample means

• This pattern of sample means represents the sampling distribution for

the dollar amount kids spend on toys.


Sampling Distribution: Example (Cont’d)

• What happens to the sampling distribution if we

increase the sample size?

• As the sample size (n) gets larger, the sample means tend to follow a normal probability distribution; they tend to cluster around the true population mean. Hence, the sampling distribution approaches a normal distribution.
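A small simulation can make this pattern visible. The sketch below is built entirely on assumptions: it invents a skewed (exponential) population of toy spending with a mean of $25, draws 2,000 random samples at each of two sample sizes, and compares how tightly the sample means cluster. None of these numbers come from the slides.

```python
import random
import statistics

rng = random.Random(0)      # fixed seed so the run is repeatable
POPULATION_MEAN = 25.0      # hypothetical average dollar amount

def sample_means(n, num_samples=2000):
    """Means of num_samples random samples of size n from a skewed population."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1 / POPULATION_MEAN) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

means_small = sample_means(5)
means_large = sample_means(50)

for label, means in (("n = 5", means_small), ("n = 50", means_large)):
    print(f"{label}: mean of sample means = {statistics.mean(means):.2f}, "
          f"SD of sample means = {statistics.stdev(means):.2f}")
```

With the larger sample size, the sample means stay centered on the population mean but spread out far less, which is the clustering the slide describes.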


Central Limit Theorem

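• As the sample size, n, increases, the distribution of the mean of a random sample taken from practically any population approaches a normal distribution, with mean μ and standard deviation σ/√n.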


EXHIBIT 13.13

The Mean Distribution of Any Distribution Approaches Normal as n

Increases


EXHIBIT 13.13

The Mean Distribution of Any Distribution Approaches Normal as n

Increases (cont’d.)


Estimation of Parameters:

Point Estimate


Estimation of Parameters:

Confidence Intervals


Calculating a Confidence Interval

$\mu = \bar{X} \pm Z_{c.l.} \frac{S}{\sqrt{n}}$

• $\bar{X}$: approximate location (value) of the population mean

• $\pm Z_{c.l.} \frac{S}{\sqrt{n}}$: estimation of the sampling error

Step By Step Calculation of the

Confidence Interval
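The steps can be sketched as follows: compute the standard error S/√n, multiply it by the Z value for the chosen confidence level, then add and subtract the result from the sample mean. The inputs are the figures given in question 3 (X = 715, S = 114, n = 36) with Z = 1.96 for 95 percent confidence; `confidence_interval` is a function name invented for this sketch.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) bounds of the confidence interval for the mean."""
    standard_error = sd / math.sqrt(n)   # step 1: standard error of the mean
    margin = z * standard_error          # step 2: estimated sampling error
    return mean - margin, mean + margin  # step 3: mean plus or minus the margin

lower, upper = confidence_interval(mean=715, sd=114, n=36)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

Here the standard error is 114 / 6 = 19 and the margin is 1.96 × 19 = 37.24, so the interval runs from about 677.76 to 752.24.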


• Three factors required to specify sample size

• 1. The variance, or heterogeneity, of the population

in statistical terms refers to the standard deviation of

the population parameter

• A heterogeneous population has more variance (a

larger standard deviation) which will require a

larger sample.

• A homogeneous population has less variance (a

smaller standard deviation) which permits a

smaller sample.


Factors in Determining Sample Size

for Questions Involving Means

• 2. The magnitude of error, or the confidence

interval, is defined in statistical terms as E

• How precise must the estimate be?

• From a managerial perspective, the importance of

the decision in terms of profitability will influence

the researcher’s specifications of the range of

error

• 3. Confidence level (typically 95 percent)

• How much error will be tolerated?


Factors in Determining Sample Size for

Questions Involving Means (cont’d.)


EXHIBIT 13.16 Statistical Information Needed to Determine Sample Size for

Questions Involving Means

Estimating Sample Size for Questions

Involving Means

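• Estimating sample size: $n = \left(\frac{ZS}{E}\right)^2$, where Z is the standardized value for the confidence level, S is the estimate of the standard deviation, and E is the acceptable magnitude of error.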

Sample Size Example

• Suppose a survey researcher, studying expenditures on lipstick, wishes to have a 95 percent confidence level (Z) and a range of error (E) of less than $2.00. The estimate of the standard deviation is $29.00. What is the calculated sample size?

Sample Size Example (Cont’d)

• Suppose, in the same example as the one before, the range

of error (E) is acceptable at $4.00. Sample size is reduced.

• Doubling the range of acceptable error reduces sample size

requirement dramatically.
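Both versions of the lipstick example can be worked through with the formula n = (ZS/E)², using Z = 1.96 for a 95 percent confidence level. This is a sketch; `sample_size` is a name chosen here, and rounding up to the next whole respondent is an assumption about how the result is reported.

```python
import math

def sample_size(z, sd, error):
    """Sample size for estimating a mean: n = (Z * S / E) ** 2, rounded up."""
    return math.ceil((z * sd / error) ** 2)

n_error_2 = sample_size(z=1.96, sd=29.00, error=2.00)  # (28.42)^2 = 807.70 -> 808
n_error_4 = sample_size(z=1.96, sd=29.00, error=4.00)  # (14.21)^2 = 201.92 -> 202
print(f"E = $2.00: n = {n_error_2}")
print(f"E = $4.00: n = {n_error_4}")
```

Because E is squared in the denominator, doubling the acceptable error cuts the required sample to roughly one quarter of its original size.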


• Sample size may also be determined on the basis of

managerial judgments

• Using a sample size similar to those used in

previous studies.

• Another judgmental factor is the selection of the

appropriate item, question, or characteristics to

be used for the sample size calculations

• Often the item that will produce the largest

sample size will be used to determine the ultimate

sample size


Determining Sample Size on the

Basis of Judgment

• Another consideration stems from most researchers’

need to analyze the various subgroups within the

sample

• Rule of thumb for selecting minimum subgroup

sample size: each subgroup to be separately

analyzed should have a minimum of 100 or more

units in each category of the major breakdowns


Determining Sample Size on the

Basis of Judgment (cont’d.)
