
Chapter 12

Descriptive Statistics

 Outline

Concepts

I.  Statistics in Communication Research

descriptive statistics: numbers that characterize some information
inferential statistics: tools that help researchers draw conclusions about the probable populations from which samples did or did not come

    A.  Measures of Central Tendency
         (Arithmetic Mean, Median, and Mode)

measures of central tendency: measures that describe what is going on within sample groups or populations on the average
arithmetic mean: (the number most people call "the average") the sum of a set of scores divided by the number of scores
--unbiased estimator: a sample statistic whose expected value
  equals the population parameter (across repeated samples it
  neither overestimates nor underestimates the parameter on average)
median: a score that appears in the middle of an ordered list of scores
mode: the most commonly occurring score
--bimodal: a distribution that has two
  modes
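
The three measures above can be illustrated with a short Python sketch. The scores are hypothetical values invented for this example, and the choice of Python's standard statistics module is an assumption made here, not something the chapter prescribes.

```python
import statistics

# Hypothetical sample of scores, invented for illustration only.
scores = [2, 3, 3, 5, 7, 8, 14]

mean_score = statistics.mean(scores)      # sum of the scores / number of scores
median_score = statistics.median(scores)  # middle value of the ordered list
mode_score = statistics.mode(scores)      # most commonly occurring score

# multimode returns every value tied for most frequent,
# so a bimodal list of scores yields two modes.
bimodal_modes = statistics.multimode([1, 2, 2, 3, 4, 4, 5])

print(mean_score, median_score, mode_score, bimodal_modes)
```

In this made-up sample the extreme score of 14 pulls the mean above the median, which is one reason the two measures are usually reported together.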

    B.  Measures of Variability or Dispersion

 

         1.   Range

range: the difference between the highest and lowest scores (range is greatly affected by extreme scores)

         2.   Variance

variance: a measure that attempts to summarize the average of squared differences of scores from the mean; symbolized s² for the sample variance and σ² for the population variance (the sample and population versions are computed slightly differently)

         3.   Standard Deviation

standard deviation: a measure that attempts to summarize the average deviation of scores from the mean, estimated by taking the square root of the variance; symbolized s for the sample standard deviation and σ for the population standard deviation (the sample and population versions are computed slightly differently)
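
A minimal sketch of the three measures of variability, again using hypothetical scores and Python's standard statistics module (both are assumptions of this example). It shows the usual computational difference between the sample and population versions: the sample formulas divide the sum of squared deviations by n - 1, the population formulas by n.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Sample versions divide the sum of squared deviations by n - 1.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Population versions divide the sum of squared deviations by n.
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

print(score_range, sample_variance, sample_sd)
print(population_variance, population_sd)
```

Each standard deviation is simply the square root of the matching variance, so the sample value is always a little larger than the population value for the same scores.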

II.  Distributions
    A.  Nonnormal and Skewed Distributions

 

           1.  Types of Skew

--if the skewness coefficient is positive, then the long tail is “above” or to the right of the “ground zero” mean of the distribution

--if the skewness coefficient is negative, then the long tail is “below” or to the left of the “ground zero” mean of the distribution

skew: a measure of how far a distribution departs from being centered (symmetric) around its mean; the sign of the skewness coefficient reveals the side of the distribution on which the longest "tail" lies

           2.  Peakedness of Distributions

kurtosis: a measure of peakedness of a distribution (in a perfect normal distribution, the distribution is as high as three standard deviations is wide)
platykurtic: a very flat distribution
mesokurtic: a distribution that is neither very high nor very low
leptokurtic: a distribution that is tall
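
The outline does not say which formulas produce the skewness and kurtosis coefficients, so the sketch below assumes the simple moment-based versions: the third and fourth central moments divided by the matching power of the second. On that scale a normal distribution has a kurtosis of 3. Many statistics packages instead report a sample-adjusted "excess" kurtosis, for which a normal distribution scores 0, so printed values can differ from these. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """Average of the k-th powers of deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: positive when the long tail is on the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis (not excess): a normal distribution gives 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical data sets, invented for illustration only.
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-v for v in right_tailed]
flat = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive coefficient: long tail to the right
print(skewness(left_tailed))   # negative coefficient: long tail to the left
print(kurtosis(flat))          # below 3, i.e. flatter than a normal curve
```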

    B.  Standard Normal Distribution

the standard normal curve: a probability distribution that tells the expected value that would be obtained by sampling at random

          1. The Gaussian Curve
               --characteristics of the curve:
                  median, mean, and mode are all at the same place on
                  the distribution, marked as 0 and symbolized as mu (μ).
                  Skewness is 0 and kurtosis is 3 since the distribution is
                  perfectly centered and peaked. Tails never touch bottom.
                  A standard deviation equals 1.
          2.  Interpreting Areas Under the Normal Curve
               --approximately 2/3rds (68.2%) of the distribution lies
                  between 1σ below the mean and 1σ above the mean.

--the standard normal curve can help identify the long-run expectations we might have for the samples we take.
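
The area figure can be checked numerically. The sketch below assumes Python 3.8 or later, whose standard library provides statistics.NormalDist; the helper name area_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of the curve lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = area_within(1)
print(round(within_one, 3))  # roughly two thirds of the distribution
```

Calling area_within with 2 or 3 gives the proportions for wider bands around the mean in the same way.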

probability distributions: the theoretical pattern of expected “values of a random variable and of the probabilities of occurrences of these values” (Upton & Cook, 2002, p. 294)

data distributions: data collected from actual samples of events

          3.  Using z Scores
--researchers can use the standard normal curve to make
  decisions by changing their sample data into "z scores"
  (also called standard scores). Z scores permit us to
  represent data scores as units under the standard normal
  curve.

z scores: scores that transform values from other distributions into equivalent units under the standard normal curve, which has a mean of 0 and a standard deviation and variance of 1.
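
A minimal sketch of the transformation: each score has the mean subtracted and is then divided by the standard deviation. The outline does not say whether the sample or the population standard deviation should be used; this example assumes the population version so that the resulting z scores have a variance of exactly 1. The function name and data are hypothetical.

```python
import statistics


def z_scores(values):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, an assumption of this sketch
    return [(v - mean) / sd for v in values]


# Hypothetical raw scores, invented for illustration only.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_scores(raw)

print(standardized)
print(statistics.mean(standardized), statistics.pstdev(standardized))
```

A z score of 2 in this output means the raw score sits two standard deviations above the mean of its own distribution.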

 

III.  Measures of Association

correlation: a measure of the coincidence of variables (the degree to which two variables vary together)
--correlation coefficients can range from -1.00 to 1.00

     A.  Interpreting Correlations
          --direct and inverse relationships
          --a correlation from .80 to 1.00 is a highly dependable
            relationship; from .60 to .79 is a moderate to marked
            relationship; from .40 to .59 is a fair degree of relationship;
            from .20 to .39 is a slight relationship; from .00 to .19
            is a negligible or chance relationship
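
The verbal labels above can be written as a small lookup. The sketch assumes the labels apply to the size of the coefficient regardless of sign, with the sign reported separately as direct or inverse; the outline lists only positive ranges, so that reading is an assumption. The function name is hypothetical.

```python
def describe_correlation(r):
    """Return (strength label, direction) for a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and 1.00")

    size = abs(r)  # assumption: labels describe magnitude, sign gives direction
    if size >= 0.80:
        strength = "highly dependable"
    elif size >= 0.60:
        strength = "moderate to marked"
    elif size >= 0.40:
        strength = "fair"
    elif size >= 0.20:
        strength = "slight"
    else:
        strength = "negligible or chance"

    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"
    return strength, direction


print(describe_correlation(0.85))
print(describe_correlation(-0.65))
```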


direct relationship: a correlation indicating that as one variable increases, the other variable also increases

--in a scatterplot, researchers often add a line of "best fit"
  through the data (sometimes it is called a "line of
  regression")
inverse relationship: a correlation coefficient indicating that an increase in one variable corresponds to a decrease in the other (identified by negative signs before correlation coefficients)

         --calculating proportions of variance explained

coefficient of determination: the percentage of variation in one variable that can be explained by a knowledge of the other variable alone (computed by squaring a correlation coefficient, or by squaring eta if nonlinear patterns are to be examined)
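
For the linear case the computation is a single squaring step. The sketch below uses a hypothetical correlation of -.70; because the coefficient is squared, the sign drops out and an inverse relationship explains as much variation as a direct one of the same size.

```python
def coefficient_of_determination(r):
    """Proportion of variation in one variable explained by the other (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and 1.00")
    return r ** 2


# Hypothetical correlation, invented for illustration only.
explained = coefficient_of_determination(-0.70)
print(f"{explained:.0%} of the variation is explained")
```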

     B.  Major Forms of Correlations
           --though causal relationships should produce high
             correlations, a correlation cannot show causation by itself
             (researchers must use the experimental method or wait for
             the method of history to resolve matters).

 

          1.  Pearson Product Moment Correlation

Pearson product moment correlation: a correlation method suitable for situations in which both the independent and dependent variables (identified as X and Y respectively in most notation) are interval or ratio level measures
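
The outline does not print the computing formula, so the sketch below assumes the standard deviation-score form: the sum of the cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The X and Y values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation between paired X and Y scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired scores")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sum of cross-products and sums of squared deviations from each mean.
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cross_products / math.sqrt(ss_x * ss_y)


# Hypothetical interval-level scores, invented for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

r = pearson_r(x_scores, y_scores)
print(round(r, 2))
```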

          2.  Spearman Rank Order Correlation

Spearman rank order correlation: a correlation method suitable for situations in which both the independent and dependent variables are ordinal measures
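
One common way to compute a rank order correlation, assumed here since the outline gives no formula, is to convert each variable to ranks (tied scores share the average of their rank positions) and then apply the Pearson computation to the ranks. All names and data in the sketch are hypothetical.

```python
import math


def average_ranks(values):
    """Rank scores from 1 (lowest); tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted score is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation, repeated here so the sketch is self-contained."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross_products / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman rank order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical ordinal judgments from two raters, invented for illustration.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 1, 4, 3, 5]

rho = spearman_rho(rater_a, rater_b)
print(round(rho, 2))
```

Because only the ordering matters, any relationship that rises steadily gives a coefficient of 1.00 even when it is not a straight line.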