Lexicon

Below are numerous terms related to statistics and psychometrics that are commonly used in the reviews of scales in the database. The wording of these definitions and descriptions has been deliberately kept simple and refers primarily to the way the terms are used in the reviews. They are defined as they are used in marketing science as well as in other disciplines that employ psychometrics in their research but, keep in mind, differences in usage do exist. Where descriptions can become highly mathematical, they have been simplified here for the more casual reader. Although the phrasing of these definitions is original, it is heavily influenced by Netemeyer, Bearden, and Sharma (2003) as well as Nunnally and Bernstein (1994).

  • Alpha: a statistic for measuring reliability that quantifies the internal consistency of a set of items. The higher the alpha, the greater the internal consistency, with a value above .80 usually considered sufficient (e.g., Netemeyer, Bearden, and Sharma 2003, p. 59). The statistic is sometimes called Cronbach’s alpha (after Lee Cronbach, who developed it) or coefficient alpha.
  • Average variance extracted (AVE): a statistic that estimates the amount of variance in a set of items relative to the amount of measurement error. There is disagreement about whether it is a measure of internal consistency or of convergent validity; nonetheless, the higher the statistic the better, with .50 considered a minimal amount (Fornell and Larcker 1981).
  • Bi-polar adjective: see Semantic Differential.
  • Confirmatory Factor Analysis (CFA): a form of factor analysis where the number of factors, their composition, and even their inter-relationships can be hypothesized beforehand and then tested. While the technique could be used with one set of items intended to measure one construct, it has more meaning and value when multiple sets of items expected to measure multiple constructs are examined simultaneously. It is useful in examining dimensionality and validity.
  • Construct: the latent version of a variable that cannot be directly observed or measured. It may also refer to theoretical variables that are hypothesized (“constructed”) by scientists. Thus, a scale is a way to empirically measure something that cannot be measured more directly. In the social sciences, most constructs can be measured a variety of ways and empirical measures (such as scales) tend to evolve over time as theories change and measurement practices become more elaborate.
  • Exploratory Factor Analysis (EFA): a statistical routine in which the relationships among a set of items are modeled mathematically so as to account for the most variance, rather than on the basis of theory. Items are free to load on the dimension with which they are most related. It is up to the researcher to examine the results, determine which items have loaded strongest on a factor, and decide what that factor (construct) appears to represent.
  • Formative scale: a summated measure in which the items “form” the construct. This is in contrast to the items “reflecting” the construct as in reflective scales (see below). In a formative scale, each item makes an important contribution to determining the construct, and deleting any of them would change the construct. These are not the kind reviewed for the database, though some have slipped in from the times when our field was less knowledgeable about the difference.
  • Instrument: In the context of measurement, the term is sometimes used with respect to a single multi-item scale. Other times it is used with reference to multiple scales that are used together to measure several related constructs.
  • Item: the smallest unit of a measurement scale, such as a survey question. While items may be used individually to measure a construct, the type of scales reviewed in the database are composed of multiple items. The items may be sentences (as in the case of Likert-type scales) or bi-polar adjectives (as with semantic differentials).
  • Likert-type scale: a form of measurement in which statements about beliefs, behaviors, or characteristics are responded to using some form of verbal agreement. The overwhelmingly most popular type in the database reviews uses either a 5- or 7-point response format with strongly agree and strongly disagree as the extreme verbal anchors. Common variations are statements and responses about the frequency (never/very often) of something, how much something occurs (a lot/a little), or how accurate a statement is (not at all true/very true). If desired, each scale point can have a verbal label rather than just the end points.
  • Multi-item scale: A type of measurement scale in which a person’s score is based upon some combination of the scores on more than one item. This is the only kind reviewed in the database, with the overwhelming majority having at least three items.
  • Psychometric quality: a phrase used to refer to how “good” a scale is based on its levels of reliability, validity, and unidimensionality.
  • Reflective scale: a type of summated measure in which the items are representative of innumerable items that could be used to reflect the meaning of the construct. Since the items are “caused” by the same construct, they should be highly correlated. Further, deleting any one item should not change what is being measured. Reviews of this type of scale compose the database.
  • Reliability: the quality of a scale such that multiple instances of measuring the same thing produce the same result. There are several forms of reliability, but the form used most frequently in marketing research is Cronbach’s alpha (see Alpha). A less frequently used form is temporal stability (see below).
  • Response format: the verbal and numeric anchors for the points on the scale. Points refer to the number of potential responses for an item. The overwhelmingly most used points in the domain of scales reviewed are either 1 to 5 or 1 to 7. See Likert-type scales (above) and semantic differentials (below) with regard to verbal anchors.
  • Reverse-scored: the numeric score for an item as given by a respondent is re-coded by the analyst in order to have the opposite meaning.  This is done so that scale scores are calculated based on items that point in the same direction rather than offsetting each other.  For example, on a 1 to 7 scale, 1 would become 7, 2 would become 6, 3 would become 5, etc.
  • Scale: it can refer to one measurement item upon which a range of responses (and scores) could result from a variety of people. In the reviews, the term is used primarily to refer to a set of items intentionally used together and on which a person would receive one score intended to represent the measurement of one construct.
  • Semantic differential: a type of scaled measure in which the items are terms or phrases with opposite meanings, e.g., good/bad, attractive/ugly, pleasant/unpleasant. Typically, these terms are adjectives and in those cases the scales may be referred to as bi-polar adjectives. This type of scale is good to use when something is being described or evaluated such as a product, a business, or a person. A seven-point response format is most typical for these scales and labels are rarely used on anything other than the end points.
  • Summated scale: when multiple items are used as a set to represent a construct, scores can be produced by adding up a person’s score on each item. While totals can be used as scores, many times it is more meaningful to calculate the mean of the scores.
  • Structural Equation Modeling (SEM): a statistical routine whereby the relationships among a set of latent constructs are hypothesized based on theory and the fit of the arrangement is tested using empirical data.  Scale scores are used in many cases to represent the latent constructs in the model.
  • Temporal stability: a form of scale reliability that assesses the degree to which a measure produces the same score for the same construct over some period of time. It is typically measured via the correlation between two scores from two time periods, thus, is referred to by many researchers as test-retest correlation.
  • Unidimensionality: a quality of a set of measurement items such that they all measure the same construct even if it is not clear what that construct is. Evidence for unidimensionality is typically provided via some form of factor analysis to show that the items load “high” on the same, single factor with “low” and/or insignificant loadings on any other factors.
  • Unipolar: a type of measurement scale where one term or possibly a brief phrase is used instead of a statement, a question, or bi-polar anchors. Respondents are asked to use the unipolar term to describe a focal stimulus (person, place, activity, or object). The response might be of the Likert-type (agree/disagree) allowing a person to express the level to which the unipolar item accurately describes the stimulus.
  • Validity: the quality of a scale such that it measures the latent variable it is intended to measure. There are several types of validity, with the most frequently mentioned in the reviews being convergent and discriminant validity. Other types that are sometimes mentioned are face, content, nomological, predictive, and known-group. Many times the phrase “construct validity” is used broadly to refer to all of these or some subset (such as convergent and discriminant).
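
Several of the terms above (alpha, reverse-scoring, summated scale) can be made concrete with a small computation. The sketch below, which assumes NumPy and uses entirely hypothetical response data, reverse-scores a negatively worded item, computes coefficient alpha, and produces summated (mean) scale scores; the data and function names are illustrative, not drawn from any review in the database.

```python
import numpy as np

# Hypothetical responses from 6 people to a 4-item, 5-point Likert-type scale.
# Item 4 is negatively worded, so it must be reverse-scored before summing.
raw = np.array([
    [5, 4, 5, 1],
    [4, 4, 4, 2],
    [2, 1, 2, 4],
    [5, 5, 4, 1],
    [3, 3, 3, 3],
    [4, 5, 4, 2],
], dtype=float)

def reverse_score(item, low=1, high=5):
    """Re-code an item so that, on a 1-to-5 scale, 1 becomes 5, 2 becomes 4, etc."""
    return (low + high) - item

items = raw.copy()
items[:, 3] = reverse_score(items[:, 3])

def cronbach_alpha(x):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

alpha = cronbach_alpha(items)          # internal consistency of the item set
summated = items.mean(axis=1)          # mean rather than total, as noted above
```

With these made-up data the items are highly intercorrelated, so alpha comes out well above the .80 benchmark; real data would, of course, vary.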
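
Likewise, AVE and temporal stability reduce to short calculations. The sketch below, again assuming NumPy and hypothetical numbers, computes AVE from standardized factor loadings (where each item's error variance is 1 minus its squared loading) and test-retest correlation from two sets of scale scores.

```python
import numpy as np

# Hypothetical standardized loadings for a 4-item reflective scale, e.g., from a CFA.
loadings = np.array([0.82, 0.75, 0.68, 0.79])

def average_variance_extracted(lam):
    """AVE: variance captured by the construct relative to measurement error.
    With standardized loadings, each item's error variance is 1 - lam**2,
    so AVE reduces to the mean squared loading."""
    squared = lam ** 2
    return squared.sum() / (squared.sum() + (1 - squared).sum())

ave = average_variance_extracted(loadings)   # compare against the .50 benchmark

# Temporal stability: correlate the same people's scale scores from two periods.
time1 = np.array([4.2, 3.8, 2.1, 4.6, 3.0])
time2 = np.array([4.0, 3.6, 2.4, 4.4, 3.2])
test_retest = np.corrcoef(time1, time2)[0, 1]
```

Here the AVE exceeds .50 and the test-retest correlation is strongly positive, which is the pattern a reviewer would count as favorable evidence.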

References

Fornell, Claes and David F. Larcker (1981), “Evaluating Structural Equation Models with Unobservable Variables and Measurement Error,” Journal of Marketing Research, 18 (February), 39-50.

Netemeyer, Richard G., William O. Bearden, and Subhash Sharma (2003), Scaling Procedures: Issues and Applications, Newbury Park, CA: Sage Publications, Inc.

Nunnally, Jum C. and Ira H. Bernstein (1994), Psychometric Theory, New York: McGraw-Hill.