The Mechanics of Discrimination Testing

This study examines a snack company's challenge to find a cost-effective malt vinegar supplier without compromising taste.

Prepared by Viktor Plamenov

Abstract

This report illustrates the use of discrimination testing to evaluate sensory differences in chips made with malt vinegar from two suppliers: the current supplier (Supplier A) and a potential new supplier (Supplier B) that offers more competitive prices. The objective is to obtain a potential cost reduction that will not jeopardize a successful product. The experiment involves 50 participants in a blinded A-Not A test, assessing their ability to distinguish between the two tested chips and their confidence in these assessments. This testing method is vital for identifying significant sensory changes and informing product development and quality control in the food and beverage industry. The results provide valuable insights into the potential effects of switching vinegar suppliers on the taste and overall sensory experience of the chips.

1. Case Study Example

This case study explores a strategic sourcing dilemma encountered by a snack food company, centering on the procurement of malt vinegar, a vital component in their popular salt and vinegar chips. Recently, the cost of malt vinegar from their existing supplier, Supplier A, has been on the rise, significantly affecting the company’s overall production expenses. The primary challenge for the company is to identify and transition to an alternative supplier, Supplier B, while ensuring that there is no compromise on the sensory qualities that define their product.

1.1 Research Objective

To evaluate whether malt vinegar from Supplier B can replace the vinegar from Supplier A without significantly altering the product’s sensory characteristics, thereby avoiding the risk of consumers rejecting the product and a loss of sales.

1.2 Study Methodology

Participants: A panel of 50 untrained consumers, all familiar with the product and consuming it at least 4 times a year, evaluated the chips. Each consumer evaluated both products, one after the other, in a randomised and balanced design.
Sample Preparation: Chips were prepared using malt vinegar from each supplier, resulting in two samples: the control (with vinegar from Supplier A), referred to as A in this report, and the prototype (with vinegar from Supplier B), referred to as B. Care was taken to keep all other variables constant.
Test Design: A blinded A-Not A discrimination test with sureness levels was employed. Participants were first presented with the control sample A for familiarization and were instructed to taste it at least 2 times. They were then given the first of the two test samples and asked whether it was A or Not A, and to rate their sureness in that decision (Sure, Not Sure, Guess), based on how easy the judgement was to make. They then repeated the procedure with the second sample.
Statistical Analysis: The data were analyzed to determine the size of the sensory difference between the samples. The analysis includes calculating d-prime and the R-index to assess the sensitivity and reliability of the participants’ responses.
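The randomised and balanced serving order described above can be sketched as follows. This is only an illustration: the function name, the seed, and the rule for an odd panel size are assumptions, not details taken from the study.

```python
import random


def balanced_serving_orders(n_participants, seed=None):
    """Return one serving order per participant: ("A", "B") or ("B", "A").

    Half of the panel receives each order (balanced), and the assignment of
    orders to participants is shuffled (randomised). With an odd panel size,
    the extra participant receives a randomly chosen order.
    """
    rng = random.Random(seed)
    half = n_participants // 2
    orders = [("A", "B")] * half + [("B", "A")] * half
    if n_participants % 2:
        orders.append(rng.choice([("A", "B"), ("B", "A")]))
    rng.shuffle(orders)
    return orders


orders = balanced_serving_orders(50, seed=1)
print(orders.count(("A", "B")), orders.count(("B", "A")))  # 25 25
```

Balancing the order in this way means that any carry-over effect from the first sample affects both products equally.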

2. Outcomes and Implications

Determine whether there is a statistically significant difference in sensory perception between the two chips.
Quantify the size of the sensory difference between the samples (d-prime) and the estimated percentage of people who would be able to perceive the difference (R-index).
Gain insights into the impact of changing suppliers on the product's sensory characteristics.
Understand consumer sensitivity to small differences, which can guide marketing strategies and product labeling.
Inform quality control protocols and shelf-life studies.

Table 2.1: Summary of panelist responses with adapted sureness levels.

3. Method Overview

The A-Not A discrimination test with sureness levels is a sensory evaluation method in which participants are first presented with a reference sample for familiarization. They are then presented with a test sample, asked to indicate whether it is the reference or not, and asked to rate their sureness on a three-level scale: 1 (Sure), 2 (Not Sure), and 3 (Guess). Combining the two answers (A or Not A, with three sureness levels each) yields a 6-point scale. The responses are counted and summarized in a table showing the distribution of scores across the 6 points. Table 2.1 shows the distribution of sureness levels among the consumers in each group, based on whether they detected a difference in the sample they tested. This detailed breakdown helps in understanding not only the participants’ perception of difference but also their level of confidence.
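As a rough sketch, the two-step answer can be mapped to the 6-point scale and tabulated as shown below. The ordering of the six categories (from "A, Sure" = 1 to "Not A, Sure" = 6), the function names, and the example judgements are all assumptions made for illustration; they are not taken from Table 2.1.

```python
def six_point_score(response, sureness):
    """Map (response, sureness) to a score from 1 to 6.

    response: "A" or "Not A"; sureness: 1 (Sure), 2 (Not Sure), 3 (Guess).
    Assumed ordering: 1 = "A, Sure" ... 3 = "A, Guess",
                      4 = "Not A, Guess" ... 6 = "Not A, Sure".
    """
    if sureness not in (1, 2, 3):
        raise ValueError("sureness must be 1, 2 or 3")
    if response == "A":
        return sureness
    if response == "Not A":
        return 7 - sureness
    raise ValueError("response must be 'A' or 'Not A'")


def tabulate(judgements):
    """Count scores per sample.

    judgements: iterable of (sample, response, sureness) tuples.
    Returns {sample: [count for score 1, ..., count for score 6]}.
    """
    table = {}
    for sample, response, sureness in judgements:
        row = table.setdefault(sample, [0] * 6)
        row[six_point_score(response, sureness) - 1] += 1
    return table


# Hypothetical judgements, for illustration only.
example = [
    ("control", "A", 1),
    ("control", "Not A", 3),
    ("prototype", "Not A", 1),
    ("prototype", "A", 2),
]
print(tabulate(example))
# {'control': [1, 0, 0, 1, 0, 0], 'prototype': [0, 1, 0, 0, 0, 1]}
```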

3.1 Null Hypothesis

Consumers are unable to differentiate between the salt and vinegar chips made with malt vinegar from Supplier A and those made with malt vinegar from Supplier B. Any perceived differences are due to random variation and not due to actual sensory differences in the products. The null hypothesis will be tested by analyzing if the number of correct identifications is significantly higher than what would be expected by chance. If the results show that participants can differentiate between the chips significantly more often than would be expected by chance, the null hypothesis can be rejected, indicating a perceptible difference between the products.

Figure 3.1: Definition of d-prime as a difference of two Standard Normal Inverse-CDFs.

3.2 Discrimination Metrics

In this section we turn to the quantitative evaluation of the discrimination test. Focusing on key metrics, we assess the ability of participants to discern between identical and different stimuli. The hit and false rates associated with the current analysis can be found in Table 3.1.

Figure 3.2: Isosensitivity map with estimated hit rates, false rates and d-prime across all sureness levels.

3.2.1 D-prime

D-prime (d’) is a pivotal metric in sensory science, derived from signal detection theory. It quantifies an individual’s ability to distinguish between different stimuli, independent of their response bias. This measure of sensitivity is central to discrimination tests, where it expresses how detectable the difference between products is. A higher d-prime indicates greater sensory sensitivity, and a value of 0 indicates no discrimination. It is defined as the difference between the standard normal quantile function (inverse CDF) evaluated at the hit rate and at the false rate:

$$d' = \Phi^{-1}(H) - \Phi^{-1}(F) \tag{1}$$

where Φ⁻¹(p) = −√2 · erfc⁻¹(2p) for 0 < p < 1, H is the hit rate, F is the false rate, and erfc⁻¹ is the inverse of the complementary error function.
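A minimal sketch of this definition, using the standard normal quantile function from Python's standard library. The function name and the example rates are hypothetical and are not the values from Table 3.1.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(hit_rate, false_rate):
    """d' as the difference of two standard normal quantiles.

    Both rates must lie strictly between 0 and 1, because the quantile
    function is unbounded at 0 and 1.
    """
    if not (0 < hit_rate < 1 and 0 < false_rate < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(false_rate)


print(d_prime(0.5, 0.5))    # 0.0: equal rates mean no discrimination
print(d_prime(0.62, 0.50))  # hypothetical rates giving a small positive d'
```

A hit rate equal to the false rate always gives d' = 0, whatever the response bias, which is what makes the measure bias-independent.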

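Standard Errors: This report uses the approximation for the standard error of d′ originally developed by Gourevitch and Galanter [3]. More formally, with φ denoting the standard normal density, the variance of d′ is estimated as:

$$\sigma^2 = \frac{H(1-H)}{N_2\,\varphi\!\left(\Phi^{-1}(H)\right)^2} + \frac{F(1-F)}{N_1\,\varphi\!\left(\Phi^{-1}(F)\right)^2}$$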

Table 3.1: Proportions of panelist responses.

where N₁ and N₂ represent the number of people who were given the control and the prototype, respectively. Here σ, the square root of the estimated variance, is the standard error of d′. To obtain the 95% confidence interval, we multiply σ by ±1.96 and add the result to d′:

$$d' \pm 1.96\,\sigma = 0.31 \pm 1.96 \times 0.22 \approx [-0.12,\ 0.74]$$

where σ = 0.22. A d-prime value of 0.31 indicates a level of sensory discrimination among participants that is only slightly better than random guessing. Because the 95% confidence interval includes 0, this result is not statistically different from zero, suggesting that the panel had difficulty in consistently differentiating between the prototype and reference samples. Consequently, we can advise the manufacturer that this test found no statistically significant evidence that changing their vinegar supplier leads to a distinguishable difference in the chips; any sensory differences are not consistently perceptible to the average consumer.
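The interval arithmetic can be checked with a short sketch. The variance function follows the textbook Gourevitch and Galanter form and assumes that hits are counted on the prototype and false alarms on the control. That pairing, and the function names, are assumptions rather than details confirmed by this report.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def d_prime_variance(hit_rate, false_rate, n_control, n_prototype):
    """Approximate variance of d' (textbook Gourevitch-Galanter form).

    Assumption: the hit rate comes from the prototype group and the
    false rate from the control group.
    """
    z_h = _STD_NORMAL.inv_cdf(hit_rate)
    z_f = _STD_NORMAL.inv_cdf(false_rate)
    hit_term = hit_rate * (1 - hit_rate) / (n_prototype * _STD_NORMAL.pdf(z_h) ** 2)
    false_term = false_rate * (1 - false_rate) / (n_control * _STD_NORMAL.pdf(z_f) ** 2)
    return hit_term + false_term


def confidence_interval(d, standard_error, level=0.95):
    """Symmetric normal-theory interval d +/- z * SE (z is about 1.96 for 95%)."""
    z = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    return d - z * standard_error, d + z * standard_error


# Values reported in this section: d' = 0.31, sigma = 0.22.
low, high = confidence_interval(0.31, 0.22)
print(round(low, 2), round(high, 2))  # -0.12 0.74
print(round(0.31 / 0.22, 2))          # 1.41, below the 1.96 cut-off
```

The ratio d'/σ of about 1.41 restates the same conclusion as the interval: the estimate lies less than 1.96 standard errors from zero.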

3.2.2 R-Index

The R-index, another significant metric in sensory science, can be used to quantify the difference between products, based either on the results from one individual or on results pooled across a group. It can also be used to assess the reliability of discrimination tests, as it evaluates the consistency of an individual’s sensory responses. In this study it is calculated for the group of subjects, by pooling the results across subjects. A higher R-index indicates a larger difference and represents the estimated percentage of people who detected the difference. In general, the R-index varies from 50% (no discrimination) to 100% (full discrimination). The formula for the R-index associated with an A-Not A test with 3 sureness levels is:

$$R = \frac{a(h+i+j+k+l) + b(i+j+k+l) + c(j+k+l) + d(k+l) + e\,l + \tfrac{1}{2}(ag + bh + ci + dj + ek + fl)}{T_c\,T_p} \tag{2}$$

where Tc = a + b + c + d + e + f is the number of participants shown the control (Supplier A) and Tp = g + h + i + j + k + l is the number shown the prototype (Supplier B) during the testing. In our case, all 50 participants tested both the control and the prototype, i.e., Tc · Tp = 2500. Using the data given in Table 2.1 and substituting the values in (2) yields R = 0.58. An R-index of 0.58 is above the chance level of 0.5 and indicates performance that is slightly better than random chance, but not by a large margin. This suggests that the panel has some ability to distinguish between the two samples, but that this ability is limited.
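A sketch of the usual pairwise-comparison form of the R-index, assuming the six response counts for each sample are listed from the most confident "A" response to the most confident "Not A" response. The function name and that ordering are assumptions; the counts in the usage lines are boundary cases, not the data from Table 2.1.

```python
def r_index(control_counts, prototype_counts):
    """R-index from two rows of category counts.

    Assumption: both rows are ordered from most confident "A" to most
    confident "Not A". Each (control, prototype) pair scores 1 when the
    prototype response lies further towards "Not A", 0.5 when the two
    responses fall in the same category, and 0 otherwise.
    """
    if len(control_counts) != len(prototype_counts):
        raise ValueError("both rows need the same number of categories")
    total_pairs = sum(control_counts) * sum(prototype_counts)
    if total_pairs == 0:
        raise ValueError("both rows need at least one response")
    score = 0.0
    for i, n_control in enumerate(control_counts):
        score += n_control * sum(prototype_counts[i + 1:])  # prototype rated more "Not A"
        score += 0.5 * n_control * prototype_counts[i]      # ties count half
    return score / total_pairs


print(r_index([5, 5, 5, 5, 5, 5], [5, 5, 5, 5, 5, 5]))    # 0.5, identical rows
print(r_index([50, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 50]))  # 1.0, perfect separation
```

The two boundary cases reproduce the range stated above: identical response distributions give 0.5 and complete separation gives 1.0.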

3.2.3 ROC and Isosensitivity Curves

The curve that connects points with the same d’ value, where each point represents a combination of hit rate and false rate, is known as an isosensitivity curve, because every point on it corresponds to the same level of sensitivity. Its functional form is obtained by solving (1) for H at a fixed value of d-prime, which gives:

$$H = \Phi\!\left(d' + \Phi^{-1}(F)\right) \tag{3}$$
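The points on such a curve can be generated by evaluating H over a grid of false rates, as in the sketch below. The function names and the grid spacing are illustrative choices.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def hit_rate_on_curve(false_rate, d_prime):
    """Hit rate on the isosensitivity curve: H = Phi(d' + Phi^-1(F))."""
    return _STD_NORMAL.cdf(d_prime + _STD_NORMAL.inv_cdf(false_rate))


def isosensitivity_curve(d_prime, n_points=9):
    """Return (F, H) pairs at evenly spaced false rates strictly inside (0, 1)."""
    step = 1.0 / (n_points + 1)
    return [
        (i * step, hit_rate_on_curve(i * step, d_prime))
        for i in range(1, n_points + 1)
    ]


for f, h in isosensitivity_curve(0.31):
    print(f"F = {f:.1f}  H = {h:.3f}")
```

With d' = 0 the curve is the diagonal H = F; larger d' values bow the curve towards the top-left corner.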

Figure 3.3: ROC curve for three different d-primes.

3.2.4 AUC

In some cases, representing performance metrics as ratios is more effective than using absolute distances. This concept is exemplified by the Area Under the Curve (AUC). The AUC, the area under the ROC curve, is a strong measure of sensitivity. It can be computed either parametrically (based on a known distribution) or empirically (nonparametrically). A nonparametric methodology for estimating the AUC, using specific approximations, was put forward by Pollack and Hsieh [5]. In contrast to their approach, our example illustrates the parametric approach, using the curve formed by the inverse CDF of the normal distribution. Hence, the AUC is defined as the integral of (3) over F ∈ [0, 1]:

$$\mathrm{AUC}(d') = \int_0^1 \Phi\!\left(d' + \Phi^{-1}(F)\right)\,dF \tag{4}$$

Substituting d′ = 0.31 in (4) yields AUC(d′ = 0.31) ≈ 0.59. The conclusion is similar to the one obtained from the R-index: while the panel’s performance is better than random, an AUC of 0.59 suggests only a weak ability to discriminate between the two samples.
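The value can be reproduced in two ways, sketched below. For the equal-variance normal model the integral has the known closed form Φ(d′/√2), and a simple midpoint-rule integration of the curve gives the same number. The function names and the number of integration steps are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def auc_closed_form(d_prime):
    """AUC under the equal-variance normal model: Phi(d' / sqrt(2))."""
    return _STD_NORMAL.cdf(d_prime / sqrt(2))


def auc_numeric(d_prime, n_steps=10_000):
    """Midpoint-rule integral of H(F) = Phi(d' + Phi^-1(F)) over F in (0, 1)."""
    width = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        false_rate = (i + 0.5) * width  # midpoints stay strictly inside (0, 1)
        total += _STD_NORMAL.cdf(d_prime + _STD_NORMAL.inv_cdf(false_rate))
    return total * width


print(round(auc_closed_form(0.31), 2))  # 0.59
print(round(auc_numeric(0.31), 2))      # 0.59
```

Both routes agree to two decimals, so the closed form is usually the more convenient choice when the normal model is assumed.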

4. Results

This section provides a concise summary of the statistical analysis performed to assess the ability of consumers to differentiate between salt and vinegar chips from two different suppliers.

D-prime (d’): The d-prime value is computed to be 0.31. This indicates some ability to differentiate between the chips, though not with high accuracy.
D-prime (d’) C.I. : The 95% C.I. of d ′ includes zero indicating there is no statistically significant perceived difference in the prototype and reference samples.
R-Index: The R-index is found to be 0.58. This value suggests a performance slightly better than random chance, but it is not significantly higher.
AUC (Area Under the Curve): The AUC is computed to be 0.59. This indicates only a weak ability to discriminate between the products of the two suppliers.

In conclusion, while there is some indication that participants can distinguish between the products made with ingredients from the two suppliers, the level of discrimination is low, suggesting only a marginal difference in sensory perception that might be due to chance.

5. Conclusion and Advice to Management

Based on the statistical analysis, while there is a marginal ability for consumers to distinguish between the chips made with vinegar from Supplier A and Supplier B, this differentiation is not pronounced. Therefore, from a sensory perception standpoint, switching to the cheaper Supplier B could be considered, as the difference in product quality, as perceived by consumers, is minimal. However, it is important to weigh other factors such as brand reputation, customer loyalty, and potential cost savings before making a final decision.

6. References

N. A. Macmillan and C. D. Creelman, Detection Theory: A User’s Guide, Lawrence Erlbaum Associates, 2005.
H. T. Lawless and H. Heymann, Sensory Evaluation of Food: Principles and Practices, Springer, 2010.
H. Stone and J. L. Sidel, Sensory Evaluation Practices, Academic Press, 2012.
H. Lee and D. van Hout, Quantification of Sensory and Food Quality: The R-Index Analysis, Journal of Food Science, Vol. 74, No. 6, 2009.
V. Gourevitch and E. Galanter, A significance test for one parameter isosensitivity functions, Psychometrika, Vol. 32, No. 1, pp. 25–33, 1967.
I. Pollack and R. Hsieh, Sampling variability of the area under the ROC-curve and of d’e, Psychological Bulletin, Vol. 71, No. 3, pp. 161–173, 1969.