Examination of the data shows that the distribution of percentage difference per gram is symmetric, but this variable is missing for all of the "locally prepared" foods, so it cannot be used for a full investigation of the relationship between distribution area and underreporting of calories. The distribution of percentage difference per item is right-skewed with a median of 24. Because the sign test assumes neither normality nor symmetry, a one-sample sign test of H0: median = 0 is appropriate for determining whether calories are underreported on food labels overall.
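As an illustration of the sign test described above, the following is a minimal sketch in Python using only the standard library. The function name sign_test, the variable pct_diff_per_item, and the data values are hypothetical and are not taken from the study; the one-sided alternative (median > 0, i.e. underreporting) is also an assumption about how the test would be set up.

```python
from math import comb


def sign_test(values, hypothesized_median=0.0):
    """One-sided sign test of H0: median = m against H1: median > m.

    Returns the number of positive signs, the number of negative signs,
    and the exact binomial p-value.
    """
    diffs = [v - hypothesized_median for v in values]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg  # observations equal to the hypothesized median are dropped
    # Under H0 the number of positive signs is Binomial(n, 0.5),
    # so the p-value is P(X >= n_pos).
    p_value = sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n
    return n_pos, n_neg, p_value


# Hypothetical percentage differences per item (NOT the study data).
pct_diff_per_item = [-5.0, 2.0, 8.0, 15.0, 24.0, 30.0, 41.0, 60.0, 95.0, 0.0]

n_pos, n_neg, p_value = sign_test(pct_diff_per_item)
print(f"positive: {n_pos}, negative: {n_neg}, one-sided p-value: {p_value:.4f}")
```

A small p-value here would indicate that positive differences (measured calories above the labeled value) occur more often than chance alone would explain.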
Boxplots suggest that national products are not systematically underreporting caloric content, regional products are underreporting somewhat, and local products are most prone to underreporting. There is also an increase in the variability as the classification changes from local to regional and national. One-way ANOVA can be used to address the relationship between distribution area and underreporting of calories, but because of the skewness and non-constant variance, the data should be transformed first. Adding 100 to each percentage difference before transforming it eliminates the negative values in the data set. A reciprocal transformation, 1/(100 + percentage difference per item), results in an approximately normally distributed variable. A one-way ANOVA with this transformed difference as the dependent variable shows a significant difference in underreporting among the distribution area categories.
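The transformation and ANOVA steps can be sketched as follows. This is an illustrative outline only: the names reciprocal_transform, one_way_anova_f, and pct_diff_by_area are hypothetical, and the data values are invented to show the mechanics, not to reproduce the study's results. The sketch computes only the F statistic and its degrees of freedom; a p-value would then come from the F distribution with those degrees of freedom (for example via a statistics package).

```python
def reciprocal_transform(pct_diff):
    """Shift by 100 to remove negative values, then take the reciprocal."""
    shifted = 100.0 + pct_diff
    if shifted <= 0:
        raise ValueError("percentage difference must be greater than -100")
    return 1.0 / shifted


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percentage differences per item by distribution area
# (NOT the study data).
pct_diff_by_area = {
    "national": [-10.0, -2.0, 3.0, 6.0, 12.0],
    "regional": [5.0, 20.0, 35.0, 40.0, 60.0],
    "local": [15.0, 45.0, 80.0, 120.0, 200.0],
}

# Apply the reciprocal transformation within each group, then run the ANOVA.
transformed = [
    [reciprocal_transform(x) for x in values]
    for values in pct_diff_by_area.values()
]
f_stat, df_between, df_within = one_way_anova_f(transformed)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

Note that the reciprocal reverses the ordering of the values, so groups with larger percentage differences have smaller transformed means.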