DANL 310 - HW 1 - Ben & Jerry’s Ice Cream

Ben & Jerry’s Ice Cream Consumption Across Households

Motivating Question

How do household characteristics influence Ben & Jerry’s consumption, price paid, and coupon usage?

To answer this question, I analyzed a dataset containing Ben & Jerry’s ice cream purchases. The dataset also included information on household demographics, product prices, coupon use, and flavor preferences.

1. Price Distribution by Ice Cream Size

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ice_cream <- read_csv("https://bcdanl.github.io/data/ben-and-jerry-cleaned.csv")
Rows: 21974 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): flavor_descr, size1_descr, region, race
dbl (5): priceper1, household_id, household_income, household_size, couponper1
lgl (8): usecoup, married, hispanic_origin, microwave, dishwasher, sfh, inte...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(ice_cream, aes(x = size1_descr, y = priceper1)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Price Distribution by Ice Cream Size",
       x = "Ice Cream Size",
       y = "Price ($)")

As shown above, larger ice cream sizes tend to have higher prices. This is expected, but prices also vary within each size. Part of that variation may come from price differences across flavors, which could influence household purchasing decisions.
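To put numbers on the within-size variation seen in the boxplot, a per-size summary may help. This is a minimal sketch: `summarise_price_by_size()` is a hypothetical helper name, and the `toy` tibble holds made-up values (including the size labels) that only illustrate the output shape, not figures from the dataset.

```r
library(dplyr)

# Hypothetical helper: median, interquartile range, and count of price per size.
summarise_price_by_size <- function(df) {
  df |>
    group_by(size1_descr) |>
    summarise(
      median_price = median(priceper1, na.rm = TRUE),
      iqr_price    = IQR(priceper1, na.rm = TRUE),
      n            = n(),
      .groups      = "drop"
    )
}

# Made-up rows for illustration only (not taken from the real data).
toy <- tibble(
  size1_descr = c("size A", "size A", "size A", "size B", "size B"),
  priceper1   = c(3.0, 3.5, 4.0, 4.5, 5.5)
)

price_by_size <- summarise_price_by_size(toy)
print(price_by_size)

# On the real data: summarise_price_by_size(ice_cream)
```

A large IQR relative to the gap between the size medians would indicate that something other than size, such as flavor or coupons, accounts for much of the price variation.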

2. Average Price by Flavor

top_10 <- ice_cream |>
  count(flavor_descr, sort = TRUE) |>
  head(10)

ice_cream |>
  filter(flavor_descr %in% top_10$flavor_descr) |>
  group_by(flavor_descr) |>
  summarise(avg_price = mean(priceper1, na.rm = TRUE),
            n = n()) |>
  arrange(desc(avg_price))
# A tibble: 10 × 3
   flavor_descr               avg_price     n
   <chr>                          <dbl> <int>
 1 CHC CHIP C-DH                   3.47  1070
 2 CHERRY GRCA                     3.44  2097
 3 CHC FUDGE BROWNIE               3.34  1235
 4 PHISH FOOD                      3.33   968
 5 HEATH COFFEE CRUNCH             3.32  1070
 6 AMERICONE DREAM                 3.31   865
 7 PB CUP                          3.29   828
 8 NEW YORK SUPER FUDGE CHUNK      3.29   932
 9 KARAMEL SUTRA                   3.28   738
10 CHUNKY MONKEY                   3.26  1064

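Among the ten most purchased flavors, Chocolate Chip Cookie Dough ($3.47) and Cherry Garcia ($3.44) have the highest average prices, while Chunky Monkey ($3.26) has the lowest. The spread is only about $0.21, so flavor appears to have a modest effect on price.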
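Because price also depends on size (Section 1), flavor averages could partly reflect which sizes each flavor is sold in. One way to check is to compare flavors within each size. This is a sketch under that assumption: `avg_price_by_size_and_flavor()` is a hypothetical helper, and the `toy` rows are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: average price per flavor, computed separately for each size.
avg_price_by_size_and_flavor <- function(df) {
  df |>
    group_by(size1_descr, flavor_descr) |>
    summarise(
      avg_price = mean(priceper1, na.rm = TRUE),
      n         = n(),
      .groups   = "drop"
    ) |>
    arrange(size1_descr, desc(avg_price))
}

# Made-up rows for illustration only (not taken from the real data).
toy <- tibble(
  size1_descr  = c("size A", "size A", "size A", "size A", "size B"),
  flavor_descr = c("flavor X", "flavor X", "flavor Y", "flavor Y", "flavor X"),
  priceper1    = c(3.0, 3.2, 3.4, 3.6, 5.0)
)

flavor_within_size <- avg_price_by_size_and_flavor(toy)
print(flavor_within_size)

# On the real data, restricted to the top ten flavors:
# ice_cream |>
#   filter(flavor_descr %in% top_10$flavor_descr) |>
#   avg_price_by_size_and_flavor()
```

If the flavor ranking stays the same within each size, the price differences are less likely to be an artifact of size.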

3. Household Income vs Price Paid (with coupons)

ggplot(ice_cream, aes(x = household_income, y = priceper1, color = usecoup)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(title = "Household Income vs. Price Paid",
       x = "Household Income ($)",
       y = "Price ($)",
       color = "Used Coupon")
`geom_smooth()` using formula = 'y ~ x'

There is little correlation between household income and price paid, although higher-income households tend to pay slightly more on average. Lower-income households appear to use coupons more often, which lowers the amount they spend.
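Both observations are read off the scatterplot, so it may be worth checking them numerically. The sketch below computes the Pearson correlation between income and price and the coupon rate at each income level. The helper names are hypothetical, the `toy` rows are made up, and grouping directly on `household_income` assumes the column takes a modest number of distinct values (otherwise it could be binned first, for example with `cut()`).

```r
library(dplyr)

# Hypothetical helper: Pearson correlation between income and price paid.
income_price_cor <- function(df) {
  cor(df$household_income, df$priceper1, use = "complete.obs")
}

# Hypothetical helper: share of purchases made with a coupon at each income level.
coupon_rate_by_income <- function(df) {
  df |>
    group_by(household_income) |>
    summarise(
      coupon_rate = mean(usecoup, na.rm = TRUE),
      n           = n(),
      .groups     = "drop"
    ) |>
    arrange(household_income)
}

# Made-up rows for illustration only (not taken from the real data).
toy <- tibble(
  household_income = c(30000, 30000, 60000, 60000, 90000, 90000),
  priceper1        = c(3.0, 3.2, 3.2, 3.4, 3.4, 3.6),
  usecoup          = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

r <- income_price_cor(toy)
rates <- coupon_rate_by_income(toy)
print(r)
print(rates)

# On the real data: income_price_cor(ice_cream); coupon_rate_by_income(ice_cream)
```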

4. Coupon Use by Region

ice_cream |>
  group_by(region) |>
  summarise(coupon_rate = mean(usecoup, na.rm = TRUE)) |>
  arrange(desc(coupon_rate))
# A tibble: 4 × 2
  region  coupon_rate
  <chr>         <dbl>
1 East         0.126 
2 West         0.121 
3 Central      0.109 
4 South        0.0797
ggplot(ice_cream, aes(x = region, fill = usecoup)) +
  geom_bar(position = "fill") +
  labs(title = "Proportion of Coupon Use by Region",
       x = "Region",
       y = "Proportion")

Coupon usage is highest in the East (12.6% of purchases) and West (12.1%), followed by Central (10.9%), and lowest in the South (8.0%). This suggests that geography may influence whether customers use coupons, which in turn may affect the average price they pay for Ben & Jerry’s ice cream.
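A chi-squared test of independence is one rough way to ask whether regional differences in coupon use are larger than chance would produce. This is only a sketch: `coupon_region_test()` is a hypothetical helper, the `toy` counts are made up, and the test treats every purchase as independent, which may not hold if the same household appears in many rows.

```r
# Hypothetical helper: chi-squared test on the region-by-coupon contingency table.
coupon_region_test <- function(df) {
  chisq.test(table(df$region, df$usecoup))
}

# Made-up counts for illustration only (not taken from the real data):
# 30 of 100 purchases with a coupon in one region, 10 of 100 in another.
toy <- data.frame(
  region  = rep(c("East", "South"), each = 100),
  usecoup = c(rep(c(TRUE, FALSE), times = c(30, 70)),
              rep(c(TRUE, FALSE), times = c(10, 90)))
)

result <- coupon_region_test(toy)
print(result)

# On the real data: coupon_region_test(ice_cream)
```

A small p-value would be consistent with coupon use differing by region, but it would not say why the regions differ.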

5. Price Paid by Household Size

ggplot(ice_cream, aes(x = factor(household_size), y = priceper1)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Price Paid by Household Size",
       x = "Household Size",
       y = "Price ($)")

Larger households tend to buy Ben & Jerry’s ice cream at a slightly lower price than smaller households. One possible explanation is that larger households buy in bulk or use more coupons to save money. The coupon rates by household size below test the second explanation.

ice_cream |>
  group_by(household_size) |>
  summarise(coupon_rate = mean(usecoup, na.rm = TRUE)) |>
  arrange(household_size)
# A tibble: 9 × 2
  household_size coupon_rate
           <dbl>       <dbl>
1              1      0.112 
2              2      0.105 
3              3      0.103 
4              4      0.123 
5              5      0.0942
6              6      0.0780
7              7      0.0440
8              8      0.0208
9              9      0.0273

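Smaller households actually use coupons more often than larger ones: the coupon rate is roughly 10% to 12% for households of one to four people but falls below 5% for households of seven or more. Coupon use is therefore unlikely to explain the lower prices paid by larger households. Smaller households may rely on coupons because they buy ice cream less often or in smaller quantities, while larger households may buy in bulk and save money without coupons.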
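The coupon rates for the largest households may rest on few purchases, so it can help to show the average price, coupon rate, and number of purchases together. This is a minimal sketch: `summarise_by_household_size()` is a hypothetical helper and the `toy` rows are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: average price, coupon rate, and purchase count per household size.
summarise_by_household_size <- function(df) {
  df |>
    group_by(household_size) |>
    summarise(
      avg_price   = mean(priceper1, na.rm = TRUE),
      coupon_rate = mean(usecoup, na.rm = TRUE),
      n           = n(),
      .groups     = "drop"
    ) |>
    arrange(household_size)
}

# Made-up rows for illustration only (not taken from the real data).
toy <- tibble(
  household_size = c(1, 1, 2, 2, 2),
  priceper1      = c(3.4, 3.6, 3.0, 3.2, 3.4),
  usecoup        = c(TRUE, FALSE, FALSE, FALSE, FALSE)
)

by_size <- summarise_by_household_size(toy)
print(by_size)

# On the real data: summarise_by_household_size(ice_cream)
```

Groups with a small `n` deserve caution, since a handful of purchases can move both the average price and the coupon rate noticeably.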

Conclusion

  • Flavor influences price - among the ten most purchased flavors, Chocolate Chip Cookie Dough and Cherry Garcia have the highest average prices

  • Household characteristics - income and household size are associated with price paid and coupon usage

  • Regional differences - East and West regions have higher coupon usage than the Central and South regions

  • Price paid varies with household size - larger households tend to pay less on average for Ben & Jerry’s ice cream

Analyzing this dataset helps explain why Ben & Jerry’s may see different purchasing patterns across different regions and demographics. This information could be useful in marketing, as Ben & Jerry’s may want to tailor promotions to different income levels, household sizes, or other household characteristics.