Exercise Patterns Across Income and Education Levels in the United States

DANL 310: Data Visualization and Presentation

Author: Ann Brennan

Affiliation: SUNY Geneseo

Published: May 14, 2026

Project Overview

Physical activity is an important part of overall health. However, not everyone exercises the same amount, and the differences may reflect social and economic factors rather than personal preference alone. Income and education can affect an individual’s access to the time, resources, and opportunities that physical activity requires, which may contribute to differences in exercise rates across groups. This project examines how exercise participation varies across income and education levels in the United States. The main question is: Are individuals with higher income and education more likely to engage in physical activity?

Data

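library(tidyverse)  # dplyr verbs and ggplot2
library(skimr)      # skim()
library(rmarkdown)  # paged_table()

brfss <- read.csv('C:/Users/15856/Downloads/Behavioral_Risk_Factor_Surveillance_System_(BRFSS)_Prevalence_Data_(2011_to_present)_20260428.csv')
paged_table(brfss)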

The dataset used in this project is from the Behavioral Risk Factor Surveillance System (BRFSS), a national survey conducted by the Centers for Disease Control and Prevention (CDC). The survey collects self-reported information from adults in all 50 states on health behaviors and related factors. The most recent survey year in the file is 2024, but the cleaning code does not filter by year, so the summaries in this report pool survey years 2011 through 2024. The unit of observation is a group defined by state, survey year, and a demographic category (income or education level), not an individual respondent. This analysis focuses on three main variables: exercise participation, income level, and education level. Exercise participation is measured as the percentage of individuals within each group who engaged in physical activity in the past 30 days. Income and education are reported in categories. Although the data is self-reported, it still offers useful insights into health behavior across the United States.
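To make the unit of observation concrete, the sketch below builds a tiny data frame in roughly the shape of the BRFSS prevalence file. The column names follow the summary table shown later in this report; every value is an invented placeholder, not a real BRFSS figure, and the object names `toy_brfss` and `toy_2024` are hypothetical.

```r
# Toy rows shaped like the BRFSS prevalence data (all values are invented placeholders)
toy_brfss <- data.frame(
  Year = c(2023, 2024, 2024, 2024),
  Locationabbr = c("NY", "NY", "NY", "NY"),
  Break_Out_Category = c("Household Income", "Household Income",
                         "Education Attained", "Education Attained"),
  Break_Out = c("$200,000+", "$200,000+", "College graduate", "College graduate"),
  Response = c("Yes", "Yes", "Yes", "No"),
  Sample_Size = c("1,200", "1,350", "2,900", "410"),
  Data_value = c(90, 91, 88, 12),
  stringsAsFactors = FALSE
)

# Each row is one state-year-group-response cell, not one survey respondent
str(toy_brfss)

# Restricting to a single survey year is a one-line subset
toy_2024 <- toy_brfss[toy_brfss$Year == 2024, ]
nrow(toy_2024)
```

A subset like this is one way the analysis could be limited to a single year if pooled years are not wanted.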

Data Preparation and Transformation

To prepare for the analysis, the dataset was first filtered to keep only the "Yes" rows, which report the percentage of respondents in each group who exercised in the past 30 days. It was also filtered to the relevant demographic categories, household income and education attained. Exercise rates and sample sizes were converted to numeric format, and observations with missing values were removed. The data was then grouped by income and education categories. A weighted average of exercise participation was calculated for each category using the sample size as weights, so that rows with larger sample sizes have a greater influence on the results.
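The weighting step can be illustrated with a small worked example. The rates and sample sizes below are made up for illustration and do not come from BRFSS; the point is only to show how a sample-size-weighted mean differs from a simple mean.

```r
# Made-up rates (%) and sample sizes for three hypothetical rows
rates <- c(60, 70, 80)
sample_sizes <- c("1,000", "3,000", "6,000")  # stored as text with thousands separators

# Strip commas before converting, mirroring the gsub() step in the cleaning code
n <- as.numeric(gsub(",", "", sample_sizes))

# Unweighted mean treats every row equally
unweighted <- mean(rates)

# Weighted mean is sum(rate * n) / sum(n), so larger samples count for more
weighted <- weighted.mean(rates, n)
by_hand <- sum(rates * n) / sum(n)

unweighted  # 70
weighted    # 75
```

Because the largest sample also has the highest rate in this toy example, the weighted mean (75) sits above the unweighted mean (70).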

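# Keep the "Yes" rows (share reporting exercise) for the two breakouts of interest
brfss_clean <- brfss |>
  filter(Response == "Yes") |>
  filter(Break_Out_Category %in% c("Household Income", "Education Attained"))

# Convert to numeric first so values that fail to parse are caught by the NA filter
brfss_clean <- brfss_clean |>
  mutate(
    Data_value = as.numeric(Data_value),
    Sample_Size = as.numeric(gsub(",", "", Sample_Size))  # "1,234" -> 1234
  )

# Drop rows with a missing rate or sample size
brfss_clean <- brfss_clean |>
  filter(!is.na(Data_value), !is.na(Sample_Size))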

income_df <- brfss_clean |>
  filter(Break_Out_Category == "Household Income") |>
  group_by(Break_Out) |>
  summarize(
    exercise_rate = weighted.mean(Data_value, Sample_Size, na.rm = TRUE)
  ) |>
  rename(income = Break_Out)

income_df$income <- factor(income_df$income, levels = c(
  "Less than $15,000",
  "$15,000-$24,999",
  "$25,000-$34,999",
  "$35,000-$49,999",
  "$50,000-$99,999",
  "$100,000-$199,999",
  "$200,000+"
))

# Labels not listed in the levels above become NA in factor(); drop those rows
income_df <- income_df |>
  filter(!is.na(income))

education_df <- brfss_clean |>
  filter(Break_Out_Category == "Education Attained") |>
  group_by(Break_Out) |>
  summarize(
    exercise_rate = weighted.mean(Data_value, Sample_Size, na.rm = TRUE)
  ) |>
  rename(education = Break_Out)

education_df$education <- factor(education_df$education, levels = c(
  "Less than H.S.",
  "H.S. or G.E.D.",
  "Some post-H.S.",
  "College graduate"
))
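One detail of the factor conversion above is worth illustrating: `factor()` silently turns any label that is not listed in `levels` into `NA`, which is why the income table is filtered with `!is.na(income)` afterwards. The sketch below uses a shortened set of levels, and the `$50,000+` label is an assumed example of an unlisted category rather than something confirmed from the data.

```r
# Labels to convert; "$50,000+" is deliberately left out of the levels below
labels <- c("Less than $15,000", "$50,000+", "$200,000+")

income_factor <- factor(labels, levels = c("Less than $15,000", "$200,000+"))

# The unlisted label is turned into NA rather than raising an error
is.na(income_factor)

# Counting unmatched labels before filtering shows how many would be dropped
sum(is.na(income_factor))
```

Running a count like this before the filter step is a quick way to check that no category is being dropped unintentionally.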

Descriptive Statistics

Basic Summary

skim(brfss_clean)
Data summary
Name brfss_clean
Number of rows 7095
Number of columns 10
_______________________
Column type frequency:
character 5
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Locationabbr 0 1 2 2 0 54 0
Locationdesc 0 1 4 20 0 54 0
Response 0 1 3 3 0 1 0
Break_Out 0 1 8 17 0 12 0
Break_Out_Category 0 1 16 18 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Year 0 1 2017.79 4.09 2011.0 2014.0 2018.0 2021.0 2024.0 ▆▆▅▇▇
Sample_Size 0 1 1191.99 1240.17 44.0 380.0 775.0 1570.0 16152.0 ▇▁▁▁▁
Data_value 0 1 72.64 10.64 33.2 64.9 72.5 80.9 97.4 ▁▂▇▇▃
Confidence_limit_Low 0 1 68.59 12.10 28.2 59.7 68.5 78.2 96.0 ▁▃▇▇▃
Confidence_limit_High 0 1 76.68 9.51 37.3 69.9 76.7 83.7 100.0 ▁▁▇▇▃

Important Statistics

Exercise rate:

# overall exercise rate
overall_exercise <- weighted.mean(
  brfss_clean$Data_value,
  brfss_clean$Sample_Size,
  na.rm = TRUE
)

overall_exercise
[1] 78.2776

Income groups range:

range(income_df$exercise_rate, na.rm = TRUE)
[1] 61.94568 92.31137

Education groups range:

range(education_df$exercise_rate, na.rm = TRUE)
[1] 58.64922 86.85935

The descriptive statistics show clear variation across socioeconomic groups. The overall weighted average exercise rate is about 78%. Income groups range from about 62% to 92%, a spread of roughly 30 percentage points, whereas education groups range from about 59% to 87%, a spread of roughly 28 percentage points. The gap between the lowest and highest groups is therefore slightly larger for income than for education.
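The size of each gap can be checked directly from the two ranges printed above. The sketch below copies those values by hand; the variable names are hypothetical.

```r
# Ranges copied from the output above
income_range <- c(61.94568, 92.31137)
education_range <- c(58.64922, 86.85935)

# diff() on a two-element range gives max minus min, in percentage points
income_gap <- diff(income_range)
education_gap <- diff(education_range)

round(income_gap, 1)     # 30.4
round(education_gap, 1)  # 28.2
```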

Data Visualization

Visualization 1: Exercise Rates by Income

ggplot(income_df, aes(x = income, y = exercise_rate)) +
  geom_col(fill = "#4E79A7") +
  labs(
    title = "Exercise Rates by Income",
    x = "Income Level",
    y = "Exercise Rate (%)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.minor = element_blank()
  )

This chart shows a clear positive relationship between income and exercise rates. As income increases, the percentage of individuals who report exercising in the past 30 days also increases. This pattern suggests that higher-income individuals are more likely to participate in physical activity. One possible reason is that a higher income provides greater access to resources that support exercise, such as gym memberships, safe neighborhoods, and flexible schedules. These resources may be less available to lower-income groups.

Visualization 2: Exercise Rates by Education

ggplot(education_df, aes(x = education, y = exercise_rate)) +
  geom_col(fill = "#59A14F") +
  labs(
    title = "Exercise Rates by Education",
    x = "Education Level",
    y = "Exercise Rate (%)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.minor = element_blank()
  )

This chart shows a similar positive relationship between education level and exercise rates. Individuals with less than a high school education have the lowest exercise rate (about 59%), whereas college graduates have the highest exercise rate (about 87%). Each step up in education level is associated with an increase in exercise participation. This suggests that education is strongly associated with healthier behaviors like exercise. Higher levels of education may increase awareness of the benefits of physical activity and may go along with occupations that give an individual more time or opportunity to exercise.

Visualization 3: Exercise Rates Across United States

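# map_df holds state polygons joined to state-level exercise rates; its construction is not shown here
ggplot(map_df, aes(long, lat, group = group, fill = exercise_rate)) +
  # linewidth replaces size for line widths in ggplot2 3.4.0 and later
  geom_polygon(color = "white", linewidth = 0.2) +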
  coord_fixed(1.3) +
  scale_fill_gradient(low = "#DEEBF7", high = "#3182BD", na.value = "grey90") +
  labs(
    title = "Exercise Rates Across United States",
    fill = "Exercise Rate (%)"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    legend.position = "right"
  )

This map shows geographic variation in exercise rates across states. Some states show much higher levels of physical activity than others. Western and a few northeastern states tend to have higher exercise rates (darker shades), while some southern and midwestern states appear to have lower rates (lighter shades). These regional differences may reflect socioeconomic and environmental factors, such as differences in income distribution, education levels, climate, and infrastructure.
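The map code above relies on a `map_df` object whose construction does not appear in this report. The sketch below shows one way such a table might be built, assuming state outlines from `ggplot2::map_data("state")` (which needs the maps package installed) and a hypothetical `state_rates` table keyed by the `Locationdesc` state name. The three rates are made-up placeholders, not BRFSS results.

```r
library(ggplot2)  # map_data() needs the maps package to be installed

# Hypothetical state-level rates; in the report these would come from the BRFSS data
state_rates <- data.frame(
  Locationdesc = c("New York", "Colorado", "Mississippi"),
  exercise_rate = c(75, 85, 65),  # made-up placeholder values
  stringsAsFactors = FALSE
)

# State outlines for the contiguous United States; `region` holds lowercase state names
states_map <- map_data("state")

# Match on lowercase names so "New York" lines up with "new york"
state_rates$region <- tolower(state_rates$Locationdesc)

# Left join keeps every polygon; states without a rate get NA
map_df <- merge(states_map, state_rates, by = "region", all.x = TRUE)

# merge() reorders rows, so restore the drawing order of the polygon vertices
map_df <- map_df[order(map_df$group, map_df$order), ]
```

States with no matching rate keep `NA` in `exercise_rate`, which the `na.value = "grey90"` setting in the plot would render in grey. This outline set covers only the contiguous states, so Alaska, Hawaii, and the territories would need a different source.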

Shiny App

This Shiny application provides an interactive exploration of exercise rates by income and education. The app allows the user to view and compare the first two visualizations of this project dynamically. This app aims to help examine how health behaviors, like exercise, vary across socioeconomic groups.
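The app's source is not included in this report. As a rough illustration of the idea, the following is a minimal sketch of a Shiny app that switches between an income chart and an education chart. The input and output IDs (`breakout`, `rate_plot`), the `group` column, and the placeholder data are all assumptions made for this example, and the rates are invented.

```r
library(shiny)
library(ggplot2)

# Placeholder summaries with made-up rates; the report's income_df and
# education_df would take their place
income_df <- data.frame(
  group = factor(c("Lower", "Middle", "Higher"),
                 levels = c("Lower", "Middle", "Higher")),
  exercise_rate = c(60, 75, 90)
)
education_df <- data.frame(
  group = factor(c("Less than H.S.", "College graduate"),
                 levels = c("Less than H.S.", "College graduate")),
  exercise_rate = c(60, 85)
)

ui <- fluidPage(
  titlePanel("Exercise Rates by Income and Education"),
  selectInput("breakout", "Group by:", choices = c("Income", "Education")),
  plotOutput("rate_plot")
)

server <- function(input, output, session) {
  # Pick the summary table that matches the dropdown selection
  selected <- reactive({
    if (input$breakout == "Income") income_df else education_df
  })

  output$rate_plot <- renderPlot({
    ggplot(selected(), aes(x = group, y = exercise_rate)) +
      geom_col(fill = "#4E79A7") +
      labs(x = input$breakout, y = "Exercise Rate (%)") +
      theme_minimal(base_size = 14)
  })
}

app <- shinyApp(ui, server)

# Launch only in an interactive session so the script can also be sourced safely
if (interactive()) runApp(app)
```

Keeping the selection in a single `reactive()` means the plot code is written once and reused for both breakouts.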

Dashboard

This dashboard explores how exercise rates vary by income, education, and geography across the United States. It allows the user to compare differences in physical activity across socioeconomic groups and states, helping to identify patterns in health behavior.

Discussion

The visualizations suggest that exercise rates increase with both income and education. People with higher income and education levels tend to report higher rates of physical activity. One possible explanation is that they have greater access to resources such as gym memberships, safe places to exercise, and more free time. Overall, exercise participation follows a socioeconomic gradient. The geographic results also show variation across states, which suggests that location and environment may affect physical activity as well.

Limitations

This analysis is purely descriptive, meaning it does not show causal relationships. Although the visualizations show relationships between income, education, and exercise rates, they do not prove that one variable causes another.

There are also limitations in the data itself. Exercise rates are self-reported, which may not be completely accurate. Additionally, some observations are missing, which may affect the accuracy of the geographic visualization.

Finally, other important factors that may influence exercise behavior, such as age and health status, are not included in this analysis.

Conclusion

This project explores how exercise rates differ by income, education, and geography. The results show a clear pattern: people with higher income and education tend to exercise more, with variation across states. Overall, the results suggest that exercise is influenced by social and economic factors, not just personal preference. Access to resources and opportunity appear to play an important role in health behaviors.

Future Directions

Future work could include additional variables such as age, occupation, or health status. This would help provide a more complete understanding of what influences physical activity. It would also be useful to track exercise rates year by year to see whether these patterns change over time. More detailed geographic analysis at the county level could also provide a clearer picture of variation across and within regions. Finally, this project could include more advanced modeling approaches to isolate the effects of different socioeconomic factors and analyze correlations between variables.

References

Centers for Disease Control and Prevention (CDC). (n.d.). Behavioral Risk Factor Surveillance System (BRFSS). https://www.cdc.gov/brfss/ (Accessed May 2026).

R Core Team. (2024). R: A language and environment for statistical computing. https://www.r-project.org/ (Accessed May 2026).

Wickham, H. et al. (2019). tidyverse: Easily install and load the tidyverse. https://www.tidyverse.org/ (Accessed May 2026).

Wickham, H. ggplot2: Create elegant data visualisations. https://ggplot2.tidyverse.org/ (Accessed May 2026).

Wickham, H. dplyr: A grammar of data manipulation. https://dplyr.tidyverse.org/ (Accessed May 2026).

Pebesma, E. sf: Simple features for R. https://r-spatial.github.io/sf/ (Accessed May 2026).

Waring, E. et al. skimr: Compact and flexible summaries of data. https://docs.ropensci.org/skimr/ (Accessed May 2026).

RStudio. rmarkdown: Dynamic documents for R. https://rmarkdown.rstudio.com/ (Accessed May 2026).

Rudis, B. hrbrthemes: Additional themes for ggplot2. https://github.com/hrbrmstr/hrbrthemes (Accessed May 2026).

Becker, R. A., Wilks, A. R., Brownrigg, R., Minka, T. P., & Deckmyn, A. maps: Draw geographical maps. https://cran.r-project.org/package=maps (Accessed May 2026).

Posit. shiny: Web application framework for R. https://shiny.posit.co/ (Accessed May 2026).

OpenAI. (2026). ChatGPT (GPT-5.3). https://chat.openai.com/ (Accessed May 2026).