6 Interactions and Non-Additivity
6.1 Overview
Up to this point, we’ve assumed that the linear regression model is additive – the effect of one predictor on \(Y\) doesn’t depend on the value of another predictor. But this assumption is often unrealistic. Does the effect of authoritarianism on institutional trust differ across partisan groups? Does the relationship between income and vote margins change depending on a community’s racial composition? These are questions about interactions.
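In symbols, take a continuous predictor \(X\) and a binary indicator \(D\). These are generic placeholders, not variables from either dataset. The additive model gives \(X\) a single slope. The interaction model adds the product term \(X_i D_i\), so the slope of \(X\) depends on \(D\):

\[
\begin{aligned}
\text{Additive:} \quad Y_i &= \beta_0 + \beta_1 X_i + \beta_2 D_i + \varepsilon_i \\
\text{Interaction:} \quad Y_i &= \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 X_i D_i + \varepsilon_i \\
\text{Slope of } X \text{ (interaction model):} \quad \frac{\partial\, \mathbb{E}[Y_i \mid X_i, D_i]}{\partial X_i} &= \beta_1 + \beta_3 D_i
\end{aligned}
\]

When \(D_i = 0\) the slope of \(X\) is \(\beta_1\), and when \(D_i = 1\) it is \(\beta_1 + \beta_3\). Setting \(\beta_3 = 0\) recovers the additive model, in which \(D\) only shifts the intercept.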
In this chapter we develop the interaction model, show how it generalizes the additive (intercept-shift) model from the previous chapter, and apply it to both the Western States Survey and the Arizona precinct data.
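The generalization can be seen in a minimal simulation sketch. The data are invented and come from neither the Western States Survey nor the precinct data. The true slope of x is set to 1 when d = 0 and to 3 when d = 1. In lm(), the formula y ~ x + d fits the additive (intercept-shift) model, and y ~ x * d expands to x + d + x:d, which adds the interaction term.

```r
# Simulated data (hypothetical): the slope of x is 1 when d = 0 and 3 when d = 1,
# so the true model contains an interaction.
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
d <- rbinom(n, 1, 0.5)
y <- 2 + 1 * x + 0.5 * d + 2 * x * d + rnorm(n, sd = 1)
sim <- data.frame(y, x, d)

# Additive (intercept-shift) model: one common slope for x
fit_add <- lm(y ~ x + d, data = sim)

# Interaction model: x * d expands to x + d + x:d
fit_int <- lm(y ~ x * d, data = sim)

# Group-specific slopes of x implied by the interaction model
b <- coef(fit_int)
slope_d0 <- unname(b["x"])
slope_d1 <- unname(b["x"] + b["x:d"])

# The additive model is nested in the interaction model, so an F-test
# checks whether adding x:d improves the fit
anova(fit_add, fit_int)
```

Here slope_d0 should land close to 1 and slope_d1 close to 3. The F-test should strongly favor the interaction model, because the simulated slopes really do differ.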
6.2 The Arizona Precinct Data
Throughout this chapter and the next, we use the Arizona precinct data – 1,688 precincts from the 2024 presidential election, with voter registration, turnout, and Census demographics linked at the tract level.
This data comes from the Arizona Secretary of State voter file, merged with American Community Survey (ACS) tract-level demographics. Each row is a precinct. The dependent variable is Trump’s margin over Harris, where positive values indicate a Trump advantage. The predictors are four tract-level ACS measures: median household income, Latino population share, median age, and the Gini index of income inequality.
load("precinct_voter_summary.rda")
load("precinct_tract_data.rda")

# Convert raw ACS counts to percentages of the tract's total population
precinct_voter_summary <- precinct_voter_summary |>
  dplyr::mutate(
    pct_latino = (tract_acs_latino / tract_acs_total_population) * 100,
    pct_white = (tract_acs_non_latino_white / tract_acs_total_population) * 100
  )

# Preview the precinct key, the outcome, and the four predictors
head(precinct_voter_summary[, c("dos_precinct_key", "trump_harris_margin",
                                "tract_acs_median_household_income",
                                "pct_latino", "tract_acs_median_age",
                                "tract_acs_gini_index")])

# A tibble: 6 × 6
dos_precinct_key trump_harris_margin tract_acs_median_household_i…¹ pct_latino
<chr> <dbl> <dbl> <dbl>
1 0001 ACACIA 0.0430 69005 31.4
2 0002 ACOMA 0.254 95395 21.3
3 0003 ACUNA -0.508 49849 93.7
4 0004 ADOBE 0.115 78531 30.8
5 0005 ADORA 0.194 156354 15.5
6 0006 AGRITOPIA 0.0943 101179 15.7
# ℹ abbreviated name: ¹tract_acs_median_household_income
# ℹ 2 more variables: tract_acs_median_age <dbl>, tract_acs_gini_index <dbl>
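Once pct_latino exists, the overview's question about whether the income–margin relationship depends on racial composition can be posed as an interaction. The .rda files are not reproduced here, so the sketch below fits the model on a simulated stand-in that borrows the chapter's column names. Every value is made up, the data-generating process is an assumption, and the income_10k column is a hypothetical rescaling added for readability. The output says nothing about Arizona.

```r
# Simulated stand-in for precinct_voter_summary: the column names match the
# chapter's data, but every value is invented for illustration.
set.seed(2024)
n <- 400
stand_in <- data.frame(
  tract_acs_median_household_income = runif(n, 30000, 160000),
  pct_latino = runif(n, 0, 100)
)

# Hypothetical helper column: income in $10,000 units for readable coefficients
stand_in$income_10k <- stand_in$tract_acs_median_household_income / 10000

# Assumed data-generating process (hypothetical): the income slope shrinks,
# and turns negative, as the Latino share rises
stand_in$trump_harris_margin <- with(stand_in,
  0.3 + 0.02 * income_10k - 0.008 * pct_latino -
    0.0003 * income_10k * pct_latino + rnorm(n, sd = 0.05)
)

# Interaction model: income_10k * pct_latino adds the income_10k:pct_latino term
fit <- lm(trump_harris_margin ~ income_10k * pct_latino, data = stand_in)

# Conditional slope of income at several Latino shares:
# d(margin) / d(income_10k) = b_income + b_interaction * pct_latino
b <- coef(fit)
latino_grid <- c(10, 30, 50, 70, 90)
income_slope <- unname(b["income_10k"] + b["income_10k:pct_latino"] * latino_grid)
data.frame(pct_latino = latino_grid, income_slope = income_slope)
```

The interaction coefficient gives the change in the income slope for each percentage-point increase in Latino share. Evaluating the slope over a grid of Latino shares turns that coefficient into quantities that are easier to read. With the real data, the same lm() call would run on precinct_voter_summary after creating an equivalent rescaled income column.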
The code above uses one dplyr verb, mutate(). A few others come up often enough to know alongside it:
- mutate() creates new columns (or modifies existing ones). Here we divide the raw Latino and non-Latino white counts by total population and multiply by 100 to get percentages.
- select() chooses specific columns to keep.
- filter() keeps only rows that meet a condition (e.g., filter(!is.na(trump_harris_margin))).
- group_by() + summarize() computes summary statistics within groups.
- The pipe |> passes the result of one step as the first argument to the next. It is part of base R (version 4.1 and later), not dplyr.
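The verbs chain together as in the short sketch below, which runs on a toy tibble. The values are made up, and the column names mirror the chapter's data. The majority_latino grouping (Latino share above 50 percent) is a hypothetical variable introduced only for illustration. The if_else() guard is an added precaution and not part of the chapter's code. For a tract with zero recorded population, it returns NA instead of NaN.

```r
library(dplyr)

# Toy precinct rows with made-up values; the column names echo the chapter's data
toy <- tibble(
  dos_precinct_key = c("A", "B", "C", "D", "E"),
  trump_harris_margin = c(0.20, -0.40, NA, 0.10, -0.30),
  tract_acs_latino = c(300, 800, 500, 100, 0),
  tract_acs_total_population = c(1000, 1000, 1000, 1000, 0)
)

summary_by_group <- toy |>
  # mutate(): percentage Latino, NA (not NaN) when total population is 0
  mutate(pct_latino = if_else(tract_acs_total_population > 0,
                              tract_acs_latino / tract_acs_total_population * 100,
                              NA_real_)) |>
  # filter(): drop rows missing the outcome or the computed share
  filter(!is.na(trump_harris_margin), !is.na(pct_latino)) |>
  # select(): keep only the columns needed below
  select(trump_harris_margin, pct_latino) |>
  # group_by() + summarize(): hypothetical majority-Latino split, mean margin per group
  group_by(majority_latino = pct_latino > 50) |>
  summarize(n = n(), mean_margin = mean(trump_harris_margin))

summary_by_group
```

Precinct C drops out because its margin is missing, and precinct E drops out because its share is undefined. That leaves two non-majority-Latino precincts with a mean margin of 0.15 and one majority-Latino precinct with a margin of -0.40.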
Figure 6.1 shows the geographic distribution of Trump’s margin across Arizona precincts.