Political Targeting Using k-means Clustering

Political campaigns send certain messages to people based on what the campaigns know about these potential voters, hoping that these specialized messages are more effective than generic messages at convincing people to show up and vote for their canddiate.

Imagine that a Democratic campaign wants to target Democrats for specific types of mailers or phone call conversations, but has limited resources. How could they decide on who gets what type of message? I'll show how I've been playing around with k-means clustering to get some insight into how people might decide what messages to send to which people.

Let's say this campaign could only, at most, afford four different types of messages. We could try to cluster four different types of Democrats, based on some information that a campaign has about the voters. I will use an unrealistic example here—given that it is survey data on specific issues—but I think it is nonetheless interesting and shows what such a simple algorithm is capable of.

My colleagues Chris Crandall, Laura Van Berkel, and I have asked online samples how they feel about specific political issues (e.g., gun laws, aborton, taxes). For the present analyses, I include only people who identify as Democrats, because I'm imagining that I'm trying to target Democratic voters.

I have 179 self-identified Democrats' answers on 17 specific policy questions, as well as how much they identify as liberal to conservative (on a 0 to 100 scale). I ran k-means clustering to these 17 policy questions, specifying four groups (i.e., the most this fictitious campaign could afford).

First, let's look at how many people were in each cluster. We can also look at how much each cluster, on average, identified themselves as liberal (0) to conservative (100):

Cluster Conservatism Size
1 46.32 22
2 31.84 43
3 27.96 47
4 10.45 67

These clusters are ordered by conservatism. We could see each group as just most conservative Democrats to most progressive Democrats, but can we get a more specific picture here?

What I did was create four different plots—one for each cluster—laying out how, on average, each cluster scored on each specific policy items. These items are standardized, which means that a score of 0 in the group means that they had the same opinion as the average Democrat in the sample. This will be important for interpretation. Let's look at Cluster 1. I call these “Religious Conservative Democrats,” as I will explain shortly:

plot of chunk unnamed-chunk-4

In general, these people tend to be more conservative than the average Democrat in our sample. But what really differentiates these people most? Three of the largest deviations from zero show the story: These people are much more against abortion access, much more of the belief that religion is important in everyday life, and much more against gay marriage than the average Democrat. These are not just conservative Democrats, but Democrats that are more conservative due to traditional religious beliefs. If I were advising the campaign, I would say, whatever they choose to do, do not focus their message on progressive stances on abortion access and gay marriage.

Let's turn to the Cluster 2, “Fiscally Conservative Democrats”:

plot of chunk unnamed-chunk-5

The biggest deviations from the average Democrat that these people have are that they are more likely to be against welfare, say that there is too much government spending, and oppose raising taxes. These people also support funding social security, stopping climate change, reducing economic inequality, and government providing healthcare less than the average Democrat. I would suggest targeting these people with social issues: access to abortion, supporting gay rights, funding fewer military programs, and supporting immigration. These people are about the same as the average Democrat on these issues.

Cluster 3 are what I have named “Moderate Democrats”:

plot of chunk unnamed-chunk-6

I almost want to name this group, “Democrats Likely to Agree with All of the Questions,” because they tended to support both conservative and liberal policices more than the average Democrat (except for the death penalty item, strangely). But they can be seen as moderates, or perhaps “ambivalent.” In comparison to the average Democrat, they are both more likely to say we should control borders and immigration and say that we should reduce economic inequality. Theoretically, this group is interesting. We could ask lots of empirical questions about them.

But pragmatically? They could probably be given messages that the candidate is most passionate about, polls best, etc., as they are likely to be more favorable toward any attitude—liberal or otherwise—than the average Democrat.

You might be wondering, “These groups all seem pretty conservative.” Remember that these scores are all relative to the average Democrat. Even if they score more conservatively on an issue, they are likely to support it less than a Republican.

In any case, Cluster 4 (the biggest cluster, about 37% of the sample) are the “Progressive Democrats”:

plot of chunk unnamed-chunk-7

In comparison to the average Democrat, these people support all of the liberal causes more and conservative causes less. I would suggest to those trying to campaign to these people that they would be open to the most progressive issues that the candidate has to offer.

As I mentioned before, this is somewhat of an unrealistic example: Campaigns don't have surveys laying around for registered voters. But there's an increasing amount of information out there that campaigns could use in lieu of directly asking people how they feel about issues: Facebook data, donations to specific causes, signing up for e-mail updates from different political organizations, etc. Information for most campaigns can be sparse, but people are increasingly able to access some of these more proprietary datasets.

k-means clustering is also a remarkably simple way to look at this issue. One line of code runs the algorithm. The command I ran was:

four <- kmeans(data,4)

And then I extracted the clusters for each person with the code:

four$cluster

Even if specific decisions are not made based on these simple cluster analyses, they are easy enough to do that I believe it is a good way to explore data and how respondents can be grouped together. Running multiple analyses specifying different numbers of clusters can help us understand how the people that answer these questions may be organized in pragmatically helpful ways.