Political scientists have been trying to summarize politicians’ ideological preferences for a long time. The most widely accepted of these summaries are called ideal point estimates. These are measures of inherently unobservable preferences, estimated from observed behavior. I see you voting in favor of a higher minimum wage and regulating carbon dioxide as a pollutant, and I infer you’re probably a liberal. Or maybe you vote in favor of the Canadian oil pipeline and against “Obamacare,” and I think you’re probably a conservative. As a sign they’ve hit the (nerdy) big time, there’s now even a great XKCD comic about Keith Poole and Howard Rosenthal’s DW-NOMINATE scores.

The observed behavior most commonly used is the set of Yea and Nay votes taken on roll calls in legislatures like Congress. These votes are attractive as the raw data for ideal points for many reasons, one of which is that there is almost always an embarrassment of data. I’ve used them extensively in my research; here is a paper I recently published with Nolan McCarty on state legislative roll calls.

But they’re not perfect, for two reasons. First, a candidate at election time may present a different platform to voters than the one he actually uses as a guide to voting on roll calls once he achieves office. Second, by definition, roll call votes are only available after an election, and only for the winner. This means we can’t get information on the losing candidate in a state or district. This is a much more serious problem than the first.

An attractive alternative source of observable data is the candidate survey. In my opinion, the best candidate survey these days is administered by Project Vote Smart, which has been surveying tens of thousands of federal and state candidates for office since the mid-1990s. The questions it asks are numerous, well-phrased, and stretch across nearly all of the contentious political terrain you’d want them to. The results of the survey, which used to be called the National Political Awareness Test (NPAT) and is now the Political Courage Test (PCT), are published in a variety of formats for voters to use. The idea is to make it easier for voters to find information on the policy preferences of candidates about whom they might otherwise know very little. As a nice bonus, the organization appears to be without a hint of partisan bias.

There’s another problem, one you might have guessed. Not every candidate answers the survey; in fact, fewer and fewer candidates do as time goes on. Many obviously feel that doing so could be an electoral liability now or in the future; better instead to refuse to be pinned down on many questions of policy specifics.

So Project Vote Smart came up with a solution in 2010, and used it again in 2012: it researches the answers to a subset of its candidate survey questions itself, using good old-fashioned research brawn. As a result, nearly all of the 2012 congressional candidates, for nearly all congressional districts and all states holding elections to the House of Representatives and the Senate, are represented in its 2012 Vote Easy tool. The tradeoff for this broad coverage is that only a small subset of policy stances could be researched for the many hundreds of candidates this year.

I’ve built on their work by merging their deeper but narrower NPAT, which asks many questions but covers fewer candidates, with the shallower but broader Vote Easy, which asks fewer questions but covers nearly everyone. This gives us the best of both worlds. The most important step is then to estimate ideal points from this merged survey data. I’ve done this using a Bayesian two-parameter, one-dimensional item response model, implemented in the R statistical environment with Simon Jackman’s invaluable pscl package and visualized with Hadley Wickham’s powerful ggplot2 package.
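For readers who want to see what such a model looks like, here is a sketch of the standard two-parameter, one-dimensional item response model in its common probit form. The notation is my own shorthand for illustration, and the priors shown are a typical default rather than a statement of the exact settings used for these scores: $y_{ij}$ is candidate $i$’s recorded answer to survey item $j$ (1 for one position, 0 for the other), $x_i$ is the candidate’s ideal point, $\beta_j$ is the item’s discrimination parameter, and $\alpha_j$ is its difficulty parameter.

```latex
% Two-parameter, one-dimensional item response model (probit form).
% Unanswered items are treated as missing and dropped from the likelihood.
\begin{align*}
  \Pr(y_{ij} = 1 \mid x_i, \beta_j, \alpha_j) &= \Phi(\beta_j x_i - \alpha_j), \\
  L(x, \beta, \alpha \mid Y) &= \prod_{(i,j)\,\text{observed}}
      \Phi(\beta_j x_i - \alpha_j)^{y_{ij}}
      \bigl[1 - \Phi(\beta_j x_i - \alpha_j)\bigr]^{1 - y_{ij}}, \\
  x_i &\sim N(0, 1) \qquad \text{(assumed prior, which fixes the scale).}
\end{align*}
```

The product runs only over the candidate-item pairs that were actually observed, which is what makes it possible to combine a long survey answered by some candidates with a short one covering nearly all of them.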

How valid are these scores? One way to assess their external validity is to check how well they converge with measures built from unrelated data. Luckily for me, just such an external data source exists in the form of Adam Bonica’s candidate scores for 2012. Bonica’s candidate scores correlate with my own at a level of r=0.88, which is quite high, especially since the two measures share no data and are produced by different estimators. The advantage of my method, though, is that it allows me to jointly classify candidates and voters, something I’ll be returning to in my blog in the coming days before the election.
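To make the convergence check concrete, here is a minimal Python sketch of how a correlation like this can be computed: match the candidates who appear in both sets of scores, then take the Pearson correlation over the matched pairs. The candidate IDs and numbers below are made-up toy values, and the function names are mine; none of this is the actual data or code behind the r=0.88 figure.

```python
from math import sqrt


def match_scores(a, b):
    """Return paired score lists for candidates present in both dicts."""
    shared = sorted(set(a) & set(b))  # sorted for a stable, repeatable order
    return [a[k] for k in shared], [b[k] for k in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: two independent sets of scores keyed by candidate ID.
survey_scores = {"cand_a": -1.2, "cand_b": -0.4, "cand_c": 0.3, "cand_d": 1.1}
other_scores = {"cand_a": -0.9, "cand_b": -0.5, "cand_c": 0.6, "cand_e": 1.4}

xs, ys = match_scores(survey_scores, other_scores)
r = pearson_r(xs, ys)
print(f"{len(xs)} matched candidates, r = {r:.2f}")
```

Only candidates scored by both measures enter the comparison, so the correlation describes the overlap between the two sets rather than either full set.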

For more technical details, you can consult a paper on congressional voting that I cowrote with Jon Rogowski, which relies in part on this merged data. You can find out more about my research on legislative ideology here.

Normally, I write something in 2012 for publication in 2013-2014 about what happened back in 2008 or 2010. Interesting, but not as much fun as it could (should) be. So, without further ado, here are the results of my exercise. Here are the plots of the two parties in 2012, and here are the underlying scores.

Big thanks to Chad Levinson, a political science PhD candidate at the University of Chicago, for helping me gather the survey data from Project Vote Smart.