Technical


My coauthor (Nolan McCarty) and I are releasing a new version of our state- and chamber-level aggregate data. We have focused on two major updates:

  1. In all, we have 140 chamber-years of new data. These now include party data for Nebraska thanks to friend and coauthor Seth Masket, who generously provided the informal but well-known partisan affiliations for Unicameral legislators.
  2. The individual-level data underlying this release has been extensively cleaned to minimize the random noise inherent in acquiring roll call votes from printed journals.

You can find the data here.

Political scientists have been trying to summarize politicians’ ideological preferences for a long time. The most widely accepted of these summaries are called ideal point estimates: measures of inherently unobservable preferences that are estimated from observed behavior. I see you voting in favor of a higher minimum wage and regulating carbon dioxide as a pollutant, and I infer you’re probably a liberal. Or maybe you vote in favor of the Canadian oil pipeline as well as against “Obamacare,” and I think you’re probably a conservative. As a sign they’ve hit the (nerdy) big time, there’s now even a great XKCD comic about Keith Poole’s and Howard Rosenthal’s DW-NOMINATE scores.

The observed behavior most commonly employed is the set of Yea and Nay votes taken on roll calls in legislatures like Congress. These are very attractive as the raw data for ideal points for many reasons, one being that there is almost always an embarrassment of data. I’ve used them extensively in my research; here is a paper I recently published with Nolan McCarty on state legislative roll calls.

But they’re not perfect, for two reasons. First, a candidate at election time may present a different platform to voters than the one he actually uses as a guide to voting on roll calls once he achieves office. Second, by definition, roll calls are only available after an election, which means we can’t get any information on the losing candidate in a state or district. The second problem is much more serious than the first.

An attractive alternative source of observable data is the candidate survey. In my opinion, the best candidate survey these days is administered by Project Vote Smart, which has been in the business of surveying tens of thousands of federal and state candidates for office since the mid 1990s. The questions it asks are numerous, well phrased, and stretch across nearly all of the contentious political terrain you’d want them to. The results of its survey, which used to be called the National Political Awareness Test (NPAT) and is now the Political Courage Test (PCT), are published in a variety of formats for voters to use. The idea is that this makes it easier for voters to find information on the policy preferences of candidates about whom they might otherwise know very little. As a nice bonus, the organization appears to be without a hint of partisan bias.

There’s another problem, one you might have guessed. Not every candidate answers the survey; in fact, fewer and fewer candidates do as time goes on. Many obviously feel that doing so could be an electoral liability now or in the future; better instead to refuse to be pinned down on many questions of policy specifics.

So Project Vote Smart devised a solution in 2010, and again in 2012: it would research answers to a subset of its candidate survey using good old-fashioned research brawn. As a result, nearly all of the 2012 candidates in nearly every congressional district and state holding elections to the House of Representatives and the Senate are represented in its 2012 Vote Easy tool. The tradeoff for this broad coverage is that only a small subset of policy stances could be researched for the many hundreds of candidates this year.

I’ve built on their work by merging the deeper but narrower NPAT with the shallower but broader Vote Easy, giving us the best of both worlds. The most important step is to estimate ideal points from this merged survey data. I’ve done this using a Bayesian two-parameter, one-dimensional item response model, implemented in the R statistical environment with Simon Jackman’s invaluable pscl package and visualized with Hadley Wickham’s powerful ggplot2 package.
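
For readers who want to see the moving parts, here is a minimal sketch of that estimation step. This is not my actual script: the response matrix below is simulated as a stand-in for the merged NPAT/Vote Easy data, and all names and MCMC settings are illustrative.

```r
library(pscl)
library(ggplot2)

## Simulated stand-in for the merged survey: rows are candidates,
## columns are binary survey items (1/0), NAs allowed.
set.seed(2012)
n_cand <- 200; n_item <- 30
true_ideology <- rnorm(n_cand)                 # latent positions (simulation only)
cutpts <- rnorm(n_item)                        # item cutpoints
p <- plogis(outer(true_ideology, cutpts, "-"))
responses <- matrix(rbinom(n_cand * n_item, 1, p), n_cand, n_item,
                    dimnames = list(paste0("cand", 1:n_cand),
                                    paste0("q", 1:n_item)))

## Bayesian two-parameter, one-dimensional item response model via MCMC
rc  <- rollcall(responses, yea = 1, nay = 0, missing = NA)
fit <- ideal(rc, d = 1, maxiter = 50000, burnin = 5000, thin = 50,
             normalize = TRUE)                 # identify the scale: mean 0, sd 1

## Posterior-mean ideal points, plotted as a density
est <- data.frame(score = fit$xbar[, 1])
ggplot(est, aes(x = score)) +
  geom_density() +
  labs(x = "Estimated ideal point", y = "Density")
```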

How valid are these scores? One way to assess their external validity is to check their convergence with measures derived from unrelated data. Luckily for me, just such an external source exists in the form of Adam Bonica’s candidate scores for 2012. Bonica’s candidate scores correlate with my own at r=0.88, which is quite high, especially since the two measures share no data in common and use different estimators. The advantage of my method, though, is that it allows me to jointly classify candidates and voters, something I’ll be returning to on my blog in the coming days before the election.
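
In case it’s useful, here is roughly what that convergence check looks like in R. The file and column names are placeholders of my own (Bonica’s scores are commonly labeled cfscore), and the merge assumes a shared identifier such as the CRP id.

```r
## Placeholder file names, not the real ones
mine   <- read.csv("my_2012_scores.csv")       # assumed columns: crp.id, score
bonica <- read.csv("bonica_2012_scores.csv")   # assumed columns: crp.id, cfscore

## Match candidates on a shared identifier, then correlate the two scales
both <- merge(mine, bonica, by = "crp.id")
cor(both$score, both$cfscore, use = "complete.obs")   # the post reports r = 0.88
```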

For more technical details, you can consult a paper on congressional voting that I cowrote with Jon Rogowski, based in part on this data amalgam. You can find out more about my research on legislative ideology here.

Normally, I write something in 2012 for publication in 2013-2014 about what happened back in 2008 or 2010. Interesting, but not as much fun as it could (should) be. So, without further ado, here are the results of my exercise. Here are the plots of the two parties in 2012, and here are the underlying scores.

Big thanks to Chad Levinson, a political science PhD candidate at the University of Chicago, for helping me gather the survey data from Project Vote Smart.

Click here for my scores for the 2012 House and Senate congressional candidates.

Graphs of the distributions can be found in this post, and an explanation of how I came up with these scores is in this post.

The fields in the spreadsheets are as follows:

  • stdist: Congressional district for House candidates
  • st: State abbreviation
  • party: D, R, or X (independent)
  • pid: -1, 0, or 1 (numeric equivalent of party)
  • full.name: Self-explanatory; sorry for screwups with accent marks and the like.
  • incumbent: 1 if incumbent, 0 if challenger
  • crp.id: Center for Responsive Politics identification number
  • npat.id: Project Vote Smart candidate id
  • score: Candidate ideal point or ideological position estimated from survey responses as described here
  • sd: Measure of uncertainty around the point estimate in score
  • perc: Percentile ranking within the pool of all 2012 candidates, House and Senate. So a percentile score of 84.5 for Mia Love (R) in Utah’s 4th District indicates Love ranks as more conservative than 84.5% of all 2012 candidates. (A sketch of how these percentile fields are computed follows this list.)
  • perc.r: Percentile ranking within the pool of 2012 Republican candidates, House and Senate. So Love scores 70.3, which indicates she is more conservative than 70.3% of all 2012 Republican candidates: that is, she is certainly quite conservative, even within her own party.
  • perc.d: Percentile ranking within the pool of 2012 Democratic candidates, House and Senate. Love’s opponent, Jim Matheson (D), has a percentile score of 1.6, indicating that he is more conservative than all but 1.6% of 2012 Democratic candidates. In other words, Matheson is extremely conservative for a Democrat, which is not surprising given the conservative character of Utah’s 4th District.
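
Here is a hypothetical reconstruction of the percentile fields in R. The file name is a placeholder, and the direction convention (higher percentile = more conservative) is my assumption from the examples above, not taken from the actual build script.

```r
## Placeholder file name; columns follow the list above
scores <- read.csv("house_senate_2012_scores.csv")

pct <- function(x) 100 * rank(x) / length(x)   # percentile rank of each score

scores$perc <- pct(scores$score)               # against all 2012 candidates
scores$perc.r <- NA                            # against Republicans only
scores$perc.r[scores$party == "R"] <- pct(scores$score[scores$party == "R"])
scores$perc.d <- NA                            # against Democrats only
scores$perc.d[scores$party == "D"] <- pct(scores$score[scores$party == "D"])
```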

Seth Masket (University of Denver), amid generous words of praise, writes that he fears variation in agenda control across states could undermine the comparability of the ideal point estimates I used to discuss the ideology of Dede Scozzafava, the Republican candidate in a special election for the US House in New York’s 23rd District.

As a bit of introduction: in some states, parties exercise tight control over which bills get introduced for a vote; in other states, agenda control is looser. For example, Seth’s home state of Colorado adopted an initiative in 1988 called GAVEL (Give A Vote to Every Legislator) that prevents party leaderships from suppressing bills in the early stages of lawmaking.

How could this be a problem? Well, roll call-based measures of ideology, of which ours is one, rely on the public votes that are allowed to come to the floor for consideration. Thus it could be the case that we have a very “selective” roll call record, one that suppresses the true range of variation in ideology simply because some bills (typically the minority party’s) never get a vote in some states but do in others.

While variation in agenda control is very interesting, investigating it has been very difficult until now, because we haven’t had legislator-level ideal point estimates. So having this problem should be considered a luxury…

But more broadly, existing evidence on Congress leads me to doubt that agenda control is that big of a problem for estimating ideal points. Remember that this was a debate about NOMINATE for a while (e.g., Snyder 1992 and Rosenthal 1992, plus the simulation evidence in McCarty, Poole, and Rosenthal’s 2006 book), and these concerns haven’t really stopped the ideal point project.

The party line is that, since agenda control isn’t perfect and there’s always error in legislative voting, there should be enough cutpoints to differentiate among legislators. See Poole and Rosenthal on this.
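
To make the cutpoint argument concrete, here is a toy simulation with entirely made-up numbers and a deliberately crude recovered score (each legislator’s yea rate). Even when the agenda only schedules votes whose cutpoints sit to one side of the space, noisy voting still orders legislators essentially correctly:

```r
set.seed(42)
n_leg   <- 100
n_votes <- 500
theta   <- rnorm(n_leg)                            # true ideal points

simulate_votes <- function(cutpoints) {
  p <- plogis(outer(theta, cutpoints, "-"))        # Pr(yea) rises with theta
  matrix(rbinom(length(p), 1, p), n_leg)
}

agendas <- list(open  = rnorm(n_votes),                      # cutpoints anywhere
                tight = rnorm(n_votes, mean = 1, sd = 0.3))  # one-sided agenda

for (name in names(agendas)) {
  votes     <- simulate_votes(agendas[[name]])
  recovered <- rowMeans(votes)                     # crude score: overall yea rate
  cat(name, "agenda: rank correlation with truth =",
      round(cor(theta, recovered, method = "spearman"), 3), "\n")
}
```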

As an additional bit of evidence, the common movement of ideal points in the US House and Senate, despite very different agenda-setting institutions, implies that these institutions need not undermine our estimates too badly. Polarization looks like it’s rising in both chambers almost identically, even though agenda control is far tighter in the House.

Finally, the roll call-based scores are “normed” by the Vote Smart NPAT survey. It should be the case that, even if agenda control compresses the range of ideal points within a state, the scores are decompressed when estimated in tandem with an external issue-preference survey.
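
The intuition can be sketched with another toy example (again simulated, not the real data): survey items answered by everyone sit alongside roll call votes cast only by sitting members, and the common respondents anchor both blocks to one shared scale.

```r
library(pscl)

set.seed(1)
n     <- 60
theta <- rnorm(n)                           # true positions (simulation only)
sim_items <- function(k) {                  # k binary items driven by theta
  p <- plogis(outer(theta, rnorm(k), "-"))
  matrix(rbinom(length(p), 1, p), n)
}

survey <- sim_items(20)                     # NPAT-style questions: everyone answers
floor  <- sim_items(40)                     # floor votes: members only
floor[31:60, ] <- NA                        # challengers cast no roll calls

## Stack the two blocks; members who did both bridge the scales
rc  <- rollcall(cbind(survey, floor), yea = 1, nay = 0, missing = NA)
fit <- ideal(rc, d = 1, normalize = TRUE, verbose = FALSE)
abs(cor(theta, fit$xbar[, 1]))              # high; sign is identified only up to reflection
```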

Still, it would behoove us to study these institutions far more carefully. I’m glad for people to use the data to do so, once we get the paper published!
