
One of the most surprising early storylines in this year’s NFL season was the unexpected success of the Dallas Cowboys, driven by a record-setting performance from running back DeMarco Murray. To begin the season, Murray broke Jim Brown’s more than 50-year-old record by rushing for at least 100 yards in seven straight games.

While Murray’s achievement is remarkable, the nature of football as a coordinated effort among 22 players and dozens of coaching staff makes it difficult for observers to attribute credit for his success to any single factor.

Is Murray’s improved performance in his fourth year in the NFL due to improvement in his individual skills, training and preparation? Or is he the same player, now backed by a markedly improved cast of offensive linemen, which includes two 2014 first-round draft picks and is widely reputed to be the most talented in the league? This difficulty of attribution in a team-production environment is a thorny problem for NFL general managers attempting to make intelligent roster decisions.

A candidate seeking to hire a political consultant is in a situation similar to that of a team GM. The candidate would like to hire the consultant who would yield the best electoral return on his or her investment. But candidates have only a hazy idea of the quality of the consulting firms they’re considering. In addition to the holistic and qualitative information that a candidate should always use, is there a profitable role for quantitative information about consulting firms’ performance?

We believe that information about the electoral performance of a consultant’s past clients can be used to evaluate the quality of a consultant. Still, it can be difficult to do this properly.

What are the challenges that make a quantitative assessment of a consultant’s quality so difficult? Just as in the football example, election results are the product of a complex combination of factors: the efforts of many individuals together with a fair amount of chance.

Of primary importance is the fact that the candidates involved drive election results. In the Georgia gubernatorial race, for example, no observer would have expected the race to finish as close as it did if Gov. Nathan Deal’s (R) challenger were an unknown Democratic state senator instead of Jason Carter, grandson of former President Jimmy Carter. This basic fact of electoral politics is important for the evaluation of consultant quality because a consultant’s record of success and failure depends strongly on the collection of candidates for whom the consultant works. A consultant’s win-loss record partially reflects the quality and skill of the consultant, but to an even larger degree it reflects the quality of the consultant’s clients.

Because we cannot rewind history and see how well the same candidate in the same race would have performed had they chosen to hire a different consultant, it’s difficult to separate the contribution of the consultant alone from that of the client. The matches between consultants and candidates are far from randomly determined: better candidates tend to have more resources available to hire better consultants, and better consultants tend to have more ability to be selective in choosing to work for better candidates. This fact is a central challenge to the measurement of consultant performance.

A related issue is that modern campaigns often hire multiple consulting firms. For instance, a well-funded statewide candidate could hire a media firm, a digital media firm, a direct mail consultant and a firm that manages voter file data and directs grassroots GOTV efforts. It’s difficult to separate the effect of each one, because all are contributing toward the same goal: getting the candidate elected.

Finally, electoral results are noisy: they’re affected by random factors that are under the control of neither the candidate nor the consultant. To cite one famous example, George W. Bush’s election in 2000 hinged on a few thousand Floridians confused by the “butterfly ballot.” Political scientists have found that such unpredictable factors can have real effects in elections: for instance, bad weather on Election Day depresses turnout among marginal voters, an effect which tends to favor Republican candidates. A naïve measure of consultant performance risks giving consultants credit or blame for random variation over which they have no control.

Despite these real obstacles, there is hope. Candidates trying to use quantitative information to gauge the quality of a political consultant can take a cue from the world of education, where the problem of measuring a teacher’s quality faces many of the same obstacles. In education, scholars and practitioners have made tremendous strides in identifying the component of students’ academic performance that’s attributable to teachers. These “value-added” measures take the average test scores of students in a teacher’s class and then control for previous student performance and student characteristics, such as whether the student qualifies for free or reduced-price lunch. By averaging across a large number of students, the randomness that affects one student’s performance on one standardized test is drowned out relative to the teacher’s contribution.

With these realities in mind, what can we say about best practices for measuring the quality of political consultants? A strong consulting rating system would have similar features to the approaches used to evaluate teachers. Observers should use the history of previous electoral performance of a consultant’s clients while, crucially, accounting for the built-in electoral resources that clients would have had even in the absence of the consultant’s participation.

First, it is essential to account for district ideology, the quality of the client’s opponent, national partisan tides, and candidate incumbency status. The influence of these characteristics of the client mix can and should be stripped out of the evaluation of consultant quality using statistical methods such as regression. Just as a teacher who teaches low-income students would not be expected to have the same average test scores as a teacher whose classroom is filled with high-income students, a consulting firm representing Democratic challengers in the 2010 election shouldn’t be evaluated as though its client mix consisted of safe Republican incumbents.
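As a rough illustration of what such a regression adjustment might look like (a sketch under simplifying assumptions, not the model used in any particular study), the Python below fits an ordinary least squares regression of a client's vote share on two race characteristics and treats the leftover residual as performance relative to expectations. The two predictors, the helper names and every number in the sample data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_ols(features: Sequence[Sequence[float]], outcomes: Sequence[float]) -> List[float]:
    """Return [intercept, slope_1, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [float(x) for x in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], feature_row: Sequence[float]) -> float:
    """Expected vote share for a race with the given characteristics."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], feature_row))


# Hypothetical past races (illustrative numbers only).
# Each row: (district lean toward the client's party in percent, 1 if incumbent else 0)
features = [(42.0, 0), (55.0, 1), (48.0, 0), (60.0, 1), (51.0, 0), (45.0, 1)]
vote_shares = [40.5, 61.0, 47.0, 63.5, 49.0, 52.5]

coefs = fit_ols(features, vote_shares)
# Residual = actual share minus the share expected from the race characteristics.
residuals = [y - predict(coefs, f) for f, y in zip(features, vote_shares)]
for f, y, r in zip(features, vote_shares, residuals):
    print(f"lean={f[0]:.0f} incumbent={f[1]} actual={y:.1f} vs. expectation: {r:+.2f}")
```

A real rating system would need many more races and more predictors (opponent quality and national tides, for instance), but the logic is the same: the consultant is judged on the residual, not on the raw result.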

Second, instead of simply using whether a candidate won or lost, observers should use the much richer information contained in the candidate’s vote share. Knowing that a candidate won 45 percent of the vote relative to an expectation of 40 percent is meaningful information that shouldn’t be ignored simply because it didn’t result in a candidate winning the election. Given that only a small minority of congressional elections are competitive, a win-loss outcome in most races isn’t particularly informative about the quality of the consultants involved. Using vote share relative to expectations allows an observer to use all of the available information, even in races where the ultimate winner isn’t in serious doubt.
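To make the contrast concrete, here is a small Python sketch comparing a win-loss record with average performance relative to expectations. The `Race` class, the two fictional firms and all of their numbers are hypothetical (apart from the 45-versus-40 case mentioned above), and the 50 percent threshold assumes a simple two-candidate race.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Race:
    actual_share: float    # client's vote share, in percent
    expected_share: float  # share expected from race fundamentals, in percent

    @property
    def won(self) -> bool:
        # Simplifying assumption: two-candidate race, so a majority wins.
        return self.actual_share > 50.0

    @property
    def overperformance(self) -> float:
        return self.actual_share - self.expected_share


def win_rate(races: Sequence[Race]) -> float:
    return sum(r.won for r in races) / len(races)


def mean_overperformance(races: Sequence[Race]) -> float:
    return sum(r.overperformance for r in races) / len(races)


# Two fictional firms with illustrative records.
long_shot_firm = [Race(45.0, 40.0), Race(44.0, 41.0), Race(47.0, 43.0)]
safe_seat_firm = [Race(62.0, 64.0), Race(58.0, 60.0), Race(66.0, 66.0)]

for name, races in [("long-shot firm", long_shot_firm), ("safe-seat firm", safe_seat_firm)]:
    print(f"{name}: win rate {win_rate(races):.0%}, "
          f"average vs. expectation {mean_overperformance(races):+.1f} points")
```

In this made-up example the firm that loses every race beats expectations by several points on average, while the firm that wins every race slightly trails them, which is exactly the information a win-loss record throws away.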

Our research has found that consulting firms are not rewarded (in terms of their future revenues) for outperforming expectations in vote-share terms. The fact that most candidates seem to ignore this information opens up the possibility of using it to find talented “diamonds in the rough”: firms that are effective at influencing races but don’t yet have the reputation to command high fees. Michael Lewis’ book “Moneyball” described how the Oakland A’s used statistical information that other teams were ignoring to find underpaid but productive players, thereby putting together a competitive team despite one of the lowest payrolls in the major leagues. Clever use of under-utilized measures of consultant quality could provide similar opportunities in the political arena.

Finally, observers should use a consultant’s entire performance history instead of the performance in just a handful of races. Each individual election result depends on random factors beyond the control of the consultant. While Dave Brat’s campaign manager, Zachary Werrell, certainly deserves some credit for the neophyte’s upset primary victory over then-House Majority Leader Eric Cantor (Va.), other candidates would be wise to temper their expectations about his ability to repeat such an unlikely victory.

By averaging over many elections, rather than focusing on just one or two campaigns, this random component specific to each individual election becomes less and less important. With enough past elections to examine, the random component is drowned out by the consultant’s contribution.
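The arithmetic behind this point can be sketched in a few lines of Python. Assuming each race yields a performance figure relative to expectations, and treating the random components as independent across races (an assumption, as are the function name and the sample numbers), the uncertainty around a consultant's average shrinks roughly in proportion to the square root of the number of races.

```python
import math
from typing import Sequence, Tuple


def rating_with_standard_error(residuals: Sequence[float]) -> Tuple[float, float]:
    """Return (mean over-performance, standard error of that mean).

    Assumes the race-level random components are independent of one another.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two races to estimate the noise")
    mean = sum(residuals) / n
    sample_variance = sum((r - mean) ** 2 for r in residuals) / (n - 1)
    return mean, math.sqrt(sample_variance / n)


# Illustrative numbers: the same spread of results over 3 races and over 12 races.
few_races = [2.0, 4.0, 6.0]
many_races = few_races * 4

for label, history in [("3 races", few_races), ("12 races", many_races)]:
    mean, se = rating_with_standard_error(history)
    print(f"{label}: average {mean:+.1f} points, standard error {se:.2f}")
```

Both histories have the same average, but the longer one pins it down far more tightly, which is why a single upset says little about a consultant and a long record says much more.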

No quantitative approach to measuring consultant contributions to the electoral performance of their clients will be perfect. But by combining the systematic information contained in election returns with more informal information, candidates for office can potentially find higher-quality political consultants.

Zachary Peskowitz is an assistant professor at Ohio State University in Columbus. Gregory Martin is an assistant professor at Emory University in Atlanta, Ga.