Effectiveness Measurement: When Will We Get it Right?

Abstract: Credible demonstration of policy or program impacts depends on understanding the distinction between inputs, outputs, outcomes and indicators. Moreover, in order to be trusted, public reports on a program's performance need to focus more selectively on identifying the key measures of performance. First, this article aims to provide those involved in the practice of program evaluation with an enhanced understanding of the current literature, reports and documentation on estimating impacts and results of government programs and policies. Secondly, it is designed to share definitions and guidelines used to determine economic impacts. Finally, it presents current best practices involved in measuring incremental impacts, all of which, we contend, provide program evaluation staff with new ways of approaching measurement, effectiveness and accountability in a strategic and comprehensive manner.


Introduction
Performance measurement and reporting is now considered to be a critical component of public sector accountability, both in Canada and abroad [Auditor General of Canada (AG), 1997; Organization for Economic Cooperation and Development (OECD), 2004, 2007; Treasury Board of Canada (TBS), 1998, 2001a, 2001b]. Indeed, public sector organizations remain under increasing pressure to measure progress toward results, have flexibility to adjust operations to meet expectations, and report on outcomes accomplished. In comparison to private sector organizations, public sector organizations neither seek to enhance their competitiveness nor promote their growth; these public institutions aim to provide the highest quality of service to the public and to manage for results. Accordingly, a significant element of public sector reform is an approach that pays greater attention to the results attained with taxpayers' dollars.
While the literature is replete with models for assessing the impact and results of programs and policies, the number of program evaluators who are familiar with it, and who understand the appropriate methods for estimating and assessing these impacts and results, remains small. There are therefore many advantages to having program evaluators understand the results and impact assessment literature. For example, if evaluators consistently incorporated impact and results assessment principles in both summative and formative evaluations, it would afford organizations a stronger way of approaching performance accountability in a strategic and comprehensive manner. This would respond to the changing perception of the public regarding performance accountability, while providing enhanced information and opportunities for organizational management to make more effective choices in program investments.
Thus, the intention of this article is threefold. First, it presents a review of the current literature, reports and documentation on monitoring, as well as measuring results and long-term impacts of government programs and policies. Secondly, it shares definitions and guidelines used to determine results and impacts. Finally, it presents the current best practices involved in measuring incremental results and impacts.

Results-Oriented Measurement
According to Artley, Ellison & Kennedy (2002), Treasury Board of Canada (2000) and OECD (2007), most American state governments have performance measurements and planning regimes, as do most OECD countries. At the federal level in Canada, a results-oriented focus was initially launched in 1996, while the Ontario and Alberta governments unveiled their respective Quality Service Initiative and Results-Oriented Government Initiative in 1998. However, varying degrees of a performance measurement framework are utilized in the other provinces and territories within Canada.
The literature distinguishes two uses for performance measurement information. First, from a management perspective, performance information can be used to better understand the contributions and differences a program or policy is making. Furthermore, it enables program management to determine if a program or policy is the appropriate tool to achieve the desired result. In this regard, performance measurement is both a search for knowledge and an investigative tool.
Secondly, performance measurement is utilized to explain or demonstrate the performance achieved by a program. In many jurisdictions, there is an increased focus on reporting to elected officials and the public exactly what has been achieved with the tax dollars spent and resources used. Performance measures frequently form the basis of such reporting. According to Mayne (2004; 2006), the question is how performance measurement information can be used to report credibly on what has been accomplished.
The Treasury Board of Canada (2001a) suggests performance reporting and management depend on the distinction between inputs, outputs, outcomes and indicators. Inputs are the resources allocated to programs and organizations. Outputs are the activities government agencies undertake, such as the provision of services. Outcomes are the eventual results of those activities in terms of the public good. Schacter (2002a) and Curristine (2005) both note that indicators are the empirical measures of inputs, outputs and outcomes. Hence, the thrust of performance measurement is to train attention on "outcomes", what ultimately matters the most, and link them to a logical model that connects inputs (resources) with activities, outputs and outcomes.
However, when examined more closely, performance measurement is more than simply "measuring impacts"; it entails a management regime that requires an organization to have a clear idea of its objectives, and a regular means of reporting its success in achieving them [Goss Gilroy Incorporated (GGI), 1997]. Performance reporting is different from program or policy evaluation, which typically takes place at specific points in time in a program's life, and is a more comprehensive analysis of program impacts. It is imperative that performance measurement be viewed as part of a larger management regime, which attempts to link ongoing results with strategic planning, budgeting and resource allocation.
As mentioned in the literature, the development of a successful alternative methodology for measuring economic impacts requires an alignment between both key factors in order to do performance measurement.

Clarity and Knowledge - An Imperative
Performance measurement is difficult enough in its own right, but particularly challenging in the context of non-commercial programs. Measuring performance requires clarity and consensus about objectives, as well as a logical model of causes and consequences, and how the organization's actions contribute to outputs and outcomes. Since performance measurement is about assessing the success of a program, it is vital to know what that program is about and what its intended objectives are. The difficulty associated with this task is gleaning the various perspectives from key organizational staff and then translating these perspectives into a coherent picture.
TBS literature (2001b) suggests a necessity to develop a "profile" of the program, which provides a concise description of the policy, program or initiative, including a discussion of the background, need, target population, delivery approach, resources, governance structure and planned results. This view is echoed by Schacter (2002a), who notes that the foundation of a good performance story is a detailed understanding of the program whose performance is to be measured. Furthermore, Schacter (2002b) and Mayne (2006) both argue that the first and most important step in developing a performance measurement framework is to take the program apart: analyze it, dissect it, and break it up conceptually into its component parts. This requires a clear understanding of the goals and objectives of a program and how these are linked to the mandate of the organization. Others, such as GGI (1997) and Hatry (2004), suggest it is essential for an organization to determine the type of business it is in and how it intends to work before it can clearly identify performance expectations. Moreover, to be understood, public performance reporting needs to focus more selectively, and more meaningfully, on a smaller number of critical aspects or areas of performance. The issue then becomes how to determine these few aspects and how to engender confidence that selections are made to illuminate performance.
The literature on performance measurement indicates that at the heart of any performance reporting process is a "logic model" that ties inputs to activities as well as to short-term, intermediate and ultimate outcomes. Thus, according to Wholey and Hatry (1992), the logic model becomes a conceptual illustration of the "results chain", or how the activities of a policy, program or initiative are expected to lead to the achievement of the final outcomes. In addition, developing and using a logic model has a number of benefits for program managers, including: a) developing consensus on what the program is trying to accomplish; b) developing an understanding of how the program is working; c) clearly identifying the clients of the program; d) seeking and obtaining agreement on precisely what results are intended, and e) identifying the key measures of performance.
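The results chain described above can be illustrated with a small data structure. The following is only a minimal sketch under stated assumptions: the class name `LogicModel`, its field names, and the example program are hypothetical and do not come from the literature cited here. It simply records each stage of the chain and flags stages that have not yet been articulated, which is one way a program team might check whether agreement on intended results is complete.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LogicModel:
    """Hypothetical container for a program's results chain (names are illustrative)."""
    program: str
    inputs: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    short_term_outcomes: List[str] = field(default_factory=list)
    intermediate_outcomes: List[str] = field(default_factory=list)
    ultimate_outcomes: List[str] = field(default_factory=list)

    def results_chain(self) -> List[Tuple[str, List[str]]]:
        """Return the stages in the order the logic model links them."""
        return [
            ("inputs", self.inputs),
            ("activities", self.activities),
            ("outputs", self.outputs),
            ("short_term_outcomes", self.short_term_outcomes),
            ("intermediate_outcomes", self.intermediate_outcomes),
            ("ultimate_outcomes", self.ultimate_outcomes),
        ]

    def missing_stages(self) -> List[str]:
        """List stages with no entries, i.e. gaps in the results chain."""
        return [name for name, items in self.results_chain() if not items]


# Invented example program, used only to exercise the structure.
model = LogicModel(
    program="Youth employment training (hypothetical)",
    inputs=["annual budget", "training staff"],
    activities=["deliver job-skills workshops"],
    outputs=["participants trained"],
    short_term_outcomes=["participants gain job-search skills"],
    intermediate_outcomes=["participants find employment"],
)

# The ultimate outcome has not been stated yet, so it is reported as a gap.
print(model.missing_stages())
```

In this sketch the gap report is deliberately simple; selecting indicators for each stage would still require the stakeholder judgment discussed in the text.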
Part of the challenge associated with performance measurement is identifying appropriate indicators for the different levels of outcomes, and making judgments about the specific contribution of the program, policy or initiative under measurement. As highlighted in the literature, performance can only be measured if there are both outputs and outcomes. Even if a program is explicit regarding its intended outcomes, selecting indicators is not automatic. Successful performance measurement depends, in part, on finding credible indicators that relate something important about a program, and which can be successfully measured.
It should be well understood that performance indicators are the measures actually used to assess a specific aspect of performance and that no single indicator is adequate. Therefore, choosing the best set of performance indicators is central to ensuring that the right results are being measured. According to Canada's Auditor General (AG) (Office of the Auditor General, 1997), results can be measured in many ways by using many different kinds of information when stakeholders agree upon appropriate performance indicators. Without agreement from stakeholders, there is a risk that inappropriate performance will be encouraged.

Linking Resources to Results - A Process Requirement
Performance measurement is not an end in itself. Measurement should contribute to the wider process of governmental resource allocation. Linking resources to results is a mechanism for supporting transparency in the government decision-making process. As well, such steps enrich accountability in a citizen-centered approach. Theoretically, if programs are found to be under-performing, resources should be reallocated to other programs that have demonstrated public benefits. In addition, there is evidence from several jurisdictions that resources are being aligned with results. Many organizations report actual performance against targeted performance. Alignment is most common between budgeted resources and expected results. According to Artley, Ellison & Kennedy (2002) and the TBS (2000), alignment of actual expenditures against actual performance is less common. As well, most organizations indicate they are making progress in tracking results, but they are not there yet.

Tell a Convincing Story
There is a paradox of performance measurement acknowledged in the literature. As noted previously, performance measurement is driven by both precision and a clear assessment of the contribution of government programs to specific outcomes. The literature acknowledges that there are significant technical problems associated with disentangling the specific effect of those programs from other factors that might contribute to those outcomes. Schacter (2002a) argues that good performance measurement is an exercise in storytelling. He maintains that successful performance measurement must acknowledge there is an element of judgment. Furthermore, he notes the importance of acknowledging the limits of both the chosen indicators and the evidence for those indicators. According to the same author, a well-developed performance framework allows one to tell a convincing story, backed by credible evidence, about the value added by the program to some particular segment of society (Schacter, 2002b).
Moreover, Schacter (2002b) suggests performance measures derive their meaning from high-level outcomes. For example, when a policy has several high-level outcomes, some of which may be in opposing directions, how is performance measurement possible? Finally, Schacter (2002a) argues that clarity will be the touchstone, and that it will be up to the performance-measurement framework to force some clarity in relation to high-level outcomes.
In addition, as noted by CCAF Canada (2002), selecting the areas or aspects of performance on which reporting will focus is, in fact, a judgment. What constitutes an appropriate focus for reporting will depend on circumstances and on the perceptions and values of key stakeholders, as well as on the level of the reporting unit and the view of management. Both the TBS (2001) and CCAF Canada (2001) agreed that performance reporting should not be considered in isolation, but that it is best considered in the wider context of governance, management and comptrollership. As pointed out by CCAF Canada (2001), getting these factors right is a critical ingredient in the successful establishment of a performance measurement regime. Thus, the exercise of judgment and allocation of attribution requires reflection on an organization's environment. Hence, the organization has to be outward-looking.
The literature highlights four organizational implications of performance measurement. First, if a true performance measurement regime is established, it implies the organization has a focus on performance and outcomes rather than on process or outputs. Second, there is a willingness to be evaluated at both the organizational and personal level. Third, there is a focus on continuous improvement so that performance measurement is linked to the development and adjustment of new programs and resource allocation. Fourth, there is greater transparency, and accountability to both internal and external stakeholders.

Outcomes Accountability
According to Bird et al. (2005) and Mayne (2006), the literature highlights an important shift in the notion of "accountability." Mayne suggests that, in the past, accountability referred largely to the processes followed, the inputs used, and the outputs produced in the public service domain. This focus was consistent with the more traditional view of accountability, emphasizing what could be controlled and assigning corrective action when things went wrong. If the expected process was not followed, improper inputs were used or outputs were not delivered, then responsibility could be placed with the appropriate individual, and appropriate action taken. Mayne (2004) argues that under the traditional paradigm, there is a reticence within government to accept accountability for results beyond outputs, that is, for outcomes over which one does not have full control. In other words, within government, accountability for outputs has been much more widely practiced in the past than accountability for outcomes. Under this paradigm, establishing the links between activities and outputs (i.e. attribution) is not a significant issue, especially when it can clearly be shown that the program produced the outputs. However, as further noted by Bird et al. (2005), establishing the links between activities and outcomes (i.e. attribution of the program to outcomes realized) is a much more significant task.
Other researchers (Bartik, 2003; Bolton, 2003; Hatry, 2004) suggest that accountability for results or outcomes asks whether everything possible has been done with the available authorities and resources to achieve the intended results, and whether lessons have been learned from past experience about what works and what does not. Accounting for results of this kind means demonstrating that one's actions and efforts have made a difference and have contributed to the results achieved. As argued by Hatry (2004), finding credible ways to demonstrate this is essential if the move toward managing for results is to succeed.

Measurement and Its Limitations
As pointed out by Canada's Auditor General (Office of the Auditor General, 1997, 1998, 2000, 2003), there remains a constant need to rethink what measurement can usefully mean. Even with a carefully designed evaluation study, definitively determining the extent to which a program contributes to a particular outcome is usually not possible. In fact, measurement in the public sector is becoming less about precision and more about increasing our understanding and knowledge about what works, thereby reducing the uncertainty about program impacts. This view of measurement implies a requirement to gather additional data and information that will increase our understanding about a program and its impacts, even if we cannot "prove" things in an absolute sense. However, it might allow us to provide a reasonable estimate of the magnitude of the impact. Perhaps more importantly, this view recognizes that softer, qualitative measurement tools should be included within the concept of measurement in the public sector. Since there is the necessity to be realistic about program outcomes, there is also a need to acknowledge other factors at play that may influence these outcomes. Moreover, Mayne (2004; 2006) contends that a more honest and credible approach is to acknowledge that these influences exist, rather than pretending otherwise. When we acknowledge that other factors are at play, it is not immediately clear what effect the program has had, or is having, in producing the outcome in question.
Increasingly, there is recognition that such measurement has its limitations, perhaps implying that we should accept some uncertainty about the unavailability of performance measures in some cases. When it is absolutely necessary to have a high degree of certainty regarding a program's contribution, it becomes even more crucial to ensure rigor within the evaluation measurement process. For example, in recent years, the Office of the Auditor General (OAG) has cited several government programs for inadequate information linking expenditures to outcomes. Mayne and Zapico-Goni's (1997) examination of the literature indicates performance measurement is being utilized increasingly to measure program performance. Generally speaking, program evaluations are designed to assess both intended and unintended impacts of a program based on valid and reliable data collection and analysis. On the other hand, performance measurement is characterized as the ongoing measurement of a program's execution by applying indicators to track performance. Increasingly, organizations are endeavouring to measure or track the subsequent impacts of these programs, policies or initiatives at either the intermediate or final outcomes.

Evaluation versus Performance Measurement
In the absence of a well-designed evaluation study, what can be done to get a measure of attribution of the program? Canada's Auditor General (Office of the Auditor General, 2000) suggests it is possible to structure performance measurement systems to directly acquire attribution measures. For example, in the case of a "normal" or typical performance measurement or monitoring system, the AG recommends the utilization of "contribution analysis" to get a handle on attribution issues. In addition, the AG recommends measuring the impacts on program recipients against the changes occurring to non-recipients. Thus, such approaches become de facto evaluations. However, while such an approach is possible, it requires a carefully constructed and often expensive measurement strategy that is not usually associated with most performance measurement approaches.
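To make the recipient versus non-recipient comparison concrete, the sketch below computes a simple difference-in-differences estimate. This is an assumption on our part: the sources cited above recommend comparing impacts on recipients against changes among non-recipients, but do not prescribe this particular calculation. The function names and all figures are invented for illustration, and the estimate is only meaningful if the two groups would otherwise have followed similar trends.

```python
from statistics import mean


def mean_change(before, after):
    """Average outcome after the program period minus the average before it."""
    return mean(after) - mean(before)


def difference_in_differences(recip_before, recip_after, non_before, non_after):
    """Change among recipients minus change among non-recipients.

    The non-recipient change stands in for what would have happened
    without the program (an assumption, not a guarantee).
    """
    return mean_change(recip_before, recip_after) - mean_change(non_before, non_after)


# Invented outcome scores (e.g. an employment index) for illustration only.
recipients_before = [50, 52, 48]
recipients_after = [60, 62, 58]
non_recipients_before = [50, 51, 49]
non_recipients_after = [54, 55, 53]

estimate = difference_in_differences(
    recipients_before, recipients_after,
    non_recipients_before, non_recipients_after,
)

# Recipients improved by 10 points and non-recipients by 4,
# so 6 points are tentatively associated with the program.
print(estimate)
```

Even in this toy form, the calculation shows why such a strategy is demanding: it needs outcome data for a comparable non-recipient group both before and after the program, which most routine monitoring systems do not collect.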

Attribution
The literature is clear on the concept of program "effectiveness." Government programs are designed to produce certain "intended outcomes" such as: a healthier public, better living conditions, healthier communities, more jobs, and so on. Effective programs are those that can demonstrate these results. In other words, they contribute to the public view of value for money expended. However, in the quest to measure program performance, we face two challenges: first, measuring whether or not these outcomes are actually occurring; and second, determining what contribution the specific program has made to the outcome. The second is perhaps the more difficult question in that it attempts to determine how much of the success (or failure) can be attributed to the program.
Despite the difficulties associated with attribution measurement, both Wholey and Hatry (1992) and the TBS (2001a) stress that attribution cannot be ignored when trying to assess program performance. In fact, when little can be said about the worth of the program, how can advice be provided regarding future program directions? As well, the AG urges performance measurement to take into consideration the possibility that observed changes in outcomes would have occurred, perhaps at a lower level or at some future date, even without the program or policy. Accordingly, this supports the notion of other factors at play in addition to the impact of the program's activities (other government programs or actions, economic factors, social trends, etc.), all of which can have an effect on outcomes. Hence, this measurement problem needs to be properly addressed to support the notion of results accountability.
According to Wholey and Hatry (1992), program evaluation is one measurement discipline that endeavours to answer the attribution question. The more traditional approach is to utilize a controlled comparison to estimate what would happen with the program in place versus what would happen without the program. Because social science methodology has been designed to address the issue of attribution, an evaluation study probably remains the best way to address it.
As previously noted, current thinking acknowledges the difficulty in the public sector of measuring outcomes and establishing links to program activities in a cost-effective manner. An additional and related problem is the need to deal with accountability, that is, the need to visibly demonstrate that programs have made a difference, and that the actions and efforts of program activities have contributed toward the results achieved.
Furthermore, although evaluations and performance measurement studies frequently measure whether or not outcomes are occurring, the more difficult question is determining program contribution to program outcome. How much success or failure can be attributed to the program? What contribution did the program make? What influence did it have? A key challenge in performance measurement is attribution, or determining what contribution a program has made to a specific outcome. Mayne (2006) contends the more difficult question is usually determining how much the specific program in question has contributed to the outcome. Even with carefully designed evaluation studies, as pointed out by the AG on many occasions, determining the extent to which government programs contribute to particular outcomes is usually not possible. The AG suggests undertaking a "Contribution Analysis" approach, using a specified number of steps to address attribution through performance measurement.

Addressing Attribution: Contribution Analysis
The literature emphasizes that what is needed for understanding and reporting is a specific analysis to provide information on the contribution of a program toward the outcomes it is trying to influence. The literature is unmistakably clear on the subject of simply measuring and reporting performance based on performance measurement systems, without any discussion or analysis of other factors at play: this kind of performance measurement information is thought to have little credibility. Moreover, the literature urges managers to be realistic about the outcomes they are trying to influence, especially if they want to gain insight into whether and how program activities are making a difference. Recognizing other factors at play, while still believing the program is making a contribution, is a critical first step.
At the end of the day it can be stated that contribution analysis attempts to explore, and perhaps demonstrate, "plausible association." This thought is echoed by M. Hendricks, as cited in Mayne (2001), who noted "…plausible association is whether a reasonable person, knowing what has occurred in the program and knowing the intended outcomes actually occurred, agrees that the program contributed to those outcomes" (p. 8).

Conclusion
Gaining an in-depth understanding of the literature related to best practices in impact measurement is the first step in building a credible methodology to measure program impacts. In addition, utilizing both contribution analysis and other appropriate techniques and approaches to add rigor increases evidence validity.
Furthermore, as noted by both the TBS and AG, there is a need to explore issues in systematic ways, and when reporting results it is "the totality of the evidence gathered - some of it strong, some perhaps rather weak - that builds a credible performance story" (Mayne, 2004, pp. 49-50) and increases the knowledge regarding program contribution. As well, AG research (Office of the Auditor General, 2003) postulates that in most cases we tend to measure with the aim of reducing uncertainty, rather than proving the level of contribution.
Moreover, if an alternative methodology to measure program impacts, such as building a credible performance story of attribution using all available evidence, has been explored and if there are gaps in the story, the measurement methodology should recognize this. As suggested by Mayne (2004; 2006), theory-driven performance measurement, such as contribution analysis, would enable a better understanding of just how programs are working and would support the notion of improved reporting of past performance as well as future performance. Thus, developing an alternative methodology would entail the following:
• providing a well-articulated presentation of the context of the program and its general aims, along with the strategies it is using to achieve those ends;
• presenting a plausible program theory leading to the overall aims (the logic of the program has not been disproved and the underlying assumptions appear to remain valid);
• describing the activities and outputs produced by the program;
• highlighting the results of the contribution analysis indicating there is an association between what the program has done and the outcomes observed; and
• developing reasonable explanations for outcome(s) that take into consideration external factors, or clearly demonstrating any influence such factors have had on the outcome(s) in question.
Finally, the literature highlights the recognition that measurement is becoming less about precision and more about increasing the overall understanding of program contribution in comparison to intended outcomes. Also highlighted in the literature is the need to consider the broader array of factors at play that could contribute additional data and information. Lastly, the literature points to the need to keep an open mind when developing alternative methodologies, since such an approach will provide a more credible demonstration of program impacts.