Government Oversight and Evaluability Assessment |
It Is Always More Expensive When the Carpenter Types |
||
Joe N. Nay, Performance Development Institute; Peg Kay, Institute for Computer Sciences and Technology / National Bureau of Standards |
||
Lexington Books, D.C. Heath and Company, Lexington, Massachusetts; Toronto |
||
Library of Congress Cataloging in Publication Data
Library of Congress Catalog Card Number: 81-47750
Nay, Joe N
Government oversight and evaluability
assessment.
Includes index.
1. Evaluation research (Social action
programs)--United States.
I. Kay, Peg. II. Title.
H62.5.U5N39 361'.973 81-47750
ISBN 0-669-04833-X AACR2
Copyright © 1982 by D.C. Heath and Company
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage or retrieval system, without permission in writing from the publisher.
Published simultaneously in Canada
Printed in the United States of America
International Standard Book Number: 0-669-04833-X
Contents
Chapter 1 Evaluation in Perspective
Relating Evaluation to Purposeful Management Behavior:
Exploring Organizational Boundaries
Evaluation
Direct Intervention
Those In Charge
What, then, Is an Evaluator to Do?
Chapter 2 Expanding upon Evaluation as a Part of Purposeful Behavior
Operation of a Simple, Mechanistic, Feedback System
Why is Life like a Furnace
Part II Describing the Universe: Models and Measurement
Representing Complex Social Systems and Organizations
Finding the Direct Intervention: Where Functional Models Begin
Drawing an Example from the Home Heating System
Why Make a Model of the Direct Intervention?
How High Is this Table from the Floor?: Steps in Measurement
Some Kinds of Error
Part III The Domain of Those in Charge
Chapter 5 Those in Charge of Government Agencies
The Planners' Dream of the Government Organization
The Government Organization during Waking Hours
Multiple Levels of Organization
Multiple Levels of Semantics
Chapter 6 What is Acceptable to Those in Charge
Why is Acceptability a Problem?
Factors Influencing Acceptability
Analytic Integrity and Political Efficacy
Chapter 7 Some Clues on Exploring the Those-In-Charge Domain
The Treasure Hunt
Chapter 8 Purposeful Behavior Revisited: First Find out What the Purpose Really Is
Where Organizational Incentives Come From
Effects of Organizational Incentives on the Individual
What to Look For
Why Appropriate Measures and Comparisons May Meet either Acceptance or Resistance
Part IV Evaluability Assessment or We Said all That to Say This
Chapter 9 Evaluability Assessment: An Overview
Testable and Equivalency Information
Comparisons and Reconciliations within and between Families of Models
Sequential Purchase of Information
Two Ways to Foul-Up an Attempt at Oversight
Big Organizations Make It All Harder: Hierarchies and Discontinuities
Synopsis
Chapter 10 Testable Information
The Testable Logic Model
The Testable Functional Model
Discussion and Validation
Warning
Chapter 11 Equivalency Information
The Preliminaries
Building the Equivalency Models
Alarums and Excursions
Level of Detail
Convergence
The Hybrid Testable-Equivalency Model
Chapter 12 When is a Program Evaluable?
Evaluable Programs
Comparisons and Test of Evaluability
Summary
Chapter 13 The Sequential Purchase of Information
Some Examples of Why Costs and Effort Vary
Acceptance of Sequential Purchase
Chapter 14 Sources of Information for EA Models
Sources of Information and the Related Models
Recapitulation and Coda
Appendix The Creation and Development of EA
Figures
| 1-1 | Evaluation in Perspective |
| 1-2 | A Paradigm of Purposeful Behavior with 3 Domains |
| 1-3 | A Simple Paradigm |
| 1-4 | Laying Out and Measuring a Direct Intervention |
| 1-5 | Those in Charge |
| 2-1 | A Home Heating System |
| 2-2 | A Home Furnace with Policy Management Included |
| 2-3 | A Social Program |
| 3-1 | The Mystery Model |
| 3-2 | A Rudimentary Model |
| 3-3 | The Furnace Company and the House as Black Boxes |
| 3-4 | Logic Model of the Office Personnel |
| 3-5 | Two Models of a Home Heating System |
| 3-6 | Functional Model Arranged to Trace Energy Flow (Control Circuits Omitted) |
| 3-7 | Five Levels of Models of a Heating System |
| 3-8 | Garbage Transfer |
| 3-9 | Mystery Model of a School System |
| 3-10 | Functional Model Arranged to Trace Knowledge Transfer (Teaching and Technique Knowledge Omitted) |
| 3-11 | Knowledge Transfer Model |
| 3-12 | Three Types of Models: Logic, Measurement and Functional |
| 5-1 | The Planners' Dream of a Governmental Organization |
| 5-2 | Levels in Organization |
| 6-1 | Examples of Some Possible Measures and Comparisons |
| 6-2 | How to Evaluate a Furnace |
| 6-3 | Factors Affecting the Acceptability of Measures and Comparisons |
| 8-1 | The Individual Comparing the Outcome of Action with the Local Market Standard |
| 8-2 | The Individual Comparing the Outcome of Action with Local and Exogenous Market Standards and Rank Ordering the Importance of the Markets |
| 9-1 | Basic Distinctions Used in Evaluability Assessment |
| 9-2 | Testable Functional Model of Child-Health Projects |
| 9-3 | Equivalency Functional Model of Child-Health Project I |
| 9-4 | Equivalency Functional Model of Child-Health Project II |
| 10-1 | One Overall Logic Model for Universal, Free Education |
| 10-2 | Superintendent's Logic Model for CAI Program |
| 10-3 | Assistant Superintendent's Logic Model for CAI Program |
| 10-4 | Curriculum Coordinator's Logic Model for CAI Program |
| 10-5 | Director of Evaluation's Logic Model for CAI Program |
| 10-6 | Principal's Logic Model for CAI Program |
| 10-7 | Example of a Combined-Logic Model for CAI Program |
| 10-8 | Laying Out a Direct Intervention |
| 10-9 | Black-Box Testable Functional Model of CAI Program |
| 10-10 | Some Elements of a Testable Functional Model of the CAI Direct-Intervention and Comparison Groups |
| 10-11 | Additional Detail on the Instruction Testable Functional Model |
| 10-12 | Some Suggested Measurements Based on Testable Functional Model of CAI Program |
| 11-1 | Black-Box Equivalency Functional Model of Actual Program |
| 11-2 | Some Elements of an Equivalency Functional Model of the Textbook-Demonstration-Program Direct Intervention |
| 11-3 | Additional Detail on the Textbook-Demonstration Equivalency Functional Model |
| 11-4 | A Hybrid Testable Equivalency Model of the Textbook Demonstration |
| 12-1 | What is Evaluable? Two Kinds of Comparisons |
| 12-2 | A Basic Format for Laying Out Evaluable Questions and the Means for Answering Them |
| 12-3 | Some Elements of a Hybrid Functional Model of the Textbook Demonstration |
| 12-4 | Model, Measures, and Issues Related |
| 12-5 | Some Evaluable Questions for a Child-Health Project |
| 12-6 | Logic Underlying Section 111(jj) and 1133(d)(4) of the 1977 Clean Air Act Amendments |
| 12-7 | Hybrid Functional Model of Homer City Plants and the Direct Intervention: An Innovation in Coal Cleaning |
| 12-8 | Sample Elements of a Model with Multiple Domains: The Louisville Emission-Offset Bank |
| 12-9 | Evaluable Model of an Offset Bank |
| 13-1 | A Paradigm for Sequential Purchase |
| 13-2 | Simplest Hazardous-Waste Flow |
| 14-1 | Gathering Material for Models |
| 14-2 | Sources of Operational-Activity Information and Related Equivalency Models |
| 14-3 | Each Organization Level Is a Potential Source for Testable Descriptions as Well |
| 14-4 | Much of the Rhetoric is Written |
| 14-5 | Deriving the Testable Model |
| 14-6 | Different Types of Information that May Be Considered, Collected, or Created during EA |
Tables
| 4-1 | Steps in Obtaining a Measurement |
| 5-1 | Levels of Semantics in the Dog Program |
| 5-2 | Levels of Semantics in Mental Health and Schools |
| 12-1 | What Programs are Evaluable? |
| 13-1 | Examples of Universes, Domains, Flows, and Measurement |
| EA1 | Title Page, Publisher Information, Table of Contents, List of Figures and Tables, Preface and Acknowledgments, Introduction, and Part I through Part III Chapter 5. |
| EA2 | Part 3 Chapter 6 through Part 4 Chapter 11 |
| EA3 | Part 4 Chapter 12 through Chapter 15, Appendix, and Index |
| EA-ALL | The entire book |
Preface and Acknowledgments
Several groups of analysts of government have long known a set of secrets that allows them to solve major problems of oversight and government operations when nearly everyone else is getting the wrong answers. The use of these secrets multiplies the reach of analysts of diverse background and helps pull together the teams of diverse specialists often needed to attack a particular problem. The secrets also provide a powerful set of questions against which to critique a study, evaluation, or policy analysis.
In this book we have tried to write down some problems that recur again and again in large organizations during day-to-day operations, including reorganization, reform, oversight, and evaluation. The approaches for surfacing these problems and dealing with them are the secrets. For a long time these secrets were viewed as quite dangerous by many people because they often quickly uncovered wide disparities between what everyone said was happening and what was really going on. As more people have turned to examining what the government can and cannot do and how the things that must be done are to be accomplished, the secrets have become more acceptable.
This book reveals the secrets of distinguishing more sharply between the necessary rhetorics of government and its realities. It tells how to compare the two, attempts to close the gap, and gives an introduction to using a special kind of model for representation. It also illustrates the importance of sequential purchase of information in dealing with large organizational systems. These and several associated secrets about how large organizations work are now revealed to you because, after all, you have bought the book.
We would like to acknowledge a large invisible college of colleagues, teachers, and friends in and out of government who, over the last ten years, tested ideas, contributed concepts, made applications of the material, and returned to tell us where our guidance had helped and where it had steered them wrong. A short history of the early development is given in the appendix, but to name all of the participants would be impossible.
We acknowledge more broadly all of those people in government who, when faced with nonsense or bewildering complexity, stop and, with or without this book, ask, "Is it possible to find out whether all of this activity is accomplishing its purpose; can it be evaluated?" They are natural evaluability assessors and are providing a wide base of acceptance for this work. We hope that this material will help them to continue to improve government.
The late Don Weidman (first with The Urban Institute's Program Evaluation Group and then with the Office of Management and Budget) not only contributed to the development of this work but also coined the term evaluable. The world is a little poorer without his work, wry wit, and accurate criticism.
Mary Sarley has been Joe Nay's secretary as well as general overseer for this book for ten years, no easy task. More than that, however, she has been a colleague and friend who inexplicably refrained from braining both of us during successive revisions of this material over those years. She alone knows how many versions there have been.
The Urban Institute and the Ford Foundation funded some of the early developments of these ideas, even though some of the work was often threatening to cherished notions of how government should be run. Once, when the book was nearly abandoned, the Russell Sage Foundation called and asked us to apply for a grant to finish it. We never did apply for the grant, but thinking about it got us started again; for that, we thank Russell Sage. The National Rural Center provided some money for typing and revision of the draft preceding this one. The Performance Development Institute allowed this version to be typed as the final revisions were made.
The Experimental Technology Incentive Program (ETIP) Regulatory Processes and Effects Project at the Department of Commerce, a team effort over the last four years, has allowed us to correct many pieces of the theory that we thought we already understood. What we learned and accomplished there in regulation is the subject for some later book, but the book in your hand is much clearer and more accurate because of our experience in stretching oversight and assessment ideas from a program context to fit into regulatory-agency activities.
Joe Alan and Kenneth and Eric Nay proofread the last revisions. Warren Frederick of the Performance Development Institute provided invaluable conceptual help and examples of several concepts that could be fitted into this book. The format illustrations are due to his work. He is the only technical person (besides the authors) to have read this revision through from beginning to end. Any technical problems that have been missed should therefore be taken up directly with him. Robert Kershaw and his group at the General Accounting Office have made both theoretical and practical contributions to this work.
As the book took on its final form, our colleagues at the Performance Development Institute and at the Institute for Computer Sciences and Technology added invaluable refinements and insights, as did several evaluability assessors around town, in particular Barry Rosenthal and Jim Statman of Aurora Associates and Marco Fiorello and Peter Eirich of Fiorello, Shaw Associates.
Finally, during the last revisions of chapters 9-15, we were provided with a sign over our basement work table by Eric Nay. It is from Soren Kierkegaard and reads: "All essential knowledge relates to existence, or only such knowledge as has an essential relationship to existence is essential knowledge." Not bad advice for people searching for the actual locations of the direct interventions of government and for what is actually happening at those locations.
Eli Kay provided another prescient sign, to wit: "To err is human, but to really foul up, you need a computer." Not bad advice for people searching for eternal truth.
Introduction
Within hours of the announcement that the Reagan administration had appointed David Stockman to the post of director of the Office of Management and Budget (OMB), the gentle hum of copy machines and the clatter of collators was heard throughout the land, or at least throughout as much of the land as mattered to those affected first, that is, Washington, D.C. By the following morning, copies of a five-year-old Stockman article were on the desks of every senior bureaucrat and every important (and some not-so-important) consultant in the capital area.[1] How this spontaneous generation of paper occurred remains a mystery to the authors, but occur it did, and the news it brought was not good: namely, that this was no above-the-fray president talking; this was the man who had his hands on the money. Worse yet, he had been preparing a federal-spending hit list for at least five years. Hundreds of people moved to hundreds of phones and began to dial familiar numbers in the Congress; they paused as they recalled that their respective legislators had not returned to the Ninety-Seventh Congress. The iron triangles (tight coalitions of congressional committees, interest groups, and agency program directors drawn together to provide and maintain funding for particular programs) had been blasted asunder by the electorate.
Apart from its stunning effect on the bureaucracy and assorted feeders at the public trough, the Stockman article held some intrinsic interest. His major theme is illustrated by the following excerpt:
Having fed on a "starved public sector" rhetorical diet, the dominant liberal forces in Congress are simply unwilling or unable to recognize that this perhaps once appropriate complaint has...ceased to reflect reality, and that a major reordering of...spending priorities...has now become imperative.[2]
Stockman used several examples to illustrate the theme. The over-success of the Hill-Burton Act in closing the nation's hospital gap was cited, as Stockman pointed out that a present excessive supply of beds combined with insurance-system incentives for in-hospital treatment had created a costly over-hospitalization of the U.S. public; that is, Hill-Burton had created excess hospital beds, and excess hospital beds had created their own excessive demand.[3]
Stockman went on to say that the major programs of the Great Society, contrary to their creators' intents, ultimately wound up subsidizing the haves rather than the have-nots. As examples, he mentioned several education initiatives and the once controversial community-action program that, he claimed, has been transmuted into innocuous social-spending pumps such as Head Start, Emergency Food and Medical Services, and Upward Bound.
This fact of transmutation of government programs brings us, albeit circuitously, to the point of this book. The dominant liberal forces in Congress certainly did not intend to subsidize the haves nor did they intend to bring the nation to the edge of bankruptcy in order to subsidize the have-nots. Whatever the intents, however, many observers believe that those results have nearly occurred. Whatever the intents of the newly dominant conservative forces, there is every reason to believe that, as Congress and the bureaucracy now operate, their intentions too will undergo similar transmutations.
Controlling the politically motivated transmutation of program intent depends, for the most part, on the self-discipline of the politicians in power (and those who do analysis for them), although procedural control mechanisms are available if the politicians choose to use them.[4]
For the most part, this book is not concerned with the deliberate transmutation of programs as they occur in Congress (although we take that up briefly in Part III and chapter 14). We do, however, investigate political motivations and program rhetoric, not from the aspects of rightness or wrongness but from the aspect of their relationship with program operation. Thus, setting aside for the moment the question of whether an emergency food and medical services program is an appropriate community-action-program enterprise, one of our major aims is to revive a long-dormant interest for any government operation in whether the intended intervention is being made at all, in the way that it was supposed to be carried out, and at a cost commensurate with its worth.
We found Stockman's community-action-program example particularly intriguing because of one of the authors' experiences with a small program agency in the South. A contractor team was investigating the operations of the local government agencies in a county of about 25,000 people. Of these, about 400 had been identified as being poor and living in substandard housing that was in need of weatherization. Conservatively figuring four people per substandard household, the number of housing units in question was, at maximum, 100. At the time the contractor team entered the county, the community-action agency was requesting additional funds from the three organizations (city, county, state) that had been jointly funding the weatherization program. The agency previously had hired a CETA-trained weatherizer and had secured sufficient money to finish the project if the weatherizer completed two houses per day, a projected completion rate based on experiences in nearby counties. Unfortunately, the weatherizer was completing only half a house a day; clearly, they needed funds for three more weatherizers.
When we explored the nature of the weatherizing operation, we discovered that the combined regulations of the three funding agencies compelled the weatherizer to (1) buy only enough material at one time to weatherize one house, (2) enter each purchase transaction on three separate and slightly different forms, and (3) stop in at headquarters to drop off each set of forms and pick up vouchers for the next batch of materials purchased. This to-and-fro traveling ate up a large chunk of the weatherizer's time (not to mention gasoline). Added to that was the time he spent filling out the required forms, a very long process since the man was barely literate and understandably made quite a lot of mistakes.[5] In fact, the traveling and the clerking together took up almost exactly three-quarters of his time. All of which illustrates a fundamental principle of operation: It's always more expensive when the carpenter types.
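The figures in this example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the assumption that the weatherizer would reach the projected two houses per day if all of his time went to carpentry is not stated in the account above, although the quoted numbers happen to agree with it.

```python
from fractions import Fraction

# Figures quoted in the account above
poor_residents = 400
people_per_household = 4
projected_rate = Fraction(2)      # houses per day assumed in the funding plan
observed_rate = Fraction(1, 2)    # houses per day actually completed
overhead_share = Fraction(3, 4)   # share of time spent traveling and clerking

# Upper bound on the number of houses needing weatherization
housing_units = poor_residents // people_per_household

# Assumption (ours): with no overhead, the weatherizer would work at the projected rate.
# The rate left over after overhead is then the projected rate times the productive share.
implied_rate = projected_rate * (1 - overhead_share)

# Weatherizers needed to reach the projected rate at the observed pace
weatherizers_needed = projected_rate / observed_rate
additional_weatherizers = weatherizers_needed - 1

# Working days to finish all units at each pace, with a single weatherizer
days_at_projected_rate = housing_units / projected_rate
days_at_observed_rate = housing_units / observed_rate

print(housing_units, implied_rate, additional_weatherizers)
print(days_at_projected_rate, days_at_observed_rate)
```

Read this way, the agency's request for three more weatherizers and the three-quarters of the day lost to forms and travel are two descriptions of the same shortfall.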
In regulation and defense, as well as in the social programs of this country, the same stultifying multiplicity of requirements is also occurring. It occurs because of different governmental requirements, but the effect can be the same on industry as on the carpenter. As regards regulation, for instance, the majority agrees that some regulation is necessary in many areas. Striking a proper balance of operation without having political intent transmute into undesired results remains as important and difficult as ever.
The evaluability-assessment procedures described in this book are based on the conviction that for government to operate sure-footedly, it must know what the problem being attacked is, how that problem can be handled, and how the suggested program operates in practice, not what someone thinks the problem is, what someone claims will cure it, or how someone thinks the program is operating or intends it to operate. If the carpenters are spending most of their time typing, it is a sure bet that the houses will not get weatherized, and it is silly to spend money to discover that the people in them are still cold.
Enough experience has been gained to see that evaluability assessment will work with nearly any belief system or governmental operation. Indeed, the examples offered in the book range from social-service delivery to military helicopter sorties to garbage collection. In all cases, the principle is the same: namely, look at the intervention actually being made by the government. Vague rhetoric is usually a poor guide to policy formulation. The rhetoric of both conservatives and liberals often tends to differ from the reality of government operations in practice. Often, though, it is the rhetoric, rather than the reality, on which important belief structures are based. Money is spent, administrative systems are developed, and government operations are evaluated as if the rhetoric were the reality.
Distinguishing between belief structures and actuality and then making comparisons between them is essential to successful oversight and evaluation of government operations. In this book we present the evaluability-assessment method for doing so. Over the years, however, we have noticed that the federal government (like most large bureaucracies) has certain characteristics that may stymie even the most assiduous evaluability assessors as they attempt to draw the distinctions and make the comparisons. We therefore spend considerable time discussing those characteristics and offering some advice on coping with them.
This book is not tied to a given ideology. Liberals, conservatives, libertarians, and vegetarians, for that matter, can use the concepts and procedures to good effect. People who believe they have an intervention that solves a thorny problem can use the techniques to demonstrate the rightness of their approach and to help carry out the solution (assuming that the solution is not demonstrably harebrained). People who believe they have discovered a program that is a ticking time bomb should, equally, be able to use the approach to demonstrate its dangers and to help dismantle it before it explodes (again assuming some plausibility to their beliefs).
We have tried to cover some of the problems that recurrently get in the way of conducting good oversight or evaluation. We begin, in Part I, with some underlying ideas, address the representation of reality in Part II, and examine the people in charge of government operations in Part III. All of this is given as background for a presentation of the four cornerstones of evaluability assessment in Part IV:
1. The construction of two families of models, that is, testable models based on information derived from descriptions and equivalency models based on information derived from observation;
2. Comparisons and reconciliations within and between the two families to produce an evaluable model, often the basis for immediate action and always the basis of evaluation design;
3. The construction and use of functional models (see chapters 3 and 12) to display the relevant structure and flows of both the described and observed activities of interest;
4. A phased approach to the entire investigation that permits those in charge to make sequential purchases of information.
Earlier drafts of this book have been used (we hope to good effect) by a number of people. Drafts have been used as textbooks for students in masters of public administration programs; federal program managers have distributed them to contractors and bidders; government officials themselves have used them as primers; and congressional staff have read them in an effort to improve their oversight approaches. To all of those people, we apologize for any errors, either in concept or semantics, that appeared in the drafts. We think we have become smarter (or at least, more experienced) over the years.
Notes
1. David Stockman, "The Social Pork Barrel," Public Interest 39 (Spring 1975): 3-30.
2. Stockman, "Social Pork Barrel," p. 12.
3. This is, of course, a public-sector version of Say's Law, the supply-sider's dictum: Supply creates its own demand.
4. An evaluability-assessment procedure for Capitol Hill was described in "Finding Out How Programs Are Working: Suggestions for Congressional Oversight" (Report to the Congress by the Comptroller General of the United States, U.S. General Accounting Office, PAD-78-3, 22 November 1977).
5. The CETA program that trained him could hardly be faulted since it was supposed to produce rough carpenters, not office workers.
Part I Some Underlying Ideas |
The material in this book is based on three underlying concepts:
Organizations are systems that exhibit purposeful behavior.
The existence and operation of information feedback loops and information comparisons motivate the purposeful behavior of organizations.
The stated rhetorical positions of management are often quite different from the activities that comprise the actual delivery of a specific governmental service or intervention.
Implicit in these concepts are the assumptions that evaluation activities are (or should be) (1) part of the organizational system, (2) an important (but not the only) information feedback loop, and (3) concerned with providing information about the actual delivery (as well as the rhetorical descriptions) of a governmental service or intervention.
Looked at in that way, it is clear that a book about doing evaluation cannot be a text about statistical techniques, a treatise on mathematical modeling, or a handbook on experimental and quasi-experimental designs. If it is to serve its function, a book about the evaluation of governmental programs should be about the operation of governmental organizations in general, with specific emphasis on how evaluation activities can most usefully relate to the rest of the system.
In order to understand the relationship of evaluation to the system in which it is embedded, the reader should have at least some familiarity with a few basic concepts about how organizations work. Therefore, the two chapters in this first part describe a simplified (or oversimplified) organizational context. Chapter 1 develops our definition of evaluation through an overview of the three domains within a government organization: (1) evaluation, (2) direct intervention, and (3) those in charge. Chapter 2 describes purposeful behavior and how it is affected by the operation of a single feedback loop. A simple, common system, a home heating system, is used to illustrate the points.
Any large organization is, of course, much more complicated than this. The root concepts developed in these two early chapters are expanded throughout the book to the stage that they can be used in actual organizations to solve real problems.
Chapter 1 Evaluation in Perspective |
Over the past several decades, evaluation has assumed an increasingly visible role in the operation of large government programs. Offices of planning and evaluation are part of the standing organizations of federal, state, and many local governmental units. Congress, in funding new programs, often designates an almost pro forma 1 percent for evaluation. Yet for all the attention paid to evaluation, no universal consensus exists as to what it is and what it should do: whether it is a tool for deciding among alternative methods of delivering a given service or performing a given task, an offshoot of financial accountability, a way of rating programs long dead, or even the only way to spend 1 percent of a program budget.
Attempts at defining evaluation range from a few simple statements to entire books. Joseph Wholey et al., in Federal Evaluation Policy, provided one of the more-concise descriptions. Program evaluation, they said:[1]
Assesses the effectiveness of an ongoing program in achieving its objectives,
Relies on the principles of research design to distinguish a program's effects from those of other forces working in a situation,
Aims at program improvement through a modification of current operations.
In other words, evaluation is a methodological approach to improve the quality of information about a program and to structure the information so that decision makers can use it while the program is still in operation. In this view, evaluation is part of purposeful management behavior.
This book is based on this functional definition. The intended relationship that occurs when evaluation is part of purposeful management behavior is shown in figure 1-1.
Relating Evaluation to Purposeful Management Behavior
The bottom of figure 1-1 shows a process being carried out in its environment. A number of people are engaging in an activity or set of activities for the purpose of accomplishing a concrete objective: for example, to perform an appendectomy, to climb Mt. Everest, to win a ball game, or to distribute food stamps.

In order to direct purposefully the day-to-day operation of the process, the person immediately in charge requires some specific information, or measurements, based on the continuing performance of the process. For instance, if the process is a series of baseball games and the objective is to win them, the coach would want measured information including elements such as earned run averages, runs scored against the team, games won and lost, the team record in games played on natural versus artificial turf, and the record against various opponents. Gathering, analyzing, and reporting this information for the purpose of day-to-day program direction is one function of evaluation, as shown by the internal loop in figure 1-1.
In addition to evaluation activity, nearly every real process is the focus of some additional activity that has been created by management. A baseball season, for instance, requires that players be assembled, uniforms bought, games scheduled, a ball park obtained, and so on. In most large operations, management is not directly involved in any of these real processes but, rather, oversees the activities from a distance. The more distant management is, the more often it bases its own activities and decisions, not on a contextual familiarity with a process but on reports, gut feelings, and preconceived notions of what is going on out there. Together, this agglomeration of information and mental filtering forms an abstract model of the process. In most cases, management has built and amended its model over time by trial and error.
The equivalence of management's model to reality depends on many factors including the ability of the managers to accept reality, the reliability of reports, and of course, the size and complexity of the process, the complexity of the organization involved, and the number of people operating and managing the process. Even in small processes, management will always deal with some abstraction rather than with total reality since reality is so rich (and in part so irrelevant) as to disable decisions about it.
Another function of evaluation, then, is gathering, analyzing, and reporting information to management. This enables the managers to refine their models so that they are more nearly equivalent to reality and thus, presumably, a better basis for making management decisions. This process is shown by the outer loop in figure 1-1. Much of this book is devoted to describing this function of evaluation.
If figure 1-1 is divided into three domains (as in figure 1-2), the definition of evaluation can be more easily understood. One domain now contains the activities that are taking place in the process that is being managed. Another domain contains the activities of the people who are attempting to manage the process. The third domain contains the evaluation activity.

It is not hard to imagine, on an organizational level, cases for which some of the elements in the feedback loops are missing. A simple process could be operating nearly by itself. Certainly, many cases exist for which only the management and the process may be there. The managers may then control the process through their immediate sensing of what is occurring; they have absorbed the evaluation function into their own work. Or they may manage by only controlling inputs to the process and by paying little or no attention to how the process operates. This may be a perfectly satisfactory management method as long as the model the managers use has a sufficient resemblance to reality. However, if, for instance, the management model of how to win the world series is not sufficiently representative, decisions about training regimen, players to be traded, and so on may be in error. Successive measurements (for instance, game scores) may call this to management's attention. To be successful, the managers will then alter their models of how their team wins ball games. Some form of evaluation activity (measurements and comparisons) is needed in order both to make further decisions and to compare expected results based on models of an activity with the observed actual results of the process itself.
It is not uncommon to find cases in which a process exists, evaluation is ostensibly taking place, and yet no one is making any use of the results, sometimes because management is not organized, or motivated, in a way that allows the information to be used and sometimes because it is not evident that the information received is relevant to making decisions. This dangling evaluation occurs much more often in governmental organizations than in the private sector, partly because of unclear criteria. In business, the profit-and-loss statement is an unavoidable evaluation measure. In sports, standing in the league is a clear, unambiguous measure. In government, however, readily available, unavoidable measures are much more difficult to obtain and agree upon. Further, vociferous aids with sufficient clout, such as irate fans or disgruntled stockholders, are less often directly involved to ensure that evaluation results are examined and used to improve performance.
The primary purpose of this book is to examine some of the problems in developing agreed-upon measures, collecting relevant data, and ensuring the use of evaluation results by governmental management. The approach we use can also clarify those cases in which it is unlikely that the evaluator will be able to discover what is expected of the process or, even if expectations are articulated, where it is improbable that the intent can be realized. The evaluator needs to be sensitive to the possibilities of both success and failure.
If Wholey et al.'s definition is now reviewed in terms of the simple organizational context shown in figure 1-2, "assesses the effectiveness of an on-going program in achieving its objectives" deals almost entirely with measurements and comparisons; that is, did the team play well? (Note that the definition of "well" involves the comparison of the measurements taken against a standard. Determination of that standard should take place in the management, not the process, sector. Did the team win the world series? If not, how close did it come?) "Relies on the principles of research design to distinguish a program's effect from those of other forces working in a situation" is also principally a measurement-and-analysis problem with the emphasis on research methods to ensure that the evaluation is properly done; that is, did the team come into possession of the pennant because it won the series, or did the general manager win it arm wrestling the owner of the Kansas City Royals? "Aims at program improvement through a modification of current operations" is a statement about both comparison standards and the continuing use of evaluation information. A structure of potential information usable to management must be determined and designed and the evaluations carried out so that the desired information is produced and used.
The definition of evaluation can be reduced to three simple words: measurement (of what is actually going on), comparison (how the activity compares with some standard representing both the model and the expectations of management), and use (what gets done with the information). Analysis and comparison activities are carried out principally in the evaluation domain. Yet if management's expectations are to be successfully measured, evaluators must go into the management domain in order to determine the expectations (uses for information are keyed to expectations) and into the process domain to develop models on which measurements can be based and, of course, to collect the basic data. The models and measurements should reflect the expectations of management. Suppose, for example, that management's expectations are that the team shows up for the game looking presentable. In this case, management decisions would concern only the number and kind of uniforms to buy and various transportation arrangements. An analysis of won/lost statistics would be irrelevant. The attempt to design the proper measures and comparisons and to obtain usage requires much information from, and possibly some preconditions being met in, both the process and management domains.
A detailed discussion of the domains is beyond the scope of this introductory chapter. However, a quick overview is in order.
Exploring Organizational Boundaries
In order to describe evaluation in terms of its effect on organizational behavior (and the effects of the organization on evaluation), it is necessary to develop some abstractions of our own that have close parallels in reality. The organizational reality is often as impenetrable as a thorn thicket. We can, however, develop a simplified framework sufficiently realistic to permit an examination of the organizational and evaluation processes.
We have indicated that at least three domains are important: evaluation, direct intervention, and those in charge. (Hereafter, the rather awkward appellation those in charge will be used instead of management since many of the activities in the government do not in any way resemble what is commonly called management by professional managers.) Those three activities are arranged as shown in figure 1-3 and affect each other (in terms of short-term operation) as shown by the arrows. The activities considered to be in each area of the diagram are described in later sections.

The reason for exploring each of these domains is simple enough. In actually laying out a useful and usable evaluation, information from each of the sectors must be gathered and brought together. Most of the information to be gathered is particular to, or may be thought to be from, one of the three domains of interest as developed here. The people who inhabit the different domains have different perspectives and different needs. They even speak slightly different languages. For the most part, the people in the evaluation and those-in-charge domains are dealing with more-abstract models of the real process in question. The direct-intervention domain actually contains the real process. It is a matter of some importance to be able to identify the domain from which different types of information are gathered, since without that demarcation it is difficult to distinguish the degree to which such information represents actual events.
Evaluation
We assume that true evaluators must do one or more of the following things:
1. Construct models for use in measurement and analysis. These models are simply diagrams of some sort that display how the characteristics of the process to be measured are assumed to be interrelated and the importance of the characteristics to each other, to the points of measurement, and to the environment. They reflect an informed abstraction of reality that is tailored to the uses of those in charge.
2. Make measurements of some part of the intervention proper or of some activity or phenomenon explicitly assumed to cause (or enable) or to be caused by the intervention. For instance, runs scored is a measurement taken from the intervention, uniforms ordered is a measurement of an enabling activity, and fan satisfaction is the measurement of a phenomenon in the environment assumed to be caused by the intervention. Assumptions about causal relations should always be explicitly and carefully stated; that is, the assumption that, if the team plays well, its performance will cause the fans to be satisfied, is different from assuming that satisfied fans are an indication that the team is playing well. If you doubt this, remember that the newly formed New York Mets played to packed houses of ecstatic fans when the team was unquestionably the worst in baseball. By the same token, the world-champion Oakland team performed in a near-empty stadium.
3. Perform data analysis to bound the reliability of the measurements taken. For instance, if runs scored and runs scored against are selected as measures of the quality of team play, how many games must be analyzed in order to be 70 percent sure of the answer?
4. Analyze sets of related measurements in order to test the validity of the model being used to represent reality; that is, are the displayed characteristics really as interrelated and important as believed? For instance, suppose that runs scored and runs scored against are measured in every game of the season and that runs scored outnumber runs scored against by two to one. Because all the games have been analyzed, it can be inferred that the measurement is 100 percent accurate. If it then turns out that the team has lost four-fifths of its games, it is a reasonable inference that either a very rare event has occurred or that something is wrong with the measurement model: that the assumed interrelationship between runs scored/runs scored against and winning is invalid.
5. Compare the models of the real process, on which measurements are based, with the models constructed by those in charge, on which expectations are based. These comparisons might cover a range of questions such as: Is there a group of appropriately equipped men playing ball? Is the new defense combination working? Is the activity directed toward the objectives of those in charge of the operation (for example, is the team's won/lost record good enough to reach the world series)?
6. Reduce the results of any of these preceding steps to two forms: one form easily readable by a technically literate reader; the other, by busy senior people.
Many other activities may be done by evaluators. For example, they may write guidelines; give talks on methods, goals, and objectives; draw organization charts; or spend endless hours in discussions with deputy assistant secretaries. Unless they are also doing at least one of the six items just listed, however, they are not doing what we define as evaluations.
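Item 3 in the list above asks how many games must be analyzed in order to be 70 percent sure of an answer. One rough way to approach such a question is sketched below in Python. It is an illustration only: the function names, the invented run differentials, and the use of a normal approximation are assumptions made for this sketch and do not come from the text, and a real analysis would need a method suited to the data at hand.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_outscoring(run_diffs):
    """Approximate confidence that the team's true average run differential
    (runs scored minus runs scored against, per game) is above zero.

    Assumption: games are treated as independent and a normal
    approximation is used. This is a rough sketch, not a rigorous test.
    """
    n = len(run_diffs)
    if n < 2:
        return 0.5  # too little data to lean either way
    avg = mean(run_diffs)
    std_err = stdev(run_diffs) / sqrt(n)
    if std_err == 0:
        # Every game had the same differential; only its sign matters.
        return 1.0 if avg > 0 else (0.0 if avg < 0 else 0.5)
    return NormalDist().cdf(avg / std_err)


def games_needed(run_diffs, target=0.70):
    """Smallest number of games, taken in order, after which the approximate
    confidence first reaches the target; None if it never does."""
    for n in range(2, len(run_diffs) + 1):
        if confidence_outscoring(run_diffs[:n]) >= target:
            return n
    return None


# Hypothetical per-game differentials, invented for illustration.
season = [2, -1, 3, 0, 1, -2, 4, 1]
print(games_needed(season))
print(round(confidence_outscoring(season), 2))
```

The point of the sketch is the shape of the question rather than the particular formula: the evaluator states the level of assurance wanted and then asks how much measurement is required to reach it.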
As described here, evaluators get their information from two places. Concepts, grand plans, and goals (which contain management's expectations) are obtained in interviews with those in charge. Measurement data of the process and its outcomes are more usually taken in and around the direct-intervention domain. This domain is described next.
Direct Intervention
The purpose of many interventions by government is to deliver or perform a service of some kind or to alter the way in which a service is performed by others. A direct intervention, as used here, is the actual delivery or performance of the service. It does not include the policy decisions about the service or any of the other myriad activities that are predicated on someone's abstraction of the process.
The point of intervention is defined as the boundary between the person delivering the service and the recipient of the service. For instance, the government employee who actually does something for or to a citizen is at the point of intervention (for example, an employment-service counselor who places a citizen in a job, an army sergeant who trains a recruit, a police officer who arrests a burglar, an employee of the sanitation department who picks up the garbage, or an IRS employee who reviews and examines income-tax returns). In some cases, the personnel at the point of intervention are not government employees per se but people who have been commissioned (usually by contract or grant) to perform the service. Thus, if a city chooses to hire a garbage-collection firm rather than have city employees pick up the debris, the garbage collectors are nonetheless the people at the points of intervention. The people who hired them to do the dirty work are not in the direct-intervention domain. This is an elementary distinction but an important one. To restate, the point of direct intervention is the location at which the performance of service actually takes place. (A model of the process of direct intervention constructed for analytic use is an abstraction. Measurements can only be made of the characteristics of the reality, not of the model. If an evaluator is to measure an intervention activity, the evaluator must go to where the action is, namely, the point of intervention.)
Figure 1-4 is an expanded representation of a direct intervention embedded in its environment. It shows the different places at which measurements of a direct intervention can be made.

The model shows the direct intervention sitting in its immediate environment. The direct-intervention domain is peopled by everyone directly effecting or directly affected by the intervention. Normally, these include the immediate supervisor of the intervenor as well (for example, a baseball coach). This domain is composed of people who deal principally with the process under examination. Those who work mainly from a model of the process are not included in the direct-intervention domain.
The cloudlike outline in figure 1-4 represents the boundary of the intervention's immediate environment. The inputs to the intervention are drawn from within the boundary. The inputs to the intervention are those things that will be directly affected by the intervention (for example, garbage, people, potholes). Contributions to the intervention process are things that are intended to (and sometimes actually do) help the intervention take place. These contributions include money, uniforms, guidelines, technical assistance, and intervenors.
Process measures describe how the intervention is being carried out and to what extent. These measures concern how the operation is getting on without regard to the overall effects it may have. Process measures often include things such as the action taken by the people involved, how many people are serviced, what exactly a service consists of, how convenient the arrangements are, and how people feel about them.
Outcome measures are, in effect, the last easily attributable process measures. They describe how given characteristics of the input were directly altered as an end result of the intervention in a way directly attributable to the intervention.
Impact measures describe the effects of the intervention and its outcomes on the environment. Impact usually involves the test and validation of some cause-and-effect hypothesis. An accurate impact measure is often a contradiction in terms since it implies that attribution or demonstration of cause and effect can be established, usually a dubious demonstration since the expected impact almost invariably takes place at a temporal and/or logical distance from the intervention.
The difference between process, outcome, and impact measures is sometimes ambiguous near the edges. For instance, time spent by professionals is clearly a process measure. Number of people trained to be welders as a measure of a job training program could be either a process or an outcome measure depending on the goal of the project: for example, is it supposed to train welders, to place people in jobs, to raise people's incomes? Thus, the categorization of particular measures as process or outcome often represents an interpretation of expectations. The interpretation is usually derived through a process of iterative interviews with the intervenors and those in charge.
Earlier, we stated the assumption that if the team plays well the fans will be satisfied. The play of the team during a baseball season could be regarded as an intervention; the players then are one of the inputs to the intervention; bases are contributions to the process; runs scored is a process measure; and the team's standing in the league is an outcome measure. So far, the measurements have been reasonably precise and the interrelationships among them and the activity reasonably straightforward. The fans, however, are out in the environment. Their relationship to the intervention is shrouded by unknowns. One may feel that if the team plays well the fans should be satisfied. That moral imperative may then lead one to assume that if the team plays well the fans will be satisfied. The jump from the moral imperative to the rational certitude is, to some extent, a blind leap, and a number of intervening variables may appear while the evaluator is in midair. Fans may be dissatisfied because the concession stand sells warm beer. Fans may be satisfied because the manager often kicks dirt on the umpires. And what do we mean by fans? Are they people who pay for game tickets? Who watch the team on television? Or are they all the people in town that might be enticed to watch the games if the team played well enough? Despite the tenuous relationships between the intervention and its impact on the environment, evaluators are often expected to measure the impact and often must model and include the environment presumed to be affected, relating the various measurements to each other. When selecting impact characteristics to be measured, particular care must be taken to choose characteristics that can be measured and to show what assumptions are being made.
When reporting the results of impact measurements, literally fanatical care must be taken to lay out the entire chain of assumptions about cause and effect, the location and nature of possible intervening variables, and the measurements that serve as adequate proof of the assumption chain.
Control or comparison measures are sometimes made. One purpose of these measures is to control for intervening variables; that is, to gain more assurance that the observed change in the inputs to the intervention, or in the environment, is really caused by the intervention. The control or comparison measurements are taken from some group or process that is believed to be similar to the one being evaluated but that has not been affected by the intervention. The two sets of measurements are compared, and if a change occurs in the group or process being evaluated but does not occur in the control or comparison group, the change is then often attributed to the intervention. The selection of control or comparison groups is a complicated business and historically seems to involve nearly as many assumptions as the selection of impact measures. The history of the use of comparison groups in social-science research contains many more questionable cases and failures than unambiguous successes. As in the selection and reporting of impact measures, the evaluator should proceed warily and explicitly.
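The arithmetic behind a comparison-group measurement can be sketched in a few lines of Python. The function name and the figures are hypothetical, invented for illustration, and the result means something only under the assumption the text warns about: that the comparison group really is similar to the evaluated group in every respect except the intervention.

```python
from statistics import mean


def change_attributed_to_intervention(program_before, program_after,
                                      comparison_before, comparison_after):
    """Change in the evaluated group minus change in the comparison group.

    Assumption: the comparison group would have changed the same way as the
    evaluated group had there been no intervention. If that is false, the
    number returned here is not evidence of anything.
    """
    program_change = mean(program_after) - mean(program_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return program_change - comparison_change


# Hypothetical measurements, invented for illustration.
effect = change_attributed_to_intervention(
    program_before=[10, 12, 11], program_after=[15, 17, 16],
    comparison_before=[10, 11, 12], comparison_after=[12, 13, 14],
)
print(effect)  # evaluated group rose by 5, comparison group by 2
```

The subtraction is the easy part. The assumptions listed in the docstring are where the evaluator's explicit and wary reporting is needed.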
Those in Charge
As a rule, direct interventions by government do not just happen. Somewhere, sometime there were enabling interventions, or interventions intending to create direct interventions. These enabling interventions emanate from some source of authority (for example, Congress, a city council, or a school board). Sometimes the enabling intervention takes the form of an explicit directive, for example, "The public works department shall build a bus terminal at M St. and 21st." Sometimes, the enabling legislation is little more than an expression of good intent, for example, "The ombudsman shall ensure that all citizens dealing with the city government be treated fairly." Quite often, the enabling intervention is an amalgam of political compromises and encompasses an astounding number of hopes and dreams. The those-in-charge domain lies between the source of authority and the direct-intervention domain. Those in charge are the people who translate the language and intent of the enabling intervention into directions and guidelines for the direct intervenors and who pass along the money.
This sector is especially interesting in large bureaucratic organizations. In many cases, the actual language of the enabling intervention was produced either as the result of a political compromise or of an intent to go forth and do good. In these cases, do not be surprised to discover that quite extended chains of effects are assumed (at least rhetorically) to be caused by even the simplest of enabling interventions.
For instance, funding a day-care center is supposed to cause a series of outcomes and impacts including, but not limited to, the following:
Replacing inadequate child care with adequate child care,
Enabling children to live up to their potential,
Raising nutritional and health-care standards of poor children,
Raising the net income of the poor,
Reducing welfare rolls,
Disencumbering women so that they can enter the labor force.
One way or another, those in charge must translate these expectations into directives that presumably guide a real day-care center.
The those-in-charge domain includes more people than those in a straight line between the authority and the direct intervention. It also includes people who are owned by those in charge. For instance, some people act as in-house extensions of those in charge (for example, secretaries, assistants, office managers, or vice-presidents for acquisitions). It also includes people who operate the ancillary activities to the direct intervention. A baseball team, for instance, requires people who purchase uniforms, rent ball parks, or sell tickets.
Therefore, those in charge are usually operating on two levels. On one level they are involved in a real process (albeit not the process that principally concerns the evaluator). They write letters, answer the telephone, mediate (or cause) office disputes, and hire and fire subordinates.
On the other level, those in charge deal with their models of the real process (or direct intervention) under consideration. These are called testable, or rhetorical, models for the sake of distinguishing between them and the equivalency models constructed for measurement and analysis purposes by the evaluators. On the basis of the rhetorical models, those in charge provide certain contributions to the process (for example, money, guidelines, and supplies). It is not uncommon to find that, among those in charge, several different rhetorical models are in use for the same direct intervention. Figure 1-5 helps to illustrate why this happens.

In a large bureaucratic organization, usually several layers of management exist between the source of authority and the direct intervention. Each layer deals with its own activities, has concourse with different units of ancillary activity, and ordinarily has its own perspective. For instance, the person who administers the financial aspects of the contract for a day-care center has a much different testable model than the person who is responsible for enforcement of federal day-care standards. Both models will be considerably different from that of the executive who operates on the policy level. It is possible that none of the three models will look much like the day-care center that actually has children in it.
There may be a few organizations in the world wherein the management team plans, organizes, coordinates, and controls. Based on our experience, such organizations are not likely to contain those in charge of a complex government program. The managers usually are too busy for such activities. Their days are spent cajoling subcommittees or vendors, sitting on advisory committees or being advised, talking with people outside the organization who may have important political information, giving information to media representatives, coping with the endless administrivia visited upon them, or just reacting to crises. Very little time is then left to reconcile the disparate testable models that exist either in the political domain or the those-in-charge sector or even to test their expectations in great detail. Any systematic planning that occurs is often done off the record by a long-time confidant. Occasionally such work may be attempted in an office entitled planning and evaluation, usually one of the more arcane units of government.
What, then, Is an Evaluator to Do?
In order to carry out the three-part process of program evaluation described at the beginning of this chapter, the evaluator must extract the expectations about the intervention from those in charge. The testable models must be examined, reconciled, and reduced to testable terms. By testable, we mean a coherent description of an assumed process that can be compared to the process as it actually exists. Choosing the precise testable models to work with is not a simple matter. Virtually all the members of the those-in-charge domain have their own models of both what they do and what others, including the direct intervenors, do. For the evaluator, an important and often unavoidable testable model is the one owned by the person who is requesting the evaluation, especially if that person is likely to use the results. It is not, however, the only important model. To select the important models it is necessary to discover who will be involved in implementing the results of the evaluation as well as what the intervention process actually looks like. The official chain of authority is another good starting place because tracing it out often uncovers the actual lines of authority. Another good plan is to follow the money; that is, if a change is to be made, the hands that make the change often hold the dollars.
For example, the testable model of the person actually owning the ball club may be a very important one. The person in the ball club who is responsible for handing money over to the general manager also has a model worth noting. The general manager, in turn, allocates a certain amount of money to the coach. The coach's model is clearly of some significance. (Whether the coach, or the first-line supervisor, belongs in the those-in-charge domain, the direct-intervention domain, or somewhere in between depends on the way that the coach performs the job. Some so-called office coaches work entirely from models of the activity; others deal directly with the reality; and most do a bit of both, for example, work with the offense and leave the defense to an assistant working with some general guidance.)
Recognize, though, that the routes of money and authority are only two of several possibilities. In the event, it may prove more fruitful to trace the models of the people who transfer things such as information or influence. The only way to tell is to choose a promising path, follow it, and see where it leads.
Some of the ancillary models may also be important. If an assumption implicit in the predominant testable model is that good equipment is essential to good play, it would be of no little interest that the person responsible for buying equipment works from a model in which spiked shoes are purchased in bulk from a cut-rate outlet to hold down expenses. It is imperative that the evaluator talk to enough people among those in charge to begin to construct a unified testable model that shows what the program is believed to look like and what the specific expectations for it are.
It should be noted that the actual expectations for a program are often buried deep in the rhetoric. Mistaking the nature of expectations can lead to fundamental mistakes in the construction of the measurements to be taken. For instance, suppose that the expectation for the ball team was stated as "Win the world series." Rhetoric aside, the objective might really be to make money. If that turns out to be the case, the boundary of the direct-intervention sector must be redrawn to include all of the money-making activities such as concession rental; some personnel previously thought of as ancillary to those in charge now become intervenors (for example, ticket sellers); and winning the world series is now regarded as a process rather than an outcome measure. Uncovering the expectations can be a laborious process involving frequent conversations with both those in charge and the direct intervenors. The evaluators should be prepared to scrap their models and measurements and to start over if it becomes apparent that the originals were based on mistaken understandings of the expectations. Evaluators are not the only people who misunderstand the expectation of those in charge. Successful coaches sometimes lose their jobs that way.
The rhetorical information collected from those in charge should be used in preparing a unified testable model of expectations; that is, one that describes functional relationships with measurable expectations of outcome. This testable model is a model of the beliefs of those in charge that can be reality tested through comparison with a model of the process based on the direct observations of the evaluator.
The first check is to see if this testable model is fiction or fact. The evaluator does this by personally going and looking at the actual process in question. Quite often surprises await. The testable model may be one that depicts an activity designed to train people for gainful employment with the expectation of placing them in long-term jobs where they perform services desperately needed by grateful communities. The reality may be that thirty currently unemployed hairdressers are trained each month in a community glutted with beauty parlors. This initial information may be all the busy people in charge want to know, or it may be something hardly anyone wants to know at all. However, bringing expectations and measurements together always provokes interest. Either reaction or ignorance may result. (The act of ignoring hard information that contradicts expectations and rhetorical positions is, of course, one of the two definitions of management or legislative oversight.) The evaluators have done the first step of their job. Those in charge can now grapple with whether to change what they believe about the process or to change the process itself. If more information is needed (if, for instance, the process actually looks something like the testable model and if those in charge want to know how close the intervention is coming to their expectations), then the evaluators attempt to construct more-detailed models that are equivalent to the direct-intervention sector and to select measurements based on these models. Matching the expectations (in the form of testable, or answerable, questions) to measurable phenomena (at the direct intervention), selecting the measurement instruments and comparison methods, and refining the evaluation design are the next steps.
Evaluators can then go forth and make measurements, analyze their data, validate their models, assess the meeting of expectations, and prepare their reports with some hope that the effort will lead to program improvement through a modification of current operations. In fact, the preparatory steps described here often lead to program modifications before further evaluation work is done.
Note
1. Joseph Wholey, John Scanlon, Hugh Duffy, James Fukumoto, and Leona Vogt, Federal Evaluation Policy: Analyzing the Effects of Public Programs (Washington, D.C.: Urban Institute (URI40001), 1970).
Expanding upon Evaluation as a Part of Purposeful Behavior
Chapter 1 placed evaluation and the organization in perspective. Evaluation is only one aid to an organization in achieving purposeful management behavior. This chapter expands on the nature of purposeful behavior itself and evaluation as a part of that behavior.
Behavior is purposeful when it is directed toward a goal and when the attempts to close the gap between actual performance and expectations for that performance are predicated on the size, nature, and tendencies of past and present errors in meeting expectations. The expectation can be conscious, in that a policy decision has been made to attempt to do something, or unconscious, like the habit of placing one foot in front of the other in order to walk. Whether conscious or unconscious, an expectation exists, an attempt is made to attain it, the error between expectation and performance is sensed, and behavior is adjusted so as to reduce the error.
Herbert Simon has pointed out that "the simplest movement (taking a step, focusing the eyes on an object) is purposive in nature, and only gradually develops in the infant from its earliest random movements. In achieving the integration the human being . . . observes the consequences of his movements and adjusts them to achieve the desired purpose."[1] Infants learning to walk do several things. They decide what they want to do (initially, this is only a policy). They then attempt to do it. As they try, they compare the results of what they just did with their standard of proper performance. Later, a little older and wiser, they try again, each time modifying their behavior in order to reduce the error between what happens and what is desired.
Thus, the child, learning to feed itself with a spoon, may initially put food all over itself (and its environment). In successive iterations, however, the child learns to focus control of the process directly upon the error distance between the spoon and the mouth and continues to improve the ability to reduce this error in actual practice until eating with a spoon becomes a learned behavior stored away in the brain and usable whenever needed.
Purposeful management behavior has many similarities. Management expectations are often initially policy decisions. Through organization, managers attempt to turn their expectations into reality by bringing to bear the parts of the organization that should be able to carry out the policy. The hard part comes in getting the workers to compare the results of what they have done with management's expectation for successful accomplishments and, if necessary, to alter their (the workers' own) behavior so that, over time, the gap between expectation and performance continually narrows.
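The four essential elements of purposeful behavior are accepted expectation standards, measurement, comparison of expectation with performance, and a willingness and ability to act on the resulting information (feedback).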
Evaluation provides the measurements and comparisons for such feedback and is often also involved in translating the expectations into clear standards. Used in support of purposeful management behavior, the functions of evaluation are to create methodologically sound information in a manner that permits valid comparisons with a standard, to perform those comparisons, and to inform the managers and the operators of the results of the comparisons.
A large (and still growing) body of literature exists describing sound methodologies that involve sophisticated (or unsophisticated) statistical techniques and experimental designs, usually dealing with the problem of obtaining useful measures and comparisons out of already sound data. However, if the implementation of those designs and the exercise of those techniques are to support purposeful management behavior (rather than to exist as random activities or academic exercises), the evaluators and managers must understand the several steps in the creation and operation of successful performance feedback systems.
Members of any organization participate in many groups both inside and outside their organization. As Chester Barnard has pointed out, this participation conditions the actions they take.[2] Members of an organization receive feedback from many places. Their actions are compared to the expectations of friends, supervisors, and colleagues and returned to them as praise, arguments, or attacks. Even the simplest of social organizations is a literal maze of performance feedback loops. In a smooth, well-functioning organization, the majority of these control systems guides the organization in a single direction toward a compatible set of standards. Other organizations display a high degree of schizophrenia as their members respond to discordant error signals. This schizophrenia is not uncommon in large government organizations that receive signals from the Congress, the current administration, constituencies developed under past administrations, bureaucratic superiors, and so on.
If the evaluators are to produce results that are used, rather than filed, it is essential that they recognize and understand the important feedback systems operating simultaneously in the organizations being evaluated. However, before attempting to cope with these multiple loops, it is necessary to understand how a single loop operates.
Therefore, the remainder of this chapter describes a single performance feedback system as it operates in the simple, mechanistic milieu of a home heating system. This illustration is used because it is both relatively straightforward and familiar to most people. Note, though, that this example is not an accurate model of a complex social system. It is offered only as a useful first step toward understanding the more-complicated phenomena.
Operation of a Simple, Mechanistic Feedback System
A home heating system is an illustration of a feedback system managed by information based upon measurement and comparison. Figure 2-1 is a schematic (following the paradigm of purposeful behavior presented in chapter 1) of the operation of a home heating system. The figure displays some of the essential operations that take place.

The those-in-charge domain occupies the upper left-hand corner of the figure. Even in as simple an operation as a home heating system, this domain is conceptually complex, and we expand on it in detail later in the chapter. Suffice it to say here that, in a home heating system, the control mechanism that issues off/on instructions for the furnace is located in the domain of those in charge.
The lower left-hand corner of figure 2-1 represents the direct-intervention domain. It contains an oil tank, a furnace for burning the oil, a circulating system for moving hot water to a radiator, and a radiator that radiates the heat. Also in the direct-intervention domain (but not of it) is a temperature sensor that measures room temperature and reports the data to the evaluation domain.
The operation of the direct-intervention domain normally follows a well-established routine governed by predetermined rules. On receiving a signal from the administration, the furnace turns on and burns the oil, turning it into hot gas; the hot gas heats the water; the heated water circulates through the radiator; the radiator radiates the heat into the room; the house gets warm. On receiving another signal from the administration, the furnace turns off. The purpose of this activity is to keep the house at a comfortable temperature. To this end, the furnace receives its stop/go directives from those in charge and proceeds with its own established implementation routines.
An important thing to note about the direct-intervention domain is that, in this case, it has no means of evaluating its own performance. In the absence of the temperature sensor, comparisons, and the feedback of the resulting information (if, say, the automatic administration had been programmed to turn the furnace on and off in half-hour cycles), the furnace would simply go on mindlessly repeating the sequence of operation that it knows best each time it received a go signal. It would do so if there were icicles in the living room, and it would do so if the house were on fire.
The additional elements that enable purposeful behavior to take place are in the evaluation domain, which is illustrated in the remaining portion of figure 2-1.
The evaluation domain is linked to the those-in-charge domain through information about whether expectations are being met. The temperature sensor (an evaluator) measures the temperature of the living room, and the thermostat (another evaluator) makes comparisons between actual and expected living-room temperature. The those-in-charge domain receives the results. The sensor measures the most relevant output of the furnace to the inhabitant, namely, the temperature of the living room. The temperature of the room is evaluated by the thermostat by comparing the present room temperature to the expected temperature set by a higher level of policy management.
Because most house-temperature management is not really concerned with exact temperature but only with a range of acceptable variations, the evaluation comparison is not reported to the administration unless the heat level observed by the sensor exceeds the limits of permissible error. When the room temperature hits the upper boundary of the permissible range, the evaluation comparison signal is transmitted to the administration and the furnace is turned off. When the room temperature hits the lower boundary of the range, the furnace is turned on. The room temperature measured by the thermostat is also displayed (reported) to those in charge on an agreed-upon scale divided into well-defined, well-known units (for example, degrees centigrade or Fahrenheit). This display of numerical temperature measurement (shown on the thermostat) is not necessary for proper operation, however, since administrative control of the furnace operates from the error signal generated by comparing the temperature measured by the sensor with the expectation set on the thermostat. This comparison would still be made and signaling would still occur even if the printed degree scale came loose and fell off. The degree scale is there simply for the record, to provide an indication that the room temperature has been requested to be somewhere in the vicinity of the temperature shown.
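The on/off behavior just described can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names, the heating and cooling rates, and the one-degree band are assumptions made for the example and are not taken from any real heating system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thermostat:
    """Comparison standard plus a band of permissible error (values are hypothetical)."""
    setpoint: float    # expectation set by those in charge
    tolerance: float   # half-width of the permissible range

    def compare(self, measured: float) -> Optional[str]:
        """Return an error signal only when the measurement reaches a boundary."""
        if measured >= self.setpoint + self.tolerance:
            return "off"
        if measured <= self.setpoint - self.tolerance:
            return "on"
        return None  # within the permissible range: nothing is reported


def simulate(thermostat: Thermostat, start_temp: float, steps: int,
             heat_rate: float = 0.5, loss_rate: float = 0.3) -> List[Tuple[float, bool]]:
    """Run the loop: measure, compare, act, and let the furnace do its routine."""
    temp = start_temp
    furnace_on = False
    history = []
    for _ in range(steps):
        signal = thermostat.compare(temp)      # evaluation domain
        if signal == "on":                     # administration acts on the signal
            furnace_on = True
        elif signal == "off":
            furnace_on = False
        # direct intervention: the room warms while the furnace burns, cools otherwise
        temp += heat_rate if furnace_on else -loss_rate
        history.append((round(temp, 2), furnace_on))
    return history


if __name__ == "__main__":
    for temp, on in simulate(Thermostat(setpoint=20.0, tolerance=1.0), 18.0, 20):
        print(temp, "furnace on" if on else "furnace off")
```

Note that nothing in the loop reads a printed degree scale. Only the error signal from the comparison reaches the administration, which is why the scale could fall off without changing the furnace's behavior.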
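All of the elements of purposeful behavior are present in the case of the furnace: an expectation standard (the temperature set on the thermostat), measurement (the temperature sensor), comparison of expectation with performance (the thermostat), and the ability to act on the resulting information (the administration that turns the furnace on and off).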
What is described here is one very simple arrangement that accomplishes purposeful behavior by the use of these four elements. The thermostat in a home does not have to be very accurate because its actual setting can be controlled by an additional feedback loop. Figure 2-2 shows a policy function added in the those-in-charge domain. The policymakers determine the proper temperature in the room and announce it by setting (or resetting) the comparison standard (the thermostat).

As in figure 2-1, policy directives concerning room temperature are implemented by the heating system. The furnace is located in the direct-intervention domain. An evaluation system and an automatically implemented set of administrative procedures in the those-in-charge domain are also present. This purposeful-behavior loop has a clear, if implicit, charge: The furnace is to be used to keep the house at a comfortable temperature.
Those in charge engage in certain activities. Some in-house activities may have nothing to do with the actual furnace operation, for example, painting the furnace blue or building a redwood box to hold the thermostat. There will certainly be some ancillary activities, for example, ordering fuel and obtaining storm windows. Also, some oversight of the actual process of heating a home will occur. For the most part, however, heating of the home will still take place through the automatic administration of predetermined rules; that is, once the furnace has been installed and a comparison standard decided (a major policy decision communicated to the purposeful system through the device of setting the thermostat), the operation proceeds automatically so long as nothing unexpected happens. The furnace goes on and off and heats the room according to predetermined rules in response to directions based on information about the temperature in the house. Only when the manager discovers that something out of the ordinary is happening do managers (as distinct from administrators) do something.
If the house gets too cold or too warm for comfort, management checks the setting on the thermostat. If the thermostat is indeed set to reflect the predetermined wishes of management, then the actual temperature of the room is checked. If the temperature is within the normal range of error, management then decides that the predetermined standard is wrong (a policy decision) and (by raising or lowering the setting as desired) directs the system to use a new comparison standard. The administration will then act on error signals from the comparisons made against the new standard and raise or lower the temperature of the room accordingly.
Thus, if management should decide that the predetermined standard is wrong (if the house is usually either too hot or too cold), management will reset the comparison position of the thermostat, shifting the entire band of comparison up or down.
If, on checking, it is determined that the standards do represent a comfortable temperature level but that the furnace is not approaching the standard (that is, the temperature is not within the normal range of error), then management does a quick check to make sure that nothing has interfered with the evaluation/administration/process loop, such as no oil in the furnace or a cold draft blowing through an open window onto the sensor. If nothing easily correctable is apparent, management then issues a direct order to the administration to alter the furnace's behavior forthwith. The order might be transmitted via the on/off switch. The furnace would then be closely managed (that is, management manually operates the on/off switch) until repairs are made and automatic administration can resume. During the period of close management, the managers, by doing their own measurement, comparisons, and actions, have replaced (or become) the loop that controls the furnace. An important point to note is that if management is to effectively control the furnace by using the on/off switch, the thermostat must be disconnected (that is, automatic administration must be stopped). Picture what could happen if the automatic system were keeping the house too cold. After verifying the thermostat setting, the management response might be to go to direct control. In that event, management would flip the switch to on, activating the furnace. However, the furnace would still be turned off when the room temperature reached the limit acceptable to the sensor. This may happen long before management is satisfied with the temperature.
This superficial description of the management of a home heating system is sufficient to illustrate an important concept. The those-in-charge, direct-intervention, and evaluation domains are linked together to form a stable feedback system. An overlapping control loop (a policy-control loop that changes the comparison standard) can be activated when those in charge want to alter the overall outcome of the process. The alteration in expected outcome can be effectively accomplished, however, only if management recognizes, understands, and manipulates the basic operating loop on its own terms. This can be done by:
Taking advantage of the stable loop and altering performance through policy, for example, changing the comparison standard that governs the furnace's activities and letting the existing measurement, comparison, and response arrangement produce the new conditions;
Entering the existing loop and directly controlling the actions of the administration, for example, disconnecting the thermostat and manually operating the on/off switch.
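The two routes just listed, and the failure that results when management flips the switch while the thermostat is still connected, can be illustrated with a small simulation. The sketch below is hypothetical throughout: the HeatingLoop class, its rates, and the temperatures are assumptions chosen only to make the behavior visible.

```python
class HeatingLoop:
    """A furnace whose on/off switch can be driven by a thermostat, by hand, or both."""

    def __init__(self, setpoint: float, tolerance: float = 1.0):
        self.setpoint = setpoint            # comparison standard (policy)
        self.tolerance = tolerance          # permissible error around the standard
        self.thermostat_connected = True    # automatic administration in force
        self.furnace_on = False

    def manual_switch(self, on: bool) -> None:
        """Management operates the on/off switch directly."""
        self.furnace_on = on

    def step(self, temp: float) -> float:
        """One cycle: the thermostat (if connected) acts, then the room warms or cools."""
        if self.thermostat_connected:
            if temp >= self.setpoint + self.tolerance:
                self.furnace_on = False
            elif temp <= self.setpoint - self.tolerance:
                self.furnace_on = True
        return temp + (0.5 if self.furnace_on else -0.3)


def run(loop: HeatingLoop, temp: float, steps: int, hold_switch_on: bool = False) -> float:
    """Advance the loop, optionally with management holding the switch on."""
    for _ in range(steps):
        if hold_switch_on:
            loop.manual_switch(True)
        temp = loop.step(temp)
    return temp


if __name__ == "__main__":
    # Switch held on, thermostat still connected: the old standard still wins.
    print(run(HeatingLoop(setpoint=20.0), 18.0, 50, hold_switch_on=True))

    # Route 1: change the comparison standard and let the loop do the work.
    print(run(HeatingLoop(setpoint=23.0), 18.0, 100))

    # Route 2: disconnect the thermostat and operate the switch by hand.
    manual = HeatingLoop(setpoint=20.0)
    manual.thermostat_connected = False
    print(run(manual, 18.0, 10, hold_switch_on=True))
```

In the first case the room never gets much past the old upper limit, however long the switch is held. In the third case nothing stops the heating except management itself, which is the cost of replacing the loop rather than redirecting it.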
Management may also have to deal with problems beyond the control or competence of the purposeful administrative system by:
Meeting necessary operating conditions, for example, putting oil in the tank;
Removing noise from the feedback loop or system, for example, closing the windows.
It is unlikely that behavior will be altered, as desired, if the existence and operation of the operating feedback loop are ignored. For instance, painting the furnace white instead of blue will not make the room get cooler, nor will manually operating the on/off switch be a satisfactory long-term solution. One way or another, management must deal with even a single automatic-feedback loop on its own terms or else redesign it.
Obviously, a home heating system could be operated in a number of different ways. It is conceivable, for instance, that management would demand a virtually absolute standard with an imperceptible error range. For instance, management may be trying to comply with a fuel-saving policy of maintaining a temperature of exactly 65°. It is possible to meet an absolute standard. However, the pleasure derived from such pinpoint measurement is not normally deemed sufficient to compensate for the pain in the pocketbook. Usually, the more accurate the control instrument, the more expensive it is. That is, of course, also true in organizations.
Alternatively, the system could be operated as an open loop (no feedback), for example, by allowing it to burn only so much oil every hour or by turning it on and off in predetermined time cycles. This would eliminate the feature of controlling from the sensed error between room temperature and a standard. This closely resembles running a government agency by attempting to control its allocations and expenditures. More managerial oversight also could be used (another common solution), and management could sit by and control the furnace directly at all times.
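The difference between open-loop and closed-loop operation shows up as soon as conditions change. The following sketch compares a furnace run on a fixed time cycle with one run from the sensed error when the weather turns colder partway through. The function names, the heat-loss figures, and the cycle length are all assumptions made for illustration.

```python
def simulate(controller, loss_rates, start=20.0, heat=0.5):
    """Apply a controller step by step; loss_rates gives the heat lost in each step."""
    temp, furnace_on, temps = start, False, []
    for step, loss in enumerate(loss_rates):
        furnace_on = controller(step, temp, furnace_on)
        temp += (heat if furnace_on else 0.0) - loss
        temps.append(temp)
    return temps


def timed_cycle(step, temp, furnace_on):
    """Open loop: three steps on, three steps off, whatever the room temperature is."""
    return (step // 3) % 2 == 0


def thermostat(step, temp, furnace_on, setpoint=20.0, tolerance=1.0):
    """Closed loop: act only on the error between the sensed temperature and the standard."""
    if temp >= setpoint + tolerance:
        return False
    if temp <= setpoint - tolerance:
        return True
    return furnace_on  # inside the permissible range: leave the furnace as it is


# Mild weather for 60 steps, then a cold snap for 120 steps (hypothetical figures).
weather = [0.25] * 60 + [0.4] * 120

open_loop = simulate(timed_cycle, weather)
closed_loop = simulate(thermostat, weather)

if __name__ == "__main__":
    print("open loop, final temperature:  ", round(open_loop[-1], 1))
    print("closed loop, final temperature:", round(closed_loop[-1], 1))
```

The timed cycle happens to balance the heat loss in mild weather, so it looks adequate until the cold snap, after which the room cools steadily while the furnace keeps to its schedule. The thermostat-driven furnace simply runs longer and stays within its band.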
The actual system used in the home has evolved over time, however, and has been found to be essentially satisfactory, reliable, economical, and unobtrusive.
Why Is Life Like a Furnace?
In many respects, life is not like a furnace. Such a mechanical explanation of the behavior of organizations and the people in them is not only repugnant but also oversimplified to the point of absurdity. Even when a governmental manager has the wit to check for noise in one of the program's feedback loops, it is likely that the noise is generated by yet another loop (like an angry congressional appropriations committee, a school board or even by citizens). Administrators do not always run through their routines regardless of ice in the kitchen or fire in the basement (it only seems that way). Direct intervenors often can sense for themselves when something is wrong with the operation. What is more, they sometimes tell those in charge about it.
Despite the differences, however, certain fundamental similarities exist between a furnace and each of the many feedback loops in a complex social organization. Primarily, feedback-loop behavior is pervasive, and the tendency to control and operate on some kinds of error signals is common. While the rhetoric of social organizations often tends to be obfuscatory (sometimes deliberately so), many basic loops and comparisons are almost always there.
Most midlevel bureaucrats respond to many feedback loops whose signals, comparison standards, and even directions may change. An important step is to determine what the existing formal and informal organizational control systems are whenever a new evaluation-design problem is to be approached. Somewhere, someone is doing something. Somewhere up above someone else can tinker with the system or with the standards of comparison. Somehow information created from measurements and comparisons (formal or informal, true or false, useful or irrelevant) gets back to the people who can tinker. If the evaluator can identify the loops and their elements at the beginning of an attempted new design, then the evaluator has at the very least identified many of the people who have important questions that will need to be answered, beliefs that will need to be examined, and process information that will need to be obtained. It should then be possible, for instance, to avoid wasting time and money to determine the program-evaluation equivalent of what color to paint the furnace.
Established information loops and behavior will often exhibit an amazing stability in the face of additional information unless that information can be inserted through an existing accepted feedback loop or accepted new loops can be created. In constructing a new feedback system, remember that four conditions are necessary for any working loop: accepted expectation standards, measurement, comparison of expectation with performance, and a willingness and ability to act on the resulting information.
The identification, use, and (when necessary) creation of feedback loops are usually regarded as beyond the purview of the evaluation team. However, our observation has been that if the evaluation team does not include these organization-analysis tasks in its work, then no one will do them. The evaluator should either plan to be involved in such work or expect much of the evaluation work to be wasted.[3]
Many examples of simple feedback loops could have been chosen to illustrate the operation. The furnace was used because most people intuitively know how it works and because it clearly illustrates several properties of research design: the selection of a characteristic to measure, measures and a measurement instrument, the point at which measurement is to be made, the comparison to be made, the error from the comparison as a signal for action, and the range of desired accuracy. It is interesting to note how the presence of real-time feedback about a desired or expected value of a characteristic simplifies the gathering of data, the production of information, and ultimately, attaining expectations. It is also interesting to note that the real-time feedback system is concerned with the most relevant expectation (comfort) and that the administrative routines are based on a thorough knowledge of how the system works. For instance, when the house is too hot, the furnace is turned off; the oil is not drained from the tank.
It is important to understand the actual intervention being made and to have models equivalent to reality when designing any purposeful behavior system. In our furnace example, suppose that the furnace is imagined to be part of a large social program. An evaluation team might be sent out to analyze the reasons for success in the furnace program and to find out what makes it work. In the absence of real knowledge of how the furnace works, the evaluators might devise an evaluation plan that required them to synchronize their watches and to take simultaneous observations in both the living room and the other rooms of the house (including the basement). They would soon find the most obvious fact that, at the same time the temperature drops to its minimum in the living room, a noise starts in the basement. The noise continues during the period of rising temperature in the room. The noise stops at about the same time that the rise in temperature in the room stops. The temperature in the room goes through a period of decay during which there is no noise, and then the cycle repeats itself. The correlations of the evaluators' data would be very high (if their measurements are reasonably accurate), and after applying some complicated mathematical techniques, they might come to a very important conclusion: the noise should be tape recorded, high-fidelity equipment should be purchased (considerably cheaper than furnaces) and installed in the homes of poor people throughout the country, and recordings of the noise should be played back loudly during cold weather. No one familiar with the structure and process of heating or of furnaces (for instance, a furnace repairman) would ever arrive at this conclusion. If the analysis had been kept in the language of the intervention being made (the operation of a furnace in a home heating system), such a mistake would be virtually impossible.
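The evaluators' mistake is easy to reproduce numerically. In the sketch below, an observed house shows an almost perfect correlation between basement noise and rising temperature, yet a house given only the recording gets steadily colder. The function names, the rates, and the temperature band are hypothetical and serve only to illustrate the argument.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def observe_house(steps, furnace_installed=True, play_recording=False):
    """Record basement noise (1 or 0) and the room-temperature change at each step."""
    temp, furnace_on = 20.0, False
    noise, change = [], []
    for _ in range(steps):
        if temp <= 19.0:
            furnace_on = True
        elif temp >= 21.0:
            furnace_on = False
        heating = furnace_on and furnace_installed   # heat needs an actual furnace
        delta = 0.5 if heating else -0.3
        temp += delta
        noise.append(1 if (heating or play_recording) else 0)
        change.append(delta)
    return noise, change, temp


# What the evaluators see in a house with a working furnace.
noise, change, _ = observe_house(200)
correlation = pearson(noise, change)

# What happens when only the noise is installed.
_, _, recording_only_temp = observe_house(50, furnace_installed=False, play_recording=True)

if __name__ == "__main__":
    print("correlation of noise with temperature change:", round(correlation, 3))
    print("room temperature with recording only:", round(recording_only_temp, 1))
```

The correlation is real, but it describes a by-product of the intervention and not the intervention itself. A model written in the language of the furnace (burner, hot water, radiator) would never have treated the noise as the thing that heats the room.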
Yet real-world analogies in social programs, the economy, and various forms of regulation are upon us and cannot be avoided.
The furnace example has some analogies in the program world, and figure 2-3 is a first attempt at describing a modern governmental program in the same way.

In figure 2-3, we have replaced each domain with the analogous parts of a social program. The figure shows how a social program fits a skeleton diagram comparable to the skeleton of a home heating system. In subsequent chapters, the flesh is added to the bones. Again, only one feedback loop is shown where, in practice, many will exist.
Notes
1. Herbert A. Simon, Administrative Behavior, 2d ed. (New York: Free Press, 1957), p. 85.
2. Chester I. Barnard, The Functions of the Executive (Cambridge, Mass.: Harvard University Press, 1938).
3. See the advice to evaluators inside an agency in Pamela Horst, Joe Nay, John Scanlon, and Joseph Wholey, "Program Management and the Federal Evaluator," American Society for Public Administration, appearing in Public Administration Review, July/August 1974, pp. 300-308.
Part II Describing the Universe: Models and Measurement
During the past fifteen to twenty years, the terms models and measurement have come to mean many different things to many different people. In Part II, we describe what we mean by the terms.