The (mis?)Measurement of Academic Units

It is always interesting and surprising to me how experiences from the distant past rise in significance while I grapple with the challenges of the present.  More than thirty years ago, as a high school student, I received a paperback copy of Stephen Jay Gould’s book The Mismeasure of Man as a gift from my father.  This book, perhaps Gould’s most densely argued and quantitative offering for the general public, addresses the now largely outdated concept of measuring intelligence either through direct physiological measurements, such as craniometry, or through psychological measures, such as the intelligence quotient (IQ) test.  Gould’s argument, which received significant criticism over the years, was that efforts to establish a single measure of intelligence were in fact examples of the fallacy of misplaced concreteness.  The mathematician and philosopher Alfred North Whitehead defined misplaced concreteness as the “error of mistaking the abstract for the concrete”.

In Whitehead’s analysis of the concept of absolute position, and in Gould’s attack on notions of a racial basis of intelligence, I have begun to see parallels to the passion the American higher education enterprise has developed for the use of “metrics” as essential tools for academic administration.

Why Metrics?

Throughout the twenty-first century, American higher education has increasingly embraced quantitative analysis as an important, perhaps even essential, aspect of administrative practice.  The use of various metrics to measure academic productivity and efficiency is aligned with the demands of the most vocal university stakeholders for greater transparency and accountability.  As such, metrics are now deeply integrated into the style of academic administration known as “data-driven decision making”.

For what purposes do academic administrators calculate and analyze quantitative measures?  While nearly any aspect of university activity can be measured numerically, metrics are most commonly put to a handful of uses.  If an administrator wishes to evaluate the status of a department or program at a moment in time, metrics such as total credit hours delivered or number of majors can provide a “snapshot” view.  Variation in such metrics over time can provide a historical, or longitudinal, record of change.  Metrics can be used as part of the program review process to compare programs with peers at other universities, or to compare departments within a college or across the university.  In all cases, administrators are expected to make use of metrics when reaching decisions about resource allocation and about the even more significant question of the short- and long-term viability of programs.  Metrics are now an essential part of all strategic planning efforts as well as a standard part of the institutional response to continuing accreditation.

When used well, metrics provide valid and useful information to the academic leader.  Their use, however, is not without risk.  To guard against committing the fallacy of misplaced concreteness, it is essential to keep in mind that an academic unit is more than a collection of quantitative measurements.  Defining academic quality in terms of efficiency or productivity alone oversimplifies the purpose of an academic unit.  It is therefore appropriate to consider some of the challenges associated with the collection and analysis of quantitative academic data.

The Challenges of Data Collection and Analysis

The collection, verification, and analysis of academic data are difficult, complex, and very time-consuming tasks that require the active participation of many administrative and support functions within a university.  Universities are awash in data; student information systems integrated with enterprise-level software make it possible to assemble large data sets and to develop complex multivariate analyses.  For that very reason, it is essential that the collection of data be defined by, aligned with, and used for specific purposes.  Transforming data into usable information, and that information into knowledge, cannot be achieved haphazardly.  The notion that understanding will emerge from a mass of unstructured data is fanciful at best.

A second challenge to the collection of quantitative information is ensuring data accuracy.  In the absence of complete integration of databases it is possible, perhaps even highly likely, that significant data discrepancies will exist.  Discrepancies not only lead to inaccurate calculations and errors in analysis; even minor problems with accuracy undermine the interpretations and conclusions administrators draw from the flawed data.  These problems are compounded when comparisons are made between departments, or within a single department over time.  Can academic administrators document that the processes of data collection, including the definitions and procedures used, are consistent from department to department?  A metric as fundamental as the number of majors in an academic program becomes clouded by issues such as how double majors are counted or how individual departments deal with inactive majors.
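The double-major and inactive-major problem can be made concrete with a minimal Python sketch. The `Student` records, department names, and the `count_majors` helper are hypothetical illustrations, not drawn from any particular student information system; the point is only that the same four students can produce four different “number of majors” figures depending on the counting rule.

```python
from dataclasses import dataclass

# Hypothetical student record (illustrative only, not an actual SIS schema).
@dataclass
class Student:
    student_id: str
    majors: tuple[str, ...]  # every declared major
    active: bool             # enrolled in the current term?

def count_majors(students, department, *, split_double=False, include_inactive=True):
    """Count a department's majors under one of several counting rules.

    split_double=True credits a double major as 1/n of a head to each of n majors;
    otherwise each declared major counts as a full head.
    include_inactive=False skips students not enrolled this term.
    """
    total = 0.0
    for s in students:
        if department not in s.majors:
            continue
        if not include_inactive and not s.active:
            continue
        total += 1 / len(s.majors) if split_double else 1
    return total

roster = [
    Student("s1", ("Geology",), True),
    Student("s2", ("Geology", "Biology"), True),
    Student("s3", ("Geology",), False),
    Student("s4", ("Biology",), True),
]

print(count_majors(roster, "Geology"))                                             # 3.0
print(count_majors(roster, "Geology", split_double=True))                          # 2.5
print(count_majors(roster, "Geology", include_inactive=False))                     # 2.0
print(count_majors(roster, "Geology", split_double=True, include_inactive=False))  # 1.5
```

None of these rules is wrong in itself; the trouble arises when different departments quietly use different ones.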

Even when the available data are both complete and demonstrably valid, there remains a wide range of barriers to the accurate and meaningful analysis of metrics.  The first and most significant challenge is the need to establish a clear understanding of the relationship between the available metrics and the academic concerns under consideration.  When used well, academic metrics allow administrative hypotheses to be tested.  Consider, for instance, the hypothesis that “student progress towards graduation is better in department X than in department Y.”  A variety of metrics could be used to test this hypothesis.  Yet even when the data are of high quality, and the metrics used are applicable to the hypothesis, there remain dangers of interpretation.  It would be foolish indeed to conclude that the faculty in department X were better, more concerned, more dedicated teachers than those in Y solely on the basis of student progress towards graduation.  The interrelated processes of teaching and learning are far too complex for such a simple interpretation.

A concern commonly held by faculty is that metrics are not used to test hypotheses but are instead developed, measured, and analyzed by administrators to support a predetermined position or conclusion.  Because of the wide variety of measurements that can be made of academic programs and departments, it is important to ensure there is a clear relationship between the metrics considered, the mission of the unit, and the larger goals of the college or university.  Few departments will be strongly positioned with respect to every metric an administrator could consider.  It is therefore essential to maintain an open dialogue between faculty and administrators as metrics are identified and analyzed.  By being clear from the beginning about the administrative questions to be asked, the underlying reasons for those questions, and the metrics best suited to answering them, much of the resistance to the use of metrics can be addressed, and the fear and distrust that can develop between faculty and administrators can be avoided.

The Use of Direct Metrics and Metric Ratios

It is now common practice for administrators to identify academic metrics, measure them, and then combine those measurements with other metrics to produce metric ratios.  The simplest approach to the analysis of academic programs, however, is to use directly measured values.  Direct metrics such as number of majors, total credit hours taught, or number of degrees awarded are both easy to calculate and easy to understand.  The lack of ambiguity in these metrics leads to confidence in their interpretation.  There is a risk, however, that relying on direct metrics alone will overvalue unit size.  Is a larger department in any way “better” than a smaller one simply because it has more majors or generates more credit hours?  Direct metrics can suggest that bigger is always better, and this is certainly not true; quality matters a great deal.

Metric ratios such as student credit hours per faculty full-time equivalent (FTE) are very useful in efforts to evaluate academic efficiency or productivity.  Because a ratio is calculated from the values of two metrics, however, interpreting changes in it over time can present challenges.  Even for a simple ratio such as degrees granted divided by number of majors in a program, changes over time can be difficult to interpret, because there is a necessary lag between an increase or decrease in new majors and its reflection in the number of degrees awarded.  Making administrative decisions based upon changes in a ratio metric, without understanding the origin of changes in the direct metrics used to calculate it, could result in actions that are ultimately detrimental to the department and contrary to the best interests of the college or university.  While ratios are highly useful for comparing departments and well suited to assessing programmatic efficiency or productivity, interpreting them without a complete understanding of the origin of observed changes can be misleading.
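The lag can be illustrated with a small Python sketch using made-up yearly counts for a single program. In this simplified, hypothetical scenario the number of majors rises from 100 to 140 in year 3, and the degree count does not rise until year 7. The degrees-to-majors ratio drops in year 3 and stays low through year 6, even though nothing about how the department graduates its students has changed.

```python
# Illustrative yearly counts for one program (made-up numbers, not real data).
# Majors rise from 100 to 140 in year 3; degrees do not rise until year 7,
# when the larger group of students begins to reach graduation.
majors = [100, 100, 140, 140, 140, 140, 140]
degrees = [25, 25, 25, 25, 25, 25, 35]

def ratio_series(numerator, denominator):
    """Year-by-year ratio of two direct metrics, rounded to three places."""
    return [round(n / d, 3) for n, d in zip(numerator, denominator)]

for year, (m, d, r) in enumerate(zip(majors, degrees, ratio_series(degrees, majors)), start=1):
    print(f"year {year}: majors={m} degrees={d} degrees/majors={r}")
```

Looking at the ratio alone, years 3 through 6 would appear to show declining programmatic health; looking at the two direct metrics shows a program that has simply grown.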

The Most Useful Metrics

As described above, a wide variety of metrics can be collected and used in the analysis of academic programs.  Those that are most useful and significant vary from department to department, from college to college, and from university to university.  Likewise, no single set of metrics is ideally suited for all administrative needs.  As the goals and objectives of an institution change over time, the metrics most vital to its success will naturally change as well.  Nonetheless, I believe there is a group of metrics that are of broad general utility and of particular value at the present time to my institution, Indiana University-Purdue University Fort Wayne (IPFW).  Below I list and briefly describe the academic metrics I believe are most significant.

Number of Students Graduating Each Year – Nationally, regionally, and locally, critics of higher education increasingly stress the economic significance of improving college completion.  College completion is typically reported as four- or six-year graduation rates, and improving it is a primary concern for IPFW as well as for all but the most selective and elite institutions of higher learning.  However, I feel strongly that the ratio metric graduation rate (expressed as the percentage of first-time, full-time students completing their degree at a single institution in four or six years) is ill-suited to measuring success in an era that supports and encourages student mobility.  Rather, the direct measure of the number of students graduating each year is the best metric of student success and is, I strongly believe, more closely aligned with the goal of economic impact stressed by many external stakeholders.
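The difference between the two measures can be sketched in Python with a handful of hypothetical student records; the fields and function names below are illustrative assumptions, not an official reporting definition. The cohort rate counts only first-time, full-time students who graduate from this institution, so transfer and part-time students who finish here never appear in it, while the direct count includes everyone who earns a degree.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical student record (illustrative only).
@dataclass
class Student:
    entry_year: int
    first_time_full_time: bool      # entered as a first-time, full-time freshman
    graduation_year: Optional[int]  # None if no degree from this institution

def six_year_rate(students, cohort_year):
    """Percentage of a first-time, full-time entering cohort that graduated here within six years."""
    cohort = [s for s in students if s.first_time_full_time and s.entry_year == cohort_year]
    if not cohort:
        raise ValueError("empty cohort")
    finished = [s for s in cohort
                if s.graduation_year is not None and s.graduation_year - s.entry_year <= 6]
    return 100 * len(finished) / len(cohort)

def graduates_in_year(students, year):
    """Direct count of everyone who earned a degree here in a given year."""
    return sum(1 for s in students if s.graduation_year == year)

records = [
    Student(2010, True, 2014),
    Student(2010, True, 2016),
    Student(2010, True, None),   # transferred out; any degree earned elsewhere is invisible here
    Student(2010, True, None),   # stopped out
    Student(2012, False, 2016),  # transfer student, never part of a first-time, full-time cohort
    Student(2013, False, 2016),  # part-time returning student
]

print(six_year_rate(records, 2010))      # 50.0
print(graduates_in_year(records, 2016))  # 3
```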

Ratio of Graduating Students to Number of Majors – This metric was introduced briefly above and, like all ratio metrics, must be considered within the context of variation in both the numerator (number of graduates) and the denominator (number of majors).  Despite the inherent challenges of interpreting a ratio metric, I believe this is a useful general measure of programmatic health and is well suited as a basis for comparisons between various departments and programs.  Interestingly, the metric can be inverted to provide a rough measure of the average time to graduation for majors in that program.
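One way to read the inversion is through Little’s law from queueing theory: in a steady state, the number of students in a system equals the rate at which they leave it multiplied by the average time each spends there. A minimal Python sketch under those assumptions, with a hypothetical helper name and invented numbers:

```python
def estimated_years_to_degree(majors, graduates_per_year):
    """Invert the graduates-to-majors ratio into a rough time-to-degree estimate.

    Assumes a steady state (the number of majors is roughly constant) and that
    students leave the major only by graduating. If students also leave without
    a degree, this figure overstates the average time spent in the major.
    """
    if graduates_per_year <= 0:
        raise ValueError("need at least one graduate per year to invert the ratio")
    return majors / graduates_per_year

# Hypothetical program: 200 majors, 40 graduates per year.
print(estimated_years_to_degree(200, 40))  # 5.0
```

The steady-state caveat is the same one raised above for ratio metrics generally: in a program whose enrollment is growing or shrinking, the inverted figure mixes changes in time to degree with changes in enrollment.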

Total Credit Hours Taught in an Academic Term – This direct metric is a measure of gross revenue generation.  While this metric might be considered to be too closely aligned with the so-called neo-liberal agenda that is gaining influence in higher education, it must be remembered that nearly all colleges and universities are highly dependent on tuition revenue.  As state support for public higher education has declined over the past several decades institutions such as IPFW have become increasingly dependent on the cash flow provided by tuition revenue.  While variation among departments and colleges is to be expected at a comprehensive university, tuition revenue is of such paramount importance that this critical metric cannot be discounted.

Credit Hours Taken by Majors – While this metric might be thought to be closely aligned with Total Credit Hours Taught, it is in fact significantly different for many programs.  Professional programs with highly structured or cohort-based curricula typically make limited contributions to the general education program of their institution.  Conversely, their majors take a significant number of credit hours outside of the department in addition to the courses they take within it.  The small size of some highly successful programs might make them targets for reduction or elimination during periods of financial crisis.  Failing to consider the total credit hour contributions of majors in such a department would undervalue the significance of the program to the economic health of the university.
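The distinction can be sketched in Python by attributing the same enrollment rows in two different ways. The rows, program names, and function names are hypothetical; the sketch assumes each row records the student’s major, the department teaching the course, and the credit hours.

```python
from collections import defaultdict

# Hypothetical enrollment rows: (student's major, department teaching the course, credit hours).
enrollments = [
    ("Nursing", "Nursing", 4),
    ("Nursing", "Biology", 4),
    ("Nursing", "English", 3),
    ("Nursing", "Psychology", 3),
    ("Nursing", "History", 3),
    ("History", "History", 3),
    ("Biology", "History", 3),
]

def credit_hours_taught(rows):
    """Credit hours attributed to the department that teaches each course."""
    totals = defaultdict(int)
    for _major, teaching_dept, hours in rows:
        totals[teaching_dept] += hours
    return dict(totals)

def credit_hours_taken_by_majors(rows):
    """Credit hours attributed to each student's major department, wherever the course is taught."""
    totals = defaultdict(int)
    for major, _teaching_dept, hours in rows:
        totals[major] += hours
    return dict(totals)

print(credit_hours_taught(enrollments)["Nursing"])           # 4
print(credit_hours_taken_by_majors(enrollments)["Nursing"])  # 17
```

In this invented example the professional program teaches few credit hours itself, yet its majors account for most of the credit hours in the data set.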

Tenure Stream Instructional FTE Divided by Number of Faculty – This metric provides information on the relative efficiency of a department’s use of its tenured and tenure track faculty towards the core departmental mission of student instruction.  If we accept the broadly held assumption that students at every level of instruction benefit from interaction with full time faculty, it is entirely reasonable to evaluate how departments are allocating the workload of their faculty.  At IPFW there is a tendency for faculty workload to gradually shift from the essential tasks of teaching and research to a wide variety of useful but perhaps non-essential administrative tasks.  This metric provides a way to track changes in workload over time, to make useful comparisons between departments, and to evaluate the impact of administrative reassignments on instructional capacity.
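A minimal Python sketch of this ratio, assuming (hypothetically) that each tenure-stream faculty member’s instructional FTE is recorded as the fraction of a full teaching load they carry after any administrative reassignment. The names and fractions are invented for illustration.

```python
# Hypothetical department: fraction of each tenure-stream faculty member's
# full teaching load actually assigned to instruction (1.0 = full load).
instructional_fraction = {
    "Prof. A": 1.0,
    "Prof. B": 0.75,  # one course released for a committee assignment
    "Prof. C": 0.5,   # half-time administrative reassignment
    "Prof. D": 1.0,
}

def instructional_fte_per_faculty(fractions):
    """Total tenure-stream instructional FTE divided by tenure-stream headcount."""
    if not fractions:
        raise ValueError("no faculty listed")
    return sum(fractions.values()) / len(fractions)

print(instructional_fte_per_faculty(instructional_fraction))  # 0.8125
```

Tracked year over year, a falling value signals that reassignments are drawing instructional capacity away from the classroom, which is the trend described above.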

Percentage of Majors Graduating or Retained from One Year to the Next – Unlike the ratio metrics of graduation rate or graduates divided by number of majors, this metric provides an annual measure of student success.  In the ideal scenario this metric would be 100%.  Of course, students change majors, they transfer from one institution to another, and they drop out or stop out for countless personal and financial reasons.  Some students are also academically unsuccessful and are dismissed from the university.  The fraction of students who are successful, that is, who either graduate or persist from one academic year to the next, is perhaps the best measure of student success.  While it must be recognized that students are often not retained for reasons that fall far beyond a department’s ability to influence, this metric can serve as the basis of comparison and the starting point for positive discussions of the sources of, and barriers to, student success.
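A minimal Python sketch of the annual calculation, with invented counts and a hypothetical helper name. It assumes “retained” means re-enrolled in the same major the following fall.

```python
def graduated_or_retained_rate(majors_at_start, graduated, retained):
    """Percentage of a year's majors who graduated during the year or
    re-enrolled in the same major the following year."""
    if majors_at_start <= 0:
        raise ValueError("majors_at_start must be positive")
    if graduated + retained > majors_at_start:
        raise ValueError("graduated + retained cannot exceed the starting count")
    return 100 * (graduated + retained) / majors_at_start

# Hypothetical department: 120 majors in the fall, 25 graduated during the year,
# 78 returned to the major the next fall (the other 17 changed majors,
# transferred, stopped out, or were dismissed).
rate = graduated_or_retained_rate(120, 25, 78)
print(f"{rate:.1f}%")  # 85.8%
```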

Student Success in Key Courses – This metric narrows the administrative focus from the department or degree program to the most granular level of review, the individual course.  Student success should not, I believe, be strictly interpreted as the distribution of grades earned.  Rather, success should be understood in terms of those grades that allow progress towards graduation (A, B, and C) and those that do not (D, F, and W).  Placing too much emphasis on grades earned can lead to the undesirable outcome of inadvertent or purposeful grade inflation.  Administrators must recognize the vulnerability junior faculty may feel with regard to institutional efforts to improve student success.  For this reason, I believe the percentages of students who withdraw from a course, or who earn a grade of F because they fail to complete it, are more useful measures.  The categories of courses for which this metric should be calculated include (but need not be limited to) high-enrolling general education courses, courses that serve as gateways or pinch-points within the major, and capstone or other critical senior-level courses.
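A minimal Python sketch computing both the W-or-F percentage and the broader D/F/W percentage for a single section. It assumes plain letter grades without plus/minus variants and treats every F as a non-completion, which is a simplification: separating an F earned for failing to complete the course would require data not shown here. The section data are invented.

```python
from collections import Counter

def grade_outcome_rates(grades):
    """Return (percent W or F, percent D, F, or W) for one section's final grades."""
    n = len(grades)
    if n == 0:
        raise ValueError("no grades recorded")
    counts = Counter(grades)  # missing letters count as zero
    w_or_f = counts["W"] + counts["F"]
    dfw = w_or_f + counts["D"]
    return 100 * w_or_f / n, 100 * dfw / n

# Hypothetical section of 20 students.
section = ["A"] * 4 + ["B"] * 6 + ["C"] * 5 + ["D"] * 1 + ["F"] * 2 + ["W"] * 2
wf_pct, dfw_pct = grade_outcome_rates(section)
print(wf_pct, dfw_pct)  # 20.0 25.0
```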

Percentage of Credit Hours by Mode of Delivery – The landscape of higher education is vastly more complex and more competitive than it was ten years ago.  This metric allows for evaluation of departmental responses to some of those environmental challenges.  At IPFW the primary modes of instructional delivery are traditional face-to-face lab and lecture courses; distance and distributed courses taught off site, online, or through some hybrid arrangement; and the significantly expanded program of school-based instruction through dual credit high school and college courses.  While the appropriate or desirable distribution of departmental instruction across these three modes will necessarily vary from program to program, it is critical that administrators have at hand data on both the current distribution and historical trends in the modes of instructional delivery.
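A minimal Python sketch of the calculation, assuming hypothetical section-level records that tag each section with one of the three delivery modes and the credit hours it generated. The mode labels and numbers are invented.

```python
from collections import Counter

# Hypothetical section records for one department: (delivery mode, credit hours generated).
sections = [
    ("face-to-face", 450),
    ("face-to-face", 300),
    ("distance/distributed", 150),
    ("dual credit", 100),
]

def credit_hour_share_by_mode(rows):
    """Percentage of total credit hours delivered in each mode."""
    totals = Counter()
    for mode, hours in rows:
        totals[mode] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {mode: 100 * hours / grand_total for mode, hours in totals.items()}

print(credit_hour_share_by_mode(sections))
# {'face-to-face': 75.0, 'distance/distributed': 15.0, 'dual credit': 10.0}
```

Running the same function on each past term’s records gives the historical trend alongside the current distribution.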

Cautionary Thoughts

To summarize this discussion of academic metrics, it is worth returning to Whitehead and Gould.  The use of metrics in academic administrative decision making can provide important insights and valuable understanding of institutional complexity.  There exists a very real risk, however, that we can fall victim to the fallacy of misplaced concreteness when we become overly confident in our ability to interpret the meaning of academic metrics or overly reliant upon quantitative data in the exercise of administrative authority.

To avoid those errors, it is useful to keep in mind several cautionary considerations related to the collection, analysis, and interpretation of academic metrics.  First, no single metric is best suited to the analysis of all departments.  In a comprehensive university, or even within the curricular complexity of a large college of Arts & Sciences, the intrinsic strengths and weaknesses of each department and program must be evaluated independently.  Force-fitting all departments to a single measurement methodology will lead to significant misunderstanding of the programs under consideration as well as elevated levels of dissatisfaction and distrust among faculty.  Second, metrics provide data, not knowledge.  Moving beyond raw data to an understanding based on the interpretation of numerical data requires careful consideration informed by experience and by qualitative understanding of the programs under review.  Third, metrics should be used to frame a policy discussion not to validate a policy decision previously reached.  Similarly, the collection and analysis of quantitative academic data must include opportunities for vetting of the raw data, for responses to the methodology of analysis, and a period for remonstrance against the conclusions reached by administrators based upon those metrics.  Failure to follow these steps will naturally result in a lack of understanding and acceptance of the policy decisions that follow.  Finally, after going through the complex and difficult process of establishing a comprehensive set of academic metrics, resource allocation must be tied directly to unit performance as measured by those metrics.

Sapere Aude ~


6 thoughts on “The (mis?)Measurement of Academic Units”

  1. Dean Drummond’s observations are welcome. As a scientist in a data-driven discipline I find quantitative analysis of any problem desirable and necessary. I am especially appreciative of the Dean’s cautionary notes, and would like to expand upon these thoughts.

    “First, no single metric is best suited to the analysis of all departments.” Departments have an organic, living quality that cannot always be captured in numbers. Each has a unique set of problems and challenges. Without intimate knowledge of these challenges it is impossible to fully utilize the numbers that seem to be so easy to generate in the current environment. There is also a fatiguing quality to the metrics-producing environment prevalent on campus. These numbers have high-consequence implications, but it’s not often clear how to respond to make the numbers better. When jobs are on the line, the digestion of so much data can become stressful, and the data can sometimes be allowed to settle into the background noise of our lives, if for no other reason than to preserve our sanity. Too much data isn’t always a good thing.

    “Second, metrics provide data, not knowledge.” It seems at times we create numbers just because we can. Enough numbers will eventually produce a correlation, whether or not related to a causation. It’s easy to fall into the trap where reams of data become a substitute for thinking. Metrics also fail to capture the nuances that help define the campus, and don’t necessarily provide guidance in how to solve the root causes of problems. As an example, a 100-level geology section of a general education course taught hybrid with a 4:30 lab is enrolled with only 8 students this semester. Why? The metrics tell me the section is underenrolled, but why? Is it, as I suspect, due to the changing activity patterns on campus in the afternoon? How do I test that? If I cancel this section in the future I remove a low-enrolling course (and have therefore done what the administration expects me to do), but we have not solved or even addressed a basic issue that may be causing other problems for the campus. That’s how we get into a spiral, and that to me is frustrating in the extreme.

    “Third, metrics should be used to frame a policy discussion not to validate a policy decision previously reached…. must include opportunities for vetting of the raw data, for responses to the methodology…”. This week I was asked to justify low enrollments in seven geology sections. One was the aforementioned 100-level hybrid section, one was a very specialized upper-level section with 10 students, and the remaining courses were parts of 3 pairs of cross-listed sections, taught in the same room, at the same time, by the same prof, and were not, in fact, even close to being under-enrolled. While it was easy to explain this non-problem, it was quite disturbing that somewhere in the administrative chain someone looked at Geo Department enrollments and came to the erroneous conclusion that these sections were underenrolled. It shouldn’t be that hard to recognize cross-listed sections, yet this simple nuance befuddled the bosses. It makes me wonder what else is being wrongly evaluated. While I appreciate that I had a chance to correct the record, the fact is that some things shouldn’t need correction. I worry about what else is being misevaluated, and it lessens my faith in our administrators to get to the real problems of the campus.

    Three additional points, the first made in an email I sent earlier this week.

    1. It seems we define ‘bad’ in the simplest of terms, as ‘smaller class’ or ‘smaller program’. These are symptoms, not problems. Perhaps there is value in a more anecdotal approach. Instead of raw numbers, do we ever just observe the way the environment around us works? I see faculty, staff and most importantly students who struggle with new systems for billing, purchasing, time cards, travel, finding courses in a catalog, finding schedules, creating schedules, documenting financial aid, documenting work study, evaluating progression towards a major, etc. It’s overwhelming and must have an impact on productivity at all levels. These are not captured in the normal set of metrics.

    2. Metrics sometimes seem to be an attempt by central administrators to find micro-scale (course-level) fixes, as a response to symptoms (enrollments), caused by a macro-scale disease. If the only actionable response to metrics is to cut ‘underperforming’ courses and programs without addressing more general, campus-wide problems, then we will surely spiral into a very bad landing.

    3. Is it the job of students to determine the curricular offerings of a university, or is it the job of the university to guide the curricular and programmatic priorities of the campus and our students? Sales volume seems a good way to set the buggy-whip production for any given year, but is sales volume a good way to determine the types of courses and programs we offer at IPFW? A metrics-driven program of evaluation closely coupled with a course- and program-cutting response (which is where we seem to be heading) is a sales-oriented model and cedes fundamental programmatic control to our students. I don’t argue with the notion that demand should have an influence on the flow of resources, but I loudly protest the notion that student demand should be the only criterion brought to bear.

  2. I have to disagree with the idea that there are some programs that cannot teach to all students on the campus. I think that idea has led to some bunker/silo mentality. I would contend that it is critical that programs have some significant degree of openness to all students on campus. Failing to do so means not being part of the campus community.

  3. I would like to add two additional comments to my list of three observations.

    4. It sometimes seems departments (and presumably colleges) are evaluated only by the metric that reflects the greatest weakness of the unit. While we all want to identify and strengthen our weaknesses, I think it fair to assume that everyone also wants to be evaluated by the totality of their contributions to the University. Metrics should not be perceived as just the sharp end of a sword.

    5. It’s easy to use metrics to identify perceived weaknesses on campus and create numerical goals for their improvement. Last year, six-year graduation rates were of the highest priority, and academic units adopted policies that reflected this. Since resources of money and faculty effort are limited, shifting priorities probably weakened other aspects of operations. This year, the six-year rate seems to have disappeared as a high priority, replaced by other metric-inspired goals. I don’t think we solved the six-year problem and can only assume that we’ve shifted priority because some other number now looks worse. We need a plan supported by metrics to move forward, not a piecemeal reaction to some number or other that seems today to look worse than it should.

    • I agree with your points #4 and #5 entirely! I presently do not see where the first three observations are, but I will look for them, too.

  4. In reading this essay, I felt understood, a feeling that I rarely have these days. I did not expect to find my own sentiments/frustrations in an essay on (mis?)measuring academic units, but there they were. It was surprising that my feelings were presented as something many other people also may be experiencing at the moment. Given my own individualistic fantasies, I am caught off guard whenever I learn that I am taking part in a collective experience. Actually, the discovery of this collective experience is heartening, but I wish that it did not have to come from an essay but, rather, from a realization (in the moment!) that we are all working together.

    It seems that we at IPFW presently have the opportunity to begin an exchange of ideas on how to create a nurturing environment for our students. No doubt, the creation of such an environment will require/inspire a lot of collective effort, but, if results are our top priority, I must wonder why more of us are not actively seeking solutions together right now?

    Today I watched a TED talk by Ken Robinson (“How to Escape Education’s Death Valley”), and it has given me much to consider. It is only about 20 minutes long, but it suggests solutions that jibe with what I suspect are my own convictions developed out of experiences I have had in the classroom and in educational environments inside and outside of the United States. It strikes me as worth sharing here, because it matches the mood of this particular essay on (mis?)measuring academic units.

  5. I read the essay “The (mis?)Measurement of Academic Units” somehow as a forced response to the recent changes in the external funding formula, which changed the rules of the game; as a result, IPFW has become #2 from the bottom, an unflattering rank for us! In response to these critical changes that significantly reshape our institution’s financial present and future, some internal strategic planning and adjustments seem unavoidable, whether one likes it or not, if we, as an institution, want to survive and move up from #2.

    I totally agree on the limitations of applying any single method to program review for all. It seems to me that a combination of multiple methods (concrete, abstract, objective, quasi-objective, subjective, quasi-subjective, affective, performative) will more accurately assess academic programs. One thing that has often puzzled me is that we rarely take into account the quality of students’ educational experiences and the quality of faculty’s work experiences. Are they valid indicators of the strengths and weaknesses of academic programs? I certainly think so. How do we measure them? Some of the proposed new metrics seem useful for this purpose: student retention, graduation rate, student success, faculty retention, faculty success/productivity, etc. Or even an account by students/faculty describing their educational/work experiences. That is to say, Whitehead’s critique of misplaced concreteness could be taken further by incorporating his process theory, Henri Poincaré’s non-deterministic theory that admits the lack of uniformity in reality, and Abraham Maslow’s humanistic theory of education that echoes Whitehead’s process theory.

    If the quality of an academic program is a key parameter in program review, as you persuasively argue, it only makes sense to also argue for aligning quality with resource (re)allocation when resources are rapidly shrinking due to a new external funding formula. Otherwise the “fallacy” of privileging quantity over quality in program review (and faculty review) will remain uncorrected, and we’ll be stuck at #2 for a long time. Some kind of internal shift in the funding formula seems inevitable, pressed, unfortunately, by the altered rules of the game, in the transition from a quantity-driven review frame to a quality-based review structure, as you logically argue.
