Classroom Tech: the MIT-West Point Study, part 2: Interpretation

So, on May 26 I covered the MIT study of West Point economics classes (totaling 800 students) and spent most of the time correcting a number of errors about West Point, of both fact and interpretation, in Kevin Gannon’s and Jon Becker’s columns. There were also points on which I agreed with them, including the opinion that the study’s results are not as significant as most people would make them out to be. Here I want to expand on how we should interpret the results of the study, in my case from the perspective of the humanities, and specifically from that of history.

The relevant portions of the study are “Results” (pages 16-25), “Conclusions” (pages 25-28), and Tables 3, 4, and 5 (pages 33-35).

On the evening of the 26th, an econ professor at West Point gave me some valuable critiques, and in the process of digesting them (what I call “writing out loud”), I think I solidified my interpretation of this study. What follows are my replies, expanded, revised, and corrected. Bottom Line Up Front: The study supports previous research that, for people possessing a certain kind of neurobiology, handwriting notes is more effective than typing them. Yet the study did not test other kinds of learning with technology, and it showed no real link between handwriting notes and the ability to argue abstract concepts. Hence, the study’s significance is not as great as the 3-minute news version would lead us to assume.

My correspondent made two very good points that should provoke a lot of thought: 1) Military academies are excellent places to do studies of this kind, because they randomly distribute students across all sections, in contrast to regular colleges, where students mostly self-select into sections. 2) When you compare “the coefficient [of computer users] to the average score [of non-computer users],” the difference is large enough that you could compare it to the effect of hiring a tutor. So, the negative results from electronic device usage are actually significant.

So, I went back to the study’s analysis of the results, and my thoughts ran as follows:

a) I would both agree and disagree regarding academies being the “perfect place.” I agree because students are assigned randomly to classes, something that the study itself draws attention to. The study is methodologically very sound, which Gannon doesn’t get, though Becker does. Methodology matters, and they got the methodology right in a very important way.

I’d disagree in that, while I’ve already explained how West Point resembles civilian colleges in many ways, it cannot, by its very nature, replicate the diversity of students found elsewhere. You simply won’t get the same range of students as in a civilian school, whether an elite private university or a community college, Table 1 (p. 31) notwithstanding. So, there IS a risk in extrapolating from USMA to the rest of academia, which should be borne in mind, but won’t be, because people will just read the headline.

b) Are the variations in student performance “marginal” or significant? There are several ways to interpret the results.

First (what in debate we would call an “off-case kritik”), bear in mind the mechanism used to “determine outcomes”: the final exam. The benefits of standardization still apply, and the authors took great care to factor in grading variations among instructors (pages 13-16). On the other hand, there is a growing chorus of voices critiquing the final exam, especially the in-class final, as an effective learning or assessment tool. Some of those critiques are legitimate, some are wide of the mark, and others are simply trying to start a conversation; David Perry’s recent column is a decent read. (I moved away from in-class finals this year, for a variety of reasons.) So, while the authors’ choice is logical, assuming that an in-class final exam is the best way of measuring semester-long learning for all students across the board becomes somewhat problematic when transferred to a civilian college setting.

But second (on case), consider what the figures actually mean. A result expressed in standard deviations is relative and contextual, but it won’t be interpreted that way by most people who read the newspaper article. For example, readers might notice that students in laptop/tablet classrooms scored 0.23 standard deviations lower (-.23σ) on short answers, and then say “nobody did as well in short answers.” But that isn’t what the data actually say: the coefficient describes a difference in average scores, so many individual students in laptop/tablet classrooms will still have outscored peers in device-free ones. So, a more accurate way of stating the results is that you have a greater likelihood of scoring lower on short answers, but that’s hardly guaranteed.
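To put rough numbers on “greater likelihood, but hardly guaranteed,” here is a minimal sketch that translates the -0.23 short-answer figure into two plain-language probabilities. It assumes (an assumption of mine, not something the study states) that scores in both groups are roughly normal with the same spread; the function names are hypothetical and only for illustration.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def share_above_comparison_mean(d: float) -> float:
    """Fraction of the lower-scoring group (mean d SDs below the comparison
    group) that still scores above the comparison group's mean.
    Assumes normal scores with equal SDs in both groups."""
    return 1.0 - normal_cdf(d)


def prob_comparison_wins(d: float) -> float:
    """Chance that a randomly chosen comparison-group student outscores a
    randomly chosen lower-group student (the 'probability of superiority').
    The difference of two independent unit-variance normals has variance 2,
    hence the division by sqrt(2)."""
    return normal_cdf(d / sqrt(2.0))


effect = 0.23  # short-answer gap cited above, in standard deviations
print(f"Device-class students above the no-device mean: {share_above_comparison_mean(effect):.0%}")
print(f"Chance a no-device student beats a device student: {prob_comparison_wins(effect):.0%}")
```

Under those assumptions this prints roughly 41% and 56%: about two in five students in device classrooms would still beat the device-free average, and a head-to-head comparison of two random students is only a little better than a coin flip.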

Third, the study is significant in one key area. Based on the coefficients for the multiple-choice and short-answer sections, I think the study does confirm the Mueller/Oppenheimer 2014 study (an April version is available online; the latest version is paywalled at Psychological Science) in that raw data retention clearly appears to be affected by the way the data are written down. Now, you can like this or not; I don’t really care. This is something Gannon and co. find a very hard pill to swallow, for a variety of reasons, not least of which is political agenda. But a large, rigorously designed study has now provided additional evidence that, when it comes to note-taking, you tend to remember more data if you eschew keyboards.

Fourth, here is the key reason the study’s significance doesn’t extend much beyond data transcription: there was NO appreciable difference in performance on the essay portion of the exam (part D in Tables 3, 4, and 5). As in, the deviations were statistically insignificant or non-existent. Now, I’m not sure how econ final exams are weighted, but in history classes, both at West Point and in many other places I’ve worked, the essays are weighted much more heavily than the multiple-choice or short-answer sections. So if the numbers don’t lie, students with (on average) inferior data retention STILL did as well conceptually as their peers.

What does this mean? I think it can be explained in one of two ways. Either it raises questions about the rigor of the controls for instructor bias (instructors grading electronic essays may tend to give B’s where they’d give C+’s on a hard copy), or it means that the metacognitive impact of computer use is negligible, which, from a humanities standpoint, REALLY minimizes the impact of the study, especially if, as Gannon and others keep arguing, the point is concepts, not content. Personally, I think history is burdened by requiring a certain amount of content mastery, but a KEY result of the study is that computer use doesn’t appear to affect your grasp of conceptual, abstract thought. And ultimately, I think Becker is right that this study ONLY tested one kind of learning, and not the kind of learning activities that HI301 and HI302 were/are designed for (heavy computer use in collaborative activities).

The triumphant “old school” response to the study is, therefore, rather misplaced. Do I ban all technology because a study geared to one kind of tech use indicates lower performance in data retention? Perhaps instead I can broaden my pedagogy to include methods not accounted for in this study. After all, numerous studies charting other kinds of pedagogy and learning activities point to the benefits of electronic devices such as iPads, and that approach suits the way I teach. But I still take notes by hand, and the MIT-West Point study confirms that doing so is more effective than typing, if you can manage it. And yes, I share that result with my students. But with caveats, because you can’t test everything. And neither could this study.

Moving on…