How much testing is too much testing?

We have built our entire education system on standardized testing, so it would seem that we should have most of the details of testing figured out. However (and without going into too many of the messy details), there’s a lot we just don’t quite understand. In particular, I can’t find any research beyond basic surveys that tries to assess what the optimal testing schedule is. The best I can find is some pretty vague surveys (summarized in this article) that basically just say that teachers, students, and parents think there is too much testing. To be honest, that tells us very little about how much testing would actually be optimal.

For instance, my school network gives a quarterly ACT along with semester-end midterms. By the end of the year the students will go through about one full month of dedicated testing: one week for state testing, two weeks for end-of-semester testing, and four full days of ACT testing. That is on top of 90-minute tests about every 5 weeks in each of 6 different courses. We take the scores very seriously and we adjust our classes accordingly. This seems better than a lot of the lazy alternatives that we see in traditional public schools, but it feels like it might be too much.
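To make that tally concrete, here is a rough back-of-the-envelope sketch. The testing figures come from the paragraph above; the 5-day school week and the 36-week school year are my own assumptions, so treat the totals as approximate.

```python
# Figures taken from the post.
STATE_TESTING_DAYS = 5       # "one week", assuming a 5-day school week
SEMESTER_EXAM_DAYS = 10      # "two weeks", assuming a 5-day school week
ACT_DAYS = 4                 # four full days of ACT testing
INTERIM_TEST_MINUTES = 90    # length of each interim test
INTERIM_INTERVAL_WEEKS = 5   # one round of interim tests about every 5 weeks
COURSES = 6                  # courses that each give an interim test

# Assumption, not from the post: length of the school year in weeks.
SCHOOL_WEEKS_PER_YEAR = 36


def dedicated_testing_days():
    """Full school days given over to testing each year."""
    return STATE_TESTING_DAYS + SEMESTER_EXAM_DAYS + ACT_DAYS


def interim_testing_hours(school_weeks=SCHOOL_WEEKS_PER_YEAR):
    """Hours per student spent on the 90-minute interim tests each year."""
    rounds = school_weeks // INTERIM_INTERVAL_WEEKS
    return rounds * COURSES * INTERIM_TEST_MINUTES / 60


if __name__ == "__main__":
    print("Dedicated testing days:", dedicated_testing_days())
    print("Interim testing hours:", interim_testing_hours())
```

Under those assumptions the dedicated testing comes to 19 school days, which is roughly a month of school, and the interim tests add about 63 more hours per student.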

I feel like it can be very easy to overfit and overreact to the data. Figuratively, you can think of our classrooms as self-correcting models based on student data: we get new data and adjust our models accordingly. But when we test so frequently that student testing fatigue may start to set in, the test scores are likely to become increasingly poor indicators of student understanding. Integrating this new information may then not even be useful for revising the classes; it might just be an overreaction to faulty measurements. On top of all of that, teachers are taking in so much new information that we are unlikely to be able to actually integrate it all into our classrooms.
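One way to picture the worry is with the standard reliability ratio from classical test theory: the share of score variance that reflects true differences in understanding rather than measurement noise. The sketch below is purely illustrative. The linear fatigue model and every number in it are hypothetical assumptions of mine, not measurements from any school.

```python
def reliability(true_sd, noise_sd):
    """Share of observed score variance due to true understanding.

    Classical test theory ratio: var(true) / (var(true) + var(noise)).
    """
    return true_sd**2 / (true_sd**2 + noise_sd**2)


def noise_sd_with_fatigue(base_noise_sd, tests_per_year, fatigue_per_test):
    """Hypothetical fatigue model: noise grows linearly with test count."""
    return base_noise_sd * (1 + fatigue_per_test * tests_per_year)


if __name__ == "__main__":
    # All values below are made up for illustration only.
    true_sd = 10.0
    base_noise_sd = 5.0
    fatigue_per_test = 0.05
    for tests_per_year in (1, 4, 12, 36):
        noise = noise_sd_with_fatigue(base_noise_sd, tests_per_year, fatigue_per_test)
        print(tests_per_year, round(reliability(true_sd, noise), 3))
```

If fatigue really does add noise in something like this way, each extra test carries less usable signal than the one before it, which is exactly when adjusting a class to the latest scores starts to look like overfitting.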

In sum, there seems to be a real point of diminishing (or maybe even negative) returns from standardized testing, but we don’t really know where that point is. So, if I am missing the research on this, please comment with links, or share your ideas on how this could be measured with current data.

Edit (7 Feb. 2020):

A comment rightly pointed out that there won’t be one optimal testing schedule for all situations. That’s almost certainly right. Even if there were a literature on this that I just missed, it would likely be a bad reading of that literature to conclude that our current educational systems share some essential characteristics that necessitate one testing schedule to rule them all.

However, I don’t think we should give up on the question just because it doesn’t have one clear answer. Instead, it’s probably best to think about it in terms of a general range of testing intervals that we should try to stay within. Intuitively, at least, testing students only once every 5 years is far too infrequent and testing them every week is far too frequent, so where do we meet in the middle?

I’m still unsure of how to test this. One thought is to compare schools with different testing intervals and see whether the interval is associated with differences in student performance. There are a lot of obvious issues with this. No schools are likely to organically use a super high-frequency testing cycle, so you’d likely only be able to compare schools that test quarterly, semesterly, or yearly. Maybe you could look at total testing time instead, but I’m most concerned about how quickly schools can and should integrate testing data into the classroom, and most schools don’t do this at all. On top of all this, it would be unbelievably difficult to control for all the confounding variables across schools that could otherwise explain the results. This doesn’t seem like a question that observational studies alone could answer well.

One possibility is that you could convince a large charter network like BASIS, KIPP, Achievement First, or Uncommon to pair up with some economists to create an A/B test across the schools in their network. Some schools would test and iterate yearly, some semesterly, some quarterly, and some maybe even monthly. After some reasonable period you could come back, see which interval was most effective, and implement it across the network. However, schools don’t like to think of themselves as a research treatment group, and the study would likely have to run for longer than a year. It’d likely be very difficult to implement, and that’s all before considering the controls required to test this effectively.
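The assignment step of such an experiment could be sketched roughly as follows. This only shows the randomization: schools are shuffled and then dealt out across the four testing cadences so the groups come out about the same size. The school names, the function name, and the fixed seed are hypothetical, and a real study would almost certainly want to stratify by things like school size or prior performance first.

```python
import random

# The four cadences floated in the post.
CADENCES = ["yearly", "semesterly", "quarterly", "monthly"]


def assign_cadences(schools, seed=0):
    """Randomly assign each school to a cadence in near-equal groups.

    Hypothetical helper for illustration; the seed only makes the
    assignment reproducible so it can be audited later.
    """
    rng = random.Random(seed)
    shuffled = list(schools)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    # Deal the shuffled schools out round-robin across the cadences.
    return {
        school: CADENCES[i % len(CADENCES)]
        for i, school in enumerate(shuffled)
    }


if __name__ == "__main__":
    # Placeholder school names, not real campuses.
    schools = [f"School {i}" for i in range(1, 9)]
    for school, cadence in sorted(assign_cadences(schools).items()):
        print(school, "->", cadence)
```

Randomizing which schools get which cadence is what would let you attribute differences in outcomes to the testing interval rather than to whatever led a school to choose its schedule in the first place.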

originally at
