5 Effective Ways to Calibrate When Rating Student Work

Five practical and effective ways to help your team calibrate when rating student work using rubrics or success criteria.

Are you an instructional leader, department chair, or lead teacher trying to support more consistent assessment practices among your team–and specifically, more consistency in how student work is evaluated, scored, or rated? This post is for you. Before we jump in, it’s worth stating: Subjectivity is par for the course. Let’s name it. Every human brings their own perspectives, experiences, biases, and understandings to the table. That said, we can mitigate bias and benefit from the best forms of subjectivity–multiple, diverse perspectives–when we do these three things:
  1. Establish stable criteria. With high-quality assessment tools or rubrics in hand, we can use stable, consistent indicators as a reference for our rating and, even more importantly, to guide feedback to learners as a way to help them grow. This is the beauty of competency-based skill progressions, because the indicators can support learning design, assessment design, and rapid rubric development.
  2. Ensure multiple perspectives. Don’t go it alone. Invite other views to support, challenge, and inform your decisions.
  3. Create structured space and time for deliberate practice. We need practice to get better. We need time and space, dedicated to calibration, in order to improve and align as a community.
Hold these three things in mind as you review these five ways to help your team calibrate when rating student work. And we know you’re busy! So we’ve ordered these by complexity and lift. You might use one or more of these to help plan an upcoming professional development workshop or series, or you might introduce these as key methods across your faculty for calibrating when scoring student work samples.

1. Two Views

“Two Views” is the practice of ensuring that, before a final decision is made, another person has the opportunity to review the work and provide input on the decision. Importantly, this should be someone with relevant knowledge and, ideally, a different point of view from yours. In practice? Partner with another teacher or coach and have them periodically, or frequently, rate the same piece of work, or part of a piece of work, that you are rating. Treat any variances as helpful signals that you may need to look more closely at the work to ensure you can substantiate your rating. Short on time? Choose the particular dimension of analysis or assessment that you feel most unsure of, and present your request for help in an open-ended (don’t lead the witness!) but targeted way.
“I’m unsure which level on the skill progression Liza meets, based on how she has introduced her argument in paragraph 1. Can you try rating this for me, and share your rating and rationale?”

2. Team Calibration Protocol

Practice, practice, practice. Another important way to calibrate when rating student work is simply to create recurring opportunities to practice. Bring your team together, provide work samples or ask teachers to bring their own, and structure the session with a step-by-step protocol like this one.

3. Modeling & Debrief (Fishbowl)

Modeling and Debrief is much like a “Fishbowl” activity: Have one or several teachers who have demonstrated strength in rating student work model the process. Their job is to “think out loud” as they review the student work sample, reference the scoring criteria, discuss their ratings and rationale, and make their determination. Tips? Have them do it twice. Choose a piece of work that is fairly straightforward in scoring, and then choose another that seems more difficult to rate. Give everyone, including observers, the time to review the work and practice rating it on their own. Then, ensure sufficient time for the teachers in the Fishbowl to engage in meaningful discussion about the work, and model norms of appreciative inquiry (stay curious, stay positive) and evidence-based decision-making. Finally, debrief the experience with the full faculty, using such guiding questions as:
  • What stood out to you about the discussion?
  • What did you agree with and why? What did you disagree with and why?
  • What did you appreciate about the process or the discussion?
  • What is one new insight or take-away from this experience?

4. Survey to Sample

Remember your Probability and Statistics class? When it’s not feasible to survey every person in a population, we use a sample–a smaller, representative part–to gather data. By analyzing the sample, we can generalize or make predictions about the entire population. This is that logic, applied to calibrating faculty on how to rate student work using a stable set of criteria or indicators. Instead of evaluating every single piece of scored work, we’ll have faculty score one piece of work as a “sample” to better understand how aligned the team is, and to identify areas for further learning and calibration. Here’s how it works:
  1. Choose a sample piece of student work. Choose something you’d like everyone in your sample to rate, using a provided assessment tool, such as a rubric. As a reminder, in a competency-based system, our rubrics use indicators taken directly from our competency-based skill progressions.
  2. Create a survey form. The form should allow teachers to input their ratings, and their rationale, for each dimension of assessment. In a competency-based model, for example, you might say, “Please score skills 1, 2, and 3 for the competency, Express Ideas.” You might provide a drop-down for the numerical rating and an open text field for the explanatory rationale for each rating. Keep the survey simple and short so it won’t be time-intensive for faculty to complete. You might add a question like, “What questions do you have about this rating process?”, but keep additional questions to a minimum.
  3. Administer the survey. Set a timeframe for completion, and communicate it in advance.
  4. Analyze the results. A few questions you might ask as you’re reviewing the data (a short analysis sketch follows this list):
    • What was the average rating among the faculty who participated?
    • What was the range of ratings?
    • What do you think about the ratings? When you look at the work sample, which rating(s) do you believe are most in line with your rating guidelines or norms?
    • How much variation was there among the ratings?
    • What themes emerged among the open-ended responses about the rationale?
    • What questions came up for faculty?
  5. Summarize the data and share the insights. People are generally fascinated by original data describing groups they’re a part of! Don’t forget to share the summary and/or insights back with your faculty in a way that fosters curiosity, appreciation, and a desire to nurture alignment and the learning needed to get there. Celebrate the strengths, and frame the “gaps” as learning opportunities. No shame or blame. You might even debrief the rating itself: “Here is the rating we believe best represents the work, and why,” then allow for discussion. Folks will want clarity so they know what to do the same or differently moving forward.
  6. Pinpoint and pursue specific needs for improving alignment. Mine the data, and faculty questions, to identify specific learning needs. Then take action to support these needs through professional development, shared examples, further clarification or documentation of rating rules or guidelines, and/or additional practice. Examples might be:
    • The need to unpack the language in the criteria to ensure shared understanding
    • The need to set, document, and reiterate specific rating rules, e.g., “If they meet everything in column 2 but only part of what’s in column 3, the ‘Level 2’ rating stands,” or, “If their performance meets everything in column 3 but only part of what’s in column 4, assign a ‘3.5.’” You get the idea. (One way to encode a rule like this appears in the second sketch after this list.)
  7. Rinse and repeat. Use the process several times throughout the year to gauge whether calibration is improving over time and with practice.
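
Curious what the analysis in step 4 might look like concretely? Here’s a minimal Python sketch, assuming you’ve exported the survey responses to a CSV with one row per respondent and one numeric rating column per skill. The file and column names below are hypothetical; adapt them to your own survey tool’s export.

```python
# A sketch of step 4: summarize calibration-survey ratings per skill.
# Assumes a CSV export with one row per respondent and one numeric
# rating column per skill (file and column names are hypothetical).
import csv
import statistics
from collections import defaultdict

def summarize_ratings(csv_path, skill_columns):
    """Print the average, range, spread, and modal agreement per skill."""
    ratings = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            for skill in skill_columns:
                if row[skill].strip():  # skip blank responses
                    ratings[skill].append(float(row[skill]))

    for skill, scores in ratings.items():
        mode = statistics.mode(scores)  # the most common rating
        agreement = scores.count(mode) / len(scores)
        print(f"{skill}:")
        print(f"  average rating: {statistics.mean(scores):.2f}")
        print(f"  range: {min(scores)} to {max(scores)}")
        if len(scores) > 1:  # std. deviation needs at least two raters
            print(f"  std. deviation: {statistics.stdev(scores):.2f}")
        print(f"  share matching the modal rating: {agreement:.0%}")

# Hypothetical file and column names from the survey export:
summarize_ratings("calibration_survey.csv", ["Skill 1", "Skill 2", "Skill 3"])
```

A wide range, a high standard deviation, or low agreement with the modal rating on a particular skill is a useful cue to revisit that indicator’s language with the team.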
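
Rating rules like the ones in step 6 are easy to state but easy to apply inconsistently, which is exactly why writing them down helps. Purely as an illustration, here’s how the half-step convention above could be encoded; the function and its “all / part / none” inputs are hypothetical, and your team’s rule may well differ.

```python
# Hypothetical encoding of a documented rating rule. Each entry in
# column_status records whether the work meets "all", "part", or "none"
# of that rubric column's indicators, starting from column 1.
def assign_rating(column_status):
    """Return the highest fully met level, plus 0.5 if the next column
    is partially met (one possible convention; a team could instead
    decide that the lower rating simply stands)."""
    rating = 0.0
    for level, status in enumerate(column_status, start=1):
        if status == "all":
            rating = float(level)
        elif status == "part":
            rating += 0.5
            break
        else:  # "none": stop at the last fully met level
            break
    return rating

print(assign_rating(["all", "all", "all", "part"]))  # -> 3.5
print(assign_rating(["all", "all", "none"]))         # -> 2.0
```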

5. Benchmark Examples Collection

This takes time, practice, and organization–but in the long run, it can be well worth the effort. The process of selecting student work samples and determining which performance level each represents is itself incredibly helpful for further aligning your team! Over time, you can curate student work samples–across different learning contexts, such as disciplines or grade levels–that reflect a particular level on the skill progression or rubric, so that both educators and students have tangible examples of what success looks like along a learning trajectory.

And there you have it! Five ways to calibrate when rating student work. Happy calibrating.
CBL Partners specializes in competency-based learning solutions, powering equity and excellence in K-12 education and beyond.
