I'm looking for help on creating a coding comparison for our project. We had 6 coders (users) and 16 transcripts (sources). Each source had text references that were coded to 5 nodes (with positive/negative sub-nodes). Each transcript was coded 3-4 times (assigned at random), but this means that a regular coding comparison will be inaccurate since not all coders worked on each source.
My challenge is: How do I show interrater reliability?
My vague idea of what it would look like is:
– run a report showing the list of users that coded for each source (I haven't been able to figure out how to do this)
– run a coding comparison for each source separately and only include the coders that are listed as working on that source.
That seems really tedious and doesn't really address the problem of showing % agreement for a particular node. For example, comparing User 1 to Users 2-6 would have to only consider the sources that User 1 worked on. Comparing Users 1-3 to Users 4-6 would have to include all sources but only consider agreement between the users that actually worked on the source for a given reference. Does that make sense, or am I speaking in conundrums?
I think you may be misunderstanding how the coding comparison query works. Why not run all 16 transcripts against the fifteen nodes? Personally, I would run it by comparing each coder against the other five as a group. Alternatively, you could run it for each coder against each of the five others separately. You could then average your results in an Excel table.
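If you do end up averaging pairwise results outside NVivo, the key point is restricting each pair's comparison to the sources both coders were actually assigned. Here's a rough sketch of that logic in Python, using made-up coder names, reference IDs, and node labels (this is not NVivo's actual export format, just an illustration of the calculation):

```python
from itertools import combinations

# Hypothetical coding data: for each (coder, source) assignment, a dict
# mapping reference IDs to the node applied. Structure is illustrative only.
codings = {
    ("User 1", "Transcript 1"): {"ref1": "Node A/Positive", "ref2": "Node B/Negative"},
    ("User 2", "Transcript 1"): {"ref1": "Node A/Positive", "ref2": "Node B/Positive"},
    ("User 1", "Transcript 2"): {"ref1": "Node C/Positive"},
    ("User 3", "Transcript 2"): {"ref1": "Node C/Positive"},
}

def pairwise_agreement(codings):
    """Percent agreement for each coder pair, counting only references
    in sources that BOTH coders were assigned to."""
    coders = sorted({coder for coder, _ in codings})
    sources = sorted({source for _, source in codings})
    results = {}
    for a, b in combinations(coders, 2):
        agree = total = 0
        for s in sources:
            # Skip sources that only one of the pair coded.
            if (a, s) in codings and (b, s) in codings:
                refs = set(codings[(a, s)]) | set(codings[(b, s)])
                for r in refs:
                    total += 1
                    if codings[(a, s)].get(r) == codings[(b, s)].get(r):
                        agree += 1
        if total:  # pairs with no shared source are simply omitted
            results[(a, b)] = agree / total
    return results
```

With the toy data above, User 1 vs User 2 agree on one of two references in their shared transcript (50%), User 1 vs User 3 agree fully, and User 2 vs User 3 never share a source, so that pair is left out of the average entirely rather than dragging it down.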