The matrix presented in Table 1 was used to generate the probabilities that a “rater” in the simulation would assign a given CDR severity level, given that the gold-standard value (column) is true. Ten thousand kappa estimates were generated for each of several test sizes (number of cases evaluated). In the first (“unweighted”) condition, the test cases were evenly distributed across the CDR severity levels (total test sizes of 5, 10, 20, 25, 30, 40 and 50 cases): for a test of size 5, one case per CDR level was assessed; for a test of size 10, two cases per level; and for a test of size 50, 10 cases per level. Under the weighted conditions, test cases were not evenly distributed over the CDR severity levels; instead, selected severity levels were weighted more heavily, i.e. twice as many test cases were assessed at those levels as at the others (total test sizes of 7, 14, 21, 35, 42 and 49). In one set of weighted 5×5 matrices, more test cases were “assessed” for CDR 0 and 0.5 (in Table 1, these are the most important values to assess, and the ones that produce the most significant errors; [15]). We also repeated the differentially weighted simulation for CDR levels 0.5 and 1, which represents a test situation in which trainees' attention is drawn to the distinction between, or the transition from, CDR 0.5 and 1. It is important to note that these conditions do not refer to weighted and unweighted kappa estimates; they describe the distribution of test cases across CDR values in our simulated matrices. All simulations calculated the simple unweighted kappa (simulation details are given in the appendix).

For Example 1, the standard deviation in cell B18 of Figure 1 can also be calculated with the formula `BKAPPA(B4,B5,B6)`. The sample size shown in cell H12 of Figure 2 can also be calculated with the formula `BKAPPA_SIZE(H3,H4,B5,B6,H8,H11,H7)`.
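The simulation design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the probability matrix `ILLUSTRATIVE_P` is hypothetical (the Table 1 values are not reproduced here), the function names are invented for this example, and the interval is taken as the 2.5th and 97.5th percentiles of the simulated kappas, which may differ from the method in the appendix.

```python
import random

# Hypothetical rating probabilities, NOT the Table 1 values.
# ILLUSTRATIVE_P[j][i] is the chance that a simulated rater assigns level i
# when the gold standard is level j. Levels are CDR 0, 0.5, 1, 2, 3.
ILLUSTRATIVE_P = [
    [0.80, 0.20, 0.00, 0.00, 0.00],
    [0.15, 0.70, 0.15, 0.00, 0.00],
    [0.00, 0.10, 0.80, 0.10, 0.00],
    [0.00, 0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.00, 0.10, 0.90],
]


def cohen_kappa(gold, rated, n_levels):
    """Simple unweighted kappa between gold-standard and rater labels."""
    n = len(gold)
    observed = sum(g == r for g, r in zip(gold, rated)) / n
    expected = sum(
        (gold.count(k) / n) * (rated.count(k) / n) for k in range(n_levels)
    )
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one level occurs
    return (observed - expected) / (1.0 - expected)


def simulate_kappas(cases_per_level, prob, n_sims=10_000, seed=0):
    """Simulate n_sims certification tests and return one kappa per test.

    cases_per_level[j] is the number of test cases whose gold standard is
    level j, e.g. [2, 2, 2, 2, 2] for an unweighted test of size 10.
    """
    rng = random.Random(seed)
    n_levels = len(prob)
    gold = [j for j, count in enumerate(cases_per_level) for _ in range(count)]
    kappas = []
    for _ in range(n_sims):
        rated = [rng.choices(range(n_levels), weights=prob[j])[0] for j in gold]
        kappas.append(cohen_kappa(gold, rated, n_levels))
    return kappas


def summarize(kappas):
    """Return (mean, 2.5th percentile, 97.5th percentile) of the estimates."""
    ordered = sorted(kappas)
    n = len(ordered)
    return sum(ordered) / n, ordered[int(0.025 * n)], ordered[int(0.975 * n) - 1]


if __name__ == "__main__":
    # Unweighted test of size 10: two cases at every CDR level.
    print(summarize(simulate_kappas([2, 2, 2, 2, 2], ILLUSTRATIVE_P)))
    # Weighted test of size 14: CDR 0 and 0.5 get twice as many cases.
    print(summarize(simulate_kappas([4, 4, 2, 2, 2], ILLUSTRATIVE_P)))
```

Passing the case counts per level as a list keeps the weighted and unweighted conditions in one code path; only the distribution of test cases changes, while the kappa itself stays unweighted in both.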

In a multi-centre clinical trial, the rater certification “test” could provide an estimate of raters' agreement with the “correct” assessment, and the precision of this estimate should be high. The ideal situation is both a high level of agreement and a high value for the lower limit of the confidence interval around the estimate. Training programs should therefore strive to achieve a calculated kappa in the “excellent” range (i.e. at least .80 [13]); we also propose that the lower limit of the confidence interval should be no less than “substantial” (i.e. at least .61 [13]). This study describes the optimal number of test case evaluations, and their distribution across the possible CDR values, needed to certify this combined criterion (point estimate and lower limit) of agreement and consistency among multiple CDR raters. Simulations were conducted to determine the relative width of confidence intervals for kappa statistics (agreement with the gold-standard rating) as a function of sample size, the distribution of test cases across CDR levels, and previously reported agreement levels.

The Clinical Dementia Rating (CDR) is a valid and reliable global measure of dementia severity. Diagnosis and transitions between stages depend on its consistent administration. Reports on the reliability of CDR assessments are based on one or two test cases at each severity level. Agreement (kappa) statistics based on so few evaluated cases have large errors, and their confidence intervals are unreliable. The simulations varied the number of test cases and their distribution across CDR levels in order to derive the sample size that gives 95% confidence that the estimated kappa is at least 0.60.
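The combined criterion (point estimate of at least .80 and lower confidence limit of at least .61) can be expressed as a small check over a set of kappa estimates. The sketch below is illustrative only: the function name is hypothetical, the point estimate is taken as the mean of the estimates, and the lower limit as the lower percentile of a two-sided interval, which are assumptions rather than the procedure documented in the study.

```python
def meets_certification_criterion(kappas, point_min=0.80, lower_min=0.61,
                                  level=0.95):
    """Check the combined criterion on a collection of kappa estimates.

    Assumptions (not taken from the source): the point estimate is the mean
    of the estimates, and the lower limit is the lower percentile of a
    two-sided interval at the given confidence level.
    Returns (passes, point_estimate, lower_limit).
    """
    ordered = sorted(kappas)
    n = len(ordered)
    point = sum(ordered) / n
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    passes = point >= point_min and lower >= lower_min
    return passes, point, lower


if __name__ == "__main__":
    # Toy input for illustration only, not simulation output: high average
    # agreement, but a low tail that pulls the lower limit under .61.
    toy = [0.50] * 10 + [0.90] * 90
    print(meets_certification_criterion(toy))
```

Checking both conditions together reflects the point made above: a high average kappa alone is not enough if the interval around it is wide.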