For the three situations described in Table 1, the McNemar test (designed to compare paired categorical data) would show no difference. However, this cannot be construed as evidence of agreement. The McNemar test compares overall proportions; therefore, any situation in which the two examiners' overall proportions of Pass/Fail are similar (e.g., situations 1, 2 and 3 in Table 1) would result in no difference being found. Similarly, the paired t-test compares the mean difference between two observations in a single group. It therefore cannot be significant if the mean of the differences between paired values is small, even though the differences between the two observers may be large for individuals.
Kalantri et al. examined the accuracy and reliability of pallor as a tool for detecting anemia. They concluded that clinical assessment of pallor can rule out, and modestly rule in, severe anemia. However, the inter-observer agreement for pallor detection was very poor (kappa values of -0.07 for conjunctival pallor and 0.20 for tongue pallor), meaning that pallor is an unreliable sign for diagnosing anemia.
The paper by Keim et al. (1976) was a comparison of dye dilution and impedance cardiography for measuring cardiac stroke volume.
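The point that a non-significant McNemar test is not evidence of agreement can be illustrated with a small sketch. The counts below are hypothetical and are not taken from Table 1; the function names are illustrative only. Both tables give a McNemar statistic of zero, yet Cohen's kappa shows that agreement is good in one and no better than chance in the other.

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square (no continuity correction) from the two discordant counts."""
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table.

    a: both examiners pass, b: A passes and B fails,
    c: A fails and B passes, d: both examiners fail.
    Assumes chance agreement is below 1 (otherwise kappa is undefined).
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    p_a_pass = (a + b) / n
    p_b_pass = (a + c) / n
    p_chance = p_a_pass * p_b_pass + (1 - p_a_pass) * (1 - p_b_pass)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical tables: in both, each examiner passes 50 of 100 students.
tables = {
    "high agreement": (45, 5, 5, 45),
    "chance-level agreement": (25, 25, 25, 25),
}

for label, (a, b, c, d) in tables.items():
    print(label,
          "McNemar =", mcnemar_statistic(b, c),
          "kappa =", round(cohen_kappa(a, b, c, d), 2))
```

Because the McNemar statistic uses only the two discordant cells, it is blind to how many students the examiners actually classify the same way.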
Keim et al. used correlation coefficients between the measurements by the two methods. They did this for a group of patients. For 20 of these patients, they made several pairs of repeated measurements on the same subject. They then calculated the correlation between the repeated pairs of measurements by the two methods separately for each of these 20 patients. The 20 correlation coefficients ranged from -0.77 to 0.80, with one correlation being significant at the 5% level. They concluded that the two methods did not agree, because low correlations were found when the range of cardiac output was small, even though other studies covering a wide range of cardiac output had shown high correlations.
We thought that a talk which said that everyone was wrong and then sat down would fall rather flat. We had to come up with a method that was the right one. We thought the basic statistical approach was obvious. If we are interested in agreement, we want to know how far apart the measurements by the two different methods might be.
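The dependence of correlation on the range of the quantity measured can be seen in a small simulation. This is a sketch under stated assumptions, not a reanalysis of the Keim et al. data: the ranges, the error SD of 5 units, and the sample size are all hypothetical. Two simulated methods measure the same true value with identical independent errors, once over a wide range and once over a narrow one.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(low, high, n=500, error_sd=5.0, seed=0):
    """Return (correlation, SD of differences) for two simulated methods.

    True values are uniform on [low, high]; each method adds its own
    independent normal error with the same SD (all values hypothetical).
    """
    rng = random.Random(seed)
    truth = [rng.uniform(low, high) for _ in range(n)]
    method_a = [t + rng.gauss(0, error_sd) for t in truth]
    method_b = [t + rng.gauss(0, error_sd) for t in truth]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return pearson_r(method_a, method_b), statistics.stdev(diffs)


r_wide, sd_wide = simulate(40, 120)    # wide range of true values
r_narrow, sd_narrow = simulate(75, 85)  # narrow range of true values

print("wide range:   r =", round(r_wide, 2), " SD of differences =", round(sd_wide, 2))
print("narrow range: r =", round(r_narrow, 2), " SD of differences =", round(sd_narrow, 2))
```

The spread of the differences between the methods is about the same in both runs, while the correlation drops sharply in the narrow range, so a low correlation by itself says little about how well two methods agree.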
So we started with the differences between the measurements made by the two methods on the same subject. We can calculate the mean and standard deviation of these differences. If the mean and standard deviation are constant and the differences are approximately normally distributed, about 95% of these differences should lie between the mean minus 1.96 SD and the mean plus 1.96 SD. Later, we called these the 95% limits of agreement.
Both the dependent and the independent variables are measured with error, but the least-squares regression line ignores the error in x. It estimates the mean value of y for an observed x. The expected slope is therefore affected by the error in x (it is attenuated toward zero).
Given the theme "Medical Statistics: Making a Difference in Health", I thought I could start with a half-quote from John Cleese.
Accuracy is how close a measurement is to the true value of the quantity being measured. The precision of a measurement system refers to the closeness of agreement between repeated measurements (repeated under the same conditions).
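The 95% limits of agreement described above can be computed in a few lines. The following is a minimal sketch that assumes the differences have a constant mean and SD and are roughly normal; the function name `limits_of_agreement` and the example readings are hypothetical.

```python
import statistics


def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (mean difference, SD of differences, lower limit, upper limit).

    Differences are taken as method_a minus method_b on the same subject.
    The limits are mean - z*SD and mean + z*SD, using the sample SD.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must be measured on the same subjects")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, sd_diff, mean_diff - z * sd_diff, mean_diff + z * sd_diff


# Hypothetical paired readings from two methods on eight subjects.
readings_a = [71.0, 84.5, 90.2, 65.3, 78.8, 95.1, 69.9, 88.4]
readings_b = [73.2, 82.0, 93.5, 66.0, 75.1, 97.8, 72.4, 85.0]

mean_diff, sd_diff, lower, upper = limits_of_agreement(readings_a, readings_b)
print("mean difference:", round(mean_diff, 2))
print("95% limits of agreement:", round(lower, 2), "to", round(upper, 2))
```

Whether limits of this width are acceptable is a clinical judgment about how large a discrepancy between the methods matters, not a statistical one.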