A comparative versus evaluative, double-blind vs. sighted control test
Hi RAHE'rs -
I've had many inquires and some interest in my proposal that before comparative dbt'ng is crowned "the" test for audio evaluation, it needs to be validated by a control test. While I have sketched such a test in several different posts/threads, there seems to be enough confusion over what I have said that it is worth outlining here in a definitive post on the subject. In addition, at the end I will respond to Tom's offer to join together in such a test. WHAT IS THE ISSUE? As I have analyzed my own and others arguments here for and against comparative dbt'ng, it seems to me that the issue has much less to be with being blind than it does with being comparative. In other words, does a test "forcing" a choice under uncertainty duplicate the results that would be obtained by listening and evaluating components at home in a relaxed atmosphere, whether blind or sighted. I have accordingly proposed that the only way to validate the comparative dbt as the definitive tool is to remove this question mark. And it could be done, with enough time and resources devoted to it. As such, the control test must separate out and test two variables - * evaluative (blind) vs. comparative (blind) ,,, a test of evaluative testing versus comparative testing * evaluative (blind) vs. evaluative (sighted) ,,, a test of blind vs. sighted testing With the answers to these two comparisons, it should be able to answer the following questions? * Does blinding give better bias control? (presumably yes) * How close can open-ended, relaxed, sighted evaluative testing (the traditional home "sighted" tests which are believed worthless by the objectivists) come to duplicating the results of open-ended, relaxed, but blinded evaluative testing. Same test technique, but blinded, which objectivist presumably would support. * Do traditional comparative dbt tests give identical results to more relaxed and evaluative dbt tests? (answer simply not known, but postulated by subjectivists as "no", thinking that the test itself is different enough to get in the way). Essentially, the blinded (dbt), relaxed, evaluative test is "the missing link" between the current dbt camp and the current subjectivist camp as it helps resolve both the "blind" issue and the "comparative vs. evaluative" issue. Using components playing music, not artifacts or pink noise. GENERAL TEST CONDITIONS * Participants must take place in all three tests...open end sighted, open end blind, and comparative blind. * There has to be enough trials of each type to allow statistical evaluation. * Musical selections and media must be agreed to in advance by all parties as being sufficiently varied to reveal all types of significant audio reproduction qualities. (Dynamic range, soundstaging, depth, dimensionality, bass quality, treble quality, midrange quality, etc.) * Equipment under test must be believed by most participants to sound different from one another under sighted conditions and to have some degree of objectivist skepticism about same. * Equipment under test, everything else being equal, should make testing under home/similar to home conditions as simple as possible, including time-synched switching. * Tests must either be done in-home of participants, or at a site accessible to participants over long periods of time on a sighted basis before test ratings collected. 
EVALUATIVE TEST CONDITIONS

* Open-ended home listening must supplant informal note-taking with formal rating of components on an evaluative scale, in order to be able to statistically correlate with the blind evaluative testing.
* The evaluative scale should draw from and reflect all significant variables suggested by RAHE participants, reduced to a manageable number by consolidating very similar qualities.

COMPARATIVE TEST CONDITIONS

* The test should be a-b, rather than a-b-x, in order to better approximate the evaluative tests.
* The test should ask for overall preference and for preference on comparative versions of the evaluative scales (at least those found significantly different in the evaluative testing).

BLIND TEST CONDITIONS

* Participants should be allowed substantial "warm up" time on a sighted basis to listen to the test equipment using the musical selections to be used in the test.
* Participants should be allowed to control the switching during the test.
* Participants should ideally be left alone in the room during the test, and should "turn in" ratings to an out-of-room proctor who has also recorded the actual a-b assignment for each trial.
* a-b assignments shall be based on random drawings and then adjusted slightly, if needed, to assure equal positioning and no chance of order bias (one way to generate such assignments is sketched at the end of this section).

* * * * * * * * * * * * * * * *

With those general conditions established, I would like to discuss actual test implementation practicalities. This is where it gets complicated.

THE OPEN-ENDED SIGHTED EVALUATIVE TEST

Essentially, as I described in an earlier post, the typical audiophile puts a new piece of equipment in the system, listens open-ended for a while, switches back, does the same, and by doing this a few times over several selections of music begins to home in on what characteristics the new equipment has in his system versus the old. These may be improvements; they may be deficiencies. He continues to do this until a) he has to return the equipment, or b) he reaches a definitive preference for one or the other (a preference growing organically out of the evaluation and the emergence of defining audio characteristics).

How best to approximate this test on a slightly more structured basis, so that results may be compared to later tests? The first and probably only thing required, it seems to me, is to substitute formal evaluation rating scales for the informal notes made during this process. My suggestion is that the evaluator would have perhaps half a dozen interim rating sheets that he/she would use over, let's say, six weeks. Then at the end, he/she would review those sheets and put together a "final" rating for the two pieces of equipment. These would be on an absolute scale for the two pieces. For example, both might be rated high on "throws a wide soundstage beyond the outside edges of the speakers": one rated "5" and the other "4" on a "1" to "5" scale. So this score can be used both as a numeric rating and as a comparative rating, e.g. both same, or one higher (different, higher) on that characteristic. There would also be a similar rating for "overall preference" that might be "4" and "3" (different, better), or perhaps "4" and "4" (no preference).

However, one can immediately see one problem. With a sighted test, there is no such thing as doing 16 independent trials, since presumably once the person "locks in," his future ratings would be very similar, since he knows which equipment is which. Even allowing for differences in moods, climate, etc., these would not be sixteen independent tests.
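Before moving on, here is the promised sketch for the a-b assignment point under BLIND TEST CONDITIONS above. It is a minimal illustration of one way to do the "random drawing plus adjustment," assuming a 16-trial session and exactly equal positioning (both assumptions are mine for the example; any method that equalizes positioning and keeps the listener blind would do).

import random

def balanced_ab_assignments(n_trials=16):
    """Random a-b presentation orders in which each unit appears first equally often."""
    if n_trials % 2:
        raise ValueError("use an even number of trials so positioning can be equalized")
    # Half the trials present unit A first, half present unit B first...
    orders = ["AB"] * (n_trials // 2) + ["BA"] * (n_trials // 2)
    # ...then shuffle so only the out-of-room proctor knows the sequence.
    random.shuffle(orders)
    return orders

print(balanced_ab_assignments())   # e.g. ['BA', 'AB', 'AB', 'BA', ...]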
The implication of this lack of independence is that for the "relaxed, evaluative, sighted" versus "relaxed, evaluative, blind" comparison, more than one person must be tested...probably at least twenty. In the food industry we used to consider 100 the smallest test size we considered reasonable. This adds enormously to the cost, time, and complexity of running such a test if one is to do it in-home. It would be a little more manageable doing it out-of-home at a central facility, and having sixteen audiophiles do it. But this is fraught with problems...an unfamiliar system probably requiring more time to reach a final evaluation for each respondent, the need to maintain the setup for several weeks to allow all respondents multiple exposures, etc. Problems, problems.

THE OPEN-ENDED, BLIND EVALUATIVE TEST

This test would be very similar to the open-ended sighted test, but double-blind. Once a warm-up period of perhaps a few hours was over, however, the respondent would take a trial, rate it, turn it in, take a break, start another trial, etc., up to four in a row. If repeated four days or four weeks in a row, this could yield sixteen trials, enough to determine the significance of differences in ratings. The ratings would be the same ones used in the sighted testing. The results of this test would be: were differences between the equipment found, and were they statistically significant at the 95% confidence level? What characteristics, if any, came through as significantly different? Once a respondent's results were determined (different, same) overall and for each characteristic, they could be compared to the open-ended sighted scores and a correlation established (or not). Since the open-ended sighted test has only one score, it would be hard to evaluate the significance of these correlations for an individual person, but if done across 20-100 people, a statistical correlation could be established (a sketch of this correlation step follows the IMPLICATIONS section below). For this to be a true "scientific" test, it would have to be done across a substantial population of audiophiles, as has already been pointed out.

THE COMPARATIVE, BLIND TEST

The main blind (a-b) test would use the evaluative factors of the sighted and blind evaluative tests, but on a comparative basis (e.g. which did you prefer overall, which had the wider soundstage, etc.). The comparative evaluation test could be directly correlated with the blind evaluative test, as well as within itself over sixteen trials. Again, these probably should be done in groups of four, since they require a fair number of ratings. Not essential, but of possible interest, would be to do a traditional a-b-x test as well, to see if it correlated with the overall-preference a-b test (% of respondents noting a difference in each / statistical significance of same).

* * * * * * * * * * * * * * * * * * * * * * *

IMPLICATIONS

As noted, to be truly significant this test has to be done across a sample of audiophiles, probably at least two dozen in-home evaluations and subsequent test follow-ups. This would keep Tom and me busy for a year. From a practical standpoint, the blind comparative vs. blind evaluative tests are easier to do, since multiple trials allow for internal statistical validity. I would be willing to develop and be the initial testee of such a test along with Tom, whom I would also ask to do the same, and perhaps a "neutral" third party. I would also do the sighted test, but the results would be strictly "anecdotal" until an appropriate database of RAHE participants was built up, and I would request that Tom and the "neutral" do the same.
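And here is the promised sketch of the sighted-vs-blind correlation step. It assumes each respondent contributes one sighted "difference" score and one blind "difference" score per characteristic; the respondent data below are invented placeholders purely for illustration, not results. Across 20-100 respondents, a Pearson coefficient near +1 would say the relaxed sighted ratings track the blind evaluative ratings.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between paired per-respondent rating differences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical rating differences (new unit minus old unit) on one
# characteristic, e.g. "width of soundstage", sighted vs. blind:
sighted_diffs = [1, 0, 2, 1, 0, 1, -1, 1, 0, 2]
blind_diffs   = [1, 0, 1, 1, 0, 0, -1, 1, 0, 1]

print(f"r = {pearson(sighted_diffs, blind_diffs):.2f}")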
I would also suggest that a good and most interesting vehicle for this test would be an SACD player, using the stereo-mix SACD and CD layers, on disks and tracks judged appropriate and "identical" in mix. The test would be easy to run...two identical side-by-side SACD players into a preamp input, with control-box switching or manual switching, automatically volume matched, no impedance problems a la speaker cables, and perhaps some ultimate insight into "is there a difference between SACD and CD?" I have an SACD player; Tom would have to buy or borrow one; same for the neutral third party. If SACD is judged impractical, then I would suggest a CD test between two CD players judged likely to be audibly different...say an Arcam 27 versus a $300 Sony job. However, the equipment would have to be on long-term loan, since it would probably take at least six months to complete the testing. We would also need neutral proctors to run the test and record scores.

* * * * * * * * * * * * * * * * * * * *

CONCLUSION

There would be a fair amount of work needed to get this off the ground, but it is doable. In particular, I would want broad agreement within RAHE that it was worthwhile doing, I would want input from members on appropriate test SACDs or CDs and tracks for testing, and I would want myself, Tom, and the other participant to agree on the selections to be used. Your comments, suggestions, and questions are hereby solicited.

Harry Lavo

"it don't mean a thing if it ain't got that swing" - Duke Ellington