Impact Evaluation Culture

By LEAD Research Team

One thing I like about Impact Evaluation is the discipline it imposes on your thinking. It makes you think twice before jumping to conclusions, and it makes you open to learning — both qualities I think the world could use more of. I’ve also been surprised by how widely this kind of thinking applies. Impact Evaluation is useful not just for large-scale, massively funded studies, but even for structuring the everyday decisions that come up in the course of work. It becomes part of the culture of how you do things.

Here’s a good example. Right now I’m in Thanjavur on the KGFS Impact Evaluation, making preparations for our Endline survey. Unlike the previous two rounds of surveying, this one will be done electronically, opening up a range of new possibilities. Recently we’ve been working on modifying the tablet’s predictive text feature to improve accuracy and speed in spelling respondent names. Incorrect spellings of names can cause a number of problems in fieldwork and analysis, and we hoped we could leverage our new technology to help.

Our solution involved overwriting the tablet’s built-in dictionary of words with a custom dictionary of 5,000+ correctly spelled names that we had collected over the past year of fieldwork. Now when users type, the tablet automatically makes suggestions from our names database.
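
The exact loading mechanism depends on the tablet and keyboard, but the preparation step amounts to deduplicating and normalizing the names collected in earlier rounds into a plain word list the keyboard can use. Here is a minimal sketch of that step; the file and column names are placeholders, not our actual project files:

    # Build a cleaned, deduplicated word list from names collected in earlier
    # survey rounds, one name per line, ready to load as a custom dictionary.
    # File and column names below are placeholders.
    import csv

    def build_name_dictionary(input_csv, output_txt, name_column="respondent_name"):
        names = set()
        with open(input_csv, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                name = row.get(name_column, "").strip()
                if name:
                    # Normalize case so "KAVITHA" and "kavitha" collapse
                    # into a single dictionary entry.
                    names.add(name.title())
        with open(output_txt, "w", encoding="utf-8") as out:
            for name in sorted(names):
                out.write(name + "\n")
        return len(names)

    if __name__ == "__main__":
        count = build_name_dictionary("baseline_names.csv", "custom_dictionary.txt")
        print(f"Wrote {count} unique names to custom_dictionary.txt")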

Implementing the new feature wasn’t costless. It added about 5 to 10 minutes to the setup time for each tablet (which, for 10 or so tablets, adds up to a significant chunk of time). Plus, each new feature we implement adds another source of potential problems and troubleshooting in the field.

To check whether this effort was worth our time, we tested it. We called six surveyors to the office and had them take a dictation test. From a list of 80 names, we randomly selected 30 for the first round and 30 for the second. As the names were read aloud, the surveyors typed them on the tablets. In the second round, we equipped the tablets with the auto-suggest feature. We used our survey collection software, SurveyCTO, to collect the data and to track the amount of time spent typing each name.
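
The draw itself is nothing fancy. Here is a minimal sketch of how the two word lists could be assigned, assuming the two 30-name sets are drawn from the 80-name pool without overlap; the seed and placeholder names are illustrative, not our actual script:

    # Randomly assign names from the master list to the two dictation rounds.
    # This sketch draws the two 30-name lists without overlap; a fixed seed
    # makes the assignment reproducible.
    import random

    def assign_rounds(names, per_round=30, seed=42):
        rng = random.Random(seed)
        drawn = rng.sample(names, 2 * per_round)  # 60 distinct names
        return drawn[:per_round], drawn[per_round:]

    if __name__ == "__main__":
        master_list = [f"name_{i:02d}" for i in range(80)]  # stand-in for the real 80 names
        round1, round2 = assign_rounds(master_list)
        print(len(round1), len(round2))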

We found that the auto-suggest feature does indeed improve spelling, though not consistently across surveyors. On the whole, the gains are pretty modest, and the feature has the greatest impact on surveyors who are already good spellers.

Graph 1: Spelling Correctness by Surveyor

At the same time, the new feature doesn’t seem to help surveyors type names faster. If anything, it looks like they get slightly slower. A two-tailed t-test comparing the time taken to type each name before and after introducing the user dictionary fails to find a statistically significant difference.

Table 2: Average Time to Write a Name Before and After Introducing the Auto-suggest Feature (seconds)

                             Before    After    P-value
All names                    13.78     14.39    0.376
Only names in dictionary     12.92     13.30    0.601

Note: Not all of the names in dictation were included in the dictionary that populated the tablet’s auto-suggest. This was done to simulate conditions in the field, where it’s likely that we won’t have encountered a respondent’s name ahead of time. The second row considers the timing for only those names from dictation that appeared in the dictionary.
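
For anyone curious about the mechanics behind the table, the comparison is a standard unpaired, two-tailed t-test on the per-name typing times exported from SurveyCTO. The sketch below shows one way to run it; the file and column names are placeholders, not our actual export or analysis script:

    # Two-tailed t-test on per-name typing times, before vs. after enabling
    # the auto-suggest dictionary. Column names are placeholders for whatever
    # the SurveyCTO export actually uses.
    import pandas as pd
    from scipy import stats

    def compare_typing_times(csv_path, restrict_to_dictionary=False):
        df = pd.read_csv(csv_path)
        if restrict_to_dictionary:
            # Keep only names that appeared in the custom dictionary.
            df = df[df["in_dictionary"] == 1]
        before = df.loc[df["round"] == 1, "seconds_to_type"]
        after = df.loc[df["round"] == 2, "seconds_to_type"]
        t_stat, p_value = stats.ttest_ind(before, after)  # two-tailed by default
        print(f"Before: {before.mean():.2f}s  After: {after.mean():.2f}s  p = {p_value:.3f}")

    if __name__ == "__main__":
        compare_typing_times("dictation_times.csv")                              # all names
        compare_typing_times("dictation_times.csv", restrict_to_dictionary=True)  # only names in dictionary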

The results tell me that this feature is probably worth our while to implement, but that if we want to improve the spelling of the worst spellers (which is what I was hoping this feature would do), we’ll have to brainstorm another strategy.

This experiment — a little corner of work within the larger KGFS Impact Evaluation — strikes me as a good metaphor for what we’re trying to accomplish. In the development sector there is still plenty of scope for building a culture of thinking critically about how we work. It’s a mindset that can be applied to even the smallest of tasks.