Training: A Way to Improve Data Quality

By LEAD Research Team

Currently, I’m involved in the field team training for a second round of data collection for the Bricks project. As we’re in the intervention part of the study, this round is going to be as interesting and challenging as the previous one. A few important aspects, if emphasized during training, can not only improve data quality but also reduce the time and effort needed for successive rounds of data cleaning.
Here are some pointers from our training:
  • Correct Respondent Identification: If the enumerators are entrusted with the responsibility of identifying the respondent, the recruitment criteria should be clearly explained and listed. Along with this, all possible cases of contradictions, eliminations and preferences should be explained. If a respondent has already been identified, the replacement cases should be dealt with specifically. Under all circumstances, any conflict should be reported immediately and the field team should take note of such cases.
  • Codes for options: All the codes for the answers should be discussed and explained properly. Often, enumerators interpret the codes differently, which biases the code selected for the same response. This can be clearly observed when we start looking at the data enumerator-wise. To avoid this mistake, one master respondent can answer the questions for a group of enumerators during training, and the dummy data they enter can then be verified.
  • Translation errors: Correct translation of the question is very important for the enumerator to understand the context of the question. The question will not make sense to the respondent if it is not in the correct context, and hence the data collected can be inaccurate. At times, there are variations in the dialect of the language, so it is important for the enumerators to know about these variations.
  • Comments and Closing: Since there are always a few observations which an enumerator captures during the course of the survey, it is very helpful to record them. Specific, to-the-point comments are very useful as they give us an idea of the respondent’s state of mind at the time of response. Also, the closing status of each respondent should clearly indicate how successful the survey has been. For example, separate codes for refusal, completion, substitution, etc. should be used at the end of the survey.
  • Pilot experience: The field experience from the pilot rounds should be shared with the enumerators. This gives them a pre-field training experience and all possible questions from respondents can be sorted and discussed to bring about more clarity.
  • Handling exceptions: In case of refusals or incomplete surveys, enumerators should be trained to convince respondents of the objectives and benefits of the data collection. It should also be made clear to enumerators that refusing is the individual right of the respondent.
  • Mathematical calculations: This is another important aspect of data collection, particularly if there are conversions involved. Two points should be dealt with at the time of training. First, the unit of the data is very important, so it should be clearly noted at the time of response. Second, some responses (for example land area or crop production) come in mixed units, one large (quintals) and one small (kilograms). In these cases, the values have to be converted so that the data is in one standard unit. With a lot of practice and the help of calculators, enumerators can do this easily and accurately. Alternatively, surveyors can be given the option to select units within the survey so that the conversions can be done at a later point in time.
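To make the mixed-units point concrete, here is a minimal sketch of how such a conversion could be done after data collection, assuming the survey records a quantity for each unit separately. The function name `to_kilograms` and the dictionary layout are hypothetical, not taken from our survey tool; the only fixed fact used is that one (metric) quintal equals 100 kilograms.

```python
# Conversion factors to kilograms. 1 metric quintal = 100 kg.
# The unit names used as keys are illustrative assumptions.
KG_PER_UNIT = {"kg": 1.0, "quintal": 100.0}


def to_kilograms(quantities):
    """Convert a mixed-unit response, e.g. {"quintal": 3, "kg": 40},
    into a single value in kilograms."""
    total = 0.0
    for unit, amount in quantities.items():
        if unit not in KG_PER_UNIT:
            # An unrecorded or misspelled unit is flagged, never guessed.
            raise ValueError(f"Unknown unit: {unit!r}")
        if amount < 0:
            raise ValueError(f"Negative amount for {unit!r}: {amount}")
        total += amount * KG_PER_UNIT[unit]
    return total


# A respondent reports 3 quintals and 40 kilograms of produce.
print(to_kilograms({"quintal": 3, "kg": 40}))  # 340.0
```

Recording the unit next to each number, as in this sketch, is what makes it possible to postpone the conversion to the cleaning stage instead of relying on arithmetic in the field.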
Though this list is not exhaustive, these points, if dealt with meticulously at the time of training, can definitely improve data quality. Along with this, if data is collected digitally with programmed checks and bounds, errors can be minimized further. I personally feel that the more effort we put in at the time of training, the better the data quality we get.
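As a rough illustration of what programmed checks and bounds can look like, here is a small sketch that validates one survey record. The field names, the ranges and the closing codes are all hypothetical examples chosen for this post, not the ones used in the Bricks survey, and a real digital survey tool would normally enforce such rules at the time of entry.

```python
# Hypothetical bounds for numeric fields: (minimum, maximum), inclusive.
BOUNDS = {"age_years": (18, 99), "household_size": (1, 30)}

# Hypothetical closing-status codes recorded at the end of each survey.
CLOSING_CODES = {1: "completed", 2: "refused", 3: "substituted"}


def validate_record(record):
    """Return a list of problems found in one survey record.
    An empty list means the record passed every check."""
    errors = []
    for field, (low, high) in BOUNDS.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not low <= value <= high:
            errors.append(f"{field}: {value} is outside {low}-{high}")
    if record.get("closing_code") not in CLOSING_CODES:
        errors.append("closing_code: missing or not a valid code")
    return errors


record = {"age_years": 140, "household_size": 5, "closing_code": 1}
print(validate_record(record))
```

Running the same checks on the dummy data entered during training is one way to spot misunderstandings before the enumerators go to the field.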