What is Missing Data?
Missing data is a pervasive concern across the research community. It refers to the situation where observations or values for a variable of interest in a data set are not recorded. Nearly all researchers encounter this problem at some point in their careers. It can happen when participants fail to, or choose not to, reveal certain information during data recording. It may also happen if a researcher designs an experiment poorly and exhausts valuable but scarce experimental resources, when data collection is improper, or when mistakes are made during data entry. Missing data poses a real challenge for data analysis and interpretation, as there is no perfect way to deal with data sets that lack crucial information or values.
Why is Missing Data a Problem?
The impact of missing data can be serious, especially in quantitative and statistics-based research, as it may result in biased estimates of crucial study parameters and poor generalizability of findings. Furthermore, ignoring missing values (for example, by analyzing only complete records) might lead to loss of information and, in turn, to lower statistical power because standard errors increase. Therefore, it is wise to identify the reasons for missing data by making plausible assumptions about the ways in which data might go missing. Researchers must also create a robust study design that minimizes the chances of missing data.
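To see why fewer usable observations reduce power, recall that the standard error of a sample mean is the standard deviation divided by the square root of the sample size. The short Python sketch below illustrates this with hypothetical numbers (a standard deviation of 10, and 100 enrolled participants of whom 64 provide usable data); the function name and the values are assumptions made purely for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)

# Hypothetical study: 100 participants enrolled, 64 with complete data.
se_full = standard_error(10, 100)      # all planned observations
se_complete = standard_error(10, 64)   # only the complete cases
print(se_full, se_complete)
```

The standard error grows as the number of usable observations shrinks, which widens confidence intervals and makes real effects harder to detect.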
Types of Missing Data
Missing data may be classified based on the reasons or mechanisms that lead to it. The descriptions below refer to scenarios A, B, and C from the survey example in the next section.
Missing completely at random (MCAR)
The reasons for missing data are independent of both the observed and the missing responses, i.e. all cases have the same probability of being missing. This is demonstrated in scenario A, where students were unable to take the survey for random, unpredictable reasons.
Missing at random (MAR)
The factors that lead to missing data in this case are independent of the missing responses once the observed data are taken into account (conditional independence). For instance, missing data due to scenario B is independent of the variable of interest (i.e. whether students are facing peer pressure or not), but it might depend on other observed variables (i.e. their grades or expertise in a certain extracurricular activity).
Missing not at random (MNAR)
In this case, the probability that data are missing depends on the missing values themselves, possibly in addition to the observed data. Scenario C, where students do not respond to sensitive questions related to peer pressure, leads to missing data of this category.
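The difference between the three mechanisms can be made concrete with a small simulation. The Python sketch below is only an illustration under stated assumptions: the peer-pressure scores, the event-participation flag, and all the probabilities are hypothetical values chosen for the demonstration, not figures from any study.

```python
import random
import statistics

def simulate_mechanisms(n_students=20000, seed=0):
    """Compare the mean of the observed scores under MCAR, MAR and MNAR."""
    rng = random.Random(seed)
    # Each student has a (hypothetical) peer-pressure score and a flag for
    # representing the college at an event; the flag is observed for everyone.
    students = [(rng.gauss(50, 10), rng.random() < 0.3) for _ in range(n_students)]

    def observed_mean(p_missing):
        # Keep a score only if a random draw says the response was recorded.
        kept = [score for score, at_event in students
                if rng.random() >= p_missing(score, at_event)]
        return statistics.fmean(kept)

    return {
        "true": statistics.fmean(score for score, _ in students),
        # MCAR: every student has the same chance of being missing.
        "MCAR": observed_mean(lambda score, at_event: 0.3),
        # MAR: missingness depends only on the observed event flag.
        "MAR": observed_mean(lambda score, at_event: 0.6 if at_event else 0.1),
        # MNAR: students with high scores are more likely to be missing.
        "MNAR": observed_mean(lambda score, at_event: 0.7 if score > 55 else 0.1),
    }

results = simulate_mechanisms()
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

In this sketch the MCAR and MAR means stay close to the true mean, while the MNAR mean is pulled downward because high scores are under-represented. The MAR result is unbiased here only because the assumed event flag is unrelated to the score; if the observed variable were related to the outcome, the analysis would need to condition on it.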
Example of Missing Data
Consider a situation where a researcher wants to administer a questionnaire about peer pressure among college students. On the designated survey date, various circumstances could lead to missing data.
A: Some students may be absent at random due to unpredictable reasons
B: Some students may be absent as they may be representing their college in competitions or events
C: A few students may not respond accurately to sensitive questions (they might be more likely to have experienced peer pressure)
All of the above situations lead to missing data. Based on this information, we can infer that missing data may occur at two levels: unit and/or item. Unit-level missing data occurs when an enrolled participant fails to show up for a study or declines to take the survey. The resulting bias is known as selection bias, as the responses of these participants might differ from those of the participants who took part. Item-level missing data, on the other hand, refers to incomplete data collected from a participant enrolled in the study. For instance, the participant may miss or choose not to answer certain questions in the survey.
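The two levels can be quantified separately. The following Python sketch assumes a hypothetical layout in which each enrolled participant has one list of answers and None marks an unanswered question; a participant with no answers at all is counted as unit-level missing. The function name and the sample answers are illustrative assumptions.

```python
def missing_rates(responses):
    """Return (unit-level missing rate, per-question item-level missing rates).

    responses: one list of answers per enrolled participant; None = no answer.
    Item-level rates are computed among participants who answered something.
    """
    n_enrolled = len(responses)
    responders = [r for r in responses if any(a is not None for a in r)]
    if not responders:
        raise ValueError("no participant answered any question")
    unit_rate = (n_enrolled - len(responders)) / n_enrolled
    n_items = len(responses[0])
    item_rates = [
        sum(r[i] is None for r in responders) / len(responders)
        for i in range(n_items)
    ]
    return unit_rate, item_rates

# Hypothetical survey: 5 enrolled students, 3 questions each.
survey = [
    [4, 2, 5],
    [None, None, None],   # did not take the survey (unit level)
    [3, None, 1],         # skipped question 2 (item level)
    [5, 4, None],         # skipped question 3 (item level)
    [2, 3, None],         # skipped question 3 (item level)
]
unit_rate, item_rates = missing_rates(survey)
print(unit_rate, item_rates)
```

Reporting the two rates separately shows whether the main problem is participants not taking part at all or particular questions being skipped.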
How to Avoid Missing Data Problems
1. Design your study keeping in mind the research objectives
Ensure that you collect only the data that is essential to achieve the target objectives. This reduces the unnecessary burden on participants and research staff of collecting non-essential information. If three assessments are sufficient to reach valid conclusions and meet the study objectives, it is not wise to conduct additional assessments. Furthermore, this could result in efficient utilization of resources (time, money and staff) and improved quality of the collected data.
2. Target an appropriate participant group
Assess the study duration and inclusion/exclusion criteria before enrolling participants for your study. If your study objective is to assess the outcomes of a therapy or drug treatment over six months, you have to exclude participants who are not willing to participate for that duration.
3. Keep your data collection protocols simple and easy to administer
Use simple words and keep your questions short and to the point. Be as specific as you can. For instance, rather than asking “Do you regularly exercise?”, you may instead ask “On average, how many days per week do you exercise?” to obtain more objective and precise answers.
4. Be open and flexible to different methods for data collection
Make provision for multiple methods of assessment. For instance, if the study participants are not willing to come to the survey site (clinic or research lab), allow alternative means of assessment such as self-administered questionnaires, telephone interviews, or Zoom interviews, if appropriate.
5. Develop a detailed study protocol
Before beginning your research, develop a detailed protocol that includes the methods for screening participants and the procedures to collect, document and record data. This will help in identifying the probable factors that could result in failing to collect crucial information. Assess these factors thoroughly so that appropriate amendments can be made if data do go missing. In addition, get your study reviewed by an advisory or data monitoring committee that will methodically scrutinize your study proposal. Their inputs will be invaluable in minimizing the chances of missing data during the course of the study.
6. Brief the participants
Conduct a training session for all participants, briefing them on all aspects of the study. This might decrease the chances of participants dropping out midway, as they will be fully aware of the course of the study.
7. Trial run
Perform a mini trial before you begin the actual study. This might help you recognize unanticipated problems that are likely to occur during the course of the study. The amount of missing data encountered in the trial run will also help you estimate how much to expect in the full study. You can then account for this in your sample size calculations, which might further reduce your chances of having an underpowered study.
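One common way to use the trial-run estimate is to inflate the required sample size so that the expected number of complete cases still meets the target. The Python sketch below assumes this simple adjustment (required size divided by the expected proportion of complete cases, rounded up); the function name and the example figures are hypothetical.

```python
import math

def adjusted_sample_size(n_required, expected_missing_rate):
    """Participants to enrol so that about n_required complete cases remain."""
    if not 0 <= expected_missing_rate < 1:
        raise ValueError("expected_missing_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - expected_missing_rate))

# Hypothetical: the power calculation asks for 200 complete cases and the
# trial run suggests that about 15% of the data will be missing.
print(adjusted_sample_size(200, 0.15))  # 236
```

This adjustment protects statistical power only; it does not remove any bias that arises when the data are not missing completely at random.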
8. Set a priori targets
Set a limit for the acceptable level of missing data. Identify the techniques that can be used to handle missing data in case the acceptable level is breached.
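A pre-specified limit is easy to monitor during data collection. The Python sketch below is a minimal illustration: the 10% limit, the variable names, and the counts are hypothetical and should be replaced by whatever your protocol specifies.

```python
def variables_over_limit(missing_counts, n_participants, limit=0.10):
    """Return the variables whose missing proportion exceeds the agreed limit."""
    return sorted(
        name
        for name, n_missing in missing_counts.items()
        if n_missing / n_participants > limit
    )

# Hypothetical counts of missing values per variable for 100 participants.
counts = {"age": 2, "grades": 5, "peer_pressure_score": 18}
flagged = variables_over_limit(counts, n_participants=100)
print(flagged)  # ['peer_pressure_score']
```

Any variable flagged this way would then be handled with the techniques identified in advance.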
9. Follow up with participants
For clinical studies, engage and follow up with participants to ensure they complete the entire study and provide you with all the necessary information. If some participants wish to withdraw from your study, record their reasons for subsequent analysis when you interpret the results.
10. Ensure you allocate resources that facilitate data collection effectively
Resources, including travel reimbursements for participants, appropriate compensation for their participation time, salary support for the research team, and funds to procure samples or reagents, must be allocated and distributed wisely. A part of the contingency fund must be kept in reserve for unforeseen events.
To sum up, careful study design is one of the best ways to keep problems arising from missing data at bay. Although the problem of missing data cannot be avoided completely, the likelihood of its occurrence can be significantly reduced.