Elevating Data Quality Considering User Experience

# Elevating Data Quality: The User Experience Connection

In an increasingly data-driven world, relying on high-quality data and content to inform our decisions is essential. However, **the quality of data often suffers**, and the data misrepresents what it is meant to convey. Much of our data is self-reported, which leads to inaccuracies or outright falsehoods caused by biases or by respondents giving little thought to how **to supply** the data. Examples include employees reporting to supervisors, completing surveys, and giving micro-feedback in applications and other solutions.

Personal motivations and social pressures do not always influence this data, but they often play a significant role. People can distort data both consciously and unconsciously, affecting the qualitative and quantitative data they provide. Understanding human psychology, behavior, and decision-making is therefore crucial for improving data quality. By focusing on user experience, we can make the collected data more reliable, leading to better decisions and more accurate models for future applications.

In this article, I will argue that recognizing the context in which data is gathered, the motivations behind it, and its perceived importance can significantly improve both its quality and people's willingness to share it.

> [!important] Definition
> Data quality refers to characteristics of data such as accuracy, consistency, completeness, reliability, and timeliness. This article focuses on data accuracy: how truthfully the data represents the part of the world it aims to describe. Poor data quality results in an inaccurate representation of reality.

Why does this matter? Consider your organization's decision-making process alongside a well-known principle in computer science: [Garbage in, garbage out - Wikipedia](https://en.wikipedia.org/wiki/Garbage_in,_garbage_out). The principle states that flawed input produces flawed output. Here, poor data leads to bad decisions and inadequate data models.

Your organization likely relies on various data sources, such as reports, forms, time tracking, engagement polls, and exit interviews. When people are asked for data without regard for their time, focus, and cognitive capacity, they often submit low-quality input. Respecting users is crucial for gathering high-quality data that improves decision-making and supports future machine learning models.

```mermaid
graph LR
    A[🤷 Complacent employee] -.-> D;
    B[👩‍💼 Time-pressured managers] -.-> D;
    C[🏃 Busy employee] -.-> D;
    D[💽 Data] --> E;
    E[🔍 Decision Making]
```

*(Caption: A model illustrating busy, complacent, or time-pressured employees contributing to the dataset, which in turn affects decision-making)*

> [!question] Question
> As an employee, have you ever answered a survey or filled out a report by fabricating your response, feeling unsure how to reply, or rushing through it without considering the details?

#### The rate of data gathering intensifies

In our daily activities, we are often asked to rate something on a scale or to give a qualitative assessment of a process or initiative. This practice is common with the services, products, and apps we use. As product advisors, we advocate this approach to enhance user experience and improve product-user fit. However, it has also been adopted by HR, management, procurement, and other areas within organizations. Unfortunately, organizations often implement feedback mechanisms hastily, neglecting the needs of employees, customers, or users.

Key reasons for the increase in data gathering:

* **Adoption across many disciplines:** Feedback mechanisms are now so widespread that users are overwhelmed with questions from different departments.
* **Ease of sending out surveys:** Technology has made it simple to create and distribute surveys.
* **Increased focus on data-driven decision-making:** Organizations are adopting data-driven strategies to improve their decision-making.

Examples include quality checks on completed work, timesheets, and other data collection methods meant to improve decision-making. Surveys after tasks and in work contexts, such as following meetings and initiatives, have also become more common. The emphasis on data models and data-driven decision-making has led organizations to seek ever more data and feedback, often overlooking the experience of the people asked to provide it.

As the number of input boxes, forms, and lists that people feel "forced" to complete grows, so do cognitive load, stress, and interruptions. This diverts attention and energy from their primary tasks. As these 'small' tasks pile up, people rely more on mental shortcuts, such as biases, which distort the dataset and lower its quality. If we base decisions and models on data heavily shaped by biases, we get poorer decisions and AI models trained on biased data. To build effective agents and AI models, we must train them on the most honest and thoughtful data we can obtain, because that data yields the greatest value. Given these challenges, focusing on the quality of human-generated data is crucial.

> [!important] Bias
> Social desirability bias and recall bias are common sources of distortion in the data collected through these surveys.

#### Importance of human-generated data

As mentioned, respondents face a growing burden that introduces many inaccuracies into surveys. As surveys multiply, the focus on thoughtful responses diminishes. Many people likely rush through feedback forms, whether for a survey or when reporting to a supervisor. The growing need to train AI models on our data makes the methods used to collect that data all the more important if it is to represent reality accurately. Organizations must therefore prioritize the quality of data input, ensuring that feedback mechanisms minimize cognitive load while maximizing response accuracy.

Furthermore, GDPR and other privacy laws will likely continue to require that users provide their data willingly, so self-reported data will remain valuable for some time. We also need to collect data to train our models so that they better align with the needs of the people they will serve.

Humans often give inauthentic responses because we tend to present ourselves in the best possible light. This lack of honesty can distort the picture of the world that the data is meant to convey. As a result, we may see unforeseen outcomes that compromise the effectiveness of our models, and decisions based on distorted perceptions rather than reality.

Human-generated data provides essential insights into people's actions, choices, and experiences. However, the quality of self-reported data faces several challenges: users may not share everything they know, and biases can skew what they do share. Even so, gathering human perspectives is likely valuable both for organizations making decisions and for the AI models we train to assist us. To address these challenges, organizations should consider the following key factors:

* **Regulations and policies:** Ensure compliance with laws that require users to share their data willingly.
* **Accuracy:** Emphasize the importance of accurate input so that the data reflects the reality it aims to represent.
* **Gathering the human perspective:** Actively seek to understand the experiences and insights of respondents to enrich the collected data.

By focusing on these aspects, organizations can meet the demand for high-quality data and build a more trustworthy relationship between users and data collection processes. Data analytics and data science tools can also help clean and organize the information.

#### Mitigating poor data quality

I want to acknowledge that there are other ways to address issues with the collected data.
In analysis and statistics, various tools can help mitigate some of the challenges mentioned earlier. Even with these tools, individuals will still report data with variations and errors, but that is not my main point. These methods typically address outliers and structural issues; they do not guarantee accuracy. Techniques like data cleansing and organization aim to minimize human error, but other strategies are needed to counteract biases, poor survey design, or badly timed surveys.

For example, suppose employees are asked to rate their stress levels after a busy workweek, and the survey runs on a Friday afternoon right before the weekend. They may report lower stress because they are looking forward to time off, even if they were highly stressed throughout the week. The result is skewed self-reported data. From a data scientist's perspective, such variations do not necessarily indicate errors in the data collection process. Even so, completely eliminating these biases remains a challenge. Although this example may seem straightforward, similar cases occur frequently in a fast-paced business environment, which underlines the need for more thoughtful approaches to data gathering.

> [!quote] Diminishing trust
> Requesting data from users without a clear strategy or consideration for their needs can be disastrous. At best, it may annoy users or lead to a loss of trust.

#### Building trust and respect, quality will follow

High-quality data is essential for decision-making, operational efficiency, regulatory compliance, and customer satisfaction. Reliable and accurate data is vital for achieving successful outcomes in these areas.

The key point I want to emphasize is the need for a more mindful approach to data collection. I notice that more organizations are using low-cost methods to gather feedback, make decisions, and manage governance processes.
We must raise awareness of the importance of implementing these tactics thoughtfully, taking into account the cognitive load, value, and attention of those providing the data. We must also recognize the hidden costs of data collection, which can distract people from their work.

We live in an attention economy. When organizations ask employees to complete a survey that takes 5 to 10 minutes, they consume time and cause interruptions. Even a seemingly simple task, such as filling out a survey or rating something on a scale, takes effort if the responses are to be of high quality. Is it worth it? Is the survey framed so that respondents will genuinely invest those 5 or 10 minutes?

I won’t go into the details of survey design or how respondents retrieve information when answering, as that is a broader topic. However, we need to take a thoughtful approach to data collection, one that respects users and considers their needs, ultimately improving both the quality and quantity of the data collected.

While perfection may be unattainable, we cannot function without data. We can improve our practices today by treating users thoughtfully through good design and by understanding where the data comes from. Building trust enhances the quality of our data.

**Key points to remember for improving data quality:**

* **Evaluate the cost of data gathering:** Consider the true cost of data collection, which goes beyond time alone. Aim for deep insights instead of superficial responses.
* **Establish governance and clear strategies:** Create a framework for data collection that includes guidelines on necessity, usage, and processes.
* **Be thoughtful about survey placement:** Use surveys wisely. Consider whether one-on-one conversations or other methods might provide better insights.
* **Utilize a survey designer:** Engage experts in survey design to create effective and unbiased questions.
* **Adopt a holistic user perspective:** Understand users' contexts and experiences to tailor data collection methods effectively.
* **Communicate the purpose:** Clearly explain the value of data collection to build trust and encourage thoughtful responses.
* **Regularly review practices:** Continuously assess and adjust data collection methods based on user feedback and effectiveness.

```mermaid
graph TD
    A[🤔 Consideration & Respect for the user] --> B;
    B[🤝 Builds Trust] --> C;
    C[💎 High quality and quantity of data];
```

*(Caption: A model illustrating how respecting and considering the user builds trust, which in turn leads to higher quality and quantity of data.)*

> [!tldr] TLDR
> Building trust in services and organizations requires consideration of human attention, cognitive capacity, and time. This approach results in high-quality data and improved decision-making.

---

# Related