Self-Service Data Preparation by Advanced Spreadsheet Users Is Becoming a Waste of Time and a Danger to Data and Analysis Integrity
New data characteristics, such as the rapid growth in data volume, the variety of new data types, and the ubiquity of unstructured data, make preparing data for analysis more and more complex. At the same time, data preparation is becoming increasingly critical. Data preprocessing tasks such as standardization or merging, and data hygiene tasks such as cleansing or augmentation, are vital to ensuring meaningful and useful analysis for decision making (refer to Big Data Processing Cycle and The Pitfalls of Not Following It for further information).
Though the IDC and Alteryx survey of advanced spreadsheet users in Europe does not follow the same data analytics steps as the Big Data Processing Cycle described by DigitalFullPotential.com, it covers many of its preliminary tasks. The results will not surprise anyone who uses spreadsheets in their daily work, but they reveal a phenomenal improvement opportunity in recovered time alone, not to mention the error-prone nature of self-service data preparation.
“Each advanced spreadsheet user can spend up to 9 hours per week repeating effort when data sources are updated. In Europe alone, this represents 2 billion hours of duplicate work.” Source: IDC and Alteryx Advanced Spreadsheet Users Survey in Europe.
Copy-paste, the most frequently mentioned method of getting data into analytics spreadsheets, is a source of waste, weariness and human error, and adds no value. Respondents to the IDC study also use spreadsheet file links and functions such as VLOOKUP, though less frequently than copy-paste. These methods are also prone to repetitive work and errors, as links to the original spreadsheets are easily broken or overwritten. The study did not review the compliance implications of these data handling processes, but any company using sensitive data should analyze them as well.
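To illustrate how a lookup of the VLOOKUP kind can be made repeatable instead of being redone by hand each time a source is updated, here is a minimal Python sketch. It is not taken from the IDC study. The sample data, column names and the `lookup_join` helper are all hypothetical, and the inline CSV text stands in for two exported spreadsheets.

```python
import csv
import io

# Hypothetical sample data standing in for two exported spreadsheets.
ORDERS_CSV = """order_id,customer_id,amount
1001,C1,250.00
1002,C2,99.50
1003,C9,10.00
"""

CUSTOMERS_CSV = """customer_id,region
C1,North
C2,South
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup_join(rows, lookup_rows, key, value_field):
    """Attach value_field from lookup_rows to each row, matching on key.

    Roughly what an exact-match VLOOKUP does, except that keys with no
    match are collected and returned instead of being left as error cells.
    """
    index = {r[key]: r[value_field] for r in lookup_rows}
    joined, unmatched = [], []
    for row in rows:
        merged = dict(row)
        merged[value_field] = index.get(row[key])  # None when no match
        if merged[value_field] is None:
            unmatched.append(row[key])
        joined.append(merged)
    return joined, unmatched


orders = read_rows(ORDERS_CSV)
customers = read_rows(CUSTOMERS_CSV)
joined, unmatched = lookup_join(orders, customers, "customer_id", "region")

for row in joined:
    print(row["order_id"], row["customer_id"], row["region"])
print("Unmatched keys:", unmatched)
```

Because the lookup lives in a script, rerunning it on refreshed exports repeats exactly the same steps, and the list of unmatched keys makes missing reference data visible instead of leaving it hidden in a broken link.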
A new kind of self-service data preparation app has emerged. However, many companies are not adopting these apps. The three main concerns mentioned are:
- implementation times
- compatibility with other applications
- high cost
As recommended in the study, a business case for modern apps based on time and cost savings and error elimination would make economic sense for most companies.