So, your company is sitting on an ocean of operations-related data. That could be a great first step towards meaningful insights! But when you collected it, did you know up front how you intended to use it?
Data without relevance is fool's gold
Relevant data is meaningful, objective, aligned with your goals and usable for solving specific problems. When collecting operational data with the goal of applying intelligent models to a very specific process, consider the labels you are using to tag phrases, qualities, characteristics and images.
"Subjective labels like 'good', 'better' and 'best' might be obvious to your human quality inspectors, but they make no sense to an AI algorithm, because it has no idea which characteristics make a product 'good' in the first place," explains Wouter Labeeuw, data scientist at delaware.ai.
"We've kicked off projects and only learned later during the model testing phase that metrics were subjective - which meant relabeling data from square one with more objective tags."
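To make the difference concrete, here is a minimal Python sketch of what objective tagging could look like. It is an illustration only, not delaware.ai's method: the measurements (`scratch_length_mm`, `color_deviation`), the thresholds and the tag names are all hypothetical. The point is that each tag is derived from a recorded measurement and an explicit rule, so a model (and a second inspector) can see exactly what "good" means.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    """One inspected product, described by measurements instead of opinions."""
    scratch_length_mm: float  # hypothetical measurement
    color_deviation: float    # hypothetical unitless deviation score


# Hypothetical tolerances; in practice these come from your quality specification.
MAX_SCRATCH_MM = 0.5
MAX_COLOR_DEVIATION = 2.0


def objective_labels(item: Inspection) -> dict[str, bool]:
    """Derive tags from explicit, repeatable rules rather than 'good' or 'best'."""
    return {
        "scratch_within_tolerance": item.scratch_length_mm <= MAX_SCRATCH_MM,
        "color_within_tolerance": item.color_deviation <= MAX_COLOR_DEVIATION,
    }


sample = Inspection(scratch_length_mm=0.8, color_deviation=1.2)
labels = objective_labels(sample)
print(labels)
```

Because the raw measurements are stored alongside the tags, a later change to a tolerance only requires recomputing the labels, not relabeling the data from square one.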
The 3 types of data
In operations, as in machine learning generally, three types of data are used to train models, and each corresponds to a specific technology.
Visual data
Captured by cameras, visual data is made up of images that are tagged according to what they contain (people, vehicles, characters, defects, colors, quality, etc.). Computer vision is the corresponding AI technology for visual data.
Textual data
Gathered via cameras, scanners or digital documents, textual data is organized into linguistically relevant characters, words, sentences and concepts. Natural language processing is its corresponding AI technology.
Numerical data
This type of data is neither visual nor organized into linguistic elements and is made up of figures and measurements gathered by machines, sensors or people. Driver analysis is the technology that delaware.ai applies to determine how these figures influence each other in specific contexts.
A database is good, but a data platform is better
We encourage our customers who are working with large amounts of data to invest in a data platform - a cloud database managed by a single, central data governance framework. This single framework simplifies the process of transforming data into the form needed by a machine-learning model that is trained to solve a specific problem.
Today's cloud-based data platforms have the power, cost-effectiveness, reliability and security to handle almost any corporate or industrial machine-learning project. Without one, your data team will have to rely on good old elbow grease - manual labor - to clean, update and transform the data you collect.