Text analytics is the process of drawing meaning out of written communication, and it is a growing trend for engineers in industries such as automotive manufacturing, aerospace design, industrial automation and machinery, and energy distribution. Using text analytics, engineers can extract more value from raw text and combine it with sensor data and machine learning algorithms to improve functions like predictive maintenance.
The challenge with the process, however, is that the sheer volume of unstructured, raw text data sets can make it difficult for analytics tools to quickly and intuitively extract all the valuable information that may be available to the user.
Below, Seth DeLand from MathWorks explains which kinds of text data matter to engineers and how a text analytics workflow can turn that data into better predictive maintenance.
What kind of text data is important to engineers and why?
Seth DeLand: One of the major areas where we see text being used is in gathering and analyzing data from automotive maintenance reports. These reports contain information that can prove valuable to automotive engineers, including text from mechanics about the vehicle's service history. At the individual level, a maintenance record describes what happened at that particular service visit. But if automotive engineers can quickly and easily aggregate all of those reports, there's a lot of real-world information that can be drawn from the correlations across them. For example, automotive engineers could learn a vehicle's common service issues or, from a warranty perspective, understand which key failures in the car tend to occur together.
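To make the idea of aggregating reports concrete, here is a minimal sketch in Python. It is not MathWorks' method; the report text, the vehicle identifiers, and the component list are all invented for illustration, and simple substring matching stands in for a real text analytics tool. It counts how often each component is mentioned across reports and which components are mentioned together in the same visit.

```python
from collections import Counter
from itertools import combinations

# Hypothetical maintenance notes, keyed by vehicle; invented for illustration.
reports = {
    "VIN001": "Replaced brake pads. Battery weak, recommended replacement.",
    "VIN002": "Battery dead on arrival. Alternator failed load test.",
    "VIN003": "Alternator noisy. Battery replaced under warranty.",
    "VIN004": "Brake pads worn. Rotated tires.",
}

# Hypothetical list of components an engineer might want to track.
COMPONENTS = ["brake", "battery", "alternator", "tires"]


def components_mentioned(text):
    """Return the set of tracked components that appear in one report."""
    lowered = text.lower()
    return {component for component in COMPONENTS if component in lowered}


issue_counts = Counter()  # how many reports mention each component
pair_counts = Counter()   # how many reports mention both components of a pair

for text in reports.values():
    found = components_mentioned(text)
    issue_counts.update(found)
    # Count every pair of components reported during the same visit.
    pair_counts.update(combinations(sorted(found), 2))

print(issue_counts.most_common())
print(pair_counts.most_common())
```

On this toy data, the battery is the most frequently mentioned component (three of four reports), and the alternator and battery appear together in two reports, which is the kind of co-occurrence a warranty engineer might want to investigate further.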
On the other hand, many of today's maintenance logs are digitized and generated automatically. In the industrial automation and machinery space, this could mean that, during the operation of heavy equipment, the text from these digitized maintenance logs is analyzed so that error messages or warnings are sent to operators prior to failure, which keeps production from having to be stopped.
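As a rough illustration of analyzing automatically generated logs, the Python sketch below raises an alert when several warnings arrive within a short time window. The log format, the messages, and the thresholds are assumptions made up for this example; real equipment logs and alerting rules will differ.

```python
import re
from collections import deque

# Hypothetical log format "<seconds> <LEVEL> <message>"; invented for illustration.
LOG_LINES = [
    "0 INFO spindle started",
    "12 WARN spindle temperature high",
    "20 INFO coolant pump on",
    "31 WARN spindle temperature high",
    "40 WARN vibration above threshold",
    "300 WARN spindle temperature high",
]

LINE_PATTERN = re.compile(r"^(\d+) (INFO|WARN|ERROR) (.+)$")


def find_alerts(lines, window_seconds=60, max_warnings=3):
    """Return timestamps at which max_warnings or more fall inside the window."""
    recent = deque()  # timestamps of recent WARN/ERROR entries
    alerts = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        timestamp, level = int(match.group(1)), match.group(2)
        if level == "INFO":
            continue  # only WARN and ERROR entries count toward an alert
        recent.append(timestamp)
        # Drop entries that have aged out of the time window.
        while recent and timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= max_warnings:
            alerts.append(timestamp)
    return alerts


alerts = find_alerts(LOG_LINES)
print(alerts)
```

With the sample lines above, the only alert is at second 40, when three warnings have accumulated within one minute. The isolated warning at second 300 does not trigger one.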
Finally, Advanced Driver Assistance Systems (ADAS) in the automotive industry are a growing area for text analytics. When a car's camera captures images of road signs, those images need to be interpreted. Text analytics is a way not only to build models that read road signs but also to interpret the meaning of the text on those signs.
What other things are customers exploring when it comes to text analytics and maintenance?
SD: Predictive maintenance is an area that could directly benefit from text analytics. We already talked about how being able to easily generate insights from raw text data from maintenance records can provide benefits; however, this raw text data can also help engineers build algorithms to predict failures before any warnings are sent. For example, in the off-highway commercial space, if a piece of heavy equipment breaks down, that becomes a costly failure. We have customers producing heavy equipment that is going to be used on a construction site. When that piece of equipment fails, it costs both money and time because construction is stalled. For engineers in the industrial automation and machinery space, being able to build algorithms that predict these failures before they occur can prevent delays, thereby saving time and money.
What is the best approach to a text analytics workflow to leverage data and identify trends?
SD: The first step is to equip engineers within your organization with a text analytics tool. Such programs can transform raw text into text data by examining Word documents, PDFs, text files, and databases. Engineers can then sort the text data to identify insights and trends. This makes it practical to analyze hundreds or thousands of Word documents and PDFs, which would be too time-consuming or impossible for a human to review one at a time.
Before analyzing the text, however, it's important to begin with a pre-processing step in which the data is "cleaned" to filter out the noise that comes with human language text. This step eliminates words that carry little meaning on their own, such as "a," "the," "of," and "uhm." The next step is to "stem," which means reducing each word to its root by stripping verb tenses, plurals, and other grammatical variations. For the sake of analysis, engineers often only need the root word; stemming allows them to ignore verb tenses, for example, and focus instead on the broader concept or idea.
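The cleaning and stemming steps can be sketched in a few lines of Python. This is a toy version under stated assumptions: the stop-word list is a tiny hand-picked sample, `naive_stem` is a hypothetical suffix stripper rather than an established algorithm such as the Porter stemmer, and a plain whitespace split is used only as a convenience to get at individual words.

```python
# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "was", "is", "uhm"}

# Suffixes removed by this toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of the root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_and_stem(text):
    """Lowercase, drop punctuation and stop words, then stem what remains."""
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    return [naive_stem(w) for w in words if w and w not in STOP_WORDS]


note = "The pump was leaking and the seals leaked, uhm, a lot."
print(clean_and_stem(note))
```

For the sample note, the result is `['pump', 'leak', 'seal', 'leak', 'lot']`: the filler words are gone, and "leaking" and "leaked" collapse to the same root, so they can be counted as one concept.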
The last step is tokenization, in which an algorithm breaks long prose into individual words, or tokens. This allows engineers to convert the text data into a numeric representation and then apply statistical techniques or machine learning algorithms.
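One common numeric representation is a word-count (bag-of-words) matrix, with one row per document and one column per word. The sketch below shows one way this could look in Python; the interview does not name a specific representation, and the sample notes and the `tokenize` and `to_count_vector` helpers are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, already-cleaned maintenance notes; invented for illustration.
documents = [
    "pump leak seal leak",
    "motor overheat fan fail",
    "pump motor overheat",
]


def tokenize(text):
    """Split text into lowercase word tokens made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokenized = [tokenize(doc) for doc in documents]

# Build a sorted vocabulary so each word maps to a fixed column index.
vocabulary = sorted({token for tokens in tokenized for token in tokens})


def to_count_vector(tokens):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


matrix = [to_count_vector(tokens) for tokens in tokenized]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row of `matrix` is now a fixed-length vector of counts, which is the form that statistical techniques and machine learning algorithms typically expect. Such vectors could also be placed alongside sensor readings as additional input features.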
DeLand, Seth. (2018). “The Growing Trend of Text Analytics”. Retrieved from https://www.eeweb.com/profile/sethdeland/articles/the-growing-trend-of-text-analytics.