Client
Asset Institute
Area of Expertise
- Software Engineering and Development
- Asset Management
The Challenge
Investigate how structured and unstructured data are managed across numerous disparate enterprises, including but not limited to AGL, Macquarie Generation, Sun Water, Queensland Railway and the Royal Australian Navy (RAN), and examine current unstructured incident reports to understand the data and information requirements. Utility companies commonly have a long operational history and a large portfolio of physical assets.
Unlike many other industries, utilities are strictly regulated and form part of Australia's critical infrastructure. Optimised information management covering both structured and unstructured data serves multiple purposes:
- Improve efficiency and safety of current operations.
- Improve asset maintenance to increase asset life-span.
- Reduce breakdown and OHS risks.
- Support compliance with external audits.
The project delivered a big data analytics solution that links incident reports, engineering drawings and other unstructured documents to asset registers and work orders for effective information retrieval. Characteristics of the source documents:
- More than 10 years of incident reports.
- The MS Word report template was last changed in 2003.
- Various file formats, including .RTF, .PDF, .DOC and .DOCX.
- Stored on a Microsoft Exchange server and then moved to a document management system; content includes text, drawings, images, photos and tables (structured and unstructured text).
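Because the archive mixes .RTF, .PDF, .DOC and .DOCX files, a parser first has to decide which converter to apply to each file. The minimal Python sketch below assumes detection by leading file signatures ("magic bytes") rather than by file extension, which may be wrong on old files; the `SIGNATURES` table and function names are illustrative and not taken from the project. The OLE2 signature is shared by other legacy Office files (e.g. .XLS, Outlook .MSG) and the ZIP signature by any ZIP-based format, so a production tool would need to inspect the file further.

```python
from pathlib import Path

# Illustrative table of leading byte signatures for the report formats above.
# Checked in order; the first match wins.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "doc"),  # OLE2 compound file (legacy Word)
    (b"PK\x03\x04", "docx"),                      # ZIP container (Office Open XML)
]


def detect_format(data: bytes) -> str:
    """Return a format label based on the file header, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def detect_file(path: Path) -> str:
    """Read the first 8 bytes (enough for every signature above) and classify."""
    with path.open("rb") as fh:
        return detect_format(fh.read(8))
```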
Our Approach
The project required a bespoke software tool to parse documents automatically and make their content searchable. Research into technology choices: the solution performs PDF-to-text, Word-to-text and OCR conversion. Open-source options (e.g. Apache parsers) and commercial solutions were evaluated for cost-effectiveness and infrastructure requirements.
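Of the conversions listed, Word-to-text for .DOCX is simple enough to illustrate without third-party libraries, because a .DOCX file is a ZIP package whose main body text lives in `word/document.xml`. The sketch below is an illustration only, not the project's converter; `docx_to_text` is a hypothetical name, and PDF, legacy .DOC, RTF and OCR conversion would each need a dedicated parser or OCR engine.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Extract plain text from a .docx file (path or binary file object).

    Returns one line per paragraph (w:p), including paragraphs inside
    table cells. Only word/document.xml is read, so headers, footers
    and images are not included.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more text runs (w:t).
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines)
```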
Programming skills and web techniques: an ASP.NET program with REST services was built and scheduled to run daily, using the MS Exchange Server APIs to extract new reports from email attachments into a non-relational database (MongoDB) and .XML export files. The corporate search engine calls the industry-standard REST web service directly to retrieve document auto-tagging results, enabling a global search across both structured and unstructured data sources.
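The production job used the MS Exchange Server APIs, which are not reproduced here. As a hedged stand-in, the Python sketch below assumes each email is already available as raw RFC 5322 bytes, pulls out attachments with report file extensions using the standard `email` package, and serialises their metadata to an XML export. The `REPORT_EXTENSIONS` filter, the record fields and the XML element names are hypothetical; each record dict has the shape of a document that could be stored in MongoDB through a driver such as PyMongo.

```python
import email
import email.policy
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Hypothetical filter: attachment extensions treated as incident reports.
REPORT_EXTENSIONS = {".rtf", ".pdf", ".doc", ".docx"}


def extract_reports(raw_message: bytes) -> list[dict]:
    """Return one record (hypothetical schema) per report attachment."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    records = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename or PurePath(filename).suffix.lower() not in REPORT_EXTENSIONS:
            continue  # skip photos, signatures and other non-report files
        records.append({
            "filename": filename,
            "subject": str(msg.get("Subject", "")),
            "received": str(msg.get("Date", "")),
            "content": part.get_payload(decode=True),  # raw bytes for the parser step
        })
    return records


def to_xml_export(records: list[dict]) -> str:
    """Serialise record metadata (not the binary content) to an XML string."""
    root = ET.Element("reports")
    for rec in records:
        el = ET.SubElement(root, "report", filename=rec["filename"])
        ET.SubElement(el, "subject").text = rec["subject"]
        ET.SubElement(el, "received").text = rec["received"]
        ET.SubElement(el, "size").text = str(len(rec["content"]))
    return ET.tostring(root, encoding="unicode")
```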
Scalable analysis system for high data volume and traffic: the system performs document parsing and advanced text analytics (machine-learning-based matching of document text to asset register entries) across more than 10 years of historical documents as well as current and future documents, and can respond to queries from the corporate search engine.
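The source does not describe the machine-learning model used to link document text to the asset register. As a clearly swapped-in baseline, the sketch below ranks asset register descriptions against an incident report using TF-IDF weighting and cosine similarity; the asset IDs, descriptions and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; keeps hyphenated tags such as 'p-101' intact."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw term frequency x smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def match_report(report: str, assets: dict[str, str]) -> list[tuple[str, float]]:
    """Rank hypothetical asset register entries (id -> description) by similarity."""
    ids = list(assets)
    docs = [tokenize(assets[i]) for i in ids] + [tokenize(report)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scores = [(asset_id, cosine(query, vec)) for asset_id, vec in zip(ids, vectors[:-1])]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A trained classifier or entity-matching model would normally replace this baseline; a similarity ranking like this mainly serves to shortlist candidate assets for each document.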
Knowledge of Internet technologies: to operate within the corporate network, the final web-service solution was designed to work with the existing secure environment (Linux server, HTTPS data communications, SSL tokens, etc.).
The Solution
- Bridging the gap between structured and unstructured data sources.
- Automated intervention to improve learning outcomes.
- Artificial Intelligence (AI) matching algorithm to enable data linkage across
- Numerous data analytics methods and solutions developed in the project have been adopted by the participating organisations.
- An enabler for improved enterprise information management and predictive maintenance planning.
- The proposed Information Management methodology for Engineering Asset Management has been adopted by the participating organisations.