In my last post I shared an auto-classification project that we undertook recently. Even information professionals need a helping hand to appraise and classify the growing document stockpile. Not surprisingly, most organisations we work with have reached the point where auto-classification is the only viable way to appraise, capture and classify their information holdings in accordance with business and governance requirements.
What do we mean by auto-classification?
Auto-classification is the process where documents are classified (tagged with metadata) by machine. So if you read this Webopedia definition, you could easily think that the machine can make informed decisions about classification just by reading the document. In reality, the classification is dependent on the knowledge that you build into the auto-classification engine.
Now classification which supports information governance is significantly different to classification for search. There is more complexity because there are more aspects to consider.
- Certainly users want to be able to capture and classify their documents so they can find, use and share them.
- In addition, information professionals have to classify with metadata that governs access, data protection (e.g. GDPR), retention and disposal, all in accordance with contemporary standards and legislation.
- ICT professionals want to be able to manage information infrastructure more effectively.
- And the C suite wants everyone to be more efficient at information management, spending less money on consultants, and more time delivering products and services. But at the same time they don’t want to expose the business to unnecessary risks through poor governance practices.
Is it possible to achieve all this through auto-classification?
Yes it is, but we need to develop fit-for-purpose, machine-readable data models, such as ontologies, that convey the requisite knowledge into the auto-classification platform.
What are ontologies?
Ontologies are linked data models for describing a domain, that list the types of objects and their instances, the relationships that connect them, and the constraints on the ways in which objects and relationships can be combined.
The term dictionary is used to refer to an electronic vocabulary or lexicon as used for example in spelling checkers. If dictionaries are arranged in a subtype-supertype hierarchy of concepts (or terms) then it is called a taxonomy. If it also contains other relations between the concepts, then it is called an ontology. Source: Wikipedia.
Unlike file plans, ontologies enable us to combine multiple concepts and define multiple types of relationships. In an ontology we can accommodate several taxonomies within the same scheme, so you can address the needs of all stakeholders. And because they are built for machine application, scale is not an issue.
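To make the idea concrete, here is a minimal sketch of an ontology held as a set of subject-relation-object triples. It is an illustration only: the `Ontology` class, the relation names (`broader`, `documents`, `contains`) and the sample concepts are all assumptions made up for this example, not the data model of any particular product. It shows two small taxonomies living in the same scheme, joined by non-hierarchical relations.

```python
class Ontology:
    """A tiny triple store: each fact is (subject, relation, object)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def objects(self, subject, relation):
        """Everything the subject points to via the given relation, sorted."""
        return sorted(o for s, r, o in self.triples
                      if s == subject and r == relation)

    def ancestors(self, concept):
        """Follow 'broader' links upward; this is the taxonomy part."""
        found = []
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for parent in self.objects(current, "broader"):
                if parent not in found:
                    found.append(parent)
                    frontier.append(parent)
        return found


onto = Ontology()

# Taxonomy 1 (hypothetical): business functions and activities
onto.add("Recruitment", "broader", "Human Resources")

# Taxonomy 2 (hypothetical): document types
onto.add("Employment Contract", "broader", "Contract")

# Non-hierarchical relations: these are what a file plan cannot express
onto.add("Employment Contract", "documents", "Recruitment")
onto.add("Employment Contract", "contains", "Personal Data")
```

Asking `onto.ancestors("Employment Contract")` walks the document-type hierarchy, while `onto.objects("Employment Contract", "documents")` crosses over into the business-function hierarchy. A single-hierarchy file plan has no place for that second kind of link.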
Ontologies hold the key to automated appraisal!
Ontologies enable the automation of records appraisal. If we extract the knowledge built into contemporary disposal authorities, we can create data models which enable the auto-classifier to recognise significant concepts, then tag documents with the appropriate records class ID. And this is what we did for the Synercon auto-classification project.
It’s all about the linked data model, and given that we incorporated the BCS and retention schedule into our ontology, we were also able to tag with recordkeeping metadata.
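The following is a rough sketch of that chain: recognise a concept in the text, look up its records class ID, then follow the link to the recordkeeping metadata. It assumes a simple phrase-matching approach and is not a description of how any real auto-classification platform works internally. The concepts, trigger labels, class IDs (`HR-01`, `FIN-07`) and retention periods are all invented for the example.

```python
import re

# Hypothetical ontology fragment: concepts, their trigger labels,
# and the records class each concept links to.
CONCEPTS = {
    "Recruitment": {
        "labels": ["job application", "interview panel"],
        "records_class": "HR-01",
    },
    "Procurement": {
        "labels": ["purchase order", "tender"],
        "records_class": "FIN-07",
    },
}

# Hypothetical records classes, linked to a BCS function and a
# retention period (values are made up, not from a real authority).
RECORDS_CLASSES = {
    "HR-01": {"bcs_function": "Human Resources", "retention_years": 7},
    "FIN-07": {"bcs_function": "Financial Management", "retention_years": 5},
}


def tag_document(text):
    """Return one tag per concept whose label appears in the text."""
    lowered = text.lower()
    tags = []
    for concept, entry in CONCEPTS.items():
        matched = any(
            re.search(r"\b" + re.escape(label) + r"\b", lowered)
            for label in entry["labels"]
        )
        if matched:
            class_id = entry["records_class"]
            # Follow the link from concept to records class to pick up
            # the recordkeeping metadata in the same pass.
            tags.append({
                "concept": concept,
                "records_class": class_id,
                **RECORDS_CLASSES[class_id],
            })
    return tags
```

Calling `tag_document("Notes from the interview panel for the analyst role")` would tag the document with the Recruitment concept, its records class ID, and the linked function and retention period. The point is that the retention metadata comes for free once the concept is recognised, because the links are already in the model.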
To reiterate, we could not achieve the same outcome with the current file plan model.
Are ontologies difficult to build?
Not in my experience. In fact I find ontologies far easier to build than file plans because the logic is more explicit.
You start by defining your data model (i.e. what metadata you want to tag with), and the relationships between your concepts. Then slice and dice your existing controls: your metadata libraries, business classification schemes and disposal authorities.
Lastly you need an environment where you can utilise the outputs of auto-classification. We’ve loaded the ontology into the SharePoint Term Store and are using all of the SharePoint functionality for exploiting managed metadata – creating views, setting up filters, refining searches and redirecting documents using workflows.
Of course you need purpose-built tools. We used our own a.k.a.® software for building the ontology and the DiscoveryOne auto-classification platform to apply the ontology and tag the documents in our SharePoint library.
Where to get training on building ontologies?
Synercon are offering a new training course: Metadata, Taxonomy and Ontology Design for Automating Information Governance.
We’ve created this workshop to share the knowledge of how to develop your metadata and classification schemes into ontologies for auto-classification and auto-appraisal.
We’ve started scheduling the workshop for Australia, New Zealand and the United Kingdom this year. Don’t hesitate to contact us if you would like to host or attend one of our training courses.