The idea of organized search has come a long way since the Dewey Decimal System and your local library. Today, a search entered into Google or on almost any website takes hundreds of data points into account in order to return the best results possible. Improving website search results is a billion-dollar industry for good reason, and many companies thrive by providing these services. Enterprise search resembles a web search engine in that it catalogs content and returns requested results, but it arrives at those results by different means. Search parameters for internal searches within an organization are often determined by administrators (most commonly the marketing department, these days) and the IT department, but recent developments in machine learning and AI make it possible for the software itself to perform auto-classification within enterprise search. Auto-classification has the potential to revolutionize the way users within a network search for data. However, it is necessary to understand the role machine learning plays in enterprise search before implementing any changes.
Data classification has plagued networks for as long as computers have been used to search for content and files. Unless users know exact file names and locations, information has to be classified, and doing so has proven both desirable and challenging. Until recently, a file needed manual classification, with a person entering the metadata. Manually entering information about every file takes time and requires someone knowledgeable enough to perform the work. It also depends entirely on the individual providing the information. The tags or categories one user assigns to a file or data record may differ completely from those a second user would choose, so manually classifying files has never been an exact science. It is also a time-consuming exercise that becomes a full-time job for even a small business, let alone an organization the size of an enterprise. This is where auto-classification comes in.
Auto-classification can happen in two different ways. One method is to have a separate application crawl the files and database records, much as a search engine does. The other is to integrate classification into the search engine itself as part of its crawl. Once the content is crawled, the application evaluates each file or database record and either associates it with existing tags or creates new tags and classifications for it. The result is metadata associated with the original data, which allows for new ways of filtering and searching. Initially, the tags may prove generic and rather basic, depending on how the evaluation process is performed and on the quality of the content itself. As users within a network begin to interact with files and other content, and choose certain data from enterprise search results, the application can learn from these selections. This is possible because auto-classification takes advantage of modern machine learning techniques (machine learning is itself a part of the larger field of artificial intelligence, or AI).
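As a rough illustration of the crawl-then-tag step described above, the following Python sketch assigns tags to crawled documents by matching keywords. The TAG_KEYWORDS table, the function names, and the sample documents are all hypothetical, and a fixed keyword list stands in for the trained models a real auto-classification product would likely use.

```python
import re

# Hypothetical tag -> keyword table; a real system would learn or curate this.
TAG_KEYWORDS = {
    "finance": {"invoice", "budget", "payment"},
    "hr": {"hiring", "benefits", "payroll"},
    "legal": {"contract", "compliance", "liability"},
}


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, min_hits=1):
    """Return the sorted tags that have at least min_hits keywords in the text."""
    words = tokenize(text)
    tags = [tag for tag, keywords in TAG_KEYWORDS.items()
            if len(words & keywords) >= min_hits]
    return sorted(tags)


def crawl_and_tag(documents):
    """Build tag metadata for every crawled document, keyed by its path."""
    return {path: classify(body) for path, body in documents.items()}


if __name__ == "__main__":
    docs = {
        "q3_report.txt": "Budget review and invoice payment schedule.",
        "offer_letter.txt": "Hiring contract with payroll details.",
    }
    for path, tags in crawl_and_tag(docs).items():
        print(path, tags)
```

The tags produced this way are exactly the kind of generic starting point the paragraph describes; the learning step would refine them over time.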
Machine learning is what makes auto-classification superior to manual input. Because it can learn from the searches users perform, the enterprise search application can not only provide better tags for all users but also learn the characteristics of individual users, tailoring search results specifically to the end user through personalization functionality (Opentext, 2017).
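To make the idea of learning from user selections more concrete, here is a minimal sketch. It assumes that each search result carries tags and that clicks are logged per user. The ClickModel class and its scoring rule (a base relevance score plus a small boost for every earlier click on a matching tag) are illustrative assumptions, not a description of any vendor's personalization feature.

```python
from collections import Counter, defaultdict


class ClickModel:
    """Tracks which tags each user clicks on and re-ranks results accordingly."""

    def __init__(self, boost=0.1):
        self.boost = boost  # hypothetical weight added per earlier click on a tag
        self.clicks = defaultdict(Counter)  # user -> Counter of clicked tags

    def record_click(self, user, tags):
        """Record that a user selected a result carrying these tags."""
        self.clicks[user].update(tags)

    def rerank(self, user, results):
        """Sort (doc_id, base_score, tags) tuples by personalized score, best first."""
        history = self.clicks.get(user, Counter())

        def personalized(result):
            _, base_score, tags = result
            return base_score + self.boost * sum(history[t] for t in tags)

        return sorted(results, key=personalized, reverse=True)


if __name__ == "__main__":
    model = ClickModel()
    results = [("policy.pdf", 0.50, ["legal"]), ("budget.xlsx", 0.45, ["finance"])]
    model.record_click("ana", ["finance"])
    print([doc for doc, _, _ in model.rerank("ana", results)])
```

A user with no click history simply sees the results in their base order, so the personalization only takes effect once there is data to learn from.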
The benefits of auto-classification are both straightforward and tangible. With auto-classification, a user or admin no longer needs to take time out of the day to input tags and classify content. This in turn boosts productivity and, at the same time, provides a more unified and consistent tagging and classification process. It makes identifying files and data throughout the enterprise network easier, because a file in Tokyo is classified in the same manner as one in New York or Santiago.
Once auto-classification is up and running, the additional metadata it produces allows administrators to implement filtering and faceting functionality on the search results. These filtering and faceting features let users narrow down the search results, making it easier to find the information they seek. In addition, including filters and facets in an enterprise search solution provides a more comprehensive and appealing customer experience by giving the visitor more control over the process.
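The filtering and faceting described above can be sketched in a few lines, assuming each search result is a dictionary with a list of tags produced by auto-classification. The function names and the sample data are hypothetical; production search engines typically offer their own faceting features.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of a metadata field."""
    counts = Counter()
    for doc in results:
        counts.update(doc.get(field, []))
    return dict(counts)


def apply_filter(results, field, value):
    """Keep only the results whose metadata field contains the given value."""
    return [doc for doc in results if value in doc.get(field, [])]


if __name__ == "__main__":
    hits = [
        {"id": "a", "tags": ["finance", "2017"]},
        {"id": "b", "tags": ["finance"]},
        {"id": "c"},  # not yet classified, so it has no tags
    ]
    print(facet_counts(hits, "tags"))
    print(apply_filter(hits, "tags", "2017"))
```

The facet counts are what a search page would show beside the results, and choosing one of them applies the corresponding filter.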
Auto-classification can also slash enterprise search time by learning how a user performs searches and which search results that user finds most beneficial. In the same way, using machine learning to track what each individual does in the system, and personalizing the search results specifically for that user, is a major benefit of auto-classification.
Despite all the apparent benefits of auto-classification in enterprise search, there is a potentially devastating downside to the entire process. Machine learning and artificial intelligence remain in their infancy. Everything from Google's search to digital assistants like Amazon's Alexa depends on this technology, learning on the fly and growing smarter by the moment. This sounds great when there is enough data available to make smart decisions, but in the early days after the system is turned on, that may not be the case. The ability to provide meaningful classification may be limited until more data has been indexed and evaluated and the machine learning has had more to work with. Because of this, there is a need for an administration system that allows a person to see and evaluate what the machine learning is suggesting before it actually gets put into "production". Having a person review and approve the classifications that machine learning suggests will likely be a necessary intermediate step until the system is intelligent enough to be trusted to make its own decisions.
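One way to picture the intermediate approval step is a review queue that holds suggested tags until a person approves them. The sketch below is a minimal illustration under that assumption; the Suggestion and ReviewQueue names, the confidence field, and the ordering rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    doc_id: str
    tag: str
    confidence: float
    status: str = "pending"  # becomes "approved" or "rejected" after review


@dataclass
class ReviewQueue:
    suggestions: list = field(default_factory=list)
    production_tags: dict = field(default_factory=dict)  # doc_id -> set of tags

    def suggest(self, doc_id, tag, confidence):
        """Queue a machine-suggested tag without publishing it."""
        suggestion = Suggestion(doc_id, tag, confidence)
        self.suggestions.append(suggestion)
        return suggestion

    def pending(self):
        """Return unreviewed suggestions, least confident first."""
        waiting = [s for s in self.suggestions if s.status == "pending"]
        return sorted(waiting, key=lambda s: s.confidence)

    def approve(self, suggestion):
        """Publish an approved tag to the production metadata."""
        suggestion.status = "approved"
        self.production_tags.setdefault(suggestion.doc_id, set()).add(suggestion.tag)

    def reject(self, suggestion):
        """Mark a suggestion as rejected so it never reaches production."""
        suggestion.status = "rejected"


if __name__ == "__main__":
    queue = ReviewQueue()
    good = queue.suggest("doc1", "finance", 0.9)
    queue.suggest("doc1", "legal", 0.4)
    queue.approve(good)
    print(queue.production_tags, [s.tag for s in queue.pending()])
```

Nothing reaches the production tags unless a reviewer approves it, which is the safeguard the paragraph calls for while the system is still learning.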
Overall, machine learning and auto-classification look like a clear benefit, but an administrator may worry about the point at which a system becomes "too smart." Enterprise search and auto-classification are not likely to be an area where doomsday scenarios occur, but it makes sense to evaluate the risk anyway. Technology professionals are not yet predicting the events of Terminator or other apocalyptic scenarios in which computers take over the world. Elon Musk remains at the forefront of modern technology. From the Tesla lineup of vehicles to building spacecraft and investing in green technology, Mr. Musk has an in-depth understanding of computer technology. This includes artificial intelligence and machine learning. In July 2017, Elon Musk stated that Mark Zuckerberg, the creator and CEO of Facebook, had a very limited understanding of AI and what it could potentially do (Tech Crunch, 2017).
Musk proved to be more right than anyone would have guessed (especially so soon). Less than a week after Musk called out Mark Zuckerberg, Facebook reportedly had to shut down an AI bot system it had put into place because the bots had created a unique language and stopped responding to prompts from Facebook. While the episode did not harm anyone, it shows the potentially devastating impact artificial intelligence can have if it is not properly kept in check (Gadgets, 2017).
There's no denying how beneficial auto-classification is to enterprise search. To provide the greatest benefit, auto-classification must rely on machine learning to tag and classify information as it crawls. Since this classification will be imprecise at first, a logical step is to add an administration process, with an appropriate user experience, that allows a person to monitor, approve, and assist the automated process until the system can be trusted to categorize content correctly on its own. In addition, the same process could be used to control the system later if the AI gets out of hand.
In the long run, however, as the recent Facebook incident has shown, it is vital to establish some sort of protocol to shut down the system, just in case the AI progresses beyond the point of control. While all of this may sound like science fiction, it is quickly becoming a reality. So for any enterprise looking to implement this kind of technology in its enterprise search application, setting up the necessary safeguards to take it offline is a must.
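A shutdown safeguard of this kind can be as simple as a switch that administrators control. The sketch below assumes a hypothetical GuardedClassifier wrapper: while enabled, it returns the automatic tagger's output, and once shut down, it falls back to tags that a person has already approved. The names and the fallback behavior are assumptions chosen for illustration.

```python
class GuardedClassifier:
    """Wraps an automatic tagger behind a switch that administrators control."""

    def __init__(self, auto_tagger, approved_tags):
        self.auto_tagger = auto_tagger      # callable: text -> list of tags
        self.approved_tags = approved_tags  # doc_id -> list of human-approved tags
        self.enabled = True

    def shut_down(self):
        """Take automatic classification offline."""
        self.enabled = False

    def tags_for(self, doc_id, text):
        """Return automatic tags while enabled, otherwise only approved tags."""
        if not self.enabled:
            return list(self.approved_tags.get(doc_id, []))
        return self.auto_tagger(text)


if __name__ == "__main__":
    def tagger(text):
        # Stand-in for a learned model; it tags every document the same way.
        return ["auto"]

    guard = GuardedClassifier(tagger, {"doc1": ["finance"]})
    print(guard.tags_for("doc1", "quarterly budget"))
    guard.shut_down()
    print(guard.tags_for("doc1", "quarterly budget"))
```

Search keeps working on the approved metadata after the switch is thrown, so taking the AI offline does not take search offline with it.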
Implementing any kind of change in an enterprise network often takes a considerable amount of time. That said, manually crawling, entering information for, and tagging every single data file and piece of content within a network is simply not a feasible, or a very smart, use of time or resources. Auto-classification can improve search results and the customer experience at the same time, partly through search personalization, which in turn cuts search time and boosts employee productivity across the board. The benefits of auto-classification greatly outweigh those of manual input, but the organization must still install safeguards and automatic shut-off capabilities to ensure that IT always, now and in the future, has complete control of the network.