A Deep Learning Approach to Industry Classification
Prof. Xiao Fang
Professor of MIS and JPMorgan Chase Senior Fellow
Lerner College of Business & Economics, Institute for Financial Services Analytics
University of Delaware
Industry classification systems (ICSs), which identify economically related firms as peer firms, play a fundamental role in business research and practice. Traditional expert-driven approaches design ICSs manually and thus have limitations, including high maintenance costs and coarse granularity in the firm relatedness they identify. To circumvent these limitations, recent research takes an algorithm-driven approach, using a bag-of-words method to represent firms' 10-K reports and leveraging these representations to identify economically related firms. Although 10-K reports are highly informative for this task, the bag-of-words method is inadequate for representing them: it ignores the rich semantic information encoded in word context and order, which results in a less effective ICS. Recent developments in deep-learning-based document embedding provide powerful tools for document representation. However, existing document embedding models (DEMs) are not well suited to capturing the rich semantics of 10-K reports because of their challenging nature: they are long documents featuring heterogeneous and shifting concepts. We propose a novel DEM that addresses these challenges through an innovative adaptive gating mechanism and its associated gating function. In addition, we develop a new ICS that takes firms' 10-K reports as input, employs the proposed DEM to represent the semantics of these reports, and identifies economically related firms based on the similarities between their 10-K representations. Through extensive empirical evaluations, we demonstrate that the proposed ICS outperforms representative existing ICSs as well as ICSs constructed using state-of-the-art DEMs. This study contributes to business research and practice with a novel ICS that can effectively identify economically related firms. It also contributes to the field of deep-learning-based document embedding with an innovative DEM that can capture the semantics of a broad variety of long documents with shifting concepts, such as 10-K reports, legal documents, and patent documents.
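The last step of the ICS described above, which identifies economically related firms from the similarity of their 10-K representations, can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not model the proposed DEM or its gating mechanism. It assumes each firm's 10-K has already been mapped to a fixed-length vector by some embedding model, uses cosine similarity as the similarity measure, and treats each firm's k most similar firms as its peers. The function name `top_k_peers`, the parameter `k`, and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_peers(embeddings: dict[str, np.ndarray], k: int = 2) -> dict[str, list[str]]:
    """For each firm, return the k other firms whose document vectors are most similar.

    Assumption (hypothetical): `embeddings` maps a firm name to a precomputed
    vector representation of its 10-K report; similarity is cosine similarity.
    """
    names = list(embeddings)
    # Stack vectors into a matrix and L2-normalize each row so that
    # dot products between rows equal cosine similarities.
    mat = np.vstack([embeddings[n] for n in names]).astype(float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    sim = mat @ mat.T
    # A firm should never be listed as its own peer.
    np.fill_diagonal(sim, -np.inf)
    peers = {}
    for i, name in enumerate(names):
        # Sort by descending similarity; stable sort keeps ties deterministic.
        order = np.argsort(-sim[i], kind="stable")[:k]
        peers[name] = [names[j] for j in order]
    return peers


# Toy vectors standing in for document embeddings (hypothetical values).
firms = {
    "BankA": np.array([0.9, 0.1, 0.0]),
    "BankB": np.array([0.8, 0.2, 0.1]),
    "ChipCo": np.array([0.0, 0.9, 0.4]),
    "FabCo": np.array([0.1, 0.8, 0.5]),
}
print(top_k_peers(firms, k=1))
```

Under this sketch, firms whose vectors point in similar directions become peers. The quality of the resulting classification therefore depends almost entirely on how well the embedding captures the semantics of each 10-K, which is the motivation for a better DEM.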