Pattern Recognition and Anomaly Detection in Bookkeeping Data
Professor Pierre Liang
Professor of Accounting
Tepper School of Business
Carnegie Mellon University
We introduce the Minimum Description Length (MDL) principle for pattern recognition and anomaly detection in bookkeeping data. The MDL principle underlies many machine learning applications in practice, especially in unsupervised settings. We report and summarize recently developed MDL-based computational techniques designed specifically for large volumes of transaction-level data. These techniques exploit two features of such data: the account networks (graphs) created by double-entry bookkeeping, and the inherent account classification (assets vs. liabilities, or operating vs. financing accounts). Applied to a full calendar year of journal entry data from four different companies, the techniques prove effective at recognizing patterns in the data and, as a result, at spotting anomalies. The evidence comes from successful case studies and from the recall of injected anomalies, including those created by audit practitioners. Our MDL-based graph-mining solutions highlight the importance of the double-entry bookkeeping system and of an economics-based account classification. In turn, these bookkeeping features are what make MDL-based and graph-mining tools valuable when working with bookkeeping data.
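
To make the MDL intuition concrete, the following is a minimal sketch and not the techniques reported in the talk. It assumes each journal entry can be reduced to a single (debit account, credit account) pair, treats the empirical frequency of each pair as the model, and scores an entry by its Shannon code length in bits, ignoring the cost of describing the model itself. The account names, the counts, and the helper `code_lengths` are all hypothetical. Frequent pairs compress well (short codes); rare pairs need long codes and are flagged as candidate anomalies.

```python
import math
from collections import Counter


def code_lengths(entries):
    """Return the Shannon code length, in bits, of each distinct
    (debit, credit) pair under the empirical frequency model.

    Simplifying assumption: the cost of encoding the model is ignored,
    so this is only the data-given-model part of an MDL score.
    """
    counts = Counter(entries)
    total = len(entries)
    return {pair: -math.log2(n / total) for pair, n in counts.items()}


# Hypothetical journal entries, each reduced to (debit account, credit account).
entries = (
    [("Accounts Receivable", "Sales Revenue")] * 12  # routine credit sales
    + [("Cash", "Accounts Receivable")] * 3          # customer collections
    + [("Sales Revenue", "Cash")] * 1                # unusual pairing
)

bits = code_lengths(entries)

# Rank pairs from most to least costly to describe; the top of the list
# holds the entries that fit the learned pattern worst.
ranked = sorted(bits.items(), key=lambda item: item[1], reverse=True)

for (debit, credit), length in ranked:
    print(f"Dr {debit:<20} Cr {credit:<20} {length:5.2f} bits")
```

In this toy data set the single "Dr Sales Revenue, Cr Cash" entry costs 4 bits to encode, against roughly 0.42 bits for a routine credit sale, so it ranks first. A fuller MDL treatment would also charge for the model and would work on the account graph rather than on isolated pairs.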