| Article ID: | iaor20172809 |
| Volume: | 68 |
| Issue: | 4 |
| Start Page Number: | 777 |
| End Page Number: | 798 |
| Publication Date: | Aug 2017 |
| Journal: | Journal of Global Optimization |
| Authors: | Kuang Da, Du Rundong, Drake Barry |
| Keywords: | data mining, matrices, heuristics |
The importance of unsupervised clustering and topic modeling is well recognized, given the ever‐increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide‐and‐conquer strategy, called DC‐NMF. Given an input matrix whose columns represent data items, we build a binary tree structure of the data items using a recently proposed efficient algorithm for computing rank‐2 NMF, and then gather information from the tree to initialize the rank‐