Canonical sectors and evolution of US stocks: an application of machine learning in Python

01:30 PM - 02:00 PM on August 16, 2014, Room 705


An unsupervised machine learning algorithm that exploits the underlying structure of historical stock market returns shows promising classification results, with implications for macroeconomic analysis and for the construction of financial indices.


Classifying companies into sectors of the economy is important for macroeconomic analysis and for investment in sector-specific financial indices and exchange-traded funds (ETFs). Major industrial classification systems and financial indices are developed essentially by hand, relying on expert opinion and stock-picking. Here we show how a broad-level sector decomposition of stocks can be made more objectively and comprehensively via unsupervised machine learning. A low-dimensional structure that emerges in the space of historical stock-price returns makes it possible to identify “canonical sectors” in the market automatically and to assign every stock a participation weight in each sector. Furthermore, by analyzing data from different time periods, we show how the sector decomposition of listed firms has evolved.
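
To make the idea of participation weights concrete, here is a minimal sketch on synthetic data. The abstract does not name the algorithm used in the talk, so this is a swapped-in stand-in and not the speakers' method: it uses a truncated SVD to expose the low-dimensional structure, then a successive-projection search for extreme "vertex" stocks. All names (`find_vertices`, `sector_returns`, `true_weights`), the sizes, the noise levels, and the assumption that a few single-sector stocks exist are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks, n_sectors = 500, 60, 3  # illustrative sizes, not from the talk

# Synthetic (made-up) daily return series, one row per hypothetical sector.
sector_returns = rng.normal(0.0, 0.01, size=(n_sectors, n_days))

# True participation weights: each row is non-negative and sums to 1.
# Assumption: the first n_sectors stocks are "pure" single-sector firms;
# the remaining stocks are mixtures whose largest weight is capped below 1.
mixed = 0.8 * rng.dirichlet(np.ones(n_sectors), size=n_stocks - n_sectors)
mixed += 0.2 / n_sectors
true_weights = np.vstack([np.eye(n_sectors), mixed])

# Each stock's returns are a weighted mix of sector returns plus small noise.
noise = rng.normal(0.0, 0.001, size=(n_stocks, n_days))
returns = true_weights @ sector_returns + noise

# Low-dimensional structure: only n_sectors singular values stand out.
U, S, Vt = np.linalg.svd(returns, full_matrices=False)
coords = U[:, :n_sectors] * S[:n_sectors]  # stocks in the reduced space


def find_vertices(points, k):
    """Successive projection: pick the k most extreme rows of `points`."""
    residual = points.copy()
    picked = []
    for _ in range(k):
        j = int(np.argmax((residual ** 2).sum(axis=1)))
        picked.append(j)
        direction = residual[j] / np.linalg.norm(residual[j])
        # Remove the chosen direction so the next pick is a different vertex.
        residual = residual - np.outer(residual @ direction, direction)
    return picked


vertex_idx = find_vertices(coords, n_sectors)

# Express every stock as a combination of the vertex stocks, then clip and
# renormalize. This is a crude substitute for a properly constrained fit.
raw = np.linalg.lstsq(coords[vertex_idx].T, coords.T, rcond=None)[0].T
weights = np.clip(raw, 0.0, None)
weights /= weights.sum(axis=1, keepdims=True)

print("leading singular values:", np.round(S[: n_sectors + 2], 3))
print("vertex stocks:", vertex_idx)
print("weights of one mixed stock:", np.round(weights[10], 2))
```

Each row of `weights` plays the role of a stock's participation in the recovered sectors. Repeating the same fit on returns from different time windows would give one way to track how a firm's weights change, though real market data would need more care with noise, market-wide trends, and the absence of truly pure stocks.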