Large language models (LLMs) such as ChatGPT and Google Gemini are trained on large data sets to generate informative responses to prompts. Two accounting faculty members at Costello, Cao and Chen, one an assistant professor and the other an associate professor and area chair, are actively exploring how individual investors can use LLMs to glean market insights from the dizzying array of available data about companies.
Their new paper, co-authored with Jennifer Wu Tucker of the University of Florida and Chi Wan of the University of Massachusetts Boston, examines AI's ability to identify "peer firms," or product market competitors in an industry.
Cao explains the significance of selecting peers by relating this process to the real-estate market. "The capital market is similar to the real-estate market in that a firm's value is partially determined by the value of its peers. In the real-estate market, we price a home based on the value of comparable properties in the neighborhood, or the so-called 'comps.' In our paper, we aim to leverage the power of LLMs to identify comps for evaluating firm value."
This task is at least as difficult as it is essential. Selecting peers takes considerable time, skill and effort to gather, aggregate and manage data. However, the researchers reasoned that LLMs could do much of the heavy lifting of data aggregation and analysis for individual investors, and produce a list of peers comparable in validity to one identified by human experts.
"The advantage is in the capability to utilize all the information potentially out there so that it is at least performing as well as other traditional methods that can help us investors and researchers," says Cao.
For the study, Chen and Cao employed Bard from Google, now known as "Gemini," as their LLM of choice because "Bard has a greater ability to utilize its pre-training data, which is arguably larger than ChatGPT's and with more parameters," says Cao.
After defining "product market competition" and forming a prompt for Bard, the researchers instructed Bard to limit its knowledge pool to a specific year within the period 1981-2023, in order to avoid "look-ahead bias," i.e., information from later years contaminating the results.
They limited focal firms to large, publicly listed companies because less data is available for smaller or private firms. In all, the data set comprised over 300,000 focal firm-years.
On average, the LLM could generate about seven peer firms for a focal firm, a number that is similar to the SEC recommendations on how firms should disclose their segments.
The researchers then compared the LLM's performance to the lists generated by three human experts for a set of 40 leading computer software companies. The average overlap was a little over 40 percent, greater than expected.
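To make the overlap comparison concrete, here is a minimal Python sketch of one way such a figure could be computed. It assumes overlap means the share of an expert's peers that the LLM also named, averaged across focal firms; the article does not spell out the exact definition the researchers used, and the firm and peer names below are made-up placeholders, not data from the study.

```python
def overlap_share(llm_peers, expert_peers):
    """Share of the expert's peers that the LLM also named (0.0 to 1.0).

    Assumption: this is one plausible overlap definition, not necessarily
    the one used in the paper.
    """
    expert = set(expert_peers)
    if not expert:
        return 0.0
    return len(expert & set(llm_peers)) / len(expert)


def average_overlap(llm_lists, expert_lists):
    """Average the per-firm overlap over firms present in both inputs."""
    shares = [
        overlap_share(llm_lists[firm], expert_lists[firm])
        for firm in expert_lists
        if firm in llm_lists
    ]
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical placeholder lists, for illustration only.
llm_lists = {
    "FirmA": ["P1", "P2", "P3", "P4"],
    "FirmB": ["P5", "P6"],
}
expert_lists = {
    "FirmA": ["P1", "P2", "P8", "P9"],
    "FirmB": ["P5", "P7"],
}

print(f"Average overlap: {average_overlap(llm_lists, expert_lists):.0%}")
```

With three experts, the same function could be run once per expert and the results averaged, or the experts' lists could be pooled first; which approach matches the study is not stated in the article.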
They also compared the AI-identified peer lists to two alternative systems for identifying peers: the federal government's Standard Industrial Classification (SIC) codes and Text-based Network Industry Classification (TNIC), which compares firms based on linguistic similarities in their 10-K filings. The LLM's output overlapped significantly with TNIC's. Moreover, the peers identified by the LLM were generally a better fit than those from SIC and TNIC, as their monthly stock returns hewed closer to the focal firm's.
But TNIC outperformed the LLM in identifying peers for mid-sized firms within the sample, indicating that the LLM is not universally superior.
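The return-based comparison of peer lists can be illustrated with a short Python sketch. It assumes that "hewing closer" is measured as the correlation between the focal firm's monthly returns and the equal-weighted average return of a peer group; the article does not describe the researchers' actual test, and all return figures below are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length return series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def peer_average(returns_by_peer):
    """Equal-weighted average return across peers, month by month."""
    series = list(returns_by_peer.values())
    return [sum(month) / len(month) for month in zip(*series)]


def comovement(focal_returns, returns_by_peer):
    """Assumed fit measure: correlation of focal and average peer returns."""
    return pearson(focal_returns, peer_average(returns_by_peer))


# Invented monthly returns, for illustration only.
focal = [0.02, -0.01, 0.03, 0.00]
peers_method_1 = {"P1": [0.02, -0.01, 0.03, 0.00], "P2": [0.04, -0.02, 0.06, 0.00]}
peers_method_2 = {"Q1": [0.01, 0.02, -0.01, 0.01]}

for label, peers in [("method 1", peers_method_1), ("method 2", peers_method_2)]:
    print(label, round(comovement(focal, peers), 3))
```

Under this assumed measure, the peer list with the higher correlation would count as the better fit. Running the comparison separately by firm size would be one way to surface a pattern like the mid-sized-firm result described above.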
"We need to understand that LLMs are actually a very powerful, new tool, unmatched in their efficiency, ability to process vast amounts of information at a low cost, and accessibility to the general public," Cao notes.
"It's especially beneficial for individual investors, as all the cost concerns that we're talking about are especially relevant for them," Chen adds.
Regarding the future of LLMs, Chen states, "There are always costs and benefits associated with using generative AI. It is uncertain whether current systems will soon be obsolete." When asked about the SEC adopting an AI tool for investors, Chen emphasizes that users need to understand the pros and cons of using AI in order to make informed judgments "because AI cannot be held responsible for the information it provides or for how it is utilized."
Chen concludes, "We need to embrace this new technology, but we must recognize that it is not yet in a perfect state. Competition to improve the technology is fierce. Our findings might just represent the lower bound of the effectiveness of the technology."