463 exabytes of data projected to be created daily by 2025


The English language is always evolving, and which words enter our collective vocabulary is usually something of a popularity contest. Catchy acronyms spread like wildfire via social media. Dictionaries offer their choice for word of the year. The rest of us debate whether or not we want to propagate these words or hold the line.

Occasionally, something will come along and force us to update our vocabulary. Data is a prime example. Most people are familiar with gigabytes and terabytes, the units used to measure typical hard drives and memory cards.

Larger units are less familiar, but with almost 100 million photos and videos posted on Instagram every day, we have begun to talk about petabytes and beyond. According to the Center for Advanced Computing Research at the California Institute of Technology (Caltech), a mere two petabytes equals the contents of all academic research libraries in the United States.1 Statista estimated that wearable devices alone were expected to generate 28 petabytes by year-end 2020.2

Caltech also estimates that five exabytes account for all of the words ever spoken by humans, yet 463 exabytes of data are projected to be created every day by 2025.3 All accumulated digital data had already totaled an astounding 4.4 zettabytes in 2013. This is estimated to grow tenfold by year-end 2020. It won’t be long before we are talking with some regularity about yottabytes, representing numbers comparable to the total number of grains of sand on earth or stars in the entire universe.

As data grows, so does the nomenclature around sizes of data, from bytes to yottabytes.
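To put these units side by side, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration that assumes decimal (SI) prefixes, where each unit is 1,000 times the previous one. The helper name `convert` is hypothetical, and the only figures used are the ones quoted above.

```python
# Decimal (SI) byte prefixes; each step is 1,000 times the previous one.
# Assumption: the article's figures use decimal rather than binary units.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
UNITS = {name + "byte": 1000 ** (i + 1) for i, name in enumerate(PREFIXES)}


def convert(value, from_unit, to_unit):
    """Convert a quantity between two named byte units (hypothetical helper)."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# Figures quoted in the text.
daily_exabytes = 463          # projected daily data creation by 2025
total_2013_zettabytes = 4.4   # all accumulated digital data in 2013

# Express the daily figure in zettabytes, then compare it with the 2013 total.
daily_zettabytes = convert(daily_exabytes, "exabyte", "zettabyte")
days_to_match_2013 = total_2013_zettabytes / daily_zettabytes

print(f"{daily_exabytes} exabytes per day = {daily_zettabytes:.3f} zettabytes per day")
print(f"Days needed to create the 2013 total: {days_to_match_2013:.1f}")
```

On these numbers, the projected 2025 pace would recreate the entire 2013 stock of digital data in roughly nine and a half days.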

Leaving aside the fact that an almost unimaginable amount of data is being created every day, it would take many millions of years just to download all existing data. Given this bounty, it is surprising how little data is actually put to work. IDC estimated in 2012 that only 0.5% of all data was being analyzed, and there is little indication that things have changed significantly since then.4

This startlingly small number underscores the size of the opportunity for data-savvy companies. It also starkly highlights the problems inherent in trying to derive useful insights from the vast oceans of data being created. The challenge becomes even more daunting when one considers the fact that growing data velocity means it is increasingly critical to use data before it becomes obsolete. Privacy concerns also continue to mount, causing policies to be re-examined, cybercriminals to sharpen their attacks, and regulators to expand their remits.

As the global data ecosystem evolves and becomes more complex, there is a veritable flood of new players offering to help companies and individuals navigate an often confusing environment rife with opportunities and challenges. In the middle sits Google. With a corporate mission “to organize the world’s information and make it universally accessible and useful,” it only made sense to make Google the namesake of our initial investigation into the data and analytics revolution when first writing The Upside of Disruption in 2016.

This highlights the problems inherent in trying to derive useful insights from the vast oceans of data being created.

Given the rate of change in the technology sector, it is noteworthy that Google is still our first choice for this role, four years after our initial look at the “Googlization” of financial services. More than just a placeholder for the idea of big data, Google plays the role of a reliable means of deriving utilitarian knowledge from data. It is emblematic of data abundance and our strides in using that data effectively.

Google’s steady presence is especially important because the universe of data in which we operate is morphing as much as it is growing. For example, there has not been an appreciable uptick in daily Twitter activity since The Upside of Disruption was first published. Five hundred million tweets per day remains an awe-inspiring number, but the lack of growth serves as a powerful reminder that data sources are not constant. Tastes change, apps proliferate and organizations come and go as bandwidth continues to grow and spread.


The world is awash in data, but the fact that so little of it is analyzed means much of the data ecosystem remains terra incognita. Just as incomplete maps enticed intrepid explorers over the centuries, this is proving to be an irresistible lure to entrepreneurs.

How to equip businesses with the tools to derive tangible benefits from data has been one area of focus. Advanced analytics now go far beyond the simple metrics tracked by many businesses in the past. In fact, massive data volume and velocity mean many analytics would not even be possible were they not automated. Even now, analysts the world over are toiling over spreadsheets as they scrub static data sets and attempt to make sense of information that may already be obsolete. Overhauling this model is a priority, and a growing number of firms are addressing the problem with automated analysis of streaming data.
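The contrast between scrubbing a static data set and analyzing a stream can be made concrete with a small sketch. The example below assumes a hypothetical feed of numeric readings and uses Welford's online algorithm, one common technique for updating a mean and variance as each value arrives without storing the full history. The class name `RunningStats` and the sample values are illustrative only.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far (0.0 if fewer than two values)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical stream of readings; in practice these would arrive over time.
stats = RunningStats()
for reading in [10, 12, 11, 13, 14]:
    stats.update(reading)
    print(f"n={stats.count} mean={stats.mean:.2f} variance={stats.variance:.2f}")
```

Because each update touches only three numbers, the summary is always current and the raw history never needs to be reloaded, which is the practical appeal of streaming analysis over periodic spreadsheet refreshes.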

Automated analysis will be enabled by machine learning. Many organizations are currently in the midst of trying to transition their analytics from descriptive (rear-facing) to predictive (forward-looking). Predictive insights enabled by machine learning could prove to be transformative for many firms, leading to better decisions across a range of functions. Financial firms are no different. Endor, a pioneer in predictive analytics platforms, says, “Deep intelligent insights are required to examine the behavioral attributes that drive consumer decisions. This is where predictive analytics come into play.”5 Furthermore, machine learning produces a positive feedback loop. Artificial intelligence relies on massive amounts of data to learn. Positioning these systems in the middle of massive data streams means the machines get smarter and produce better results.
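The difference between descriptive and predictive analytics can be illustrated with a deliberately simple sketch. It assumes a hypothetical series of monthly figures and uses an ordinary least-squares trend line as a stand-in for the far richer machine learning models described above. The function names and data are invented for illustration.

```python
from statistics import mean


def describe(history):
    """Descriptive (rear-facing): summarize what has already happened."""
    return {"mean": mean(history), "latest": history[-1]}


def fit_trend(history):
    """Fit a least-squares line through the points (0, y0), (1, y1), ..."""
    xs = range(len(history))
    x_bar = mean(xs)
    y_bar = mean(history)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_next(history):
    """Predictive (forward-looking): extrapolate the trend one period ahead."""
    slope, intercept = fit_trend(history)
    return intercept + slope * len(history)


# Hypothetical monthly figures, for illustration only.
monthly_flows = [100, 110, 120, 130]

print("Descriptive:", describe(monthly_flows))
print("Predictive: next period estimate =", predict_next(monthly_flows))
```

A production model would draw on many more variables and be validated against held-out data, but the shift in question is the same: from reporting what was observed to estimating what comes next.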

This trend may be discomfiting for some, who may feel as if they are losing control of their data. Despite valid privacy concerns, the concept of data ownership may ultimately prove illusory. Much as consumers prefer to stream media rather than own it, we are witnessing a growth in the use of Data as a Service (DaaS) models, developed to serve firms struggling to monetize data streams in relatively short time frames. Currently removing barriers within organizations, this approach will increasingly dismantle barriers between organizations as well.

Contributing to this development is the growing interest in data exhaust. Most data initiatives traditionally focused on data acquired externally and brought in-house to mine for useful insights. That is changing amid a growing awareness that there is value to be derived from the trail of information left behind by the digital activities of individuals and organizations, commonly known as data exhaust. As more powerful tools are adopted to organize and mine data effectively, data exhaust is increasingly viewed as a potential gold mine hidden in plain sight.

"Data Exhaust" — The trail of information left behind by the digital activities of individuals and organizations

For companies working to personalize their client interactions or tailor their messaging to prospects, data exhaust can be extremely helpful. It not only has the potential to reveal previously obscured patterns, but can simultaneously reduce the dependency on more intrusive and time-consuming forms of market research such as surveys. As real but otherwise old and unnecessary data, exhaust can also be used as a test bed for simulations used in formulating and refining risk management processes.6 More companies should also finally be able to economically capitalize on the promise of customization, integrating data mining into key aspects of their operations so they can tailor client experiences based on past interactions.

There are also ethical considerations. Profiting from data exhaust can easily be seen as an invasion of privacy, no matter how altruistic the intent or helpful the result. Data privacy issues exploded into the public consciousness over the past several years, fueled by concerns around fake news and election tampering. Widespread concern has sparked some regulatory action, especially in the U.K. and European Union, which are placing a premium on the protection of personal data. China’s social credit system, on the other hand, relies on data to establish reputational scores for its citizens, ostensibly to encourage trust. In most markets, including the United States, data is regularly collected and resold with little or no input from those affected. This may be changing as more companies explore consent-driven monetization of data, but even this approach has critics who maintain that it violates fundamental rights.7

Privacy concerns aside, it is clear that data and analytics will continue to be viewed as key sources of competitive advantage for the foreseeable future. Big data is already creating many new jobs, not only for analysts and scientists but also for data management experts and architects. New types of talent are being recruited by investment firms, causing them to source personnel from new places and their company cultures to evolve.

In 2018 almost two out of three organizations had appointed a CDO.

Organizational charts are changing from top to bottom. The need for effective data management means the Chief Data Officer (CDO) “will become indispensable for taking leadership of strategy development along with decisions on the client’s digital assets and its ethical use.” In a 2018 survey of 60 Fortune 1000 companies, almost two out of three said their organization had appointed a CDO, up dramatically from 12% in 2012. Almost all firms surveyed (97%) had invested in big data and artificial intelligence initiatives. Most investments total less than $50 million, but one out of eight firms is spending half a billion dollars or more.8

Rather than develop algorithms from the ground up, more organizations are saving big by purchasing algorithms.


Firms with deep pockets will be able to fund internal development of teams and AIs to leverage these trends. Other organizations will need to seek expertise externally, but this does not necessarily relegate them to the sidelines. Outsourced data access, expertise, and analytics mean even firms with modest budgets can benefit as long as they have the vision. The ecosystem of data specialist firms is already exploding. “In the past, businesses spent time, money and resources to develop algorithms from the ground up so that they fulfill every intended purpose… The algorithm market is flourishing at an astounding pace as more organizations are becoming aware of how to save big by purchasing algorithms.”9


Modern finance was built on spreadsheets. It is not an exaggeration to say that Excel is one of the asset management industry’s cornerstones. Replacing such a fundamental piece would have been unimaginable only a few years ago, but it now appears inevitable. The volume and variety of data involved in so many workflows mean we have reached a tipping point where the stakes are too high and the benefits too great to ignore.

Resistance may ultimately be futile, but adoption within asset management has been uneven to date. Much of the industry is deeply conservative, reluctant to adopt new technologies that could jeopardize processes and relationships that are deeply ingrained and have long been successful. This viewpoint is not universal, of course. Driven to differentiate and freer to do so, hedge funds long ago took the lead in trying to derive insights from reams of data gathered from sometimes unorthodox places. Traditional asset managers are increasingly following their lead, cornered by a combination of low yields, margin pressure, the growth of indexing and competition from private markets.

Nor is this necessarily happening at the expense of existing employees. In fact, McKinsey & Company states that “the application of advanced analytics to specific business problems has started to deliver value for traditional asset managers—not by replacing humans but by enabling them to make better decisions quickly and consistently.”10

Asset managers are in a unique position at the nexus of multiple and massive data sets.

The ability to make more informed decisions could not come at a better time. Unprecedented demand for transparency, access, and customization is causing managers to investigate how they can best leverage data to optimize their client experience. An evolving product landscape is also driving change. “Differentiation has never been more important. With unprecedented numbers of managers competing for assets, value propositions are also being scrutinized as never before. Ironically, it has also never been so possible. Large swathes of the investment universe are becoming commoditized, but we also see growing interest in niche asset classes and illiquid ones including private debt and real assets. Overhauling data infrastructure and processes becomes essential when managers must be able to effortlessly plug into this increasingly complex and demanding ecosystem.”11

We noted in our 2018 paper, Digitizing the Client Experience, that “asset managers are in a unique position at the nexus of multiple (and massive) data sets. Client data is a key part of the equation, but it is joined by market, economic and competitive data… The data has always been available in one form or another, but integration has only become truly possible recently with the advent of platforms, aggregators and APIs. Data that has always been compartmentalized is now free, enabling everyone to serve clients more directly and efficiently.”12

This means that “transactions that once took two weeks might take two minutes. Communications are customized, targeted, on-demand and pain-free. Information is freely available. Service levels once available to only the wealthiest investors are increasingly accessible to those of more modest means. Virtually every touch point a manager has with its clients is positively affected.”13 The impact stretches far beyond research and trading. McKinsey & Co. points out that advanced analytics can improve business performance across the value chain, not only improving investment decisions, but improving distribution efforts and operational processes as well.14


Investing has always been data intensive, with analysts and portfolio managers sometimes going to extraordinary lengths to collect elusive data that might generate alpha. Data science is now equipping investment professionals with powerful new tools. This is widely seen as a positive development, but it also places the burden on those same professionals to acquire a “solid understanding of data science in addition to investing skills.”

Advanced analytics can improve performance across the value chain.

The sheer volume of data available to investors is changing everything. Hedge fund managers have long had an interest in alternative data sources, but the market is now swelling as managers face growing pressure from competitors and investors alike to produce superior returns. As it becomes increasingly difficult to consistently add value with traditional fundamental analysis, a growing number of long-only managers will likely follow in the footsteps of hedge funds in scouring the earth for new sources of alpha. Social media feeds and satellite imagery are just two of the many supersized data sets investment professionals are analyzing in greater numbers to reveal timely and valuable insights.

Alternative data sources have the potential to vastly expand the universe of data from which analysts and portfolio managers attempt to derive value. As more data sources become available, they are joined by an emerging ecosystem of vendors, platforms, brokers, aggregators, portals, consultants and conferences.

This flood of new data would be worthless if it were not accompanied by increasingly sophisticated analytical tools. The ability to better analyze unstructured data, for example, is proving to be a boon for forward-thinking managers, who can record all relevant conversations and subsequently feed them into a research platform enabled with natural language processing. Systemic bias can also be reduced, with the caveat that even AIs can exhibit the bias of their programmers.

Automation also plays a major role, enabling faster trading despite the vastly larger number of variables. Companies are also able to take a more holistic view of risk management: “Prescriptive analytics focuses on what the future holds and prescribes the course of action possible against risks... An integral part of prescriptive analytics will include simulation, optimization, decision analysis and game theory.”15

Software cannot currently do all of the things asked of investment professionals, and it may never be able to. Nevertheless, it will fundamentally change the economics of investing as roles and responsibilities adapt and sources of value shift within organizational charts. One industry observer noted that “future investment professionals will derive their alpha from analyzing and predicting the impact of human behavior as an overlay to an already-established array of highly analytical, flexible, AI-based investment frameworks.”16

As more data sources become available, they are joined by an emerging ecosystem of vendors, platforms, brokers, aggregators, portals, consultants and conferences.

Nevertheless, more black box approaches are inevitable. How common they become is anybody’s guess at this point, but some firms are already allowing AIs to manage portfolios, in some cases making trades that are not entirely understood.17 “In this world, investment professionals will act more as guardians of investor interests, defining investment goals, optimizing decision-making algorithms and training AI to do most of the analytical heavy lifting.”18 This paradigm will have lasting consequences, as feedback loops seem inevitable with AIs perpetually searching for new sources of alpha.


Distribution is a key pain point for many firms, particularly those facing competition from index products. Traditional wholesaling teams are not finding as much traction as they did in the past, causing many firms to reinvent their entire approach to distribution and/or equip their distribution professionals with upgraded data tools. Client service and relationship management efforts are also being revamped with an emphasis on cross-selling and retention.

Most data initiatives focused on sales and marketing are still in their infancy. A study of 37 fund firms by Ignites found that fewer than one in four made any effort to calculate a return on investment (ROI) on data analytics in distribution.19 Nevertheless, the use of AI and predictive analytics shows great promise in this area.

A growing datasphere across the asset management firm

Datasphere showing research, distribution, investing and operations


Client engagement will almost certainly be transformed in coming years as managers rely on data to be more proactive and personalized in their approach. Clients currently segmented by relatively crude metrics like size or location will be increasingly viewed through a behavioral lens. Sales targeting will become more precise. Redemption risks will be highlighted earlier, as will cross-selling opportunities. Customized outreach will be possible in a way that was unthinkable until recently. Advanced analytics will also enable firms to better manage their sales process and distribution talent.


Driven by economic pressures to become more efficient, many firms are looking for ways to automate and improve their middle- and back-office processes. It is widely understood that better data management can lead to lower costs, but it can also improve productivity and enhance the client experience. Portfolio analytics, reporting, risk management and cybersecurity are among the functions that stand to benefit from greater data integration and the use of tools such as natural language processing.

Regulatory compliance is a prime example of a function that benefits from improved data management. Data that is captured cleanly and efficiently simplifies compliance and audits. It should be possible to pre-populate any form with previously captured information, much like one-click checkouts when shopping online. Well-designed workflow should also have the added benefit of consolidating software needs.


Data tools are often considered in tactical terms, but they are also having a strategic impact. Senior executives are increasingly able to make decisions in the context of real-time market intelligence. Competitive dynamics can be assessed more carefully in the context of client trends. Board reporting is simplified. Perhaps most importantly in such a people-centric business, talent management is being transformed. The opportunities are rich and varied: identifying trends, spurring growth, identifying sources of efficiency, improving client retention, gaining market share and providing a better customer experience.


Vendors offering data solutions to asset managers generally fall into one of two broad categories: data platforms and analytics providers. There is overlap, but both types of firms are focused on helping investment teams make better decisions through a more effective use of data. However, a third group has also emerged that is oriented toward business decisions and operational optimization. These firms can serve multiple industries and are sometimes designed to address very specific use cases. The following list is a sampling of all three types of firms and should be considered illustrative rather than comprehensive.


The number of data purveyors has exploded in recent years to now include scrappy startups alongside established industry titans. A good example of the latter is Dun & Bradstreet, which has been busy reinventing itself as a 21st-century data firm serving up trustworthy information and metrics via the cloud.

A company like D&B already handles hundreds of millions of data points, but many investors are interested in looking beyond this universe of largely traditional data. For alternative data, investment professionals are turning to platforms such as Quandl (acquired by Nasdaq in 2018), which offers a wide variety of datasets and claims over 400,000 users.20 Even Bloomberg is getting in on the action, with its Enterprise Access Point Datasets now sharing data on metals inventory, equities blogger sentiment, drug approvals, consumer footfall and parking lot activity, construction permits, geopolitical risk and app utilization.

Venture-backed Thinknum relies on approximately 35 datasets to provide investors with daily updates on “most public and private companies.” BattleFin focuses on sourcing, organizing, evaluating and vetting alternative data. Far from being an arm’s-length consultant, the company holds popular conferences, offers an accelerator to qualified data firms and sponsors competitions aimed at highlighting data science skills. Demyst has a similar value proposition, offering users in financial services a fast and safe way to discover, evaluate and use data from what they claim is the world’s largest data marketplace, with access to hundreds of sources. Meta-markets are also emerging. It was announced in September 2019 that Demyst would be joining the Snowflake data exchange, an already massive market maker for data across industries.21


Claiming to reveal otherwise hidden relationships across a vast universe of structured and unstructured data sources, Yewno is an example of a firm bridging the gap between data and analytics. Unlike more generalist tools, it is marketed specifically as a way to identify investment opportunities. Dataminr focuses on speed, analyzing public data about events in real time to provide clients with rapid and actionable intelligence. Industry stalwart FactSet not only offers data spanning the continuum from fundamental to alternative, but also provides an array of analytical solutions to its loyal clients across the financial sector.


Making more informed business decisions requires better data management. Startups offer countless innovative ways to capture and process certain types of data, but it can take more established firms such as Accenture to effectively work around legacy cultures and systems to implement truly transformative data processing at an enterprise scale. Some newer firms like Qlik Technologies have also managed to grow by focusing on data management across multiple industry verticals: Qlik’s mission is to simplify the way people use data by making it a natural part of how they make decisions.

Smaller firms can be effective, particularly if they focus on adding value to particular areas. Quantexa is an analytics firm that works with financial companies toward specific objectives, namely leveraging big data to improve security and gather customer insights. Versive is even more focused, positioning itself as the leader in adversary detection.

Aided by AI, specialist data firms offer increasingly esoteric services. Automated Insights, for example, aids companies in developing content. Its Wordsmith product is claimed to be “the world’s first public natural language generation template engine,” allowing users to generate human-sounding content from data, making it easy to produce countless customized narratives in the time it would have taken to produce just one by traditional means.

Specialized approaches are more common, but a few firms choose to swing for the fences, imagining a world in which data trumps all. Founded in 2013, PeerNova attempts to solve some of “the most prevalent challenges in the financial industry” by enabling financial firms to perpetually synchronize their data across multiple internal and external systems. Its Cuneiform Platform is said to simplify reconciliation, automate exception processing, and provide end-to-end operational visibility across workflows in real time.


It was not so long ago that the only people concerned with big data were astronomers and climatologists. As the variety and volume of data balloons to almost unimaginable sizes, one would now be hard pressed to find a professional in any field who is not at least aware of the fact that they should have a data strategy. The opportunities are vast, and a rapidly expanding ecosystem of experts stands by to help devise and implement effective data initiatives. Understanding that “there is still some uncertainty around the extent and pace with which analytics will impact asset management,” McKinsey & Co takes the view that “superior analytics capabilities will be a key driver of success in the industry going forward.”22

One of the most exciting aspects of advanced data analytics is the fact that insights and benefits are not always predictable. Businesses pour money into data initiatives, but there is an inescapably speculative aspect to any new technology. This means some firms opt to take incremental steps, while others go “all in” with a more transformative approach. Some asset managers will inevitably adopt a wait-and-see attitude, but firms that enthusiastically embrace a data-centric strategy can expect to be rewarded with unanticipated competitive advantages.


Legal Note

The Investment Manager Services division is an internal business unit of SEI Investments Company. This information is provided for education purposes only and is not intended to provide legal or investment advice. SEI does not claim responsibility for the accuracy or reliability of the data provided. Information provided by SEI Global Services, Inc.

1 Roy Williams, Center for Advanced Computing Research at the California Institute of Technology.
2 Statista, April 2019.
3 International Data Corporation (IDC), April 2019.
4 IDC’s Digital Universe Study from 2012.
5 Endor Protocol, “How Predictive Analytics Will Transform the Finance Industry in 2019,” Medium, January 31, 2019.
6 Ryan Ayers, “How data exhaust can be leveraged to benefit your company,” Dataconomy, March 28, 2018.
7 Isabel Rubio, “What price would you put on your political ideology, your religious beliefs or your sexual preferences?”, El País, August 15, 2019.
8 Sixth annual Executives Survey, New Vantage, 2018.
9 Flatworld Solutions, “How will the big data landscape change by 2020?”
10 Sudeep Doshi, Ju-Hon Kwek and Joseph Lai, “Advanced analytics in asset management: Beyond the buzz,” McKinsey & Company, March 2019.
11 SEI, “Digitizing the Investor Experience,” 2018.
12 Ibid.
13 Ibid.
14 Sudeep Doshi, Ju-Hon Kwek and Joseph Lai, “Advanced analytics in asset management: Beyond the buzz,” McKinsey & Company, March 2019.
15 Flatworld Solutions, “How will the big data landscape change by 2020?”
16 Umed Saidov, CFA, “Data Science Will Transform the Investment Industry: Are You Prepared?”, Enterprising Investor, CFA Institute, February 8, 2018.
17 Adam Satariano and Nishant Kumar, “The Massive Hedge Fund Betting on AI,” Bloomberg Markets, October/November 2017.
18 Umed Saidov, CFA, “Data Science Will Transform the Investment Industry: Are You Prepared?”, Enterprising Investor, CFA Institute, February 8, 2018.
19 Ignites Research, CRM Forum.
20 quandl.com
21 Business Wire, “DemystData to Securely Deliver Real-Time Access to Thousands of Premium Datasets on Snowflake Data Exchange,” September 24, 2019.
22 Sudeep Doshi, Ju-Hon Kwek and Joseph Lai, “Advanced analytics in asset management: Beyond the buzz,” McKinsey & Company, March 2019.