Pete Swabey is a senior editor at the Economist Intelligence Unit, specialising in technology. Before joining the EIU, Pete was editor of enterprise IT magazine and website Information Age. He studied psychology at university, and is interested in the convergence of the psychological sciences and information technology.
The pitfalls of data science
Shortcomings in the scientific community's use of statistics are coming to light. Do they apply to business analytics?
Modern businesses are awash with data. In most cases, they have more than they know what to do with. Many hope that sophisticated analytics technology will allow businesses to use this data to pursue a more evidence-driven form of management.
However, as recently highlighted by The Economist, even the scientific community struggles to be entirely empirical. Scientific findings - including those that have been accepted as truth - often prove impossible to replicate, if anyone even bothers to try.
The reasons for this, The Economist reports, include the pressure to publish positive findings, the fact that peer reviewers rarely take the time to re-analyse data, and the sheer complexity of modern statistical methods.
"Some scientists use inappropriate techniques because those are the ones they feel comfortable with; others latch on to new ones without understanding their subtleties," according to the newspaper. "Some just rely on the methods built into their software, even if they don’t understand them."
So if trained scientists struggle to apply statistical techniques appropriately, what hope have businesses of making meaningful sense of ‘big data’?
I asked Duncan Ross, head of data science at database vendor Teradata and a director of both the Society of Data Miners and DataKind, a charity that uses data analytics to help good causes, for his thoughts.
Mr Ross points out that businesses do not typically use the classical scientific method – hypothesis, prediction, testing and analysis – to find the answers they want. "Most business uses [of statistics] are actually data mining – largely non-parametric models where we're interested in outcomes, not theories," he says. "This is because the goal is action, not just understanding."
In other words, businesses aren't looking to build an explicit, universal theory of consumer behaviour; they just want to know how their customers are likely to act in the near future.
Therefore, Mr Ross says, it is not the complexity of statistical methods that businesses typically fall foul of in their analysis. "Where errors occur, there are usually other, more obvious, culprits than a failure to understand statistics: either the problem has been misstated, or the data used isn't relevant or correct."
Peer review by the market
Although apparently flawed, peer review is nevertheless an important mechanism for monitoring the quality of scientific research results. But few businesses will bother to have an external party validate their analyses. Is this a recipe for statistical disaster?
Actually, says Mr Ross, businesses face an even more direct form of feedback than peer review: the market. "If you take actions based on flawed models, you lose money," he says. "This provides a very strong incentive to get things right."
And what about the apparent bias of many scientific journals towards positive results? Is there not a danger that the same thing might happen in businesses, with analysts only highlighting the findings they know executives want to hear?
There certainly is, Mr Ross says. That said, he believes that the likes of Bad Science author Ben Goldacre and celebrity statistician Nate Silver have raised the general awareness of and respect for statistical validity, even in the boardroom. "There is a growing realisation that when your analysis shows that what you are doing isn't best for the company, then you should stop doing it."
In all, Mr Ross accepts that there is room for improvement in the governance and oversight of data analytics in business, which is one reason why the Society of Data Miners was set up.
"But the question is, what is the alternative? Any business that thinks they'd be better off abandoning statistics in favour of gut feel is welcome to do so, but they would be leaving the field open for the company that decides to use data mining.
"I'm more than happy for them to try, but with a few conditions: firstly that the data is recorded so we can test (statistically) if it works, and secondly that they let me know in advance, so I can sell any stock I hold."