President's Column

Let's talk about stats' identity crisis

Warren Brettenny

31 August 2022

Photo by Angela Roma

In my position as SASA President, as well as head of a statistics department, I have been fortunate to engage with colleagues, peers and people from a wide range of backgrounds and professional settings. In these interactions, I am increasingly confronted with an apparent lack of understanding of what statistics is and, more importantly, where it fits in. I am troubled to note that statistics - for lack of a more original phrase - may be in the midst of an identity crisis.

In days gone by, owing largely to the dearth of data and computational resources, statisticians were mostly found in academic institutions, where the focus was on the development and expansion of theory, with some consultation work on the side. Statistics practitioners were employed by some progressive businesses that saw the value in data. Even when I came through the university system around the turn of the century, statistics was largely seen (at least from a student’s perspective) as a stepping stone to a supposedly more lucrative career as an actuary. In the interim, and to grow the understanding of statistics, statistics topics were added to the school curriculum. While the intention was good, the execution may have isolated and side-lined statistics as a profession even more than it already was. In this time, my experience was that the attitude of school leavers towards statistics went from “What is statistics?” to “I hate statistics.” It was also at this time, after the dot-com bubble of the 1990s and at the advent of social media platforms in the mid-2000s, that data collection began to proliferate in the online space. Increased data collection necessarily called for two main skill sets - the ability to store and manage large databases and the ability to make sense of the data collected. Enter the data scientist. The need for more data-focused graduates grew while the attitude towards the field diminished. While this has largely been remedied by the global push towards data science as an appealing career, the role of statistics within this field is often debated.

So while it should be open-and-shut, the argument over where the data scientist fits in has yet to be fully resolved. In some instances it is claimed by computer science, in others by the business sciences, and in others still (perhaps most appropriately) by statistics. A simple Google search of “what is statistics?” (try it … ) gets the rather unambiguous result that statistics is the “science of collecting and analysing numerical data.” The science of data. Hmmm… where have I heard that before?

While I’d like to rest smugly on my revelation and evidence-based argument that statistics is - in and of itself - data science, it is just not that easy. Therein lies our identity crisis. For many years statistics, at least at a university level, has been caught up in the teaching of long theorems and mathematical complexities. Hence the mathematical statistics designation. While this is - without doubt - essential knowledge for any statistician, there remains a growing need for advanced skills in data analytics and applied statistics (data science). The statistics community, therefore, needs to stake its claim in the data science field. While there are indeed roles for computer scientists on the data processing and warehousing side of data science, and for business on the implementation and roll-out side, the core of the process belongs to statistics. This core is also the area most under threat from other disciplines. These days, anyone with an internet connection, R and YouTube is a self-proclaimed data science expert. Such a designation should be reserved for those who are properly trained in statistics and statistical programming. A delegate in an open session that I attended at the JSM 2022 conference in Washington, D.C. earlier this month mentioned a quote, the gist of which was that a data scientist is a “person who is better at statistics than a programmer and better at programming than a statistician”. I argue that this is simply a description of a modern applied statistician, for whom high-level skills in both programming and statistics are mandatory. Applied statistics is a focus area which has often been overlooked in academic settings in favour of its mathematical statistics counterpart. Could this have created a data-science-shaped hole in the statistics offering? A hole that other disciplines are trying to fill? Whatever the cause, there is a need to acknowledge that data science is central to statistics, and vice versa.
But so are analytics and mathematical statistics.

So, as far as an identity goes, statistics does not have one.

It has many - a so-called multiple-identity discipline. However, it is not as neat as mathematics, where there are clearly defined and established pure mathematics and applied mathematics streams. We are now in a position where we must take a leading role in forging the direction of three statistics “identities” - mathematical statistics (theoretical), analytics, and data science. It is with this in mind that SASA intends to provide guidance and structure to these interest areas in the higher education sector through the efforts of our executive committee, special interest groups and members. At present, the 2023 president-elect, Prof Inger Fabris-Rotelli, and I are continuing to meet with various stakeholders to get an understanding of what is offered in our academic settings and which areas require guidance. Once the resulting guidelines have been completed and reviewed by our members, they will be posted on the SASA website as our official stance on these matters and as a point of reference for the broader community on all matters related to our “tri”-identity. This will hopefully serve to unify our community in our stance and guide the way forward in the short and medium term.

Thank you to all of our members for your continued support - we are truly grateful.