“Our Master’s programme is in great demand”
Mathematics is the basis for all the new opportunities opened up by digitalisation. Statistics professor Nicolai Meinshausen discusses the increasing importance of his field and the challenges it faces.
Artificial intelligence and machine learning – these are two of today’s hot topics. Mathematicians have focused on them for a while though, haven’t they?
Nicolai Meinshausen: Fundamentally, yes. But it always depends on how you define the terms. “Machine learning” as such has actually been around for decades. It’s just that today the applications are completely different, because computing power has increased and the volumes of qualitative and quantitative data that we process have reached a whole new dimension.
Where are the potential new applications?
One example is image recognition – an area in which humans are very good and have long outperformed computers. Things that computers were very poor at for a long time – for example, recognising a pedestrian in an image or differentiating a car from a tree – are now possible thanks to greater processing power. The underlying questions, however, have been around for a long time.
As a statistician, what topics are you particularly interested in?
Applications that are not intended to replace people, but rather to open up new fields – such as analysing the large volumes of data produced by biological studies or climate models. This data can be understood only if people and computers work together.
And which aspects do you focus on here?
The link between machine learning and causality. With many questions, it’s about making predictions. What would be the impact on our health, for example, if nitrogen emissions were to change? Will I live longer if I drink more green tea? These are causal questions. Perhaps I discover that people who drink green tea live longer than people who don’t. That does not actually prove any causal connection, as it could be that the general lifestyle that goes with drinking green tea makes me live longer – but that the tea itself has no bearing on my life expectancy.
Because non-smokers tend to drink green tea, for example?
Yes, exactly. Coffee used to have a very bad reputation because we did not allow for the fact that smokers tend to drink more coffee. Today, studies that remove these factors show that coffee tends to have a positive influence on our health.
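The coffee example can be illustrated with a small simulation. The sketch below is not taken from the interview: the rates are invented, and coffee is assumed to have no effect at all on health, so that any apparent effect comes purely from smoking. A naive comparison of coffee drinkers and non-drinkers then makes coffee look harmful, while comparing within smokers and within non-smokers removes most of the gap.

```python
import random

def simulate(n=100_000, seed=0):
    """Generate (smoker, coffee, ill) records under assumed, made-up rates."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Assumption: smokers are more likely to drink coffee.
        coffee = rng.random() < (0.8 if smoker else 0.4)
        # Assumption: illness depends on smoking only; coffee has no effect.
        ill = rng.random() < (0.30 if smoker else 0.10)
        rows.append((smoker, coffee, ill))
    return rows

def illness_rate(rows, coffee, smoker=None):
    """Share of ill people among those with the given coffee (and smoking) status."""
    group = [ill for s, c, ill in rows
             if c == coffee and (smoker is None or s == smoker)]
    return sum(group) / len(group)

rows = simulate()

# Naive comparison: ignores smoking, so coffee appears harmful.
naive_gap = illness_rate(rows, True) - illness_rate(rows, False)

# Stratified comparison: within each smoking group the gap is close to zero.
gap_smokers = illness_rate(rows, True, True) - illness_rate(rows, False, True)
gap_nonsmokers = illness_rate(rows, True, False) - illness_rate(rows, False, False)

print(f"naive gap:          {naive_gap:+.3f}")
print(f"gap among smokers:  {gap_smokers:+.3f}")
print(f"gap, non-smokers:   {gap_nonsmokers:+.3f}")
```

Stratifying works here only because the single confounder, smoking, is known and recorded; with unmeasured confounders the same comparison would still be misleading.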
Do you work with scientists from other disciplines on questions like these?
That’s what’s so exciting here. For example, I’m currently working with physicists on a climate change project, in which we aim to determine to what extent events and changes are caused by human behaviour. In earlier projects, we worked with biologists to investigate the interaction between genetic networks, and with astronomers to research the outer solar system. That’s how I get an insight into many fields of application.
How does the collaboration work in practice? Do the researchers come to you with a specific question and set of data?
It really depends. My own projects tend to develop out of long-standing personal connections. Many people also use our consulting service. Sometimes they have basic questions, other times highly complex ones. This can lead to collaborations, some of which develop over a long period of time.
Who can use the consulting service?
In principle, anyone. The service is free for ETH and UZH members; external people have to pay for it. Sometimes companies and institutions send us requests. For example, FIFA wanted to know how to analyse betting odds to find out whether a match has been fixed.
And who handles requests like these?
We have a fixed team of two people who have just completed their Master’s and two senior scientists. Some requests can be answered very quickly, while others develop into student projects, semester papers or Master’s theses.
How far do statisticians have to dig into a topic to be able to answer a question?
Even if you can provide an initial answer very quickly, it’s often worth digging a little deeper into the topic. In biology in particular, questions often seem straightforward, but they get more and more complex the closer you study them. It’s easier for me in physics as I come from this field.
Today, we collect increasing volumes of data. Does this mean that you also receive more requests for analyses?
We have noticed a trend towards more and more points of reference. But today almost all branches of science use data-based and statistical methods, which has increased knowledge considerably. Many scientists are very capable of analysing their data by themselves.
Which brings us to the subject of education. What courses are on offer in the field?
We offer primarily Master’s level courses, such as lectures on causality or new methods in multivariate statistics. A variety of students attend these lectures – not only mathematicians, but also undergraduate and doctoral students in biology, chemistry and physics, who realise they need statistical expertise for their research. And we’re also part of the new Master in Data Science that started this autumn.
Has statistics become a more popular subject?
Our Master’s programme is very popular, both with mathematics students and with undergraduates from other disciplines, such as biology. We thought that the new Master in Data Science would reduce the number of applicants for the existing Master in Statistics, but the opposite is true – the number of applications has actually increased.
And how well do you think the general public understand statistics? It’s also possible to use statistics to mislead people...
The term “statistics” is sometimes misunderstood to mean the mere collection of data, whereas for us it’s usually about making solid predictions; i.e. answering the question “What would happen if...?”. When we talk about using statistics to mislead people, it’s important to differentiate between intentional and unintentional deception. There are questions that deal with complex relationships, where the data supports not just a single right answer, but many different viewpoints.
For example?
Is the admissions process at universities fair? Are women disadvantaged? Are the processes for credit approvals fair? Are certain demographic groups disadvantaged? These are all sensitive issues. Depending on the viewpoint, it’s possible to support different statements with the same data, since in principle various questions are answered. This complexity is interesting, as I have to find out which question I want to answer. However, often everything is then summarised under one broad headline.
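One well-known way in which the same data can support opposite statements is Simpson's paradox. The sketch below uses entirely hypothetical application counts for two made-up departments; it does not describe any real university. Within each department women are admitted at a higher rate than men, yet the overall rate is lower for women, because in this example more women apply to the department that admits fewer people.

```python
# Hypothetical counts: (applicants, admitted) per department and group.
applications = {
    "Department A": {"men": (800, 480), "women": (200, 130)},
    "Department B": {"men": (200, 40), "women": (800, 200)},
}

def rate(applicants, admitted):
    """Admission rate as a fraction."""
    return admitted / applicants

def overall_rate(group):
    """Admission rate for one group, pooled over all departments."""
    applicants = sum(dept[group][0] for dept in applications.values())
    admitted = sum(dept[group][1] for dept in applications.values())
    return rate(applicants, admitted)

# Question 1: how do the groups fare within each department?
for name, dept in applications.items():
    print(name, {group: round(rate(*dept[group]), 2) for group in dept})

# Question 2: how do the groups fare across the whole university?
print("Overall", {g: round(overall_rate(g), 2) for g in ("men", "women")})
```

Both summaries are arithmetically correct. They answer different questions, which is why it matters to decide which question is being asked before looking at the numbers.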
But there are also people who use statistics for their own ends.
Of course, statistics are also used to defend positions. It’s difficult for somebody without a statistics background to judge the significance of statistical data, particularly in terms of causal relationships. We see examples in the papers every day of questions answered with data from which no meaningful conclusions can actually be drawn.
Because it’s not significant?
Yes, it could be because not enough people were surveyed. Or it could be that the way in which the data was collected has distorted it. So the method leads to an incorrect result, no matter how many people are asked.
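The difference between the two problems can be sketched in a few lines. The simulation below rests on invented assumptions: exactly half of the population would answer "yes", and in the distorted survey people who would answer "yes" are twice as likely to respond. A small but undistorted survey gives a noisy estimate around the true value, whereas a very large distorted survey gives a precise estimate of the wrong value.

```python
import random

def survey(n, response_if_yes, response_if_no, true_share=0.5, seed=0):
    """Collect n responses and return the share of 'yes' answers among them.

    The response probabilities are hypothetical and model who chooses to answer.
    """
    rng = random.Random(seed)
    answers = []
    while len(answers) < n:
        yes = rng.random() < true_share
        responds = rng.random() < (response_if_yes if yes else response_if_no)
        if responds:
            answers.append(yes)
    return sum(answers) / n

# Small survey, everyone equally likely to respond: noisy but centred on 0.5.
unbiased_small = survey(1_000, response_if_yes=0.5, response_if_no=0.5)

# Large survey, 'yes' people respond twice as often: settles near 2/3, not 0.5.
biased_large = survey(100_000, response_if_yes=0.6, response_if_no=0.3)

print(f"small undistorted survey: {unbiased_small:.3f}")
print(f"large distorted survey:   {biased_large:.3f}")
```

Increasing the sample size shrinks the random error only; the shift caused by who responds stays the same however many people are asked.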
Can you give an example?
Many questions relate to health; for example, whether certain foods are good or bad for our health. Or let’s take the question: is it healthier to live in the city or the country? It’s not possible to answer this by simply comparing the health of people in the city and the country. It’s also difficult to gauge the influence of education on subsequent career success, as many different factors are at work here. Or the influence of immigration on the salary levels of the native population. We come across countless examples every day.
So what would be the correct method?
The gold standard is a randomised study, as used in drug trials. But such studies are not always possible: you can’t force people to expose themselves to polluted air or drink more coffee for years on end. We are currently working on methods that will enable us to use data to answer causal questions without randomised studies. It’s difficult, but we’re making progress.
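Why randomisation helps can be shown with a simulation under stated assumptions. In the sketch below all numbers are invented: coffee is assumed to lower the probability of illness by three percentage points, smoking raises it, and in the observational setting smokers drink more coffee. When coffee is instead assigned by coin flip, it is no longer linked to smoking, and a simple comparison recovers the assumed effect. This illustrates the principle only; it is not the method the research group is developing.

```python
import random

TRUE_EFFECT = -0.03  # Assumption: coffee lowers illness probability by 3 points.

def illness_gap(randomised, n=200_000, seed=0):
    """Illness rate of coffee drinkers minus that of non-drinkers."""
    rng = random.Random(seed)
    ill_counts = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        smoker = rng.random() < 0.3
        if randomised:
            # Trial: a coin flip decides who drinks coffee.
            coffee = rng.random() < 0.5
        else:
            # Observation: smokers choose coffee more often (assumed rates).
            coffee = rng.random() < (0.8 if smoker else 0.4)
        p_ill = (0.30 if smoker else 0.10) + (TRUE_EFFECT if coffee else 0.0)
        totals[coffee] += 1
        ill_counts[coffee] += rng.random() < p_ill
    return ill_counts[True] / totals[True] - ill_counts[False] / totals[False]

observational_gap = illness_gap(randomised=False)
randomised_gap = illness_gap(randomised=True)

print(f"observational gap: {observational_gap:+.3f}")  # wrong sign
print(f"randomised gap:    {randomised_gap:+.3f}")     # close to TRUE_EFFECT
```

In the observational data the smoking effect outweighs the assumed benefit, so coffee appears harmful; under random assignment the two groups contain similar shares of smokers and the comparison becomes fair.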
About Nicolai Meinshausen
Nicolai Meinshausen has been Professor of Statistics at ETH Zurich since 2013, where he runs the Seminar for Statistics. His research focuses on causality, high-dimensional data and machine learning. In 2016, the Committee of Presidents of Statistical Societies awarded him the COPSS Presidents’ Award, which, along with the International Prize in Statistics, is the highest honour in the field of statistics.
Data in the spotlight
Data is playing an increasingly important role in our society, and is an issue on which ETH Zurich will focus more closely in the coming years. In a series of interviews, ETH News asks researchers at ETH Zurich about the specific topics they are focussed on, and how they see societal development in their field.
Previous interviews in this series:
- Lino Guzzella: “We have to seize this opportunity” (ETH-News 20.06.2017)
- Srdjan Capkun: “It’s always a compromise” (ETH-News 19.07.2017)
- Joachim Buhmann: “Medicine is becoming model-driven” (ETH-News 28.08.2017)
- Roger Wattenhofer: “Blockchain has been hyped up” (ETH-News 29.09.2017)