Uncut Interview: Professor Tarik Gouhier on Statistics, Modeling, and Climate Change
NUScience’s Lucas Cohen sat down with Professor Tarik Gouhier to discuss his background and research as part of a feature for NUScience magazine issue 28. This unabbreviated transcript offers more detail from their conversation.
Before we move on to the broader topics, I’d like to ask some general questions about you and your research. Can you briefly explain to me what your field is and where you stand in it?
I’m generally interested in modeling community ecology — so I’m interested in basically using mathematical models to describe the distribution of species in space and time. So, you know, the number of species that you find in a given place and the relative abundance that you find at that particular location — and that’s a bit broad, that’s kind of ecology in general, but I’m also interested in understanding the mechanisms responsible for those patterns. So, basically, how do species end up being abundant at a given location and rare at another? That’s where my modeling approach comes in, along with statistical analyses of large-scale datasets.
And so, most of your research is out of the MSC.
Yes, most of my work focuses on marine ecosystems. I got my PhD at McGill, which is in Montreal, but I was already working quite a bit with datasets from the west coast of the U.S. because my advisor was collaborating with those folks, so I spent quite a bit of my PhD focusing on the distributions of species in space and time on the west coast — specifically focusing on intertidal species. These are organisms that spend most of their life in the water, but at low tide they become exposed to the air. That’s got all kinds of implications in terms of how they deal with that stress — they basically have two types of lives that they’re living, at the edge of two realms: the marine, and the terrestrial. They turned out to be really useful model systems to study ecology in general — and I’ve been studying that on the east coast since I arrived here in 2012.
How did your interest in this field come about in the first place — where’s the appeal for you?
So it all started with a bit of a nightmare situation. Back when I was going through the undergrad standard curriculum in biology, the first couple years were absolutely miserable for me — as in, I just couldn’t take it. The types of skills that were developed in students were just not the kinds of things that were appealing to me. It was just rote memorization of a bunch of different pathways, and then every semester you basically formatted your brain, and the next semester you filled it up with new knowledge, explained it during the exam period, then formatted your brain and did it again and again. And I just couldn’t stand that because we never really got a– there was no understanding or logic behind it, it was just: read an encyclopedia and learn a bunch of facts. You don’t really understand how they came about or why they came about. That really drove me to the edge, so for a while there I was really considering transferring to engineering, where it was all math and all logic and I could understand at least that. And then I got really lucky, I persevered and took a summer field course in my third year, or the beginning of my third year, and this professor just got hired, and he was doing things that I didn’t even know were possible. So, he was a mathematics kind of guy, a computational kind of guy — he was doing ecology (the thing I loved) and biology in general, but he was using the tools that I actually understood and cared about: math, computer science, and programming in general. I didn’t even know you could do that, and so it was a perfect match. And soon, I obviously spent a lot of time with him and he told me about the kind of ecology that he
studied, and I then started a master’s with him and eventually did a PhD with him — so, I got exceedingly lucky.
But you found your niche.
Yes, exactly. So it’s basically the use of mathematical and computational tools, based on logic, rational decisions, and understanding in general– using those to address biological problems. That’s my niche.
It’s a complex niche.
Yeah but it’s a fun one, an interesting one, which is what– I was kind of dying the first two years. The lack of a challenge was no good.
What has your research consisted of in the past, and how, also, has it changed over time?
In the twenty-first century it’s now all about collaborations. So, depending on where you are and where you land, typically your research program will take one bend or another. You always have your core themes– so things that you care about and things that you will do, regardless of what everyone else around you is doing, but then as soon as you move to a new place, you have to take the opportunities that are there. So there are lots of opportunities for collaborations. At the MSC, I’ve been exceedingly lucky in basically being around very cool scientists doing cool stuff on systems that I hadn’t studied before. One of the things I started to do as I started my position here in 2012 was collaborating with the coral biologist and geneticist, Steve Vollmer. Coral ecosystems are very interesting because we think they’re highly vulnerable to climate change; they’re also the source of biodiversity in the sea (they’re like the rainforests of the sea), and so if those systems go down, a whole bunch of other species will go down in response. So there’s a lot of focus on understanding their response to climate change, and specifically to disease. That’s been my focus with Steve; we’ve collaborated to understand how corals, interacting with their microbiome, are either more resistant or less resistant to diseases that are associated with climate change. We’re trying to gain an understanding of that — what is the relative importance of the corals’ response to disease versus the microbial response to disease, and can we understand these two things, and, when we put them together, can we understand how corals will fare under climate change?
In studying a topic as broad and complex as climate change, what do you think is the importance of modeling and of statistics?
So they’re hugely important, I think. If you think about the climate change pipeline when it comes to an ecologist, at the beginning you’ve got the actual climate, and quantifying that climate, and understanding how you can scale from the broad climate models that we have to the fine-scale spatial variation that people care about when it comes to the ecology and actual organisms. Basically, how does an organism feel climate? That’s an entire pipeline. So I think statistical models are involved at every single level. At the top level, there are techniques called statistical downscaling, which are ways of extracting fine-scale information from broad-scale climate change models. These are models that typically have resolutions of hundreds of kilometers, and to get very fine-scale stuff occurring at, let’s say, the kilometer scale, or even the meter scale, what you need to do is some downscaling. There are two approaches for that: physical downscaling and statistical downscaling. In statistical downscaling, the idea is to find associations between the broad-scale climate predictors and other variables like precipitation and other things that are varying at different spatial scales, to look at the complex relationship between those two things. Once you develop a predictive model where you can infer the broad-scale pattern in, say, temperature as a function of, say, precipitation and salinity — and other things that vary at different locations around the globe and at different times — then you can use that model to do some statistical downscaling. Statistics is really at the forefront of generating climate forecasts that are relevant to the biology at the appropriate spatial scales. In a given 100 by 100 kilometer grid cell on the globe, organisms aren’t going to feel one temperature; they are going to feel something very different at different locations within that 100 by 100 cell. That’s where stats comes in.
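To make that concrete, here is a minimal sketch of what a statistical downscaling step can look like in code. It is only an illustration: the data are synthetic, and the predictors (grid-cell temperature, precipitation, salinity) and the plain least-squares regression are stand-ins for the far more sophisticated relationships climate scientists actually fit.

```python
# A toy illustration of statistical downscaling (not any group's actual
# pipeline): fit a regression that links coarse-grid climate output to
# finer-scale local conditions, then use it to "downscale" new model output.
# All data here are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Pretend predictors from a coarse (~100 km) climate model grid cell:
# grid-cell temperature, precipitation, and salinity over 500 time steps.
n = 500
coarse = np.column_stack([
    15 + 5 * np.sin(np.linspace(0, 20, n)) + rng.normal(0, 1, n),  # temp (degrees C)
    rng.gamma(2.0, 2.0, n),                                        # precip (mm)
    33 + rng.normal(0, 0.5, n),                                    # salinity (psu)
])

# "Observed" temperature at a fine-scale site inside that grid cell: related
# to the coarse predictors but offset and noisier (local topography, coast).
local_temp = 0.8 * coarse[:, 0] - 0.1 * coarse[:, 1] + 0.3 * coarse[:, 2] \
             - 6.0 + rng.normal(0, 0.8, n)

# Fit the statistical relationship on the historical overlap period...
model = LinearRegression().fit(coarse[:300], local_temp[:300])

# ...then apply it to later coarse-model output to get fine-scale estimates.
downscaled = model.predict(coarse[300:])
rmse = np.sqrt(np.mean((downscaled - local_temp[300:]) ** 2))
print(f"Held-out RMSE of downscaled temperature: {rmse:.2f} C")
```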
So stats takes the big picture and makes it small.
Exactly. It converts — to use a generic phrase — climate data into climate information. It extracts from the reams of climate data that we have and produces information about how organisms will actually feel. It gets closer to the biology, in one way.
And so, this data that you’re using in your statistical models, where does that come from?
The current generation is CMIP5, and this is basically a partnership among about a dozen or so groups around the world. There are about 39 different global models — and these are models that are exceedingly complex, run on supercomputers in different countries (the U.S., France, and so on). What they’ve decided to do is collaborate and put all of their data together and make it available to the world for free. The output of these models — which are really run as a bunch of coupled differential equations that model, using physics, how temperature and atmospheric interactions are influenced by different carbon emission scenarios — is what people analyze on a regular basis. So it’s not like you could run one of these models on your own personal computer, but what you can do is take a look at the output and analyze quite a bit of it. That’s what people do nowadays.
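As a rough illustration of what “analyzing the output” can look like in practice, here is a hypothetical sketch using the xarray library. The file name is only a placeholder in the usual CMIP5 naming style, and the variable and coordinate names (“tas”, “lat”, “lon”, “time”) assume the standard CMIP conventions.

```python
# A sketch of how one might analyze published climate-model output rather
# than run the model itself. The file path below is a placeholder for
# whatever archive the data were downloaded from; "tas" is the conventional
# CMIP name for surface air temperature.
import xarray as xr

# Hypothetical CMIP5 monthly surface air temperature file (NetCDF).
ds = xr.open_dataset("tas_Amon_SomeModel_rcp85_r1i1p1_200601-210012.nc")

# Average over a coastal study region, then compute annual means.
region = ds["tas"].sel(lat=slice(35, 45), lon=slice(230, 240))
regional_mean = region.mean(dim=["lat", "lon"])
annual = regional_mean.groupby("time.year").mean("time")

# A simple summary: warming between the first and last decade of the run.
warming = annual.sel(year=slice(2091, 2100)).mean() - annual.sel(year=slice(2006, 2015)).mean()
print(float(warming), "K of projected regional warming under this scenario")
```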
What are some of the challenges associated with this type of research, and what are the potential benefits?
I think one of the benefits is getting a clearer idea of what ecosystem resilience might look like under changing environmental conditions. Again, if you go back to that simple question: how many species am I expected to find in a particular location, what’s the identity of those species, and what is their relative abundance? Those basic pieces of information are all modified by climate change in some way. One of the big-picture questions that we want to address is how will climate change pick winners and losers? Some species will have a tendency to do better under climate change, others will have a tendency to do less well — how do we predict those winners and losers within a complex community, and what are the implications of that for the things that communities do for us in terms of ecosystem services? Those are some hot-button questions and issues that need to be addressed going forward. This better understanding is one of the potential benefits. The drawback, of course, is the huge amount of uncertainty that we have when we deal with these kinds of model predictions. When we’re dealing with these forecasts, you’ve got uncertainty in terms of carbon emissions — so we’re making predictions– these climate models run for like a hundred years, right?
So they have predictions about daily temperature over 100 years. They assume various climate scenarios that may or may not come to pass, so one of the things that we don’t know is whether or not we’ll be locked into one scenario versus another. There’s huge variation in terms of the response that we might expect based on the scenario that we end up following. So if, because of the Paris agreement, we decide to curb our CO2 emissions quite a bit, then many of these predictions that are based on some of the most dire CO2 emissions scenarios might not come to pass. So that’s one source of uncertainty: scenario uncertainty.
It’s just inherent in statistics.
Exactly. It’s one of those unknowns — if you don’t know anything within the statistical framework, what you do is model every single possible scenario, so that at least you know that, if you go down one specific path, you’ll have some kind of prediction. What you don’t know, though, is what path is going to be selected, or is actively being selected — and that’s a bit of an issue. So that’s scenario uncertainty. The other source of uncertainty is sort of inherent to the weather system itself, which is chaotic; so there are quite a few complex fluctuations in the climate that are not something you can predict in any way. So, imagine the weather: you can predict the weather pretty accurately over ten days, but anything beyond that becomes really, really hard to do — and that’s because it’s just a chaotic system. So if you had a really powerful computer that could measure every single variable that you need to launch the model, out to thousands and thousands of decimal places, you might be able to extend your forecasting window from ten days to thirty days, but eventually you’ll make some kind of small error that will accumulate over time and basically lead to large prediction errors. That’s a feature of climate that’s a major issue; even if you know what scenario to run with your models, there’s chaos built in, which makes it exceedingly difficult to predict whether, within a given year, temperature will be relatively high compared to the average or potentially lower.
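The error-accumulation point can be illustrated with a few lines of code. This is not a weather model, just the logistic map, a textbook chaotic system: two starting values that differ by one part in ten billion end up giving completely different “forecasts” after a few dozen steps.

```python
# A minimal illustration (not a climate model) of why tiny measurement errors
# eventually ruin long-range forecasts in a chaotic system: the logistic map
# with two starting points that differ by one part in ten billion.
x_true, x_measured = 0.4, 0.4 + 1e-10
r = 3.9  # parameter value in the chaotic regime

for step in range(1, 51):
    x_true = r * x_true * (1 - x_true)
    x_measured = r * x_measured * (1 - x_measured)
    if step % 10 == 0:
        print(f"step {step:2d}: error = {abs(x_true - x_measured):.2e}")

# The error grows roughly exponentially: after a few dozen iterations the
# "forecast" based on the slightly-off measurement is no better than a guess.
```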
I was going to say, I find it really interesting that we take weather predictions for granted, and the way in which they’re conceived. The number of variables that need to be taken into account is just staggeringly high — and then when you get to something as complex as predicting the effects of climate change, things– on your level, in terms of research– it’s something that I have trouble wrapping my mind around, just… the scale.
Yeah, I mean it’s very, very tricky — but it’s just about tradeoffs. So for the weather, you typically make predictions that are relatively localized, so you get accurate local forecasts over a relatively small time window — but obviously you can’t run these kinds of models for a hundred years because they’re just too intensive, too time-consuming. Besides, again, if you measured a temperature and you were off by a tenth of a tenth of a tenth of a percent, eventually that accumulates and leads to large prediction errors. It becomes really tricky, and one thing climate change deniers say is: look, you can’t predict what the weather will look like after ten days, so how in the world are you predicting what’s going to happen with the climate, right? That’s a common misunderstanding, and the response is really simple: seasonality, for example, happens over a year, and I can tell you that it’s going to be colder around December, January, February, and March than it is around June, July, and August, and that’s because there are repeatable patterns that regularly occur. Sure, there’s a lot of noise around those patterns, but the signal is strong enough for you to capture it. And that’s what climate does: if you look at very, very broad temporal scales, that turns out to be an asset and not a liability, because the signal becomes clearer at broad scales as the noise is essentially cancelled out to a certain degree.
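A small synthetic example of that signal-versus-noise argument: day-to-day temperatures are dominated by weather noise, but averaging over broader windows cancels much of the noise, and the seasonal signal comes through clearly. The numbers below are invented for illustration.

```python
# A toy example of the "noise cancels out at broad scales" point: daily
# temperatures are dominated by weather noise, but averaging over ~monthly
# blocks recovers the seasonal signal cleanly. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(1)
days = np.arange(365)

seasonal_signal = 10 + 12 * np.sin(2 * np.pi * (days - 100) / 365)  # smooth cycle
weather_noise = rng.normal(0, 6, days.size)                         # daily "chaos"
daily_temp = seasonal_signal + weather_noise

# Day-to-day, the noise is comparable to the signal's month-to-month change...
print("std of daily noise:", round(weather_noise.std(), 1), "C")

# ...but averaging each 30-day block beats the noise down by roughly sqrt(30),
# so the seasonal pattern (winter vs. summer) comes through clearly.
monthly_mean = daily_temp[:360].reshape(12, 30).mean(axis=1)
monthly_signal = seasonal_signal[:360].reshape(12, 30).mean(axis=1)
print("max monthly error:", round(np.abs(monthly_mean - monthly_signal).max(), 1), "C")
```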
As long as we’re on the topic of uncertainty, could you briefly talk about how these sorts of studies have changed over time in terms of accuracy? Because I would assume that the instrumentation we have now, in 2016, is infinitely more useful than it might’ve been ten years ago.
I can tell you from a climate modeling perspective that, yes, there’s tons more data out there. The big problem with climate, of course, is that it plays out over long timescales, so as soon as you add new, more sensitive measurements you have to do a bunch of work to make sure that you’re not introducing a new bias that’s associated with your new measurement being more accurate than your previous measurement, because you need that long time series. The problem is that people always hear that NOAA or some other governmental agency is calibrating a model or doing something, and people think that it’s basically cheating, so that you always get a predictable increase in temperature in terms of your prediction. But it’s really all about making sure that the changes that we’ve gone through in terms of how we measure temperature, for instance, have no impact on the patterns that we see in the data, and that’s a bit of a major issue. In terms of the models, we went through multiple generations of the CMIP models (these are the models that are available to the entire community for download and analysis), and we’re up to CMIP5, and there’s a CMIP6 whose output is out there but hasn’t been analyzed just yet, and people typically think that the latest generation is going to be better. It turns out that one of the collaborators that I have here at Northeastern, who’s in the Civil and Environmental Engineering Department, Auroop Ganguly, he and his students studied and compared CMIP3 to CMIP5, and for certain conditions, CMIP3 was doing a better job of predicting patterns than the CMIP5 models. So it’s a little bit tricky; you can’t just take the latest generation of models and assume that it’s going to be better and more accurate. You have to do all these statistical tests to make sure that what’s coming out of it is actually making more sense than what previous generations were generating. In general, things have gotten much better, which typically means higher resolution in both space and time. So, finer grid scales — and that’s always important.
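As a toy version of the calibration issue he describes, the sketch below simulates a temperature record whose instrument changes partway through and introduces a spurious 0.5 °C jump, then estimates and removes that offset so the switch does not masquerade as a climate trend. Real homogenization procedures used by agencies like NOAA are far more elaborate; this only shows the logic.

```python
# A toy version of record homogenization: when a station switches to a new,
# more sensitive instrument partway through a long record, estimate and
# remove the resulting offset so the switch itself doesn't look like a
# climate trend. All data are synthetic; real methods are far more elaborate.
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1950, 2021)
true_trend = 0.02 * (years - 1950)              # slow real warming (C)
record = 14 + true_trend + rng.normal(0, 0.3, years.size)

# Suppose the instrument changed in 1990 and the new sensor reads 0.5 C warm.
changeover = years >= 1990
raw = record + 0.5 * changeover

# Naive estimate of the step: difference of means in a window on each side
# of the documented changeover, minus the expected real trend over that gap.
before = raw[(years >= 1980) & (years < 1990)].mean()
after = raw[(years >= 1990) & (years < 2000)].mean()
step = (after - before) - 0.02 * 10

adjusted = raw - step * changeover
raw_trend = np.polyfit(years, raw, 1)[0]
adj_trend = np.polyfit(years, adjusted, 1)[0]
print(f"estimated instrument offset: {step:.2f} C (true value 0.50)")
print(f"apparent trend raw vs adjusted: {raw_trend:.3f} vs {adj_trend:.3f} C/yr (true 0.020)")
```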
Over the years, with your research, what have you found and has anything surprised you or jumped out at you?
I talked about my collaboration with Vollmer as being an incipient collaboration (we started in 2013), but my collaboration with Auroop Ganguly on climate change only started in 2015.
Oh, so it’s ongoing.
That’s right, I’m completely new to this game. But, what has been interesting is how you grapple with the uncertainty that comes with the chaotic nature of the climate, the uncertainty in terms of CO2 emissions scenarios, and the uncertainty that’s associated with variation between different climate models. So we have 39 different climate models making predictions right now from around the globe, and they’re all theoretically modeling the same earth and the same system — they’re all parameterized to do so, but there are slight differences in terms of the physics involved and how they model coastlines and all that sort of stuff, and they end up making very different predictions. So if
you take the multi-model mean, the average prediction across the 39 models, you miss out quite a bit on the amount of uncertainty there is in terms of predicting what path we’re on. So, some models, for instance– I study upwelling, which is a really important phenomenon, especially on the west coast. One of the predictions that Andrew Bakun made in 1990 was this idea that climate change is simply going to introduce a greater thermal difference between the cool ocean and the land masses, which are going to warm faster than the ocean. That differential is going to create greater wind stress; that greater wind stress is going to create more upwelling. That was a hypothesis he put out in 1990, and he had some empirical data that showed this trend holding up, and one of the things that we did with a group– my collaborator, Ganguly– was to try to see whether or not these climate models predicted the same thing over much longer time spans — so, until, like, 2100. What we found is that in many instances it’s true: in 3 out of the 4 eastern boundary current systems (the Canary, the Benguela, and the Humboldt), that’s exactly what we saw, and the patterns were stronger as you went farther from the equator. But the interesting thing was that California didn’t seem to show any trend: some of the 39 models showed California responding positively in terms of upwelling over time, but others showed no response or a negative response — like a decrease in the intensity and the duration of upwelling. So, of course, our planet is most likely following one of these models — one of the 39. It’s not the average of the 39. We just don’t know which one it is, so there is that sort of irreducible uncertainty at some point that just makes it really, really hard to get a clear picture of what is going on.
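The multi-model-mean point can be illustrated with a short sketch. The 39 “models” below are just random numbers standing in for trend estimates, not real CMIP output; the point is that the ensemble average can look small and tidy even when the individual projections disagree about the sign of the change.

```python
# A sketch of why the multi-model mean can hide the real uncertainty: here,
# 39 made-up "models" each produce a trend in upwelling intensity for one
# region. Some are positive, some negative; the mean looks reassuringly
# small even though the individual projections disagree on the sign.
import numpy as np

rng = np.random.default_rng(3)
n_models = 39
trends = rng.normal(loc=0.05, scale=0.6, size=n_models)  # arbitrary units per decade

mmm = trends.mean()                  # the multi-model mean
spread = trends.std(ddof=1)          # inter-model spread
frac_positive = (trends > 0).mean()  # how many models say "more upwelling"

print(f"multi-model mean trend: {mmm:+.2f}")
print(f"inter-model spread (1 sd): {spread:.2f}")
print(f"fraction of models projecting an increase: {frac_positive:.0%}")
# The planet will follow one trajectory, not the ensemble average, so the
# spread and the sign disagreement matter as much as the mean itself.
```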
Of course, you can make a pretty good guess as to which one it is using statistical analysis, but you will never truly know, I suppose.
That’s right, and the big issue you’ll have is that if all of these 39 global climate models are predicting different things, it becomes harder to suggest or state which ones are better than the others and why. And so the hope is, again, that this gets reconciled or resolved with the next generation, the CMIP6 generation. For now, we know that, empirically, California has seen an increase in upwelling over the last 41 years. So there is a tendency to do that, but whether it will continue in the future? Nobody knows.
How do you use modeling and statistical analysis to address and possibly mitigate the effects of climate change?
There are a couple of things that you have to do in this field, and one of them is– you’re swimming in data. This is the “big data” era, and there’s no way you can get around this much data without summarizing the trends. And to do that you need to apply the correct statistical analyses; there’s no way you can make sense of all this data, to convert from data to information, without statistical models to help you — that’s number 1. Number 2, even if the models are flawed in some sense, and there is some uncertainty, it’s important to be able to make predictions in order to plan ahead. Take the example of sea level rise. If you know that Boston is going to be under water by 2060, you’re going to take some kind of steps early on to mitigate that in some way; you’re going to construct artificial barriers and things of that order to prevent that from happening. So part of the deal is using these models to convert data to information, and then that information can be conveyed to people who are in politics or in management to make long-term decisions. As you can expect, mitigation typically involves quite a bit of engineering, which takes a long time to plan and to motivate, because typically it means greater taxes and someone has to pay. And in order to justify that to the greater public, you need to have strong trends and analyses that are compelling. The only way to do that, really, is to analyze the data that we have at this point. That way you can go from high up in the ivory tower and make sure that this information gets conveyed to people who are potentially either going to suffer the effects or pay for the remediation. So you can think of it as a bridge.