Monitoring, Modeling, and Memory

Dynamics of Data and Knowledge in Scientific Cyberinfrastructures

Archive for the ‘Uncategorized’ Category

The Politics of Systems

Posted by Paul N. Edwards on September 9, 2011

Here’s an interesting blog I just blundered across. Not sure who writes it, but it’s nice!

The Politics of Systems: thoughts about Software, Power and Digital Method

Take a look at the post on “The architecture of governance.”

- Paul

Posted in Uncategorized | Leave a Comment »

Data without Borders

Posted by Paul N. Edwards on June 24, 2011

Here’s an interesting idea from Jake Porway, a young “data scientist”: Data without Borders, an enterprise of loosely coordinated volunteers helping NGOs, nonprofits, and other groups by offering data analysis services. Statistics, spreadsheets, charts, data collection.

As he puts it:

There’s a lot of effort in our discipline put toward what I feel are sort of “bourgeois” applications of data science, such as using complex machine learning algorithms and rich datasets not to enhance communication or improve the government, but instead to let people know that there’s a 5% deal on an iPad within a 1 mile radius of where they are. In my opinion, these applications bring vanishingly small incremental improvements to lives that are arguably already pretty awesome.

On the other hand there are lots of NGOs and non-profits out there doing wonderful things for the world, from rehabilitating criminals, to battling hunger, to providing clean drinking water. However, they’re increasingly finding themselves with more and more data about their practices, their clients, and their missions that they don’t have the resources or budgets to analyze. At the same time, the data/dev communities love hacking together weekend projects where we play with new datasets or build helpful scripts, but they usually just culminate in a blog post or some Twitter buzz. Wouldn’t it be rad if we could get these two sides together?

Go Jake!


Climate smackdown: models vs. data?

Posted by Paul N. Edwards on March 22, 2011

Just published a long post on A Vast Machine about the recent Nature and QJRMS results, about which I’ve blogged here earlier.

Reposting in full here, since it’s relevant, but NB, this is more of an op-ed than usual:

For decades, skeptics have tried to boil every climate change debate down to this: good, hard data vs. bad, fuzzy models. This caricature lets them attack every model-based climate projection while waiting for data to confirm the reality of human-induced global warming.

Now, here we go again. On February 10, the Wall Street Journal’s professional global warming skeptic, Anne Jolis, trumpeted recent data showing that certain global climate patterns haven’t changed much since 1871. “The weather isn’t getting weirder. The latest research belies the idea that storms are getting more extreme,” she wrote, to loud applause from other skeptics. “Another nail in the coffin of anthropogenic global warming,” crowed one.

Less than a week later, the scientific journal Nature presented two papers suggesting that human greenhouse gas emissions have increased the likelihood of heavy rains. One linked the devastating UK floods of the year 2000 to global warming [see it here]. The other identified greenhouse gas emissions as a likely contributor to increases in exceptionally heavy “precipitation events” across the northern hemisphere [see it here]. These results confirm the obvious: a warmer climate leads to more evaporation, and hence more precipitation overall. But they went further, predicting where this extra precipitation would occur and linking it specifically to human emissions.

Both Nature papers relied on climate models, computer simulations of the global atmosphere. When researchers left human greenhouse-gas emissions out of these simulations — simulating the climate as it might be without industrial societies — their models projected fewer heavy precipitation events than observed. When they put them back in, the likelihood of intense precipitation went up.

The skeptic response? “No real data supporting their claims,” one wrote on Andy Revkin’s Dot Earth blog. “Just climate models. GIGO [garbage in, garbage out].”

It’s a familiar refrain in the climate change wars. Climate models, goes the tune, are insubstantial fantasies. Tweak their knobs and you can make ‘em say anything. Climate data, on the other hand, are solid, substantial. “Sound science” equals “data, not models.”

But wait — about those data that made Jolis so happy… where exactly did they come from? Here’s a hint: the investigators were awarded over 3 million hours of supercomputer time to do their work. It’s called the 20th-Century Reanalysis Project (20CR, for short). “Reanalysis” is a technique for re-processing past weather data to make a climate dataset.

Here’s the paper that got her so jazzed. 20CR began with a comprehensive collection of surface pressure readings covering the period 1871-2008. The project then spent some of those millions of supercomputer hours to pipe those data through a computer forecast system — a simulation model.

That forecast model uses a 3-dimensional grid to represent the atmosphere. The grid mesh contains well over 1 million points, and every one of those points must be assigned a value. Yet the surface pressure readings used as input came from a relative handful of locations: for 1871, the study’s first year, only 62 land stations worldwide. In reanalyses of this type, the vast majority of data are calculated by the forecast model, not measured by instruments.
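To make the idea concrete, here is a toy sketch in Python of how a reanalysis might blend a model-generated field with a handful of observations. It is not the 20CR system: the function name `assimilate`, the one-dimensional grid, the Gaussian distance weighting, the `length_scale` parameter, and all the numbers are illustrative assumptions.

```python
import math


def assimilate(background, observations, length_scale=2.0):
    """Blend a model 'background' field with sparse observations.

    background:   list of model values, one per grid point
    observations: dict mapping grid index -> measured value
    length_scale: hypothetical radius (in grid points) over which
                  an observation influences its neighbours
    """
    analysis = []
    for i, model_value in enumerate(background):
        correction = 0.0
        total_weight = 1.0  # the model itself always gets weight 1
        for j, observed in observations.items():
            # An observation's influence decays with distance from it.
            weight = math.exp(-((i - j) / length_scale) ** 2)
            correction += weight * (observed - background[j])
            total_weight += weight
        analysis.append(model_value + correction / total_weight)
    return analysis


# Made-up example: 20 grid points, a single observing station.
background = [10.0] * 20    # model first guess everywhere
observations = {5: 12.0}    # one instrument reading, at grid point 5
analysis = assimilate(background, observations)
observed_fraction = len(observations) / len(background)
```

In this toy, only one grid point in twenty has a measurement. At that point the result lands halfway between model and observation; far away from it, the "data" are simply whatever the model said.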

So the “data” that had Jolis gloating were in fact largely generated by a computer simulation — the same type of model (though not the same model) used in the Nature studies. According to some skeptics’ own tenets, then, the 20CR data can’t be much more than a scientific fantasy.

True? Of course not. Getting a scientific grip on something as big and complicated as the global atmosphere simply can’t happen without computer modeling. Today, every credible global dataset, without exception, is processed, filtered, corrected, and/or partially generated by computer models. Those who think it’s data versus models — hard evidence vs. squishy algorithms — are living in a long-vanished world, where “science” meant laboratory experiments on highly simplified systems.

So who’s right? Are human greenhouse emissions altering the chances of extreme weather, as the Nature papers suggest? Or does natural climate variability remain unchanged, as 20CR seems to show? Unfortunately, that question can’t yet be answered, because 20CR and the Nature studies addressed different climate patterns that can’t be directly compared. One thing is sure, though: it’s going to take both observations and computer models to find out. Everything we know about the climate — past, present, and future — depends upon our ability to simulate its operation.

The idea that it’s “models bad, data good” just won’t work. We can’t let the skeptics set the terms of the debate. They don’t even understand what the terms mean.

Coda:

As for the 20CR scientists, they responded to Jolis on Feb. 23. Mild-mannered creatures that they are, they wrote that her opinion “does not accurately reflect our views.”

As for the statement that the Twentieth Century Reanalysis Project… shows ‘little evidence of an intensifying weather trend’: We did not look at weather specifically, but we did analyze three weather and climate-related patterns that drive weather, including the North Atlantic Oscillation. And while it is true that we did not see trends in the strength of these three patterns, severe weather is driven by many other factors.

The lack of a trend in these patterns cannot be used to state that our work shows no trend in weather. Many researchers have found evidence of trends in storminess and extreme temperature and precipitation in other weather data over shorter periods.

Finally, the article notes that the findings are ‘contrary to what models predict.’ But models project forward, while our analysis looked back at historical observations. We see no conflict between the 100-year-projection of changes in weather extremes resulting from additional carbon dioxide and the fact that our look back at three indicators showed no historical trend.

They fail to point out that their analysis is itself produced by a model.


Digging into Data Challenge: Round 2

Posted by Paul N. Edwards on March 17, 2011

Here’s an interesting funding opportunity on questions of big data for the social sciences and humanities. NB, tomorrow (March 18) is the “Day of Digital Humanities 2011,” a worldwide effort to document a “day in the life” of this new field.

Date: Wed, 16 Mar 2011 14:19:11 -0400
From: “Serventi, Jennifer” <JServenti>
Subject: 2011 Digging into Data Challenge

Eight international research funders jointly announce their
participation in round two of the Digging into Data Challenge, a grant
competition designed to spur cutting edge research in the humanities and
social sciences.

The Digging into Data Challenge asks researchers these provocative
questions: How can we use advanced computation to change the nature of
our research methods? That is, now that the objects of study for
researchers in the humanities and social sciences, including books,
survey data, economic data, newspapers, music, and other scholarly and
scientific resources are being digitized at a huge scale, how does this
change the very nature of our research? How might advanced computation
and data analysis techniques help researchers use these materials to ask
new questions about and gain new insights into our world?

Due to the overwhelming popularity of round one, the Digging into Data
Challenge is pleased to announce that four additional funders have
joined for round two, enabling this competition to have a world-wide
reach into many different scholarly and scientific domains. The eight
sponsoring funding bodies include the Arts & Humanities Research Council
(United Kingdom), the Economic & Social Research Council (United
Kingdom), the Institute of Museum and Library Services (United States),
the Joint Information Systems Committee (United Kingdom), the National
Endowment for the Humanities (United States), the National Science
Foundation (United States), the Netherlands Organisation for Scientific
Research (Netherlands), and the Social Sciences and Humanities Research
Council (Canada).

Final applications will be due June 16, 2011. Further information about
the competition and the application process can be found at
www.diggingintodata.org.

Jennifer Serventi, Office of Digital Humanities
National Endowment for the Humanities
http://www.neh.gov/odh
Twitter: @NEH_ODH


Interview with me in Rorotoko

Posted by Paul N. Edwards on March 8, 2011

The intellectual book review Rorotoko published an online “interview” with me yesterday morning (March 7).

It’s my attempt to boil down some of the main points of A Vast Machine into a conversational format.

- Paul


LTER goes after climate change

Posted by Paul N. Edwards on February 23, 2011

LTER is getting involved in climate change studies. Sounds like the reporter didn’t really investigate LTER’s original purpose, though.

“The record snows across the United States this winter may be seen as a harbinger of the extreme weather expected from global warming, but figuring out how much the planet is warming and what the impact might be will take long-term studies. The Long-Term Ecological Research project, started by the National Science Foundation in 1980, is doing just that, with 26 sites, most located in the U.S., collecting data related to climate change. And at a symposium in Washington, March 2, seven researchers will present results from a sampling of LTER projects.”

Full story here.

Also, we should eat more insects.


Data.Rescue@Home

Posted by Paul N. Edwards on February 18, 2011

I knew this was coming: crowdsourcing climate data.

Data.Rescue@Home is an internet-based attempt to digitize historical weather data from all over the globe and make the digitised data available to everybody. Two projects are currently online: German radiosonde data from the Second World War and meteorological station data from Tulagi (Solomon Islands) for the first half of the 20th century.

You log in, look at a scanned image of a weather record, and enter the data as numbers on a form.

Not much progress yet. Up and running since October 2010, but only about 150 of 2000 scanned images have been coded. Where are the masses when you need them?


New climate variability results: models and data, again

Posted by Paul N. Edwards on February 18, 2011

The New York Times reported yesterday on two new Nature papers on climate change (extreme precipitation events linked to anthropogenic global warming through computer simulation), which are expected to stir up debate again.

Meanwhile, a few weeks ago the 20th-Century Reanalysis Project reported on recent results of the longest-term weather data reanalysis project yet, collecting every scrap of available weather data from 1871-2008 and running them through a weather forecast model to “fill in the blanks” for what’s missing.

A salient finding from this study: changes in the North Atlantic Oscillation (also see the North Atlantic Oscillation theme site) appear to be driven throughout the study period primarily by natural variability. In other words, the reanalysis isn’t seeing an effect of global warming on variability in the NAO.

The reanalysis data go back to 1871 — but as they go back in time, they get thinner and thinner. Most data prior to the 1950s are from the surface only. The reanalysis model fills in the missing data. So the large majority of data in the pre-1950s reanalysis are created by the model.

The Nature studies are looking at an entirely different kind of variability, i.e. frequency of extreme precipitation events in the UK (one study) and the Northern Hemisphere (the second study). (It’s worth jumping to the actual articles from the links given on the Nature news page.) These studies compare observational data with results from simulation models with and without anthropogenic forcing (i.e. greenhouse gases and other human influences on climate). The results: (a) natural variability alone can’t account for the increased northern hemisphere precipitation in the second half of the 20th century, and (b) anthropogenic factors, added to the simulation models, doubled the risk of the floods experienced in the UK in 2000.
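The with-and-without comparison can be sketched in a few lines of Python. This is only an illustration of the logic, not the papers’ actual method or data: the ensemble values, the threshold, and the function names are made up, and the "fraction of attributable risk" is included as a measure commonly used in attribution studies.

```python
def exceedance_probability(values, threshold):
    """Fraction of simulated values that exceed the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


def risk_ratio(p_forced, p_natural):
    """How many times more likely the event is with human forcing."""
    return p_forced / p_natural


def fraction_attributable_risk(p_forced, p_natural):
    """Share of the event's risk attributable to the added forcing."""
    return 1.0 - p_natural / p_forced


# Hypothetical ensemble output (made-up numbers, arbitrary units):
# one value per simulated season, e.g. peak precipitation.
natural_runs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # no human emissions
forced_runs = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   # with human emissions
flood_threshold = 9                               # "extreme" cutoff

p_natural = exceedance_probability(natural_runs, flood_threshold)
p_forced = exceedance_probability(forced_runs, flood_threshold)
ratio = risk_ratio(p_forced, p_natural)
far = fraction_attributable_risk(p_forced, p_natural)
```

With these invented numbers, one natural run in ten crosses the threshold versus two in ten of the forced runs, so the risk ratio is 2: a "doubling of risk" in the sense reported above.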

This, combined with the comments on the two Nature pieces, makes for a lovely skeptic paradox. The skeptics are very happy with the results from the model-driven reanalysis data which (they think) confirm their views. (Another nail in the coffin of AGW, one wrote.) But they roundly reject the idea that simulation models could explain the significant increase in extreme precipitation.

By the way, Piers Corbyn, mentioned in the Kevin Crean comment on the Nature news page, runs a commercial long-term weather prediction service in the UK using his own “solar/lunar” model, whose details he will not reveal and which has never been peer reviewed. He’s had some notable successes in forecasting major storms long in advance (months). He places bets on his own forecasts (and sometimes wins). He’s a skeptic in the Christopher Monckton vein. (Monckton, by the way, claims to be a hereditary member of the House of Lords, but the Lords are having none of it.)

I’m going to be working on an op-ed about this over the weekend. Comments welcome.

Posted in Uncategorized | 2 Comments »

[Taxacom]: Data persistence

Posted by Paul N. Edwards on February 11, 2011

From an Ars Technica post, by way of Taxacom:

CERN scientists and researchers from several other facilities have grouped together to preserve data by creating DPHEP (Data Preservation in High Energy Physics). DPHEP recommends that research budgets provide for a data archivist position. The data archivist will preserve data along with key supplementary information that is necessary to interpret and put the data in perspective for future generations. They also recommend creating virtualized software that simulates the computers of today, so whatever programs current physicists use for their data workup can be used long after present technology expires.

Two points here, following up on another post from earlier today. First, DPHEP is upping the metadata ante considerably by recommending not just publication of code but also the production and maintenance of emulators that could run that code, even much later. Again: is this worth the effort? When? How much effort? At what cost?

Second, though: institutionalizing positions for data archivists would make a lot of sense. Such positions would go far toward solving a sociotechnical problem in the most flexible way, i.e. with people rather than (only) technology. Here’s where training comes in, and where there’s a potentially huge role for iSchools and their graduates.


Science special issue on “Dealing with Data”

Posted by Paul N. Edwards on February 11, 2011

Reposting a pointer from Cliff Lynch —

The February 11, 2011 issue of Science has a special section titled “Dealing with Data” with a number of papers and articles covering data intensive science and data curation issues.

They have set up a website that consolidates some of the material from this issue and some related topical material from other Science journals (Signaling, Translational Medicine, Careers) for public access (registration required for non-subscribers).

Posted in Uncategorized | 1 Comment »

 