In reviewing some interview notes, I came across this segment from an interview with someone who works for the Earth System Grid (ESG). ESG is planning to add observational data to its data portal, alongside its current collection of model data. I asked him what it will be like to move from a system that handles only model output to one that also includes data collected from observational systems. His reply:
“Oh yeah it’s going to be a big jump. Because the model data is easy compared to the observational data…”
“What makes model data easy compared to observational data?”
“Well, for one thing, it’s all already in a nice gridded format. I mean, you got the nice 2D and 3D pieces, there doesn’t tend to be any missing data, like… I mean, observational data requires all the work just to be able to take it from what the center says to something that a human can use. And it’s already in a pretty well-defined format, either GRIB or NetCDF or something like that. It’s just probably… I mean, it’s… Since it’s an idealized representation of the world, I guess, in some ways the data is seen as kind of an idealized data format and data that it’s a lot easier to… Easier doesn’t mean easy but… I’m reading articles about observational data and I’ve accessed enough of it that it’s really, really hard sometimes.”
Our three themes of monitoring, modeling, and memory haven’t often come up together as analytical concepts, but this instance was striking because it showed a relationship between them. An idealized system for generating data results in an idealized data format that is easier to store. Model runs with non-idealized data can be repeated, keeping the data cleaner. I don’t want to pretend that model data is always clean or uncomplicated, but it does seem that in some real senses it could be simpler than observational data.