Discovery of Patterns in the Global Climate System
This paper presents preliminary work on using data mining techniques to find interesting spatio-temporal patterns in Earth Science data. The data consists of time series measurements for various Earth Science variables (e.g., soil moisture, temperature, and precipitation), along with additional data from existing ecosystem models (e.g., Net Primary Production). The ecological patterns of interest include associations, clusters, predictive models, and trends. We first discuss some of the challenges involved in preprocessing and analyzing the data. Earth Science data has strong seasonal components that need to be removed prior to pattern analysis, because Earth scientists are primarily interested in patterns that represent deviations from normal seasonal variation, such as anomalous climate events (e.g., El Niño) or trends (e.g., global warming). We compare several alternatives (including singular value decomposition (SVD), discrete Fourier transform (DFT), “monthly” Z score, and moving average) with respect to their effectiveness in removing seasonality. After preprocessing, we apply clustering and several kinds of association analysis to the data to discover spatio-temporal relationships among ecological variables in various parts of the Earth. Our current technique for finding associations extracts sets of events from the time series data and then applies existing algorithms traditionally used for market-basket data. We use K-means clustering to divide the land and ocean areas of the Earth into disjoint regions in an automatic, but meaningful, way that enables the direct or indirect discovery of interesting patterns.
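To make the seasonality-removal step concrete, the following is a minimal sketch of one common reading of the “monthly” Z score: each value is standardized against the mean and standard deviation of its own calendar month, computed across all years. The function name `monthly_z_score`, the `period` parameter, and the synthetic series are illustrative assumptions and are not taken from the data or code used in this work.

```python
import numpy as np


def monthly_z_score(series, period=12):
    """Standardize each calendar month separately (hypothetical helper).

    Assumes a 1-D series of monthly values whose length is a whole
    number of years, so that column j always holds the same month.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim != 1 or x.size % period != 0:
        raise ValueError("expected a 1-D series whose length is a multiple of period")
    by_month = x.reshape(-1, period)   # rows = years, columns = calendar months
    mean = by_month.mean(axis=0)       # long-term mean of each month
    std = by_month.std(axis=0)         # long-term spread of each month
    std[std == 0] = 1.0                # guard: a constant month maps to zeros
    return ((by_month - mean) / std).reshape(-1)


# Synthetic illustration only: ten years of a seasonal cycle plus noise.
rng = np.random.default_rng(0)
months = np.arange(120)
seasonal = 10.0 * np.sin(2 * np.pi * months / 12)
raw = seasonal + rng.normal(scale=1.0, size=months.size)
anomalies = monthly_z_score(raw)
```

After this transformation every calendar month has zero mean and unit variance, so what remains are departures from the typical value for that month, which is the kind of deviation from normal seasonal variation that the subsequent pattern analysis targets.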