# Data Mining for the Discovery of Ocean Climate Indices

Ocean climate indices (OCIs), which are time series that summarize the behavior of selected areas of the Earth’s oceans, are important tools for predicting the effect of the oceans on land climate. In this work we describe the use of data mining to discover OCIs. In particular, we apply a shared nearest neighbor (SNN) clustering algorithm to the pressure and temperature time series associated with points on the ocean, yielding clusters that represent ocean regions with relatively homogeneous behavior. The centroids of these clusters are time series that summarize the behavior of these ocean areas and thus represent potential OCIs. To evaluate cluster centroids as potential OCIs, we must determine which of them significantly influence the behavior of well-defined land areas. For this task, we use a variety of approaches that analyze the correlation between potential OCIs and the time series (e.g., of temperature or precipitation) that describe the behavior of land points. Based on these approaches, we have identified some cluster centroids that are almost identical to well-known OCIs, e.g., the Southern Oscillation Index (SOI) and the North Atlantic Oscillation (NAO). We also introduce two strategies for validating potential OCIs that do not correspond to well-known (and probably “stronger”) OCIs: focusing on the correlation between “extreme” events on the ocean and on land, and looking for more persistent patterns of correlation.
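The pipeline outlined above (group ocean points by shared nearest neighbors, take cluster centroids as candidate indices, then correlate each candidate with a land time series) can be illustrated with a minimal sketch. This is a simplified stand-in and not the SNN algorithm used in this work: it links two points whenever their k-nearest-neighbor lists overlap enough and takes connected components as clusters. The function names, the use of Pearson correlation as the similarity measure, and the parameters `k` and `min_shared` are all assumptions made for illustration.

```python
import numpy as np

def snn_similarity(series, k):
    """Count shared k-nearest neighbors for every pair of points.

    series: array of shape (n_points, n_times), one time series per row.
    Similarity between points is Pearson correlation (an assumption here).
    """
    corr = np.corrcoef(series)
    np.fill_diagonal(corr, -np.inf)            # a point is not its own neighbor
    nn = np.argsort(-corr, axis=1)[:, :k]      # k most correlated points per row
    n = series.shape[0]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], nn] = 1      # member[i, j] = 1 if j is in NN(i)
    return member @ member.T                   # entry (i, j) = |NN(i) & NN(j)|

def cluster_by_snn(snn, min_shared):
    """Simplified grouping: connected components of the graph that links
    points sharing at least min_shared neighbors (hypothetical threshold)."""
    n = snn.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(snn[i] >= min_shared):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def cluster_centroids(series, labels):
    """Mean time series of each cluster: the candidate indices."""
    return {int(c): series[labels == c].mean(axis=0) for c in np.unique(labels)}

def land_correlation(candidate, land_series):
    """Pearson correlation between a candidate index and one land time series."""
    return float(np.corrcoef(candidate, land_series)[0, 1])
```

Given an array `ocean` with one row per ocean point, `labels = cluster_by_snn(snn_similarity(ocean, k), min_shared)` assigns a cluster to each point, and `cluster_centroids(ocean, labels)` returns the candidate time series that `land_correlation` can then score against land data.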