PyEarth: A Python Introduction to Earth Science

Unsupervised Learning: Clustering

Major Rock Types on Earth

Earth's crust is composed of three main rock types:

  1. Igneous Rocks
  2. Sedimentary Rocks
  3. Metamorphic Rocks

Igneous Rocks

  • Formed from cooled and solidified magma or lava
  • Examples: Granite, Basalt, Obsidian

Sedimentary Rocks

  • Formed from the deposition and consolidation of sediments
  • Examples: Sandstone, Limestone, Shale

Metamorphic Rocks

  • Formed from pre-existing rocks under high heat and pressure
  • Examples: Marble, Gneiss, Slate

An Imaginary Scenario: You Are a Geologist

Imagine you're a geologist who has collected rock samples from around the world. You've analyzed their chemical composition, but now you face a challenge:

How do you organize these samples into different groups based on their chemical composition?

Exploring the Data

Let's look at our rock samples data:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display # pretty display of dataframes

# Load the data
data = pd.read_csv('data/rock_samples.csv')

# Display the first few rows
display(data.head())

# Randomly select two components to plot
components = np.random.choice(data.columns, 2, replace=False)

plt.figure(figsize=(10, 6))
plt.scatter(data[components[0]], data[components[1]])
plt.xlabel(components[0])
plt.ylabel(components[1])
plt.title(f'{components[0]} vs {components[1]}')
plt.show()
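
Before plotting, it can also help to check how many samples and chemical components we have, plus their summary statistics. A small optional addition, reusing the data and display objects from above:

# How many samples (rows) and chemical components (columns)?
print(data.shape)

# Summary statistics of each component
display(data.describe())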

The Need for Advanced Techniques

  • Simple 2D plots don't reveal clear groupings
  • We have multiple chemical components (high-dimensional data)
  • We need techniques to:
    1. Reduce the dimensionality of our data
    2. Find natural groupings in our data

Dimension Reduction

Why do we need dimension reduction?

  1. Visualization of high-dimensional data
  2. Noise reduction
  3. Feature extraction
  4. Computational efficiency

Common methods:

  • Principal Component Analysis (PCA)
  • t-SNE
  • UMAP

Principal Component Analysis (PCA)

PCA finds the directions (principal components) along which our data has the most variance.
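
To make this concrete, here is a minimal sketch on synthetic 2D data (not our rock samples): when two features are strongly correlated, almost all of the variance lies along a single direction, and PCA reports this through its explained_variance_ratio_ attribute.

from sklearn.decomposition import PCA

# Synthetic 2D data in which the two features are strongly correlated
rng = np.random.default_rng(0)
x = rng.normal(size=500)
toy = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

pca_toy = PCA(n_components=2).fit(toy)
print(pca_toy.components_)                # the two principal directions (unit vectors)
print(pca_toy.explained_variance_ratio_)  # roughly [0.99, 0.01]: most variance on PC1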

Applying PCA to Our Rock Samples

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Apply PCA
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_scaled)

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('PCA of Rock Samples')
plt.show()
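
A useful follow-up check is how much of the total variance the two components actually retain; if it is low, the 2D plot may hide structure. This reuses the pca object fitted above:

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)
print(f'Total variance retained by 2 components: {pca.explained_variance_ratio_.sum():.1%}')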

Interpreting PCA Results

  • The first principal component captures the most variance in the data
  • The second principal component captures the second most variance
  • Each data point is represented by its coordinates along these components

Can you see any groupings in the PCA plot? How many clusters do you think there are?

Clustering

Why do we need clustering?

  • Discover patterns in data
  • Identify natural groups
  • Simplify data representation

Common clustering methods:

  • K-Means
  • DBSCAN
  • Hierarchical Clustering

K-Means Clustering

K-Means divides data into K clusters, each represented by its centroid.

Applying K-Means to Rock Samples

from sklearn.cluster import KMeans

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Apply K-Means with 3 clusters (random_state fixed so results are reproducible)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(data_scaled)

# Plot the clusters in the PCA space computed on the earlier slide (data_pca)
plt.figure(figsize=(10, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1], c=kmeans_labels, cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('K-Means Clustering of Rock Samples')
plt.colorbar(label='Cluster')
plt.show()
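
How do we know that K = 3 is a sensible choice? One common heuristic is the "elbow" method: run K-Means for a range of K and plot the inertia (within-cluster sum of squared distances), then look for the point where it stops dropping sharply. A minimal sketch, reusing data_scaled from above:

# Run K-Means for K = 1..10 and record the inertia of each fit
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(data_scaled)
    inertias.append(km.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(list(k_values), inertias, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing K')
plt.show()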

DBSCAN Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds clusters based on the density of data points.

Applying DBSCAN to Rock Samples

from sklearn.cluster import DBSCAN

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Apply DBSCAN
dbscan = DBSCAN(eps=1.5, min_samples=50)
dbscan_labels = dbscan.fit_predict(data_scaled)

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1], c=dbscan_labels, cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('DBSCAN Clustering of Rock Samples')
plt.colorbar(label='Cluster')
plt.show()
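
Choosing eps is not obvious. One common heuristic (a sketch, reusing data_scaled from above) is a k-distance plot: sort each point's distance to its min_samples-th nearest neighbour and look for a "knee"; eps values near the knee are reasonable starting points.

from sklearn.neighbors import NearestNeighbors

# Distance from each point to its 50th nearest neighbour (matching min_samples=50;
# the query point itself counts as one neighbour, so this is approximate)
nn = NearestNeighbors(n_neighbors=50).fit(data_scaled)
distances, _ = nn.kneighbors(data_scaled)

plt.figure(figsize=(8, 5))
plt.plot(np.sort(distances[:, -1]))
plt.xlabel('Points sorted by distance')
plt.ylabel('Distance to 50th nearest neighbour')
plt.title('k-Distance Plot for Choosing eps')
plt.show()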

Hierarchical Clustering

Hierarchical clustering builds a tree of clusters.

Applying Hierarchical Clustering to Rock Samples

from sklearn.cluster import AgglomerativeClustering

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Apply Hierarchical Clustering
hierarchical = AgglomerativeClustering(n_clusters=3)
hierarchical_labels = hierarchical.fit_predict(data_scaled)

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1], c=hierarchical_labels, cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('Hierarchical Clustering of Rock Samples')
plt.colorbar(label='Cluster')
plt.show()
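
The "tree" that hierarchical clustering builds can be visualised as a dendrogram. Scikit-learn does not plot one directly, so this sketch uses SciPy's linkage and dendrogram functions with Ward linkage (the AgglomerativeClustering default), again on data_scaled:

from scipy.cluster.hierarchy import dendrogram, linkage

# Build the merge tree with Ward linkage
Z = linkage(data_scaled, method='ward')

plt.figure(figsize=(10, 5))
dendrogram(Z, truncate_mode='lastp', p=20)  # show only the last 20 merges
plt.xlabel('Cluster (number of samples in parentheses)')
plt.ylabel('Merge distance')
plt.title('Dendrogram of Rock Samples')
plt.show()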

Comparing with True Rock Types

# Load true rock types
rock_types = pd.read_csv('data/rock_types.csv')

# Plot the results
plt.figure(figsize=(10, 6))
for rock_type in rock_types['rock_type'].unique():
    data_pca_type = data_pca[rock_types['rock_type'] == rock_type]
    plt.scatter(data_pca_type[:, 0], data_pca_type[:, 1], label=rock_type)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('True Rock Types')
plt.legend()
plt.show()

Does your clustering method match the true rock types?
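
Visual comparison is a good start, but you can also quantify the agreement. One option (a sketch, assuming kmeans_labels, dbscan_labels and hierarchical_labels from the previous slides are still in memory) is the adjusted Rand index: 1 means a perfect match, values near 0 mean the clustering is no better than random.

from sklearn.metrics import adjusted_rand_score

true_labels = rock_types['rock_type']

# Agreement between each clustering result and the true rock types
print('K-Means:     ', adjusted_rand_score(true_labels, kmeans_labels))
print('DBSCAN:      ', adjusted_rand_score(true_labels, dbscan_labels))
print('Hierarchical:', adjusted_rand_score(true_labels, hierarchical_labels))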

Extra Exercises

  1. Try different numbers of clusters for K-Means and Hierarchical Clustering.
  2. Experiment with different eps and min_samples values for DBSCAN.
  3. Apply clustering before and after PCA. How does PCA affect clustering results?
  4. Apply PCA with different numbers of components and observe the effect on clustering.
  5. For each of the distributions in the "Different Clustering Methods on Various Distributions" comparison at the end, which clustering method do you think works best? Why?

Comparing Clustering Results With and Without PCA

Earlier we ran DBSCAN on the full standardized data; here we run it on just the two principal components instead. Note that eps and min_samples usually need re-tuning, because distances in the reduced 2D PCA space differ from those in the full feature space.

from sklearn.cluster import DBSCAN

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Apply PCA
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_scaled)

# Apply DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=20)
dbscan_labels = dbscan.fit_predict(data_pca)

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1], c=dbscan_labels, cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('DBSCAN Clustering of Rock Samples')
plt.colorbar(label='Cluster')
plt.show()

Conclusion

  • Clustering is a powerful tool for grouping similar data points
  • Different clustering methods work better for different data distributions
  • Dimension reduction techniques like PCA can help visualize and sometimes improve clustering results
  • Always consider the nature of your data when choosing a clustering method

Different Clustering Methods on Various Distributions
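
Exercise 5 refers to this comparison. A minimal sketch, assuming we generate three synthetic 2D distributions with scikit-learn (make_blobs, make_moons, make_circles) and run each clustering method on each; the parameters below are rough starting points, not tuned values:

from sklearn.datasets import make_blobs, make_moons, make_circles
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Three synthetic distributions with very different shapes
datasets = {
    'Blobs': make_blobs(n_samples=500, centers=2, random_state=42)[0],
    'Moons': make_moons(n_samples=500, noise=0.05, random_state=42)[0],
    'Circles': make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)[0],
}

# Three clustering methods with rough, untuned parameters
methods = {
    'K-Means': KMeans(n_clusters=2, random_state=42, n_init=10),
    'DBSCAN': DBSCAN(eps=0.3, min_samples=10),
    'Hierarchical': AgglomerativeClustering(n_clusters=2),
}

fig, axes = plt.subplots(len(datasets), len(methods), figsize=(12, 10))
for i, (data_name, X) in enumerate(datasets.items()):
    X = StandardScaler().fit_transform(X)
    for j, (method_name, model) in enumerate(methods.items()):
        labels = model.fit_predict(X)  # refits the model on each dataset
        axes[i, j].scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=10)
        axes[i, j].set_title(f'{method_name} on {data_name}')
plt.tight_layout()
plt.show()

In general, K-Means and Ward-linkage hierarchical clustering tend to do well on compact, blob-like groups, while density-based DBSCAN can follow the curved moons and nested circles.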