# Pearson's correlation coefficient

In [None]:
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "matplotlib",
#     "ndv[jupyter,vispy]",
#     "numpy",
#     "scikit-image",
#     "scipy",
#     "tifffile",
#     "imagecodecs",
#     "coloc_tools @ git+https://github.com/fdrgsp/coloc-tools.git"
# ]
# ///

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Description</mark>

In this section, we will explore how to implement **Pearson's Correlation Coefficient** in Python. It is a common method for quantifying colocalization based on pixel intensities.

The images we will use for this section can be downloaded from the <a href="../../_static/data/08_pixel_intensity_based_coloc.zip" download> <i class="fas fa-download"></i> Manders & Pearson's Colocalization Dataset</a>.

<p class="alert alert-warning">
    <strong>Note:</strong> This notebook aims to show how to practically implement these methods but does not aim to describe when to use which method. For this exercise, we will use a single 2-channel image without any preprocessing steps.
</p>

<p class="alert alert-info">
    <strong>Note:</strong> In this example, we will not perform any image processing steps before computing the Pearson's Correlation Coefficient. However, when conducting a real colocalization analysis you should consider applying some image processing steps to clean the images before computing the Pearson's Correlation Coefficient, such as background subtraction, flat-field correction, etc.
</p>

<p class="alert alert-info">
    <strong>Note:</strong> In this notebook we will only use a single image pair for demonstration purposes. Often, Pearson's coefficients should not be interpreted as absolute values in isolation. Instead, it's always recommended to consider them in the context of comparisons between different conditions, controls, treatments, or experimental groups. The relative changes and ratios between conditions are often more meaningful than the absolute coefficient values themselves.
</p>

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Pearson's Correlation Coefficients</mark>

The Pearson's correlation coefficient measures the **linear relationship** between pixel intensities in two fluorescence channels. 

<div align="center"> <img src="https://raw.githubusercontent.com/bobiac/bobiac-book/main/_static/images/coloc/pearsons_slide.png" alt="pearsons" width="800"></div>

<br>

It quantifies how well the intensity variations in one channel predict the intensity variations in another channel across all pixels in the image. 

The coefficient ranges from **-1 to +1**, where:
- **+1** indicates perfect positive correlation (when one channel's intensity increases, the other increases proportionally)
- **0** indicates no linear correlation
- **-1** indicates perfect negative correlation (when one channel's intensity increases, the other decreases proportionally)
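
To make these three cases concrete, here is a minimal sketch using synthetic intensity vectors (made-up data, not the tutorial image). The variable names `a`, `r_pos`, `r_neg`, and `r_none` are chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

# Synthetic "channel 1" intensities (hypothetical data, not from the dataset).
a = rng.uniform(0, 255, size=10_000)

# Perfect positive relationship: the second vector rises linearly with a.
r_pos = np.corrcoef(a, 2.0 * a + 10.0)[0, 1]

# Perfect negative relationship: the second vector falls linearly as a rises.
r_neg = np.corrcoef(a, -0.5 * a + 300.0)[0, 1]

# Independent noise: no linear relationship, so r is close to (not exactly) 0.
r_none = np.corrcoef(a, rng.uniform(0, 255, size=a.size))[0, 1]

print(f"positive: {r_pos:.3f}, negative: {r_neg:.3f}, independent: {r_none:.3f}")
```

Note that the slope and offset of the linear relationship do not change the coefficient; only its sign and how tightly the points follow a line matter.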

<div align="center"> <img src="https://raw.githubusercontent.com/bobiac/bobiac-book/main/_static/images/coloc/pearsons_graphs.png" alt="pearsons_graphs" width="770"></div>

<br>

Pearson's correlation considers the **entire intensity range** and evaluates how intensities **co-vary** across the image. This makes it particularly useful for detecting cases where two proteins show coordinated expression levels, even if they don't necessarily occupy the exact same pixels.

### <mark style="color: black; background-color: rgb(190,223,185); padding: 3px; border-radius: 5px;">Load and Visualize the Image</mark>

Open and visualize (with ndv) the image named `t_cell.tif` from the <a href="../../_static/data/08_pixel_intensity_based_coloc.zip" download><i class="fas fa-download"></i> Manders & Pearson's Colocalization Dataset</a>. This is a two-channel image of HEK293 cells where two distinct fluorescent proteins have been labeled with different fluorescent markers.

To compute Pearson's Correlation Coefficients, we need **two separate images** (channels).

What is the image shape? How do we split the channels?
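
One possible way to approach this is sketched below. To keep the example runnable without the dataset, a random array stands in for the loaded image; in the notebook you would instead read the file with something like `tifffile.imread("t_cell.tif")`. The `(2, 64, 64)` channel-first shape is an assumption made for this sketch, so check the shape of the real image before indexing.

```python
import numpy as np

# Stand-in for `img = tifffile.imread("t_cell.tif")`.
# The channel-first (2, 64, 64) shape is an assumption for this sketch.
img = np.random.default_rng(0).integers(0, 4096, size=(2, 64, 64), dtype=np.uint16)

# Inspect the shape to find out which axis holds the channels.
print(img.shape)

# With the channel axis first, indexing along axis 0 splits the channels.
ch1 = img[0]
ch2 = img[1]

print(ch1.shape, ch2.shape)
```

If the channel axis turns out to be last, `img[..., 0]` and `img[..., 1]` would do the same job. For viewing, passing the array to `ndv.imshow(img)` should be enough in a Jupyter session.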

### <mark style="color: black; background-color: rgb(190,223,185); padding: 3px; border-radius: 5px;">Scatter Plot</mark>

It is often useful to visualize the relationship between the two channels using a scatter plot. This can help us understand the distribution of pixel intensities.
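
A minimal sketch of such a plot with `matplotlib` follows. The helper name `plot_intensity_scatter` and its `max_points` subsampling are illustrative choices, not part of any library, and the channels are synthetic placeholders for the real `ch1` and `ch2`.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_intensity_scatter(ch1, ch2, max_points=50_000, seed=0):
    """Scatter ch1 vs ch2 pixel intensities, subsampling large images."""
    x = np.asarray(ch1).ravel()
    y = np.asarray(ch2).ravel()
    if x.size > max_points:
        # Plot a random subset of pixel pairs so the figure stays responsive.
        idx = np.random.default_rng(seed).choice(x.size, size=max_points, replace=False)
        x, y = x[idx], y[idx]
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(x, y, s=1, alpha=0.2)
    ax.set_xlabel("Channel 1 intensity")
    ax.set_ylabel("Channel 2 intensity")
    ax.set_title("Pixel intensity scatter plot")
    return fig, ax


# Synthetic, partially correlated channels as placeholders for the real data.
rng = np.random.default_rng(1)
ch1 = rng.normal(100, 20, size=(128, 128))
ch2 = 0.8 * ch1 + rng.normal(0, 10, size=(128, 128))
fig, ax = plot_intensity_scatter(ch1, ch2)
```

Each point is one pixel: its x position is the intensity in channel 1 and its y position the intensity in channel 2. A tight diagonal cloud suggests a strong linear relationship.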

### <mark style="color: black; background-color: rgb(190,223,185); padding: 3px; border-radius: 5px;">Calculate Pearson's Correlation Coefficients</mark>

<div align="left"> <img src="https://raw.githubusercontent.com/bobiac/bobiac-book/main/_static/images/coloc/pearsons_eq.png" alt="pearsons_eq" width="400"></div>

There are several libraries in Python that already implement Pearson's Correlation Coefficient. Two examples are `scipy.stats.pearsonr` and `numpy.corrcoef`.

In [None]:
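import numpy as np
from scipy.stats import pearsonr

# Calculate Pearson's correlation coefficient using scipy.
# Both functions expect 1D inputs, so each channel is flattened with ravel().
pearson, p_value = pearsonr(ch1.ravel(), ch2.ravel())
print(f"Pearson's (scipy): {pearson:.2f}, p-value: {p_value:.4f}")

# Calculate Pearson's correlation coefficient using numpy.
# corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
pearson_numpy = np.corrcoef(ch1.ravel(), ch2.ravel())[0, 1]
print(f"Pearson's (numpy): {pearson_numpy:.2f}")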

### <mark style="color: black; background-color: rgb(190,223,185); padding: 3px; border-radius: 5px;">Costes Pixel Randomization Test</mark>

The **Costes pixel randomization test** is a statistical method used to validate the significance of colocalization results, particularly for Pearson's correlation coefficients. This method involves **randomly shuffling the pixel intensities of one channel and recalculating the Pearson's correlation coefficient** to create a distribution of values under the null hypothesis of no colocalization.

The [costes_pixel_randomization](https://github.com/fdrgsp/coloc-tools/blob/fee98bb72ccdbffabdc0d4875a9d4fccd43cc8ab/src/coloc_tools/_costes_pixel_randomization.py#L7) function from `coloc-tools` provides an implementation of this method in Python. This function returns the observed Pearson's correlation coefficient, a list of randomized correlation coefficients, and the p-value indicating the significance of the observed correlation.

A low `p-value` (e.g. 0.0001) means that none of the `n` randomizations (by default 500) produced a correlation coefficient as high as the observed one, indicating that the observed colocalization is statistically significant: the probability of getting the observed colocalization by random chance is < 0.0001 (less than 0.01%).

Let's run it on the two channels we have been working with.

We can now run the Costes pixel randomization test and print the Pearson's correlation coefficient, the p-value, and the first 5 randomized correlation coefficients.
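
To illustrate the idea behind the test, here is a simplified per-pixel shuffling sketch written with `numpy` only. It is not the `coloc-tools` implementation: the function name `pixel_shuffle_test`, its parameters, and the p-value convention are assumptions made for this example, and the channels are synthetic.

```python
import numpy as np


def pixel_shuffle_test(ch1, ch2, n_iter=200, seed=0):
    """Pearson's r plus a null distribution from shuffling ch2's pixels."""
    rng = np.random.default_rng(seed)
    a = np.asarray(ch1, dtype=float).ravel()
    b = np.asarray(ch2, dtype=float).ravel()
    observed = np.corrcoef(a, b)[0, 1]
    # Each iteration breaks the pixel pairing by permuting one channel.
    randomized = np.array(
        [np.corrcoef(a, rng.permutation(b))[0, 1] for _ in range(n_iter)]
    )
    # Fraction of shuffles at least as correlated as the observed pairing.
    # The +1 terms keep the estimate from ever being exactly zero.
    p_value = (np.sum(randomized >= observed) + 1) / (n_iter + 1)
    return observed, randomized, p_value


# Synthetic, partially correlated channels as placeholders for the real data.
rng = np.random.default_rng(1)
ch1 = rng.normal(100, 20, size=(64, 64))
ch2 = 0.8 * ch1 + rng.normal(0, 10, size=(64, 64))

observed, randomized, p_value = pixel_shuffle_test(ch1, ch2)
print(f"Pearson's: {observed:.2f}, p-value: {p_value:.4f}")
print("First 5 randomized:", np.round(randomized[:5], 3))
```

With this convention the smallest reportable p-value is `1 / (n_iter + 1)`, so more iterations are needed to resolve smaller p-values.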

**Bonus:** We can also visualize the distribution of the randomized Pearson's correlation coefficients to better understand the significance of our observed correlation.
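
A minimal sketch of such a plot is shown below. It builds its own null distribution by shuffling one synthetic channel, so it runs on its own; in the notebook you would plot the randomized coefficients returned by the test instead. The names `a`, `b`, `observed`, and `randomized` are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Placeholder flattened channels; in the notebook use the real ch1 and ch2.
a = rng.normal(100, 20, size=4096)
b = 0.8 * a + rng.normal(0, 10, size=a.size)

observed = np.corrcoef(a, b)[0, 1]
# Null distribution: Pearson's r after shuffling one channel's pixels.
randomized = np.array(
    [np.corrcoef(a, rng.permutation(b))[0, 1] for _ in range(200)]
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(randomized, bins=30, color="gray", label="randomized")
# Mark the observed coefficient so it can be compared with the null values.
ax.axvline(observed, color="red", linestyle="--", label=f"observed ({observed:.2f})")
ax.set_xlabel("Pearson's correlation coefficient")
ax.set_ylabel("Count")
ax.legend()
```

If the observed value sits far to the right of the histogram, the randomized pairings rarely or never reach it, which is what a low p-value expresses.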

### <mark style="color: black; background-color: rgb(190,223,185); padding: 3px; border-radius: 5px;">Summary</mark>

The Python implementation for calculating Pearson's Correlation Coefficient is straightforward and concise, as demonstrated in the code below.

```python
mean_ch1 = np.mean(ch1)
mean_ch2 = np.mean(ch2)
numerator = np.sum((ch1 - mean_ch1) * (ch2 - mean_ch2))
denominator = np.sqrt(np.sum((ch1 - mean_ch1) ** 2) * np.sum((ch2 - mean_ch2) ** 2))
pearson_coefficient = numerator / denominator
```

And it is even easier using already available libraries like `scipy` or `numpy`:

```python
from scipy.stats import pearsonr
pearson_coefficient, p_value = pearsonr(ch1.ravel(), ch2.ravel())
```
```python
import numpy as np
pearson_coefficient = np.corrcoef(ch1.ravel(), ch2.ravel())[0, 1]
```
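
As a quick sanity check, the sketch below computes the coefficient all three ways on synthetic channels (placeholders for the real `ch1` and `ch2`) and confirms that the results agree to floating-point precision.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic, partially correlated channels (hypothetical data).
rng = np.random.default_rng(0)
ch1 = rng.normal(100, 20, size=(64, 64))
ch2 = 0.5 * ch1 + rng.normal(0, 15, size=(64, 64))

# Manual formula: covariance term divided by the product of the spreads.
d1 = ch1 - ch1.mean()
d2 = ch2 - ch2.mean()
r_manual = np.sum(d1 * d2) / np.sqrt(np.sum(d1**2) * np.sum(d2**2))

# Library versions operate on flattened (1D) arrays.
r_scipy, _ = pearsonr(ch1.ravel(), ch2.ravel())
r_numpy = np.corrcoef(ch1.ravel(), ch2.ravel())[0, 1]

print(np.allclose([r_scipy, r_numpy], r_manual))
```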

**Important Note on Spatial Information:**

It's crucial to understand that **Pearson's correlation works on vectors/pairs of values and does not use spatial information at all**. The correlation is calculated based purely on the intensity values and their relationships, regardless of where those pixels are located in the image. This means that if you randomly shuffle the pixel positions in both channels in exactly the same way, you would get identical Pearson's correlation results. This characteristic makes Pearson's correlation fundamentally different from spatial colocalization measures and emphasizes that it's measuring intensity co-variation rather than spatial co-occurrence.
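
This property can be checked directly. The sketch below uses synthetic channels (not the tutorial image) and compares three cases: the original pairing, the same permutation applied to both channels, and a permutation applied to only one channel.

```python
import numpy as np

# Synthetic, partially correlated channels (hypothetical data).
rng = np.random.default_rng(0)
ch1 = rng.normal(100, 20, size=(64, 64))
ch2 = 0.5 * ch1 + rng.normal(0, 15, size=(64, 64))

r_original = np.corrcoef(ch1.ravel(), ch2.ravel())[0, 1]

# One permutation of pixel positions, applied identically to both channels.
perm = rng.permutation(ch1.size)
r_same_shuffle = np.corrcoef(ch1.ravel()[perm], ch2.ravel()[perm])[0, 1]

# Shuffling only one channel breaks the pixel pairing.
r_one_shuffled = np.corrcoef(ch1.ravel(), ch2.ravel()[perm])[0, 1]

print(f"original:           {r_original:.3f}")
print(f"both shuffled same: {r_same_shuffle:.3f}")
print(f"one shuffled:       {r_one_shuffled:.3f}")
```

The first two values match because the pixel pairs are preserved; only their order changes. The third drops toward zero because the pairing itself is destroyed.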

**Key Considerations for Pearson's Correlation Analysis:**

1. **Region of Interest (ROI) Analysis**: Instead of running Pearson's correlation on the entire image, it is often beneficial to perform **segmentation of the structures you care about** first. Ideally, use a **third independent channel** (such as a nuclear stain or cell membrane marker) to define regions of interest. This approach:
   - Reduces background noise interference
   - Focuses analysis on biologically relevant areas
   - Improves the biological interpretation of results
   - Eliminates correlation artifacts from empty regions

2. **Background Considerations**: Pearson's correlation can be heavily influenced by background pixels. Consider applying background subtraction or flat-field correction before analysis, or use segmentation masks to exclude background regions.

3. **Statistical Validation**: Always validate your results using statistical tests such as the Costes pixel randomization test demonstrated above. This helps assess whether observed correlations are statistically significant or could have occurred by chance. Rotating one channel by 90 or 180 degrees and recomputing Pearson's correlation can also help validate the significance of the observed correlation.

   - **Costes Test**: This test generates a distribution of Pearson's coefficients from randomized pixel intensities, allowing you to calculate a p-value for your observed correlation. A low p-value indicates that the observed correlation is unlikely to have occurred by chance.

4. **Comparative Analysis**: Pearson's correlation values should not be interpreted as absolute measures in isolation. Instead, consider them in the context of:
   - Comparisons between different experimental conditions
   - Control vs. treatment groups
   - Different time points or developmental stages

   Relative changes between conditions are often more meaningful than absolute values.

5. **Limitations**: Remember that Pearson's correlation measures **linear relationships** between pixel intensities. It may not capture more complex colocalization patterns and can be sensitive to outliers and intensity variations.
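
The effect of restricting the analysis to a region of interest (points 1 and 2 above) can be sketched as follows. The helper `masked_pearson`, the square mask, and the synthetic channels are all hypothetical; in practice the mask would come from segmenting a real image, ideally from an independent channel.

```python
import numpy as np


def masked_pearson(ch1, ch2, mask):
    """Pearson's r using only the pixels where mask is True."""
    mask = np.asarray(mask, dtype=bool)
    return np.corrcoef(ch1[mask], ch2[mask])[0, 1]


rng = np.random.default_rng(0)
shape = (128, 128)

# Hypothetical ROI: a centered square standing in for a segmentation mask.
mask = np.zeros(shape, dtype=bool)
mask[32:96, 32:96] = True

# Both channels are bright inside the ROI and dim outside, but the
# intensities inside the ROI are generated independently of each other.
ch1 = rng.normal(10, 2, size=shape) + mask * rng.normal(100, 20, size=shape)
ch2 = rng.normal(10, 2, size=shape) + mask * rng.normal(100, 20, size=shape)

r_full = np.corrcoef(ch1.ravel(), ch2.ravel())[0, 1]
r_roi = masked_pearson(ch1, ch2, mask)

print(f"whole image: {r_full:.2f}, ROI only: {r_roi:.2f}")
```

In this synthetic case the whole-image coefficient is high only because both channels share the same bright region and dark background, while the ROI-only coefficient stays near zero, reflecting that the signals inside the structure are unrelated.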

By combining proper image preprocessing, ROI-based analysis, and statistical validation, Pearson's correlation coefficient becomes a powerful tool for quantitative colocalization analysis in fluorescence microscopy.