# Retraining Cellpose on Custom Data

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Overview</mark>

In this section, we'll walk through how to **retrain Cellpose on your own data**. This is useful when the default models don't perform well on your specific cell type, staining method, or imaging modality.

Retraining allows Cellpose to learn directly from your examples, leading to better segmentation accuracy and more relevant masks for your experiments.

We'll cover:
- Preparing your training data (images + label masks)
- Mounting your Google Drive to access files
- Setting training parameters
- Running the training process
- Evaluating the new model on test images

> 💡 You'll need pairs of raw microscopy images and their corresponding label masks. If you haven't labeled your images yet, we recommend using the [Cellpose GUI](https://cellpose.readthedocs.io/en/latest/gui.html#training-your-own-cellpose-model) to draw or edit masks manually before starting.

The dataset we'll use here can be downloaded below. It includes both training and test images:
<a href="../../../_static/data/05_segmentation_cellpose_training.zip" download>
<i class="fas fa-download"></i> Cellpose Training Dataset</a>

<p class="alert alert-warning">
    <strong>⚠️ Note:</strong> This notebook is designed to run in <a href="https://colab.research.google.com/github/bobiac/bobiac-book/blob/gh-pages/colab_notebooks/05_segmentation/deep_learning/cellpose_retraining_notebook.ipynb" target="_blank"> Google Colab</a>. If you want to run it locally, you may need to adjust some paths and install the required packages.
</p>

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Make sure you have GPU access</mark>

To enable the GPU:

1. Navigate to `Runtime -> Change Runtime Type`.
2. Select `Python 3` as the `Runtime Type`.
3. Select an available GPU (e.g. `T4 GPU`) as the `Hardware accelerator`.

<br>

<div align="left"> <img src="https://raw.githubusercontent.com/bobiac/bobiac-book/main/_static/images/cellpose/colab_runtime.png" alt="Colab runtime type settings" width="400"></div>
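Once the runtime has been switched, it can help to confirm that a GPU is actually attached before going further. The sketch below assumes an NVIDIA GPU and relies on the `nvidia-smi` command-line tool, which Colab GPU runtimes normally provide. `gpu_visible` is a hypothetical helper name, not part of Colab or Cellpose.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not found: most likely a CPU-only runtime.
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, starting with "GPU <index>:".
        result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` after you selected a GPU, the runtime may not have restarted yet, or no GPU was available when the session started.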


## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Mount your Google Drive</mark>

To access the data for the course you first need to mount your Google Drive.

Run the cell below to connect your Google Drive to Colab, then follow the instructions to authenticate your Google account.

You will need to allow access to your Google Drive so that the notebook can read and write files.

```python
from google.colab import drive

drive.mount("/content/drive")
```


Then click the folder icon in the left sidebar and press the refresh button. Your Google Drive folder should now be listed there (e.g. `MyDrive`).

<div align="left"> <img src="https://raw.githubusercontent.com/bobiac/bobiac-book/main/_static/images/cellpose/colab_folder.png" alt="Colab file browser" width="300"></div>

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Download the Data</mark>

Run the cell below to download the data for this exercise. A new folder called `bobiac_data_cellpose` will be created under `/content` in the Colab runtime, and the archive will be unzipped into it.

```python
# Create directory
!mkdir -p /content/bobiac_data_cellpose
# Download the data
!wget https://raw.githubusercontent.com/bobiac/bobiac-book/main/_static/data/05_segmentation_cellpose_training.zip -O /content/bobiac_data_cellpose/05_segmentation_cellpose_training.zip
# Unzip the data, remove zip file and macOS metadata files (if any)
!cd /content/bobiac_data_cellpose && unzip 05_segmentation_cellpose_training.zip && rm -f 05_segmentation_cellpose_training.zip && rm -rf __MACOSX
```
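After unzipping, it is worth checking what actually landed on disk, since the later steps depend on the folder layout. The following sketch counts files per subfolder and extension. `summarize_folder` is a hypothetical helper, and the path is the download location used in the cell above.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(root):
    """Count files under `root`, keyed by (relative parent folder, extension)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            key = (path.parent.relative_to(root).as_posix(), path.suffix.lower())
            counts[key] += 1
    return counts


data_dir = Path("/content/bobiac_data_cellpose")
if data_dir.exists():
    for (folder, ext), n in sorted(summarize_folder(data_dir).items()):
        print(f"{folder:<40} {ext or '(none)':<8} {n}")
else:
    print(f"{data_dir} not found; run the download cell first.")
```

The printed table should make it easy to see which subfolders hold the training and test images and whether each folder contains roughly as many mask files as images.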

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Install Cellpose</mark>


```python
# !pip install cellpose
```
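The install line above is commented out, so whether Cellpose is available depends on the runtime. Because the Cellpose API differs between major versions (the `cpsam` model mentioned later belongs to the 4.x releases, as far as the Cellpose documentation indicates), a quick version check can save confusion. The sketch below uses only the standard-library `importlib.metadata` module; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("cellpose", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `cellpose` is reported as not installed, uncomment the install line and run that cell first.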

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Import Libraries</mark>

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Setup</mark>

## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Data Handling</mark>

For training, Cellpose expects:
- A folder of raw images (e.g., TIFF or PNG)
- A matching folder of masks, where each mask corresponds to an image and contains labeled regions

You'll also need to **split your data** into a training set and a test set. This allows the model to learn from one portion of the data, and then be evaluated on a separate portion it hasn't seen before.
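One way to make the split reproducible is to shuffle the image names with a fixed seed and cut the list at a chosen fraction. This is a minimal sketch of the idea rather than a Cellpose feature; `split_train_test`, the 20% test fraction, the seed, and the example file names are illustrative choices.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle `items` reproducibly and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one item for testing.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


names = [f"img{i:03d}.png" for i in range(1, 11)]
train_names, test_names = split_train_test(names)
print(len(train_names), "training images,", len(test_names), "test images")
```

Splitting by image name (rather than by individual cell) keeps every cell from a given image on the same side of the split, so the test set stays genuinely unseen.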

> ✅ Each mask must share its image's base filename, with a mask suffix appended (e.g., `img001.png` and `img001_masks.png`), so Cellpose can pair them correctly.

During training, Cellpose will:
- Load batches of training images
- Compare its predictions to the ground-truth masks
- Adjust itself (via backpropagation) to reduce errors over time

Keep your training and test folders organized and double-check for any mismatches.
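A quick way to double-check for mismatches is to pair files by name before training. The sketch below assumes the `_masks` suffix from the example above and accepts either a single folder or separate image and mask folders. `find_pairs` is a hypothetical helper, and the folder path in the usage lines is a placeholder to adapt to your data.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".tif", ".tiff", ".png"}


def find_pairs(image_dir, mask_dir=None, mask_suffix="_masks"):
    """Pair each image with `<stem><mask_suffix><ext>`.

    Returns (pairs, missing), where `missing` lists images without a mask.
    """
    image_dir = Path(image_dir)
    mask_dir = Path(mask_dir) if mask_dir is not None else image_dir
    pairs, missing = [], []
    for image in sorted(image_dir.iterdir()):
        is_image = image.suffix.lower() in IMAGE_EXTENSIONS
        if not is_image or image.stem.endswith(mask_suffix):
            continue  # skip non-image files and the mask files themselves
        mask = mask_dir / f"{image.stem}{mask_suffix}{image.suffix}"
        if mask.exists():
            pairs.append((image, mask))
        else:
            missing.append(image)
    return pairs, missing


train_dir = Path("/content/bobiac_data_cellpose/train")  # placeholder path
if train_dir.is_dir():
    pairs, missing = find_pairs(train_dir)
    print(f"{len(pairs)} image/mask pairs, {len(missing)} images without a mask")
```

Any file listed in `missing` would be left without a ground-truth mask, so it is worth fixing or removing it before training.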

### <mark style="color: black; background-color: rgb(190,223,185); padding: 3px; border-radius: 5px;">Init the Model</mark>

Before we can train a new model, we need to initialize Cellpose with the correct settings.

Here, we'll:
- Specify the **model type** to use as a base model (e.g., `cpsam` (the default), `cyto`, or `nuclei`)
- Set the **channels** depending on how your images are structured (e.g., single-channel grayscale, or dual-channel with nuclei and cytoplasm)
- Choose where to **save the model weights** during training

> 💡 Even when training a new model, Cellpose builds on a pre-trained backbone (unless you explicitly start from scratch). This helps it learn faster and perform better, especially on small datasets.
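In code, initialization might look like the sketch below. It assumes Cellpose 4.x, where `models.CellposeModel` accepts a `pretrained_model` name such as `"cpsam"`; older releases select built-in models differently, so check the installed version. `init_model` is a hypothetical wrapper introduced here for illustration, and Cellpose is imported inside it so the helper can be defined before the package is installed.

```python
def init_model(pretrained_model="cpsam", use_gpu=True):
    """Create a Cellpose model initialized from pretrained weights.

    Assumes Cellpose 4.x. `pretrained_model` can be a built-in model name
    or a path to previously saved weights.
    """
    from cellpose import models  # lazy import: requires `pip install cellpose`

    return models.CellposeModel(gpu=use_gpu, pretrained_model=pretrained_model)
```

In the notebook this would be called once, for example `model = init_model()`, and the returned object reused for training and evaluation.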


## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Train New Model</mark>

Now we're ready to train! In this step, we'll tell Cellpose to:
- Use the training images and masks
- Save the trained model to your specified directory
- Run for a defined number of **epochs** (full passes over the training dataset)

You can also set other options like:
- Learning rate
- Batch size
- Whether to use GPU

> 💡 Training time will vary depending on your dataset size and hardware. On Google Colab with a GPU, small datasets may train in just a few minutes.

After training, the model weights will be saved and ready to use for predictions. We'll evaluate performance on the test data in the next step.
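The training step could be sketched as follows, using `io.load_train_test_data` and `train.train_seg` from Cellpose. This assumes a recent Cellpose release and a model object exposing a `.net` attribute. `train_model` is a hypothetical wrapper, the model name is a placeholder, and the hyperparameter defaults mirror the example in the Cellpose documentation rather than values tuned for this dataset.

```python
def train_model(model, train_dir, test_dir, save_dir, model_name="custom_cellpose",
                n_epochs=100, learning_rate=1e-5, weight_decay=0.1, batch_size=1):
    """Fine-tune `model` on the image/mask pairs found in `train_dir`.

    The defaults are starting points, not tuned values.
    """
    from cellpose import io, train  # lazy import: requires Cellpose

    # Files whose names end in "_masks" are used as labels for the matching image.
    data = io.load_train_test_data(str(train_dir), str(test_dir), mask_filter="_masks")
    images, labels, _, test_images, test_labels, _ = data

    # The result is returned unchanged; recent versions give
    # (model_path, train_losses, test_losses).
    return train.train_seg(
        model.net,
        train_data=images, train_labels=labels,
        test_data=test_images, test_labels=test_labels,
        n_epochs=n_epochs, learning_rate=learning_rate,
        weight_decay=weight_decay, batch_size=batch_size,
        save_path=str(save_dir), model_name=model_name,
    )
```

GPU use is decided when the model is created, not in this call. If memory runs out on Colab, lowering `batch_size` is usually the first thing to try.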


## <mark style="color: black; background-color: rgb(127,196,125); padding: 3px; border-radius: 5px;">Evaluate on test data</mark>
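A hedged sketch of this step: run the retrained model on the held-out images and compare the predicted masks with the ground-truth masks. It assumes `model.eval` returns the predicted masks as its first output and uses `metrics.average_precision` from Cellpose. `evaluate_model` and the 0.5 IoU threshold are illustrative choices, and `test_images` and `test_labels` are assumed to be lists of arrays such as those returned by `io.load_train_test_data`.

```python
def evaluate_model(model, test_images, test_labels, iou_threshold=0.5):
    """Return the mean average precision at `iou_threshold` over the test images."""
    from cellpose import metrics  # lazy import: requires Cellpose

    # The first element returned by `eval` holds the predicted masks.
    predicted_masks = model.eval(test_images)[0]
    # `average_precision` returns (ap, tp, fp, fn); `ap` has one row per image.
    ap = metrics.average_precision(test_labels, predicted_masks,
                                   threshold=[iou_threshold])[0]
    return float(ap.mean())
```

Running the same function on the base model and on the retrained model gives a simple before/after comparison on images neither model saw during training.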