wsipipe.preprocess.patching package

Patches are generated according to the settings of a patch finder and are then stored as patchsets.

patch_finder module

Patch Finders describe how patches are created for a slide.

They work on a labelled image, that is a numpy array with integers giving the annotation category for each pixel.

The input labelled image can be at any level of the pyramid, provided a numpy array of that size fits into memory.

A patch finder creates a dataframe with columns x, y and label, where x and y represent the top left corner of the patch and label is the label applied to the patch.
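The output format described above can be illustrated with a minimal sketch of grid-style patch finding. This is not wsipipe's implementation: `grid_patches` is a hypothetical helper, plain Python rows stand in for the dataframe, label 0 is assumed to mean background, and pyramid levels, border and jitter are ignored.

```python
from typing import Dict, List


def grid_patches(labels: List[List[int]], patch_size: int, stride: int,
                 remove_background: bool = True) -> List[Dict[str, int]]:
    """Tile a labelled image and return one row (x, y, label) per patch."""
    height = len(labels)
    width = len(labels[0]) if height else 0
    rows = []
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            # Collect every label value that falls inside this patch.
            window = [labels[yy][xx]
                      for yy in range(y, y + patch_size)
                      for xx in range(x, x + patch_size)]
            label = max(window)  # 'max' pooling: the highest label wins
            if remove_background and label == 0:
                continue  # assumption: 0 marks background
            rows.append({"x": x, "y": y, "label": label})
    return rows


labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
patches = grid_patches(labels, patch_size=2, stride=2)
```

Here x and y are the top left corner of each patch in the coordinates of the labelled image; the two all-background tiles are dropped.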

class GridPatchFinder(labels_level, patch_level, patch_size, stride, border=0, jitter=0, remove_background=True, pool_mode='max')[source]

Bases: PatchFinder

Parameters
  • labels_level (int) –

  • patch_level (int) –

  • patch_size (int) –

  • stride (int) –

  • border (int) –

  • jitter (int) –

  • remove_background (bool) –

  • pool_mode (str) –

labels_level()[source]

class PatchFinder[source]

Bases: object

Generic patch finder class

Parameters
  • labels_image (np.array) – The whole slide image represented as a 2D numpy array in which the classification of each pixel is given by an integer, for example an image such as those output by AnnotationSet.render

  • slide_shape (Size) – The size of the WSI at the level at which the labels are rendered. This may be different to the labels image shape, as the labels image may not include blank parts of the slide in the bottom right.

abstract property labels_level

class RandomPatchFinder(labels_level, patch_level, patch_size, border=0, npatches=1000, pool_mode='mode')[source]

Bases: PatchFinder

Parameters
  • labels_level (int) –

  • patch_level (int) –

  • patch_size (int) –

  • border (int) –

  • npatches (int) –

  • pool_mode (str) –

labels_level()[source]
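As a rough illustration of random patch finding with 'mode' pooling, the sketch below samples top left corners uniformly and labels each patch with its most common pixel value. This is an assumption about the general technique, not wsipipe's code: `random_patches` and its `seed` argument are hypothetical, and levels and border are ignored.

```python
import random
from collections import Counter
from typing import Dict, List


def random_patches(labels: List[List[int]], patch_size: int,
                   npatches: int, seed: int = 0) -> List[Dict[str, int]]:
    """Sample npatches random patch positions from a labelled image."""
    rng = random.Random(seed)
    height, width = len(labels), len(labels[0])
    rows = []
    for _ in range(npatches):
        # randint is inclusive at both ends, so patches stay inside the image.
        x = rng.randint(0, width - patch_size)
        y = rng.randint(0, height - patch_size)
        window = [labels[yy][xx]
                  for yy in range(y, y + patch_size)
                  for xx in range(x, x + patch_size)]
        # 'mode' pooling: the most frequent label inside the patch.
        label = Counter(window).most_common(1)[0][0]
        rows.append({"x": x, "y": y, "label": label})
    return rows


label_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
sampled = random_patches(label_image, patch_size=2, npatches=5)
```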

patchset module

PatchSets are sets of patches and all the information required to create them from the slides.

Many patches in the set may share the same details (which we call PatchSettings):
  • the path of the slide to read from

  • the level of the slide at which to create the patch

  • the size of the patch to be created

  • how to load the slide

To create an individual patch, you need to know:
  • the top left position of the patch

  • the label to be applied to the patch

Therefore the PatchSets are a dataframe and a settings list.

The settings list is a list of PatchSettings each of which contains:

slide_path, level, patch_size, loader

In the dataframe each row represents a patch and contains columns:

x, y (the top left corner of the patch), label, settings (index into the settings list)
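The split between per-patch rows and shared settings can be sketched as follows. The names `SettingSketch` and `resolve` are hypothetical, the loader is represented by a plain string, and a list of dicts stands in for the dataframe; the real PatchSet and PatchSetting classes are documented below.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class SettingSketch:
    """Stand-in for the details shared by many patches."""
    slide_path: Path
    level: int
    patch_size: int
    loader: str  # assumption: a name standing in for a Loader object


settings = [SettingSketch(Path("slides/a.tif"), 0, 256, "openslide")]

# One row per patch; 'settings' is an index into the settings list.
rows = [
    {"x": 0, "y": 0, "label": 1, "settings": 0},
    {"x": 256, "y": 0, "label": 2, "settings": 0},
]


def resolve(row: Dict[str, int], settings: List[SettingSketch]):
    """Gather everything needed to create one patch."""
    s = settings[row["settings"]]
    return (s.slide_path, s.level, row["x"], row["y"], s.patch_size, row["label"])


second_patch = resolve(rows[1], settings)
```

Storing the shared details once keeps each row small, which matters when a slide yields many thousands of patches.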

class PatchSet(df, settings)[source]

Bases: object

Parameters
  • df (pandas.DataFrame) –

  • settings (List[PatchSetting]) –

description()[source]

Returns basic summary of patchset

Returns the labels and the total number of patches for each label.

export_patches(output_dir)[source]

Creates all patches in a patch set

Writes patches into subdirectories named after their label. Patches are named slide_path_x_y_level_patch_size.png

Parameters

output_dir (Path) – the directory in which the patches are saved

Return type

None
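The naming scheme used by export_patches can be sketched as below. `patch_filename` is a hypothetical helper, and using the stem of the slide path for the slide_path part of the name is an assumption; the library may encode the path differently.

```python
from pathlib import Path


def patch_filename(output_dir, slide_path, x, y, level, patch_size, label) -> Path:
    """Build <output_dir>/<label>/<slide>_<x>_<y>_<level>_<patch_size>.png."""
    # Assumption: the slide is identified by its file stem.
    name = f"{Path(slide_path).stem}_{x}_{y}_{level}_{patch_size}.png"
    return Path(output_dir) / str(label) / name


example = patch_filename("out", "slides/a.tif", 512, 1024, 0, 256, "tumor")
```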

classmethod load(path)[source]

Loads a PatchSet from disk

Assumes that the dataframe is saved to a csv called frame.csv and that the settings are saved in a text file called settings.json.

Parameters

path (Path) – the directory in which the patchset is saved

Return type

PatchSet

save(path)[source]

Saves a PatchSet to disk

The dataframe is saved to a csv called frame.csv. The settings are saved in a text file called settings.json.

Parameters

path (Path) – the directory in which to save the patchset

Return type

None
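The on-disk layout used by save and load can be sketched with the standard library. Only the two file names come from the documentation above; `save_sketch` and `load_sketch` are hypothetical, and the exact column order and JSON structure are assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_sketch(rows, settings, path: Path) -> None:
    """Write rows to frame.csv and settings to settings.json in path."""
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "frame.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["x", "y", "label", "settings"])
        writer.writeheader()
        writer.writerows(rows)
    with open(path / "settings.json", "w") as f:
        json.dump(settings, f)


def load_sketch(path: Path):
    """Read back the rows and settings written by save_sketch."""
    with open(path / "frame.csv", newline="") as f:
        # csv returns strings; this sketch assumes all columns are integers.
        rows = [{k: int(v) for k, v in r.items()} for r in csv.DictReader(f)]
    with open(path / "settings.json") as f:
        settings = json.load(f)
    return rows, settings


rows = [{"x": 0, "y": 0, "label": 1, "settings": 0}]
settings = [{"slide_path": "slides/a.tif", "level": 0,
             "patch_size": 256, "loader": "openslide"}]

with tempfile.TemporaryDirectory() as tmp:
    save_sketch(rows, settings, Path(tmp) / "slide_a")
    loaded_rows, loaded_settings = load_sketch(Path(tmp) / "slide_a")
```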

class PatchSetting(level, patch_size, slide_path, loader)[source]

Bases: object

Patch Setting Definition

Parameters
  • level (int) – The level at which patches are extracted

  • patch_size (int) – The size of the patches to be created; patches are assumed to be square

  • slide_path (Path) – the path to the whole slide image

  • loader (Loader) – A method for loading the slide

classmethod from_sdict(sdict)[source]

Converts a dictionary to a PatchSetting

Parameters

sdict (dict) –

level: int

loader: Loader

patch_size: int

slide_path: Path

to_sdict()[source]

Writes a PatchSetting to a dictionary so it can be saved to disk
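A dictionary round trip of this kind might look like the sketch below. `SerialisableSetting` is a hypothetical stand-in; the dictionary keys and the representation of the loader as a string are assumptions, not the library's actual format.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SerialisableSetting:
    level: int
    patch_size: int
    slide_path: Path
    loader: str  # assumption: the loader is stored by name

    def to_sdict(self) -> dict:
        """Convert to a dictionary holding only JSON-friendly values."""
        d = asdict(self)
        d["slide_path"] = str(self.slide_path)  # Path is not JSON serialisable
        return d

    @classmethod
    def from_sdict(cls, sdict: dict) -> "SerialisableSetting":
        """Rebuild the setting from a dictionary made by to_sdict."""
        return cls(level=int(sdict["level"]),
                   patch_size=int(sdict["patch_size"]),
                   slide_path=Path(sdict["slide_path"]),
                   loader=sdict["loader"])


original = SerialisableSetting(0, 256, Path("slides/a.tif"), "openslide")
as_text = json.dumps(original.to_sdict())
restored = SerialisableSetting.from_sdict(json.loads(as_text))
```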

patchset_utils module

Utilities for creating sets of patches

combine(patchsets)[source]

Combines multiple patchsets into one

This gives a combined dataframe with all patches in a dataset, for example to sample patches from. It also renumbers the settings so that the indexes in the dataframe match the correct setting in the combined settings list.

Parameters

patchsets (List[PatchSet]) – A list of PatchSets

Returns

A combined patchset

Return type

PatchSet
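The renumbering step can be sketched as follows, with each patchset represented as a (rows, settings) pair of plain lists. `combine_sketch` is a hypothetical helper; it simply offsets indexes and does not merge identical settings, which the library may or may not do.

```python
def combine_sketch(patchsets):
    """Concatenate patchsets, shifting each settings index by an offset."""
    combined_rows, combined_settings = [], []
    for rows, settings in patchsets:
        # Settings from this patchset start at this position in the new list.
        offset = len(combined_settings)
        combined_settings.extend(settings)
        for row in rows:
            combined_rows.append({**row, "settings": row["settings"] + offset})
    return combined_rows, combined_settings


slide_a = ([{"x": 0, "y": 0, "label": 1, "settings": 0}], ["settings for a"])
slide_b = ([{"x": 64, "y": 0, "label": 2, "settings": 0}], ["settings for b"])
all_rows, all_settings = combine_sketch([slide_a, slide_b])
```

After combining, the row from the second slide points at index 1, so it still resolves to its own settings.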

load_patchsets_from_directory(patchsets_dir)[source]

Loads PatchSets from a directory

Loads patchsets for a whole dataset stored in subdirectories of patchsets_dir

Parameters

patchsets_dir (Path) – a path to a directory containing subdirectories with PatchSets

Returns

A list of PatchSets, one for each slide

Return type

patchset (List[PatchSet])

make_and_save_patchsets_for_dataset(dataset, loader, tissue_detector, patch_finder, output_dir, project_root=PosixPath('/'))[source]

Creates PatchSets for all slides in a dataset

For each slide in the dataset, this creates the PatchSet and then saves it in a subdirectory of output_dir.

Parameters
  • dataset (pd.DataFrame) – a dataframe containing columns slide and annotation

  • loader (Loader) – loader to use to load slide and annotations

  • tissue_detector (TissueDetector) – tissue detector to use to remove background

  • patch_finder (PatchFinder) – patch finder to use to create patches

  • output_dir (Path) – a directory to save the patchsets in

  • project_root (Path, optional) – paths will be stored relative to the project root. Defaults to root (absolute paths)

Returns

A list of PatchSets, one for each slide

Return type

patchset (List[PatchSet])

make_patchset_for_slide(slide_path, annot_path, loader, tissue_detector, patch_finder, project_root=PosixPath('/'))[source]

Creates a patchset for a single slide

This creates a PatchSet for a single slide.

Parameters
  • slide_path (Path) – path to whole slide image

  • annot_path (Path) – path to the annotation information for the slide

  • loader (Loader) – loader to use to load slide and annotations

  • tissue_detector (TissueDetector) – tissue detector to use to remove background

  • patch_finder (PatchFinder) – patch finder to use to create patches

  • project_root (Path, optional) – paths will be stored relative to the project root. Defaults to root (absolute paths)

Returns

A PatchSet for the slide

Return type

patchset (PatchSet)
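The effect of project_root on stored paths can be illustrated with pathlib. This is a sketch of the general idea only: `to_stored` and `to_absolute` are hypothetical helpers, and how wsipipe applies project_root internally is an assumption.

```python
from pathlib import PurePosixPath


def to_stored(slide_path, project_root=PurePosixPath("/")) -> PurePosixPath:
    """Express slide_path relative to project_root."""
    return PurePosixPath(slide_path).relative_to(project_root)


def to_absolute(stored, project_root=PurePosixPath("/")) -> PurePosixPath:
    """Recover the full path from a stored relative path."""
    return PurePosixPath(project_root) / stored


stored = to_stored("/data/proj/slides/a.tif", PurePosixPath("/data/proj"))
full = to_absolute(stored, PurePosixPath("/data/proj"))
default_stored = to_stored("/data/proj/slides/a.tif")
```

Storing paths relative to a project root lets a saved patchset be reused after the project directory is moved, as long as the same root is supplied when the paths are resolved.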

make_patchsets_for_dataset(dataset, loader, tissue_detector, patch_finder, project_root=PosixPath('/'))[source]

Creates PatchSets for all slides in a dataset

For each slide in the dataset this creates the PatchSet

Parameters
  • dataset (pd.DataFrame) – a dataframe containing columns slide and annotation

  • loader (Loader) – loader to use to load slide and annotations

  • tissue_detector (TissueDetector) – tissue detector to use to remove background

  • patch_finder (PatchFinder) – patch finder to use to create patches

  • project_root (Path, optional) – paths will be stored relative to the project root. Defaults to root (absolute paths)

Returns

A list of PatchSets, one for each slide

Return type

patchset (List[PatchSet])

visualise_patches_on_slide(ps, vis_level, project_root=PosixPath('/'))[source]

Draws patches on a thumbnail of the slide

Visualises where on the slide the patches occur. Assumes a patch set for one slide with only one set of settings.

Parameters
  • ps (PatchSet) – A PatchSet for one slide

  • vis_level (int) – the level at which to create a slide image to draw patches on

  • project_root (Path) –

Returns

A thumbnail of the slide with patch locations drawn on

Return type

thumb (Image)