CytoNormPy - FCS files

In this vignette, we showcase a typical analysis workflow using FCS files read from disk.

First, we import the necessary libraries.

[1]:
import cytonormpy as cnp

import os
import pandas as pd

Metadata#

To tell cytonormpy which files are the references used for calculating the spline functions, we provide a metadata table as a pandas.DataFrame. Here, we read it from disk, but you can also create it on the fly using the pandas library.

[2]:
input_directory = "../_resources/"
output_directory = os.path.join(input_directory, "normalized")

if not os.path.exists(output_directory):
    os.mkdir(output_directory)

metadata = pd.read_csv(os.path.join(input_directory, "metadata_sid.csv"))
metadata.head()
[2]:
   file_name                           reference  batch  sample_ID
0  Gates_PTLG021_Unstim_Control_1.fcs  ref        1      1
1  Gates_PTLG021_Unstim_Control_2.fcs  other      1      2
2  Gates_PTLG028_Unstim_Control_1.fcs  ref        2      3
3  Gates_PTLG028_Unstim_Control_2.fcs  other      2      4
4  Gates_PTLG034_Unstim_Control_1.fcs  ref        3      5
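
The same table can also be built in memory instead of being read from a CSV file. A minimal sketch, assuming the four columns and the five rows shown in the output above (the variable reference_files is only there for illustration):

```python
import pandas as pd

# Column names and values are copied from the metadata.head() output above.
metadata = pd.DataFrame(
    {
        "file_name": [
            "Gates_PTLG021_Unstim_Control_1.fcs",
            "Gates_PTLG021_Unstim_Control_2.fcs",
            "Gates_PTLG028_Unstim_Control_1.fcs",
            "Gates_PTLG028_Unstim_Control_2.fcs",
            "Gates_PTLG034_Unstim_Control_1.fcs",
        ],
        "reference": ["ref", "other", "ref", "other", "ref"],
        "batch": [1, 1, 2, 2, 3],
        "sample_ID": [1, 2, 3, 4, 5],
    }
)

# The rows marked "ref" are the reference files of each batch.
reference_files = metadata.loc[metadata["reference"] == "ref", "file_name"].tolist()
print(reference_files)
```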

Data setup#

We instantiate the CytoNorm object and add two components: a data transformer that transforms our data to asinh space, and a clusterer that clusters the cells.

[3]:
cn = cnp.CytoNorm()

t = cnp.AsinhTransformer()
fs = cnp.FlowSOM(n_clusters=4)

cn.add_transformer(t)
cn.add_clusterer(fs)
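
For intuition, an asinh transformation divides the raw values by a cofactor and applies the inverse hyperbolic sine. The sketch below uses plain NumPy rather than cnp.AsinhTransformer; the cofactor of 5 is an assumed value chosen for illustration and is not necessarily the library's setting.

```python
import numpy as np

cofactor = 5.0  # assumed value, for illustration only
raw = np.array([0.0, 5.0, 50.0, 5000.0])

# Forward transform: roughly linear near zero, logarithmic for large values.
transformed = np.arcsinh(raw / cofactor)

# Inverse transform: recovers the raw scale, as needed before writing files.
restored = np.sinh(transformed) * cofactor
print(transformed)
print(restored)
```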

The channels to be normalized are stored in a text file and passed to the run_fcs_data_setup() method via the channels keyword.

Alternatively, the channels keyword accepts the strings “markers” and “all”.

[4]:
coding_detectors = pd.read_csv(os.path.join(input_directory, "coding_detectors.txt"), header=None)[0].tolist()
[5]:
cn.run_fcs_data_setup(
    input_directory=input_directory,
    metadata=metadata,
    channels=coding_detectors,
    output_directory=output_directory,
    prefix="Norm",
)

CV thresholding#

For clustering, it is important to inspect how the files are distributed within each cluster. We have already added a FlowSOM clusterer instance. The method calculate_cluster_cvs() now calculates, for each number of metaclusters that we want to analyze, the cluster CV (coefficient of variation) per sample.

We then visualize the result with cnp.pl.cv_heatmap(), following the CV overview of the original CytoNorm implementation in R.

CytoNorm2.0: We can now use a different set of markers for clustering via the ‘markers’ parameter. To use all markers, omit the parameter.

[6]:
markers_for_clustering = coding_detectors[4:15]

cn.calculate_cluster_cvs(n_metaclusters=list(range(3, 15)), markers=markers_for_clustering)
cnp.pl.cv_heatmap(cn, n_metaclusters=list(range(3, 15)), max_cv=2)
[Figure: output of cnp.pl.cv_heatmap (../_images/vignettes_cnp_fcs_file_10_0.png)]
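
To make the idea of a cluster CV concrete, here is a minimal sketch on made-up data. It assumes that the CV of a cluster is the standard deviation divided by the mean of the per-file fractions of cells falling into that cluster; the file names, cluster labels, and the use of the sample standard deviation are assumptions, and this is not the code that calculate_cluster_cvs() runs.

```python
import pandas as pd

# Hypothetical cluster assignments for cells from three files.
cells = pd.DataFrame(
    {
        "file": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
        "cluster": [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1],
    }
)

# Count cells per file and cluster, then convert to per-file fractions.
counts = pd.crosstab(cells["file"], cells["cluster"])
fractions = counts.div(counts.sum(axis=1), axis=0)

# CV per cluster across files (pandas uses the sample standard deviation).
cluster_cvs = fractions.std(axis=0) / fractions.mean(axis=0)
print(cluster_cvs)
```

A cluster that is populated almost exclusively by one file gets a high CV, which is what the heatmap is meant to expose.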

Clustering#

We run the FlowSOM clustering and pass a cluster_cv_threshold of 2. This threshold is used to check whether the files are distributed evenly enough within each cluster. A warning is raised if that is not the case.

[7]:
cn.run_clustering(markers=markers_for_clustering, cluster_cv_threshold=2)
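
The threshold check can be pictured as follows. This is a hypothetical helper, not part of cytonormpy: the function name, the example CV values, and the strict "greater than" comparison are all assumptions made for illustration.

```python
import warnings


def check_cluster_cvs(cluster_cvs, threshold=2.0):
    """Warn about clusters whose CV exceeds the threshold and return them."""
    flagged = [cluster for cluster, cv in cluster_cvs.items() if cv > threshold]
    if flagged:
        warnings.warn(
            f"Clusters {flagged} exceed the CV threshold of {threshold}.",
            UserWarning,
        )
    return flagged


# Made-up CV values: only cluster 2 lies above the threshold of 2.
example_cvs = {0: 0.35, 1: 0.25, 2: 2.6}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = check_cluster_cvs(example_cvs, threshold=2.0)

print(flagged, len(caught))
```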

Calculation#

Finally, we calculate the quantiles per batch and cluster, fit the spline functions, and transform the expression values accordingly.

The data will automatically be saved to disk using the prefix Norm_. To change that prefix, pass the keyword prefix to the .run_fcs_data_setup() method above.

[8]:
cn.calculate_quantiles()
cn.calculate_splines(goal="batch_mean")
cn.normalize_data()
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 23 cells detected in batch 1 for cluster 1. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 32 cells detected in batch 1 for cluster 3. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 6 cells detected in batch 1 for cluster 4. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 41 cells detected in batch 1 for cluster 6. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 15 cells detected in batch 1 for cluster 7. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 5 cells detected in batch 1 for cluster 8. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 3 cells detected in batch 1 for cluster 9. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 17 cells detected in batch 1 for cluster 10. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 2 cells detected in batch 1 for cluster 12. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 9 cells detected in batch 1 for cluster 13. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 14 cells detected in batch 2 for cluster 1. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 43 cells detected in batch 2 for cluster 3. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 8 cells detected in batch 2 for cluster 4. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 7 cells detected in batch 2 for cluster 7. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 10 cells detected in batch 2 for cluster 8. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 1 cells detected in batch 2 for cluster 9. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 14 cells detected in batch 2 for cluster 10. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 49 cells detected in batch 2 for cluster 11. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 1 cells detected in batch 2 for cluster 12. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 3 cells detected in batch 2 for cluster 13. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 11 cells detected in batch 3 for cluster 1. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 12 cells detected in batch 3 for cluster 4. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 47 cells detected in batch 3 for cluster 6. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 24 cells detected in batch 3 for cluster 7. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 6 cells detected in batch 3 for cluster 8. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 7 cells detected in batch 3 for cluster 9. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 23 cells detected in batch 3 for cluster 10. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 40 cells detected in batch 3 for cluster 11. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 7 cells detected in batch 3 for cluster 12. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 11 cells detected in batch 3 for cluster 13. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_normalization\_quantile_calc.py:274: RuntimeWarning: Mean of empty slice
  self.distrib = mean_func(expr_quantiles._expr_quantiles, axis=self._batch_axis)
normalized file Gates_PTLG028_Unstim_Control_1.fcs
normalized file Gates_PTLG021_Unstim_Control_1.fcs
normalized file Gates_PTLG034_Unstim_Control_1.fcs
normalized file Gates_PTLG028_Unstim_Control_2.fcs
normalized file Gates_PTLG021_Unstim_Control_2.fcs
normalized file Gates_PTLG034_Unstim_Control_2.fcs
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_dataset\_dataset.py:376: RuntimeWarning: overflow encountered in cast
  orig_events[:, channel_indices] = inv_transformed.values
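
The three calls above can be pictured on a single channel. The sketch below is a simplified stand-in, not cytonormpy's implementation: it uses synthetic data, ignores clusters, takes the goal distribution to be the mean of the batch quantiles (one reading of goal="batch_mean"), and replaces the spline fit with linear interpolation via np.interp.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression values of one channel in two batches with a shift.
batches = {
    1: rng.normal(loc=1.0, scale=1.0, size=1000),
    2: rng.normal(loc=1.5, scale=1.2, size=1000),
}

# Step 1: quantiles per batch.
probs = np.linspace(0.01, 0.99, 99)
quantiles = {batch: np.quantile(values, probs) for batch, values in batches.items()}

# Step 2: goal distribution as the mean of the batch quantiles (assumption).
goal = np.mean(list(quantiles.values()), axis=0)


def normalize(values, batch):
    """Map batch quantiles onto the goal quantiles by linear interpolation.

    Values outside the first/last quantile are clamped to the goal's ends.
    """
    return np.interp(values, quantiles[batch], goal)


# Step 3: transform the expression values of each batch.
normalized = {batch: normalize(values, batch) for batch, values in batches.items()}

before = abs(np.median(batches[1]) - np.median(batches[2]))
after = abs(np.median(normalized[1]) - np.median(normalized[2]))
print(before, after)
```

After the mapping, the batch medians lie much closer together than before, which is the effect the normalization aims for.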

To apply the fitted CytoNorm object to new data, pass the file names and the corresponding batch information. You can pass either a single file name or a list.

[9]:
cn.normalize_data(file_names="Gates_PTLG034_Unstim_Control_2_dup.fcs", batches=3)
normalized file Gates_PTLG034_Unstim_Control_2_dup.fcs