CytoNormPy - AnnData objects

CytoNormPy - AnnData objects#

In this vignette, we showcase a typical analysis workflow using anndata objects.

First, we import the necessary libraries and create the anndata object.

[1]:
import cytonormpy as cnp

import anndata as ad
import pandas as pd
import os
import numpy as np


from cytonormpy import FCSFile

AnnData creation#

We use the internal representation to create an AnnData object as follows:

[2]:
def _fcs_to_anndata(input_directory, file, file_no, metadata) -> ad.AnnData:
    fcs = FCSFile(input_directory=input_directory, file_name=file)
    events = fcs.original_events
    md_row = metadata.loc[metadata["file_name"] == file, :].to_numpy()
    obs = np.repeat(md_row, events.shape[0], axis=0)
    var_frame = fcs.channels
    obs_frame = pd.DataFrame(
        data=obs,
        columns=metadata.columns,
        index=pd.Index([f"{file_no}-{str(i)}" for i in range(events.shape[0])]),
    )
    adata = ad.AnnData(obs=obs_frame, var=var_frame, layers={"compensated": events})
    adata.obs_names_make_unique()
    adata.var_names_make_unique()
    return adata
[3]:
input_directory = "../_resources/"
fcs_files = [
    "Gates_PTLG021_Unstim_Control_1.fcs",
    "Gates_PTLG021_Unstim_Control_2.fcs",
    "Gates_PTLG028_Unstim_Control_1.fcs",
    "Gates_PTLG028_Unstim_Control_2.fcs",
    "Gates_PTLG034_Unstim_Control_1.fcs",
    "Gates_PTLG034_Unstim_Control_2.fcs",
]
adatas = []
metadata = pd.read_csv(os.path.join(input_directory, "metadata_sid.csv"))
for file_no, file in enumerate(fcs_files):
    adatas.append(_fcs_to_anndata(input_directory, file, file_no, metadata))

dataset = ad.concat(adatas, axis=0, join="outer", merge="same")
dataset.obs = dataset.obs.astype("object")
dataset.var = dataset.var.astype("object")
dataset.obs_names_make_unique()
dataset.var_names_make_unique()
[4]:
dataset
[4]:
AnnData object with n_obs × n_vars = 6000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    var: 'pns', 'png', 'pne', 'channel_numbers'
    layers: 'compensated'

Data setup#

We instantiate the cytonorm object and add a data transformer that will transform our data to the asinh space and the clusterer that will cluster the cells.

[5]:
cn = cnp.CytoNorm()

t = cnp.AsinhTransformer()
fs = cnp.FlowSOM(n_clusters=10)

cn.add_transformer(t)
cn.add_clusterer(fs)

Next, we run the run_anndata_setup() method.

[6]:
cn.run_anndata_setup(dataset, layer="compensated", key_added="normalized", n_cells_reference=1000)

CV thresholding#

For clustering, it is important to visualize the distribution of files within one cluster. We have already added a FlowSOM Clusterer instance. the function ‘calculate_cluster_cvs’ will now calculate, for each metacluster number that we want to analyze, the cluster cv per sample.

We then visualize it via a waterfall plot as in the original CytoNorm implementation in R.

CytoNorm2.0: We can now use a different set of markers for clustering using the ‘markers’ parameter. If you want to use all markers, do not pass anything!

[7]:
markers_for_clustering = dataset.var_names[4:15].tolist()

cn.calculate_cluster_cvs(n_metaclusters=list(range(3, 15)), markers=markers_for_clustering)
cnp.pl.cv_heatmap(cn, n_metaclusters=list(range(3, 15)), max_cv=2)
../_images/vignettes_cnp_anndata_11_0.png

Clustering#

We run the FlowSOM clustering and pass a cluster_cv_threshold of 2. This value is used to evaluate if the distribution of files within one cluster is sufficient. A warning will be raised if that is not the case. We can see from above that, regardless of which metacluster number we choose, this will not be the case!

[8]:
cn.run_clustering(markers=markers_for_clustering, cluster_cv_threshold=2)

Calculation#

Finally, we calculate the quantiles per batch and cluster, calculate the spline functions and transform the expression values accordingly.

The data will automatically be saved to the anndata object in the layer “normalized”. In order to change the layer name, use the keyword key_added in the run_anndata_setup() method from above.

[9]:
cn.calculate_quantiles()
cn.calculate_splines(goal="batch_mean")
cn.normalize_data()
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 24 cells detected in batch 1 for cluster 3. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 7 cells detected in batch 1 for cluster 4. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 17 cells detected in batch 1 for cluster 7. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 6 cells detected in batch 1 for cluster 8. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 2 cells detected in batch 1 for cluster 9. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 24 cells detected in batch 1 for cluster 10. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 8 cells detected in batch 1 for cluster 11. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 24 cells detected in batch 1 for cluster 13. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 43 cells detected in batch 2 for cluster 0. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 26 cells detected in batch 2 for cluster 3. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 21 cells detected in batch 2 for cluster 4. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 17 cells detected in batch 2 for cluster 6. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 16 cells detected in batch 2 for cluster 7. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 3 cells detected in batch 2 for cluster 8. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 8 cells detected in batch 2 for cluster 9. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 9 cells detected in batch 2 for cluster 11. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 9 cells detected in batch 2 for cluster 13. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 37 cells detected in batch 3 for cluster 3. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 14 cells detected in batch 3 for cluster 7. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 4 cells detected in batch 3 for cluster 8. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 6 cells detected in batch 3 for cluster 9. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 15 cells detected in batch 3 for cluster 10. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_cytonorm\_cytonorm.py:524: UserWarning: 15 cells detected in batch 3 for cluster 11. Skipping quantile calculation.
  warnings.warn(warning_msg, UserWarning)
C:\Users\tarik\anaconda3\envs\cytonorm\lib\site-packages\cytonormpy\_normalization\_quantile_calc.py:274: RuntimeWarning: Mean of empty slice
  self.distrib = mean_func(expr_quantiles._expr_quantiles, axis=self._batch_axis)
normalized file Gates_PTLG021_Unstim_Control_1.fcs
normalized file Gates_PTLG021_Unstim_Control_2.fcs
normalized file Gates_PTLG034_Unstim_Control_1.fcs
normalized file Gates_PTLG028_Unstim_Control_1.fcs
normalized file Gates_PTLG028_Unstim_Control_2.fcs
normalized file Gates_PTLG034_Unstim_Control_2.fcs
[10]:
dataset
[10]:
AnnData object with n_obs × n_vars = 6000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    var: 'pns', 'png', 'pne', 'channel_numbers'
    layers: 'compensated', 'normalized'

In order to run the algorithm on new data, we can just pass the updated anndata and specify the necessary file names.

We will first create the new anndata object that contains an additional file.

[11]:
filename = "Gates_PTLG034_Unstim_Control_2_dup.fcs"
metadata = pd.DataFrame(data=[[filename, "other", 3]], columns=["file_name", "reference", "batch"])
new_adata = _fcs_to_anndata(input_directory, filename, 7, metadata)

dataset = ad.concat([dataset, new_adata], axis=0, join="outer")
dataset
[11]:
AnnData object with n_obs × n_vars = 7000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    layers: 'compensated', 'normalized'

Currently, all ‘normalized’ values for the new file are NaN:

[12]:
dataset[dataset.obs["file_name"] == filename, :].to_df(layer="normalized").head()
[12]:
Time Event_length Y89Di Pd102Di Pd104Di Pd105Di Pd106Di Pd108Di Pd110Di In113Di ... Yb171Di Yb172Di Yb173Di Yb174Di Lu175Di Yb176Di Ir191Di Ir193Di Pt195Di beadDist
7-0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7-2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7-3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7-4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 55 columns

[13]:
cn.normalize_data(adata=dataset, file_names=filename, batches=3)
normalized file Gates_PTLG034_Unstim_Control_2_dup.fcs

The normalized values are now stored inplace!

[14]:
dataset[dataset.obs["file_name"] == filename, :].to_df(layer="normalized").head()
[14]:
Time Event_length Y89Di Pd102Di Pd104Di Pd105Di Pd106Di Pd108Di Pd110Di In113Di ... Yb171Di Yb172Di Yb173Di Yb174Di Lu175Di Yb176Di Ir191Di Ir193Di Pt195Di beadDist
7-0 134.582993 16.0 0.000000 8.679433 8.292034 75.802243 9.135942 102.328946 75.562056 0.000488 ... 0.245267 2.841038 0.077476 9.155870 8.721652 20.603478 42.043896 118.089443 0.000885 27.542093
7-1 307.864990 25.0 0.002169 10.821452 6.606235 142.933832 124.046685 231.742272 313.555878 0.000000 ... 7.293311 3.704368 13.276683 164.605647 6.250109 6.026575 58.298905 108.974573 0.000000 31.969299
7-2 370.299011 13.0 0.003742 32.530681 12.905950 196.132910 46.107563 261.139574 249.113909 0.000000 ... 5.781553 65.096013 2.148641 0.015413 0.010149 1.321422 121.642487 260.703220 0.000000 34.630819
7-3 390.078003 25.0 0.000000 3.518037 5.657144 151.235453 18.623958 176.520250 121.060864 0.000488 ... 3.005988 3.015534 91.248730 30.708479 0.074746 13.267579 81.579132 152.255885 0.000885 32.892003
7-4 723.723999 15.0 0.000000 4.033677 0.000000 23.492430 0.000000 48.940914 30.778446 3.794250 ... 0.000000 0.000000 0.000000 0.180230 0.000000 3.118176 4.195136 9.201713 0.000000 31.036688

5 rows × 55 columns

[15]:
dataset
[15]:
AnnData object with n_obs × n_vars = 7000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    layers: 'compensated', 'normalized'