Merging datasets and partial training¶

This vignette demonstrates how to merge datasets, which are present in different zarr files. The vignette will also demonstrate the steps for performing partial training. Partial PCA training is a lightweight alternative to perform batch effect correction, that often helps obtain a well-integrated embedding and clustering.

[1]:

%config InlineBackend.figure_format = 'retina'
%load_ext autotime

import scarf
scarf.__version__

[1]:

'0.8.5'

time: 952 ms (started: 2021-08-22 18:42:50 +00:00)

1) Fetch datasets in Zarr format¶

Here we will use the same datasets are we use in the ‘data projection’ vignette. We download the files in zarr format.

[2]:

scarf.fetch_dataset('kang_15K_pbmc_rnaseq', save_path='scarf_datasets', as_zarr=True)
scarf.fetch_dataset('kang_14K_ifnb-pbmc_rnaseq', save_path='scarf_datasets', as_zarr=True)

INFO: Download started...
INFO: Download finished! File saved here: /home/docs/checkouts/readthedocs.org/user_builds/scarf/checkouts/0.8.5/docs/source/vignettes/scarf_datasets/kang_15K_pbmc_rnaseq/data.zarr.tar.gz
INFO: Extracting Zarr file for kang_15K_pbmc_rnaseq

2021-08-22 18:43:04 URL:https://storage.googleapis.com/cos-osf-prod-files-de-1/92d89fdae4a3edd779343681655a0c41303eace73f9eefed57667fabc99b7620?response-content-disposition=attachment%3B%20filename%3D%22data.zarr.tar.gz%22%3B%20filename%2A%3DUTF-8%27%27data.zarr.tar.gz&GoogleAccessId=files-de-1%40cos-osf-prod.iam.gserviceaccount.com&Expires=1629657839&Signature=qJsI9I9wb%2FtDFT2soQyDfSf0K0377rplhAZ2dKwPEYg3wAOBuO9YiDa4gQdUzGE07MaffvqaHi6ATuqNmu%2F%2FLGnii37PgvhnaR7LOzZUwP0cFSD%2BQL2RMEgdQpV2LmGOg7R2vqEZsv%2BJRzNffa%2BEqtAtiEBUCtNZEdoU4nFYIBAahnQlMIY%2Feacwc9s2RMugavUVOgW4XJHbH0bRcE8GsaMkNGG%2B8a8IlB5fNkV1CvvVdD5NnIJJtRm2NMUuq8vrg6kJYXQmIrMrxpSuhiMwv15cH358NrVhwipof7kL5n%2FnvOoA6tHPVnvuFhosD3wxA8y3Ke2ioj%2Fodb0sxIR0jQ%3D%3D [54198050/54198050] -> "/home/docs/checkouts/readthedocs.org/user_builds/scarf/checkouts/0.8.5/docs/source/vignettes/scarf_datasets/kang_15K_pbmc_rnaseq/data.zarr.tar.gz" [1]

INFO: Download started...
INFO: Download finished! File saved here: /home/docs/checkouts/readthedocs.org/user_builds/scarf/checkouts/0.8.5/docs/source/vignettes/scarf_datasets/kang_14K_ifnb-pbmc_rnaseq/data.zarr.tar.gz
INFO: Extracting Zarr file for kang_14K_ifnb-pbmc_rnaseq

2021-08-22 18:43:10 URL:https://storage.googleapis.com/cos-osf-prod-files-de-1/7d6eb5c5a9df9b9ab032b5aca53624d2bd1d69dafc588ad110305e104e61c74c?response-content-disposition=attachment%3B%20filename%3D%22data.zarr.tar.gz%22%3B%20filename%2A%3DUTF-8%27%27data.zarr.tar.gz&GoogleAccessId=files-de-1%40cos-osf-prod.iam.gserviceaccount.com&Expires=1629657847&Signature=AgxiDjtJ464KBPTWmaHugJNwy%2BlfMzlF86RSA7IcqaGxY%2F71ClSwLXfh%2Fakb7rbm5IUt4F%2Fq%2FjslNKghOJ%2Fxn%2FLynW4tjeRrATlfPinzAY4cZbAY5hjIpguyuBVnoj%2BPGmKIP0dGym7Zz%2BDv%2FI588coAd851oNrRUoibdTv6471tkDCGAi5aSa1DQ1YmbdtOFl0PBN7HwUyjnQuikFloE46pDuci90prgIc0n%2FAZdi16egn2QxjkLnQWXDkIi29WNwqKXK5niF1kqy6crpuMio27x6CYuvBbamEoHGL4qomjMYq4y18akktsYNlqRh92353BpeulSBdDwKjy5U0wJw%3D%3D [60301820/60301820] -> "/home/docs/checkouts/readthedocs.org/user_builds/scarf/checkouts/0.8.5/docs/source/vignettes/scarf_datasets/kang_14K_ifnb-pbmc_rnaseq/data.zarr.tar.gz" [1]

time: 20.3 s (started: 2021-08-22 18:42:51 +00:00)

The Zarr files need to be loaded as a DataStore before they can be merged:

[3]:

ds_ctrl = scarf.DataStore('scarf_datasets/kang_15K_pbmc_rnaseq/data.zarr', nthreads=4)
ds_ctrl

/home/docs/checkouts/readthedocs.org/user_builds/scarf/envs/0.8.5/lib/python3.8/site-packages/scarf/datastore.py:430: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if 'indices' in self.z[i]['projections'][j]:

[3]:

DataStore has 8487 (14619) cells with 1 assays: RNA
   Cell metadata:
            'I', 'ids', 'names', 'RNA_UMAP1', 'RNA_UMAP2',
            'RNA_leiden_cluster', 'RNA_nCounts', 'RNA_nFeatures', 'RNA_percentMito', 'RNA_percentRibo',
            'RNA_unified_UMAP1', 'RNA_unified_UMAP2', 'cluster_labels'
   RNA assay has 11352 (35635) features and following metadata:
            'I', 'ids', 'names', 'I__hvgs', 'dropOuts',
            'nCells'
      Projected samples:
            'stim'
      Co-embeddings:
            'unified_UMAP'

time: 115 ms (started: 2021-08-22 18:43:11 +00:00)

[4]:

ds_stim = scarf.DataStore('scarf_datasets/kang_14K_ifnb-pbmc_rnaseq/data.zarr', nthreads=4)
ds_stim

[4]:

DataStore has 10111 (14446) cells with 1 assays: RNA
   Cell metadata:
            'I', 'ids', 'names', 'RNA_UMAP1', 'RNA_UMAP2',
            'RNA_leiden_cluster', 'RNA_nCounts', 'RNA_nFeatures', 'RNA_percentMito', 'RNA_percentRibo',
            'cluster_labels', 'transferred_labels'
   RNA assay has 11051 (35635) features and following metadata:
            'I', 'ids', 'names', 'I__hvgs', 'dropOuts',
            'nCells'

time: 18 ms (started: 2021-08-22 18:43:11 +00:00)

2) Merging datasets¶

The merging step will make sure that the features are in the same order as in the merged file. The merged data will be dumped into a new Zarr file. ZarrMerge class allows merging multiple samples at the same time. Though only one kind of assays can be added at a time, other modalities for the same cells can be added at a later point.

[5]:

#Can be used to merge multiple assays
scarf.ZarrMerge(zarr_path='scarf_datasets/kang_merged_pbmc_rnaseq.zarr',  # Path where merged Zarr files will be saved
                assays=[ds_ctrl.RNA, ds_stim.RNA],                        # assays to be merged
                names=['ctrl', 'stim'],                                   # these names will be preprended to the cell ids with '__' delimiter
                merge_assay_name='RNA', overwrite=True).write()           # Name of the merged assay. `overwrite` will remove an existing Zarr file.

Writing data to merged file: 100%|██████████| 8/8 [00:10<00:00,  1.27s/it]
Writing data to merged file: 100%|██████████| 8/8 [00:10<00:00,  1.37s/it]

time: 21.5 s (started: 2021-08-22 18:43:11 +00:00)

Load the merged Zarr file as a DataStore:

[6]:

ds = scarf.DataStore('scarf_datasets/kang_merged_pbmc_rnaseq.zarr', nthreads=4)

INFO: Setting assay RNA to assay type: RNAassay
INFO: (RNA) Computing nCells and dropOuts
[########################################] | 100% Completed |  3.9s
INFO: (RNA) Computing nCounts
[########################################] | 100% Completed |  3.8s
WARNING: Minimum cell count (562) is lower than size factor multiplier (1000)
INFO: (RNA) Computing nFeatures
[########################################] | 100% Completed |  4.0s
INFO: Computing percentage of RNA_percentMito
[########################################] | 100% Completed |  2.8s
INFO: Computing percentage of RNA_percentRibo
[########################################] | 100% Completed |  3.1s
time: 18.3 s (started: 2021-08-22 18:43:33 +00:00)

So now we print the merged datastore. The merging removed all the precalculated data. Even the information on which cells were filtered out is lost in the process. This is done deliberately, to allow users to start fresh with the merged dataset.

[7]:

ds

[7]:

DataStore has 29065 (29065) cells with 1 assays: RNA
   Cell metadata:
            'I', 'ids', 'names', 'RNA_nCounts', 'RNA_nFeatures',
            'RNA_percentMito', 'RNA_percentRibo', 'orig_RNA_UMAP1', 'orig_RNA_UMAP2', 'orig_RNA_leiden_cluster',
            'orig_RNA_nCounts', 'orig_RNA_nFeatures', 'orig_RNA_percentMito', 'orig_RNA_percentRibo', 'orig_RNA_unified_UMAP1',
            'orig_RNA_unified_UMAP2', 'orig_cluster_labels', 'orig_transferred_labels'
   RNA assay has 12450 (35635) features and following metadata:
            'I', 'ids', 'names', 'dropOuts', 'nCells',

time: 5.29 ms (started: 2021-08-22 18:43:51 +00:00)

If we have a look at the cell attributes table, we can clearly see the that the sample identity is shown in the ids column, prepended to the barcode.

[8]:

ds.cells.head()

[8]:

	I	ids	names	RNA_nCounts	RNA_nFeatures	RNA_percentMito	RNA_percentRibo	orig_RNA_UMAP1	orig_RNA_UMAP2	orig_RNA_leiden_cluster	orig_RNA_nCounts	orig_RNA_nFeatures	orig_RNA_percentMito	orig_RNA_percentRibo	orig_RNA_unified_UMAP1	orig_RNA_unified_UMAP2	orig_cluster_labels	orig_transferred_labels
0	True	ctrl__AAACATACAATGCC-1	AAACATACAATGCC-1	2191.0	852.0	0.273848	32.131447	2.224133	-7.265071	2	2191.0	852.0	0.273848	32.131447	2.669009	-7.516445	CD4 naive T	nan
1	True	ctrl__AAACATACATTTCC-1	AAACATACATTTCC-1	3018.0	878.0	0.099404	15.440689	-2.979369	14.908970	10	3018.0	878.0	0.099404	15.440689	-2.833214	13.966822	CD 14 Mono	nan
2	True	ctrl__AAACATACCAGAAA-1	AAACATACCAGAAA-1	2481.0	713.0	0.241838	4.957678	4.714710	9.148883	4	2481.0	713.0	0.241838	4.957678	7.227715	10.572861	CD 14 Mono	nan
3	True	ctrl__AAACATACCAGCTA-1	AAACATACCAGCTA-1	3157.0	950.0	0.031676	11.308204	3.747393	13.955492	4	3157.0	950.0	0.031676	11.308204	4.494255	13.256012	CD 14 Mono	nan
4	True	ctrl__AAACATACCATGCA-1	AAACATACCATGCA-1	703.0	337.0	0.426743	10.953058	NaN	NaN	-1	703.0	337.0	0.426743	10.953058	NaN	NaN	nan	nan

time: 41.9 ms (started: 2021-08-22 18:43:51 +00:00)

It can be a good idea to keep track of the cells from different samples, we can fetch out the dataset id from cell-barcodes and add them separately in a new column (this step might get automated in the future).

[9]:

ds.cells.insert(
    column_name='sample_id',
    values=[x.split('__')[0] for x in ds.cells.fetch_all('ids')],
    overwrite=True
)

WARNING: 'values' parameter is of `list` type and not `np.ndarray` as expected. The correct dtype may not be assigned to the column
time: 27.6 ms (started: 2021-08-22 18:43:51 +00:00)

Rather than performing a fresh round of annotation, we will also import the cluster labels from the unmerged datasets. This help us at later steps to evaluate our results.

[10]:

ctrl_labels = list(ds_ctrl.cells.fetch_all('cluster_labels'))
stim_labels = list(ds_stim.cells.fetch_all('cluster_labels'))

ds.cells.insert(
    column_name='imported_labels',
    values=ctrl_labels + stim_labels,
    overwrite=True
)

WARNING: 'values' parameter is of `list` type and not `np.ndarray` as expected. The correct dtype may not be assigned to the column
time: 24 ms (started: 2021-08-22 18:43:51 +00:00)

As well as re-using annotations, we import the information about which cells where kept and which ones where filtered out.

[11]:

ctrl_valid_cells = list(ds_ctrl.cells.fetch_all('I'))
stim_valid_cells = list(ds_stim.cells.fetch_all('I'))

ds.cells.update_key(
    values=ctrl_valid_cells + stim_valid_cells,
    key='I'
)

time: 6.59 ms (started: 2021-08-22 18:43:51 +00:00)

Now we can check the number of cells from each of the samples:

[12]:

ds.cells.to_pandas_dataframe(['sample_id'], key='I')['sample_id'].value_counts()

[12]:

stim    10111
ctrl     8487
Name: sample_id, dtype: int64

time: 12.5 ms (started: 2021-08-22 18:43:51 +00:00)

3) Naive analysis of merged datasets¶

By naive, we mean that we make no attempt to remove/account for the latent factors that might contribute to batch effect or treatment-specific effect. It is usually a good idea to perform a ‘naive’ pipeline to get an idea about the degree of batch effects.

We start with detecting the highly variable genes:

[13]:

ds.mark_hvgs(min_cells=10, top_n=2000, min_mean=-3, max_mean=2, max_var=6)

INFO: (RNA) Computing nCells
[########################################] | 100% Completed |  4.7s
INFO: (RNA) Computing normed_tot
[########################################] | 100% Completed |  4.4s
INFO: (RNA) Computing sigmas
[########################################] | 100% Completed |  5.4s
INFO: 2000 genes marked as HVGs

../_images/vignettes_merging_datasets_25_1.png

time: 17.2 s (started: 2021-08-22 18:43:51 +00:00)

Next, we create a graph of cells in a standard way.

[14]:

ds.make_graph(feat_key='hvgs', k=21, dims=25, n_centroids=100)

INFO: No value provided for parameter `log_transform`. Will use default value: True
INFO: No value provided for parameter `renormalize_subset`. Will use default value: True
INFO: No value provided for parameter `pca_cell_key`. Will use default value: I
INFO: Using PCA for dimension reduction
INFO: No value provided for parameter `ann_metric`. Will use default value: l2
INFO: No value provided for parameter `ann_efc`. Will use default value: min(100, max(k * 3, 50))
INFO: No value provided for parameter `ann_ef`. Will use default value: min(100, max(k * 3, 50))
INFO: No value provided for parameter `ann_m`. Will use default value: 48
INFO: No value provided for parameter `rand_state`. Will use default value: 4466
INFO: No value provided for parameter `local_connectivity`. Will use default value: 1.0
INFO: No value provided for parameter `bandwidth`. Will use default value: 1.5
INFO: Normalizing with feature subset
[########################################] | 100% Completed |  3.2s

Writing data to normed__I__hvgs/data: 100%|██████████| 30/30 [00:05<00:00,  5.64it/s]

INFO: Calculating mean of norm. data
[                                        ] | 0% Completed |  0.0s

[########################################] | 100% Completed |  0.4s
INFO: Calculating std. dev. of norm. data
[########################################] | 100% Completed |  0.5s

Fitting PCA: 100%|██████████| 19/19 [00:31<00:00,  1.65s/it]
Fitting ANN: 100%|██████████| 19/19 [00:02<00:00,  6.82it/s]
Fitting kmeans: 100%|██████████| 19/19 [00:03<00:00,  5.56it/s]
Estimating seed partitions: 100%|██████████| 19/19 [00:02<00:00,  7.15it/s]

INFO: Saving loadings to RNA/normed__I__hvgs/reduction__pca__25__I
INFO: Saving ANN index to RNA/normed__I__hvgs/reduction__pca__25__I/ann__l2__63__63__48__4466

INFO: Saving kmeans clusters to RNA/normed__I__hvgs/reduction__pca__25__I/kmeans__100__4466

Saving KNN graph: 100%|██████████| 19/19 [00:02<00:00,  6.43it/s]

INFO: ANN recall: 99.80%


Smoothening KNN distances: 100%|██████████| 4/4 [00:02<00:00,  1.60it/s]

time: 58.1 s (started: 2021-08-22 18:44:08 +00:00)

Calculating UMAP embedding of cells:

[15]:

ds.run_umap(fit_n_epochs=250, spread=5, min_dist=1, parallel=True)

/home/docs/checkouts/readthedocs.org/user_builds/scarf/envs/0.8.5/lib/python3.8/site-packages/umap/umap_.py:1330: RuntimeWarning: divide by zero encountered in power
  return 1.0 / (1.0 + a * x ** (2 * b))

        completed  0  /  250 epochs
        completed  25  /  250 epochs
        completed  50  /  250 epochs
        completed  75  /  250 epochs
        completed  100  /  250 epochs
        completed  125  /  250 epochs
        completed  150  /  250 epochs
        completed  175  /  250 epochs
        completed  200  /  250 epochs
        completed  225  /  250 epochs
        completed  0  /  100 epochs
        completed  10  /  100 epochs
        completed  20  /  100 epochs
        completed  30  /  100 epochs
        completed  40  /  100 epochs
        completed  50  /  100 epochs
        completed  60  /  100 epochs
        completed  70  /  100 epochs
        completed  80  /  100 epochs
        completed  90  /  100 epochs
time: 39.3 s (started: 2021-08-22 18:45:06 +00:00)

[16]:

ds.cells.head()

[16]:

	I	ids	names	RNA_UMAP1	RNA_UMAP2	RNA_nCounts	RNA_nFeatures	RNA_percentMito	RNA_percentRibo	imported_labels	...	orig_RNA_leiden_cluster	orig_RNA_nCounts	orig_RNA_nFeatures	orig_RNA_percentMito	orig_RNA_percentRibo	orig_RNA_unified_UMAP1	orig_RNA_unified_UMAP2	orig_cluster_labels	orig_transferred_labels	sample_id
0	True	ctrl__AAACATACAATGCC-1	AAACATACAATGCC-1	-13.570444	23.108271	2191.0	852.0	0.273848	32.131447	CD4 naive T	...	2	2191.0	852.0	0.273848	32.131447	2.669009	-7.516445	CD4 naive T	nan	ctrl
1	True	ctrl__AAACATACATTTCC-1	AAACATACATTTCC-1	22.328730	-23.031784	3018.0	878.0	0.099404	15.440689	CD 14 Mono	...	10	3018.0	878.0	0.099404	15.440689	-2.833214	13.966822	CD 14 Mono	nan	ctrl
2	True	ctrl__AAACATACCAGAAA-1	AAACATACCAGAAA-1	32.599670	-18.015129	2481.0	713.0	0.241838	4.957678	CD 14 Mono	...	4	2481.0	713.0	0.241838	4.957678	7.227715	10.572861	CD 14 Mono	nan	ctrl
3	True	ctrl__AAACATACCAGCTA-1	AAACATACCAGCTA-1	28.793385	-18.763035	3157.0	950.0	0.031676	11.308204	CD 14 Mono	...	4	3157.0	950.0	0.031676	11.308204	4.494255	13.256012	CD 14 Mono	nan	ctrl
4	False	ctrl__AAACATACCATGCA-1	AAACATACCATGCA-1	NaN	NaN	703.0	337.0	0.426743	10.953058	nan	...	-1	703.0	337.0	0.426743	10.953058	NaN	NaN	nan	nan	ctrl

5 rows × 22 columns

time: 176 ms (started: 2021-08-22 18:45:46 +00:00)

Visualization of cells from the two samples in the 2D UMAP space:

[17]:

ds.plot_layout(layout_key='RNA_UMAP', color_by='sample_id', cmap='RdBu', legend_ondata=False)

../_images/vignettes_merging_datasets_32_0.png

time: 665 ms (started: 2021-08-22 18:45:46 +00:00)

Visualization of cluster labels in the 2D UMAP space:

[18]:

ds.plot_layout(layout_key='RNA_UMAP', color_by='imported_labels', legend_ondata=False)

../_images/vignettes_merging_datasets_34_0.png

time: 796 ms (started: 2021-08-22 18:45:46 +00:00)

4) Partial PCA training to reduce batch effects¶

The plots above clearly show that the cells from the two samples are distinct on the UMAP space and have not integrated. This clearly indicates a treatment-specific or simply a batch effect between the cells from the two samples. Another interesting pattern in the UMAP plot above is the ‘mirror effect’, i.e. the equivalent clusters from the two samples look like mirror images. This is often seen in the datasets where the heterogenity/cell population composition is not strongly affected by the treatment.

We will now attempt to integrate the cells from the two samples so that we obtain same cell types that do not form separate clusters. One can do this by training the PCA on cells from only one of the samples. Training PCA on cells from only one of the samples will diminish the contribution of genes differentially expressed between the two samples.

First, we need to create a boolean column in the cell attribute table. This column will indicate whether a cell belongs to one of the samples. Here we will create a new column is_ctrl and mark the values as True when a cell belongs to the ctrl sample.

[19]:

ds.cells.insert(column_name=f'is_ctrl',
                           values=(ds.cells.fetch_all('sample_id') == 'ctrl'),
                           overwrite=True)

time: 4.5 ms (started: 2021-08-22 18:45:47 +00:00)

The next step is to perform the partial PCA training. PCA is trained during the graph creation step. We will now use pca_cell_key parameter and set it to is_ctrl so that only ‘ctrl’ cells are used for PCA training.

[20]:

ds.make_graph(feat_key='hvgs', k=21, dims=25, n_centroids=100, pca_cell_key='is_ctrl')

INFO: No value provided for parameter `log_transform`. Will use previously used value: True
INFO: No value provided for parameter `renormalize_subset`. Will use previously used value: True
INFO: Using PCA for dimension reduction
INFO: No value provided for parameter `ann_metric`. Will use default value: l2
INFO: No value provided for parameter `ann_efc`. Will use default value: min(100, max(k * 3, 50))
INFO: No value provided for parameter `ann_ef`. Will use default value: min(100, max(k * 3, 50))
INFO: No value provided for parameter `ann_m`. Will use default value: 48
INFO: No value provided for parameter `rand_state`. Will use default value: 4466
INFO: No value provided for parameter `local_connectivity`. Will use default value: 1.0
INFO: No value provided for parameter `bandwidth`. Will use default value: 1.5
INFO: Using existing normalized data with cell key I and feat key I__hvgs
INFO: Calculating mean of norm. data
[########################################] | 100% Completed |  0.4s
INFO: Calculating std. dev. of norm. data
[########################################] | 100% Completed |  0.5s

Fitting PCA: 100%|██████████| 19/19 [00:12<00:00,  1.47it/s]
Fitting ANN: 100%|██████████| 19/19 [00:03<00:00,  6.31it/s]
Fitting kmeans: 100%|██████████| 19/19 [00:03<00:00,  5.80it/s]
Estimating seed partitions: 100%|██████████| 19/19 [00:02<00:00,  7.49it/s]

INFO: Saving loadings to RNA/normed__I__hvgs/reduction__pca__25__is_ctrl

INFO: Saving ANN index to RNA/normed__I__hvgs/reduction__pca__25__is_ctrl/ann__l2__63__63__48__4466
INFO: Saving kmeans clusters to RNA/normed__I__hvgs/reduction__pca__25__is_ctrl/kmeans__100__4466

Saving KNN graph: 100%|██████████| 19/19 [00:03<00:00,  6.31it/s]

INFO: ANN recall: 99.77%


Smoothening KNN distances: 100%|██████████| 4/4 [00:00<00:00, 27.47it/s]

time: 27.8 s (started: 2021-08-22 18:45:47 +00:00)

We run UMAP as usual, but the UMAP embeddings are saved in a new cell attribute column so as to not overwrite the previous UMAP values. The new column will be called RNA_pUMAP; ‘RNA’ is automatically prepend because the assay name is RNA

[21]:

ds.run_umap(fit_n_epochs=250, spread=5, min_dist=1, parallel=True, label='pUMAP')

/home/docs/checkouts/readthedocs.org/user_builds/scarf/envs/0.8.5/lib/python3.8/site-packages/umap/umap_.py:1330: RuntimeWarning: divide by zero encountered in power
  return 1.0 / (1.0 + a * x ** (2 * b))

        completed  0  /  250 epochs
        completed  25  /  250 epochs
        completed  50  /  250 epochs
        completed  75  /  250 epochs
        completed  100  /  250 epochs
        completed  125  /  250 epochs
        completed  150  /  250 epochs
        completed  175  /  250 epochs
        completed  200  /  250 epochs
        completed  225  /  250 epochs
        completed  0  /  100 epochs
        completed  10  /  100 epochs
        completed  20  /  100 epochs
        completed  30  /  100 epochs
        completed  40  /  100 epochs
        completed  50  /  100 epochs
        completed  60  /  100 epochs
        completed  70  /  100 epochs
        completed  80  /  100 epochs
        completed  90  /  100 epochs
time: 39.8 s (started: 2021-08-22 18:46:15 +00:00)

Visualize the new UMAP

[22]:

ds.plot_layout(layout_key='RNA_pUMAP', color_by='sample_id', cmap='RdBu', legend_ondata=False)

../_images/vignettes_merging_datasets_43_0.png

time: 609 ms (started: 2021-08-22 18:46:55 +00:00)

Visualization of cluster labels in the new UMAP space shows that the cells from the same cell-type do not split into separate clusters like they did before.

[23]:

ds.plot_layout(layout_key='RNA_pUMAP', color_by='imported_labels', legend_ondata=False)

../_images/vignettes_merging_datasets_45_0.png

time: 787 ms (started: 2021-08-22 18:46:55 +00:00)

That is all for this vignette.