Multi-modal

Warning

This is, for now, just a stub.

This guide shows how to curate and register ECCITE-seq data from Papalexi21 as MuData objects. ECCITE-seq profiles single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

Setup

!lamin init --storage ./test-multimodal --schema bionty

import lamindb as ln
import lnschema_bionty as lb

lb.settings.species = "human"

💡 loaded instance: testuser1/test-multimodal (lamindb 0.56a1)

ln.track()

💡 notebook imports: lamindb==0.56a1 lnschema_bionty==0.32.0

💡 Transform(id=1, uid='yMWSFirS6qv2z8', name='Multi-modal', short_name='multimodal', version='0', type=notebook, updated_at=2023-10-16 21:51:03, created_by_id=1)

💡 Run(id=1, uid='6NSIfP0PKlynxSiJxA66', run_at=2023-10-16 21:51:03, transform_id=1, created_by_id=1)

Papalexi21

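Let’s use a MuData object. The cells below refer to it as mdata; a minimal way to obtain it, assuming papalexi21_subset.h5mu already sits in the working directory, is to read it with the mudata package:

import mudata as md

# assumption: the .h5mu file is already present locally
mdata = md.read_h5mu("papalexi21_subset.h5mu")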

Transform

MuData objects build on top of AnnData objects to store and serialize multimodal data. More information can be found in the MuData documentation.
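To make the indexing used later on this page (for example mdata["rna"].var_names) more concrete, here is a minimal pure-Python sketch of the container idea: one object holding several named modalities, each with its own feature names. The classes Modality and MultiModal are hypothetical stand-ins, not the mudata or anndata API, and the feature names are copied from the outputs on this page.

```python
from dataclasses import dataclass


@dataclass
class Modality:
    """Stand-in for one AnnData-like table: only the feature names are kept."""
    var_names: list[str]


@dataclass
class MultiModal:
    """Stand-in for a MuData-like container: modalities keyed by name."""
    mod: dict[str, Modality]

    def __getitem__(self, key: str) -> Modality:
        # Indexing by modality name returns that modality's table.
        return self.mod[key]


toy = MultiModal(
    mod={
        "rna": Modality(var_names=["SH2D6", "GABRA1", "SPACA1"]),
        "adt": Modality(var_names=["CD86", "PDL1", "PDL2", "CD366"]),
    }
)

print(toy["rna"].var_names[:2])
print(len(toy["adt"].var_names))
```

Each modality keeps its own feature space, which is why the RNA and ADT features are validated against different reference tables.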

First we register the file:

file = ln.File(
    "papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
file.save()

Now let’s validate and register the 3 feature sets this data contains:

RNA (gene expression)
ADT (antibody-derived tags reflecting surface proteins)
obs (metadata)

For the two modalities rna and adt, we use bionty tables as the reference:

Validate

mdata["rna"].var_names[:5]

Index(['RP5-827C21.6', 'XX-CR54.1', 'SH2D6', 'RP11-379B18.5', 'RP11-778D9.12'], dtype='object', name='index')

lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol);

❗ 173 terms (100.00%) are not validated for symbol: RP5-827C21.6, XX-CR54.1, SH2D6, RP11-379B18.5, RP11-778D9.12, RP11-703G6.1, AC005150.1, RP11-717H13.1, CTC-498J12.1, CTC-467M3.1, ARHGAP26-AS1, GABRA1, HIST1H4K, HLA-DQB1-AS1, RP11-524H19.2, SPACA1, VNN1, AC006042.7, AC002066.1, AC073934.6, ...

genes = lb.Gene.from_values(mdata["rna"].var_names, lb.Gene.symbol)
ln.save(genes)

❗ ambiguous validation in Bionty for 6 records: 'HLA-DQB1-AS1', 'CTAGE15', 'CTRB2', 'LGALS9C', 'PCDHB11', 'TBC1D3G'

❗ did not create Gene records for 84 non-validated symbols: 'AC002066.1', 'AC004019.13', 'AC005150.1', 'AC006042.7', 'AC011558.5', 'AC026471.6', 'AC073934.6', 'AC091132.1', 'AC092295.4', 'AC092687.5', 'AE000662.93', 'AL132989.1', 'AP000442.4', 'CTA-373H7.7', 'CTB-134F13.1', 'CTB-31O20.9', 'CTC-498J12.1', 'CTD-2562J17.2', 'CTD-3012A18.1', 'CTD-3065B20.2', ...

mdata["rna"].var_names = lb.Gene.standardize(mdata["rna"].var_names, lb.Gene.symbol)

validated = lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol)

❗ 84 terms (48.60%) are not validated for symbol: RP5-827C21.6, XX-CR54.1, RP11-379B18.5, RP11-778D9.12, RP11-703G6.1, AC005150.1, RP11-717H13.1, CTC-498J12.1, RP11-524H19.2, AC006042.7, AC002066.1, AC073934.6, RP11-268G12.1, U52111.14, RP11-235C23.5, RP11-12J10.3, RP11-324E6.9, RP11-187A9.3, RP11-365N19.2, RP11-346D14.1, ...

new_genes = [lb.Gene(symbol=symbol) for symbol in mdata["rna"].var_names[~validated]]
ln.save(new_genes)

lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol);
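The sequence of calls above (validate, create records from the reference, standardize, validate again, create custom records for the remainder) can be mimicked with a toy model. This is only a sketch under stated assumptions: PUBLIC_SYMBOLS, PUBLIC_SYNONYMS and registry are hypothetical stand-ins for the public reference and the local registry, the functions are not the lamindb implementation, and HIST1H4K is treated as a synonym of H4C12.

```python
# Toy stand-ins: a public reference with one synonym, and an empty local registry.
PUBLIC_SYMBOLS = {"SH2D6", "GABRA1", "H4C12"}
PUBLIC_SYNONYMS = {"HIST1H4K": "H4C12"}
registry: set[str] = set()


def validate(values):
    """One boolean per value: True if the value is already in the registry."""
    return [v in registry for v in values]


def from_values(values):
    """Symbols found in the public reference, directly or through a synonym."""
    resolved = (PUBLIC_SYNONYMS.get(v, v) for v in values)
    return [symbol for symbol in resolved if symbol in PUBLIC_SYMBOLS]


def standardize(values):
    """Map known synonyms to their reference symbol; leave other values as-is."""
    return [PUBLIC_SYNONYMS.get(v, v) for v in values]


var_names = ["RP5-827C21.6", "SH2D6", "HIST1H4K", "GABRA1"]

first_pass = validate(var_names)         # registry is empty: nothing validates
registry.update(from_values(var_names))  # save records found in the reference
var_names = standardize(var_names)       # HIST1H4K becomes H4C12
second_pass = validate(var_names)

# Whatever still fails gets a custom record, after which everything validates.
unvalidated = [v for v, ok in zip(var_names, second_pass) if not ok]
registry.update(unvalidated)
final_pass = validate(var_names)

print(first_pass, second_pass, unvalidated, final_pass)
```

The boolean list plays the role of the mask returned by the second validate call, which is then inverted to pick out the symbols that need custom records.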

feature_set_rna = ln.FeatureSet.from_values(
    mdata["rna"].var_names, field=lb.Gene.symbol
)

mdata["adt"].var_names

Index(['CD86', 'PDL1', 'PDL2', 'CD366'], dtype='object', name='index')

lb.CellMarker.validate(mdata["adt"].var_names);

❗ 4 terms (100.00%) are not validated for name: CD86, PDL1, PDL2, CD366

markers = lb.CellMarker.from_values(mdata["adt"].var_names)
ln.save(markers)

lb.CellMarker.validate(mdata["adt"].var_names);

Register

feature_set_adt = ln.FeatureSet.from_values(
    mdata["adt"].var_names, field=lb.CellMarker.name
)

Link them to the file:

file.features.add_feature_set(feature_set_rna, slot="rna")
file.features.add_feature_set(feature_set_adt, slot="adt")

The third feature set is the obs metadata:

obs = mdata["rna"].obs

We’re only interested in a single metadata column:

ln.Feature(name="gene_target", type="category").save()

features = ln.Feature.from_df(obs)
ln.save(features)

feature_set_obs = ln.FeatureSet.from_df(obs)

file.features.add_feature_set(feature_set_obs, slot="obs")
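Registering features from a DataFrame assigns each column a type; the feature listing printed further down shows number for numeric columns and category for the rest. A rough pure-Python sketch of that kind of inference follows. The function infer_type and the toy obs values are illustrative assumptions, not lamindb's actual logic or the real data.

```python
from numbers import Number


def infer_type(values):
    """Return 'number' if every value is numeric (bools excluded), else 'category'."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "number"
    return "category"


# Column names match the obs feature set on this page; the values are made up.
obs = {
    "nCount_RNA": [17207.0, 9506.0, 15256.0],
    "percent.mito": [2.3, 3.1, 4.0],
    "Phase": ["G1", "S", "G2M"],
    "gene_target": ["IFNGR1", "NT", "STAT1"],
}

feature_types = {name: infer_type(column) for name, column in obs.items()}
print(feature_types)
```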

gene_targets = lb.Gene.from_values(obs["gene_target"], lb.Gene.symbol)
ln.save(gene_targets)
features = ln.Feature.lookup()
file.labels.add(gene_targets, feature=features.gene_target)

❗ ambiguous validation in Bionty for 4 records: 'MARCHF8', 'IRF7', 'IFNGR2', 'TNFRSF14'

❗ did not create Gene record for 1 non-validated symbol: 'NT'

nt = ln.ULabel(name="NT", description="Non-targeting control of perturbations")
nt.save()

file.labels.add(nt, feature=features.gene_target)
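The gene_target column therefore ends up linked to two registries: gene symbols go to Gene, and the non-targeting control NT goes to ULabel. The split can be sketched as a simple partition of the unique values. KNOWN_GENES is a hypothetical stand-in for the gene registry, and the column values are a made-up sample built from symbols that appear on this page.

```python
# Made-up sample of the gene_target column, using symbols from this page.
gene_target = ["IFNGR1", "NT", "STAT1", "NT", "IFNGR1", "CAV1"]

# Hypothetical stand-in for the symbols known to the gene registry.
KNOWN_GENES = {"IFNGR1", "STAT1", "CAV1", "IRF7"}

unique_values = sorted(set(gene_target))

# Values found in the gene registry become gene labels; the rest need a
# generic label record (here only the non-targeting control).
labels_by_registry = {
    "Gene": [v for v in unique_values if v in KNOWN_GENES],
    "ULabel": [v for v in unique_values if v not in KNOWN_GENES],
}
print(labels_by_registry)
```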

for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
    labels = [ln.ULabel(name=name) for name in obs[col].unique()]
    ln.save(labels)

❗ record with similar name exist! did you mean to load it?

	id	__ratio__
name
G1	14	90.0

❗ record with similar name exist! did you mean to load it?

	id	__ratio__
name
S	15	90.0

❗ records with similar names exist! did you mean to load one of them?

	id	__ratio__
name
G1	14	90.0
S	15	90.0

❗ record with similar name exist! did you mean to load it?

	id	__ratio__
name
NT	1	90.0

❗ records with similar names exist! did you mean to load one of them?

	id	__ratio__
name
G1	14	90.0
NT	1	90.0

Because none of these labels seem like something we’d want to track or validate, we save them as ULabel records but don’t link them to the file.
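The loop that created these labels takes the distinct values of each column and builds one label per value. A small stand-in using plain dictionaries shows the shape of that step. The column names come from this page, while the values (including the replicate names) are invented for illustration.

```python
# Invented values; only the column names are taken from the obs table above.
obs = {
    "replicate": ["rep1", "rep2", "rep1", "rep3"],
    "Phase": ["G1", "S", "G1", "G2M"],
}

# dict.fromkeys keeps first-seen order while dropping repeated values,
# similar in spirit to calling .unique() on a DataFrame column.
unique_per_column = {
    column: list(dict.fromkeys(values)) for column, values in obs.items()
}

# One label would be created per distinct value in each column.
n_labels = sum(len(values) for values in unique_per_column.values())
print(unique_per_column, n_labels)
```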

file.features

Features:
  rna: FeatureSet(id=1, uid='qN6G1jRaRDoVnH9xGAne', n=184, type='number', registry='bionty.Gene', hash='Y8lsRtXCZKyPPberKAF0', updated_at=2023-10-16 21:51:11, created_by_id=1)
    'SH2D6', 'MEF2C-AS2', 'ARHGAP26-AS1', 'GABRA1', 'H4C12', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', ...
  adt: FeatureSet(id=2, uid='RSx8Ozli8JUm1jBYcaS0', n=4, type='number', registry='bionty.CellMarker', hash='b-CtyjgPRO0WN27lTOqC', updated_at=2023-10-16 21:51:11, created_by_id=1)
    'CD86', 'PDL1', 'PDL2', 'CD366'
  obs: FeatureSet(id=3, uid='dCR5dcT1GGhWgay4UR96', n=19, registry='core.Feature', hash='IrzrMLxvQgFvcWoLg1Bm', updated_at=2023-10-16 21:51:12, created_by_id=1)
    🔗 gene_target (bionty.Gene|core.ULabel)
        🔗 gene_target (28, bionty.Gene): 'MARCHF8', 'MARCHF8', 'IFNGR1', 'CAV1', 'IRF7', 'IRF7', 'ATF2', 'NFKBIA', 'STAT1', 'SPI1', ...
        🔗 gene_target (1, core.ULabel): 'NT'
    orig.ident (category)
    nCount_RNA (number)
    nFeature_RNA (number)
    nCount_HTO (number)
    nFeature_HTO (number)
    nCount_GDO (number)
    nCount_ADT (number)
    nFeature_ADT (number)
    percent.mito (number)
    MULTI_ID (category)
    HTO_classification (category)
    guide_ID (category)
    NT (category)
    perturbation (category)
    replicate (category)
    S.Score (number)
    G2M.Score (number)
    Phase (category)

file.describe()

File(id=1, uid='2obuVHcf3vhZw7bmPGPe', suffix='.h5mu', accessor='MuData', description='Sub-sampled MuData from Papalexi21', size=606320, hash='RaivS3NesDOP-6kNIuaC3g', hash_type='md5', updated_at=2023-10-16 21:51:04)

Provenance:
  🗃️ storage: Storage(id=1, uid='W1nSwRAH', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal', type='local', updated_at=2023-10-16 21:50:57, created_by_id=1)
  💫 transform: Transform(id=1, uid='yMWSFirS6qv2z8', name='Multi-modal', short_name='multimodal', version='0', type=notebook, updated_at=2023-10-16 21:51:03, created_by_id=1)
  👣 run: Run(id=1, uid='6NSIfP0PKlynxSiJxA66', run_at=2023-10-16 21:51:03, transform_id=1, created_by_id=1)
  👤 created_by: User(id=1, uid='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-16 21:50:57)
Features:
  rna: FeatureSet(id=1, uid='qN6G1jRaRDoVnH9xGAne', n=184, type='number', registry='bionty.Gene', hash='Y8lsRtXCZKyPPberKAF0', updated_at=2023-10-16 21:51:11, created_by_id=1)
    'SH2D6', 'MEF2C-AS2', 'ARHGAP26-AS1', 'GABRA1', 'H4C12', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', ...
  adt: FeatureSet(id=2, uid='RSx8Ozli8JUm1jBYcaS0', n=4, type='number', registry='bionty.CellMarker', hash='b-CtyjgPRO0WN27lTOqC', updated_at=2023-10-16 21:51:11, created_by_id=1)
    'CD86', 'PDL1', 'PDL2', 'CD366'
  obs: FeatureSet(id=3, uid='dCR5dcT1GGhWgay4UR96', n=19, registry='core.Feature', hash='IrzrMLxvQgFvcWoLg1Bm', updated_at=2023-10-16 21:51:12, created_by_id=1)
    🔗 gene_target (bionty.Gene|core.ULabel)
        🔗 gene_target (28, bionty.Gene): 'MARCHF8', 'MARCHF8', 'IFNGR1', 'CAV1', 'IRF7', 'IRF7', 'ATF2', 'NFKBIA', 'STAT1', 'SPI1', ...
        🔗 gene_target (1, core.ULabel): 'NT'
    orig.ident (category)
    nCount_RNA (number)
    nFeature_RNA (number)
    nCount_HTO (number)
    nFeature_HTO (number)
    nCount_GDO (number)
    nCount_ADT (number)
    nFeature_ADT (number)
    percent.mito (number)
    MULTI_ID (category)
    HTO_classification (category)
    guide_ID (category)
    NT (category)
    perturbation (category)
    replicate (category)
    S.Score (number)
    G2M.Score (number)
    Phase (category)
Labels:
  🏷️ genes (28, bionty.Gene): 'MARCHF8', 'MARCHF8', 'IFNGR1', 'CAV1', 'IRF7', 'IRF7', 'ATF2', 'NFKBIA', 'STAT1', 'SPI1', ...
  🏷️ ulabels (1, core.ULabel): 'NT'

file.view_flow()

https://d33wubrfki0l68.cloudfront.net/61128f963501ea92d884a4bc002fe3b6cc8f7bd5/4bbe9/_images/12134cdb6955a366d44ae153bd3d9c4540bd516a36276a39cd32ba32c253bb22.svg

# clean up test instance
!lamin delete --force test-multimodal
!rm -r test-multimodal