Look up records of Bionty entities
Entities and ontologies can be complex, with many different identifiers.
Here we show Bionty’s lookup model for species, genes, proteins and cell markers. You’ll see how to:
access the reference table via .df()
look up a term with autocompletion via .lookup()
look up a term by fuzzy matching via .fuzzy_match()
import bionty as bt
✅ Created /home/runner/.lamin/bionty/versions/sources_local.yaml!
.fields: fields of an ontology reference
gene_bionty = bt.Gene()
gene_bionty
Gene
Species: human
Source: ensembl, release-108
📖 Gene.df(): ontology reference table
🔎 Gene.lookup(): autocompletion of ontology terms
🎯 Gene.fuzzy_match(): fuzzy match against ontology terms
🧐 Gene.inspect(): check if identifiers are mappable
👽 Gene.map_synonyms(): map synonyms to standardized names
🔗 Gene.ontology: Pronto.Ontology object
gene_bionty.fields
{'description',
'ensembl_gene_id',
'gene_type',
'hgnc_id',
'id',
'ncbi_gene_id',
'omim_id',
'symbol',
'synonyms'}
Fields can be accessed as attributes, which enables autocompletion. You can pass them to the field parameter of any bionty function instead of strings.
gene_bionty.ncbi_gene_id
ncbi_gene_id
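One plausible reason to prefer attributes over raw strings is that a typo fails immediately instead of being passed silently as a wrong field name. The sketch below illustrates that idea with a hypothetical `FieldNamespace` class. It is an assumption for illustration, not Bionty's implementation: here each attribute simply resolves to the field's name as a string, and the field set is copied from the `.fields` output above.

```python
class FieldNamespace:
    """Hypothetical stand-in for an entity's field attributes (not Bionty's class)."""

    def __init__(self, fields):
        self._fields = frozenset(fields)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for field names.
        if name in self._fields:
            return name
        raise AttributeError(f"unknown field: {name!r}")


# Field names copied from gene_bionty.fields above.
gene_fields = FieldNamespace(
    {"description", "ensembl_gene_id", "gene_type", "hgnc_id", "id",
     "ncbi_gene_id", "omim_id", "symbol", "synonyms"}
)

print(gene_fields.ncbi_gene_id)  # ncbi_gene_id
try:
    gene_fields.ncbi_id  # typo: raises instead of silently passing "ncbi_id"
except AttributeError as err:
    print(err)
```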
.df(): reference table
Data scientists love DataFrames: every entity has a reference table, returned as a pandas DataFrame, that contains all of its fields.
df = gene_bionty.df()
df.head()
| | id | ensembl_gene_id | symbol | gene_type | description | ncbi_gene_id | hgnc_id | omim_id | synonyms | version |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lzl9xt | ENSG00000210049 | MT-TF | Mt_tRNA | mitochondrially encoded tRNA-Phe (UUU/C) [Sour... | None | HGNC:7481 | None | MTTF\|trnF | Ens107 |
| 1 | ILAWa7 | ENSG00000211459 | MT-RNR1 | Mt_rRNA | mitochondrially encoded 12S rRNA [Source:HGNC ... | None | HGNC:7470 | None | 12S\|MOTS-c\|MTRNR1 | Ens107 |
| 2 | XkyeQz | ENSG00000210077 | MT-TV | Mt_tRNA | mitochondrially encoded tRNA-Val (GUN) [Source... | None | HGNC:7500 | None | MTTV\|trnV | Ens107 |
| 3 | jDD2jW | ENSG00000210082 | MT-RNR2 | Mt_rRNA | mitochondrially encoded 16S rRNA [Source:HGNC ... | None | HGNC:7471 | None | 16S\|HN\|MTRNR2 | Ens107 |
| 4 | J58H9b | ENSG00000209082 | MT-TL1 | Mt_tRNA | mitochondrially encoded tRNA-Leu (UUA/G) 1 [So... | None | HGNC:7490 | None | MTTL1\|TRNL1 | Ens107 |
To retrieve information for several gene symbols at once, index the DataFrame by symbol and select the corresponding rows with pandas:
df.set_index("symbol").loc[["LMNA", "TCF7", "BRCA1"]]
| symbol | id | ensembl_gene_id | gene_type | description | ncbi_gene_id | hgnc_id | omim_id | synonyms | version |
|---|---|---|---|---|---|---|---|---|---|
| LMNA | 96RlDv | ENSG00000160789 | protein_coding | lamin A/C [Source:HGNC Symbol;Acc:HGNC:6636] | 4000 | HGNC:6636 | 150330 | CMD1A\|HGPS\|LGMD1B\|LMN1\|LMNL1\|MADA\|PRO1 | Ens107 |
| TCF7 | sXCrmQ | ENSG00000081059 | protein_coding | transcription factor 7 [Source:HGNC Symbol;Acc... | 6932 | HGNC:11639 | 189908 | TCF-1 | Ens107 |
| BRCA1 | 9FY8yO | ENSG00000012048 | protein_coding | BRCA1 DNA repair associated [Source:HGNC Symbo... | 672 | HGNC:1100 | 113705 | BRCC1\|FANCS\|PPP1R53\|RNF53 | Ens107 |
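In recent pandas versions, `.loc` with a list raises `KeyError` if any requested label is missing. When some symbols may be absent, you can filter with `df[df["symbol"].isin([...])]` instead. The stdlib sketch below shows the same keyed selection and also reports which symbols were not found. `select_by` is a hypothetical helper. The rows are abridged copies of the table above, and symbols are assumed to be unique.

```python
# Abridged rows copied from the reference table above.
rows = [
    {"symbol": "LMNA", "ensembl_gene_id": "ENSG00000160789", "ncbi_gene_id": "4000"},
    {"symbol": "TCF7", "ensembl_gene_id": "ENSG00000081059", "ncbi_gene_id": "6932"},
    {"symbol": "BRCA1", "ensembl_gene_id": "ENSG00000012048", "ncbi_gene_id": "672"},
]


def select_by(rows, key, values):
    """Hypothetical helper: return (rows in request order, values with no match)."""
    index = {row[key]: row for row in rows}  # assumes values of `key` are unique
    found = [index[v] for v in values if v in index]
    missing = [v for v in values if v not in index]
    return found, missing


found, missing = select_by(rows, "symbol", ["TCF7", "BRCA1", "NOTAGENE"])
print([row["symbol"] for row in found], missing)
```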
.lookup(): look up terms and records with autocompletion
Terms can be searched with auto-complete using a lookup object.
lookup = gene_bionty.lookup()
Terms that are valid Python identifiers can be fetched directly with the dot (.) accessor:
lookup.TCF7
Gene(id='sXCrmQ', ensembl_gene_id='ENSG00000081059', symbol='TCF7', gene_type='protein_coding', description='transcription factor 7 [Source:HGNC Symbol;Acc:HGNC:11639]', ncbi_gene_id='6932', hgnc_id='HGNC:11639', omim_id='189908', synonyms='TCF-1', version='Ens107')
For strings that are not valid Python identifiers, use bracket ([]) access, which also supports autocompletion:
lookup["ADGRL1-AS1"]
Gene(id='v68LyZ', ensembl_gene_id='ENSG00000267169', symbol='ADGRL1-AS1', gene_type='lncRNA', description='ADGRL1 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:55309]', ncbi_gene_id='100507373', hgnc_id='HGNC:55309', omim_id=None, synonyms=None, version='Ens107')
By default, lookup keys are generated from the entity’s name field (for genes, symbol, as in the examples above).
You can specify another field to look up by:
lookup = gene_bionty.lookup(gene_bionty.hgnc_id)
lookup["HGNC:10478"]
Gene(id='AdmgUK', ensembl_gene_id='ENSG00000204231', symbol='RXRB', gene_type='protein_coding', description='retinoid X receptor beta [Source:HGNC Symbol;Acc:HGNC:10478]', ncbi_gene_id='6257', hgnc_id='HGNC:10478', omim_id='180246', synonyms='H-2RIIBP|NR2B2|RCoR-1|RXR-beta|RXRbeta', version='Ens107')
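The sketch below approximates how a lookup object like this can behave, using a hypothetical `Lookup` class. It is not Bionty's implementation. Keys are taken from one field of each record. Dot access works only for keys that are valid Python identifiers, while bracket access accepts any string, such as "ADGRL1-AS1" or "HGNC:10478". The `__dir__` and `_ipython_key_completions_` hooks are what IPython/Jupyter consult for attribute and key completion. The records are copied from the outputs above.

```python
from types import SimpleNamespace


class Lookup:
    """Hypothetical lookup object keyed on one field of each record."""

    def __init__(self, records, field="symbol"):
        self._by_key = {rec[field]: SimpleNamespace(**rec) for rec in records}

    def __getitem__(self, key):
        # Bracket access: any string key works.
        return self._by_key[key]

    def __getattr__(self, name):
        # Dot access: called only for names not found normally; the syntax
        # restricts usable names to valid Python identifiers.
        try:
            return self._by_key[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Attribute completion: offer only identifier-like keys.
        return [k for k in self._by_key if k.isidentifier()]

    def _ipython_key_completions_(self):
        # IPython hook for completing keys inside [].
        return list(self._by_key)


records = [
    {"symbol": "TCF7", "hgnc_id": "HGNC:11639", "ncbi_gene_id": "6932"},
    {"symbol": "ADGRL1-AS1", "hgnc_id": "HGNC:55309", "ncbi_gene_id": "100507373"},
    {"symbol": "RXRB", "hgnc_id": "HGNC:10478", "ncbi_gene_id": "6257"},
]

by_symbol = Lookup(records)                  # default key field: symbol
by_hgnc = Lookup(records, field="hgnc_id")   # keyed on another field
print(by_symbol.TCF7.hgnc_id, by_symbol["ADGRL1-AS1"].ncbi_gene_id, by_hgnc["HGNC:10478"].symbol)
```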
.fuzzy_match(): look up a term via fuzzy matching
celltype_bionty = bt.CellType()
celltype_bionty.fuzzy_match("cytotoxic T cells")
| name | ontology_id | definition | synonyms | children | __ratio__ |
|---|---|---|---|---|---|
| cytotoxic T cell | CL:0000910 | A Mature T Cell That Differentiated And Acquir... | cytotoxic T lymphocyte\|cytotoxic T-lymphocyte\|... | [] | 96.969697 |
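The scorer Bionty uses internally isn't shown here. As a stand-in, the stdlib `difflib.SequenceMatcher` computes a normalized similarity of 2·M/T, where M is the number of matched characters and T is the combined length. Scaled to 0–100, it reproduces the 96.969697 above for this query (32/33). Ratios may differ from Bionty's for other inputs. The candidate names are a small hand-picked subset of the Cell Ontology.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity on a 0-100 scale: 100 * 2*M / (len(a) + len(b))."""
    return 100 * SequenceMatcher(None, a, b).ratio()


names = ["cytotoxic T cell", "T cell", "B cell", "nodal myocyte"]
query = "cytotoxic T cells"

# Pick the name with the highest similarity to the query.
best = max(names, key=lambda name: ratio(query, name))
print(best, round(ratio(query, best), 6))
```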
By default, fuzzy_match also matches against synonyms:
celltype_bionty.fuzzy_match("P cell")
| name | ontology_id | definition | synonyms | children | __ratio__ |
|---|---|---|---|---|---|
| nodal myocyte | CL:0002072 | A Specialized Cardiac Myocyte In The Sinoatria... | cardiac pacemaker cell\|myocytus nodalis\|P cell | [CL:1000409, CL:1000410] | 100.0 |
You can turn off synonym matching with synonyms_field=None:
celltype_bionty.fuzzy_match("P cell", synonyms_field=None)
| name | ontology_id | definition | synonyms | children | __ratio__ |
|---|---|---|---|---|---|
| PP cell | CL:0000696 | A Cell That Stores And Secretes Pancreatic Pol... | type F enteroendocrine cell | [CL:0002680] | 92.307692 |
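One way to think about the synonym toggle: each record is scored by its best match over its name plus, optionally, its synonyms. This sketch uses difflib as a stand-in scorer. It uses a hypothetical `use_synonyms` flag in place of `synonyms_field=None` and two records copied from the outputs above. With synonyms, "P cell" hits the nodal myocyte synonym exactly (100). Without them, "PP cell" wins with 12/13 ≈ 92.307692, matching the outputs shown.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    return 100 * SequenceMatcher(None, a, b).ratio()


# name -> synonyms, copied from the outputs above.
cell_types = {
    "nodal myocyte": ["cardiac pacemaker cell", "myocytus nodalis", "P cell"],
    "PP cell": ["type F enteroendocrine cell"],
}


def best_match(query, records, use_synonyms=True):
    """Return (name, score), scoring each record by its best name/synonym match."""
    def score(name):
        candidates = [name] + (records[name] if use_synonyms else [])
        return max(ratio(query, c) for c in candidates)

    best = max(records, key=score)
    return best, score(best)


print(best_match("P cell", cell_types))                      # synonyms on
print(best_match("P cell", cell_types, use_synonyms=False))  # names only
```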
Match against another field (default is “name”):
celltype_bionty.fuzzy_match("CD8+ alpha beta T cells", field=celltype_bionty.definition)
| definition | ontology_id | name | synonyms | children | __ratio__ |
|---|---|---|---|---|---|
| A T Cell That Expresses An Alpha-Beta T Cell Receptor Complex. | CL:0000789 | alpha-beta T cell | alpha-beta T-cell\|alpha-beta T-lymphocyte\|alph... | [CL:0000790, CL:0000791] | 75.0 |
Return all results ranked by matching ratios:
celltype_bionty.fuzzy_match("P cell", return_ranked_results=True).head()
| name | ontology_id | definition | synonyms | children | __ratio__ |
|---|---|---|---|---|---|
| nodal myocyte | CL:0002072 | A Specialized Cardiac Myocyte In The Sinoatria... | cardiac pacemaker cell\|myocytus nodalis\|P cell | [CL:1000409, CL:1000410] | 100.000000 |
| double-positive, alpha-beta thymocyte | CL:0000809 | A Thymocyte Expressing The Alpha-Beta T Cell R... | DP cell\|DP thymocyte\|double-positive, alpha-be... | [CL:0002430, CL:0002427, CL:0002431, CL:000242... | 92.307692 |
| PP cell | CL:0000696 | A Cell That Stores And Secretes Pancreatic Pol... | type F enteroendocrine cell | [CL:0002680] | 92.307692 |
| pigmented ciliary epithelial cell | CL:0002303 | A Cell That Is Part Of Pigmented Ciliary Epith... | PE cell | [] | 92.307692 |
| GIP cell | CL:0002278 | An Enteroendocrine Cell Of Duodenum And Jejunu... | type K enteroendocrine cell | [] | 85.714286 |
Tied results are all returned:
celltype_bionty.fuzzy_match("A cell", synonyms_field=None)
| name | ontology_id | definition | synonyms | children | __ratio__ |
|---|---|---|---|---|---|
| T cell | CL:0000084 | A Type Of Lymphocyte Whose Defining Characteri... | T-cell\|T lymphocyte\|T-lymphocyte | [CL:0000798, CL:0002420, CL:0002419, CL:0000789] | 83.333333 |
| B cell | CL:0000236 | A Lymphocyte Of B Lineage That Is Capable Of B... | B lymphocyte\|B-lymphocyte\|B-cell | [CL:0009114, CL:0001201] | 83.333333 |
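Ranking and tie handling can be sketched in a few lines. Sort every candidate by descending score, as `return_ranked_results=True` does above. Then keep every candidate whose score equals the top score. The sketch again uses difflib as a stand-in scorer, and `ranked` and `top_ties` are hypothetical helpers. For "A cell", "T cell" and "B cell" each share " cell" (5 of 12 characters matched on each side), so both score 10/12 ≈ 83.333333 and both are kept. Python's `sorted` stays stable with `reverse=True`, so tied names keep their input order.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    return 100 * SequenceMatcher(None, a, b).ratio()


names = ["T cell", "B cell", "PP cell", "nodal myocyte"]


def ranked(query, names):
    """All (name, score) pairs, highest score first; ties keep input order."""
    scored = [(name, ratio(query, name)) for name in names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_ties(query, names):
    """Every name sharing the highest score."""
    scored = ranked(query, names)
    best = scored[0][1]
    return [name for name, score in scored if score == best]


print(ranked("A cell", names))
print(top_ties("A cell", names))
```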