Using eutils¶
Installation¶
The easiest way to install the eutils package is to use pre-build Python package from PyPI, like so:
$ pip install eutils
Consider using virtualenvwrapper or pyvenv to setup virtual environments before installing eutils.
Code that relies on eutils should specify a version bracket to ensure that eutils receives bug fixes but not updates that might break compatibility. In your package’s setup.py:
setup(
...
install_requires = [
'eutils>=0.4,<0.3',
],
...
)
Alternatively, you may install from source; please see Installation for Development for details.
Examples¶
Common setup¶
Instantiating an eutils eutils.Client
is this easy:
>>> import eutils
# Initialize a client. This client handles all caching and query
# throttling
>>> ec = eutils.Client()
Fetching gene information¶
# search for tumor necrosis factor genes
# any valid NCBI query may be used
>>> esr = ec.esearch(db='gene',term='tumor necrosis factor')
# fetch one of those (gene id 7157 is human TNF)
>>> egs = ec.efetch(db='gene',id=7157)
# One may fetch multiple genes at a time. These are returned as an
# EntrezgeneSet. We'll grab the first (and only) child, which returns
# an instance of the Entrezgene class.
>>> eg = egs.entrezgenes[0]
# Easily access some basic information about the gene
>>> eg.hgnc, eg.maploc, eg.description, eg.type, eg.genus_species
('TP53', '17p13.1', 'tumor protein p53', 'protein-coding', 'Homo sapiens')
# get a list of genomic references
>>> [str(r) for r in eg.references]
['GeneCommentary(acv=NC_000017.11,type=genomic,heading=Reference GRCh38.p2 Primary Assembly,label=Chromosome 17 Reference GRCh38.p2 Primary Assembly)',
'GeneCommentary(acv=NG_017013.2,type=genomic,heading=None,label=RefSeqGene)',
'GeneCommentary(acv=NC_018928.2,type=genomic,heading=Alternate CHM1_1.1,label=Chromosome 17 Alternate CHM1_1.1)']
# Get all products defined on GRCh38
>>> [p.acv for p in eg.references[0].products]
[u'NM_001126112.2', u'NM_001276761.1', u'NM_000546.5',
u'NM_001276760.1', u'NM_001126113.2', u'NM_001276695.1',
u'NM_001126114.2', u'NM_001276696.1', u'NM_001126118.1',
u'NM_001126115.1', u'NM_001276697.1', u'NM_001126117.1',
u'NM_001276699.1', u'NM_001126116.1', u'NM_001276698.1']
# As a sample, grab the first product defined on this reference (order is arbitrary)
>>> mrna = eg.references[0].products[0]
>>> str(mrna)
'GeneCommentary(acv=NM_001126112.2,type=mRNA,heading=Reference,label=transcript variant 2)'
# mrna.genomic_coords provides access to the exon definitions on this
reference
>>> mrna.genomic_coords.gi, mrna.genomic_coords.strand
('568815581', -1)
>>> mrna.genomic_coords.intervals
[(7687376, 7687549), (7676520, 7676618), (7676381, 7676402),
(7675993, 7676271), (7675052, 7675235), (7674858, 7674970),
(7674180, 7674289), (7673700, 7673836), (7673534, 7673607),
(7670608, 7670714), (7668401, 7669689)]
# and the mrna has a product, the resulting protein:
>>> str(mrna.products[0])
'GeneCommentary(acv=NP_001119584.1,type=peptide,heading=Reference,label=isoform a)'
Fetch PubMed information¶
# search pubmed by author
>>> esr = c.esearch(db='pubmed', term='Nussbaum RL')
# fetch all of them
>>> paset = c.efetch(db='pubmed', id=esr.ids)
# paset represents PubmedArticleSet, a collection of
PubmedArticles. The major interface component is to iterate over
PubmedArticles with constructs like `for pa in paset: ...`. We
fetch the first PubmedArticle like this:
>>> pa = iter(paset).next()
PubmedArticle provides acccessors to essential data:
>>> pa.title
'High incidence of functional ion-channel abnormalities in a
consecutive Long QT cohort with novel missense genetic variants of
unknown significance.'
>>> pa.authors
[u'Steffensen AB', u'Refaat MM', u'David JP', u'Mujezinovic A',
u'Calloe K', u'Wojciak J', u'Nussbaum RL', u'Scheinman MM',
u'Schmitt N']
>>> pa.jrnl, pa.volume, pa.issue, pa.year
('Sci Rep', '5', None, '2015')
>>> pa.jrnl, pa.volume, pa.issue, pa.year, pa.pages
('Sci Rep', '5', None, '2015', '10009')
>>> pa.pmid, pa.doi, pa.pmc
('26066609', '10.1038/srep10009', '4464365')