API Documentation

This page contains documentation and examples for the POGO API, which can be used to query the POGO database directly.

Introduction

A key feature of POGO is that users can query our database in bulk. Our web interface serves a particular use case, and we recognize that users may have needs it does not cover: they might have a different workflow, need to feed our data through a pipeline, or want to download large sets of data.

To address this, we let users not only download our entire database but also query it directly for the information they are interested in.

Organization

POGO's database is internally represented by two tables. The data table contains comparison data between two genomes, and the taxonomy table contains taxonomic information about the genomes.

Our database's API returns JSON-formatted arrays or CSV files, and is accessed through HTTP GET requests in a REST style. The API is loosely based on SQL SELECT statements, since we use MySQL as our database backend.

Query Basics

Users can query the database at this URL: http://pogo.ece.drexel.edu/query.php

Queries are made by specifying GET variables in the URL. In the example below, we query the taxonomy table for the columns species, genome, and family, limited to 10 rows.

http://pogo.ece.drexel.edu/query.php?type=taxonomy&select=species,genome,family&limit=10
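
For scripted use, the same URL can be assembled from its parts. The following is a minimal sketch using only the Python standard library; the function name build_query_url is hypothetical and not part of the POGO API.

```python
from urllib.parse import urlencode

BASE_URL = "http://pogo.ece.drexel.edu/query.php"

def build_query_url(table, columns=None, limit=None):
    """Return a query URL for the given table ('data' or 'taxonomy')."""
    params = {"type": table}
    if columns:
        # The select argument is a comma-separated list of column names.
        params["select"] = ",".join(columns)
    if limit is not None:
        params["limit"] = str(limit)
    # safe="," leaves the commas unescaped, matching the example URLs.
    return BASE_URL + "?" + urlencode(params, safe=",")

url = build_query_url("taxonomy", ["species", "genome", "family"], limit=10)
print(url)
```

Building the URL from a dictionary keeps each argument separate, so adding or dropping an argument does not require editing the query string by hand.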

To get more specific results, we need to tell the database to return only the rows that fit what we are interested in. As you saw above, we can tell the database which columns we are interested in, but we now need to tell it which rows we are interested in.

The Methods section below explains how to do just that.

Methods

There are three main methods that our API accepts: Type, Select, and Where. Other arguments include Limit, Output, and Array.

Type

The Type argument tells the API which table you are querying, and is always required when using the API. There are only two options, "data" and "taxonomy".

The taxonomy table contains information about the different genomes that were compared.

Taxonomy Example: http://pogo.ece.drexel.edu/query.php?type=taxonomy&limit=10

The data table contains data from the comparisons.

Data Example: http://pogo.ece.drexel.edu/query.php?type=data&limit=10

Select

The Select argument allows you to choose which columns you are interested in. To see which columns are available, refer to the Properties section of this document.

If no Select argument is given, all columns in the selected table are returned.

This example returns the columns genus, species, ord, and superkingdom from the taxonomy table: http://pogo.ece.drexel.edu/query.php?type=taxonomy&select=genus,species,ord,superkingdom&limit=10

This example returns the id column from the data table: http://pogo.ece.drexel.edu/query.php?type=data&select=id&limit=10

Where

The Where argument allows you to filter the rows based upon a statement. These operators and statements should be familiar to anyone with rudimentary knowledge of logic or programming.

Examples of different where statements appear at the end of this document.

The operators we support are listed below.

Equality Operator  Explanation
=                  equal
<                  less than
>                  greater than
!                  not; this operator precedes others, as in !=
and                AND operator (use instead of &&)
or                 OR operator
xor                exclusive OR operator

We also support other statements that allow users to do string comparisons.

String Comparison  Explanation             Usage
like(string)       wrapper for MySQL LIKE  genus like('Chlamy')

Examples

Select all columns from rows where the genus is Bacillus: http://pogo.ece.drexel.edu/query.php?type=taxonomy&where=genus='Bacillus'

Select all taxonomy rows where the genus contains 'Actino': http://pogo.ece.drexel.edu/query.php?type=taxonomy&where=genus like('Actino')

Select all data where the Genomic_Fluidity is over 90%: http://pogo.ece.drexel.edu/query.php?type=data&where=Genomic_Fluidity>.90

Select all data where the Genomic_Fluidity is over 90% or less than 20%: http://pogo.ece.drexel.edu/query.php?type=data&where=Genomic_Fluidity>.90 OR Genomic_Fluidity<.20
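
Where statements contain spaces, quotes, and comparison characters, which browsers usually escape automatically but scripts generally need to percent-encode themselves. The sketch below shows one way to do that with the Python standard library. It assumes the server decodes standard percent-encoding, which this document does not state explicitly, and the function name build_where_url is hypothetical.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://pogo.ece.drexel.edu/query.php"

def build_where_url(table, where, limit=None):
    """Return a query URL whose where statement is percent-encoded."""
    params = {"type": table, "where": where}
    if limit is not None:
        params["limit"] = str(limit)
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)

url = build_where_url(
    "data", "Genomic_Fluidity>.90 OR Genomic_Fluidity<.20", limit=10
)
print(url)
```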

Warning: Order of Operations

Consider that the following statement could have multiple meanings: Select all data where the Genomic_Fluidity is over 90% or less than 20% and the Average_Amino_Acid_Identity is over 90%.

Using parentheses, we can control the order of evaluation in a statement. As in mathematics, the innermost parentheses are evaluated first, which is also how most programming languages behave.

Here we select rows where Genomic_Fluidity is either over 90%, or is less than 20% with an Average_Amino_Acid_Identity over 90%: http://pogo.ece.drexel.edu/query.php?type=data&where=(Genomic_Fluidity > .90) or (Genomic_Fluidity < .20 AND Average_Amino_Acid_Identity > .90 )
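
When where statements are generated by a script, one way to avoid ambiguity is to parenthesize every sub-expression explicitly. The helpers below (group, any_of, and all_of are hypothetical names) are a small sketch of that idea. It assumes the API accepts nested parentheses, which the inside-to-outside rule above suggests but does not confirm.

```python
def group(expr):
    """Wrap an expression in parentheses so it is evaluated as one unit."""
    return "(" + expr + ")"

def any_of(*exprs):
    """Join expressions with 'or', parenthesizing each one."""
    return " or ".join(group(e) for e in exprs)

def all_of(*exprs):
    """Join expressions with 'and', parenthesizing each one."""
    return " and ".join(group(e) for e in exprs)

where = any_of(
    "Genomic_Fluidity > .90",
    all_of("Genomic_Fluidity < .20", "Average_Amino_Acid_Identity > .90"),
)
print(where)
```

The resulting string can then be passed as the where argument of a query.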

Limit

The Limit argument specifies the maximum number of results to return.

http://pogo.ece.drexel.edu/query.php?type=data&limit=1000

Output

The Output argument allows you to specify if you want CSV or JSON output. By default a JSON array will be returned.

http://pogo.ece.drexel.edu/query.php?type=data&limit=1000&output=csv
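
CSV output can be read with Python's standard csv module. The sketch below makes two assumptions that this document does not confirm: that the response is UTF-8 text, and that nothing is known about a header row, so every row is returned as a plain list of strings. The function names parse_csv and fetch_csv are hypothetical.

```python
import csv
import io
from urllib.request import urlopen

def parse_csv(text):
    """Split CSV text into a list of rows, each a list of strings."""
    return [row for row in csv.reader(io.StringIO(text)) if row]

def fetch_csv(url):
    """Download a CSV response and parse it (performs a network request)."""
    with urlopen(url) as response:
        # Assumption: the server sends UTF-8 encoded text.
        return parse_csv(response.read().decode("utf-8"))

# Example call, not executed here:
# rows = fetch_csv(
#     "http://pogo.ece.drexel.edu/query.php?type=data&limit=1000&output=csv"
# )
```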

Array

The Array argument allows you to specify whether you want a JSON associative array or an indexed (numerical) array when using JSON as your output type. For more information read this link.

This argument is optional, and the POGO database will return numerical arrays by default.

Option  Explanation
ASSOC   Associative Array
NUM     Numerical Array

This is an example of returning an associative array from the data table.

http://pogo.ece.drexel.edu/query.php?type=data&output=JSON&array=ASSOC&limit=10
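
A client can treat both array styles uniformly by converting each row to a dictionary. The sketch below assumes that ASSOC rows arrive as JSON objects keyed by column name and NUM rows as JSON arrays in column order. The exact response shape is not documented here, so treat this as an illustration only. The function name rows_as_dicts is hypothetical.

```python
import json

def rows_as_dicts(json_text, columns=None):
    """Parse a JSON response and return every row as a dictionary.

    columns is the list of column names, in the order they were selected;
    it is only needed when rows are numerical (indexed) arrays.
    """
    result = []
    for row in json.loads(json_text):
        if isinstance(row, dict):
            # Assumed ASSOC shape: already keyed by column name.
            result.append(row)
        else:
            # Assumed NUM shape: pair each value with its column name.
            if columns is None:
                raise ValueError("column names are required for numerical rows")
            result.append(dict(zip(columns, row)))
    return result
```

With array=NUM, passing the same column list that was used in the select argument keeps the names and values aligned.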

Properties

This section details the columns available in our data and taxonomy tables. Each column can be used in where statements and in select arguments.

Taxonomy Table

Our taxonomy table is collected from NCBI with some small changes.

Column Name   Type     Description
id            integer  A unique identifier for the genome. genome_id1 and genome_id2 in the data table correspond to these values.
genome        string   The name of the genome, which is also a unique identifier.
phylum        string   Phylum of genome.
class         string   Class of genome.
ord           string   Order of genome.
family        string   Family of genome.
genus         string   Genus of genome.
species       string   Species of genome.
superkingdom  string   Superkingdom of genome.

Comparison Data

The comparison table contains all the information you see on the regular webpage, like orthologs, 16S_rRNA, and other marker genes.

Column Name                                 Type     Description
id                                          integer  A unique identifier for the genome comparison.
genome_id1, genome_id2                      string   The id of one of the two genomes in the comparison.
number_of_genes1, number_of_genes2          integer  Number of genes from the respective genome in the comparison.
orthologs_criterion1, orthologs_criterion2  integer  See the about page for more on the ortholog criteria.
Average_Amino_Acid_Identity                 float    The Average Amino Acid Identity. See the about page for more.
Genomic_Fluidity                            float    See the about page for more on Genomic Fluidity.
16S_rRNA                                    float    16S rRNA identity.
ArgS, CdsA, CoaE, etc.                      float    Identities of the other marker genes (besides 16S rRNA).
genome1_name, genome2_name                  string   The name of the genome.
genome1_phylum, genome2_phylum              string   The phylum of the genome.
genome1_class, genome2_class                string   The class of the genome.
genome1_genus, genome2_genus                string   The genus of the genome.
genome1_species, genome2_species            string   The species of the genome.
genome1_superkingdom, genome2_superkingdom  string   The superkingdom of the genome.

BLAST files

To get a tarball of BLAST files from our database, you need to query our download URL. This is done in the same way as a regular query.

The ids variable corresponds to the "id" column in the comparison table.

This example requests a tarball containing BLAST files from the comparisons with the ids 2354, 19201, and 623719.

http://pogo.ece.drexel.edu/download.php?ids=2354,19201,623719
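
The download URL can also be built from a list of comparison ids. This is a minimal sketch. The function name build_download_url is hypothetical, and the check that ids are integers is an added safeguard based on the integer type of the id column.

```python
from urllib.parse import urlencode

DOWNLOAD_URL = "http://pogo.ece.drexel.edu/download.php"

def build_download_url(comparison_ids):
    """Return the tarball download URL for the given comparison ids."""
    # int() rejects anything that is not a whole number.
    ids = [int(i) for i in comparison_ids]
    if not ids:
        raise ValueError("at least one comparison id is required")
    joined = ",".join(str(i) for i in ids)
    # safe="," leaves the commas unescaped, matching the example URL.
    return DOWNLOAD_URL + "?" + urlencode({"ids": joined}, safe=",")

url = build_download_url([2354, 19201, 623719])
print(url)
```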

Examples

Taxonomy Comparisons

Comparing genera and other taxonomic ranks is slightly more complicated because there are two different genomes in each comparison, and we can never be sure which one is categorized as genome1 and which as genome2. Therefore, you need slightly more complex statements to select properly based on taxonomy. Here is a pseudo-code where statement showing how to correctly ask for all A vs B:
 if (genome1_genus is A and genome2_genus is B)
 OR
 if (genome1_genus is B and genome2_genus is A)

 >>> Then show me the results

One Genus vs Another

http://pogo.ece.drexel.edu/query.php?type=data&where=(genome1_genus='Bacillus' and genome2_genus='Chlamydia') or (genome1_genus='Chlamydia' and genome2_genus='Bacillus')
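
The where statement in the URL above can be generated for any pair of values. The helper below is a sketch with a hypothetical name, versus, that covers both orderings of genome1 and genome2. It does not escape quote characters inside the values, since this document does not describe how the API handles them.

```python
def versus(column, a, b):
    """Build a where statement matching a-vs-b comparisons in either order.

    column is a taxonomy rank such as 'genus' or 'species'.
    """
    col1, col2 = "genome1_" + column, "genome2_" + column
    if a == b:
        # Comparing a taxon with itself needs only one ordering.
        return f"{col1}='{a}' and {col2}='{a}'"
    return (
        f"({col1}='{a}' and {col2}='{b}') or "
        f"({col1}='{b}' and {col2}='{a}')"
    )

where = versus("genus", "Bacillus", "Chlamydia")
print(where)
```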

One Genus vs Itself

http://pogo.ece.drexel.edu/query.php?type=data&where=genome1_genus='Bacillus' and genome2_genus='Bacillus'

One Species vs All Others

http://pogo.ece.drexel.edu/query.php?type=data&where=genome1_species='Haemophilus influenzae' or genome2_species='Haemophilus influenzae'

One Species vs Itself

http://pogo.ece.drexel.edu/query.php?type=data&where=genome1_species='Haemophilus influenzae' and genome2_species='Haemophilus influenzae'