E-value Cutoff Threshold for Homolog Searching

The BLASTP E-value cutoff applied when searching for homologs of the query proteins A and B.

Joint E-value

The joint E-value is a quantitative measure of the similarity between two protein pairs. Following previous work (Matthews et al., 2001; Yu et al., 2004), we define the joint sequence similarity as

where EA is the E-value between protein A and its homolog A1', and EB is the E-value between protein B and its homolog B1'.
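As a minimal sketch of how EA and EB could be combined, the snippet below assumes the joint E-value is the geometric mean of the two E-values, in the spirit of the joint similarity measures of Yu et al. (2004). That form, and the function name `joint_e_value`, are assumptions made for illustration and are not taken from this page.

```python
import math


def joint_e_value(e_a: float, e_b: float) -> float:
    """Combine two BLASTP E-values into one joint score.

    Assumption: the joint E-value is the geometric mean sqrt(e_a * e_b).
    """
    if e_a < 0 or e_b < 0:
        raise ValueError("E-values must be non-negative")
    # sqrt(e_a) * sqrt(e_b) equals sqrt(e_a * e_b) mathematically, but it
    # avoids floating-point underflow when both E-values are extremely small.
    return math.sqrt(e_a) * math.sqrt(e_b)


# E-value of A vs. A1' and of B vs. B1' (made-up numbers)
print(joint_e_value(1e-40, 1e-60))
```

Under this assumption, a pair of homologs scores well only when both A1' and B1' are close matches, since one weak E-value pulls the geometric mean up.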

Number of Interactions in Each Species (Ranked by Joint E-value)

Users can choose the maximum number of homologous protein-protein interactions (PPIs) returned for each species. For example, if the number is "three", the server returns the top three PPIs of each species, ranked by the joint E-value of each interaction.
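The per-species ranking can be sketched as follows. The input layout (tuples of species, PPI identifier, joint E-value) and the function name are hypothetical, and the sketch assumes that a smaller joint E-value ranks higher, as is usual for E-values.

```python
from collections import defaultdict


def top_ppis_per_species(hits, max_per_species):
    """Return the best-ranked PPIs for each species.

    hits: iterable of (species, ppi_id, joint_e_value) tuples (hypothetical layout).
    """
    by_species = defaultdict(list)
    for species, ppi_id, joint_e in hits:
        by_species[species].append((joint_e, ppi_id))
    # Assumption: a smaller joint E-value means a better hit, so sort ascending
    # and keep the first max_per_species entries.
    return {
        species: [ppi_id for _, ppi_id in sorted(entries)[:max_per_species]]
        for species, entries in by_species.items()
    }


hits = [
    ("human", "P1-P2", 1e-80),
    ("human", "P3-P4", 1e-50),
    ("human", "P5-P6", 1e-120),
    ("human", "P7-P8", 1e-45),
    ("fly", "F1-F2", 1e-60),
]
print(top_ppis_per_species(hits, 3))
# {'human': ['P5-P6', 'P1-P2', 'P3-P4'], 'fly': ['F1-F2']}
```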

The conservation ratio (CRD) of a domain-domain pair

A query protein pair and its homologous PPIs often share the same interacting domain-domain pairs (DDPs). To measure this agreement for each DDP in a PPI family, we define the conservation ratio (CRDp) of a DDP p in the homologous PPIs of a query protein pair i as

To statistically evaluate the transferability of DDPs between a query and its homologous PPIs, this study defines the shared ratio (SRD) of DDPs using CRDp, with 103,762 annotated PPIs serving as query protein pairs. The SRD of DDPs for a given ratio threshold c is

where Q is the set of annotated PPIs in the databases (here, Q contains 103,762 PPIs); i is a query protein pair; di(CRDp > c) is the number of DDPs with CRDp values exceeding c that are shared by query i and its homologous PPIs; and Di(CRDp > c) is the total number of DDPs with CRDp > c derived from the homologous PPIs of query i. The threshold c of CRDp was determined statistically (the default value of c is 0.6) so as to yield reliable DDP annotations with an acceptable level of Di. Note that CRDp is computed from a single query protein pair, whereas SRD is computed from a set of queries.
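To make the roles of di and Di concrete, the sketch below pools the two counts over all queries in Q and divides the pooled di by the pooled Di. Reading SRD as this ratio of sums is an assumption drawn from the definitions above, and the function name, input layout, and zero-denominator convention are hypothetical.

```python
def shared_ratio(counts):
    """Pooled shared ratio over a set of queries.

    counts: iterable of (shared, total) pairs, one per query protein pair i,
            where shared plays the role of d_i(CRDp > c) and total the role
            of D_i(CRDp > c).
    Assumption: SRD(c) = sum of d_i over Q / sum of D_i over Q.
    """
    shared_sum = 0
    total_sum = 0
    for shared, total in counts:
        if shared > total:
            raise ValueError("shared count cannot exceed total count")
        shared_sum += shared
        total_sum += total
    if total_sum == 0:
        # No DDP passes the threshold for any query; returning 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    return shared_sum / total_sum


# Three queries with made-up counts at some threshold c
print(shared_ratio([(3, 4), (1, 2), (0, 2)]))  # 0.5
```

Raising c generally shrinks Di, which is the trade-off behind choosing a threshold that keeps annotations reliable while leaving an acceptable number of DDPs.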

The conservation ratio (CRF) of a molecular function term pair

The members of a PPI family often have similar molecular functions. PPISearch utilizes the molecular function (MF) terms of Gene Ontology to annotate the functions of a query protein pair. The conservation ratio (CRFm) of an MF term pair (MFP) m in homologous PPIs of a query i is used to measure the agreement and is defined as

Additionally, the shared ratio of MFPs (SRF), which is statistically derived from 106,997 annotated queries, is used to estimate the transferability of conserved function pairs shared by the query and its homologous PPIs. The SRF for a given ratio threshold k is defined as

where Q is the set of annotated PPIs in the databases; i is a query protein pair; fi(CRFm > k) is the number of MFPs with CRFm values exceeding k that are shared by query i and its homologous PPIs; and Fi(CRFm > k) is the total number of MFPs with CRFm > k derived from the homologous PPIs of query i. The default value of k is 0.6.
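For a single query i, the two counts fi and Fi can be obtained by thresholding the CRFm values and intersecting with the query's own MFPs. The sketch below takes the CRFm values as given input; the function name, the dictionary layout, and the use of ordered tuples of GO identifiers as pair keys are illustrative assumptions.

```python
def count_conserved_pairs(crf_by_pair, query_pairs, k=0.6):
    """Return (f_i, F_i) for one query protein pair.

    crf_by_pair: dict mapping an MF term pair (derived from the homologous
                 PPIs of the query) to its CRFm value (hypothetical layout).
    query_pairs: MF term pairs annotated on the query protein pair itself.
    k:           threshold on CRFm (the default here mirrors the page's 0.6).
    """
    # F_i: all MFPs from the homologous PPIs whose CRFm exceeds k.
    conserved = {pair for pair, crf in crf_by_pair.items() if crf > k}
    # f_i: those conserved MFPs that the query also carries.
    shared = conserved & set(query_pairs)
    return len(shared), len(conserved)


crf_by_pair = {
    ("GO:0005515", "GO:0004672"): 0.9,
    ("GO:0005515", "GO:0003677"): 0.7,
    ("GO:0003677", "GO:0003677"): 0.3,
}
query_pairs = {("GO:0005515", "GO:0004672")}
print(count_conserved_pairs(crf_by_pair, query_pairs))  # (1, 2)
```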

Choose Output Species

We list seven model organisms with high-quality protein-protein interaction data, including Homo sapiens (human) and Drosophila melanogaster (fruit fly). After users choose one or more species, the web server presents only the homologous PPIs in the selected species. The choice "All" means no species limitation. In total, the annotated database used by this server contains 290,137 PPIs (54,422 proteins from 576 species).

Users interested in PPIs from one or more specific species can enter the corresponding NCBI taxonomy ID(s) to filter the search results.
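The species filter amounts to keeping only those hits whose taxonomy ID is in the requested set. This is a minimal sketch: the record layout with a `taxid` key and the function name are hypothetical, and passing no IDs is treated as the "All" choice.

```python
def filter_by_taxonomy(hits, taxonomy_ids=None):
    """Keep only hits from the requested species.

    hits:         list of dicts, each with a "taxid" key holding an NCBI
                  taxonomy ID as an int (hypothetical record layout).
    taxonomy_ids: iterable of NCBI taxonomy IDs; None or empty means no
                  species limitation.
    """
    if not taxonomy_ids:
        return list(hits)
    wanted = {int(t) for t in taxonomy_ids}
    return [hit for hit in hits if hit["taxid"] in wanted]


hits = [
    {"ppi": "P1-P2", "taxid": 9606},   # Homo sapiens
    {"ppi": "F1-F2", "taxid": 7227},   # Drosophila melanogaster
    {"ppi": "M1-M2", "taxid": 10090},  # Mus musculus
]
print(filter_by_taxonomy(hits, ["9606", "7227"]))
```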