uniprot-protein-retrieval
Retrieve protein sequences and functional information from UniProt database by protein name, enabling protein analysis and bioinformatics workflows.
$ /plugin install InnoClaw
UniProt Protein Sequence Retrieval
Usage
1. MCP Server Definition
Use the standard MCP client pattern to connect to the Origene-UniProt server.
2. Protein Sequence Retrieval Workflow
This workflow retrieves protein sequences and associated information from the UniProt database using protein names or identifiers.
Workflow Steps:
- Query by Protein Name - Search UniProt using common protein names
- Retrieve Sequence Data - Get amino acid sequence and metadata
Implementation:
import json
from contextlib import AsyncExitStack

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


class OrigeneClient:
    """Thin wrapper around an MCP streamable-HTTP session."""

    def __init__(self, server_url: str):
        self.server_url = server_url
        self.session = None

    async def connect(self):
        """Open the transport and initialize the MCP session."""
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": "<your-api-key>"}
            )
            # The exit stack owns both context managers so they close together.
            self._stack = AsyncExitStack()
            await self._stack.__aenter__()
            self.read, self.write, self.get_session_id = await self._stack.enter_async_context(self.transport)
            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self._stack.enter_async_context(self.session_ctx)
            await self.session.initialize()
            print("✓ Connected to Origene-UniProt")
            return True
        except Exception as e:
            print(f"✗ Connection failed: {e}")
            return False

    async def disconnect(self):
        """Disconnect from server"""
        try:
            if hasattr(self, '_stack'):
                await self._stack.aclose()
                print("✓ Disconnected")
        except Exception as e:
            print(f"✗ Disconnect error: {e}")

    def parse_result(self, result):
        """Return the first content item parsed as JSON, else str(result)."""
        try:
            if hasattr(result, 'content') and result.content:
                content = result.content[0]
                if hasattr(content, 'text'):
                    return json.loads(content.text)
            return str(result)
        except Exception as e:
            return {"error": f"Parse error: {e}", "raw": str(result)}
import asyncio


async def main():
    # Initialize client
    client = OrigeneClient("https://scp.intern-ai.org.cn/api/v1/mcp/10/Origene-UniProt")
    if not await client.connect():
        print("Connection failed")
        return

    try:
        # Step 1: Retrieve protein sequence by name
        protein_name = "insulin"  # Can be common name, gene symbol, or UniProt ID
        result = await client.session.call_tool(
            "get_protein_sequence_by_name",
            arguments={
                "protein_name": protein_name
            }
        )
        result_data = client.parse_result(result)
        if not isinstance(result_data, dict):
            # parse_result falls back to a plain string when no JSON text is returned
            print(f"Unexpected response: {result_data}")
            return

        # Display results
        print(f"\nProtein: {protein_name}")
        print("=" * 80)
        if "sequence" in result_data:
            sequence = result_data["sequence"]
            print(f"Amino Acid Sequence ({len(sequence)} residues):")
            print(sequence)
            # Format sequence in blocks of 60
            print("\nFormatted Sequence:")
            for i in range(0, len(sequence), 60):
                position = i + 1
                block = sequence[i:i+60]
                print(f"{position:6d} {block}")
        if "uniprot_id" in result_data:
            print(f"\nUniProt ID: {result_data['uniprot_id']}")
        if "protein_names" in result_data:
            print(f"Protein Names: {result_data['protein_names']}")
        if "organism" in result_data:
            print(f"Organism: {result_data['organism']}")
        if "function" in result_data:
            print(f"Function: {result_data['function'][:200]}...")
    finally:
        # Always release the connection, even if a call fails
        await client.disconnect()


asyncio.run(main())
Extended Example: Multiple Protein Retrieval
# Assumes a connected `client` and an enclosing async function, as in the workflow above.
# Retrieve multiple proteins
protein_list = ["p53", "BRCA1", "insulin", "hemoglobin"]
sequences = {}
for protein in protein_list:
    result = await client.session.call_tool(
        "get_protein_sequence_by_name",
        arguments={"protein_name": protein}
    )
    data = client.parse_result(result)
    if "sequence" in data:
        sequences[protein] = {
            "sequence": data["sequence"],
            "length": len(data["sequence"]),
            "uniprot_id": data.get("uniprot_id", "N/A")
        }

# Display summary
print("\nProtein Sequence Summary:")
print(f"{'Protein':<15} {'UniProt ID':<12} {'Length':<10}")
print("-" * 40)
for name, info in sequences.items():
    print(f"{name:<15} {info['uniprot_id']:<12} {info['length']:<10}")
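The loop above has no error handling: one failed tool call aborts the batch, and proteins without a sequence are dropped silently. The following is a minimal sketch of a more defensive variant. `fetch_many` and the `fetch` callable are hypothetical names introduced for illustration; `fetch` stands in for a small wrapper around `client.session.call_tool` and `client.parse_result`.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fetch_many(
    names: list[str],
    fetch: Callable[[str], Awaitable[Any]],
) -> tuple[dict[str, dict], dict[str, str]]:
    """Fetch each name in turn and return (found, failed).

    `fetch` is an assumed async callable that takes a protein name and
    returns the parsed tool response.
    """
    found: dict[str, dict] = {}
    failed: dict[str, str] = {}
    for name in names:
        try:
            data = await fetch(name)
        except Exception as exc:  # keep going if a single lookup fails
            failed[name] = f"call failed: {exc}"
            continue
        # The parser may return a plain string instead of a dict
        if not isinstance(data, dict) or "sequence" not in data:
            failed[name] = "no sequence in response"
            continue
        found[name] = {
            "sequence": data["sequence"],
            "length": len(data["sequence"]),
            "uniprot_id": data.get("uniprot_id", "N/A"),
        }
    return found, failed
```

Returning the failures alongside the results lets the caller report which names need a more specific identifier instead of discovering the gap later.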
Tool Description
Origene-UniProt Server:
- get_protein_sequence_by_name: Retrieve protein sequence from UniProt database
  - Args:
    - protein_name (str): Protein common name, gene symbol, or UniProt ID
  - Returns:
    - sequence (str): Amino acid sequence (one-letter code)
    - uniprot_id (str): UniProt accession number
    - protein_names (str): Official and alternative protein names
    - organism (str): Source organism
    - function (str): Protein function description
    - length (int): Sequence length in residues
    - mass (float): Molecular mass (Da)
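The documented return fields can be captured in a small typed record that also sanity-checks the sequence. This is a sketch only: `ProteinRecord` and `from_response` are hypothetical names, the field names come from the Returns list, and treating every field except `sequence` as optional is an assumption, since the source does not say which fields are always present.

```python
from dataclasses import dataclass
from typing import Optional

# 20 standard residues plus IUPAC ambiguity codes (B, Z, X, J)
# and the rare residues U (selenocysteine) and O (pyrrolysine)
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXJUO")


@dataclass
class ProteinRecord:
    sequence: str
    uniprot_id: Optional[str] = None
    protein_names: Optional[str] = None
    organism: Optional[str] = None
    function: Optional[str] = None
    length: Optional[int] = None
    mass: Optional[float] = None

    @classmethod
    def from_response(cls, data: dict) -> "ProteinRecord":
        """Build a record from a parsed tool response, validating the sequence."""
        if "sequence" not in data:
            raise ValueError("response has no 'sequence' field")
        seq = str(data["sequence"]).strip().upper()
        unexpected = set(seq) - VALID_RESIDUES
        if unexpected:
            raise ValueError(f"unexpected residue codes: {sorted(unexpected)}")
        length = data.get("length")
        if length is not None and int(length) != len(seq):
            raise ValueError("reported length does not match sequence length")
        return cls(
            sequence=seq,
            uniprot_id=data.get("uniprot_id"),
            protein_names=data.get("protein_names"),
            organism=data.get("organism"),
            function=data.get("function"),
            length=len(seq),
            mass=data.get("mass"),
        )
```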
Input/Output
Input:
- protein_name: Protein identifier (flexible format)
  - Examples: "insulin", "P53", "BRCA1", "P01308"
  - Supports: common names, gene symbols, UniProt IDs
Output:
- Protein sequence and comprehensive metadata
- Ready for downstream analysis (alignment, structure prediction, etc.)
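Most downstream tools (aligners, structure predictors) accept FASTA input, so a common first step is to serialize the response. The sketch below assumes the response is a dict with the field names listed under Tool Description. `to_fasta` is a hypothetical helper, and its header layout is a simplification, not UniProt's official FASTA header format.

```python
def to_fasta(data: dict, width: int = 60) -> str:
    """Serialize a parsed response as a single FASTA record."""
    seq = data["sequence"]
    # Header: accession first, then any descriptive fields that are present
    header_parts = [str(data.get("uniprot_id", "unknown"))]
    for key in ("protein_names", "organism"):
        if data.get(key):
            header_parts.append(str(data[key]))
    lines = [">" + " | ".join(header_parts)]
    # Wrap the sequence at `width` residues per line
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, `open("protein.fasta", "w").write(to_fasta(result_data))` would write the record to disk for use with an external tool.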
Supported Query Types
- Common Names: "insulin", "hemoglobin", "actin"
- Gene Symbols: "TP53", "BRCA1", "EGFR"
- UniProt IDs: "P01308", "P04637"
- Protein Families: "kinase", "protease" (returns multiple entries)
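Because accession lookups are unambiguous while name and family queries may match several entries, it can help to check which kind of query is about to be sent. The sketch below uses the accession-number pattern published in the UniProt help pages. `looks_like_accession` is a hypothetical helper, and how the server itself distinguishes query types is not stated here.

```python
import re

# Accession pattern as documented by UniProt (6- or 10-character accessions)
_ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_accession(query: str) -> bool:
    """Return True if the query has the shape of a UniProt accession."""
    return _ACCESSION_RE.fullmatch(query.strip().upper()) is not None
```

A caller might log a warning when this returns False, since the result may then be one of several matching entries.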
Applications
Use retrieved sequences for:
- Protein alignment and homology analysis
- Structure prediction (AlphaFold, ESM Fold)
- Primer design for cloning
- Antibody epitope mapping
- Conservation analysis
- Mutation impact assessment
- Phylogenetic studies
Integration with Other Workflows
Combine with:
- Protein BLAST → Find homologs
- InterProScan → Identify domains
- AlphaFold → Predict 3D structure
- STRING → Find protein interactions
- OpenTargets → Link to diseases
Example: Complete Protein Analysis Pipeline
# Assumes uniprot_client, biotools_client and opentargets_client are connected
# OrigeneClient instances for their respective servers, inside an async function.

# 1. Retrieve sequence
result = await uniprot_client.session.call_tool(
    "get_protein_sequence_by_name",
    arguments={"protein_name": "BRCA1"}
)
sequence = uniprot_client.parse_result(result)["sequence"]

# 2. Find similar proteins (BLAST)
result = await biotools_client.session.call_tool(
    "blast_search",
    arguments={
        "sequence": sequence,
        "evalue": 1e-10,
        "max_hits": 20
    }
)
homologs = biotools_client.parse_result(result)

# 3. Identify domains (InterProScan)
result = await biotools_client.session.call_tool(
    "interproscan_analyze",
    arguments={
        "sequence": sequence,
        "databases": ["Pfam", "SMART"]
    }
)
domains = biotools_client.parse_result(result)

# 4. Get disease associations (OpenTargets)
result = await opentargets_client.session.call_tool(
    "get_target_associated_diseases",
    arguments={"gene_symbol": "BRCA1"}
)
diseases = opentargets_client.parse_result(result)

print("Complete analysis for BRCA1:")
print(f"- Sequence length: {len(sequence)} amino acids")
print(f"- Homologs found: {len(homologs)}")
print(f"- Functional domains: {len(domains)}")
print(f"- Associated diseases: {len(diseases)}")
Error Handling
Common issues:
- Protein not found: Check spelling, try alternative names or UniProt ID
- Multiple matches: Use more specific identifier (UniProt ID preferred)
- No sequence available: Some entries may lack sequence data
- Network timeout: Retry with exponential backoff
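The retry advice for network timeouts can be sketched as a small async helper. `call_with_backoff` is a hypothetical name, and the set of exception types worth retrying is an assumption: the exceptions the MCP client actually raises on a timeout are not documented here, so adjust `retry_on` to what you observe.

```python
import asyncio
import random


async def call_with_backoff(
    make_call,
    retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple = (asyncio.TimeoutError, TimeoutError, ConnectionError, OSError),
):
    """Await make_call(), retrying with exponentially growing delays.

    `make_call` is a zero-argument callable returning a fresh awaitable,
    because a coroutine object cannot be awaited twice.
    """
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter avoids synchronized retries from many clients
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

Usage might look like `await call_with_backoff(lambda: client.session.call_tool("get_protein_sequence_by_name", arguments={"protein_name": "insulin"}))`.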
Data Quality Notes
- UniProtKB has two sections: Swiss-Prot (manually curated and reviewed) and TrEMBL (computationally annotated, unreviewed)
- Sequence quality: Swiss-Prot entries are highly reliable
- Updates: UniProt is updated regularly; sequences may change
- Isoforms: Multiple isoforms may exist; canonical sequence is returned by default
Technical
- GitHub: SpectrAI-Initiative/InnoClaw
- Stars: 374
- License: Apache-2.0
- Contributors: 16
- Last commit: 2026-04-20T01:27:21Z
- File: .claude/skills/uniprot-protein-retrieval/SKILL.md