
ܹ ƹ̳ 罽 Ǿ ü̴. ڿ迡 ܹ µ ̴ ƹ̳ 20̸, ̵ ƹ̳ 1 ܹ ̶ Ѵ. 20 ƹ̳ θ 20 ǥ Ƿ, (A,C,D,E,...W,V,Y), Ư ܹ ڿ ǥ ִ(: AVHFGYTRRH). ˷ ܹ ʸ ̸, ̸ŭ پ ܹ ϴ ̴. ܹ ü ̷ ⺻ ̱⵵ , ٰ ü ȿ, ȣ, ü Ͽ Ȱ ϴ ̴. ΰ 츸 ϴ ü 10 ܹ ȣ ۿ Ͽ Ű ִ.
ֱ ΰ صǰ , ̿ õ û ִ. ϼ ǹ̴ ȿ ܹ ƹ̳ ִٴ ִ. ȿ ؿ ܹ ռǸ, ȿ 3 󰳷 Ǵµ, ̸ ܹ (native structure) θ. ܹ 3 󰳴 ܹ ϱ , ܹ ϱ ʼ ̸, ̴ ܹ ƹ̳ ̷ ؼ Ƽ Ǵ ߿ Ư¡̴. ڿ ܹ , ٺ Ǵ ܹ ɿ . ܹ 󰳿 , Ѱ谡 . ܹ κ ܹ 󰳿 س ̾߸ ó ٽ ִ.
ó ޼ ϰ ִ , ͺ̽ , ǻ α׷ ȰϿ, ڻ ͸ , мϰ, ̷κ  ο ϰ ϴ й̴. п, 󰳿 𸣴 ܹ ̹ 󰳳 ˰ ִ ܹ ν ϰ ִ. ܹ ſ ϸ 󿡼 ȭ ̶ ߷ ְ, 󰳿 ɵ ̶ ߷ Ƿ, ܹ 񱳴 ܹ 󰳳 ̸, ߿ϰ ٽ о̴.
ܹ 񱳴 κ ߴٰ ɸŭ ϴ ̴. ̸ 켱 κ ȭ Ȥ Ȯ Ѵ. ܹ AQVT, AET ִ , ȭ 迡 غ ִµ, ̵ ǥ ִ. ǥ ̶ θ.
AQVT    AQVT    AQVT    ...
AET-    AE-T    A-ET
ִ ̶ , ù ° A ȭ Q E, V T ٲ T ̸ ݸ, ° A T ״ ְ Q E, V ̸ ϰ ִ. - ǥõ κ ƴ(gap)̶ Ѵ. ̷ Ȯ Ͽ ̵ Ȯ ãƾ Ѵ. ν, ġ Ư ƹ̳ ٸ Ư ƹ̳ ٲ Ȯ ٸ ġ Ͼ ϵ ̰, ġ ϴٴ ִ. Ư Ȯ Ư ƹ̳ ٸ ƹ̳ ٲ Ȯ, ƹ̳ ų Ȯ κ ִ. 谡 ִ ġ ƹ̳ a b Ÿ Ȯ ϰ, ƹ̳ a Ÿ Ȯ , g ƴ Ȯ . , ǥ , Ŀ 谡 Ȯ ƹ 谡 Ȯ ǥ ִ. ⼭ i gap ƴ ġ ǥϰ, j ƴ ǥ, ƴ ̸ ǹѴ. (odds ratio) Ҹ ũ Ŭ ȭ Ǿ Ȯ ũ. , ̶  ū ( ׸ ) ȭ 踦 شٰ ִ.
, ޾ Ÿ ǻ underflow ϱ Ͽ, α׸ Ͽ ȯϰ ȴ. Ϲ α (log odds ratio) Ҹ , Ŀ (score) θ.
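The underflow point above can be illustrated with a small sketch. It assumes the usual log-odds form, score = log(p_ab / (q_a q_b)), where p_ab is the probability of seeing residues a and b aligned in related sequences and q_a, q_b are their background probabilities. The function names and the numbers are illustrative only.

```python
import math

def log_odds_score(p_ab, q_a, q_b):
    """Log odds ratio for one aligned pair (natural log; the base is a convention)."""
    return math.log(p_ab / (q_a * q_b))

def product_of_odds(odds):
    """Multiply per-position odds ratios directly (prone to underflow)."""
    result = 1.0
    for value in odds:
        result *= value
    return result

def sum_of_log_odds(odds):
    """Add per-position log odds instead of multiplying the odds."""
    return sum(math.log(value) for value in odds)

# 100 positions with very small odds: the product underflows to 0.0,
# while the sum of logarithms stays a finite, comparable number.
tiny = [1e-5] * 100
underflowed = product_of_odds(tiny)
log_total = sum_of_log_odds(tiny)
```

Because the logarithm is monotonic, ranking alignments by the summed score gives the same order as ranking them by the product of odds ratios.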
÷ µ, gap κ ƹ̳ ֿ ̸ ٲ (substitution matrix)̶ θ, (score matrix) θ. (>0) ƴ (gap penalty) Ҹ. Ѵٴ score ãƳٴ ̸, ̴ ȭ ( ׸ ) Ȯ ãƳٴ ǹѴ. İ ƴ Ἥ ּ ȿ ã, dynamic programming̶ ˰ ߵǾ ִ.[] ּ Ͽ ؿ ̴ 谡 ֵȴ. , ̾ , 쿬 ŭ ũ Ȯ Ͽ, Ȯ ʹ ϸ ִٰ ǰ, ׷ Ѵ. Ȯ ŭ ϴٰ ϴ° (siginificance level)̶ ϴµ, ϰ , ŭ Ŭ ִٰ ° ϴ ΰ(threshold value) ǰ ȴ. Ư ˷ ܹ 󰳳 µ ſ ϴ. ̳ 󰳸 ˰ ϴ ܹ ̸ (query sequence) θ. ͺ̽ ִ ܹ Ͽ , ִٰ ŭ Ű ܹ ߰ߵǸ, ׵ 󰳳 󰳿 ߷ ֱ . ׷ ͺ̽ ˻ dynamic programming ϴ ִ. ͺ̽ ܹ ϰ Ǵ ̹Ƿ, ͺ̽ ŭ ݺϰ ȴ. ͺ̽ ִ ܹ ص ̸, ˷ ܹ 쿡 ʸ 鸸 ̸Ƿ, ּ Ȯ ã dynamic programming ð ʹ ɸ ȴ. ּ ƴϴ ð ȿ ã BLAST ˰ ͺ̽ ˻ θ ̰ ִ.
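As a minimal sketch of the dynamic programming idea mentioned above, the following computes the best global alignment score of two short sequences. The match, mismatch and gap values are placeholder assumptions, not the substitution matrix or gap penalty of any real tool, and the function name is hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=1):
    """Best global alignment score by dynamic programming (Needleman-Wunsch style).

    `gap` is a positive penalty that is subtracted for every gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # best[i][j] holds the best score for aligning a[:i] with b[:j].
    best = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        best[i][0] = -gap * i          # a[:i] aligned against gaps only
    for j in range(1, cols):
        best[0][j] = -gap * j          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            best[i][j] = max(
                best[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                best[i - 1][j] - gap,       # gap in b
                best[i][j - 1] - gap,       # gap in a
            )
    return best[-1][-1]

score = global_alignment_score("AQVT", "AET")
```

The table has (len(a)+1) x (len(b)+1) cells, so the cost grows with the product of the two lengths, which is why repeating this against every database sequence becomes slow.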
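BLAST (Basic Local Alignment Search Tool) [1] is the most widely used sequence search tool in bioinformatics. Given a nucleic acid (DNA or RNA) or protein sequence whose function or structure is unknown, it finds sequences similar to that query in a sequence database. The function or structure of the query can then be inferred from the similar sequences found in the database. BLAST is the most widely used method for searching sequence databases.
There are two ways to use BLAST. The first is to run BLAST searches through the NCBI site in the United States (http://www.ncbi.nlm.nih.gov/blast/). When there are many sequences to analyze, it is far more efficient to install BLAST on a personal (or local) computer and search there. The latest BLAST executables can be obtained from the NCBI site (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/). To run BLAST on a personal (or local) computer, the sequence databases must also be downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/blast/db/). For protein sequences, download nr.tar.gz (nr stands for non-redundant; it is a very large collection of protein sequences). To work only with proteins whose structure is already known (or that appear in patent applications), download pdb.tar.gz (pataa.tar.gz).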
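Table 1: The five programs of blastall
Among the BLAST executables, the one used most often is blastall. blastall includes five different programs: blastn, blastp, blastx, tblastn, and tblastx. A general introduction to BLAST is given in [2]. blastall is used as follows.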
blastall -p blastp -d nr -i query
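Here, -p names the program to run (one of the five). -d gives the name of the database to search (nr means the non-redundant protein database). -i gives the name of the file that holds the query sequence in FASTA format (the file name is arbitrary, so any name can be used).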
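The command line above can be assembled programmatically when many queries have to be processed. This is a small sketch that only builds the argument list from the three options described in the text; the function name is hypothetical, and actually running the command (for example with `subprocess.run`) assumes a legacy BLAST installation that provides `blastall`.

```python
BLASTALL_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")

def blastall_command(program, database, query_file):
    """Build the blastall argument list using the -p, -d and -i options."""
    if program not in BLASTALL_PROGRAMS:
        raise ValueError(f"unknown blastall program: {program}")
    return ["blastall", "-p", program, "-d", database, "-i", query_file]

command = blastall_command("blastp", "nr", "query")
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.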
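BLAST uses a scoring matrix to find related sequences, and for protein sequences BLOSUM62 is the default (see Table 2). BLOSUM stands for BLOcks SUbstitution Matrix; it expresses how readily one amino acid is replaced by another (in other words, the similarity between amino acids) [3]. A positive score in a BLOSUM matrix marks a substitution that occurs readily, and a negative score marks one that rarely occurs. A score of 0 carries no special meaning: the substitution occurs about as often as expected by chance. Sequences that align with a high total score can be regarded as homologous. In BLOSUM62 (Table 2) the highest score is 11 (there is only one such case: tryptophan (W) aligned with itself), and the lowest is minus 4 (there are 8 such cases). The minus 4 cases are tryptophan (W) replaced by asparagine (N), aspartic acid (D), or proline (P); glycine (G) replaced by isoleucine (I) or leucine (L); aspartic acid (D) replaced by leucine (L); glutamic acid (E) replaced by cysteine (C); and proline (P) replaced by phenylalanine (F).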
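The BLOSUM62 values quoted above can be kept in a small lookup table. The sketch below contains only the entries mentioned in the text, not the full 20 x 20 matrix, and the names are illustrative.

```python
# Only the BLOSUM62 entries quoted in the text; the full matrix has 20 x 20 entries.
QUOTED_BLOSUM62 = {
    ("W", "W"): 11,
    ("W", "N"): -4, ("W", "D"): -4, ("W", "P"): -4,
    ("G", "I"): -4, ("G", "L"): -4,
    ("D", "L"): -4, ("E", "C"): -4, ("P", "F"): -4,
}

def quoted_score(a, b):
    """Look up a substitution score; the matrix is symmetric, so order is ignored."""
    if (a, b) in QUOTED_BLOSUM62:
        return QUOTED_BLOSUM62[(a, b)]
    return QUOTED_BLOSUM62[(b, a)]  # raises KeyError for pairs not quoted in the text

lowest_pairs = [pair for pair, s in QUOTED_BLOSUM62.items() if s == -4]
```

Storing each unordered pair once and checking both orders keeps the table half the size of a full symmetric matrix.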
BLOSUM62 (scoring matrix)
ϹǷ (pairwise alignment) Ҹ ݸ, ̻ ϴ ̸ (multiple sequence alignment)̶ θ. ĺ ִµ, Ư ȭ ʴ ̸ (conserved region) ִ. ܹ ɿ ߿ ϴ κ ʱ ̿ Ǵ ſ ߿ϴ. Ϲ, ܹ ġ ƹ̳ ٸ ƹ̳ ٲ Ȯ ִ. ó ϱ ġ ƹ̳ ٸ ƹ̳ ٲ Ȯ ġ ϴٴ Ἥ ݸ ƹ̳ ġȯ Ȯ ġ ٸ , ̸ ġ ٸ (position specific scoring matrix: PSSM), Ȥ (sequence profile)̶ θ. ̷ PSSM ٽ Ŀ ִ.
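PSI-BLAST, an extension of BLAST, is a tool that carries out this kind of work. PSI-BLAST first searches the database with the ordinary BLAST algorithm, using a scoring matrix that does not depend on position. It picks out the sequences that score above a threshold, aligns them together, and builds a PSSM from the alignment. It then searches the database again with this PSSM and picks out the proteins that score above the threshold. The newly found proteins are ones that diverged from the query over a longer evolutionary time and were therefore missed by the first search. These proteins can be aligned again to build a new PSSM, which is then used to search the database once more.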
ҵ,BLAST ˻ ̴ Ŀ ƹ̳ ٲ ɼ ׻ Ǿ ִ ݸ, BLAST Ἥ ˻Ͽ ̵ ϰ , ġ ƹ̳ ٸ ƹ̳ ٲ 󵵸 ְ, ̷κ, ġ ƹ̳ ٸ ƹ̳ ٲ Ȯ ִ. ̴ ˻ ؼ ó ̷ ̴. ο Ἥ ˻ ۾ ٽ ְ, 󵿼 ؼ ˻ ʾҴ ο ãƳ ִ. ˻ κ ƹ̳ Ȯ ٽ ְ, ο ٽ ˻ ִ. PSI-BLAST(Position Specific Iterative Basic Local Alignment Search Tool) ̷ ư鼭 BLAST ˻ ۾ Ǯϴ ˰̴.
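The first PSI-BLAST search is the same as a BLAST search with the position-independent BLOSUM62 matrix. A PSSM is built from the results of the first search. The second PSI-BLAST search uses this PSSM and produces a new PSSM from its results. Repeating (iterating) this process yields a progressively more accurate PSSM.
To run a PSI-BLAST search, use blastpgp, one of the BLAST executables. blastpgp is used as follows.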
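The iteration described above can be sketched as a loop in which each round searches with the profile built in the previous round. This is only a structural sketch: `search` and `build_profile` are hypothetical callables supplied by the caller, and the stopping rule (no new sequences found) is an assumption rather than PSI-BLAST's exact criterion.

```python
def iterate_search(query, search, build_profile, max_rounds=3):
    """Repeat search -> profile -> search, starting without a profile."""
    profile = None   # round 1: plain, position-independent scoring
    found = set()
    for _ in range(max_rounds):
        hits = set(search(query, profile))
        if hits <= found:        # nothing new was found: stop early
            break
        found |= hits
        profile = build_profile(query, sorted(found))
    return found, profile

# Toy stand-ins: the profile-based search finds one extra, more distant sequence.
def toy_search(query, profile):
    return {"close_homolog"} if profile is None else {"close_homolog", "remote_homolog"}

def toy_build_profile(query, hits):
    return tuple(hits)

found, profile = iterate_search("AVHFGYTRRH", toy_search, toy_build_profile)
```

The `max_rounds` argument plays the same role as the iteration count passed to the search tool.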
blastpgp -d nr -i query -j 3 -h 0.001 -C query.chk
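Here, -d and -i have the same meaning as in blastall. -j is the number of PSI-BLAST search iterations. -h is the threshold for including a sequence in the PSSM; -h 0.001 is used as the standard (default) value. -C saves the frequency matrix (FMTX), which records how often each amino acid occurs, to a file (query.chk). The file is written in binary form; makemat, another of the BLAST executables, converts it into a PSSM and at the same time saves it as an ASCII file.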
PSI-BLAST ˻ Ͽ ȭ 谡 ִ , ˻ Ŀ, ٽ ̵ ϰ , ܹ  ġ ƹ̳ ٸ ƹ̳ ٲ Ȯ ´. ġ 20 ƹ̳ ٲ Ȯ Ƿ, ġ 0 1 20 Ǽ , ̵ ϸ 1 ȴ. checkpoint file Ǹ, ӵ о ֵ · Ǿ ִ. ̷ (Frequency Matrix:FMTX) ( ) x 20 ũ⸦ ķμ(׸1 ), ġ ƹ̳ Ȯ ̶ ִ. Ŀ ü ȭ ֱ , ̸ ٸ (Sequence profile)̶ θ.
(Figure)
̹ Ͽ ˻ϱ ؼ Ȯ ƴ϶, ƹ̳ 󵵷 Ͽ α׸ · ȯ ־ Ѵ. BLAST ϳ makemat ٷ ۾ ϴ ̸, ٷ PSSM̴. ͺ̽ 3 󰳰 ˷ 50557 ܹ , · Ǿִ checkpoint file(Ȯ .chk) ASCII (Ȯ .fmtx) · ϸ, makemat Ͽ ̵κ ۼ PSSM (Ȯ .mtx) ϰ ִ.
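The conversion from amino acid frequencies to log-form scores can be illustrated as follows. This is a sketch under stated assumptions, not a reproduction of what makemat does: it uses a uniform background of 1/20, a small pseudocount to avoid log(0), and base-2 logarithms without scaling. All names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequencies_to_scores(freq_row, background=None, pseudocount=1e-3):
    """Turn one position's 20 frequencies (summing to 1) into log-odds scores."""
    if background is None:
        background = [1.0 / 20] * 20          # assumed uniform background
    smoothed = [f + pseudocount for f in freq_row]
    total = sum(smoothed)
    return [math.log2((f / total) / b) for f, b in zip(smoothed, background)]

# A position where only alanine is ever observed.
conserved_row = [1.0 if aa == "A" else 0.0 for aa in AMINO_ACIDS]
conserved_scores = frequencies_to_scores(conserved_row)
uniform_scores = frequencies_to_scores([1.0 / 20] * 20)
```

A strongly conserved position gives a large positive score for the observed residue and negative scores for the rest, while a position with no preference scores close to zero everywhere.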
PSSM Ἥ ͺ̽ ϳ ϴ , query protein profile ͺ̽ ϴ ̶ ִµ, ̸ profile-sequence alignment Ѵ. ̹ , profile-sequence alignment ܼ sequence-sequence alignment ؼ ãƳ ϴ ܹ ô 踦 ˾Ƴ ִ. ó ΰ ̱ ϴ Database˻ profile Ŀ profile ϴ ̸ profile-profile alignment θ ̰ ִ. ó ˻ Ȱϱ ؼ PSSM · Ǿ ִ ϴ.
ҵ 󱼿 ȭ ֱ , ˻ ܿ Ȱ о߰ پϸ, ü Ұϰ Ѵ.
ܹ (primary structure; DNAκ Ư ܹ ƹ̳ )κ ܹ (secondary structure) ϴ ܹ ̶ Ѵ. (tertiary structure) ߿ ϰ, 迡 ū ش.
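The standard for classifying protein secondary structure is DSSP (dictionary of protein secondary structure) [1]. DSSP also serves as the standard for solvent accessibility, so anyone studying protein structure should be familiar with it. The DSSP program is available at http://www.cmbi.kun.nl/gv/dssp/. Given a protein of known structure, the DSSP program reports its secondary structure and solvent accessibility. DSSP classifies secondary structure into eight states (G: 3-10 helix, H: alpha helix, I: pi helix, B: residue in isolated beta-bridge, E: extended strand, participates in beta ladder, T: hydrogen-bonded turn, S: bend, X: otherwise). For secondary structure prediction, these eight states are usually reduced to three (H: helix, E: extended, C: coil) as follows: G, H, and I become H; B and E become E; T, S, and X become C.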
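The eight-to-three reduction listed above is a simple character mapping. A minimal sketch follows; the function name is illustrative.

```python
# DSSP eight-state codes reduced to three states, as listed in the text:
# G, H, I -> H (helix); B, E -> E (extended); T, S, X -> C (coil).
DSSP_TO_THREE = {
    "G": "H", "H": "H", "I": "H",
    "B": "E", "E": "E",
    "T": "C", "S": "C", "X": "C",
}

def reduce_dssp(states):
    """Convert a string of DSSP eight-state codes into three-state codes."""
    return "".join(DSSP_TO_THREE[s] for s in states)

reduced = reduce_dssp("GHIBETSX")
```

An unknown code raises `KeyError`, which makes unexpected symbols in the input visible instead of silently mapping them to coil.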

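The most widely used protein secondary structure prediction program is PSIPRED [2]. Its prediction accuracy is reported to be about 80%. PSIPRED can be used through the PSIPRED Protein Structure Prediction Server (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html) [3]. Entering the amino acid sequence of a protein at this site returns its predicted secondary structure. When many sequences have to be predicted, it is far more efficient to install PSIPRED on a personal (or local) computer. The PSIPRED program can be obtained from "ftp://bioinf.cs.ucl.ac.uk/pub/psipred/". PSI-BLAST must be installed before PSIPRED, because PSIPRED depends on it.
The input to PSIPRED is the position-specific scoring matrix (PSSM for short) produced by PSI-BLAST. A PSSM is an n×20 matrix, where n is the length of the protein sequence being analyzed and 20 stands for the 20 different amino acids. PSIPRED cuts the PSSM into pieces. Each piece has size 15×20 and is called an input vector. Here 15 is the window size, and it does not have to be 15. David T. Jones, the developer of PSIPRED, tried several window sizes and reported that 15 was the most effective. The prediction method used in PSIPRED is a neural network, a representative method in pattern recognition and machine learning.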
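Cutting an n×20 PSSM into 15×20 input vectors can be sketched as a sliding window centred on each residue. Padding the ends with rows of zeros is an assumption made here so that every residue gets a full window; it is not necessarily how PSIPRED handles the sequence ends. The function name is illustrative.

```python
def window_vectors(pssm, window=15):
    """Return one flattened window (window x 20 values) per sequence position."""
    half = window // 2
    zero_row = [0.0] * 20                      # assumed padding beyond the ends
    padded = [zero_row] * half + [list(row) for row in pssm] + [zero_row] * half
    vectors = []
    for center in range(len(pssm)):
        rows = padded[center:center + window]  # rows center-half .. center+half
        vectors.append([value for row in rows for value in row])
    return vectors

# Toy PSSM for a 30-residue protein: row i is filled with the value i + 1.
toy_pssm = [[float(i + 1)] * 20 for i in range(30)]
vectors = window_vectors(toy_pssm)
```

Each vector has 15 × 20 = 300 values, and the 20 values belonging to the central residue start at offset 7 × 20 = 140.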
Ű Ʒ(training) ؼ Ű Ű(parameters) Ѵ. Ʒ Ű ˷ ܹ ̿Ѵ. 15×20 Ҹ Է Ϳ ִµ, 15 ƹ̳ ߿  ġϰ ִ ° ƹ̳ (DSSP ؼ ) Է ش. Ʒ ̶ Է ͸ Ű ־ Է Ű ǵ Ű Ű ִ ̴. ʹ ϰ Ʒϸ Ʒÿ ܹ Ȯϰ ߰ ǹǷ, Ʒ ־ Ѵ. Ʒ ̾߸ Ű κ̰, Ʒÿ ð ҿȴ.
Ű Ʒÿ ð ҿ, ϴ Ʒ ȴ. ˷ ƹ̳ ־ PSIPRED ϴ (PSI-BLAST ̿Ͽ) PSSM . PSSMκ Է ͵  Ű ϴ ȴ. PSIPRED κ ð PSSM µ ҿѴ. ̷ ÿ Ȯ . PSIPRED ̴.
ܹ Ű ̿ܿ νİ н ˷ (support vector machine) ̿(nearest neighbor) Ǿ Դ. ̿ ܹ ߿ ǥ SVMpsi̴ [5]. SVMpsi PSIPRED Ǿ Է ͸ Ȱ Ѵ. Ʒ ð Ű ؼ ҿȴٴ ̴.
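Neural networks and support vector machines need training, and training is time-consuming and difficult. Pattern recognition methods that need no training are also used, and the nearest neighbor method is the representative example. The representative secondary structure prediction method based on nearest neighbors is PREDICT [6,7]. The input vectors used by PREDICT are the same as those used by PSIPRED and SVMpsi. The nearest neighbor method computes the distances between an input vector of unknown secondary structure and the input vectors of known secondary structure, and picks out the known vectors that are closest. Suppose the 100 closest input vectors are picked out, and their secondary structures are examined. If 20 of them are helix, 50 are extended, and 30 are coil, the secondary structure of the unknown input vector is predicted to be extended. In other words, the central amino acid of the 15×20 input vector is predicted to be extended.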
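The voting step in the example above (20 helix, 50 extended, 30 coil giving extended) can be sketched as a k-nearest-neighbor classifier. Euclidean distance and simple majority voting are assumptions made for illustration; the distance measure and weighting used by PREDICT are not reproduced here, and the names are hypothetical.

```python
import math
from collections import Counter

def majority_state(labels):
    """Return the most frequent secondary structure label."""
    return Counter(labels).most_common(1)[0][0]

def predict_state(query, labelled_vectors, k=100):
    """Predict a label from the k labelled vectors closest to the query."""
    nearest = sorted(labelled_vectors, key=lambda item: math.dist(query, item[0]))[:k]
    return majority_state(label for _, label in nearest)

# The counts from the text: 20 helix, 50 extended, 30 coil.
example_vote = majority_state(["H"] * 20 + ["E"] * 50 + ["C"] * 30)

# Tiny two-dimensional toy data instead of real 300-dimensional input vectors.
toy_data = [((0.0, 0.0), "H"), ((0.0, 1.0), "H"), ((5.0, 5.0), "E")]
toy_prediction = predict_state((0.0, 0.2), toy_data, k=2)
```

No training step is needed, but every prediction requires a distance computation against all stored vectors, so the cost moves from training time to prediction time.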
ܹ غ ̿ ϴ . PSI-BLAST ̿ؼ PSSM ִٸ ̿ ̿ؼ ܹ ִ. ( PSSM) ͺ̽ ۾ ̱ (http://www.cheric.org), PSSM 𸣴 ܹ ϰ ̴.
뷫 100 ƹ̳ ̷ ִ ܹ  ִ. ݸ鿡 200 ̻ ƹ̳ ̷ ִ ܹ(ü ִ κ ܹ 200 ̻ ƹ̳ Ǿ ִ)  ִ(׸1 ).  (domain)̶ Ѵ(׸2 ). Ϲ ܹ ȿ ε ٸ ִ.
(Figure 1: distribution by size; x-axis: number of amino acids) [1].
ܹ (folding) ⺻ . ܹ (fold) з ü ܹ ϴ ƴ϶ ش ܹ  зѴ. ̷ з (fold) ȹ ȴ. ܹ з ͺ̽δ SCOP(Structure Classification Of Proteins; http://scop.mrc-lmb.cam.ac.uk/scop/index.html) CATH(Class, Architecture, Topology, and Homology; http://www.cathdb.info/latest/index.html) ִ.
Figure 2: A protein made up of domains
SCOP(1.67 release) PDB(Protein Data Bank) ִ 24037 ܹ 65122  ̵ ε (fold) зϰ ִ(ǥ1 ). ν ǥ1 class ߿ Multi-domain proteins ִµ, ̵ ʰ з ̴. ǥ1 fold ܹ ǹϸ, superfamily ܹ Īϰ, family 缺 25%̻ ܹ ǹѴ. , SCOP alpha-helix ̷ ܹ( Ȯϰ ǥϸ ) 202 (fold) зϰ ִµ, globin-like fold̴. fold Ư¡ ܹ ߽ 6 alpha-helix ̷ִٴ ̴. globin-like fold globin-like superfamily alpha-helical ferredoxin superfamily . ٽ ڴ 4 family ڴ 2 family . globin-like superfamily ϴ family߿ globins familyν hemoglobin ܹ myoglobin ܹ ϰ ִ.
Table 1: SCOP
ƹ̳ (sequence) ܹ (domain boundary) ãƳ ƴ. ̿(bioinformatics) Ǯ ߿ ϳ̴. ܹ ٸ ֱ , ε ؾ ȴ. ܹ ϱ ؼ ش ܹ 迡 ̸ ˰ ־ Ѵ. 迡 迡 ش.
ܹ ũ . ù ° κ  ̵ мϿ 踦 ϴ ̴. ǥ δ SnapDRAGON [2] RosettaDom ִ. ̵ µ ʹ ð ҺѴٴ ִ.
° δ κ ٷ 踦 ϴ ִµ, ù ° ؼ ð ҸDZ ߵǾ. ܼ δ ũ (׸1 ) ϴ DGS(Domain Guess by Size) ִ [1]. DGS ߿ ٻν ̴. δ PSIPRED ܹ ̿ϴ DomSSEA [3] ִ κ linker(׸2 ) мϴ DomCut [4] ִ. پ ִ PSI-BLAST ִ ġ(Ǵ Ư ) ϴ ġ(position-specific scoring matrix; ϰ PSSM̶ θ) ϴ ̴. PSSM Ͽ 踦 DOMAINATION [5]ε, ణ ̴. δ PSSM Ű(neural network) ̷ PPRODO(Prediction of PROtein DOmain boundaries) [6] ִ. ģ (homology) ؼ 踦 ãش.