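A protein is a polymer in which amino acids are linked together like a chain. The amino acids used to build proteins in nature are of 20 kinds, and the one-dimensional arrangement of these amino acids is called the protein sequence. Because each of the 20 amino acids can be written as a single letter (A, C, D, E, ..., W, V, Y), a particular protein sequence can be represented as a string (e.g., AVHFGYTRRH). Known protein sequences range in length from tens to thousands of amino acids, which is why proteins are so diverse. Proteins are the basic building blocks of living organisms, and inside the body they also act as enzymes, hormones, and antibodies. In humans alone, about 100,000 kinds of proteins interact to sustain life.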
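The human genome has recently been decoded, and with it a huge amount of protein sequence data has become available. The completion of the genome means that we can now know the amino acid sequences of the proteins in a cell. Inside the cell, a protein chain folds into a specific three-dimensional structure, called the protein's native structure. Because a protein's three-dimensional structure determines its function, knowing the structure is essential for understanding the function, and an important property of this structure is that it is determined by the protein's amino acid sequence. To understand how proteins work in nature, we ultimately need to know their structures. Structures can be determined experimentally, but this approach has limits, so the structures of most proteins must be predicted computationally, which is a central task of bioinformatics.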
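Bioinformatics is the field that uses databases and computer programs to collect and analyze biological data and to derive new knowledge from them. In bioinformatics, the structure and function of a protein are predicted from its sequence by making use of proteins whose structures are already known. When two protein sequences are very similar, the proteins can be inferred to be evolutionarily related, and their functions can be inferred to be similar as well; protein sequence comparison is therefore an important, core area of bioinformatics.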
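Sequence comparison decides whether two protein sequences are similar enough to be considered to have evolved from a common ancestor. To decide this, we first compute the probability that the sequences arose from one another through evolution. For the protein sequences AQVT and AET, several correspondences reflecting possible evolutionary changes can be considered, and they can be written as shown below. Such a representation is called an alignment.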
AQVT    AQVT    AQVT    ...
AET-    AE-T    A-ET
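In these alignments, the first one means that A is conserved while Q has changed to E, V has changed to T, and T has been deleted; the second means that A and T are conserved, Q has changed to E, and V has been deleted. The positions marked with - are called gaps. To decide which alignment is most plausible, we must compute the probability of each one. For simplicity, we assume that the probability of a particular amino acid changing into another particular amino acid is the same at every position, and that changes at different positions occur independently. Let $p_{ab}$ be the probability that amino acids a and b appear aligned with each other in related sequences, $q_a$ the probability that amino acid a appears, and $g$ the probability of a gap. The probability of an alignment when the sequences are evolutionarily related, and its probability when they are unrelated, can then be written as
$$P_{\text{related}} = \prod_i p_{a_i b_i} \prod_j g\, q_{c_j}, \qquad P_{\text{random}} = \prod_i q_{a_i} q_{b_i} \prod_j q_{c_j}$$
where i runs over the non-gap positions, j over the gap positions, $c_j$ is the residue aligned against gap j, and $n_g$ is the number of gaps. The ratio of the two, called the odds ratio, is
$$\frac{P_{\text{related}}}{P_{\text{random}}} = \prod_i \frac{p_{a_i b_i}}{q_{a_i} q_{b_i}} \cdot g^{n_g}$$
and it is larger the more likely it is that the two sequences changed into each other through evolution. Thus the larger the odds ratio, the more plausibly the alignment reflects an evolutionary relationship.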
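When this ratio is computed on a computer, multiplying many small numbers can cause underflow, so the logarithm is taken to turn the product into a sum. The logarithm of the odds ratio is generally called the log odds ratio, and in sequence alignment it is called the score:
$$S = \log\frac{P_{\text{related}}}{P_{\text{random}}} = \sum_i s(a_i, b_i) - n_g d, \qquad s(a,b) = \log\frac{p_{ab}}{q_a q_b}, \qquad d = -\log g$$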
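The equivalence of the product and sum forms, and the underflow problem that motivates the logarithm, can be checked with a minimal sketch. The two-letter alphabet and every probability below are hypothetical toy values chosen only so that $p_{ab}$ sums to 1; they are not estimates from real proteins, and the function names are illustrative.

```python
import math

# Hypothetical toy probabilities for a two-letter alphabet (NOT estimated from data).
q = {"A": 0.6, "T": 0.4}                  # q_a: background frequency of residue a
p = {("A", "A"): 0.5, ("A", "T"): 0.05,   # p_ab: probability of a aligned to b
     ("T", "A"): 0.05, ("T", "T"): 0.4}   # in related sequences (sums to 1)
g = 0.1                                   # g: probability of a gap


def odds_ratio(x: str, y: str) -> float:
    """P(related) / P(random) as a product; x, y are aligned strings, '-' is a gap."""
    ratio = 1.0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            ratio *= g  # g*q_c / q_c: the residue opposite the gap cancels out
        else:
            ratio *= p[(a, b)] / (q[a] * q[b])
    return ratio


def log_odds_score(x: str, y: str) -> float:
    """Same quantity as a sum: S = sum_i s(a_i, b_i) - n_g * d, with d = -log g."""
    d = -math.log(g)  # gap penalty; positive because g < 1
    score = 0.0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            score -= d
        else:
            score += math.log(p[(a, b)] / (q[a] * q[b]))
    return score


if __name__ == "__main__":
    print(math.log(odds_ratio("AAT-", "AATT")), log_odds_score("AAT-", "AATT"))
    # 1000 mismatches: the product underflows to 0.0, the log sum stays finite.
    print(odds_ratio("A" * 1000, "T" * 1000), log_odds_score("A" * 1000, "T" * 1000))
```

For a short alignment both numbers agree; for the long mismatching alignment the product collapses to 0.0 while the score remains a usable negative number.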
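As the formula shows, the score is the sum of the values s(a, b) over the aligned (non-gap) pairs of amino acids, minus a penalty for each gap. The table of values s(a, b) is called the substitution matrix, or score matrix, and d (>0) is called the gap penalty. Aligning two sequences means finding the alignment with the maximum score, which is the alignment that is most probable under the assumption that the sequences are evolutionarily related. An algorithm called dynamic programming has been developed to find the maximum-score alignment efficiently for given sequences, score matrix, and gap penalty.[] If the maximum score is high, the two sequences are judged to be evolutionarily related. However, long sequences can reach a high score purely by chance, so we compute the probability that a score at least this large would arise by chance, and only if this probability is very small do we conclude that the sequences are related. The probability below which a result is considered significant is called the significance level, and the score corresponding to it serves as the threshold value. Sequence comparison is especially useful for inferring the function of proteins whose function is unknown. The sequence of the protein whose structure or function we want to learn is called the query sequence. If comparing the query with the protein sequences in a database turns up proteins whose scores are high enough to be significant, the query can be inferred to have a similar structure and function. A database search can be carried out with dynamic programming: the query is compared with every protein in the database, so dynamic programming is repeated as many times as there are sequences in the database. The number of sequences in databases grows every year and already runs from hundreds of thousands to millions for known proteins, so finding optimal alignments by dynamic programming takes too long. For this reason BLAST, which does not guarantee the optimal alignment but finds good alignments in a short time, is widely used for database searching.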
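The dynamic programming step can be illustrated on the AQVT / AET example. This is a minimal global-alignment (Needleman-Wunsch style) sketch with a linear gap penalty, matching the scoring model above; the choice of global rather than local alignment and the gap penalty d = 4 are assumptions for illustration, and only the BLOSUM62 entries for the five residues involved are included.

```python
# BLOSUM62 values for the five residues that occur in the AQVT / AET example.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "Q"): -1, ("A", "V"): 0, ("A", "T"): 0, ("A", "E"): -1,
    ("Q", "Q"): 5, ("Q", "V"): -2, ("Q", "T"): -1, ("Q", "E"): 2,
    ("V", "V"): 4, ("V", "T"): 0, ("V", "E"): -2,
    ("T", "T"): 5, ("T", "E"): -1,
    ("E", "E"): 5,
}


def s(a, b):
    """Symmetric substitution score s(a, b)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def alignment_score(x, y, d=4):
    """Score of a given alignment: sum of s(a, b) over non-gap columns minus d per gap."""
    return sum(-d if "-" in (a, b) else s(a, b) for a, b in zip(x, y))


def needleman_wunsch(x, y, d=4):
    """Global alignment with linear gap penalty d; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F[i][j]: best score of x[:i] vs y[:j]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i aligned to y_j
                          F[i - 1][j] - d,                          # x_i against a gap
                          F[i][j - 1] - d)                          # y_j against a gap
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    for y in ("AET-", "AE-T", "A-ET"):
        print("AQVT", y, alignment_score("AQVT", y))
    print(needleman_wunsch("AQVT", "AET"))
```

With these assumed parameters the three alignments from the text score 2, 7, and 3, and dynamic programming returns the best of them, AQVT / AE-T with score 7, without enumerating alignments explicitly.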
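BLAST (Basic Local Alignment Search Tool) [1] is the most widely used search program in bioinformatics. Given a nucleotide sequence (DNA or RNA) or a protein sequence that a researcher wants to analyze, BLAST finds similar sequences in a sequence database, and the sequences it finds help in analyzing the query. Among database search tools, BLAST is the most widely used.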
Table 1: The five programs included in blastall
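The most important of the programs that make up BLAST is blastall, which contains five programs: blastn, blastp, blastx, tblastn, and tblastx. General usage of BLAST is described in detail in [2]. blastall is run as follows.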
blastall -p blastp -d nr -i query
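-p gives the program to run (one of the five above). -d gives the name of the database to search (nr denotes the non-redundant protein sequence database). -i gives the name of the file containing the query sequence in FASTA format (the user may choose any file name).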
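The query file and the command line above can be prepared from a script. This is a minimal sketch: the helpers `write_fasta` and `blastall_argv`, the header name `example`, and the 60-character line width are hypothetical choices; only the `-p`, `-d`, and `-i` options described above are used, and the command is printed rather than executed.

```python
import tempfile
from pathlib import Path

BLASTALL_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def write_fasta(path, name, sequence, width=60):
    """Write one sequence in FASTA format: a '>' header line, then residues wrapped at width."""
    lines = [">" + name] + [sequence[i:i + width] for i in range(0, len(sequence), width)]
    Path(path).write_text("\n".join(lines) + "\n")


def blastall_argv(program, database, query_file):
    """Argument list for: blastall -p <program> -d <database> -i <query_file>."""
    if program not in BLASTALL_PROGRAMS:
        raise ValueError(f"unknown blastall program: {program}")
    return ["blastall", "-p", program, "-d", database, "-i", str(query_file)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        query = Path(tmp) / "query"
        write_fasta(query, "example", "AVHFGYTRRH")
        print(query.read_text(), end="")
        print(" ".join(blastall_argv("blastp", "nr", query)))
```

Keeping the command as an argument list (rather than one string) avoids shell quoting problems if it is later passed to `subprocess.run`.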
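To find evolutionarily related sequences, BLAST uses a scoring matrix; for proteins, BLOSUM62 is used by default (see Table 2). BLOSUM stands for BLOcks SUbstitution Matrix; it expresses how likely one amino acid is to be replaced by another, in other words how similar two amino acids are [3]. A positive BLOSUM value means the two amino acids readily substitute for each other, and a negative value means they rarely do. A value of 0 has no special meaning: the substitution occurs about as often as expected by chance. Sequences whose alignments score highly can be considered homologous. The highest value in BLOSUM62 (Table 2) is 11 (tryptophan (W) remaining tryptophan), and the lowest is -4, which occurs in 8 cases: tryptophan (W) replaced by asparagine (N), aspartic acid (D), or proline (P); glycine (G) replaced by isoleucine (I) or leucine (L); aspartic acid (D) replaced by leucine (L); glutamic acid (E) replaced by cysteine (C); and proline (P) replaced by phenylalanine (F).
Table 2: BLOSUM62 scoring matrix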
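The reading of BLOSUM values described above can be expressed as a small lookup. This sketch holds only the BLOSUM62 entries quoted in the text (not the full 20×20 matrix), and the helper names `blosum62` and `interpret` are hypothetical; it shows that the matrix is symmetric, so one entry serves both (a, b) and (b, a).

```python
# Only the BLOSUM62 entries quoted in the text, not the full 20x20 matrix.
BLOSUM62_EXCERPT = {
    ("W", "W"): 11,                                  # the highest value in BLOSUM62
    ("W", "N"): -4, ("W", "D"): -4, ("W", "P"): -4,  # the eight pairs scored -4,
    ("G", "I"): -4, ("G", "L"): -4, ("D", "L"): -4,  # the lowest value
    ("E", "C"): -4, ("P", "F"): -4,
}


def blosum62(a: str, b: str) -> int:
    """Symmetric lookup: BLOSUM62 gives the same score for (a, b) and (b, a)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def interpret(score: int) -> str:
    """Positive: substitution is likely; negative: unlikely; zero: chance level."""
    if score > 0:
        return "likely substitution"
    if score < 0:
        return "unlikely substitution"
    return "chance level"


if __name__ == "__main__":
    for pair in [("W", "W"), ("N", "W"), ("F", "P")]:
        value = blosum62(*pair)
        print(pair, value, interpret(value))
```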
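Comparing two sequences is called pairwise alignment, whereas aligning three or more sequences together is called multiple sequence alignment. A multiple alignment reveals conserved regions, parts of the sequence that have not changed during evolution. Because such regions usually play important roles in protein function, they are very important for predicting function. In general, the probability that an amino acid is replaced by another differs from position to position in a protein. Whereas the score matrix used above assumes this probability is the same at every position, a score matrix in which the substitution probabilities depend on the position is called a position-specific scoring matrix (PSSM), or sequence profile. A PSSM can be derived from a multiple sequence alignment.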
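The first step toward a PSSM, turning a multiple alignment into per-position amino acid frequencies, can be sketched as follows. The pseudocount of 0.1, the 0.6 cut-off for calling a column conserved, the helper names, and the four aligned sequences are all hypothetical; real tools weight sequences and add pseudocounts more carefully.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(msa, pseudocount=0.1):
    """Per-position amino acid frequencies from a multiple alignment.
    Gaps ('-') are skipped; a small pseudocount keeps unseen residues above zero."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in msa:
            if seq[pos] in counts:
                counts[seq[pos]] += 1
        total = math.fsum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def conserved_positions(profile, min_freq=0.6):
    """Positions where a single amino acid dominates the column (a crude conservation test)."""
    return [pos for pos, freqs in enumerate(profile) if max(freqs.values()) >= min_freq]


if __name__ == "__main__":
    msa = ["ACDKL", "ACEKF", "ACDRL", "GCDKW"]
    prof = column_frequencies(msa)
    print(conserved_positions(prof))  # only column 1 (all C) passes the cut-off
```

Each position gets its own distribution over the 20 amino acids, which is exactly what a single position-independent matrix cannot express.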
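PSI-BLAST is an extension of BLAST that makes use of such PSSMs. PSI-BLAST first searches the database with the BLAST algorithm, using a position-independent score matrix. It then builds a multiple alignment of the sequences whose scores exceed a threshold and derives a PSSM from it. Searching the database again with this PSSM finds additional proteins; these are more distant evolutionary relatives that the first search missed. These proteins are in turn added to the multiple alignment to build a new PSSM, which is used to search the database once more.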
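In other words, the scoring matrix used in a BLAST search holds substitution likelihoods that are independent of position, whereas a multiple alignment of the sequences found by BLAST lets us compute, for each position, the probability of each amino acid being replaced by another. This is information that the first search cannot provide. Searching again with the new scoring matrix can reveal sequences that the previous search missed; from the newly found sequences the position-specific probabilities can be recomputed and yet another search performed. PSI-BLAST (Position Specific Iterative Basic Local Alignment Search Tool) is an algorithm that carries out this cycle of BLAST searches automatically.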
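The first PSI-BLAST search is identical to BLAST, using BLOSUM62, a position-independent scoring matrix. A PSSM is built from the results of this first search. A second search with that PSSM yields a new PSSM, and repeating this process (iterating) gives an increasingly accurate PSSM.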
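The iterate-until-nothing-new idea can be shown with a deliberately simplified toy. This is not PSI-BLAST's scoring: sequences are assumed to be equal-length and ungapped, the "profile" is just per-position residue frequencies, the threshold of 3.0 is arbitrary, and the database entries are made up. It only illustrates how a sequence missed in round 1 can be found once a relative of it has entered the profile.

```python
from collections import Counter


def frequency_profile(seqs):
    """Per-position residue frequencies of equal-length, ungapped sequences."""
    n = len(seqs)
    return [{aa: c / n for aa, c in Counter(col).items()} for col in zip(*seqs)]


def profile_score(seq, profile):
    """Toy ungapped score: sum of the profile frequency of each residue at its position."""
    return sum(profile[i].get(aa, 0.0) for i, aa in enumerate(seq))


def iterative_search(query, database, threshold=3.0, max_iter=5):
    """Return the sorted hit list of each round; stop when a round finds nothing new."""
    hits, history = set(), []
    for _ in range(max_iter):
        # Round 1 uses a profile of the query alone, so the score is simply the
        # identity count: a position-independent scoring scheme.
        model = [query] + [database[name] for name in sorted(hits)]
        profile = frequency_profile(model)
        found = {name for name, seq in database.items()
                 if profile_score(seq, profile) >= threshold}
        history.append(sorted(found))
        if found == hits:  # converged: no new sequences
            break
        hits = found
    return history


if __name__ == "__main__":
    db = {"s1": "ACDKL", "s2": "GCDKL", "s3": "WWWWW"}
    print(iterative_search("ACDEF", db))
```

Here s2 shares only two residues with the query and is missed in round 1, but once s1 is in the profile its score reaches the threshold in round 2; round 3 adds nothing, so the loop stops.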
PSI-BLAST searches are run with blastpgp, one of the BLAST programs, as follows.
blastpgp -d nr -i query -j 3 -h 0.001 -C query.chk
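-d and -i have the same meaning as in blastall. -j is the number of PSI-BLAST iterations. -h is the e-value threshold for including a sequence in the PSSM; if it is omitted, 0.001 is used as the default. -C saves the matrix of amino acid frequencies (the frequency matrix, FMTX) to a file (query.chk). This file is stored in binary form; the BLAST program that converts it into a PSSM and saves the result as an ASCII file is makemat.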
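The blastpgp invocation above can likewise be assembled from parameters. This sketch uses only the `-d`, `-i`, `-j`, `-h`, and `-C` options described above; the function name `blastpgp_argv` and its keyword names are hypothetical, and the defaults simply reproduce the example command.

```python
def blastpgp_argv(database, query_file, iterations=3, inclusion_evalue=0.001,
                  checkpoint="query.chk"):
    """Build: blastpgp -d <db> -i <query> -j <iterations> -h <e-value> -C <checkpoint>."""
    if iterations < 1:
        raise ValueError("the number of iterations (-j) must be at least 1")
    if inclusion_evalue <= 0:
        raise ValueError("the inclusion threshold (-h) must be positive")
    return ["blastpgp", "-d", database, "-i", str(query_file),
            "-j", str(iterations), "-h", str(inclusion_evalue), "-C", str(checkpoint)]


if __name__ == "__main__":
    # Printed only; with legacy BLAST installed it could be passed to subprocess.run.
    print(" ".join(blastpgp_argv("nr", "query")))
```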
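After a PSI-BLAST search has found the evolutionarily related sequences, aligning them with each other gives, for every position of the protein sequence, the probability that the amino acid there is replaced by each other amino acid. Because each position has a probability for each of the 20 amino acids, every position carries 20 real numbers between 0 and 1 that sum to 1. This matrix is saved as a checkpoint file, in binary form so that it can be read quickly. The frequency matrix (FMTX) is a (sequence length) × 20 matrix, one row per position, giving the probability of each amino acid at each position. Because it captures evolutionary information about the whole protein family, it is also called a sequence profile.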
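To use this matrix for searching, however, the probabilities themselves are not enough: they must be converted into log-odds scores relative to the background frequency of each amino acid. The BLAST program makemat performs exactly this conversion, and its output is the PSSM. For the 50,557 proteins of known three-dimensional structure in the database, the binary checkpoint files (extension .chk) have been converted to ASCII files (extension .fmtx), and the PSSM files (extension .mtx) built from them with makemat are kept as well.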
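The conversion from a frequency matrix to log-odds scores can be sketched independently of makemat (this is not makemat's code or output format). The uniform background of 1/20 per amino acid and the scaling to half-bit integers are assumptions for illustration; the check that each row has 20 entries summing to 1 mirrors the FMTX description above.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies; real tools use observed amino acid frequencies.
BACKGROUND = {aa: 1 / 20 for aa in AMINO_ACIDS}


def check_frequency_matrix(fmtx):
    """Each row must hold 20 positive frequencies that sum to 1."""
    for i, row in enumerate(fmtx):
        if len(row) != 20:
            raise ValueError(f"row {i} has {len(row)} entries, expected 20")
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError(f"row {i} does not sum to 1")
        if min(row) <= 0:
            raise ValueError(f"row {i} has a zero frequency; add pseudocounts first")


def to_log_odds(fmtx, scale=2.0):
    """PSSM entry = round(scale * log2(f / background)); scale=2 gives half-bit units."""
    check_frequency_matrix(fmtx)
    return [[round(scale * math.log2(f / BACKGROUND[aa])) for aa, f in zip(AMINO_ACIDS, row)]
            for row in fmtx]


if __name__ == "__main__":
    uniform_row = [0.05] * 20             # background-like column: every score is 0
    a_rich_row = [0.24] + [0.04] * 19     # A enriched, everything else slightly depleted
    for row in to_log_odds([uniform_row, a_rich_row]):
        print(row)
```

A column that looks like the background scores 0 everywhere, while an enriched residue gets a positive score and depleted ones a negative score, which is what makes the matrix usable for searching.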
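Comparing a PSSM against each sequence in a database, that is, comparing the profile of the query protein with database sequences, is called profile-sequence alignment. Studies have shown that profile-sequence alignment can detect distant evolutionary relationships between proteins that simple sequence-sequence alignment fails to find. To increase sensitivity further, database searches that compare the query profile with the profiles of the database sequences, called profile-profile alignment, are now being studied. For such searches it is useful to have the PSSMs of the database sequences prepared in advance as files.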
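The core of profile-sequence comparison, scoring a sequence with position-specific rather than position-independent scores, can be shown with an ungapped sliding-window sketch. Real profile-sequence alignment allows gaps and uses dynamic programming; here the three-position PSSM, the default score of -1 for residues not listed in a row, and the function name are all hypothetical.

```python
def best_ungapped_hit(pssm, seq, missing=-1):
    """Slide a PSSM (a list of {residue: score} rows) along seq without gaps.
    Returns (best score, start offset), or None if seq is shorter than the PSSM."""
    length = len(pssm)
    best = None
    for start in range(len(seq) - length + 1):
        score = sum(pssm[i].get(seq[start + i], missing) for i in range(length))
        if best is None or score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    pssm = [{"A": 3, "G": 1}, {"C": 4}, {"D": 2, "E": 2}]
    print(best_ungapped_hit(pssm, "GGACEKK"))  # the window "ACE" at offset 2 scores 3 + 4 + 2
```

Because the third row rewards D and E equally, the profile matches ACD and ACE alike, something a single pairwise comparison against one query sequence would not do.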
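Because sequence profiles carry evolutionary information in this way, their applications go well beyond database searching; concrete examples are introduced below.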
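Predicting the secondary structure of a protein from its primary structure (the amino acid sequence of a particular protein, determined from DNA) is called protein secondary structure prediction. Secondary structure plays an important role in predicting the tertiary structure and is a great help in structure prediction generally.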
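The most widely used classification of protein secondary structure is DSSP (dictionary of protein secondary structure) [1]. Because DSSP reports solvent accessibility as well as secondary structure, anyone studying protein structure should know DSSP. The DSSP program is available at http://www.cmbi.kun.nl/gv/dssp/. Given a protein of known structure, the DSSP program outputs its secondary structure and solvent accessibility. DSSP classifies secondary structure into eight classes (G: 3₁₀ helix, H: alpha helix, I: pi helix, B: residue in isolated beta-bridge, E: extended strand, participates in beta ladder, T: hydrogen-bonded turn, S: bend, X: otherwise). For secondary structure prediction, these are usually reduced to three classes (H: helix, E: extended, C: coil) as follows: G, H, and I → H; B and E → E; T, S, and X → C.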
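The eight-to-three reduction above is a simple lookup table. In this sketch any code not in the table, including the blank character that DSSP output files commonly use for "otherwise", is treated as coil; that fallback is an assumption of the sketch.

```python
# Eight DSSP classes reduced to three: G, H, I -> H; B, E -> E; T, S, X -> C.
EIGHT_TO_THREE = {
    "G": "H", "H": "H", "I": "H",
    "B": "E", "E": "E",
    "T": "C", "S": "C", "X": "C",
}


def reduce_dssp(states: str) -> str:
    """Map a string of DSSP codes to H/E/C; unknown codes (e.g. ' ') become coil."""
    return "".join(EIGHT_TO_THREE.get(code, "C") for code in states)


if __name__ == "__main__":
    print(reduce_dssp("HHHGGIEEBTTSX "))
```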
(Figure: β-sheet)
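PSIPRED takes as input the "position-specific scoring matrix" (PSSM) produced by PSI-BLAST. A PSSM has size n×20, where n is the length of the protein sequence being analyzed and 20 corresponds to the 20 amino acids. PSIPRED cuts the PSSM into pieces of size 15×20, each called an input vector. Here 15 is the window size; it need not be 15, but PSIPRED's developer, David T. Jones, found 15 to be the most effective value in his experiments. PSIPRED is based on a neural network, a representative technique of pattern recognition and machine learning.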
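Training a neural network means determining its parameters. Training uses proteins whose secondary structures are already known. The 15×20 values are placed in an input vector, and the secondary structure (determined with DSSP) of the amino acid at the center of the 15, that is, the 8th, is given as the correct answer for that input vector. Training is the process of adjusting the network's parameters so that, given an input vector, the network outputs the correct answer. A network trained too closely predicts accurately only the proteins used in training, so training must stop at an appropriate point. Training is the most important part of a neural network and takes a great deal of time.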
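How an n×20 PSSM becomes 15×20 input vectors paired with the secondary structure of the central residue can be sketched as follows. Padding positions beyond the sequence ends with zero rows is an assumption of this sketch (PSIPRED's own handling of the ends may differ), and the function names are hypothetical.

```python
WINDOW = 15
HALF = WINDOW // 2  # 7 neighbours on each side; the centre is the 8th row


def input_vectors(pssm):
    """One flattened 15 x width vector per residue, centred on that residue.
    Assumption: rows beyond either end of the sequence are filled with zeros."""
    n, width = len(pssm), len(pssm[0])
    pad = [0.0] * width
    vectors = []
    for i in range(n):
        vec = []
        for j in range(i - HALF, i + HALF + 1):
            vec.extend(pssm[j] if 0 <= j < n else pad)
        vectors.append(vec)
    return vectors


def training_pairs(pssm, ss3):
    """Pair each input vector with the 3-state label (H/E/C) of its central residue."""
    if len(pssm) != len(ss3):
        raise ValueError("PSSM rows and secondary-structure labels must match in length")
    return list(zip(input_vectors(pssm), ss3))


if __name__ == "__main__":
    toy_pssm = [[float(r)] * 20 for r in (1, 2, 3)]  # 3 residues x 20 columns
    for vec, label in training_pairs(toy_pssm, "HEC"):
        print(len(vec), label, vec[140:143])  # rows 140..159 hold the centre residue
```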
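Because training a neural network takes so much time, a network is trained once and then saved. Given an amino acid sequence of unknown structure, PSIPRED first generates its PSSM (using PSI-BLAST), then feeds the input vectors taken from the PSSM into the trained network to predict the secondary structure. Most of PSIPRED's running time is spent generating the PSSM. Predictions made this way are highly accurate, and PSIPRED is the most widely used secondary structure prediction program.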
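Besides neural networks, other well-known pattern recognition and machine learning methods, such as the support vector machine (SVM) and the nearest neighbor method, have been used for secondary structure prediction. A representative SVM-based secondary structure prediction program is SVMpsi [5]. SVMpsi uses input vectors similar to those of PSIPRED; one difference is that its training takes even more time than a neural network's.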
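Both neural networks and SVMs require training, and training is time-consuming and difficult. Pattern recognition methods that need no training are also used, the nearest neighbor method being the representative one. A representative secondary structure prediction program based on nearest neighbors is PREDICT [6,7]. The input vectors used by PREDICT are similar to those used by PSIPRED and SVMpsi. The nearest neighbor method computes the distance between the input vector of unknown structure and the input vectors of known structure, and selects the known input vectors closest to it. Suppose 100 input vectors are selected, and we look at their secondary structures. If 20 of them are helix, 50 extended, and 30 coil, the input vector of unknown structure is predicted to be extended; that is, the central (8th) amino acid of the 15×20 input vector is predicted to be in the extended state.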
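The nearest-neighbor vote in the example can be written in a few lines. The text does not name the distance measure, so Euclidean distance is an assumption here, as are the function names; k = 100 follows the example.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length input vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, known, k=100):
    """Predict the 3-state label (H/E/C) of `query` by majority vote among the k nearest
    known input vectors; `known` is a list of (vector, label) pairs."""
    nearest = sorted(known, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # Ties go to the tied label that occurs first in the distance-sorted list.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # The example from the text: among the 100 nearest, 20 helix, 50 extended, 30 coil.
    known = ([([0.0], "H")] * 20 + [([0.0], "E")] * 50 + [([0.0], "C")] * 30
             + [([10.0], "H")] * 200)  # farther vectors that are not among the 100 nearest
    print(knn_predict([0.0], known))
```

No model is fitted in advance; all the work happens at prediction time, which is why the method avoids training but must compare against every stored input vector.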
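Some proteins consist of about 100 amino acids, whereas others consist of 200 or more (most proteins in living organisms have 200 or more amino acids) (Figure 1). Such larger proteins are made up of structural units called domains (Figure 2). In general, the domains within a protein have different functions.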
Figure 1: Protein sizes (x-axis: number of amino acids) [1].
Figure 2: A protein made up of domains
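SCOP (release 1.67) classifies 65,122 domains from 24,037 proteins in the PDB (Protein Data Bank) into folds and other structural groups (see Table 1). Notably, the classes in Table 1 include Multi-domain proteins, which are classified as a whole rather than being split into domains. In Table 1, a fold denotes proteins with similar structures, a superfamily denotes proteins that are evolutionarily related, and a family denotes proteins whose sequence identity is 25% or more. For example, SCOP classifies the proteins made up of alpha-helices (more precisely, all-alpha proteins) into 202 folds, the first of which is the globin-like fold. This fold is characterized by a protein core made up of six alpha-helices. The globin-like fold contains the globin-like superfamily and the alpha-helical ferredoxin superfamily; the former has four families and the latter two. Among the families of the globin-like superfamily, the globins family includes the hemoglobin and myoglobin proteins.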
Table 1: SCOP
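Finding the domain boundaries of a protein from its amino acid sequence is not easy; it is one of the important open problems in bioinformatics. Because a protein's domains have different functions, the domains must be identified in order to predict its function. Likewise, to predict a protein's structure, its domain boundaries should be known in advance, so domain boundary prediction is a great help in structure prediction.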
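Methods for predicting protein domain boundaries fall broadly into two groups. The first predicts the three-dimensional structure from the sequence and then analyzes that structure to locate the domain boundaries; representative methods are SnapDRAGON [2] and RosettaDom. Their drawback is that predicting the three-dimensional structure takes too much time.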
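The second group predicts domain boundaries directly from the sequence; these methods were developed because the first approach takes so much time. The simplest is DGS (Domain Guess by Size), which uses the size distribution of domains (Figure 1) [1]. Others include DomSSEA [3], which uses the secondary structure predicted by PSIPRED, and DomCut [4], which analyzes the linker regions between domains (Figure 2). Many methods use the position-specific scoring matrix (PSSM) produced by PSI-BLAST. DOMAINATION [5] predicts domain boundaries using the PSSM, and PPRODO (Prediction of PROtein DOmain boundaries) [6] feeds the PSSM into a neural network. PPRODO can find domain boundaries even for proteins without detectable homology to proteins of known structure.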