Bioinformatics Techniques (生信技术)
A note on terminology: some of the terms used here have no single settled Chinese translation, so one rendering of each is used throughout:
- Sequence Motifs: 序列基序
- Consensus Sequences: 一致性序列 (also rendered 共有性序列 or 一致序列)
- The Motif Finding Problem: 基序搜寻问题
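Sequence motifs and their biological significance
Sequence motifs are nucleic acid sequences, found across genomes or within a genome, that have or are conjectured to have some regulatory or structural biological function.
Motifs found in different parts of the genome (such as exons, introns, and stretches of DNA that do not code for protein sequences) have different functions.
Motifs in exons (the coding part of the genome) determine the structure of proteins, or of the tags on proteins that send them to particular parts of the cell for processes such as phosphorylation.
Motifs in introns (which make up the non-coding part of the genome) are usually regulatory sequences that determine gene expression levels and protein binding sites.
Satellite DNA, the main component of centromeres and heterochromatin, is an example of a motif found in the "junk" part of the genome.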
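Sequence motifs and consensus sequences
Different occurrences of a sequence motif can differ from one another even when they perform the same function. We therefore define the consensus sequence of a set of sequences.
Given a set of sequences, the consensus sequence (also called the canonical sequence) is the sequence obtained by taking the most common nucleotide or amino acid residue at each position.
For example, the consensus sequence of AGAT, TGAC, and ACAC is AGAC.
This is easier to see when the sequences are written one below another (A is the most common residue at position 1, G is the most common residue at position 2, and so on).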
AGAT := Sequence 1
TGAC := Sequence 2
ACAC := Sequence 3
AGAC := Consensus sequence
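The column-by-column rule above can be sketched in a few lines of Python. This is only an illustration: the function name `consensus` is made up here, and ties are broken in favour of the residue that appears first in the column, a choice the text does not specify.

```python
from collections import Counter

def consensus(sequences):
    """Return the most common residue at each position of equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one column (a tuple of residues) per position.
    # On a tie, Counter.most_common keeps the residue seen first in the column;
    # that tie-break is an assumption, not something the text defines.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))

print(consensus(["AGAT", "TGAC", "ACAC"]))  # AGAC
```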
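Representing motifs and consensus sequences
Because motifs are substring patterns found at multiple locations, and each position may or may not carry a mutation, there are dedicated ways of writing these sequences down.
Text representation
We use the following nucleotide sequence example to introduce each aspect of the notation: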
T [GA] N Y {T} [CT] R
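In this notation, A, T, G, and C stand for the four possible nucleotide bases: adenine, thymine, guanine, and cytosine. N stands for any nitrogenous base. Y and R stand for a pyrimidine (T or C) and a purine (A or G), respectively. [GA] means that the position holds either G or A. {T} means that the position holds any base except T.
Motifs are also often written as a string of such codes, for example "RRACH". "RRACH" is one way of writing a consensus sequence.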
Degenerate base | Standard bases
R | A/G
Y | C/T
M | A/C
K | G/T
S | G/C
W | A/T
H | A/T/C
B | G/T/C
V | G/A/C
D | G/A/T
N | A/T/C/G
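One way to make the notation concrete is to translate a pattern into a regular expression. The sketch below is only an illustration: `motif_to_regex` is a hypothetical helper, it assumes the tokens of bracketed patterns are separated by spaces as in the example above, and it reads {T} as "any character except T", so it does not check that the input is a valid DNA string.

```python
import re

# Degenerate codes from the table above, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ATC", "B": "GTC", "V": "GAC", "D": "GAT", "N": "ATCG",
}

def expand(codes):
    """Sorted, de-duplicated string of bases covered by a run of codes."""
    return "".join(sorted(set("".join(IUPAC[c] for c in codes))))

def motif_to_regex(pattern):
    parts = []
    for token in pattern.split():
        if token.startswith("[") and token.endswith("]"):
            parts.append("[" + expand(token[1:-1]) + "]")    # one of these bases
        elif token.startswith("{") and token.endswith("}"):
            parts.append("[^" + expand(token[1:-1]) + "]")   # anything but these
        else:
            for code in token:                               # plain run such as RRACH
                bases = expand(code)
                parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

regex = motif_to_regex("T [GA] N Y {T} [CT] R")
print(regex)                                  # T[AG][ACGT][CT][^T][CT][AG]
print(bool(re.fullmatch(regex, "TGACACA")))   # True
print(motif_to_regex("RRACH"))                # [AG][AG]AC[ACT]
```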
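Graphical representation
There is also a graphical way of displaying a consensus sequence, the consensus logo. A consensus logo conveys how conserved each position of a sequence motif is.
A consensus logo uses the height of the consensus base at each position to show how conserved that position is (note that the degree of conservation is not the same as the frequency of each nucleotide at that position).
In everyday work, the word "motif" is often accompanied by a logo figure of this kind.
Such logos are often used to describe sequence features, such as protein binding sites in DNA.
A motif logo consists of a stack of letters at each position. The relative sizes of the letters indicate their frequency in the sequences.
The height of each letter is proportional to the frequency of the corresponding base at that position, and is usually measured in bits.
The letters at each position are stacked in order of decreasing conservation, so the conserved sequence can be read off from the topmost letters.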
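The text does not say how the heights in bits are computed. A common convention, assumed here rather than taken from the article, is that the whole stack at a position has height equal to its information content (2 bits minus the position's entropy, for DNA) and each letter gets its frequency times that stack height. A minimal sketch without any small-sample correction, using the hypothetical name `logo_heights`:

```python
import math
from collections import Counter

def logo_heights(sequences, alphabet="ACGT"):
    """Letter heights in bits for each position: frequency * information content."""
    max_bits = math.log2(len(alphabet))           # 2 bits for a 4-letter alphabet
    heights = []
    for column in zip(*sequences):
        counts = Counter(column)
        freqs = {b: counts[b] / len(column) for b in alphabet if counts[b] > 0}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy                 # height of the whole stack
        heights.append({b: f * info for b, f in freqs.items()})
    return heights

for position, stack in enumerate(logo_heights(["AGAT", "TGAC", "ACAC"]), start=1):
    print(position, stack)
```

In this small example, position 3 is fully conserved (always A), so its single letter gets the full 2 bits, while the more variable positions get shorter stacks.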
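The motif finding problem
Motif finding means searching an organism's DNA sequences for functionally important stretches that have been preserved over the course of evolution. An organism can have hundreds of millions of base pairs, so picking out a short, important, and meaningful sequence among so many has become an important problem.
Computationally, motif finding can be defined as:
Given T sequences, each of length N, find the best pattern of length L that occurs in each of the T sequences.
Different kinds of scores can be used to rate a given pattern and to choose the best one.
For the examples below we take T = 5, N = 100, and L = 8. That is, we have 5 sequences of length 100, and we want to find one motif of length 8 in each sequence such that these motifs are similar to one another.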
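To get a feel for the size of this example, the short calculation below counts the ways of picking one L-mer per sequence. It assumes that a candidate is fully determined by its start position in each sequence; the variable names are illustrative.

```python
T, N, L = 5, 100, 8

starts_per_sequence = N - L + 1            # an L-mer can start at positions 1..93
combinations = starts_per_sequence ** T    # one independent choice per sequence

print(starts_per_sequence, combinations)   # 93 6956883693
```

Even for this small example there are roughly seven billion candidate sets, and the count grows exponentially with T.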
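Scoring motif candidates
First we look at how to score a given set of motifs.
That is, we assume we have taken one sequence of length L from each of the T sequences; these are our motif candidates.
Given these, we see how to score the candidate motifs. The next section shows how to find motifs in the original sequences.
Suppose we have the following candidate motifs obtained from a set of sequences (5 motif candidates, each of length 8).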
Alignment
A G G T A C T T
C C A T A C G T
A C G T T A G T
A C G T C C A T
C C G T A C G G
5 motifs, each of length 8. (Red letters in the original table mark deviations from the consensus.)
From the given motifs we first build a profile matrix, which simply records how often each nucleotide base occurs at each position.
For the example above, the profile matrix looks like this:
Profile
A  3 0 1 0 3 1 1 0
C  2 4 0 0 1 4 0 0
G  0 1 4 0 0 0 3 1
T  0 0 0 5 1 0 1 4
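As a cross-check, the counts can be recomputed directly from the five aligned candidates. This is a minimal sketch; `profile_matrix` and `MOTIFS` are names chosen for the illustration.

```python
def profile_matrix(motifs, alphabet="ACGT"):
    """Count how often each base occurs at each position of the aligned motifs."""
    length = len(motifs[0])
    profile = {base: [0] * length for base in alphabet}
    for motif in motifs:
        for position, base in enumerate(motif):
            profile[base][position] += 1
    return profile

# The five candidate motifs from the alignment above.
MOTIFS = ["AGGTACTT", "CCATACGT", "ACGTTAGT", "ACGTCCAT", "CCGTACGG"]

for base, counts in profile_matrix(MOTIFS).items():
    print(base, *counts)
```

Counting from the alignment, the row for C starts with 2 (two of the five motifs begin with C), which agrees with prob(C, 1) = 2/5 used later in the text.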
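Putting together the most common letter in each column gives the consensus string. In this example the consensus sequence has A in the first position, because A occurs most often in column 1. Similarly it has C in the second position, and so on, giving the consensus sequence ACGTACGT.
One possible score for this set of motif candidates is the total number of mutations, that is, the number of letters that differ from the consensus (in the example above, 2+1+1+0+2+1+2+1 = 10). The goal is to make this score as small as possible.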
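The consensus string and the mutation count can be computed together in one pass over the columns. The sketch below uses the hypothetical name `consensus_and_score` and repeats the five example motifs so that it runs on its own.

```python
from collections import Counter

def consensus_and_score(motifs):
    """Return (consensus string, number of letters that differ from it)."""
    consensus, mismatches = [], 0
    for column in zip(*motifs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        mismatches += len(column) - count   # letters in this column that deviate
    return "".join(consensus), mismatches

MOTIFS = ["AGGTACTT", "CCATACGT", "ACGTTAGT", "ACGTCCAT", "CCGTACGG"]
print(consensus_and_score(MOTIFS))  # ('ACGTACGT', 10)
```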
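A more suitable score is the entropy of the profile matrix.
Entropy measures how conserved each position is: high entropy means low conservation, and low entropy means high conservation. Let prob(R, l) be the probability that residue R occurs at position l, that is, prob(R, l) = count(R, l) / T.
For example, for the profile matrix above, prob(C, 1) = 2/5.
We measure the entropy of each position separately and add the results. It is defined as:
$$\text{entropy} = -\sum_{l=1}^{L} \sum_{R} \mathrm{prob}(R, l)\,\log \mathrm{prob}(R, l)$$
That is, we sum prob(R, l) log(prob(R, l)) over all residues at all positions and negate the result, with terms where prob(R, l) = 0 counted as zero.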
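A minimal sketch of the entropy score for the five example motifs follows. The text does not fix the base of the logarithm; base 2 (entropy in bits) is assumed here, and the function name `column_entropies` is illustrative.

```python
import math
from collections import Counter

def column_entropies(motifs):
    """Entropy of each position, using prob(R, l) = count(R, l) / T."""
    t = len(motifs)
    entropies = []
    for column in zip(*motifs):
        # Counter only holds residues that actually occur, so no log(0) arises.
        probs = [count / t for count in Counter(column).values()]
        entropies.append(-sum(p * math.log2(p) for p in probs))
    return entropies

MOTIFS = ["AGGTACTT", "CCATACGT", "ACGTTAGT", "ACGTCCAT", "CCGTACGG"]
per_position = column_entropies(MOTIFS)
print([round(h, 3) for h in per_position])
print(round(sum(per_position), 3))  # total entropy of the candidate set
```

Position 4, where every motif has T, contributes zero entropy, while positions 5 and 7, which each contain three different bases, contribute the most.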
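Finding motifs (randomized algorithm)
We use a randomized iterative algorithm to find motifs, because trying every possible profile matrix is infeasible.
Faster algorithms exist when the scoring function uses only the consensus sequence rather than the profile matrix. However, entropy is a better scoring function than one based purely on the consensus sequence, because a consensus-only score ignores how conserved each position of the consensus is.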
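1. Choose an initial motif at random in each of the given sequences. This gives a set of candidate motifs M.
2. Compute the profile matrix P of the candidate motifs M (as described in the previous section).
3. For each sequence S, find its best motif given the profile matrix P (described in more detail below). This gives a new set of candidate motifs M.
4. If the new set of candidate motifs differs from the previous one, go back to step 2. Otherwise stop.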
For step 3, we can score every L-mer of the sequence against the given profile matrix P. The score is:
$$\text{score} = \sum_{l=1}^{L} \log \mathrm{prob}(R_l, l)$$
where the sum runs over the residues R_l of the L-mer and prob is defined by the profile matrix. The L-mer with the highest score is the best motif for that sequence. For example, for the sequence ACAGACAT the score is
[log(prob(A,1)) + log(prob(C,2)) + log(prob(A,3)) + … + log(prob(T,8))]
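The scoring in step 3 can be sketched as follows. One practical detail is not covered in the text: if a base never occurs at some position, its probability is 0 and the logarithm is undefined. The sketch therefore adds a pseudocount of 1 to every count, which is an assumption made here and not part of the article's definition. All function names are illustrative.

```python
import math

def log_profile(motifs, alphabet="ACGT", pseudocount=1.0):
    """log prob(R, l) for every base and position, smoothed to avoid log(0)."""
    denom = len(motifs) + pseudocount * len(alphabet)
    return [
        {b: math.log((sum(m[l] == b for m in motifs) + pseudocount) / denom)
         for b in alphabet}
        for l in range(len(motifs[0]))
    ]

def score(lmer, logp):
    """Sum of log prob(R_l, l) over the residues of one L-mer."""
    return sum(logp[l][base] for l, base in enumerate(lmer))

def best_lmer(sequence, logp):
    """Highest-scoring L-mer of a sequence under the profile."""
    L = len(logp)
    lmers = (sequence[i:i + L] for i in range(len(sequence) - L + 1))
    return max(lmers, key=lambda lmer: score(lmer, logp))

MOTIFS = ["AGGTACTT", "CCATACGT", "ACGTTAGT", "ACGTCCAT", "CCGTACGG"]
logp = log_profile(MOTIFS)
print(score("ACAGACAT", logp) < score("ACGTACGT", logp))  # True
print(best_lmer("TTTTACGTACGTTTTT", logp))                # ACGTACGT
```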
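The randomized algorithm converges very quickly. Because the final result depends on the random initialization at the start, we usually run the whole procedure several times and keep the motif set with the lowest entropy over all runs.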
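Putting the four steps and the restart strategy together gives the sketch below. It is one possible reading of the procedure, not a reference implementation: the pseudocount of 1, the `max_iter` safety cap, the base-2 entropy, and all function names are assumptions made for the illustration.

```python
import math
import random
from collections import Counter

def log_profile(motifs, alphabet="ACGT", pseudocount=1.0):
    # Smoothed log-probabilities; the pseudocount is an assumption to avoid log(0).
    denom = len(motifs) + pseudocount * len(alphabet)
    return [{b: math.log((sum(m[l] == b for m in motifs) + pseudocount) / denom)
             for b in alphabet} for l in range(len(motifs[0]))]

def total_entropy(motifs):
    t = len(motifs)
    return -sum((c / t) * math.log2(c / t)
                for column in zip(*motifs) for c in Counter(column).values())

def best_lmer(sequence, logp):
    L = len(logp)
    lmers = (sequence[i:i + L] for i in range(len(sequence) - L + 1))
    return max(lmers, key=lambda k: sum(logp[l][b] for l, b in enumerate(k)))

def randomized_motif_search(sequences, L, rng, max_iter=1000):
    # Step 1: one random L-mer per sequence.
    motifs = []
    for s in sequences:
        i = rng.randrange(len(s) - L + 1)
        motifs.append(s[i:i + L])
    for _ in range(max_iter):              # cap added as a guard against cycling
        logp = log_profile(motifs)         # step 2: profile of current candidates
        new_motifs = [best_lmer(s, logp) for s in sequences]   # step 3
        if new_motifs == motifs:           # step 4: stop when nothing changes
            break
        motifs = new_motifs
    return motifs

def best_of_runs(sequences, L, runs=20, seed=0):
    """Run the search several times and keep the lowest-entropy motif set."""
    rng = random.Random(seed)
    results = (randomized_motif_search(sequences, L, rng) for _ in range(runs))
    return min(results, key=total_entropy)
```

A call such as `best_of_runs(sequences, 8)` returns one 8-mer per input sequence. Which motifs it finds depends on the seed and the number of runs.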
Gibbs sampling algorithm
A small modification of the randomized algorithm above makes it perform better. This version is called the Gibbs sampling algorithm (Gibbs sampler).
The procedure for Gibbs sampling is as follows (the main changes from the previous algorithm are shown in bold):
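1. For each of the T given sequences, choose a random start position and take the L-mer there as its motif.
2. <strong>Choose one at random</strong> from the T sequences.
3. Build the profile matrix from the remaining T-1 sequences.
4. For every position in the removed sequence, compute the score of the L-mer starting there against the profile matrix.
5. Sample the new L-mer to use from that sequence. Each L-mer is sampled with weight e^score (we use e^score because the score is a log probability).
6. Repeat from step 2 until the score of the profile matrix stops improving.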
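The steps above can be sketched as follows. Several details are assumptions made for the illustration rather than part of the described procedure: a pseudocount of 1 keeps every probability positive, the loop runs for a fixed number of iterations instead of testing whether the score has stopped improving, and the lowest-entropy motif set seen so far is the one returned. It needs at least two sequences, and all names are hypothetical.

```python
import math
import random
from collections import Counter

def log_profile(motifs, alphabet="ACGT", pseudocount=1.0):
    # Smoothed log-probabilities; the pseudocount is an assumption to avoid log(0).
    denom = len(motifs) + pseudocount * len(alphabet)
    return [{b: math.log((sum(m[l] == b for m in motifs) + pseudocount) / denom)
             for b in alphabet} for l in range(len(motifs[0]))]

def total_entropy(motifs):
    t = len(motifs)
    return -sum((c / t) * math.log2(c / t)
                for column in zip(*motifs) for c in Counter(column).values())

def gibbs_sampler(sequences, L, iterations=2000, seed=0):
    rng = random.Random(seed)
    # Step 1: random starting L-mer in every sequence.
    motifs = []
    for s in sequences:
        i = rng.randrange(len(s) - L + 1)
        motifs.append(s[i:i + L])
    best, best_entropy = list(motifs), total_entropy(motifs)
    for _ in range(iterations):            # fixed budget instead of a convergence test
        j = rng.randrange(len(sequences))                  # step 2: pick one sequence
        logp = log_profile(motifs[:j] + motifs[j + 1:])    # step 3: profile of the rest
        s = sequences[j]
        lmers = [s[i:i + L] for i in range(len(s) - L + 1)]
        scores = [sum(logp[l][b] for l, b in enumerate(k)) for k in lmers]   # step 4
        weights = [math.exp(x) for x in scores]            # step 5: weight e^score
        motifs[j] = rng.choices(lmers, weights=weights, k=1)[0]
        entropy = total_entropy(motifs)
        if entropy < best_entropy:         # remember the best set seen so far
            best, best_entropy = list(motifs), entropy
    return best
```

Unlike the previous algorithm, only one motif changes per iteration, and it is sampled rather than chosen greedily, so a poor candidate can still be picked occasionally. That randomness is what lets the sampler move away from a bad starting point.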
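The Gibbs sampling algorithm is better than the previous algorithm because it is more stable. It takes longer to converge, but it does not need to be run many times, since it almost always gives the same result.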
Motifs, consensus sequences, and algorithms for finding motifs