To produce an optimal alignment, gaps are introduced so that as many identical or similar characters as possible are brought into vertical register.
It is a powerful tool for exploring functional, structural and evolutionary relationships between DNA or protein sequences.
Global vs local
Global: compares the sequences over their whole length, up to both ends. Gaps are introduced to match as many characters as possible.
Local: concentrates on the region(s) of the sequences where the longest matches are found.
Three principal methods:
Dot Matrix analysis
Dynamic programming (DP) algorithm
Word or K-tuple method
Dot matrix:
Uses a graph to display possible alignments. The possible alignment(s) will be shown on the graph as a diagonal line running from top left to bottom right and vice versa.
Advantage: Shows direct and inverted repeats easily.
Shows the presence of insertions and deletions.
Disadvantage: Does not show the actual alignment.
Dot matrix:
2 approaches: Basic and filtering
Basic | Filtering |
Sequence A is listed along the top of the matrix. Sequence B is listed down the left side of the matrix. Starting with the first character of B, compare it with every single character of A and place a dot wherever the two match, then repeat with the second character, third character and so forth. | A sliding window is used. Windows from the 2 sequences are compared at the same time. A dot is printed on the graph only if a certain minimal number of matches (the stringency) occurs within the window (whose length is the window size). |
Direct repeats show as diagonal lines running from top left to bottom right.
Inverted repeats show as diagonal lines running from bottom left to top right, perpendicular to the direct-repeat diagonals.
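The basic and filtering approaches can be sketched in a few lines of Python. This is only an illustration of the idea, not a reference implementation: the function names `dot_matrix` and `render`, and the parameter names `window` and `stringency`, are made up for this example. Setting `window=1` and `stringency=1` gives the basic method; larger values give the filtering method.

```python
def dot_matrix(seq_a, seq_b, window=1, stringency=1):
    """Build a dot matrix as a grid of booleans.

    Columns follow sequence A (top), rows follow sequence B (left side).
    A cell is True when at least `stringency` characters match inside
    the window of length `window` that starts at that position.
    """
    rows = len(seq_b) - window + 1
    cols = len(seq_a) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count matching characters between the two windows.
            matches = sum(seq_a[j + k] == seq_b[i + k] for k in range(window))
            row.append(matches >= stringency)
        grid.append(row)
    return grid


def render(grid, seq_a, seq_b):
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    width = len(grid[0]) if grid else 0
    lines = ["  " + seq_a[:width]]
    for i, row in enumerate(grid):
        lines.append(seq_b[i] + " " + "".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Basic method: every single matching character gives a dot (noisy).
    print(render(dot_matrix("GATTACA", "GATTACA"), "GATTACA", "GATTACA"))
    print()
    # Filtering: a dot only where all 3 characters in the window match.
    filtered = dot_matrix("GATTACA", "GATTACA", window=3, stringency=3)
    print(render(filtered, "GATTACA", "GATTACA"))
```

Comparing a sequence with itself, as in the usage lines, shows the main diagonal plus scattered off-diagonal dots in the basic plot. The filtered plot keeps the diagonal and drops the isolated chance matches.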
Needleman-Wunsch (NW) algorithm: (for global sequence alignment)
The scoring system is important for an optimal alignment. 3 scores: match, mismatch and gap
Score matrix and backtracking:
Need to know:
Match=? Mismatch=? Gap=?
Calculate the score coming from each of the 3 directions: ↖ (diagonal, match or mismatch), ↑ and ← (gap)
Put down the highest of the three scores in the box
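The score matrix and backtracking steps above can be sketched in Python as follows. The notes leave the three scores open, so the defaults here (match = 1, mismatch = -1, gap = -2) are assumed values chosen only for illustration, and the function name `needleman_wunsch` is likewise made up for this example. When several directions tie, this sketch prefers the diagonal, so it returns one optimal alignment out of possibly several.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment of seq_a (top) and seq_b (left side).

    The default scores are illustrative assumptions, not fixed values.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_b[:i] with seq_a[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        score[0][j] = j * gap  # first row: only gaps
    for i in range(1, m + 1):
        score[i][0] = i * gap  # first column: only gaps

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
            diag = score[i - 1][j - 1] + s  # from the upper-left box
            up = score[i - 1][j] + gap      # from the box above
            left = score[i][j - 1] + gap    # from the box to the left
            score[i][j] = max(diag, up, left)

    # Backtracking: walk from the bottom-right box to the top-left box,
    # each time stepping to a neighbour that produced the current score.
    aligned_a, aligned_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                aligned_a.append(seq_a[j - 1])
                aligned_b.append(seq_b[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append("-")  # gap in sequence A
            aligned_b.append(seq_b[i - 1])
            i -= 1
        else:
            aligned_a.append(seq_a[j - 1])
            aligned_b.append("-")  # gap in sequence B
            j -= 1
    return score[m][n], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```

With the assumed scores, aligning GATTACA against GATACA gives 6 matches and 1 gap, for a total score of 4.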