Sunday, October 17, 2010

Mini project part 1

Selected topic:
Sequence alignment

Review of Journal article:
Title: Using a mutual information-based site transition network to map the genetic evolution of influenza A/H3N2 virus
Authors: Zhen Xia, Gulei Jin, Jun Zhu and Ruhong Zhou
Bioinformatics, Vol. 25, no.18, 2009, pages 2309-2317

Mapping the antigenic and genetic evolution pathways of Influenza A is important for vaccine development and for preventing outbreaks of the virus. The authors studied 4000 A/H3N2 hemagglutinin (HA) sequences from 1968 to 2008 to model the evolutionary path of the virus and to identify potential future mutations. The mutual information method was used to detect co-occurring mutations and correlations between the amino acid sites of the H3N2 HA sequences, and to build a site transition network (STN). The effectiveness and accuracy of the STN are comparable to those of phylogenetic trees and antigenic maps, at reduced cost.
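
To make the mutual information idea concrete, here is a minimal sketch in Python. It is not the paper's exact procedure: it only computes the textbook mutual information between two columns (sites) of an alignment, and the toy columns and the function name `mutual_information` are made up for illustration.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_x)
    count_x = Counter(col_x)                # residue counts at site x
    count_y = Counter(col_y)                # residue counts at site y
    count_xy = Counter(zip(col_x, col_y))   # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in count_xy.items():
        p_ab = count / n
        p_a = count_x[a] / n
        p_b = count_y[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy data (hypothetical): residues at three sites across four sequences.
site_i = "KKNN"
site_j = "EEGG"  # always changes together with site_i
site_k = "ATAT"  # varies independently of site_i

print(mutual_information(site_i, site_j))  # high: the two sites co-vary
print(mutual_information(site_i, site_k))  # zero: the two sites are independent
```

Pairs of sites with high mutual information are the ones that tend to mutate together, which is the kind of correlation the STN is built from.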

The study indicates that 63 of the 312 sites in the HA sequences interact strongly with one another. The STN shows these 63 sites and gives clear trajectories that model the antigenic transitions during the evolution of influenza. The study shows that a site has a low probability of mutating to an amino acid it has never held before, and that mutations strongly prefer certain sites. Studying historical mutations allows the authors to predict future mutations, which may produce the next antigenic change, and the resulting amino acids of the new strain. With this information, predictions were made for each year from 1999 to 2008. The accuracy of the predictions is approximately 70% on average. Clustering was performed to reveal the different simultaneous multi-site mutations in antigenic drift. A prediction for 2009-2010 was also made; it will take time to verify its accuracy. Location, season, new vaccines and other pressures were not taken into account in this study. Future studies should consider these pressures.


Research question and objectives:
Is one vaccine enough for all human races of the world?

Use sequence alignment to analyze two sets of H3N2 HA sequences from two different human races/continents and see whether they differ.
                                                                              

Tuesday, October 12, 2010

Lecture 2

Sequence alignment is the comparison of two (pairwise) or more (multiple) DNA or protein sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences.

To produce an optimal alignment, gaps are introduced so that as many identical or similar characters as possible are brought into vertical register.
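
A small sketch of why gaps help, assuming Python and two made-up DNA strings. The helper `count_identities` is hypothetical; it simply counts the columns in which both sequences carry the same character.

```python
def count_identities(row_a, row_b):
    """Count columns where both rows hold the same character (gaps never count)."""
    return sum(a == b and a != "-" for a, b in zip(row_a, row_b))


# Without a gap, the sequences fall out of register after the third character.
print(count_identities("GATTACA", "GATACA"))   # 3 identical columns

# One gap in the shorter sequence brings the remaining characters back into register.
print(count_identities("GATTACA", "GAT-ACA"))  # 6 identical columns
```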

It is a powerful tool for exploring functional, structural and evolutionary data of DNA or proteins.

Global vs local

Global: compares the whole length of the sequences, up to both ends. Gaps are introduced to match as many characters as possible.

Local: concentrates on the area(s) of the sequences where the longest matches are found.

Three principal methods:
Dot matrix analysis
Dynamic programming (DP) algorithm
Word or k-tuple method

Dot matrix:
Uses a graph to display possible alignments. Possible alignments are shown on the graph as diagonal lines running from top left to bottom right and vice versa.

Advantage: Shows direct and inverted repeats easily
                  Shows the presence of insertions and deletions.
Disadvantage: Does not show the actual alignment.

Dot matrix:

2 approaches: basic and filtering

Basic:
Sequence A is listed across the top of the matrix and sequence B is listed down the left side of the matrix.
Starting with the first character of B, compare it with every single character of A and print a dot wherever the two match, then repeat with the second character, the third character and so forth.

Filtering:
A sliding window can be used.
A window of each of the 2 sequences is compared at the same time.
A dot is printed on the graph only if a certain minimum number of matches (the stringency) occurs when comparing these windows (of a given window size).
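
The two approaches above can be sketched with one small Python function. This is only an illustration under my own assumptions: the function name `dot_matrix`, the `*` and `.` symbols, and the choice to print the dot at the start position of each window are not from the lecture. With window=1 and stringency=1 it behaves like the basic approach.

```python
def dot_matrix(seq_a, seq_b, window=1, stringency=1):
    """Return the rows of a dot plot: seq_a across the top, seq_b down the left side."""
    rows = []
    for i in range(len(seq_b) - window + 1):        # window start in seq_b
        row = []
        for j in range(len(seq_a) - window + 1):    # window start in seq_a
            # Count matching positions between the two windows.
            matches = sum(seq_a[j + k] == seq_b[i + k] for k in range(window))
            row.append("*" if matches >= stringency else ".")
        rows.append("".join(row))
    return rows


# Basic approach: every single matching character gets a dot.
for line in dot_matrix("GATTACA", "GATCACA"):
    print(line)

print()

# Filtering: a dot only if at least 2 of 3 characters in the window match.
for line in dot_matrix("GATTACA", "GATCACA", window=3, stringency=2):
    print(line)
```

The filtered plot has fewer scattered dots, so the diagonal that marks the real similarity is easier to see.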



Direct repeats show as diagonal lines running from top left to bottom right.
Inverted repeats show as diagonal lines running in the perpendicular direction, from bottom left to top right.

Needleman-Wunsch (NW) algorithm: (for global sequence alignment)
The scoring system is important for obtaining an optimal alignment. There are 3 scores: match, mismatch and gap.
       
Score matrix and backtracking:

Need to know:
Match=?          MISMATCH=?         GAP=?

Calculate the score from 3 directions: diagonal, from above and from the left.
Put the highest score in the box.
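
A minimal Python sketch of the score matrix and backtracking steps above. The score values (match=1, mismatch=-1, gap=-2) are placeholders I picked, since the notes leave them open, and the function name `needleman_wunsch` and the tie-breaking order during backtracking are my own choices.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment. Returns (score, aligned_a, aligned_b)."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and first column: only gaps are possible there.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    def pair_score(i, j):
        return match if seq_a[j - 1] == seq_b[i - 1] else mismatch

    # Fill each box with the highest score of the 3 directions.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + pair_score(i, j)
            from_above = score[i - 1][j] + gap
            from_left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, from_above, from_left)

    # Backtrack from the bottom-right box to the top-left box.
    aligned_a, aligned_b = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_a.append(seq_a[j - 1])
            aligned_b.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append("-")            # gap in seq_a
            aligned_b.append(seq_b[i - 1])
            i -= 1
        else:
            aligned_a.append(seq_a[j - 1])
            aligned_b.append("-")            # gap in seq_b
            j -= 1

    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(row_a)
print(row_b)
```

When several alignments share the best score, this sketch returns only one of them. Which one depends on the order in which the 3 directions are checked during backtracking.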