Tuesday, 31 August 2010

Entire log-book

This is the entire log-book (a diary, you could say) that I kept during my internship at Sanger. Although it is not well organised, each entry was short enough to remind me of what was going on. Hopefully, I can still understand it a year from now.

Fri Jul 9 14:35:38 BST 2010
RESOLVED:
- Completed the implementation of Dynamic Programming.
- Finding the optimal path: the pair of edges should be put on.
ISSUES:
1) Output a new ARG in the same format as the input: while tracing the path, if we find a break in an edge, we split the current node into two using a recombination. The difficulty here is adding graph nodes without interfering with the existing data structure.

Mon Jul 12 11:15:34 BST 2010
ISSUES:
2) I doubt that we can put on one of the edges of a recombination. That is, if we have "re x y z t", I wonder whether we can put on either (x, y) or (x, z), because we cannot perform any operation on either y or z. In addition, after more experiments with real test cases: if we have a recombination x -> y, z and (x, y) is a chosen edge, then y has 2 parents and has x as its only child.

Tue Jul 13 09:20:48 BST 2010
RESOLVED:
- Removed all edges (x, y) where y is one of x's parents from a coalescence.
- Confirmed with Quang: don't put on an edge of a recombination.
ISSUES:
3) This could improve the running time because the number of edges is considerably lower, as low as 600 - 700 edges.

Wed Jul 14 11:31:37 BST 2010
RESOLVED:
- About yesterday's issue: sacrifice running time for simplicity and just ignore the edges whose realStart has 2 parents. We still iterate through the recombination edges, but that requires less work than removing them from the whole tree edge list.
- The new ARG produced seems to be okay; however, the events produced are not sorted yet.
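A minimal Python sketch of the "just ignore them" filter described above. The edge layout (start, end), with end being a parent of start, is an assumption made for illustration; the log does not show the real data structure, and `usable_edges` is a hypothetical name.

```python
from collections import defaultdict

def usable_edges(edges):
    """Return the edges whose start node has fewer than two parents.

    `edges` is a list of (start, end) pairs where `end` is a parent of
    `start`. This layout is an assumption, not the program's real one.
    """
    parent_count = defaultdict(int)
    for start, _end in edges:
        parent_count[start] += 1
    # Recombination edges stay in the input list; they are only skipped here.
    return [(s, e) for s, e in edges if parent_count[s] < 2]

# Toy graph: r is a recombination node with parents a and b.
edges = [("a", "c"), ("b", "c"), ("r", "a"), ("r", "b")]
kept = usable_edges(edges)
```

The full edge list is still scanned once, which matches the trade-off noted in the entry: a little extra iteration in exchange for not touching the tree edge list.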
ISSUES:
4) Does edge compression lead to a correct result? Say we have a chain "x - y - z - t" whose edges all have the same ending location, but the mutations happen at different positions on x-y, y-z and z-t. In the result they are treated as one edge, although the actual positions to put on are different.
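To make the concern in ISSUE 4) concrete, here is a small Python sketch of chain compression. The tuple layout (lower, upper, end_location, mutation_position) and all the numbers are made up for illustration; the real program's representation is not shown in this log.

```python
def compress_chain(edges):
    """Merge consecutive chain edges that share the same ending location.

    Each edge is (lower, upper, end_location, mutation_position).
    This layout is hypothetical.
    """
    merged = []
    for lower, upper, end_loc, mut_pos in edges:
        last = merged[-1] if merged else None
        if last and last["upper"] == lower and last["end_loc"] == end_loc:
            # Extend the previous edge upwards and pool the mutation.
            last["upper"] = upper
            last["mutations"].append(mut_pos)
        else:
            merged.append({"lower": lower, "upper": upper,
                           "end_loc": end_loc, "mutations": [mut_pos]})
    return merged

# Chain x - y - z - t, same ending location, three different mutation positions.
chain = [("x", "y", 500, 120), ("y", "z", 500, 260), ("z", "t", 500, 410)]
result = compress_chain(chain)
```

After compression there is a single edge from x to t carrying three mutation positions, so the information about which original edge each mutation sat on is gone. That is exactly the situation the issue worries about.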

Thu Jul 15 09:49:23 BST 2010
RESOLVED:
- Corrected and verified the order of the new events added to the original ARG. The output is put in the third argument of QCALL.1.0.
ISSUES:
- None observed.

Mon Jul 19 16:03:40 BST 2010
RESOLVED:
- Fixed some small bugs in the methods init() and printNewARG() so that the revised graph is printed correctly. The ARG structure is now also checked more carefully by counting the number of roots (of course, we expect only 1 root).
ISSUES:
5) Required to output the strings of the two added haploids. This will require knowing which letters in the chain were added.
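The root check mentioned in this entry can be sketched in Python as follows. The (child, parent) edge list and the name `find_roots` are assumptions for illustration, not the program's actual structure.

```python
def find_roots(edges):
    """Return the nodes that never appear as a child, in sorted order.

    `edges` is a list of (child, parent) pairs (an assumed layout).
    """
    nodes = set()
    has_parent = set()
    for child, parent in edges:
        nodes.update((child, parent))
        has_parent.add(child)
    return sorted(nodes - has_parent)

# Toy ARG: r recombines into a and b, which coalesce at c.
edges = [("a", "c"), ("b", "c"), ("r", "a"), ("r", "b")]
roots = find_roots(edges)
# A well-formed ARG is expected to have exactly one root.
is_well_formed = len(roots) == 1
```

Counting roots is a cheap sanity check: it will not prove the graph is correct, but more than one root shows that some newly added node was left unconnected.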

Tue Jul 20 10:46:16 BST 2010
ISSUES:
6) No more reduction of edges: ISSUE 4) happened in test case Long5. Advised Quang and received agreement.
7) ISSUE 2) is to be RESOLVED: Quang proposed a solution to connect a recombination.

Thu Jul 22 16:05:42 BST 2010
RESOLVED:
- ISSUES 6) and 7): a 'nice' (but not yet quite correct) result is produced.
ISSUES:
8) The remaining problem is the case where we add to the same edge.

Wed Jul 28 09:02:42 BST 2010
RESOLVED:
- All previous ISSUES have been resolved.
ISSUES:
- Testing with real data.

Thu Jul 29 11:23:10 BST 2010
ISSUES:
- The actual test data contains some missing values (i.e. 0 instead of ACGT), so the program needs to be adjusted to cope with them.
- Also, all the ARGs are placed immediately below the data sequence, so the program also needs to change to deal with that.
RESOLVED:
- Resolved today's second issue.

Mon Aug 2 09:25:06 BST 2010
ISSUES:
- The actual test contains some 0s and some capital symbols that could swap with the other symbol of the pair.
SOLUTIONS:
- fillHap has been removed: maintaining the whole string for each node is not flexible. Instead, we traverse to determine the edges and assign the symbols to the nodes from the leaves up to the root.
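The log does not record the actual rule used to assign symbols on the way up. As a stand-in, the Python sketch below uses the bottom-up pass of Fitch's small-parsimony algorithm for a single site on a plain tree (no recombination nodes), treating a 0 at a leaf as "any of ACGT". The names `children`, `leaf_symbol` and `fitch_bottom_up` are hypothetical.

```python
ALPHABET = frozenset("ACGT")

def fitch_bottom_up(children, leaf_symbol, root):
    """Return {node: set of candidate symbols} for one site.

    `children` maps an internal node to its child nodes and `leaf_symbol`
    maps a leaf to 'A', 'C', 'G', 'T' or '0' (missing). Both layouts are
    assumptions made for this sketch.
    """
    sets = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            sym = leaf_symbol[node]
            # A missing value does not constrain the ancestors.
            sets[node] = ALPHABET if sym == "0" else frozenset(sym)
            return sets[node]
        kid_sets = [visit(k) for k in kids]
        common = frozenset.intersection(*kid_sets)
        # Fitch rule: intersection if non-empty, otherwise union.
        sets[node] = common if common else frozenset.union(*kid_sets)
        return sets[node]

    visit(root)
    return sets

children = {"root": ["n1", "l3"], "n1": ["l1", "l2"]}
with_missing = fitch_bottom_up(children, {"l1": "A", "l2": "0", "l3": "A"}, "root")
with_conflict = fitch_bottom_up(children, {"l1": "A", "l2": "C", "l3": "C"}, "root")
```

Working per site and per node like this avoids keeping a whole string at every node, which is the inflexibility the entry attributes to fillHap. The capital-symbol swapping mentioned in the issue is not handled here.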

Wed Aug 4 15:11:44 BST 2010
RESOLVED:
- fillHap has been successfully removed, traverse has been amended, and fix has been created to fix nucleotides from traversals.
- All other methods have been amended as well. The test cases in ARGs.data1 run smoothly and are ready for the next steps.

Thu Aug 5 08:59:09 BST 2010
RESOLVED:
- There was a '-1' bug that prevented the program from running correctly. It runs smoothly now!

Tue Aug 10 10:16:41 BST 2010
- Final stage: Running and testing

Mon Aug 16 10:45:44 BST 2010
- Haven't updated for a while. My program predicts quite badly for person NA12006 (83%), reasonably for NA07000 (90%) and a bit higher than expected for NA07056 (99%), NA11892 (99%) and NA12873 (98%).
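The log does not say how these percentages were computed. One plausible reading, sketched in Python below, is the share of sites where the predicted symbol equals the true one, skipping sites whose true value is missing (0). The function name and the skipping rule are assumptions.

```python
def accuracy(predicted, truth, missing="0"):
    """Percentage of non-missing sites where the prediction matches the truth."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    compared = 0
    matches = 0
    for p, t in zip(predicted, truth):
        if t == missing:
            continue  # nothing to compare against at this site
        compared += 1
        if p == t:
            matches += 1
    return 100.0 * matches / compared if compared else float("nan")

# Made-up sequences, not real data from the internship.
one_miss = accuracy("ACGT", "ACGA")
skip_missing = accuracy("ACGT", "AC0T")
```

Whether missing sites are skipped or counted as errors changes the figure noticeably, so the convention is worth writing down next to any reported percentage.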

Wed Aug 18 15:16:59 BST 2010
Have read up on and understood Unix shell programming and Makefiles. A major change will follow soon.

Thu Aug 26 13:23:17 BST 2010
Writing report and slides.

Tue Aug 31 09:42:56 BST 2010
Technically, the presentation went well and the work seems promising enough to be carried on into next year. I just need to tidy up the source code a little. This is also the final entry of the log-book for the summer internship here at Sanger.
