Friday, June 15, 2007

[Work Log] 6/11~6/15 Weekly Summary

6/11~6/15

Weekly Summary

To tackle the gene normalization problem, I read several references on latent semantic analysis and wrote some programs this week.

Since I had not written a program for nearly 18 months, it took me a lot of time to get back into practice and write programs that perform well.

Besides, I am not familiar with the programming environment in Visual Studio 2005.

Therefore, I bought a book about Visual Studio 2005 from the Tenlong bookstore (天瓏) to learn how to code in the Visual Studio 2005 environment.

Furthermore, I spend 2 hours every day (8~9 am and 17~18) practicing my English listening with the Studio Classroom program.

On Monday, I used C++ and PHP to write a baseline program (a vector-space model).
I hope to improve this baseline program with some other approaches.
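
As an illustration only (the original baseline was written in C++ and PHP and is not shown in this post), a vector-space model can be sketched in Python roughly as follows. The function names, the smoothed IDF weighting and the toy documents in the usage note are all assumptions made for this example, not the actual program.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    # Smoothing keeps the weight positive even for terms found in every document.
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Represent a text as a sparse {term: tf * idf} vector; unknown terms are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document index) pairs, best match first."""
    idf = build_idf(documents)
    query_vec = tfidf_vector(query, idf)
    scores = [(cosine(query_vec, tfidf_vector(doc, idf)), i)
              for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))
```

For example, with the toy documents `["tumor protein p53", "breast cancer 1 early onset", "epidermal growth factor receptor"]`, calling `rank("p53 tumor suppressor", documents)` puts document 0 first, because it is the only one sharing terms with the query.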

On Wednesday, I presented my idea of using latent semantic indexing for human gene normalization at the lab meeting. Professor Hsu suggested an alternative solution based on word sense disambiguation.

On Thursday, all of our lab's members attended the annual ABC (Advanced Bioinformatics Core) retreat at National Yang-Ming University. At this retreat, each PI (Principal Investigator) briefly introduced their project to the new staff. After the workshop, all ABC members played Wii together.

[Language Log] PI: Principal investigator

A principal investigator (PI) is the lead scientist for a particular well-defined science project, such as an astronomical observing campaign, laboratory study or clinical trial.

In the context of federal funding from agencies such as NASA or the NSF, the PI is the person who takes direct responsibility for completion of a funded project, directing the research and reporting directly to the funding agency. For small projects (which might involve 1-5 people) the PI is typically the person who conceived of the investigation, but for larger projects (such as the construction of scientific spacecraft or observatories) the PI may be selected by a team to obtain the best strategic advantage for the project.

In the context of a clinical trial a PI may be an academic working with grants from NIH or other funding agencies, or may be effectively a contractor for a pharmaceutical company working on testing the safety and efficacy of new medicines.

[Language Log] The state of the art

The state of the art is the highest level of development, as of a device, technique, or scientific field, achieved at a particular time.

The phrase "state of the art" should be hyphenated when it is used as an adjective, e.g.: "This machine is an example of state-of-the-art technology", but not when used as a noun, as in the sentence below.

"The state of the art in this field is mostly related to the X technology".

Thursday, June 7, 2007

[Work Log] Vista changes some things.

Vista changes some things.

Yesterday, I got a new notebook running the Vista operating system from the lab (a Toshiba M500 with an Intel Core Duo T2350 1.86 GHz CPU and 1.5 GB of memory).

Vista is a new operating system, and some software does not work on this platform.

Therefore, I have to install some new software and learn some new ways of working in Vista.

1. One upsetting thing is that Visual Studio 6.0 does not work under Vista. I used to write C/C++ programs in VS 6.0, but it cannot run on Vista. Therefore, I need to learn how to program in Visual Studio 2005.

2. The Chinese input method I have used for more than 10 years (Array 40; 行列40) does not work under Vista. Chinese typing is important in my daily work, and I cannot type Chinese characters without an input method. Therefore, what I can do is learn a new Chinese input method. Alternatively, I can type in English for routine work. Maybe it is also a good chance to practice English. For example, I can use English to write my blog, to chat with my friends on MSN, etc.

Wednesday, June 6, 2007

[Work Log] BioCreative II: Gene Normalization Task

On 2007/5/15, I finished my military service and left the Army.

On 2007/5/21, I joined Professor Chun-Nan Hsu's research group at the Institute of Information Science, Academia Sinica, Taiwan.

Currently, Professor Hsu has assigned me a research topic on gene normalization.

This topic is a competition task of BioCreative II. The BioCreative II challenge consists of three tasks:
(1) Gene mention tagging (GM),
(2) Gene normalization (GN) and
(3) Extraction of protein-protein interactions from text (PPI).

Hsu's group proposed two methods for the GM task and achieved the second and third highest scores.

Furthermore, they also submitted a method for the GN task. However, this method only ranked 14th out of 21 participating groups. Therefore, Professor Hsu hopes I can propose some exciting approaches and achieve high performance (high recall and precision) in the GN task.

The goal of the task is to return the EntrezGene IDs corresponding to the human genes and direct gene products appearing in a given MEDLINE abstract.
Given a master list of human EntrezGene identifiers with some common gene and protein names (synonyms) for each identifier, the system is required to return a list of EntrezGene identifiers and the corresponding text excerpts for each human gene or gene product mentioned in the abstract. Since the dictionary of gene synonyms contains some noise, it is difficult to map a gene name to the correct EntrezGene ID.
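
To make the input and output of the task concrete, here is a minimal dictionary-lookup sketch in Python. It is not the method submitted by the group, only an assumed naive approach: the function names are hypothetical, and the identifiers (`ID_A`, `ID_B`, `ID_C`) and synonyms in the usage note are placeholders rather than real EntrezGene entries.

```python
import re


def normalize(name):
    """Lowercase a name and join its alphanumeric runs with single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def build_lexicon(master_list):
    """Map each normalized synonym to the set of identifiers that list it."""
    lexicon = {}
    for gene_id, synonyms in master_list.items():
        for synonym in synonyms:
            lexicon.setdefault(normalize(synonym), set()).add(gene_id)
    return lexicon


def find_genes(abstract, lexicon, max_words=5):
    """Scan the abstract left to right and return (excerpt, candidate ids) pairs.

    At each position the longest matching synonym (up to max_words words) wins.
    The excerpt is rebuilt from the matched words, so punctuation is lost.
    """
    words = re.findall(r"[A-Za-z0-9]+", abstract)
    hits = []
    i = 0
    while i < len(words):
        matched = False
        for n in range(min(max_words, len(words) - i), 0, -1):
            excerpt = " ".join(words[i:i + n])
            ids = lexicon.get(excerpt.lower())
            if ids:
                hits.append((excerpt, sorted(ids)))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return hits
```

With a toy master list such as `{"ID_A": ["TP53", "p53", "tumor protein p53"], "ID_B": ["BRCA1", "breast cancer 1"], "ID_C": ["p53"]}`, the excerpt "tumor protein p53" maps to `ID_A` alone, while a bare "p53" returns both `ID_A` and `ID_C`. That second case is the kind of ambiguity a noisy synonym dictionary creates, and a plain lookup cannot resolve it.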