Friday, March 28, 2014
Version control systems
The choice of version control system (Git vs SVN vs CVS vs Mercurial) probably has little impact when working within a single lab environment (with one or two coders). I started thinking more about this when moving some of my more "public" code from an SVN-based Unfuddle over to GitHub. Git is better suited to working in a larger group, and the concept of forking/pulling is great. The only thing stopping me from moving wholesale is the pricing for private repos (Unfuddle is free). I also like that Bioconductor can bridge its SVN repo with GitHub (so I can share my code easily as well), and I enjoy the whole user experience of the site.
For now, I'm on both. I'm thinking of moving my private repos to Bitbucket, which supports Git and Mercurial. The ability to work with a local repo (Git) while on the road is very useful.
Some further reading: http://stackoverflow.com/questions/871/why-is-git-better-than-subversion
Tuesday, March 18, 2014
Adding the optional IH tag to SAM files
One of the major complaints about the two most commonly used aligners, BWA and Bowtie, is their failure to report the NH or IH tag.
The IH tag indicates the number of stored alignments in the SAM file that contain the current query (i.e. the read). This is meaningful for multi-mapped reads if you want to know how many locations the same read has been mapped to (e.g. assuming your Bowtie parameter "k" has been set to more than 1).
I've written an awk one-liner that will add this tag to your SAM file. It iterates over the file twice: first to tabulate the counts, and second to write the extra tag.
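A rough sketch of the two-pass idea follows. The function name add_ih_tag is just for illustration, and the sketch assumes single-end reads (mates of a pair share a read name and would be counted together), that no IH tag is present yet, and that every alignment line should be counted, including unmapped ones.

```shell
# Hypothetical helper: append an IH:i:<count> tag to every alignment line.
# Assumes single-end reads, so that each read name maps to exactly one read.
add_ih_tag() {
    # The same file is given to awk twice: the first pass tabulates how many
    # alignment lines each read name has, the second pass writes the tag.
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/      { if (NR != FNR) print; next }   # header lines: emit once, in pass two
         NR == FNR { count[$1]++; next }            # pass one: count alignments per read
                   { print $0, "IH:i:" count[$1] }  # pass two: append the tag
    ' "$1" "$1"
}

# Usage: add_ih_tag input.sam > output.sam
```

Because the file is read twice, the input has to be a regular file rather than a pipe.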
Saturday, March 15, 2014
Fixing svn in RStudio (Mac OS)
After updating RStudio and R to version 3.0.3, I lost the "svn" option under version control.
RStudio started with these messages (a clue to fixing the problem!):
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
The internationalization of the R.app was causing this problem. Running a simple
system("defaults write org.R-project.R force.LANG en_US.UTF-8")
on the R command line and restarting RStudio was all that was needed.
Thursday, March 6, 2014
Filtering FASTQ files for unique reads
Filtering out duplicate reads in FASTQ files may be important if your application needs to count only unique entries.
Brent Pedersen wrote a very quick script utilizing Bloom filters for this purpose (read more at: http://hackmap.blogspot.sg/2010/10/bloom-filter-ing-repeated-reads.html). The installation process might not be clear for those not familiar with code, so I'll try to explain the process step by step here.
To run the fastq_unique.py script, you'll need three things:
- Perl module Bloom Faster
  - either install through cpan or manual download
- Python module nose (pybloomfaster tests)
  - installation directions on the nose page
- Brent's wrapper pybloomfaster
  - download the master zip, then run:
    sudo python setup.py install
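If you only want to see what "unique reads" means on a small file before installing anything, here is a minimal awk sketch. It is not Brent's script and does not use a Bloom filter: it keeps every sequence seen so far in an in-memory awk array, so it is exact but will use far more memory on large files. The function name fastq_unique_exact is made up for illustration, and the sketch assumes standard four-line FASTQ records.

```shell
# Hypothetical helper: keep only the first record for each distinct sequence.
# Assumes each record is exactly four lines (header, sequence, "+", quality).
fastq_unique_exact() {
    awk 'NR % 4 == 1 { header = $0 }
         NR % 4 == 2 { seq = $0 }
         NR % 4 == 3 { plus = $0 }
         NR % 4 == 0 {
             # $0 is the quality line; print the record only for unseen sequences
             if (!(seq in seen)) {
                 seen[seq] = 1
                 print header; print seq; print plus; print $0
             }
         }' "$1"
}

# Usage: fastq_unique_exact reads.fastq > reads.unique.fastq
```

The Bloom filter approach trades this exactness for a much smaller, fixed memory footprint.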
Installing python modules with setuptools or pip
Remember to set your http (and https) proxy!
Running into errors like this:
sudo pip install nose
Cannot fetch index base URL http://pypi.python.org/simple/
Or this:
sudo easy_install nose
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
Download error: [Errno -2] Name or service not known -- Some packages may not be found!
Fixing these is simply a matter of setting both your http and https proxies, because PyPI redirects to https. Check your environment with:
env | grep -i http
If it returns empty, nothing has been set. Set them using:
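export http_proxy=http://localhost:8080
export https_proxy=http://localhost:8080
And it should now work. Note that sudo may not pass these variables on to the command it runs; if the error persists, try sudo -E, which preserves your environment.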
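If you find yourself checking this often, the check can be wrapped in a small shell function. This is only a sketch: the name check_proxy_env is made up, and it looks only at the lowercase variable names used above.

```shell
# Hypothetical helper: report whether http_proxy and https_proxy are exported.
# Returns 0 if both are set, 1 if either one is missing.
check_proxy_env() {
    missing=0
    for name in http_proxy https_proxy; do
        # printenv only sees exported variables, like "env | grep" does
        value=$(printenv "$name" || true)
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}

# Usage: check_proxy_env || echo "set your proxies before running pip"
```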
Monday, March 3, 2014
Designing user interfaces for biological data (and for biologists!)
Here's a nice slide presentation out of the VIZBI conference regarding UI design and considerations.
http://www.slideshare.net/francisrowlanduk/vizbi-2013-ux-design-tutorial