onezerobio: Filtering FASTQ files for unique reads

Thursday, March 6, 2014

Filtering FASTQ files for unique reads

Filtering out duplicate reads from FASTQ files can be important if your downstream application should consider only unique entries, for example when counting reads.
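As a baseline, the simplest way to keep only unique reads is to remember every sequence already seen in an exact set. The sketch below assumes a plain four-lines-per-record FASTQ file and treats two reads as duplicates when their sequence lines are identical (headers and quality strings are ignored); the helpers `read_fastq` and `unique_reads_exact` are hypothetical names for illustration. Its memory use grows with the number of distinct sequences, which can become a problem for very large files.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header, sequence, plus line, quality) for one FASTQ record
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
               plus.rstrip("\r\n"), qual.rstrip("\r\n"))


def unique_reads_exact(handle: TextIO) -> Iterator[Record]:
    """Yield only the first record seen for each distinct sequence."""
    seen = set()
    for record in read_fastq(handle):
        seq = record[1]
        if seq not in seen:
            seen.add(seq)
            yield record


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n"
    kept = list(unique_reads_exact(io.StringIO(fastq)))
    print([r[0] for r in kept])  # ['@r1', '@r3']
```

Here `@r2` is dropped because its sequence repeats that of `@r1`.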

Brent Pedersen wrote a quick script that uses Bloom filters for this purpose (read more at: http://hackmap.blogspot.sg/2010/10/bloom-filter-ing-repeated-reads.html). The installation process may not be obvious to those unfamiliar with installing code from source, so I'll try to explain it step by step here.
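To illustrate the Bloom filter idea, here is a minimal pure-Python sketch. It is not Brent's script and does not use pybloomfaster; the class `SimpleBloom`, the functions `bloom_size` and `unique_reads_bloom`, and the parameters `n_expected` and `error_rate` are hypothetical. The filter is a fixed array of `m` bits; each read sequence maps to `k` bit positions (derived here from one SHA-256 digest by double hashing), and a read is written out only if at least one of its bits was still unset. Because a Bloom filter can give false positives but never false negatives, every duplicate is removed, but a small fraction of genuinely unique reads may also be discarded. In exchange, memory stays at `m` bits no matter how many reads are seen.

```python
import hashlib
import io
import math
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def bloom_size(n: int, p: float) -> Tuple[int, int]:
    """Standard sizing: m bits and k hashes for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class SimpleBloom:
    """Minimal Bloom filter backed by a bytearray used as a bit set."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: k indices from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # |1 keeps the step non-zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add_if_new(self, item: str) -> bool:
        """Set the item's bits; return True if any bit was previously unset."""
        new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                new = True
                self.bits[byte] |= 1 << bit
        return new


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
               plus.rstrip("\r\n"), qual.rstrip("\r\n"))


def unique_reads_bloom(handle: TextIO, n_expected: int,
                       error_rate: float = 0.001) -> Iterator[Record]:
    """Yield records whose sequence the filter has (probably) not seen before."""
    m, k = bloom_size(n_expected, error_rate)
    bloom = SimpleBloom(m, k)
    for record in read_fastq(handle):
        if bloom.add_if_new(record[1]):
            yield record


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n"
    kept = unique_reads_bloom(io.StringIO(fastq), n_expected=1000)
    print([r[0] for r in kept])
```

With these sizing formulas, one million expected reads at a 0.1% false-positive rate gives k = 10 and roughly 14.4 million bits (about 1.8 MB), independent of read length.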

To run the fastq_unique.py script, you'll need three things:

1. The Perl module Bloom::Faster: install it either through CPAN or by downloading it manually.

2. The Python module nose (used by the pybloomfaster tests): follow the installation directions on the nose page.

3. Brent's wrapper, pybloomfaster: download the master zip, unzip it, and run the following from inside the extracted directory:
```
sudo python setup.py install
```
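Before running fastq_unique.py, it may help to confirm that all three prerequisites are visible. The sketch below checks the Perl module by running `perl -MBloom::Faster -e 1`, which exits with status 0 only if the module loads, and checks the Python modules with `importlib.util.find_spec`. The helper names are hypothetical, and the import name `pybloomfaster` is an assumption based on the project name; adjust it if the package installs under a different name.

```python
import importlib.util
import shutil
import subprocess
from typing import Dict


def perl_module_available(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` succeeds, i.e. the module loads."""
    if shutil.which("perl") is None:
        return False  # no perl interpreter on PATH
    result = subprocess.run(
        ["perl", f"-M{module}", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def python_module_available(name: str) -> bool:
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


def check_prerequisites() -> Dict[str, bool]:
    # The import name "pybloomfaster" is an assumption based on the project name.
    return {
        "Bloom::Faster (Perl)": perl_module_available("Bloom::Faster"),
        "nose (Python)": python_module_available("nose"),
        "pybloomfaster (Python)": python_module_available("pybloomfaster"),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the same Python interpreter you used for `sudo python setup.py install`, since a different interpreter may not see the installed package.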
