Posts

Showing posts from October, 2017

Quick notes on barcode distances

We briefly discussed the distance between two random barcodes. Conclusion: a 15 nt barcode is not long enough, a 20 nt barcode should be enough, and we can very safely merge barcodes that differ by 2 nt or less.

Below are some quick calculations for the expected distances. Assume all four bases are equally frequent, so two random barcodes match at a given position with probability $p = 0.25$ and mismatch with probability $1 - p = 0.75$. For barcodes of length $\ell$, the Hamming distance $k$ between two random barcodes then follows a binomial distribution, $k \sim \mathrm{Binomial}(\ell,\, 1-p)$. Therefore:

mean distance: $\mu = \ell(1-p)$
standard deviation of the distance: $\sigma = \sqrt{\ell\, p\,(1-p)}$
probability of a particular distance $i$: $P(k = i) = \frac{\ell!}{(\ell-i)!\, i!}\,(1-p)^{i}\, p^{\ell-i}$

See below for a summary of the probabilities of the expected distances for barcode lengths from 14 to 22.

The number of pairwise comparisons is $T = N(N-1)/2$, where $N$ is the total number of barcodes (let's use $N$ = 1 million). Multiplying each probability by $T$ gives the expected number of barcode pairs at each distance. In reality, because we have another barcode for TS, we have ~100,000 ...
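If you want to reproduce the numbers yourself, here is a minimal sketch of the calculation above. It assumes equal base frequencies (so $p = 0.25$), uses Python 3.8+ for `math.comb`, and the function names are just made up for illustration. It sums the probabilities for distances 0, 1 and 2 to get the expected number of pairs that would be merged under the "within 2 nt" rule.

```python
from math import comb, sqrt

P_MATCH = 0.25  # assumed chance that two random bases agree (equal base frequencies)

def distance_pmf(length, p_match=P_MATCH):
    """P(Hamming distance = i) for two random barcodes: Binomial(length, 1 - p_match)."""
    q = 1 - p_match  # per-position mismatch probability
    return [comb(length, i) * q**i * p_match**(length - i) for i in range(length + 1)]

def distance_mean_sd(length, p_match=P_MATCH):
    """Mean and standard deviation of the distance between two random barcodes."""
    q = 1 - p_match
    return length * q, sqrt(length * q * p_match)

def expected_close_pairs(length, n_barcodes, max_dist, p_match=P_MATCH):
    """Expected number of barcode pairs whose distance is <= max_dist."""
    total_pairs = n_barcodes * (n_barcodes - 1) // 2
    pmf = distance_pmf(length, p_match)
    return total_pairs * sum(pmf[: max_dist + 1])

N = 1_000_000  # total number of barcodes, as in the text
for length in range(14, 23):
    mean, sd = distance_mean_sd(length)
    close = expected_close_pairs(length, N, max_dist=2)
    print(f"{length} nt: mean={mean:.2f}, sd={sd:.2f}, pairs within 2 nt ~ {close:,.0f}")
```

Going from 15 nt to 20 nt cuts the expected number of pairs within 2 nt by several hundredfold, which is why the longer barcode looks safe to collapse at that threshold.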

Quick notes about PCR amplification for barcode sequencing

Feel free to share the blog and discuss ^_^

Several guidelines. General reference: "Measuring the activity of protein variants on a large scale using deep mutational scanning" (2014).

Too many PCR amplification cycles may not be a good thing, because:
1) Under usual conditions, the fraction of products carrying polymerase errors increases roughly linearly with the number of cycles (while the amount of product increases exponentially).
2) Overamplification toward the end of PCR may reduce accuracy.

About the bubble product: overamplification in PCR may cause bubble products to form. These are heteroduplexes that form when primers run out and product strands anneal to each other through their shared adapter ends, while the non-matching insert regions stay unpaired as a "bubble". This may result in a higher error rate, but it won't cause problems for sequencing steps such as clustering, because the denatured library strands still have complete adapters on both ends.

Figure 1. Illumina bubble product example.

For actual amplicon sequencing, the secondary peak will be closer to the main peak because the bubbles are smaller.

Figure 2. My previous amplicon sequencing bubble product. The left and right ones are the lower and upper...
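To see why error accumulates roughly linearly while product grows exponentially, here is a toy model sketched in Python. It is a simplification I'm assuming for illustration, not a full PCR model: every cycle each strand is copied once with a given efficiency, a copy of an error-free template is error-free with probability $(1 - f)^{L}$, and original strands never change. The error rate `ERROR_RATE` and length `AMPLICON_LEN` are hypothetical values, and the model ignores plateau effects and bubble products entirely.

```python
def error_free_fraction(cycles, error_rate, amplicon_len, efficiency=1.0):
    """Fraction of strands without any polymerase error after `cycles` rounds.

    Simplified model (assumption): each cycle, a fraction `efficiency` of all
    strands is copied once; a copy of an error-free template is error-free with
    probability (1 - error_rate) ** amplicon_len; copies of erroneous templates
    stay erroneous; template strands are never modified.
    """
    q = (1 - error_rate) ** amplicon_len
    error_free, total = 1.0, 1.0
    for _ in range(cycles):
        error_free += efficiency * error_free * q  # new error-free copies
        total += efficiency * total                # all new copies
    return error_free / total

def yield_fold(cycles, efficiency=1.0):
    """Fold increase in product, which grows exponentially with cycle number."""
    return (1 + efficiency) ** cycles

ERROR_RATE = 1e-5   # hypothetical per-base, per-copy error rate
AMPLICON_LEN = 300  # hypothetical amplicon length (nt)

for n in (10, 15, 20, 25, 30):
    err = 1 - error_free_fraction(n, ERROR_RATE, AMPLICON_LEN)
    print(f"{n} cycles: yield x{yield_fold(n):,.0f}, strands with >=1 error: {err:.2%}")
```

In this toy model, doubling the cycle number from 10 to 20 roughly doubles the fraction of strands with an error, while the yield goes up about a thousandfold.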

Sequencing primer design for barcode sequencing

General rules and other notes:

1. Why do we need to add random Ns to the forward primer?
The first 5-7 cycles of R1 are used (for both R1 and R2) to locate the clusters and to estimate the color matrix during sequencing, so a library with low base diversity in those cycles can give poor results. We may also need to use lower cluster density. According to an Illumina scientist, transitions from high to low diversity sometimes matter for getting high sequence quality.

2. Spike-in guideline
Newer software lets us forgo costly control lanes and use extra PhiX instead. For amplicons, at least 5-10% PhiX is needed to overcome the low-diversity issue.

3. About random Ns / degenerate oligos / mixed bases (IDT note)
When mixed bases are used, the oligo is a mixture of bases at certain positions. There are two ways to make random Ns: machine mixing and hand mixing.
Machine mix: T can be up to 30%. Machine mixing is only offered at 25% of each base, and it tends to over-represent G slightly and under-represent A.
Hand mix: T is about 24%. The length limit for hand-mixed oligos is 100 nt. ...
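To check how much a random-N prefix and a spike-in help base diversity, one could compute the per-cycle base composition of the reads. Below is a rough sketch in Python. The helper names, the 0.6 "dominant base" threshold, and the amplicon sequence are all hypothetical, and the spike-in is treated as exactly 25% of each base, which is a simplifying assumption (real PhiX is not perfectly balanced). The `weights` argument lets you plug in a biased N mix, such as a machine mix with extra T, instead of an equal one.

```python
import random
from collections import Counter

BASES = "ACGT"

def random_n_prefix(length, weights, rng):
    """Draw a degenerate N stretch; `weights` are the A, C, G, T fractions of the mix."""
    return "".join(rng.choices(BASES, weights=weights, k=length))

def cycle_composition(reads, spike_fraction=0.0):
    """Per-cycle base fractions on the flow cell, blending in a spike-in
    that is treated (simplifying assumption) as 25% of each base."""
    n_cycles = min(len(r) for r in reads)
    compositions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        comp = {}
        for base in BASES:
            library_fraction = counts[base] / len(reads)
            comp[base] = (1 - spike_fraction) * library_fraction + spike_fraction * 0.25
        compositions.append(comp)
    return compositions

def low_diversity_cycles(compositions, max_fraction=0.6):
    """Cycles where one base makes up more than `max_fraction` of clusters (hypothetical cutoff)."""
    return [i for i, comp in enumerate(compositions) if max(comp.values()) > max_fraction]

rng = random.Random(0)
amplicon = "GGTACCTTAGCAGTCAGT"  # hypothetical constant amplicon start (18 nt)
no_ns = [amplicon] * 1000
with_ns = [random_n_prefix(6, [0.25] * 4, rng) + amplicon for _ in range(1000)]

print(low_diversity_cycles(cycle_composition(no_ns, spike_fraction=0.10)))
print(low_diversity_cycles(cycle_composition(with_ns, spike_fraction=0.10)))
```

In this simplified model, a 10% spike-in by itself still leaves one base at about 92.5% of clusters in every constant-amplicon cycle, while a 6-nt random-N prefix balances only the first 6 cycles, which are the ones the text says are used to locate clusters.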