Quick notes on barcode distances
We briefly discussed the distance between two random barcodes. Conclusion: a barcode of 15 nt is not long enough, a barcode of 20 nt should be enough, and we can very safely combine barcodes that are within 2 nt of each other.

Below are some quick calculations for the expected distances. Assume two random barcodes of length [len] agree at any given position with probability p = 25% (four equally likely bases), so they differ at that position with probability 1-p = 75%. The distance [k], i.e. the number of mismatching positions, then follows a binomial distribution with len trials and per-position mismatch probability 1-p. Therefore:

mean distance, mean = len*(1-p)
standard deviation of distance, sd = sqrt(len*p*(1-p))
for a particular distance k = i, Probability(dist = i) = len! / ((len-i)! * i!) * (1-p)^i * p^(len-i)

See below for a summary of the probabilities of each distance for barcode lengths from 14 to 22.

Number of pairwise comparisons: Total = N*(N-1)/2, where N is the total number of barcodes (let's use N = 1 million). Multiplying each probability by Total gives the expected number of barcode pairs at each distance.

In reality, because we have another barcode for TS, we have ~100,000 ...
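
As a rough sketch of the binomial calculation described above, the Python snippet below tabulates the mean, the standard deviation, and the expected number of barcode pairs within 2 nt for lengths 14 to 22. It takes the distance to be the number of mismatching positions (substitutions only, no indels), each mismatching with probability 1-p = 0.75, and uses N = 1 million as in the notes. The function names (distance_pmf, prob_within, total_pairs, summarize) are made up for illustration and are not from any existing pipeline.

```python
from math import comb, sqrt

P_MATCH = 0.25  # assumed chance that two random bases agree at one position


def distance_pmf(length, k, p_match=P_MATCH):
    """P(distance == k) for two random barcodes of the given length.

    Distance here means the number of mismatching positions.
    """
    return comb(length, k) * (1 - p_match) ** k * p_match ** (length - k)


def prob_within(length, max_dist, p_match=P_MATCH):
    """P(distance <= max_dist) for two random barcodes."""
    return sum(distance_pmf(length, k, p_match) for k in range(max_dist + 1))


def total_pairs(n_barcodes):
    """Number of pairwise comparisons, N*(N-1)/2."""
    return n_barcodes * (n_barcodes - 1) // 2


def summarize(lengths=range(14, 23), n_barcodes=1_000_000, max_dist=2):
    """Return (length, mean, sd, expected pairs within max_dist) per length."""
    rows = []
    for length in lengths:
        mean = length * (1 - P_MATCH)
        sd = sqrt(length * P_MATCH * (1 - P_MATCH))
        expected = prob_within(length, max_dist) * total_pairs(n_barcodes)
        rows.append((length, mean, sd, expected))
    return rows


if __name__ == "__main__":
    for length, mean, sd, expected in summarize():
        print(f"len={length:2d}  mean={mean:5.2f}  sd={sd:4.2f}  "
              f"expected pairs within 2 nt: {expected:,.0f}")
```

Changing n_barcodes (for example to reflect a smaller effective pool) or max_dist shows how quickly the expected number of close pairs changes with barcode length.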