5  Assessing the effect of trimming the adaptor sequences

Having compared the different RNA-seq quantification approaches described above (Chapter 4), we investigated whether trimming the adaptor sequences affects the performance of fetal sex prediction. For this purpose, we used the 10 million down-sampled reads from the 100 randomly selected samples and ran cutadapt (v2.7 with Python 3.6.8) (11) with the following parameters:

Listing 5.1: cutadapt command
cutadapt -j 32 \
         -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
         -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
         -q 20 \
         -O 8 \
         -m 20
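
Listing 5.1 gives only the trimming options. A complete paired-end invocation also needs input and output files, which are not specified in the text. The following Python sketch shows one way such a command line could be assembled. The function name and all file names are hypothetical placeholders, and the use of cutadapt's -o and -p options for the two output files is an assumption about how the run was set up, not a record of the actual command.

```python
import shlex

# Adaptor sequences taken from Listing 5.1.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"


def build_cutadapt_command(in_r1, in_r2, out_r1, out_r2, cores=32):
    """Assemble a paired-end cutadapt argument list (hypothetical helper)."""
    return [
        "cutadapt",
        "-j", str(cores),    # number of CPU cores
        "-a", ADAPTER_R1,    # 3' adaptor to remove from read 1
        "-A", ADAPTER_R2,    # 3' adaptor to remove from read 2
        "-q", "20",          # quality-trim 3' ends with a cutoff of 20
        "-O", "8",           # require at least 8 bases of adaptor overlap
        "-m", "20",          # discard reads shorter than 20 bases after trimming
        "-o", out_r1,        # trimmed read 1 output (assumed, not in the listing)
        "-p", out_r2,        # trimmed read 2 output (assumed, not in the listing)
        in_r1,
        in_r2,
    ]


# Placeholder file names, for illustration only.
cmd = build_cutadapt_command(
    "sample_R1.fastq.gz",
    "sample_R2.fastq.gz",
    "sample_R1.trimmed.fastq.gz",
    "sample_R2.trimmed.fastq.gz",
)

# Print a shell-safe version of the command instead of running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string keeps each option paired with its value and avoids shell quoting problems if it is later passed to subprocess.run.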

We applied the “Salmon (SA mode)” approach to both the trimmed and the untrimmed input reads, then compared the number of reads quantified on the 42 chrY genes and the number of false classifications of fetal sex. As before, a sample was predicted as male if at least one read was quantified on any of the 42 protein-coding chrY genes. We found that, regardless of trimming, the numbers of TP, TN, FP, and FN remained the same, and the sum of reads quantified on the 42 chrY genes remained very similar (R = 0.999, P < 2.2 × 10⁻¹⁶, two-sided Pearson’s correlation test, as in Section 7.4). Two female samples had a slightly higher sum of chrY reads when quantified from trimmed input reads: one 12wkGA sample with a total of 1.887 reads from untrimmed input and 2.886 reads from trimmed input, and one 20wkGA sample with a total of 0.918 reads from untrimmed input and 0.919 reads from trimmed input.
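
The comparison described above can be sketched in Python as follows. This is a minimal illustration and not the analysis code used in this work. The function names and the read counts are invented toy values, the threshold is applied to the per-sample sum of reads over the chrY genes (one possible reading of the rule), and male is treated as the positive class; all of these are assumptions.

```python
from math import sqrt


def predict_sex(chry_read_sum, min_reads=1.0):
    """Predict 'M' if the summed chrY reads reach min_reads, else 'F'.

    Assumption: the "at least one read" rule is applied to the per-sample
    sum of reads over the chrY genes.
    """
    return "M" if chry_read_sum >= min_reads else "F"


def confusion_counts(true_sex, predicted_sex):
    """Count TP, TN, FP and FN, treating male ('M') as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(true_sex, predicted_sex):
        if truth == "M":
            counts["TP" if pred == "M" else "FN"] += 1
        else:
            counts["FP" if pred == "M" else "TN"] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data (invented, not taken from the study): known fetal sex and the
# per-sample sum of chrY reads from untrimmed and trimmed input.
true_sex = ["M", "M", "F", "F"]
untrimmed = [152.0, 87.5, 0.0, 0.4]
trimmed = [153.0, 88.0, 0.0, 0.5]

counts_untrimmed = confusion_counts(true_sex, [predict_sex(s) for s in untrimmed])
counts_trimmed = confusion_counts(true_sex, [predict_sex(s) for s in trimmed])
r = pearson_r(untrimmed, trimmed)

print("untrimmed:", counts_untrimmed)
print("trimmed:  ", counts_trimmed)
print("Pearson r:", round(r, 4))
```

For the significance test itself, a statistics package such as scipy.stats.pearsonr would return both the coefficient and a two-sided P value; the hand-written function here only keeps the sketch free of dependencies.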

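In conclusion, because trimming changed neither the classification of fetal sex nor, to any meaningful degree, the number of reads quantified on the chrY genes, we employed “Salmon (SA mode)” without trimming the raw input sequencing reads.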