SEQ13

February 14, 2016

Processing Sequence files using dDocent

Create a directory for the files
- michelles 2016-02-07 14:31:48 sequencing $ mkdir hiseq_2015_12_18_SEQ13
Receive files from sequencer
- https://htseq.princeton.edu/cgi-bin/login.pl?redirect_url=https://htseq.princeton.edu/cgi-bin/dashboard.pl
  - mrs349, usual pw
- Used the bulk download option
Once the files are done downloading, move them out of the temp folders and delete the temp folders

Count raw reads
- $ zcat clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_1_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 115318774
- $ zcat clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_2_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 112086716
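Each FASTQ record is exactly four lines (header, sequence, "+", qualities), so the line count divided by 4 is the read count. The bash sketch below does this for any number of files in one call. count_reads is a hypothetical helper name. It assumes unwrapped FASTQ, with one sequence line per record, which is what Illumina produces. It uses gzip -dc in place of zcat only because gzip -dc behaves the same on Linux and macOS.

```shell
#!/usr/bin/env bash
# count_reads FILE...: print "<file><TAB><reads>" for each gzipped FASTQ.
# Hypothetical helper; assumes 4 lines per record (no wrapped sequences).
count_reads() {
  local f lines
  for f in "$@"; do
    lines=$(gzip -dc "$f" | wc -l)
    # Arithmetic expansion tolerates the leading spaces some wc versions print
    printf '%s\t%d\n' "$f" $(( lines / 4 ))
  done
}
```

Example: count_reads /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/*_read_1_passed_filter.fastq.gz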
Record where the files are saved on amphiprion in the sample_data file (Sequencing sheet, amphiprion folder column)
Make a working directory, with a separate directory for each pool so that the process_radtags output for each pool stays separate
- michelles 2016-02-14 19:22:01 02-apcl-ddocent $ mkdir 13seq
  michelles 2016-02-14 19:22:15 12seq $ cd ../13seq/
  michelles 2016-02-14 19:22:19 13seq $ mkdir bcsplit pool1 pool2 pool3 pool4 logs samples scripts
  michelles 2016-02-14 19:22:45 13seq $ cd bcsplit/
  michelles 2016-02-14 19:22:48 bcsplit $ mkdir 1lane 2lane
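The mkdir steps above can be collapsed into a single call with bash brace expansion. This is a minimal sketch. setup_run_dir is a hypothetical name, and the layout simply mirrors the directories created above.

```shell
#!/usr/bin/env bash
# setup_run_dir DIR: create the per-run layout used in this notebook.
# Hypothetical helper; mkdir -p makes it safe to re-run on an existing DIR.
setup_run_dir() {
  local run="$1"
  mkdir -p "$run"/bcsplit/{1lane,2lane} \
           "$run"/{pool1,pool2,pool3,pool4,logs,samples,scripts}
}
```

Example, from 02-apcl-ddocent: setup_run_dir 13seq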
In your logs directory, create an index file (index-13seq, the --bcfile used below) with each pool name tab-separated from its index sequence
P057	ACTTGA
P058	GATCAG
P059	GGCTAC
P060	CTTGTA
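Some editors are configured to turn tabs into spaces. Writing the index file with printf avoids that problem, because printf always emits real tab characters. This is a sketch only. write_index_13seq is a hypothetical name, and the pool/index pairs are copied from the list above.

```shell
#!/usr/bin/env bash
# write_index_13seq [OUT]: write "pool<TAB>index" lines for SEQ13.
# Hypothetical helper; printf reuses its format for each pair of arguments.
write_index_13seq() {
  local out="${1:-logs/index-13seq}"
  printf '%s\t%s\n' \
    P057 ACTTGA \
    P058 GATCAG \
    P059 GGCTAC \
    P060 CTTGTA > "$out"
}
```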
Create a names file for each pool (names-57 to names-60 in logs), with each sample name tab-separated from the barcode assigned to that sample.
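A names file with spaces where there should be a tab is easy to miss by eye. The awk check below is a hedged sketch that flags such lines. check_two_col_tsv is a hypothetical helper. It assumes every non-blank line should hold exactly two tab-separated fields, and that the second field is a DNA barcode made up only of A/C/G/T. It works equally well on the index file.

```shell
#!/usr/bin/env bash
# check_two_col_tsv FILE...: report lines that are not "name<TAB>ACGT-barcode".
# Hypothetical helper; exits non-zero if any line is malformed.
check_two_col_tsv() {
  awk -F'\t' '
    NF > 0 && (NF != 2 || $2 !~ /^[ACGTacgt]+$/) {
      printf "%s:%d: %s\n", FILENAME, FNR, $0
      bad = 1
    }
    END { exit bad }
  ' "$@"
}
```

Example: check_two_col_tsv logs/names-57 logs/names-58 logs/names-59 logs/names-60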
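Create a barcodes file in logs (used by process_radtags as ../logs/barcodes)
Run barcode_splitter.py with nohup, once per lane, from bcsplit/1lane and bcsplit/2lane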
- 1lane $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index-13seq --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_1_read_2_index_read_passed_filter.fastq.gz
- 2lane $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index-13seq --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_2_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_2_read_2_index_read_passed_filter.fastq.gz
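The two commands above differ only in the lane number, so they can be generated from one template. This is a sketch under stated assumptions. split_lane and DRY_RUN are hypothetical names, and the paths are copied from the commands above. Unlike the original foreground runs, it backgrounds each job with &, so both lanes run in parallel. Setting DRY_RUN=1 prints the command without running it.

```shell
#!/usr/bin/env bash
# Paths copied from the two barcode_splitter commands above (SEQ13 run).
SEQDIR=/local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13
PREFIX=clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx
SPLITTER=~/14_programs/paired_sequence_utils/barcode_splitter.py

# split_lane N: run barcode_splitter.py on lane N (1 or 2) from inside bcsplit/.
# Hypothetical helper; DRY_RUN=1 prints the command instead of running it.
split_lane() {
  local lane="$1"
  local -a cmd
  cmd=( "$SPLITTER" --bcfile ../../logs/index-13seq --idxread 2 --suffix .fastq.gz
        "$SEQDIR/${PREFIX}_${lane}_read_1_passed_filter.fastq.gz"
        "$SEQDIR/${PREFIX}_${lane}_read_2_index_read_passed_filter.fastq.gz" )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "cd ${lane}lane && ${cmd[*]}"
  else
    # Subshell keeps the cd local; nohup + & lets the job survive logout.
    ( cd "${lane}lane" && nohup "${cmd[@]}" > nohup.out 2>&1 ) &
  fi
}
```

Example, from bcsplit: split_lane 1; split_lane 2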

February 15, 2016

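Concatenate the two lanes into one file per pool for process_radtags (concatenated gzip files are still valid gzip)
- bcsplit $ cat 2lane/P057-read-1.fastq.gz 1lane/P057-read-1.fastq.gz > ../pool1/P057.fastq.gz
- Repeat for P058 (pool2), P059 (pool3), and P060 (pool4)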
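The "repeat for all" step can be written as a loop. This is a minimal sketch. merge_lanes is a hypothetical name. It assumes the run layout created above, the P057 to pool1 through P060 to pool4 mapping used in this notebook, and the barcode_splitter output names P0NN-read-1.fastq.gz. It stops at the first missing input.

```shell
#!/usr/bin/env bash
# merge_lanes ROOT: cat lane 2 then lane 1 for each pool into ROOT/poolN/P0NN.fastq.gz.
# Hypothetical helper; gzip streams can be concatenated byte-for-byte.
merge_lanes() {
  local root="$1" num pool=1
  for num in 057 058 059 060; do
    cat "$root/bcsplit/2lane/P${num}-read-1.fastq.gz" \
        "$root/bcsplit/1lane/P${num}-read-1.fastq.gz" \
        > "$root/pool${pool}/P${num}.fastq.gz" || return 1
    pool=$((pool + 1))
  done
}
```

Example, from 02-apcl-ddocent: merge_lanes 13seq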
Run the process_radtags script for each pool, after using nano to adjust the template for that pool
- /local/home/michelles/02-apcl-ddocent/12seq/scripts
- michelles 2016-02-13 09:22:20 scripts $ nano 53process.sh
    #!/bin/bash
    process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \
      -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
      -f ../Pool1/P053.fastq.gz -o ./
    mv process_radtags.log ../logs/53process.out
    cat ../logs/53process.out | mail -s "process 53 complete" michelle.stuart@rutgers.edu
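As an alternative to hand-editing each copy in nano, the per-pool script can be generated from a here-document. This is a hedged sketch. make_process_script is a hypothetical name. It assumes the 13seq pool directories are lowercase (pool1 to pool4, as created above) and that every pool shares ../logs/barcodes, as the template does. The process_radtags options are copied unchanged from the template.

```shell
#!/usr/bin/env bash
# make_process_script NUM POOL OUT: write a process_radtags script for pool P0NUM.
# Hypothetical helper; "\\" in the unquoted heredoc emits a literal line-continuation "\".
make_process_script() {
  local num="$1" pool="$2" out="$3"
  cat > "$out" <<EOF
#!/bin/bash
process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \\
  -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \\
  -f ../pool${pool}/P0${num}.fastq.gz -o ./
mv process_radtags.log ../logs/${num}process.out
cat ../logs/${num}process.out | mail -s "process ${num} complete" michelle.stuart@rutgers.edu
EOF
  chmod +x "$out"
}
```

Example, from 13seq: make_process_script 57 1 scripts/57process.sh, and likewise 58 2, 59 3, 60 4.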

- michelles 2016-02-15 09:48:52 pool1 $ nohup ../scripts/57process.sh - finished at 11:17am
- michelles 2016-02-15 09:49:24 pool2 $ nohup ../scripts/58process.sh - finished at 11:07am
- michelles 2016-02-15 09:49:46 pool3 $ nohup ../scripts/59process.sh - finished at 10:57am
- michelles 2016-02-15 09:50:06 pool4 $ nohup ../scripts/60process.sh - finished at 11:13am
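The four nohup launches above can also be started from the run directory in one call. This is a minimal sketch. run_all_pools is a hypothetical name, and the script-to-pool mapping (57 to pool1 through 60 to pool4) is taken from the commands above. Each job runs in a background subshell, so the shell's wait builtin can block until all four finish.

```shell
#!/usr/bin/env bash
# run_all_pools ROOT: start ROOT/scripts/NNprocess.sh inside ROOT/poolN, in parallel.
# Hypothetical helper; each job's output goes to poolN/nohup.out.
run_all_pools() {
  local root="$1" num pool=1
  for num in 57 58 59 60; do
    ( cd "$root/pool${pool}" && nohup "../scripts/${num}process.sh" > nohup.out 2>&1 ) &
    pool=$((pool + 1))
  done
}
```

Example, from 02-apcl-ddocent: run_all_pools 13seq; wait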
Rename the samples for dDocent
- michelles 2016-02-15 10:58:47 pool3 $ sh rename.for.dDocent_se_gz ../logs/names-59
- michelles 2016-02-15 11:18:35 pool2 $ sh rename.for.dDocent_se_gz ../logs/names-58
- michelles 2016-02-15 11:20:22 pool4 $ sh rename.for.dDocent_se_gz ../logs/names-60
- michelles 2016-02-15 11:20:59 pool1 $ sh rename.for.dDocent_se_gz ../logs/names-57

Move the renamed samples into the samples directory or, if working in dDocent, into the dDocent working directory
- michelles 2016-02-15 10:59:59 pool3 $ mv A* ../samples/
- michelles 2016-02-15 11:18:59 pool2 $ mv A* ../samples/
- michelles 2016-02-15 11:20:32 pool4 $ mv A* ../samples/
- michelles 2016-02-15 11:21:04 pool1 $ mv A* ../samples/
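The rename and move steps repeat the same two commands in each pool. This is a sketch that chains them per pool. rename_and_collect is a hypothetical name. It assumes, as in the commands above, that rename.for.dDocent_se_gz sits in each pool directory and that the renamed files are the only ones starting with "A". Each pool runs in its own subshell, so a failure in one pool does not leave the shell in the wrong directory.

```shell
#!/usr/bin/env bash
# rename_and_collect ROOT: rename each pool's samples, then move them to ROOT/samples.
# Hypothetical helper; pool N uses ../logs/names-(56+N), matching the commands above.
rename_and_collect() {
  local root="$1" num pool=1
  for num in 57 58 59 60; do
    ( cd "$root/pool${pool}" &&
      sh rename.for.dDocent_se_gz "../logs/names-${num}" &&
      mv A* ../samples/ ) || return 1
    pool=$((pool + 1))
  done
}
```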
Into a folder containing only the new samples (*.F.fq.gz), copy reference.fasta from the original dataset
- michelles 2016-02-15 11:21:42 samples $ cp ~/02-apcl-ddocent/jonsfiles/reference.fasta ./
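Before launching dDocent, it can help to confirm that the folder really has the reference and the samples. This is a minimal sketch. check_ddocent_dir is a hypothetical helper, and it checks only the two things named above: a non-empty reference.fasta and at least one *.F.fq.gz file.

```shell
#!/usr/bin/env bash
# check_ddocent_dir [DIR]: confirm reference.fasta and *.F.fq.gz samples are present.
# Hypothetical helper; prints the sample count on success, an error on stderr otherwise.
check_ddocent_dir() {
  local dir="${1:-.}" n
  [ -s "$dir/reference.fasta" ] || { echo "missing reference.fasta in $dir" >&2; return 1; }
  n=$(find "$dir" -maxdepth 1 -name '*.F.fq.gz' | wc -l | tr -d ' ')
  [ "$n" -gt 0 ] || { echo "no *.F.fq.gz samples in $dir" >&2; return 1; }
  echo "$n samples ready in $dir"
}
```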
Run dDocent from the command line in your working directory

$ dDocent

Variables used in dDocent Run at Mon Feb 15 11:22:55 EST 2016
Number of Processors
Maximum Memory
Trimming: yes
Assembly?
Type_of_Assembly
Clustering_Similarity%
Mapping_Reads?: yes
Mapping_Match_Value
Mapping_MisMatch_Value
Mapping_GapOpen_Penalty
Calling_SNPs?
michelle.stuart@rutgers.edu

Finished at 3:28pm

When dDocent has finished, copy the *.F.fq.gz, *.R1.fq.gz, *-RG.bam, and *-RG.bam.bai files to the main analysis folder (apparently symlinking does not work for the -RG.bam files)

$ ln -s APCL_1* ~/02-apcl-ddocent/APCL_analysis/15-02-2016/
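Because the note above says to copy rather than symlink the -RG.bam files, here is a minimal cp-based sketch. copy_outputs is a hypothetical name, and the file patterns are the four listed above. cp -p preserves modification times, so each .bai stays at least as new as its .bam. htslib-based tools warn when an index is older than its data file.

```shell
#!/usr/bin/env bash
# copy_outputs SRC DEST: copy dDocent per-sample outputs (real copies, not symlinks).
# Hypothetical helper; -p keeps timestamps so .bai files are not "older" than .bam files.
copy_outputs() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  cp -p "$src"/*.F.fq.gz "$src"/*.R1.fq.gz \
        "$src"/*-RG.bam "$src"/*-RG.bam.bai "$dest"/
}
```

Example, from samples: copy_outputs . ~/02-apcl-ddocent/APCL_analysis/15-02-2016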

When I did this, I noticed that seq12 had not finished dDocent, so I re-ran seq12.

dDocent February 16, 2016

June 30, 2016

michelles 2016-06-30 12:04:29 13seq $ cp ~/13-stacks_analysis_scripts/readprocesslog.py ./scripts/

michelles 2016-06-30 12:04:47 13seq $ ~/13-stacks_analysis_scripts/readprocesslog.py

Enter the path and file name of the log, i.e. ./logs/16process.out: ./logs/57process.out
