SEQ12

February 7, 2016

Create a directory for the files (mkdir), using the naming scheme type_of_sequencer_year_month_day_SEQ##:
michelles 2016-02-07 14:28:45 sequencing $ mkdir hiseq_2015_12_18_SEQ12
michelles 2016-02-07 14:31:48 sequencing $ mkdir hiseq_2015_12_18_SEQ13
Receive files from sequencer
- https://htseq.princeton.edu/cgi-bin/login.pl?redirect_url=https://htseq.princeton.edu/cgi-bin/dashboard.pl
- Click on the sequencing run of interest in the box on the left that says “Recently Entered Samples”
- In the box titled Sample Provenance, click on the link following “This library was utilized within the following output(s):” and repeat for each lane
- Right click on the first read and “copy link”
- In amphiprion, in the directory you made in the previous step, type:
  - curl -L -O followed by the pasted link; it should look like this:
    - michelles 2016-02-07 16:00:37 ~ $ curl -L -O https://htseq.princeton.edu/cgi-bin/assay/oriFiles.pl?Download=download&assay_id=1895&file=SEQ12_160205_SN387_0802_AHJL3TBCXX_s_1_1_reads_passed_filter.fastq.gz&filename=clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz
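  - Didn’t work (the unquoted & characters in the URL are interpreted by the shell, so quoting the URL may help, though the site may also require a login); went to the physical location of amphiprion and downloaded via Firefox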
- Repeat for each read within this lane, then go back and repeat for each lane
Count raw reads
- michelles 2016-02-07 16:02:18 hiseq_2015_12_18_SEQ12 $ zcat clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 98179108
- michelles 2016-02-07 16:06:19 hiseq_2015_12_18_SEQ12 $ zcat clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_2_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 94793182
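The read count above relies on a FASTQ record being four lines long. A minimal sketch of the same count as a reusable function, assuming uncompressed records are exactly four lines each; the function name count_fastq_reads is made up for illustration, and gzip -cd is used in place of zcat:

```shell
# Count reads in one or more gzipped FASTQ files.
# Assumes every record is exactly four lines (hypothetical helper, not part of any package).
count_fastq_reads() {
    local f n
    for f in "$@"; do
        # gzip -cd decompresses to stdout, like zcat on Linux
        n=$(gzip -cd "$f" | wc -l)
        # Print the filename and the line count divided by 4
        printf '%s\t%d\n' "$f" "$((n / 4))"
    done
}

# Example (run inside the sequencing directory):
#   count_fastq_reads *_read_1_passed_filter.fastq.gz
```

Passing both lanes at once gives one line per file, which is convenient for pasting into the Sequencing sheet.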
Update the sample_data file (Sequencing sheet, amphiprion folder column) with where the files are saved on amphiprion
Make a working directory, with a separate directory for each pool to keep the process_radtags output separate
- michelles 2016-02-07 16:14:48 02-apcl-ddocent $ mkdir 12seq
- $ cd 12seq/
  $ mkdir bcsplit Pool1 Pool2 Pool3 Pool4 logs
  $ cd bcsplit
  $ mkdir lane1 lane2
In your logs directory, create an index file that lists each pool name, tab-separated from the index used on that pool. The easiest way to do this is to copy and paste from Google Sheets into a nano document: in the sample_data file, on the Names tab, type the pool numbers into the Pool ID column in the format below, and the spreadsheet will look up the proper indexes for you. Then copy and paste the result into a blank nano document and save it as index_seq##
- P053 ACTTGA
  P054 GATCAG
  P055 GGCTAC
  P056 CTTGTA
$ mkdir logs
$ cd logs/
$ nano
michelles 2016-02-07 16:19:08 logs $ cd ..
michelles 2016-02-07 16:19:24 bcsplit $ pwd
/local/home/michelles/02-apcl-ddocent/12seq/bcsplit
michelles 2016-02-07 16:19:28 bcsplit $ mv logs ..
Create a names file with the sample name tab-separated from the barcode assigned to that sample. The easiest way to make a names file is to copy and paste from Google Sheets: copy the ligation IDs from the pool and paste them into the Names tab, then copy and paste the result into a nano document in the logs directory.
Create a barcodes file in your logs directory: from the sample_data file, highlight the barcodes column only on the barcodes sheet and paste into nano, do not hit enter after the final barcode, save as “barcodes”.
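Pasting from a spreadsheet into nano can silently turn tabs into spaces or leave a blank line. A small sketch of a check for the two-column index and names files, assuming the second column should hold only the letters A, C, G and T; the function name check_barcode_table is hypothetical:

```shell
# Verify that every line has exactly two tab-separated fields and that the
# second field is a DNA sequence (hypothetical helper for index/names files).
check_barcode_table() {
    awk -F '\t' '
        NF != 2 || $2 !~ /^[ACGT]+$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }   # non-zero exit status if any line failed
    ' "$1"
}

# Example:
#   check_barcode_table ../logs/index_seq12 && echo "index file looks fine"
```

The function prints each offending line and returns a non-zero status, so it can gate the later steps.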
Run barcode splitter on the 1st lane with nohup. The only difference between the two input filenames is the read: read_1 versus read_2_index_read.

michelles 2016-02-07 16:27:03 lane1 $ nohup barcode_splitter.py --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_2_index_read_passed_filter.fastq.gz &
[1] 26418

nohup: failed to run command `barcode_splitter.py': No such file or directory

Not sure why the shell isn’t seeing the script: it is in my path, but for some reason there is an error. The tar may be causing the trouble. Delete the tar? No, move it to a different location. In the meantime, call the script by its full path:

michelles 2016-02-07 16:42:34 lane1 $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_2_index_read_passed_filter.fastq.gz

nohup: ignoring input and appending output to `nohup.out'

Run barcode splitter on 2nd lane with nohup
michelles 2016-02-07 16:44:52 ~ $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_2_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_2_read_2_index_read_passed_filter.fastq.gz
nohup: ignoring input and appending output to `nohup.out'
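The two lane commands differ only in the lane number embedded in the filenames. As a sketch, the command line can be built from the lane number so it is not retyped; the paths and flags are copied from the session above, while the variable names and the function splitter_command are made up for illustration. The function only prints the command and does not run it:

```shell
# Values copied from the commands in this log.
SEQ_DIR=/local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12
PREFIX=clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx
SPLITTER=~/14_programs/paired_sequence_utils/barcode_splitter.py

# Print (not run) the barcode splitter command for one lane number.
splitter_command() {
    local lane=$1
    printf 'nohup %s --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz %s %s\n' \
        "$SPLITTER" \
        "$SEQ_DIR/${PREFIX}_${lane}_read_1_passed_filter.fastq.gz" \
        "$SEQ_DIR/${PREFIX}_${lane}_read_2_index_read_passed_filter.fastq.gz"
}

# Example: review the command for lane 2 before running it from bcsplit/lane2
#   splitter_command 2
```

Because --bcfile is a relative path, the printed command is meant to be run from inside the matching lane directory.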

February 11, 2016

Concatenate the 2 lanes into one file per pool for process_radtags
- michelles 2016-02-11 13:41:48 bcsplit $ cat lane2/P053-read-1.fastq.gz lane1/P053-read-1.fastq.gz > ../Pool1/P053.fastq.gz
- $ cat lane2/P054-read-1.fastq.gz lane1/P054-read-1.fastq.gz > ../Pool2/P054.fastq.gz
- $ cat lane2/P055-read-1.fastq.gz lane1/P055-read-1.fastq.gz > ../Pool3/P055.fastq.gz
- $ cat lane2/P056-read-1.fastq.gz lane1/P056-read-1.fastq.gz > ../Pool4/P056.fastq.gz
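Concatenated gzip files are themselves a valid gzip file, which is why plain cat works here. A sketch of the same step as a loop, so that each pool is handled once and none is skipped; the function name merge_lanes and the POOL_ID:POOL_DIR argument format are assumptions made for this example:

```shell
# Concatenate lane2 and lane1 output for each pool.
# Usage: merge_lanes BCSPLIT_DIR OUT_ROOT POOL_ID:POOL_DIR ...
merge_lanes() {
    local bcsplit=$1 out_root=$2
    shift 2
    local pair pool dir
    for pair in "$@"; do
        pool=${pair%%:*}   # part before the colon, e.g. P053
        dir=${pair##*:}    # part after the colon, e.g. Pool1
        cat "$bcsplit/lane2/$pool-read-1.fastq.gz" \
            "$bcsplit/lane1/$pool-read-1.fastq.gz" \
            > "$out_root/$dir/$pool.fastq.gz" || return 1
    done
}

# Example (run from bcsplit):
#   merge_lanes . .. P053:Pool1 P054:Pool2 P055:Pool3 P056:Pool4
```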

February 13, 2016

Run the process_radtags script after using nano to adjust it for each pool
- /local/home/michelles/02-apcl-ddocent/12seq/scripts
- michelles 2016-02-13 09:22:20 scripts $ nano 53process.sh
- #!/bin/bash
- process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \
- -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
- -f ../Pool1/P053.fastq.gz -o ./
- mv process_radtags.log ../logs/53process.out
- cat ../logs/53process.out | mail -s "process 53 complete" michelle.stuart@rutgers.edu
- michelles 2016-02-13 09:18:13 Pool1 $ nohup ../scripts/53process.sh
- michelles 2016-02-13 09:20:30 Pool2 $ nohup ../scripts/54process.sh
- michelles 2016-02-13 09:21:09 Pool3 $ nohup ../scripts/55process.sh
- michelles 2016-02-13 09:22:19 Pool4 $ nohup ../scripts/56process.sh
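The four scripts differ only in the pool number and pool directory. As a sketch, a small generator can write each script instead of editing a copy in nano. The process_radtags options are copied from the script shown above; the function name make_process_script is hypothetical, and the mail step here reads the log from the same ../logs location that the mv step writes to:

```shell
# Write a per-pool process_radtags wrapper to stdout.
# Arguments: pool number (e.g. 53) and pool directory (e.g. Pool1).
make_process_script() {
    local num=$1 dir=$2
    # Unquoted heredoc: ${num} and ${dir} are expanded, \\ becomes a single backslash
    cat <<EOF
#!/bin/bash
process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \\
  -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \\
  -f ../${dir}/P0${num}.fastq.gz -o ./
mv process_radtags.log ../logs/${num}process.out
mail -s "process ${num} complete" michelle.stuart@rutgers.edu < ../logs/${num}process.out
EOF
}

# Example (run from the scripts directory):
#   make_process_script 56 Pool4 > 56process.sh && chmod +x 56process.sh
```

The generated script still uses relative paths, so it is meant to be launched from inside the pool directory, as in the nohup commands above.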
Rename the samples for dDocent
- michelles 2016-02-13 14:10:24 Pool1 $ sh rename.for.dDocent_se_gz ../logs/names-53
- michelles 2016-02-13 14:11:16 Pool2 $ sh rename.for.dDocent_se_gz ../logs/names-54
- michelles 2016-02-13 14:11:30 Pool3 $ sh rename.for.dDocent_se_gz ../logs/names-55
- michelles 2016-02-13 14:11:41 Pool4 $ sh rename.for.dDocent_se_gz ../logs/names-56
Move the renamed samples into the samples directory or, if working in dDocent, move them to the working directory
- michelles 2016-02-13 14:17:50 12seq $ mv Pool4/A* ./samples/
In a folder containing only the new samples (*.F.fq.gz), copy in reference.fasta from the original dataset
- michelles 2016-02-13 14:19:45 12seq $ cp ~/02-apcl-ddocent/jonsfiles/reference.fasta samples/

February 15, 2016

Type dDocent on the command line of your working directory

$ dDocent

Variables used in dDocent Run at Mon Feb 15 15:35:06 EST 2016

Number of Processors

Maximum Memory

Trimming

yes

Assembly?

Type_of_Assembly

Clustering_Similarity%

Mapping_Reads?

yes

Mapping_Match_Value

Mapping_MisMatch_Value

Mapping_GapOpen_Penalty

Calling_SNPs?

michelle.stuart@rutgers.edu

February 16, 2016

When dDocent has finished, link the *.F.fq.gz, *.R1.fq.gz, *-RG.bam, and *-RG.bam.bai files into the main analysis folder

michelles 2016-02-16 07:41:15 samples $ ln -s APCL_1* ~/02-apcl-ddocent/APCL_analysis/15-02-2016/
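A symbolic link created with a relative target is resolved relative to the directory that holds the link, so linking files into another directory by bare filename can produce dangling links. A sketch of a helper that links by absolute path instead; the function name link_into is hypothetical:

```shell
# Link each named file into DEST using an absolute target path.
# Usage: link_into DEST FILE...
link_into() {
    local dest=$1
    shift
    local f
    for f in "$@"; do
        # Resolve the file's directory to an absolute path so the link
        # still points at the file when read from DEST.
        ln -s "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$dest/" || return 1
    done
}

# Example (run from the samples directory):
#   link_into ~/02-apcl-ddocent/APCL_analysis/15-02-2016 APCL_1*
```

After linking, `ls -lL` in the destination is a quick way to confirm that every link resolves to a real file.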

Change directories to the main analysis folder and call SNPs:

Type:

$ dDocent

For the rest of the combined SEQ run, see dDocent February 16, 2016
