February 14, 2016
- Create a directory for the files
- michelles 2016-02-07 14:31:48 sequencing $ mkdir hiseq_2015_12_18_SEQ13
- Receive the files from the sequencer
- Once the files have finished downloading, move them out of the temp folders and delete the temp folders
- Count raw reads
- $ zcat clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_1_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 115318774
- $ zcat clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_2_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 112086716
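Both counts can be produced in one pass over every read-1 file. This is a minimal sketch, assuming the run directory holds the `*_read_1_passed_filter.fastq.gz` files shown above. `count_reads` is a hypothetical helper name, and `gzip -dc` is used as a portable equivalent of `zcat`.

```shell
#!/bin/bash
# count_reads (hypothetical helper): print each gzipped FASTQ file name with its
# read count. Every FASTQ record spans 4 lines, so reads = lines / 4.
count_reads() {
  local f lines
  for f in "$@"; do
    lines=$(gzip -dc "$f" | wc -l)   # same as: zcat "$f" | wc -l
    echo "$(basename "$f") $((lines / 4))"
  done
}

# Example, from /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13:
# count_reads *_read_1_passed_filter.fastq.gz
```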
- Record where the files are saved on amphiprion in the sample_data file (Sequencing sheet, amphiprion folder column)
- Make a working directory with a separate directory for each pool, so the process_radtags output for each pool stays separate
- michelles 2016-02-14 19:22:01 02-apcl-ddocent $ mkdir 13seq
michelles 2016-02-14 19:22:15 12seq $ cd ../13seq/
michelles 2016-02-14 19:22:19 13seq $ mkdir bcsplit pool1 pool2 pool3 pool4 logs samples scripts
michelles 2016-02-14 19:22:45 13seq $ cd bcsplit/
michelles 2016-02-14 19:22:48 bcsplit $ mkdir 1lane 2lane
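The same layout can be created in one call for future runs. A sketch only; `make_seq_dirs` is a hypothetical helper, and the directory names are taken from the mkdir commands above.

```shell
#!/bin/bash
# make_seq_dirs (hypothetical helper): create the same layout as the mkdir
# commands above in one call; mkdir -p also creates bcsplit/ and the root itself.
make_seq_dirs() {
  local root=$1 d
  for d in bcsplit/1lane bcsplit/2lane pool1 pool2 pool3 pool4 logs samples scripts; do
    mkdir -p "$root/$d"
  done
}

# Example, from ~/02-apcl-ddocent:
# make_seq_dirs 13seq
```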
- In your logs directory, create a tab-separated index file (e.g. logs/index-13seq) listing each pool ID and its index sequence:
P057	ACTTGA
P058	GATCAG
P059	GGCTAC
P060	CTTGTA
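The index file can be written without an editor. This is a sketch that assumes `barcode_splitter.py --bcfile` expects one "ID<TAB>sequence" line per pool; `write_index_file` is a hypothetical helper name.

```shell
#!/bin/bash
# write_index_file (hypothetical helper): write one "pool ID<TAB>index" line per
# pool, matching the table above. printf reuses its format string for each
# pair of arguments, so four pairs give four lines.
write_index_file() {
  printf '%s\t%s\n' \
    P057 ACTTGA \
    P058 GATCAG \
    P059 GGCTAC \
    P060 CTTGTA > "$1"
}

# Example, from 13seq:
# write_index_file logs/index-13seq
```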
- Create a names file for each pool (e.g. logs/names-57), with each sample name separated by a tab from the barcode assigned to that sample
- Create a barcodes file (logs/barcodes) for process_radtags
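If every pool uses the same set of inline barcodes (an assumption worth checking against each names file), the barcodes file read by `process_radtags -b ../logs/barcodes` could be cut from one names file instead of typed by hand. A sketch; `barcodes_from_names` is a hypothetical helper, and it relies on process_radtags accepting a one-barcode-per-line file.

```shell
#!/bin/bash
# barcodes_from_names (hypothetical helper): keep only the barcode column
# (column 2) of a tab-separated "sample<TAB>barcode" names file, giving one
# barcode per line for process_radtags -b.
barcodes_from_names() {
  cut -f2 "$1" > "$2"
}

# Example, from 13seq (assumes all pools share the same inline barcodes):
# barcodes_from_names logs/names-57 logs/barcodes
```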
- Run barcode_splitter.py with nohup from each lane directory to split the reads by pool index
- 1lane $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index-13seq --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_1_read_2_index_read_passed_filter.fastq.gz
- 2lane $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index-13seq --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_2_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13/clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx_2_read_2_index_read_passed_filter.fastq.gz
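The two lane commands can also be launched together, each in its own lane directory. A sketch only: `split_lanes` is a hypothetical helper, the paths and flags are copied from the commands above, and the `SPLITTER` override variable is an illustrative addition, not part of the original setup.

```shell
#!/bin/bash
# split_lanes (hypothetical helper): run barcode_splitter.py for both lanes at
# once, each in its own lane directory, and wait for both to finish.
# SPLITTER (hypothetical) may be set to override the program path.
split_lanes() {
  local splitter=${SPLITTER:-$HOME/14_programs/paired_sequence_utils/barcode_splitter.py}
  local seqdir=/local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ13
  local prefix=clownfish-ddradseq-seq13-for-158-cycles-hjylybcxx
  local lane
  for lane in 1 2; do
    ( cd "${lane}lane" &&
      nohup "$splitter" --bcfile ../../logs/index-13seq --idxread 2 \
        --suffix .fastq.gz \
        "$seqdir/${prefix}_${lane}_read_1_passed_filter.fastq.gz" \
        "$seqdir/${prefix}_${lane}_read_2_index_read_passed_filter.fastq.gz" \
        > nohup.out 2>&1 ) &
  done
  wait   # return only after both lanes are split
}

# Example, from 13seq/bcsplit:
# split_lanes
```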
February 15, 2016
- Concatenate the two lanes into one file per pool for process_radtags
- bcsplit $ cat 2lane/P057-read-1.fastq.gz 1lane/P057-read-1.fastq.gz > ../pool1/P057.fastq.gz
- Repeat for P058, P059, and P060 (pools 2-4)
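The per-pool concatenation can be looped. A sketch, assuming the 1lane/2lane directory names created earlier and the `P0NN-read-1.fastq.gz` output names used in the P057 command; `merge_lanes` is a hypothetical helper.

```shell
#!/bin/bash
# merge_lanes (hypothetical helper): for pools 1-4 (P057-P060), concatenate the
# lane 2 and lane 1 read-1 files, in the same order as the P057 command, into
# ../poolN/P0NN.fastq.gz. Concatenated gzip files remain valid gzip input.
merge_lanes() {
  local i id
  for i in 1 2 3 4; do
    id="P0$((56 + i))"
    cat "2lane/${id}-read-1.fastq.gz" "1lane/${id}-read-1.fastq.gz" \
      > "../pool${i}/${id}.fastq.gz"
  done
}

# Example, from 13seq/bcsplit:
# merge_lanes
```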
- Adjust the process_radtags script for each pool with nano, then run it
- Template script (53process.sh from the 12seq run): /local/home/michelles/02-apcl-ddocent/12seq/scripts
- michelles 2016-02-13 09:22:20 scripts $ nano 53process.sh
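#!/bin/bash
# Demultiplex by inline barcode, drop reads with uncalled bases (-c) or low quality (-q),
# check the PstI/MluCI cut sites, and filter reads containing the adapter.
process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \
  -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  -f ../Pool1/P053.fastq.gz -o ./
# The script runs from the pool directory, so the log is kept in ../logs and mailed from there.
mv process_radtags.log ../logs/53process.out
cat ../logs/53process.out | mail -s "process 53 complete" michelle.stuart@rutgers.edu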
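Instead of editing a copy in nano for each pool, the four scripts could be generated from the template with sed. A sketch under two assumptions: "53" appears in the template only in pool-specific places (true of the listing above), and pool N's merged reads sit in the lowercase `../poolN/` directories. `make_pool_scripts` is a hypothetical helper.

```shell
#!/bin/bash
# make_pool_scripts (hypothetical helper): write 57process.sh-60process.sh from
# the 53process.sh template. Assumes "53" occurs only in pool-specific spots
# (P053, 53process, "process 53") and that pool N's input is in ../poolN/.
make_pool_scripts() {
  local template=$1 i n
  for i in 1 2 3 4; do
    n=$((56 + i))
    sed -e "s/53/${n}/g" -e "s#/Pool1/#/pool${i}/#" "$template" > "${n}process.sh"
    chmod +x "${n}process.sh"
  done
}

# Example, from 13seq/scripts:
# make_pool_scripts ~/02-apcl-ddocent/12seq/scripts/53process.sh
```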
- michelles 2016-02-15 09:48:52 pool1 $ nohup ../scripts/57process.sh - finished at 11:17am
- michelles 2016-02-15 09:49:24 pool2 $ nohup ../scripts/58process.sh - finished at 11:07am
- michelles 2016-02-15 09:49:46 pool3 $ nohup ../scripts/59process.sh - finished at 10:57am
- michelles 2016-02-15 09:50:06 pool4 $ nohup ../scripts/60process.sh - finished at 11:13am
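The four runs above were started by hand in separate pool directories; a loop can start them together. A sketch; `run_pools` is a hypothetical helper, and it assumes the per-pool scripts are executable, as the nohup commands above require.

```shell
#!/bin/bash
# run_pools (hypothetical helper): start each pool's script from inside its
# pool directory in the background, as in the four nohup commands above, and
# return once all four have finished.
run_pools() {
  local i
  for i in 1 2 3 4; do
    ( cd "pool${i}" && nohup "../scripts/$((56 + i))process.sh" > nohup.out 2>&1 ) &
  done
  wait
}

# Example, from 13seq:
# run_pools
```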
- Rename the samples for dDocent
- michelles 2016-02-15 10:58:47 pool3 $ sh rename.for.dDocent_se_gz ../logs/names-59
- michelles 2016-02-15 11:18:35 pool2 $ sh rename.for.dDocent_se_gz ../logs/names-58
- michelles 2016-02-15 11:20:22 pool4 $ sh rename.for.dDocent_se_gz ../logs/names-60
- michelles 2016-02-15 11:20:59 pool1 $ sh rename.for.dDocent_se_gz ../logs/names-57
- Move the renamed samples into the samples directory (or, if working in dDocent, into the dDocent working directory)
- michelles 2016-02-15 10:59:59 pool3 $ mv A* ../samples/
- michelles 2016-02-15 11:18:59 pool2 $ mv A* ../samples/
- michelles 2016-02-15 11:20:32 pool4 $ mv A* ../samples/
- michelles 2016-02-15 11:21:04 pool1 $ mv A* ../samples/
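The rename and move steps repeat the same pattern for each pool. A sketch that loops them in order, assuming (as in the commands above) that `rename.for.dDocent_se_gz` sits in each pool directory and the renamed files start with "A"; `rename_and_collect` is a hypothetical helper.

```shell
#!/bin/bash
# rename_and_collect (hypothetical helper): in each pool directory, run the
# rename script with that pool's names file, then move the renamed A* files
# into ../samples. Stops at the first pool where either step fails.
rename_and_collect() {
  local i
  for i in 1 2 3 4; do
    ( cd "pool${i}" &&
      sh rename.for.dDocent_se_gz "../logs/names-$((56 + i))" &&
      mv A* ../samples/ ) || return 1
  done
}

# Example, from 13seq:
# rename_and_collect
```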
- In a folder containing only the new samples (*.F.fq.gz), copy in reference.fasta from the original dataset
- michelles 2016-02-15 11:21:42 samples $ cp ~/02-apcl-ddocent/jonsfiles/reference.fasta ./
- Run dDocent from your working directory:
$ dDocent
Variables used in dDocent Run at Mon Feb 15 11:22:55 EST 2016
Number of Processors
20
Maximum Memory
0
Trimming
yes
Assembly?
no
Type_of_Assembly
Clustering_Similarity%
Mapping_Reads?
yes
Mapping_Match_Value
1
Mapping_MisMatch_Value
4
Mapping_GapOpen_Penalty
6
Calling_SNPs?
no
Email
michelle.stuart@rutgers.edu
dDocent finished at 3:28 pm
When dDocent has finished, copy the *.F.fq.gz, *.R1.fq.gz, *-RG.bam, and *-RG.bam.bai files to the main analysis folder (apparently symlinking doesn’t work for the -RG.bam files).
$ ln -s APCL_1* ~/02-apcl-ddocent/APCL_analysis/15-02-2016/
When I did this, I noticed that dDocent had not finished for seq12, so I re-ran seq12.
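Since symlinks reportedly fail for the -RG.bam files, a plain copy of all four file types avoids the problem. A sketch; `copy_results` is a hypothetical helper, and the destination is the analysis folder used in the ln command above.

```shell
#!/bin/bash
# copy_results (hypothetical helper): copy (not symlink) the dDocent outputs
# named above from the current directory into the analysis folder given as $1.
copy_results() {
  cp -- *.F.fq.gz *.R1.fq.gz *-RG.bam *-RG.bam.bai "$1"/
}

# Example, from the dDocent working directory:
# copy_results ~/02-apcl-ddocent/APCL_analysis/15-02-2016
```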
June 30, 2016
michelles 2016-06-30 12:04:29 13seq $ cp ~/13-stacks_analysis_scripts/readprocesslog.py ./scripts/
michelles 2016-06-30 12:04:47 13seq $ ~/13-stacks_analysis_scripts/readprocesslog.py
Enter the path and file name of the log, i.e. ./logs/16process.out: ./logs/57process.out
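As a quick cross-check on readprocesslog.py, or to avoid typing each path at its prompt, the total and retained counts can be pulled from all four logs at once. A sketch only: it assumes the logs contain tab-separated "Total Sequences" and "Retained Reads" lines (the Stacks 1.x layout; check one log first), and `summarize_logs` is a hypothetical helper.

```shell
#!/bin/bash
# summarize_logs (hypothetical helper): print "log<TAB>total<TAB>retained" for
# each process_radtags log given. Assumes lines of the form
# "Total Sequences<TAB>N" and "Retained Reads<TAB>N" in the log.
summarize_logs() {
  local f
  for f in "$@"; do
    awk -F'\t' -v name="$(basename "$f")" '
      $1 == "Total Sequences" { total = $2 }
      $1 == "Retained Reads"  { kept = $2 }
      END { printf "%s\t%s\t%s\n", name, total, kept }
    ' "$f"
  done
}

# Example, from 13seq:
# summarize_logs logs/57process.out logs/58process.out logs/59process.out logs/60process.out
```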