barcode splitter

barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
cd ../lane2/
barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_2_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_2_Read_2_Index_Read_passed_filter.fastq
top
cd ../../../
cd 16seq/bcsplit/lane1/
barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
barcode_splitter.py --bcfile ../../logs/index-seq16 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
cd ..
cd lane2/
barcode_splitter.py --bcfile ../../logs/index-seq16 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_2_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_2_Read_2_Index_Read_passed_filter.fastq.gz &

There was no nohup.out file because nohup isn’t following my PATH for some reason.
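One possible workaround, assuming the problem is only that nohup cannot find the script by name: look up the program's full path first and hand that to nohup. This is a sketch, not something that was run for these notes. The function name run_detached and the log file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumption: nohup fails only because it cannot resolve the program name).
# run_detached is a hypothetical helper, not part of barcode_splitter or nohup.
run_detached() {
    local prog="$1"
    shift
    local prog_path
    # Resolve the program through the current shell's PATH before calling nohup.
    prog_path=$(command -v "$prog") || { echo "not on PATH: $prog" >&2; return 127; }
    # Write output to an explicit log file instead of relying on nohup.out.
    nohup "$prog_path" "$@" > "${prog##*/}.nohup.out" 2>&1 &
}
```

Usage would look like: run_detached barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz READ1 INDEXREAD, where READ1 and INDEXREAD stand for the two input files.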

The output of barcode splitter is a log file, a read-1 and a read-2 fastq.gz file for each pool (named after the pool), and files of unnamed reads that the program was unable to assign to an index.  The read-2 and unnamed read files can be deleted.
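A cautious way to do that cleanup is sketched below. The file name patterns (*-read-2.fastq.gz and unmatched-read-1.fastq.gz) are assumptions about how the output is named, so check them against the actual directory listing first. The function clean_bcsplit is hypothetical and only lists files unless told to delete.

```shell
#!/usr/bin/env bash
# Sketch: remove read-2 and unassigned-read files from a barcode splitter output folder.
# Assumption: outputs are named <pool>-read-2.fastq.gz and unmatched-read-N.fastq.gz.
clean_bcsplit() {
    local dir="$1"
    local mode="${2:-dry-run}"   # pass "delete" as second argument to actually remove
    local f
    for f in "$dir"/*-read-2.fastq.gz "$dir"/unmatched-read-1.fastq.gz; do
        [ -e "$f" ] || continue  # pattern matched nothing
        if [ "$mode" = "delete" ]; then
            rm -- "$f"
        else
            echo "would remove: $f"
        fi
    done
}
```

Run clean_bcsplit lane1 first to see what would go, then clean_bcsplit lane1 delete once the list looks right.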

Cat the 2 lanes into one file for process radtags
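The concatenation can be sketched as below, assuming each pool has a read-1 file with the same name in both lane folders. Gzip files can be joined with plain cat without decompressing. The function name merge_lanes, the output folder and the *-read-1.fastq.gz pattern are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: join the read-1 file of each pool from lane1 and lane2 into one file.
# Assumption: both lanes use identical file names ending in -read-1.fastq.gz.
merge_lanes() {
    local lane1_dir="$1"
    local lane2_dir="$2"
    local out_dir="$3"
    local f name
    mkdir -p "$out_dir"
    for f in "$lane1_dir"/*-read-1.fastq.gz; do
        [ -e "$f" ] || continue          # no read-1 files found
        name=$(basename "$f")
        if [ -e "$lane2_dir/$name" ]; then
            # Concatenated gzip members still decompress as one stream.
            cat "$f" "$lane2_dir/$name" > "$out_dir/$name"
        else
            echo "warning: $name missing from $lane2_dir, skipping" >&2
        fi
    done
}
```

For example, from the bcsplit folder: merge_lanes lane1 lane2 combined.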


Move the process logs into the logs folder

Analyze the sequencing statistics to make sure everything looks like it is on the right track: use the readprocesslog.py script

Rename the process radtags output to sample names
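One way to do the renaming is from a two-column lookup file (current file name, then new file name, separated by whitespace). This is a sketch only: the lookup file, the function name rename_samples and any file names used with it are hypothetical, not the names used in this project.

```shell
#!/usr/bin/env bash
# Sketch: rename files in a folder according to a two-column lookup file.
# Assumption: file names contain no spaces; column 1 = old name, column 2 = new name.
rename_samples() {
    local map_file="$1"
    local dir="$2"
    local old new
    while read -r old new || [ -n "$old" ]; do
        case "$old" in
            ''|'#'*) continue ;;         # skip blank lines and comments
        esac
        if [ ! -e "$dir/$old" ]; then
            echo "missing: $old" >&2
            continue
        fi
        if [ -e "$dir/$new" ]; then
            echo "refusing to overwrite: $new" >&2
            continue
        fi
        mv -- "$dir/$old" "$dir/$new"
    done < "$map_file"
}
```

Called as rename_samples names.txt samples, it skips anything missing or already present rather than overwriting.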

Trim and map the reads
michelles 2016-07-04 15:39:11 samples $ dDocent
dDocent 2.18

Contact jpuritz@gmail.com with any problems

 
Checking for required software

All required software is installed!
192 individuals are detected. Is this correct? Enter yes or no and press [ENTER]
yes
Proceeding with 192 individuals
dDocent detects 40 processors available on this system.
Please enter the maximum number of processors to use for this analysis.
0
Incorrect. Please enter the number of processing cores on this computer
15
dDocent detects 252G maximum memory available on this system.
Please enter the maximum memory to use for this analysis. The size can be postfixed with
K, M, G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000 respectively.
For example, to limit dDocent to ten gigabytes, enter 10G or 10g
0

Do you want to quality trim your reads?
Type yes or no and press [ENTER]?
yes

Do you want to perform an assembly?
Type yes or no and press [ENTER]?
no

Reference contigs need to be in a file named reference.fasta

Do you want to map reads?  Type yes or no and press [ENTER]
yes
BWA will be used to map reads.  You may need to adjust -A -B and -O parameters for your taxa.
Would you like to enter a new parameters now? Type yes or no and press [ENTER]
yes
Please enter new value for A (match score).  It should be an integer.  Default is 1.
1
Please enter new value for B (mismatch score).  It should be an integer.  Default is 4.
4
Please enter new value for O (gap penalty).  It should be an integer.  Default is 6.
6
Do you want to use FreeBayes to call SNPs?  Please type yes or no and press [ENTER]
no

Please enter your email address.  dDocent will email you when it is finished running.
Don't worry; dDocent has no financial need to sell your email address to spammers.
michelle.stuart@rutgers.edu


At this point, all configuration information has been entered and dDocent may take several hours to run.
It is recommended that you move this script to a background operation and disable terminal input and output.
All data and logfiles will still be recorded.
To do this:
Press control and Z simultaneously
Type 'bg' without the quotes and press enter
Type 'disown -h' again without the quotes and press enter

Now sit back, relax, and wait for your analysis to finish.
Removing the _1 character and replacing with /1 in the name of every sequence
^Z
[1]+  Stopped                 dDocent
michelles 2016-07-04 15:40:01 samples $ bg
[1]+ dDocent &
michelles 2016-07-04 15:40:03 samples $ disown -h
michelles 2016-07-04 15:40:06 samples $

tar -zcvf seq15.tar.gz ../15seq/

Next, scp the archive to ELF:
scp -r /local/home/michelles/02-apcl-ddocent/compressed_dDocent_input/seq15.tar.gz mrs349@elf.rdi2.rutgers.edu:/project1/mlp195-001/compressed_dDocent_input/

dDocent on Amphiprion seq03-16
dDocent on ELF seq03-16



Before adding seq15 to the main data analysis, the reads have to be trimmed and mapped.  Going to do this on both Amphiprion and ELF to see how it goes.

Copy reference.fasta over from the seq12 samples folder:
/local/home/michelles/02-apcl-ddocent/12seq/samples
michelles 2016-06-28 08:14:32 samples $ cp reference.fasta ../../15seq/samples/

In the samples folder (for seq15)
michelles 2016-06-28 08:15:25 samples $ dDocent
dDocent 2.18

Contact jpuritz@gmail.com with any problems

 
Checking for required software

All required software is installed!
192 individuals are detected. Is this correct? Enter yes or no and press [ENTER]
yes
Proceeding with 192 individuals
dDocent detects 40 processors available on this system.
Please enter the maximum number of processors to use for this analysis.
20
dDocent detects 252G maximum memory available on this system.
Please enter the maximum memory to use for this analysis. The size can be postfixed with
K, M, G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000 respectively.
For example, to limit dDocent to ten gigabytes, enter 10G or 10g
0

Do you want to quality trim your reads?
Type yes or no and press [ENTER]?
yes

Do you want to perform an assembly?
Type yes or no and press [ENTER]?
no

Reference contigs need to be in a file named reference.fasta

Do you want to map reads?  Type yes or no and press [ENTER]
yes
BWA will be used to map reads.  You may need to adjust -A -B and -O parameters for your taxa.
Would you like to enter a new parameters now? Type yes or no and press [ENTER]
yes
Please enter new value for A (match score).  It should be an integer.  Default is 1.
1
Please enter new value for B (mismatch score).  It should be an integer.  Default is 4.
4
Please enter new value for O (gap penalty).  It should be an integer.  Default is 6.
6
Do you want to use FreeBayes to call SNPs?  Please type yes or no and press [ENTER]
no

Please enter your email address.  dDocent will email you when it is finished running.
Don't worry; dDocent has no financial need to sell your email address to spammers.
michelle.stuart@rutgers.edu


At this point, all configuration information has been entered and dDocent may take several hours to run.
It is recommended that you move this script to a background operation and disable terminal input and output.
All data and logfiles will still be recorded.
To do this:
Press control and Z simultaneously
Type 'bg' without the quotes and press enter
Type 'disown -h' again without the quotes and press enter

Now sit back, relax, and wait for your analysis to finish.
Removing the _1 character and replacing with /1 in the name of every sequence
^Z
[1]+  Stopped                 dDocent
michelles 2016-06-28 08:16:46 samples $ bg
[1]+ dDocent &
michelles 2016-06-28 08:16:50 samples $ disown -h
Finished at 8:52 - took ~35 minutes


Get seq read count numbers

cp ~/13-stacks_analysis_scripts/readprocesslog.py ./scripts/ 
./scripts/readprocesslog.py 
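As a cross-check on the numbers from the script, reads can also be counted directly from a fastq.gz file, since each read takes four lines. This is a sketch that assumes standard four-line FASTQ records. The function name count_reads is made up.

```shell
#!/usr/bin/env bash
# Sketch: count reads in a gzipped FASTQ file (assumes 4 lines per read).
count_reads() {
    local f="$1"
    local lines
    lines=$(gzip -dc "$f" | wc -l)
    echo $(( lines / 4 ))
}
```

For example, count_reads FILE.fq.gz prints the number of reads in that file.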

dDocent on Amphiprion June 28, 2016