SEQ15

Create a directory for the files from each sequencing run
- michelles 2016-06-22 12:32:41 sequencing $ mkdir hiseq_2016_04_21_SEQ15
  michelles 2016-06-22 12:36:23 sequencing $ mkdir hiseq_2016_06_01_SEQ16
Receive files from sequencer
- https://htseq.princeton.edu/cgi-bin/login.pl?redirect_url=https://htseq.princeton.edu/cgi-bin/dashboard.pl
- Click on the sequencing run of interest in the box on the left that says “Recently Entered Samples”
- In the box titled Sample Provenance, click on the link following “This library was utilized within the following output(s):” - repeat for each lane
- In the “Data and Statistics” box, in the bottom right corner is a green button that says “Batch Download Data Files”
- Click checkmarks next to the #_read_1_passed_filter.fastq.gz and #_read_2_passed_filter.fastq.gz
- Click “Prepare selected files for download” and copy the link
- On amphiprion, in the directory you made in the first step, run wget -r on the copied link (you can open 4 windows and download all 4 links at once)
  - wget -r https://htseq.princeton.edu/tmp/zbl9rsZMYMw1YCtNO/
  - wget -r https://htseq.princeton.edu/tmp/njec3iwfxtn9q69hl/
  - wget -r https://htseq.princeton.edu/tmp/nWDCkEDOoZ8dDJ2gX
  - wget -r https://htseq.princeton.edu/tmp/urH68Kc0rAPLlZWAv/
- Repeat for all lanes
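As an alternative to opening four windows, the downloads can be looped over a list of links. This is only a sketch: the function name fetch_links, the DRY_RUN switch, and the links file are assumptions made for illustration, not part of the original workflow. The only real command it runs is the same wget -r used above.

```shell
# fetch_links FILE: run "wget -r" on every non-blank line of FILE.
# FILE is a hypothetical text file with one prepared download link per line.
# Set DRY_RUN=1 to print the commands instead of running them.
fetch_links() {
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue          # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "wget -r $url"
        else
            wget -r "$url"
        fi
    done < "$1"
}
```

Usage, from inside the run directory: paste the links into a file such as links.txt, check with DRY_RUN=1 fetch_links links.txt, then run fetch_links links.txt.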
Record where the files are saved on amphiprion in the sample_data file (Sequencing sheet, amphiprion folder column)
Make a working directory for each run, with separate pool directories to keep the process_radtags output separate
- michelles 2016-06-22 12:47:19 02-apcl-ddocent $ mkdir 16seq 15seq
- michelles 2016-06-22 12:47:35 02-apcl-ddocent $ cd 15seq/
  michelles 2016-06-22 12:48:07 15seq $ mkdir bcsplit Pool1 Pool2 Pool3 Pool4
  michelles 2016-06-22 12:48:22 15seq $ cd ../16seq/
  michelles 2016-06-22 12:48:35 16seq $ mkdir bcsplit Pool1 Pool2 Pool3 Pool4
- michelles 2016-06-22 12:48:36 16seq $ cd bcsplit/
  michelles 2016-06-22 12:48:56 bcsplit $ mkdir lane1 lane2
  michelles 2016-06-22 12:49:10 bcsplit $ cd ../../15seq/bcsplit/
  michelles 2016-06-22 12:49:19 bcsplit $ mkdir lane1 lane2
- michelles 2016-06-22 12:49:42 15seq $ mkdir logs
  michelles 2016-06-22 12:49:47 15seq $ mkdir ../16seq/logs
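The same layout can be built in one pass. A minimal sketch, assuming it is run from the parent working directory (02-apcl-ddocent in the session above) and that both runs use four pools and two lanes, as they do here:

```shell
# Build bcsplit/lane1, bcsplit/lane2, Pool1-Pool4 and logs for each run.
# mkdir -p creates parent directories as needed and does not fail
# if a directory already exists.
for run in 15seq 16seq; do
    mkdir -p "$run"/bcsplit/lane1 "$run"/bcsplit/lane2 \
             "$run"/Pool1 "$run"/Pool2 "$run"/Pool3 "$run"/Pool4 \
             "$run"/logs
done
```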
In your logs directory, create an index file that lists each pool name and the index used on that pool, separated by a tab. The easiest way is to copy and paste from Google Sheets into nano: in the sample_data file, on the Names tab, type the pool numbers into the Pool ID column in the format below, and the spreadsheet will look up the proper indexes for you. Then copy the result into a blank nano document and save it as index-seq## (for example, index-seq15):
- P012 ATCACG
- P013 TGACCA
- P014 CAGATC
- P015 TAGCTT
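Pasting from a spreadsheet can turn tabs into spaces, so it may be worth checking the file. The sketch below writes the four pool/index pairs listed above with printf and then checks that every line has exactly two tab-separated fields. Writing the file with printf instead of nano is an alternative, not the method used in these notes. Run it in the logs directory.

```shell
# Write the index file: pool name, a tab, then the index sequence.
# printf reuses the format string for each pair of arguments.
printf '%s\t%s\n' \
    P012 ATCACG \
    P013 TGACCA \
    P014 CAGATC \
    P015 TAGCTT > index-seq15

# Check: report any line that does not have exactly two tab-separated fields.
# Exits non-zero if a bad line is found.
awk -F'\t' 'NF != 2 { print "bad line " NR ": " $0; bad = 1 } END { exit bad }' index-seq15
```

The awk check works equally well on a file saved from nano.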
Create a names file that lists each sample name and the barcode assigned to that sample, separated by a tab. The easiest way is to copy and paste from Google Sheets: copy the ligation IDs from the pool, paste them into the Names tab, then copy the result into a nano document in the logs directory.
Create a barcodes file in your logs directory: in the sample_data file, highlight only the barcodes column on the barcodes sheet and paste it into nano. Do not press Enter after the final barcode. Save as “barcodes”.
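Whether the file really ends without a newline is hard to see in an editor. A small check, sketched here with a hypothetical helper name (ends_with_newline) and made-up placeholder barcodes rather than the real ones:

```shell
# ends_with_newline FILE: succeed if FILE is non-empty and its last byte is a newline.
# tail -c1 prints the last byte; command substitution strips a trailing newline,
# so an empty result means the last byte was a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c1 "$1")" ]
}

# Placeholder example file: three made-up barcodes, no newline after the last one.
printf 'AAAAAA\nCCCCCC\nGGGGGG' > barcodes.example

if ends_with_newline barcodes.example; then
    echo "barcodes.example ends with a newline"
else
    echo "barcodes.example has no trailing newline"
fi
```

Running ends_with_newline barcodes in the logs directory applies the same check to the real file.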

Split each lane by pool index with barcode_splitter.py, run from that lane's bcsplit directory (starting in 15seq/bcsplit/lane1)

barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &

cd ../lane2/

barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_2_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_2_Read_2_Index_Read_passed_filter.fastq

top
cd ../../../
cd 16seq/bcsplit/lane1/
# the next command points at index-seq15, apparently by mistake; it is repeated right after with index-seq16
barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
barcode_splitter.py --bcfile ../../logs/index-seq16 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
cd ..
cd lane2/

barcode_splitter.py --bcfile ../../logs/index-seq16 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_2_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_2_Read_2_Index_Read_passed_filter.fastq.gz &
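The four commands above differ only in run directory, file prefix, lane number, and index file, so long paths are easy to mistype. A sketch of a helper that assembles the command from those four pieces and prints it for review. The function name split_lane is made up for illustration. The barcode_splitter.py flags are copied from the commands above, not checked against its documentation.

```shell
# split_lane SEQ_DIR PREFIX LANE INDEX_FILE
# Prints the barcode_splitter.py command for one lane so it can be checked
# before running. Nothing is executed here.
split_lane() {
    seq_dir=$1; prefix=$2; lane=$3; index_file=$4
    echo barcode_splitter.py --bcfile "$index_file" --idxread 2 --suffix .fastq.gz \
        "$seq_dir/${prefix}_${lane}_Read_1_passed_filter.fastq.gz" \
        "$seq_dir/${prefix}_${lane}_Read_2_Index_Read_passed_filter.fastq.gz"
}

# Example: the SEQ15 lane 1 command, as it would be run from 15seq/bcsplit/lane1.
split_lane /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15 \
    Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX 1 ../../logs/index-seq15
```

Once the printed command looks right, paste it into the lane directory (adding & to background it, as above).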

michelles 2016-07-04 15:39:11 samples $ dDocent
dDocent 2.18

Contact jpuritz@gmail.com with any problems

Checking for required software

All required software is installed!
192 individuals are detected. Is this correct? Enter yes or no and press [ENTER]
yes
Proceeding with 192 individuals
dDocent detects 40 processors available on this system.
Please enter the maximum number of processors to use for this analysis.
0
Incorrect. Please enter the number of processing cores on this computer
15
dDocent detects 252G maximum memory available on this system.
Please enter the maximum memory to use for this analysis. The size can be postfixed with
K, M, G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000 respectively.
For example, to limit dDocent to ten gigabytes, enter 10G or 10g
0

Do you want to quality trim your reads?
Type yes or no and press [ENTER]?
yes

Do you want to perform an assembly?
Type yes or no and press [ENTER]?
no

Reference contigs need to be in a file named reference.fasta

Do you want to map reads? Type yes or no and press [ENTER]
yes
BWA will be used to map reads. You may need to adjust -A -B and -O parameters for your taxa.
Would you like to enter a new parameters now? Type yes or no and press [ENTER]
yes
Please enter new value for A (match score). It should be an integer. Default is 1.
1
Please enter new value for B (mismatch score). It should be an integer. Default is 4.
4
Please enter new value for O (gap penalty). It should be an integer. Default is 6.
6
Do you want to use FreeBayes to call SNPs? Please type yes or no and press [ENTER]
no

Please enter your email address. dDocent will email you when it is finished running.
Don't worry; dDocent has no financial need to sell your email address to spammers.
michelle.stuart@rutgers.edu

At this point, all configuration information has been entered and dDocent may take several hours to run.
It is recommended that you move this script to a background operation and disable terminal input and output.
All data and logfiles will still be recorded.
To do this:
Press control and Z simultaneously
Type 'bg' without the quotes and press enter
Type 'disown -h' again without the quotes and press enter

Now sit back, relax, and wait for your analysis to finish.
Removing the _1 character and replacing with /1 in the name of every sequence
^Z
[1]+ Stopped dDocent
michelles 2016-07-04 15:40:01 samples $ bg
[1]+ dDocent &
michelles 2016-07-04 15:40:03 samples $ disown -h

michelles 2016-06-28 08:15:25 samples $ dDocent
dDocent 2.18

Contact jpuritz@gmail.com with any problems

Checking for required software

All required software is installed!
192 individuals are detected. Is this correct? Enter yes or no and press [ENTER]
yes
Proceeding with 192 individuals
dDocent detects 40 processors available on this system.
Please enter the maximum number of processors to use for this analysis.
20
dDocent detects 252G maximum memory available on this system.
Please enter the maximum memory to use for this analysis. The size can be postfixed with
K, M, G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000 respectively.
For example, to limit dDocent to ten gigabytes, enter 10G or 10g
0

Do you want to quality trim your reads?
Type yes or no and press [ENTER]?
yes

Do you want to perform an assembly?
Type yes or no and press [ENTER]?
no

Reference contigs need to be in a file named reference.fasta

Do you want to map reads? Type yes or no and press [ENTER]
yes
BWA will be used to map reads. You may need to adjust -A -B and -O parameters for your taxa.
Would you like to enter a new parameters now? Type yes or no and press [ENTER]
yes
Please enter new value for A (match score). It should be an integer. Default is 1.
1
Please enter new value for B (mismatch score). It should be an integer. Default is 4.
4
Please enter new value for O (gap penalty). It should be an integer. Default is 6.
6
Do you want to use FreeBayes to call SNPs? Please type yes or no and press [ENTER]
no

Please enter your email address. dDocent will email you when it is finished running.
Don't worry; dDocent has no financial need to sell your email address to spammers.
michelle.stuart@rutgers.edu

At this point, all configuration information has been entered and dDocent may take several hours to run.
It is recommended that you move this script to a background operation and disable terminal input and output.
All data and logfiles will still be recorded.
To do this:
Press control and Z simultaneously
Type 'bg' without the quotes and press enter
Type 'disown -h' again without the quotes and press enter

Now sit back, relax, and wait for your analysis to finish.
Removing the _1 character and replacing with /1 in the name of every sequence
^Z
[1]+ Stopped dDocent
michelles 2016-06-28 08:16:46 samples $ bg
[1]+ dDocent &