- Create a directory for the files from each sequencing run
- michelles 2016-06-22 12:32:41 sequencing $ mkdir hiseq_2016_04_21_SEQ15
michelles 2016-06-22 12:36:23 sequencing $ mkdir hiseq_2016_06_01_SEQ16
- Receive files from sequencer
- https://htseq.princeton.edu/cgi-bin/login.pl?redirect_url=https://htseq.princeton.edu/cgi-bin/dashboard.pl
- Click on the sequencing run of interest in the box on the left that says “Recently Entered Samples”
- In the box titled Sample Provenance, click on the link following “This library was utilized within the following output(s):” and repeat for each lane
- In the bottom right corner of the “Data and Statistics” box is a green button that says “Batch Download Data Files”; click it
- Check the boxes next to #_read_1_passed_filter.fastq.gz and #_read_2_passed_filter.fastq.gz
- Click “Prepare selected files for download” and copy the link
- On amphiprion, in the directory you made in the first step, paste the link (you can open 4 windows and run all 4 downloads at once)
- Repeat for all lanes
- Record where the files are saved on amphiprion in the sample_data file (Sequencing sheet, amphiprion folder column)
- Make a working directory for each run, with a separate directory per pool so the process_radtags output for each pool stays separate
- michelles 2016-06-22 12:47:19 02-apcl-ddocent $ mkdir 16seq 15seq
- michelles 2016-06-22 12:47:35 02-apcl-ddocent $ cd 15seq/
michelles 2016-06-22 12:48:07 15seq $ mkdir bcsplit Pool1 Pool2 Pool3 Pool4
michelles 2016-06-22 12:48:22 15seq $ cd ../16seq/
michelles 2016-06-22 12:48:35 16seq $ mkdir bcsplit Pool1 Pool2 Pool3 Pool4
- michelles 2016-06-22 12:48:36 16seq $ cd bcsplit/
michelles 2016-06-22 12:48:56 bcsplit $ mkdir lane1 lane2
michelles 2016-06-22 12:49:10 bcsplit $ cd ../../15seq/bcsplit/
michelles 2016-06-22 12:49:19 bcsplit $ mkdir lane1 lane2
- michelles 2016-06-22 12:49:42 15seq $ mkdir logs
michelles 2016-06-22 12:49:47 15seq $ mkdir ../16seq/logs
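The mkdir steps above could also be scripted so that every run gets the same layout. This is only a sketch in Python: the function name make_run_dirs and its parameters are my own, and it assumes four pools and two lanes per run, as in the commands above.

```python
from pathlib import Path


def make_run_dirs(base, run_name, n_pools=4, n_lanes=2):
    """Create bcsplit/laneN, PoolN and logs under base/run_name.

    Hypothetical helper: the directory names mirror the mkdir commands in
    these notes; the pool and lane counts are assumptions for this run.
    """
    run_dir = Path(base) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    for lane in range(1, n_lanes + 1):
        (run_dir / "bcsplit" / f"lane{lane}").mkdir(parents=True, exist_ok=True)
    for pool in range(1, n_pools + 1):
        (run_dir / f"Pool{pool}").mkdir(exist_ok=True)
    (run_dir / "logs").mkdir(exist_ok=True)
    return run_dir
```

Calling make_run_dirs(".", "15seq") and make_run_dirs(".", "16seq") from 02-apcl-ddocent should give the same tree as the commands above; existing directories are left alone.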
- In your logs directory, create an index file with one line per pool: the pool name, a tab, then the index used on that pool. The easiest way is to copy and paste from Google Sheets into nano. In the sample_data file, on the Names tab, type the pool numbers into the Pool ID column in the format below, and the spreadsheet will look up the proper indexes for you. Copy the result into a blank nano document and save it as index-seq## (for example index-seq15).
- P012 ATCACG
- P013 TGACCA
- P014 CAGATC
- P015 TAGCTT
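If the copy and paste from Google Sheets goes wrong, the index file could be written from a small script instead. A minimal sketch, assuming the pool names and indexes are typed in by hand: write_index_file is a hypothetical helper, not part of barcode_splitter, and the checks (A/C/G/T only, no index used twice) are my own sanity checks.

```python
def write_index_file(path, pool_to_index):
    """Write one 'POOL<TAB>INDEX' line per pool and return the lines written.

    Hypothetical helper; the validation rules are assumptions, not
    requirements documented by barcode_splitter.
    """
    lines = []
    seen = set()
    for pool, index in pool_to_index.items():
        index = index.strip().upper()
        if not index or set(index) - set("ACGT"):
            raise ValueError(f"{pool}: index {index!r} is not a plain A/C/G/T sequence")
        if index in seen:
            raise ValueError(f"{pool}: index {index} is already used by another pool")
        seen.add(index)
        lines.append(f"{pool}\t{index}")
    with open(path, "w", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

With the example pools above this would be write_index_file("logs/index-seq15", {"P012": "ATCACG", "P013": "TGACCA", "P014": "CAGATC", "P015": "TAGCTT"}).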
- Create a names file with one line per sample: the sample name, a tab, then the barcode assigned to that sample. The easiest way is again to copy and paste from Google Sheets: copy the ligation IDs from the pool, paste them into the Names tab, then copy the result into a nano document in the logs directory.
- Create a barcodes file in your logs directory: in the sample_data file, highlight only the barcodes column on the barcodes sheet and paste it into nano. Do not hit enter after the final barcode. Save as “barcodes”.
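The "no enter after the final barcode" rule is easy to break in nano, so here is a small sketch of writing and checking the barcodes file from Python. Both function names are hypothetical, and any barcodes passed in would come from the barcodes sheet; nothing here is taken from process_radtags itself.

```python
def write_barcodes(path, barcodes):
    """Write one barcode per line with no newline after the last one."""
    cleaned = [b.strip().upper() for b in barcodes if b.strip()]
    with open(path, "w", newline="\n") as handle:
        # "\n".join puts newlines only between barcodes, never after the last
        handle.write("\n".join(cleaned))
    return cleaned


def ends_with_newline(path):
    """Return True if the file ends with a line break (the case to avoid)."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.endswith((b"\n", b"\r"))
```

Running ends_with_newline("logs/barcodes") on a file saved from nano is a quick way to catch a stray final line break before starting process_radtags.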
Split each lane by index with barcode_splitter.py, run from inside each lane directory (starting in 15seq/bcsplit/lane1)
barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
cd ../lane2/
barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_2_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_04_21_SEQ15/Clownfish-ddRADseq-SEQ15-for-158-cycles-HMTNCBCXX_2_Read_2_Index_Read_passed_filter.fastq
top
cd ../../../
cd 16seq/bcsplit/lane1/
barcode_splitter.py --bcfile ../../logs/index-seq15 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
This first attempt on the SEQ16 data used index-seq15, apparently by mistake; the next command reruns it with index-seq16.
barcode_splitter.py --bcfile ../../logs/index-seq16 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_1_Read_2_Index_Read_passed_filter.fastq.gz &
cd ..
cd lane2/
barcode_splitter.py --bcfile ../../logs/index-seq16 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_2_Read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2016_06_01_SEQ16/Clownfish-ddRADseq-SEQ16-for-158-cycles-HT2T3BCXX_2_Read_2_Index_Read_passed_filter.fastq.gz &
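The barcode_splitter.py command lines above are long and differ only in run, lane and index file, which makes it easy to mix them up. A sketch of building them in Python follows. The flags are the ones used above; the function name is hypothetical, and the file naming pattern (prefix_lane_Read_1_passed_filter.fastq.gz and prefix_lane_Read_2_Index_Read_passed_filter.fastq.gz) is assumed from the files in this run.

```python
from pathlib import Path


def barcode_splitter_cmd(seq_dir, file_prefix, lane, index_file):
    """Return the barcode_splitter.py argument list for one lane.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    seq_dir = Path(seq_dir)
    read1 = seq_dir / f"{file_prefix}_{lane}_Read_1_passed_filter.fastq.gz"
    index_read = seq_dir / f"{file_prefix}_{lane}_Read_2_Index_Read_passed_filter.fastq.gz"
    return [
        "barcode_splitter.py",
        "--bcfile", str(index_file),
        "--idxread", "2",
        "--suffix", ".fastq.gz",
        str(read1),
        str(index_read),
    ]
```

The returned list can be printed with " ".join(cmd) for pasting into the terminal, or passed to subprocess.Popen(cmd, cwd=lane_dir) from the matching lane directory.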
No nohup.out was written for these runs: nohup does not seem to pick up my PATH, so the commands above were run with a plain & instead.
barcode_splitter.py writes a log file, a read-1 and a read-2 fastq.gz file for each pool, and files of unnamed reads that it could not assign to an index. The read-2 and unnamed read files can be deleted.
Concatenate the two lanes into one file per pool for process_radtags
- michelles 2016-06-22 23:47:10 bcsplit $ cat ./lane1/P061-read-1.fastq.gz ./lane2/P061-read-1.fastq.gz > ../Pool1/P061.fastq.gz
- michelles 2016-06-22 23:48:58 bcsplit $ cat ./lane1/P062-read-1.fastq.gz ./lane2/P062-read-1.fastq.gz > ../Pool2/P062.fastq.gz
- michelles 2016-06-22 23:49:35 bcsplit $ cat ./lane1/P063-read-1.fastq.gz ./lane2/P063-read-1.fastq.gz > ../Pool3/P063.fastq.gz
- michelles 2016-06-22 23:49:56 bcsplit $ cat ./lane1/P064-read-1.fastq.gz ./lane2/P064-read-1.fastq.gz > ../Pool4/P064.fastq.gz
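The cat step works on compressed files because a gzip file made of several gzip members joined end to end is still a valid gzip file. A minimal Python sketch of the same byte-level concatenation, with a hypothetical function name:

```python
import shutil
from pathlib import Path


def concat_lanes(lane_files, out_file):
    """Append the lane files byte for byte into out_file, like `cat a b > out`."""
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with open(out_file, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                # copy in chunks so large fastq.gz files are never held in memory
                shutil.copyfileobj(src, out)
    return out_file
```

For pool P061 this would be concat_lanes(["lane1/P061-read-1.fastq.gz", "lane2/P061-read-1.fastq.gz"], "../Pool1/P061.fastq.gz") from the bcsplit directory.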
Make the process scripts executable and run one per pool with nohup
- michelles 2016-07-02 09:15:38 scripts $ chmod u+x 61process.sh 62process.sh 63process.sh 64process.sh
- michelles 2016-07-02 09:15:48 scripts $ cd ..
- michelles 2016-07-02 09:15:53 15seq $ nohup scripts/61process.sh &
- [5] 30968
- michelles 2016-07-02 09:16:02 15seq $ nohup: ignoring input and appending output to `nohup.out'
- michelles 2016-07-02 09:16:03 15seq $ nohup scripts/62process.sh &
- [6] 30970
- michelles 2016-07-02 09:16:10 15seq $ nohup: ignoring input and appending output to `nohup.out'
- michelles 2016-07-02 09:16:14 15seq $ nohup scripts/63process.sh &
- [7] 30972
- michelles 2016-07-02 09:16:21 15seq $ nohup: ignoring input and appending output to `nohup.out'
- michelles 2016-07-02 09:16:22 15seq $ nohup scripts/64process.sh &
- [8] 30974
Move the process logs into the logs folder
- michelles 2016-07-04 14:34:35 15seq $ mv Pool1/process_radtags.log ./logs/process61.log
michelles 2016-07-04 14:37:51 15seq $ mv Pool2/process_radtags.log ./logs/process62.log
michelles 2016-07-04 14:38:01 15seq $ mv Pool3/process_radtags.log ./logs/process63.log
michelles 2016-07-04 14:38:13 15seq $ mv Pool4/process_radtags.log ./logs/process64.log
Analyze the sequencing statistics to make sure everything looks on track, using the readprocesslog.py script
- michelles 2016-07-04 14:38:31 15seq $ ~/13-stacks_analysis_scripts/readprocesslog.py
Enter the path and file name of the log, i.e. ./logs/16process.out: ./logs/process61.log
- repeat for all pools
- Grab the tsv with Fetch
- Keep track of read statistics from sequencing run
Rename the process_radtags output files to the sample names
- michelles 2016-07-04 15:20:36 Pool1 $ sh rename.for.dDocent_se_gz ../logs/names_61
- michelles 2016-07-04 15:20:58 Pool1 $ mv APCL_15* ../samples/
- Repeat for all pools
- rm -r Pool*
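Before deleting the Pool directories it is worth checking that every expected file was actually renamed. The sketch below shows the kind of mapping I believe the rename step performs, but it is an assumption: I have not checked rename.for.dDocent_se_gz line by line. It assumes process_radtags wrote files named sample_BARCODE.fq.gz and that dDocent wants single-end files named NAME.F.fq.gz. The function names are hypothetical.

```python
from pathlib import Path


def plan_renames(names_file, pool_dir):
    """Return (old, new) path pairs from a 'name<TAB>barcode' names file.

    Assumed naming: sample_<barcode>.fq.gz -> <name>.F.fq.gz
    """
    pool_dir = Path(pool_dir)
    plan = []
    with open(names_file) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            name, barcode = line.split()
            plan.append((pool_dir / f"sample_{barcode}.fq.gz",
                         pool_dir / f"{name}.F.fq.gz"))
    return plan


def apply_renames(plan):
    """Rename the files that exist and return the expected files that were missing."""
    missing = []
    for old, new in plan:
        if old.exists():
            old.rename(new)
        else:
            missing.append(old)
    return missing
```

An empty list from apply_renames(plan_renames("../logs/names_61", ".")) would mean every barcode in the names file had an output file; anything else is worth looking at before running rm -r Pool*.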
Trim and map the reads
michelles 2016-07-04 15:39:11 samples $ dDocent
dDocent 2.18
Contact jpuritz@gmail.com with any problems
Checking for required software
All required software is installed!
192 individuals are detected. Is this correct? Enter yes or no and press [ENTER]
yes
Proceeding with 192 individuals
dDocent detects 40 processors available on this system.
Please enter the maximum number of processors to use for this analysis.
0
Incorrect. Please enter the number of processing cores on this computer
15
dDocent detects 252G maximum memory available on this system.
Please enter the maximum memory to use for this analysis. The size can be postfixed with
K, M, G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000 respectively.
For example, to limit dDocent to ten gigabytes, enter 10G or 10g
0
Do you want to quality trim your reads?
Type yes or no and press [ENTER]?
yes
Do you want to perform an assembly?
Type yes or no and press [ENTER]?
no
Reference contigs need to be in a file named reference.fasta
Do you want to map reads? Type yes or no and press [ENTER]
yes
BWA will be used to map reads. You may need to adjust -A -B and -O parameters for your taxa.
Would you like to enter a new parameters now? Type yes or no and press [ENTER]
yes
Please enter new value for A (match score). It should be an integer. Default is 1.
1
Please enter new value for B (mismatch score). It should be an integer. Default is 4.
4
Please enter new value for O (gap penalty). It should be an integer. Default is 6.
6
Do you want to use FreeBayes to call SNPs? Please type yes or no and press [ENTER]
no
Please enter your email address. dDocent will email you when it is finished running.
Don't worry; dDocent has no financial need to sell your email address to spammers.
michelle.stuart@rutgers.edu
At this point, all configuration information has been entered and dDocent may take several hours to run.
It is recommended that you move this script to a background operation and disable terminal input and output.
All data and logfiles will still be recorded.
To do this:
Press control and Z simultaneously
Type 'bg' without the quotes and press enter
Type 'disown -h' again without the quotes and press enter
Now sit back, relax, and wait for your analysis to finish.
Removing the _1 character and replacing with /1 in the name of every sequence
^Z
[1]+ Stopped dDocent
michelles 2016-07-04 15:40:01 samples $ bg
[1]+ dDocent &
michelles 2016-07-04 15:40:03 samples $ disown -h
michelles 2016-07-04 15:40:06 samples $
Compress the run directory
tar -zcvf seq15.tar.gz ../15seq/
Then copy the archive to ELF with scp
scp -r /local/home/michelles/02-apcl-ddocent/compressed_dDocent_input/seq15.tar.gz mrs349@elf.rdi2.rutgers.edu:/project1/mlp195-001/compressed_dDocent_input/
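The compression step can also be done and checked from Python before the archive is copied anywhere. A minimal sketch using the standard tarfile module follows; compress_run is a hypothetical name. Unlike the tar command above, it stores paths under the run directory's own name rather than under ../15seq/.

```python
import tarfile
from pathlib import Path


def compress_run(run_dir, archive_path):
    """Write run_dir into a gzipped tar archive and return the member names.

    archive_path should sit outside run_dir so the archive does not
    try to include itself.
    """
    run_dir = Path(run_dir)
    with tarfile.open(str(archive_path), "w:gz") as archive:
        archive.add(str(run_dir), arcname=run_dir.name)
    # reopen the archive to confirm it is readable and list what went in
    with tarfile.open(str(archive_path), "r:gz") as archive:
        return archive.getnames()
```

Checking the returned names against the directory listing is a cheap way to confirm the archive is complete before running scp.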
Before adding seq15 to the main data analysis, the reads have to be trimmed and mapped. I am going to do this on both amphiprion and ELF to see how it goes.
Copy reference.fasta over from the 12seq samples directory:
/local/home/michelles/02-apcl-ddocent/12seq/samples
michelles 2016-06-28 08:14:32 samples $ cp reference.fasta ../../15seq/samples/
In the samples folder for seq15, run dDocent:
michelles 2016-06-28 08:15:25 samples $ dDocent
dDocent 2.18
Contact jpuritz@gmail.com with any problems
Checking for required software
All required software is installed!
192 individuals are detected. Is this correct? Enter yes or no and press [ENTER]
yes
Proceeding with 192 individuals
dDocent detects 40 processors available on this system.
Please enter the maximum number of processors to use for this analysis.
20
dDocent detects 252G maximum memory available on this system.
Please enter the maximum memory to use for this analysis. The size can be postfixed with
K, M, G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000 respectively.
For example, to limit dDocent to ten gigabytes, enter 10G or 10g
0
Do you want to quality trim your reads?
Type yes or no and press [ENTER]?
yes
Do you want to perform an assembly?
Type yes or no and press [ENTER]?
no
Reference contigs need to be in a file named reference.fasta
Do you want to map reads? Type yes or no and press [ENTER]
yes
BWA will be used to map reads. You may need to adjust -A -B and -O parameters for your taxa.
Would you like to enter a new parameters now? Type yes or no and press [ENTER]
yes
Please enter new value for A (match score). It should be an integer. Default is 1.
1
Please enter new value for B (mismatch score). It should be an integer. Default is 4.
4
Please enter new value for O (gap penalty). It should be an integer. Default is 6.
6
Do you want to use FreeBayes to call SNPs? Please type yes or no and press [ENTER]
no
Please enter your email address. dDocent will email you when it is finished running.
Don't worry; dDocent has no financial need to sell your email address to spammers.
michelle.stuart@rutgers.edu
At this point, all configuration information has been entered and dDocent may take several hours to run.
It is recommended that you move this script to a background operation and disable terminal input and output.
All data and logfiles will still be recorded.
To do this:
Press control and Z simultaneously
Type 'bg' without the quotes and press enter
Type 'disown -h' again without the quotes and press enter
Now sit back, relax, and wait for your analysis to finish.
Removing the _1 character and replacing with /1 in the name of every sequence
^Z
[1]+ Stopped dDocent
michelles 2016-06-28 08:16:46 samples $ bg
[1]+ dDocent &
michelles 2016-06-28 08:16:50 samples $ disown -h
Finished at 8:52 - took ~30 minutes
Get seq read count numbers
cp ~/13-stacks_analysis_scripts/readprocesslog.py ./scripts/
./scripts/readprocesslog.py