February 7, 2016
- Create a directory for the files (mkdir), using the naming scheme type_of_sequencer_year_month_day_SEQ##:
michelles 2016-02-07 14:28:45 sequencing $ mkdir hiseq_2015_12_18_SEQ12
michelles 2016-02-07 14:31:48 sequencing $ mkdir hiseq_2015_12_18_SEQ13
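The naming scheme above could be wrapped in a small helper so every run directory is spelled the same way. This is only a sketch: `run_dir_name` is a made-up function name, not something already on amphiprion.

```shell
#!/bin/bash
# Build a run directory name: type_of_sequencer_year_month_day_SEQ##.
# Arguments: sequencer type, year, month, day, SEQ number.
run_dir_name() {
    printf '%s_%s_%s_%s_SEQ%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Print the two names used in these notes.
run_dir_name hiseq 2015 12 18 12
run_dir_name hiseq 2015 12 18 13
```

To create the directory in one step, something like `mkdir "$(run_dir_name hiseq 2015 12 18 12)"` should work.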
- Receive files from sequencer
- https://htseq.princeton.edu/cgi-bin/login.pl?redirect_url=https://htseq.princeton.edu/cgi-bin/dashboard.pl
- Click on the sequencing run of interest in the box on the left that says "Recently Entered Samples"
- In the box titled Sample Provenance, click on the link following "This library was utilized within the following output(s):" - repeat for each lane
- Right click on the first read and "copy link"
- In amphiprion, in the directory you made in the previous step, type:
- curl -L -O followed by the pasted link
- This didn't work, so I went to the physical location of amphiprion and downloaded the files via Firefox
- Repeat for each read within this lane, then go back and repeat for each lane
- Count raw reads
- michelles 2016-02-07 16:02:18 hiseq_2015_12_18_SEQ12 $ zcat clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 98179108
- michelles 2016-02-07 16:06:19 hiseq_2015_12_18_SEQ12 $ zcat clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_2_read_1_passed_filter.fastq.gz | wc -l | awk '{print$1/4}'
- 94793182
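The read count above is the line count divided by four, because each FASTQ record takes four lines. A minimal sketch of the same count as a reusable function follows. `count_reads` is a made-up name, the demo file is invented for illustration, and `gzip -cd` is swapped in for `zcat` only because it behaves the same on more systems.

```shell
#!/bin/bash
# Count reads in a gzipped FASTQ file (4 lines per record).
# Assumes records are not wrapped onto extra lines.
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Demo on a tiny two-read file invented for this example.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$demo_dir/demo.fastq.gz"
count_reads "$demo_dir/demo.fastq.gz"
```

In the run directory, a loop such as `for f in *_read_1_passed_filter.fastq.gz; do count_reads "$f"; done` would count every lane in one pass.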
- Update where files are saved on amphiprion in sample_data file, Sequencing sheet, amphiprion folder column
- Make a working directory, with separate pool directories to keep the process_radtags output for each pool separate
- michelles 2016-02-07 16:14:48 02-apcl-ddocent $ mkdir 12seq
- $ cd 12seq/
$ mkdir bcsplit Pool1 Pool2 Pool3 Pool4 logs
$ cd bcsplit
$ mkdir lane1 lane2
- In your logs directory, create an index file listing each pool name and the index used on that pool, separated by a tab. The easiest way to do this is to copy and paste from Google Sheets into a nano document: in the sample_data file, on the Names tab, type the pool numbers into the Pool ID column in the format below, and the spreadsheet will look up the proper indexes for you. Then copy and paste into a blank nano document and save it as index_seq## (the barcode splitter commands in these notes read index_seq12).
P053	ACTTGA
P054	GATCAG
P055	GGCTAC
P056	CTTGTA
- $ mkdir logs
$ cd logs/
$ nano
michelles 2016-02-07 16:19:08 logs $ cd ..
michelles 2016-02-07 16:19:24 bcsplit $ pwd
/local/home/michelles/02-apcl-ddocent/12seq/bcsplit
michelles 2016-02-07 16:19:28 bcsplit $ mv logs ..
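As an alternative to pasting into nano, the index file could be written and checked from the command line. This is a sketch only: the function names are made up, and the pool/index pairs are the four listed in these notes.

```shell
#!/bin/bash
# Write the pool/index pairs as a tab-separated file, one pool per line.
# printf reuses its format for each pair of arguments.
write_index_file() {
    printf '%s\t%s\n' \
        P053 ACTTGA \
        P054 GATCAG \
        P055 GGCTAC \
        P056 CTTGTA > "$1"
}

# Succeeds only if every line has exactly two tab-separated fields.
check_two_columns() {
    awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' "$1"
}

# Demo in a scratch directory so nothing in logs/ is touched.
demo_dir=$(mktemp -d)
write_index_file "$demo_dir/index_seq12"
check_two_columns "$demo_dir/index_seq12" && echo "index file OK"
```

The check is useful after a copy and paste too, since spaces pasted in place of tabs are hard to see in nano.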
- Create a names file with each sample name and the barcode assigned to that sample, separated by a tab. The easiest way to make a names file is to copy and paste from Google Sheets: copy the ligation IDs from the pool and paste them into the Names tab, then copy and paste the result into a nano document in the logs directory.
- Create a barcodes file in your logs directory: from the sample_data file, highlight the barcodes column only on the barcodes sheet and paste into nano, do not hit enter after the final barcode, save as “barcodes”.
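The "no enter after the final barcode" rule can be checked rather than trusted. A sketch under stated assumptions: the function names are made up, and the three barcodes in the demo are invented placeholders, not the lab's real barcodes.

```shell
#!/bin/bash
# Write one barcode per line with no newline after the last one.
# $1 = output file, remaining arguments = barcodes.
write_barcodes() {
    out=$1
    shift
    printf '%s' "$1" > "$out"
    shift
    for bc in "$@"; do
        printf '\n%s' "$bc" >> "$out"
    done
}

# Succeeds if the file's last byte is something other than a newline.
ends_without_newline() {
    [ -n "$(tail -c 1 "$1")" ]
}

demo_dir=$(mktemp -d)
# AAACGG, AACGTT, AACTGA are made-up barcodes for illustration only.
write_barcodes "$demo_dir/barcodes" AAACGG AACGTT AACTGA
ends_without_newline "$demo_dir/barcodes" && echo "no trailing newline"
```

Running `ends_without_newline ../logs/barcodes` on a file saved from nano would show whether an extra line slipped in.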
- Run barcode splitter on the 1st lane with nohup. The two input filenames differ only in the read portion: read_1 versus read_2_index_read.
michelles 2016-02-07 16:27:03 lane1 $ nohup barcode_splitter.py --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_2_index_read_passed_filter.fastq.gz &
[1] 26418
nohup: failed to run command `barcode_splitter.py': No such file or directory
Not sure why it isn't finding the script: it is in my path, but there is still an error. I think the tar is causing the trouble. Delete the tar? No, move it to a different location.
michelles 2016-02-07 16:42:34 lane1 $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_1_read_2_index_read_passed_filter.fastq.gz
nohup: ignoring input and appending output to `nohup.out'
Run barcode splitter on 2nd lane with nohup
michelles 2016-02-07 16:44:52 ~ $ nohup ~/14_programs/paired_sequence_utils/barcode_splitter.py --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_2_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12/clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx_2_read_2_index_read_passed_filter.fastq.gz
nohup: ignoring input and appending output to `nohup.out'
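Since the two lane commands differ only in the lane number, they could be generated from one template to avoid copy and paste slips. The sketch below is a dry run: it only prints the commands, using the paths and flags from the entries above. `splitter_cmd` is a made-up helper name.

```shell
#!/bin/bash
seq_dir=/local/shared/pinsky_lab/sequencing/hiseq_2015_12_18_SEQ12
prefix=clownfish-ddradseq-seq12-for-158-cycles-hjl3tbcxx
splitter="$HOME/14_programs/paired_sequence_utils/barcode_splitter.py"

# Print the barcode_splitter.py command for one lane number.
# $1 = lane number (1 or 2).
splitter_cmd() {
    printf '%s --bcfile ../../logs/index_seq12 --idxread 2 --suffix .fastq.gz %s %s\n' \
        "$splitter" \
        "$seq_dir/${prefix}_${1}_read_1_passed_filter.fastq.gz" \
        "$seq_dir/${prefix}_${1}_read_2_index_read_passed_filter.fastq.gz"
}

# Dry run: show both lane commands without executing anything.
for lane in 1 2; do
    splitter_cmd "$lane"
done
```

Each printed command is meant to be run with nohup from inside the matching lane directory (bcsplit/lane1 or bcsplit/lane2), because the --bcfile path is relative.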
February 11, 2016
- Cat the 2 lanes into one file per pool for process_radtags
- michelles 2016-02-11 13:41:48 bcsplit $ cat lane2/P053-read-1.fastq.gz lane1/P053-read-1.fastq.gz > ../Pool1/P053.fastq.gz
- $ cat lane2/P054-read-1.fastq.gz lane1/P054-read-1.fastq.gz > ../Pool2/P054.fastq.gz
- $ cat lane2/P055-read-1.fastq.gz lane1/P055-read-1.fastq.gz > ../Pool3/P055.fastq.gz
- $ cat lane2/P056-read-1.fastq.gz lane1/P056-read-1.fastq.gz > ../Pool4/P056.fastq.gz
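The per-pool cat commands could also be written as one loop, which makes it harder to repeat or skip a pool. This sketch assumes P053 through P056 map to Pool1 through Pool4 in order, which matches the process scripts used later in these notes. `merge_lanes` is a made-up name, and the demo files are tiny placeholders.

```shell
#!/bin/bash
# Concatenate the lane2 and lane1 read-1 files for each pool index.
# $1 = bcsplit directory, $2 = directory holding Pool1..Pool4.
merge_lanes() {
    n=1
    for idx in P053 P054 P055 P056; do
        cat "$1/lane2/$idx-read-1.fastq.gz" \
            "$1/lane1/$idx-read-1.fastq.gz" \
            > "$2/Pool$n/$idx.fastq.gz"
        n=$((n + 1))
    done
}

# Demo with one-read placeholder files in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/bcsplit/lane1" "$demo/bcsplit/lane2" \
    "$demo/Pool1" "$demo/Pool2" "$demo/Pool3" "$demo/Pool4"
for idx in P053 P054 P055 P056; do
    for lane in lane1 lane2; do
        printf '@%s_%s\nACGT\n+\nIIII\n' "$idx" "$lane" \
            | gzip > "$demo/bcsplit/$lane/$idx-read-1.fastq.gz"
    done
done
merge_lanes "$demo/bcsplit" "$demo"
# Concatenated gzip files decompress as a single stream.
gzip -cd "$demo/Pool4/P056.fastq.gz" | wc -l
```

From the real bcsplit directory the call would be `merge_lanes . ..`.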
February 13, 2016
- Run the process_radtags script, after using nano to adjust it for each pool
- /local/home/michelles/02-apcl-ddocent/12seq/scripts
- michelles 2016-02-13 09:22:20 scripts $ nano 53process.sh
- #!/bin/bash
- process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \
- -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
- -f ../Pool1/P053.fastq.gz -o ./
- mv process_radtags.log ../logs/53process.out
- cat ../logs/53process.out | mail -s "process 53 complete" michelle.stuart@rutgers.edu
- michelles 2016-02-13 09:18:13 Pool1 $ nohup ../scripts/53process.sh
- michelles 2016-02-13 09:20:30 Pool2 $ nohup ../scripts/54process.sh
- michelles 2016-02-13 09:21:09 Pool3 $ nohup ../scripts/55process.sh
- michelles 2016-02-13 09:22:19 Pool4 $ nohup ../scripts/56process.sh
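Rather than editing a copy of the script in nano for each pool, the four scripts could be generated from one template. This is a sketch, not the method used above: `make_process_script` is a made-up name, the process_radtags flags are copied from 53process.sh, and the mail step is left out to keep the example short.

```shell
#!/bin/bash
# Write NNprocess.sh for one pool.
# $1 = pool number (1-4), $2 = index number (53-56), $3 = scripts directory.
make_process_script() {
    # In an unquoted here-document, \\ becomes a single backslash,
    # so the generated script keeps its line continuations.
    cat > "$3/${2}process.sh" <<EOF
#!/bin/bash
process_radtags -b ../logs/barcodes -c -q --renz_1 pstI --renz_2 mluCI \\
    -i gzfastq --adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \\
    -f ../Pool${1}/P0${2}.fastq.gz -o ./
mv process_radtags.log ../logs/${2}process.out
EOF
    chmod +x "$3/${2}process.sh"
}

# Demo: generate all four scripts in a scratch directory.
demo=$(mktemp -d)
pool=1
for idx in 53 54 55 56; do
    make_process_script "$pool" "$idx" "$demo"
    pool=$((pool + 1))
done
cat "$demo/55process.sh"
```

The generated scripts use paths relative to the pool directory, so each one still has to be started from inside its own Pool directory, as in the nohup commands above.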
- Rename the samples for dDocent
- michelles 2016-02-13 14:10:24 Pool1 $ sh rename.for.dDocent_se_gz ../logs/names-53
- michelles 2016-02-13 14:11:16 Pool2 $ sh rename.for.dDocent_se_gz ../logs/names-54
- michelles 2016-02-13 14:11:30 Pool3 $ sh rename.for.dDocent_se_gz ../logs/names-55
- michelles 2016-02-13 14:11:41 Pool4 $ sh rename.for.dDocent_se_gz ../logs/names-56
- Move the named samples into the samples directory or if working in dDocent, move them to the working directory
- michelles 2016-02-13 14:17:50 12seq $ mv Pool4/A* ./samples/
- In a folder containing only the new samples *.F.fq.gz
- copy in reference.fasta from original dataset
- michelles 2016-02-13 14:19:45 12seq $ cp ~/02-apcl-ddocent/jonsfiles/reference.fasta samples/
February 15, 2016
- Type dDocent on the command line of your working directory
$ dDocent
Variables used in dDocent Run at Mon Feb 15 15:35:06 EST 2016
Number of Processors
20
Maximum Memory
0
Trimming
yes
Assembly?
no
Type_of_Assembly
Clustering_Similarity%
Mapping_Reads?
yes
Mapping_Match_Value
1
Mapping_MisMatch_Value
4
Mapping_GapOpen_Penalty
6
Calling_SNPs?
no
Email
michelle.stuart@rutgers.edu
February 16, 2016
When dDocent has finished, link the *.F.fq.gz, *.R1.fq.gz, *-RG.bam, and *-RG.bam.bai files into the main analysis folder
michelles 2016-02-16 07:41:15 samples $ ln -s APCL_1* ~/02-apcl-ddocent/APCL_analysis/15-02-2016/
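One thing worth checking after linking: `ln -s` stores the source path as plain text, so a relative name such as APCL_1... is resolved relative to the directory that holds the link, not the directory the command was run from. A minimal sketch of linking with absolute paths follows. `link_into` is a made-up helper, and the demo file names are placeholders.

```shell
#!/bin/bash
# Link the named files from the current directory into a target
# directory, using absolute paths so the links resolve from anywhere.
# $1 = target directory, remaining arguments = file names.
link_into() {
    target=$1
    shift
    for f in "$@"; do
        ln -s "$PWD/$f" "$target/"
    done
}

# Demo in a scratch directory with placeholder sample files.
demo=$(mktemp -d)
mkdir "$demo/samples" "$demo/analysis"
touch "$demo/samples/APCL_1.F.fq.gz" "$demo/samples/APCL_1-RG.bam"
( cd "$demo/samples" && link_into ../analysis APCL_1* )

# -e follows symlinks, so it fails on a broken link.
[ -e "$demo/analysis/APCL_1-RG.bam" ] && echo "link resolves"
```

Running `ls -lL` in the analysis folder is another quick way to spot links that do not resolve.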
Change directories to the main analysis folder and call SNPs:
Type