- Changing the process pool scripts to move to the correct location.
- Started process radtags, should be done around 4:30pm (during lab meeting)
- started bcsplit, should be done around 7:30pm tonight
7-13-2015
- make an index file for barcode splitter in the bcsplit directory:
- In Google Sheets, make a file that contains the Pool IDs in the first column and the corresponding Illumina index in the second column, separated by tabs.
P016 | ATCACG |
P017 | ACAGTG |
P018 | GCCAAT |
P019 | GATCAG |
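- before running the splitter, a quick format check is worth it (a sketch; SEQ07_index.tsv is the index file used in the barcode_splitter commands below): Google Sheets downloads can carry Windows line endings, and the splitter needs real tabs
- cat -A SEQ07_index.tsv                      # tabs show as ^I, Windows carriage returns as ^M
perl -pi -e 's/\r\n?/\n/g' SEQ07_index.tsv    # strip carriage returns if any show up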
- Run barcode splitter on 1st lane
- [michelles@amphiprion 00_working]$ nohup barcode_splitter.py --bcfile SEQ07_index.tsv --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_1_read_2_index_read_passed_filter.fastq.gz
- can Ctrl-Z then bg to run it in the background (or use nohup from the start); sketch below
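- a minimal sketch of that foreground-to-background sequence (assumes barcode_splitter.py was started without nohup):
- # Ctrl-Z suspends the running job, then:
bg          # resume it in the background
jobs        # confirm it is still running
disown -h   # optional: stop the shell from sending it SIGHUP if the session drops (nohup does this from the start)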
- rename unmatched files:
- [michelles@amphiprion bcsplit]$ mv unmatched-read-1.fastq.gz seq07-unmatched-read-1.fastq.gz
[michelles@amphiprion bcsplit]$ mv unmatched-read-2.fastq.gz seq07-unmatched-read-2.fastq.gz
- move all of these files into a "lane1" directory or lane 2 will overwrite them
- [michelles@amphiprion bcsplit]$ mkdir lane1
[michelles@amphiprion bcsplit]$ mv P01* ./lane1/
- Run barcode splitter on 2nd lane
- [michelles@amphiprion bcsplit]$ nohup barcode_splitter.py --bcfile SEQ07_index.tsv --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_2_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_2_read_2_index_read_passed_filter.fastq.gz
- combine all of the results:
- [michelles@amphiprion bcsplit]$ cat ./seq07-unmatched-read-1.fastq.gz ./unmatched-read-1.fastq.gz > seq07-unmatched-read-1.fastq.gz
cat: ./seq07-unmatched-read-1.fastq.gz: input file is output file
- cat refuses because the output file is also an input, so rerun with a different (capitalized) output name:
- [michelles@amphiprion bcsplit]$ cat ./seq07-unmatched-read-1.fastq.gz ./unmatched-read-1.fastq.gz > SEQ07-unmatched-read-1.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./seq07-unmatched-read-2.fastq.gz ./unmatched-read-2.fastq.gz > SEQ07-unmatched-read-2.fastq.gz
[michelles@amphiprion bcsplit]$ cat ./lane1/P016-read-1.fastq.gz ./P016-read-1.fastq.gz > P016.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./lane1/P017-read-1.fastq.gz ./P017-read-1.fastq.gz > P017.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./lane1/P018-read-1.fastq.gz ./P018-read-1.fastq.gz > P018.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./lane1/P019-read-1.fastq.gz ./P019-read-1.fastq.gz > P019.fastq.gz
- [michelles@amphiprion bcsplit]$ cd lane1/
- [michelles@amphiprion lane1]$ rm *-2.fastq.gz
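- a quick sanity check on the lane 1 + lane 2 concatenation (a sketch, run from the bcsplit directory; P016 shown, same idea for the other pools): the combined read count should equal the sum of the two inputs, and gzip should be able to read the whole concatenated file
- zcat ./lane1/P016-read-1.fastq.gz | wc -l   # reads = lines / 4
zcat ./P016-read-1.fastq.gz | wc -l
zcat ./P016.fastq.gz | wc -l                  # should equal the sum of the two counts above
gzip -t ./P016.fastq.gz                       # silent if the concatenated gzip is intact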
- Created a barcodes file: from the sample_data spreadsheet, copied the barcodes column from the barcodes tab and pasted into nano, saved as barcodes in 00_working directory
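- quick check that the pasted barcodes file is clean (a sketch; the file is the barcodes file in 00_working):
- wc -l barcodes      # should match the number of samples in the pool
cat -A barcodes       # one barcode per line, no trailing ^M characters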
- make directories for Pools
- [michelles@amphiprion 00_working]$ mkdir 16Pool 17Pool 18Pool 19Pool 30Pool 31Pool 32Pool 33Pool
- Run process_radtags on pools
- #!/bin/bash
process_radtags -b barcodes -c -q -r \
--renz_1 pstI --renz_2 mluCI \
-i gzfastq \
--adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
-f ./bcsplit/P016.fastq.gz \
-o ./16Pool
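- for reference, the same process_radtags call could be wrapped in one loop over the pools instead of keeping a separate NNprocess.sh per pool; a sketch (not what was actually run here), with pool numbers and paths taken from above:
- #!/bin/bash
# hypothetical wrapper: same flags as 16process.sh, looped over pools 16-19
for pool in 16 17 18 19
do
process_radtags -b barcodes -c -q -r \
--renz_1 pstI --renz_2 mluCI \
-i gzfastq \
--adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
-f ./bcsplit/P0${pool}.fastq.gz \
-o ./${pool}Pool
done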
- ran the per-pool process_radtags scripts (from the shell history):
- nohup ./16process.sh
less nohup.out
nohup ./16process.sh
less nohup.out
rm nohup.out
ls
nohup ./16process.sh
cd 00_working/19Pool/
cp ../18Pool/18process.sh ./
nohup ./19process.sh
top
cd 00_working/
cp ./17Pool/17process.sh ./18Pool/
cd 18Pool/
ls
ls
nohup ./18process.sh
cd 00_working/
cp ../19Pool/19process.sh ./
- make a new names_barcodes tab-delimited file:
- from Google Docs, copy names and barcodes into a nano file in the correct pool directory
- Rename files and Move into one directory
- [michelles@amphiprion 16Pool]$ sh rename.for.dDocent_se_gz 16Pool_names.tsv
- [michelles@amphiprion 16Pool]$ cd ../17Pool/
- [michelles@amphiprion 17Pool]$ sh rename.for.dDocent_se_gz 17Pool_names.tsv
- [michelles@amphiprion 17Pool]$ cd ../18Pool/
[michelles@amphiprion 18Pool]$ sh rename.for.dDocent_se_gz 18Pool_names.tsv
- [michelles@amphiprion 18Pool]$ cd ../19Pool/
[michelles@amphiprion 19Pool]$ sh rename.for.dDocent_se_gz 19Pool_names.tsv
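- for context, rename.for.dDocent_se_gz comes from dDocent; the core of a rename step like this is just a loop over the names file, roughly like the sketch below (hypothetical - the column layout of the names file and the sample_<barcode>.fq.gz input naming are assumptions, not the real script):
- #!/bin/bash
# hypothetical sketch: read new-name<TAB>barcode pairs and rename the
# process_radtags output files accordingly
while read -r name barcode
do
mv "sample_${barcode}.fq.gz" "${name}.fq.gz"
done < "$1"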
- [michelles@amphiprion 00_working]$ mv ./16Pool/APCL13_* ./samples/
[michelles@amphiprion 00_working]$ mv ./17Pool/APCL13_* ./samples/
[michelles@amphiprion 00_working]$ mv ./18Pool/APCL13_1* ./samples/
[michelles@amphiprion 00_working]$ mv ./19Pool/APCL14_* ./samples/
[michelles@amphiprion 00_working]$ mv ./16Pool/16* ./logs/
[michelles@amphiprion 00_working]$ mv ./17Pool/17* ./logs/
[michelles@amphiprion 00_working]$ mv ./18Pool/18* ./logs/
[michelles@amphiprion 00_working]$ mv ./19Pool/19* ./logs/
- Run ustacks of seq07 and seq08 together
- #!/bin/bash
i=70001
for file in ./samples/*.fq.gz
do
ustacks -t gzfastq -p 20 -d -r -m 3 -M 5 --max_locus_stacks 3 -i $i \
-f ${file} -o ./lax-stacks
let "i+=1";
done
- change name of nohup.out
- mv nohup.out ./logs/seq07_08_ustacks.out
- Do not run cstacks again at this time. We could run cstacks with the new files to generate a more comprehensive catalog, but we can do that in the future. Currently using the catalog made up of the samples with the most loci from SEQ03, SEQ04, and SEQ05, generated post-rxstacks.
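- if the catalog does get rebuilt later, cstacks can extend the existing batch 1 catalog with the new samples; a rough sketch (the --catalog option and the sample-list handling are assumptions to check against the installed Stacks version):
- #!/bin/bash
# hypothetical: add the new seq07/seq08 samples to the existing batch 1 catalog
samples=""
for file in $(ls -1 ./lax-stacks/*.tags.tsv.gz | grep -v catalog | perl -pe 's/\.tags\.tsv\.gz//')
do
samples="$samples -s $file"
done
cstacks -b 1 -p 20 -o ./lax-stacks --catalog ./lax-stacks/batch_1 $samples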
- copy catalog from the lax-rxstacks folder to the lax-stacks folder
- [michelles@amphiprion 00_working]$ cp ./lax-rxstacks/batch_1.catalog.* ./lax-stacks/
- Run sstacks (rxstacks needs the matches files). Use the rx-catalogs
- [michelles@amphiprion 00_working]$ nohup ./sstacks-lax.sh
- #!/bin/bash
for file in $(ls -1 ./lax-stacks/*.tags.tsv.gz \
| grep -v catalog | perl -pe 's/\.tags\.tsv.gz//')
do
sstacks -p 20 -b 1 -c ./lax-stacks/batch_1 \
-s $file \
-o ./lax-stacks/
done
- Run rxstacks:
- copy catalog from the lax-rxstacks folder to the lax-stacks folder
- [michelles@amphiprion 00_working]$ cp ./lax-rxstacks/batch_1.catalog.* ./lax-stacks/
- [michelles@amphiprion 00_working]$ nohup ./rxstacks.sh
- #!/bin/bash
#Run rxstacks, batch 1, 25 threads - the catalog files have to
#be in the ./lax-stacks directory, copy them there now if needed.
rxstacks -b 1 -t 25 \
--conf_filter --conf_lim 0.25 \
--model_type bounded --bound_high 0.1 \
--prune_haplo \
--lnl_lim -8.0 \
--lnl_dist \
-P ./lax-stacks \
-o ./lax-rxstacks
- change name of nohup.out to 05seq_rxstacks.out
- Run sstacks: nohup ./rxsstacks.sh
- #!/bin/bash
#Run sstacks
for file in $(ls -1 ./lax-rxstacks/*.tags.tsv \
| grep -v catalog | perl -pe 's/\.tags\.tsv//')
do
sstacks -p 25 -b 1 -c ./lax-rxstacks/batch_1 \
-s $file \
-o ./lax-rxstacks/
done
- change name of nohup.out to 05seq_sstacks.out
- Move samples to the lax-stacks and rx-stacks directories in philippines/genotyping for all to use
- Run populations on all APCL samples (not just this seq batch) - don't filter this time around (no -m or -r, etc.)
- #!/bin/bash
# Calculate population statistics and export several output files.
populations -b 1 -P ./lax-rxstacks/ -s -t 15
- change name of nohup.out to YYMMDD_populations.out
- Create a mysql database
- [michelles@amphiprion ~]$ mysql -plarvae168
- mysql> create database seq05;
- mysql> show databases;
- mysql> exit
- Apply stacks configuration to database:
- [michelles@amphiprion ~]$ mysql -plarvae168 seq05 < ~/local/share/stacks/sql/stacks.sql
- Load radtags to database
- [michelles@amphiprion rxstacks]$ load_radtags.pl -D seq05 -p ./ -b 1 -c -B
- Index database
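- Stacks ships an index_radtags.pl helper for this step; a likely invocation (flags are an assumption, check index_radtags.pl -h): index_radtags.pl -D seq05 -c -t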
- Move the matches, snps, alleles, and tags files into a combined folder to run all populations together - can this be done with the export script? Can these files be stored in mysql and then extracted just for this run?
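- one straightforward way to build that combined folder is to copy the per-sample Stacks files from each run directory into one place and then add a single catalog; a sketch, with the second run directory name as a placeholder:
- #!/bin/bash
# hypothetical: pool per-sample tags/snps/alleles/matches files from several
# run directories, skipping the catalogs, then copy in one batch_1.catalog set
shopt -s nullglob    # unmatched globs expand to nothing
mkdir -p ./combined
for dir in ./lax-rxstacks ../01_seq04/rxstacks
do
for f in ${dir}/*.tags.tsv* ${dir}/*.snps.tsv* ${dir}/*.alleles.tsv* ${dir}/*.matches.tsv*
do
case $f in *catalog*) continue ;; esac
cp "$f" ./combined/
done
done
cp ./lax-rxstacks/batch_1.catalog.* ./combined/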
- Run populations to produce a genepop file - can't run all of the samples from multiple runs in the same populations run - unless you change the “i” in the script to something that will generate a unique ID???
- [michelles@amphiprion 01_seq04]$ cd rxstacks/
- [michelles@amphiprion rxstacks]$ nohup populations -b 1 -P ./ -t 10 -r 90 -m 30 -s --genepop
- where -m is the minimum stack depth (coverage) and -r is the percent of individuals in the population required to keep a locus. Start at 90/30, try 90/10, 80/30, 80/10, etc. until you get 200 loci (see the sketch below).
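- a sketch of sweeping those -r/-m combinations (values from the note above; the batch_1.genepop output name is an assumption - check the populations log - and each run overwrites it, so it gets renamed between runs):
- #!/bin/bash
# hypothetical r/m sweep - stop once a combination gives ~200 loci
for combo in "90 30" "90 10" "80 30" "80 10"
do
set -- $combo
populations -b 1 -P ./ -t 10 -r $1 -m $2 -s --genepop
mv ./batch_1.genepop ./batch_1_r$1_m$2.genepop
done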
- In Windows,
Logged in to Windows using the larva password
Started Cervus
Click on Tools>Convert Genotype File>Genepop to Cervus
2 digit format
do not use first ID as population name
Converted 192 individuals in one population at 1132 loci
Clicked on Analysis>Allele Frequency Analysis
Choose the cervus file just created
ID in column 2
First allele in column 3
Entering 1132 for the number of loci gives an error saying there are 1134 loci, so ran it with 1134
Save as seq04
OK
- Run identity analysis to eliminate duplicate individuals
- Now run a simulation of parentage analysis: click on Analysis>Simulation of Parentage Analysis>sexes of pair unknown. Change the parent number. Saved as seq04simpar. Change to LOD.
- Ran simulated parentage with 2000 parents, 5000 offspring, 0.1 proportion sampled, 0.86 typed loci, 0.01 mismatch, LOD. Computer said it would take 2 hours - computer froze, so dropped it down to 1000 parents, 2000 offspring - took 6+ hours to run
- Create parent and offspring files (the adult/juvenile files)
- Run parentage analysis in Cervus - make sure the CSVs have Unix line endings (check in Komodo); if only 1 offspring runs, this is the problem (a command-line check is sketched below).
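- the line endings can also be checked and fixed on amphiprion before moving the files over; a sketch (offspring.csv is a placeholder name):
- cat -A offspring.csv | head               # Windows line endings show up as ^M
perl -pi -e 's/\r\n?/\n/g' offspring.csv    # convert to unix line endings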
- Run
- If running Colony, you need a marker file that contains the marker IDs, whether they are dominant or codominant, and the allelic dropout ("gene dropping") and mutation/error rates.