- Changing the process pool scripts to move to the correct location.
- Started process radtags, should be done around 4:30pm (during lab meeting)
- started bcsplit, should be done around 7:30pm tonight
7-13-2015
- make an index file for barcode splitter in the bcsplit directory:
- In Google Sheets, make a file that contains the Pool IDs in the first column and the corresponding Illumina index in the second column, separated by tabs.
P016 | ATCACG |
P017 | ACAGTG |
P018 | GCCAAT |
P019 | GATCAG |
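- before running the splitter, a quick format check is worth it (a sketch; SEQ07_index.tsv is the index file used in the barcode_splitter commands below): Google Sheets downloads can carry Windows line endings, and the splitter needs real tabs
- cat -A SEQ07_index.tsv                      # tabs show as ^I, Windows carriage returns as ^M
perl -pi -e 's/\r\n?/\n/g' SEQ07_index.tsv    # strip carriage returns if any show up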
- Run barcode splitter on 1st lane
- [michelles@amphiprion 00_working]$ nohup barcode_splitter.py --bcfile SEQ07_index.tsv --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_1_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_1_read_2_index_read_passed_filter.fastq.gz
- can Ctrl-Z then bg to run it in the background (or use nohup from the start); sketch below
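- a minimal sketch of that foreground-to-background sequence (assumes barcode_splitter.py was started without nohup):
- # Ctrl-Z suspends the running job, then:
bg          # resume it in the background
jobs        # confirm it is still running
disown -h   # optional: stop the shell from sending it SIGHUP if the session drops (nohup does this from the start)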
- rename unmatched files:
- [michelles@amphiprion bcsplit]$ mv unmatched-read-1.fastq.gz seq07-unmatched-read-1.fastq.gz
[michelles@amphiprion bcsplit]$ mv unmatched-read-2.fastq.gz seq07-unmatched-read-2.fastq.gz
- move all of these files into a "lane1" directory or lane 2 will overwrite them
- [michelles@amphiprion bcsplit]$ mkdir lane1
[michelles@amphiprion bcsplit]$ mv P01* ./lane1/
- Run barcode splitter on 2nd lane
- [michelles@amphiprion bcsplit]$ nohup barcode_splitter.py --bcfile SEQ07_index.tsv --idxread 2 --suffix .fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_2_read_1_passed_filter.fastq.gz /local/shared/pinsky_lab/sequencing/hiseq_2015_05_01_SEQ07/clownfish-ddradseq-seq07-for-231-cycles-h3mgjbcxx_2_read_2_index_read_passed_filter.fastq.gz
- combine all of the results:
- [michelles@amphiprion bcsplit]$ cat ./seq07-unmatched-read-1.fastq.gz ./unmatched-read-1.fastq.gz > seq07-unmatched-read-1.fastq.gz
cat: ./seq07-unmatched-read-1.fastq.gz: input file is output file
- cat refuses because the output file is also an input, so rerun with a different (capitalized) output name:
- [michelles@amphiprion bcsplit]$ cat ./seq07-unmatched-read-1.fastq.gz ./unmatched-read-1.fastq.gz > SEQ07-unmatched-read-1.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./seq07-unmatched-read-2.fastq.gz ./unmatched-read-2.fastq.gz > SEQ07-unmatched-read-2.fastq.gz
[michelles@amphiprion bcsplit]$ cat ./lane1/P016-read-1.fastq.gz ./P016-read-1.fastq.gz > P016.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./lane1/P017-read-1.fastq.gz ./P017-read-1.fastq.gz > P017.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./lane1/P018-read-1.fastq.gz ./P018-read-1.fastq.gz > P018.fastq.gz
- [michelles@amphiprion bcsplit]$ cat ./lane1/P019-read-1.fastq.gz ./P019-read-1.fastq.gz > P019.fastq.gz
- [michelles@amphiprion bcsplit]$ cd lane1/
- [michelles@amphiprion lane1]$ rm *-2.fastq.gz
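- a quick sanity check on the lane 1 + lane 2 concatenation (a sketch, run from the bcsplit directory; P016 shown, same idea for the other pools): the combined read count should equal the sum of the two inputs, and gzip should be able to read the whole concatenated file
- zcat ./lane1/P016-read-1.fastq.gz | wc -l   # reads = lines / 4
zcat ./P016-read-1.fastq.gz | wc -l
zcat ./P016.fastq.gz | wc -l                  # should equal the sum of the two counts above
gzip -t ./P016.fastq.gz                       # silent if the concatenated gzip is intact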
- Created a barcodes file: from the sample_data spreadsheet, copied the barcodes column from the barcodes tab and pasted into nano, saved as barcodes in 00_working directory
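- quick check that the pasted barcodes file is clean (a sketch; the file is the barcodes file in 00_working):
- wc -l barcodes      # should match the number of samples in the pool
cat -A barcodes       # one barcode per line, no trailing ^M characters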
- make directories for Pools
- [michelles@amphiprion 00_working]$ mkdir 16Pool 17Pool 18Pool 19Pool 30Pool 31Pool 32Pool 33Pool
- Run process_radtags on pools
- #!/bin/bash
process_radtags -b barcodes -c -q -r \
--renz_1 pstI --renz_2 mluCI \
-i gzfastq \
--adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
-f ./bcsplit/P016.fastq.gz \
-o ./16Pool
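- for reference, the same process_radtags call could be wrapped in one loop over the pools instead of keeping a separate NNprocess.sh per pool; a sketch (not what was actually run here), with pool numbers and paths taken from above:
- #!/bin/bash
# hypothetical wrapper: same flags as 16process.sh, looped over pools 16-19
for pool in 16 17 18 19
do
process_radtags -b barcodes -c -q -r \
--renz_1 pstI --renz_2 mluCI \
-i gzfastq \
--adapter_1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
-f ./bcsplit/P0${pool}.fastq.gz \
-o ./${pool}Pool
done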
- ran the per-pool process_radtags scripts (from the shell history):
- nohup ./16process.sh
less nohup.out
nohup ./16process.sh
less nohup.out
rm nohup.out
ls
nohup ./16process.sh
cd 00_working/19Pool/
cp ../18Pool/18process.sh ./
nohup ./19process.sh
top
cd 00_working/
cp ./17Pool/17process.sh ./18Pool/
cd 18Pool/
ls
ls
nohup ./18process.sh
cd 00_working/
cp ../19Pool/19process.sh ./
- make a new names_barcodes tab-delimited file:
- from Google Docs, copy names and barcodes into a nano file in the correct pool directory
- Rename files and Move into one directory
- [michelles@amphiprion 16Pool]$ sh rename.for.dDocent_se_gz 16Pool_names.tsv
- [michelles@amphiprion 16Pool]$ cd ../17Pool/
- [michelles@amphiprion 17Pool]$ sh rename.for.dDocent_se_gz 17Pool_names.tsv
- [michelles@amphiprion 17Pool]$ cd ../18Pool/
[michelles@amphiprion 18Pool]$ sh rename.for.dDocent_se_gz 18Pool_names.tsv
- [michelles@amphiprion 18Pool]$ cd ../19Pool/
[michelles@amphiprion 19Pool]$ sh rename.for.dDocent_se_gz 19Pool_names.tsv
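- for context, rename.for.dDocent_se_gz comes from dDocent; the core of a rename step like this is just a loop over the names file, roughly like the sketch below (hypothetical - the column layout of the names file and the sample_<barcode>.fq.gz input naming are assumptions, not the real script):
- #!/bin/bash
# hypothetical sketch: read new-name<TAB>barcode pairs and rename the
# process_radtags output files accordingly
while read -r name barcode
do
mv "sample_${barcode}.fq.gz" "${name}.fq.gz"
done < "$1"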
- [michelles@amphiprion 00_working]$ mv ./16Pool/APCL13_* ./samples/
[michelles@amphiprion 00_working]$ mv ./17Pool/APCL13_* ./samples/
[michelles@amphiprion 00_working]$ mv ./18Pool/APCL13_1* ./samples/
[michelles@amphiprion 00_working]$ mv ./19Pool/APCL14_* ./samples/
[michelles@amphiprion 00_working]$ mv ./16Pool/16* ./logs/
[michelles@amphiprion 00_working]$ mv ./17Pool/17* ./logs/
[michelles@amphiprion 00_working]$ mv ./18Pool/18* ./logs/
[michelles@amphiprion 00_working]$ mv ./19Pool/19* ./logs/
- Run ustacks of seq07 and seq08 together
- #!/bin/bash
i=70001
for file in ./samples/*.fq.gz
do
ustacks -t gzfastq -p 20 -d -r -m 3 -M 5 --max_locus_stacks 3 -i $i \
-f ${file} -o ./lax-stacks
let "i+=1";
done
- change name of nohup.out
- mv nohup.out ./logs/seq07_08_ustacks.out
- Do not run cstacks again at this time. We could run cstacks with the new files to generate a more comprehensive catalog, but we can do that in the future. Currently using the catalog made up of the samples with the most loci from SEQ03, SEQ04, and SEQ05, generated post-rxstacks.
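- if the catalog does get rebuilt later, cstacks can extend the existing batch 1 catalog with the new samples; a rough sketch (the --catalog option and the sample-list handling are assumptions to check against the installed Stacks version):
- #!/bin/bash
# hypothetical: add the new seq07/seq08 samples to the existing batch 1 catalog
samples=""
for file in $(ls -1 ./lax-stacks/*.tags.tsv.gz | grep -v catalog | perl -pe 's/\.tags\.tsv\.gz//')
do
samples="$samples -s $file"
done
cstacks -b 1 -p 20 -o ./lax-stacks --catalog ./lax-stacks/batch_1 $samples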
- copy catalog from the lax-rxstacks folder to the lax-stacks folder
- [michelles@amphiprion 00_working]$ cp ./lax-rxstacks/batch_1.catalog.* ./lax-stacks/
- Run sstacks (rxstacks needs the matches files). Use the rx-catalogs
- [michelles@amphiprion 00_working]$ nohup ./sstacks-lax.sh
- #!/bin/bash
for file in $(ls -1 ./lax-stacks/*.tags.tsv.gz \
| grep -v catalog | perl -pe 's/\.tags\.tsv.gz//')
do
sstacks -p 20 -b 1 -c ./lax-stacks/batch_1 \
-s $file \
-o ./lax-stacks/
done
- Run rxstacks:
- copy catalog from the lax-rxstacks folder to the lax-stacks folder
- [michelles@amphiprion 00_working]$ cp ./lax-rxstacks/batch_1.catalog.* ./lax-stacks/
- [michelles@amphiprion 00_working]$ nohup ./rxstacks.sh
- #!/bin/bash
#Run rxstacks, batch 1, 25 threads - the catalog files have to
#be in the ./lax-stacks directory, copy them there now if needed.
rxstacks -b 1 -t 25 \
--conf_filter --conf_lim 0.25 \
--model_type bounded --bound_high 0.1 \
--prune_haplo \
--lnl_lim -8.0 \
--lnl_dist \
-P ./lax-stacks \
-o ./lax-rxstacks
- change name of nohup.out to 05seq_rxstacks.out
- Run sstacks: nohup ./rxsstacks.sh
- #!/bin/bash
#Run sstacks
for file in $(ls -1 ./lax-rxstacks/*.tags.tsv \
| grep -v catalog | perl -pe 's/\.tags\.tsv//')
do
sstacks -p 25 -b 1 -c ./lax-rxstacks/batch_1 \
-s $file \
-o ./lax-rxstacks/
done
- change name of nohup.out to 05seq_sstacks.out
- Move samples to the lax-stacks and rx-stacks directories in philippines/genotyping for all to use
- Run populations on all APCL samples (not just this seq batch) - don't filter this time around (no -m or -r, etc.)
- #!/bin/bash
# Calculate population statistics and export several output files.
populations -b 1 -P ./lax-rxstacks/ -s -t 15
- change name of nohup.out to YYMMDD_populations.out
- Create a mysql database
- [michelles@amphiprion ~]$ mysql -plarvae168
- mysql> create database seq05;
- mysql> show databases;
- mysql> exit
- Apply stacks configuration to database:
- [michelles@amphiprion ~]$ mysql -plarvae168 seq05 < ~/local/share/stacks/sql/stacks.sql
- Load radtags to database
- [michelles@amphiprion rxstacks]$ load_radtags.pl -D seq05 -p ./ -b 1 -c -B
- Index database
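- Stacks ships an index_radtags.pl helper for this step; a likely invocation (flags are an assumption, check index_radtags.pl -h): index_radtags.pl -D seq05 -c -t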
- Move the matches, snps, alleles, and tags files into a combined folder to run all populations together - can this be done with the export script? Can these files be stored in mysql and then extracted just for this run?
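- one straightforward way to build that combined folder is to copy the per-sample Stacks files from each run directory into one place and then add a single catalog; a sketch, with the second run directory name as a placeholder:
- #!/bin/bash
# hypothetical: pool per-sample tags/snps/alleles/matches files from several
# run directories, skipping the catalogs, then copy in one batch_1.catalog set
shopt -s nullglob    # unmatched globs expand to nothing
mkdir -p ./combined
for dir in ./lax-rxstacks ../01_seq04/rxstacks
do
for f in ${dir}/*.tags.tsv* ${dir}/*.snps.tsv* ${dir}/*.alleles.tsv* ${dir}/*.matches.tsv*
do
case $f in *catalog*) continue ;; esac
cp "$f" ./combined/
done
done
cp ./lax-rxstacks/batch_1.catalog.* ./combined/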
- Run populations to produce a genepop file - can't run all of the samples from multiple runs in the same populations run - unless you change the “i” in the script to something that will generate a unique ID???
- [michelles@amphiprion 01_seq04]$ cd rxstacks/
- [michelles@amphiprion rxstacks]$ nohup populations -b 1 -P ./ -t 10 -r 90 -m 30 -s --genepop
- where -m is the minimum stack depth (coverage) and -r is the percent of individuals in the population required to keep a locus. Start at 90/30, try 90/10, 80/30, 80/10, etc. until you get 200 loci (see the sketch below).
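- a sketch of sweeping those -r/-m combinations (values from the note above; the batch_1.genepop output name is an assumption - check the populations log - and each run overwrites it, so it gets renamed between runs):
- #!/bin/bash
# hypothetical r/m sweep - stop once a combination gives ~200 loci
for combo in "90 30" "90 10" "80 30" "80 10"
do
set -- $combo
populations -b 1 -P ./ -t 10 -r $1 -m $2 -s --genepop
mv ./batch_1.genepop ./batch_1_r$1_m$2.genepop
done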
- In Windows,
Logged in to Windows using the larva password
Started Cervus
Click on Tools>Convert Genotype File>Genepop to Cervus
2 digit format
do not use first ID as population name
Converted 192 individuals in one population at 1132 loci
Clicked on Analysis>Allele Frequency Analysis
Choose the cervus file just created
ID in column 2
First allele in column 3
Entering 1132 for the number of loci gives an error saying there are 1134 loci, so ran it with 1134
Save as seq04
OK
- Run identity analysis to eliminate duplicate individuals
- Now run a simulation of parentage analysis: click on Analysis>Simulation of Parentage Analysis>sexes of pair unknown. Change the parent number. Saved as seq04simpar. Change to LOD.
- Ran simulated parentage with 2000 parents, 5000 offspring, 0.1 proportion sampled, 0.86 typed loci, 0.01 mismatch, LOD. Computer said it would take 2 hours - computer froze, so dropped it down to 1000 parents, 2000 offspring - took 6+ hours to run
- Create parent and offspring files (the adult/juvenile files)
- Run parentage analysis in Cervus - make sure the CSVs have Unix line endings (check in Komodo); if only 1 offspring runs, this is the problem (a command-line check is sketched below).
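- the line endings can also be checked and fixed on amphiprion before moving the files over; a sketch (offspring.csv is a placeholder name):
- cat -A offspring.csv | head               # Windows line endings show up as ^M
perl -pi -e 's/\r\n?/\n/g' offspring.csv    # convert to unix line endings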
- Run
- If running Colony, you need a marker file that contains the marker IDs, whether they are dominant or codominant, and the allelic dropout ("gene dropping") and mutation/error rates.