logo

Abstract: CRISPR-Cas ribonucleoproteins are important tools for gene editing in pre-implantation embryos. However, the inefficient production of biallelic deletions in cattle zygotes has hindered mechanistic studies of gene function. In addition, the presence of maternal RNAs that support embryo development until embryonic genome activation may cause confounding phenotypes. Here, we aimed to improve the efficiency of biallelic deletions and deplete specific maternal RNAs in cattle zygotes using CRISPR-Cas editing technology. Two electroporation sessions with Cas9D10A ribonucleoproteins targeting exon 1 and the promoter of OCT4 produced biallelic deletions in 91% of the embryos tested. In most cases, the deletions were longer than 1000 nucleotides long. Electroporation of Cas13a ribonucleoproteins prevents the production of the corresponding proteins. We electroporated Cas9D10A ribonucleoproteins targeting exon 1, including the promoter region, of OCT4 in two sessions with inclusion of Cas13a ribonucleoproteins targeting OCT4 mRNAs in the second session to ablate OCT4 function in cattle embryos. A lack of OCT4 resulted in embryos arresting development prior to blastocyst formation at a greater proportion (7.9%) than controls (30.7%, P<0.001). The few embryos that developed past the morula stage did not form a normal inner cell mass. Transcriptome analysis of single blastocysts, confirmed to lack exon 1 and promoter region of OCT4, revealed a significant (FDR<0.1) reduction in transcript abundance of many genes functionally connected to stemness, including markers of pluripotency (CADHD1, DPPA4, GNL3, RRM2). The results confirm that OCT4 is key regulator of genes that modulate pluripotency and is required to form a functional blastocyst in cattle.

Overview

Code produced by Jada Nix, Gustavo Schettini and Fernando Biase. We created this file to permit reproducibility of the findings described in the paper. Please direct questions to Fernando Biase: fbiase at vt.edu

The raw transcriptome data is deposited on GEO repository under the following access GSE236474.


Please contact Fernando Biase: fbiase at vt.edu if you’d like access to other files used in this code. Updated information may be obtained at www.biaselaboatory.com



library('tidyverse', quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library('stringr', quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library('edgeR',quietly = TRUE, lib.loc="/usr/lib/R/site-library")
library('ggplot2',quietly = TRUE, lib.loc="/usr/lib/R/site-library")
library("DESeq2",quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("goseq" ,quietly = TRUE, lib.loc="/usr/lib/R/site-library")
library("cowplot",quietly = TRUE, lib.loc="/usr/lib/R/site-library")
library("ComplexHeatmap", quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("circlize", quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("data.table",quietly = TRUE , lib.loc="/usr/lib/R/site-library")
library("kableExtra",quietly = TRUE , lib.loc="/usr/lib/R/site-library")
library("multcomp", quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("aod", quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("car", quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("emmeans", quietly = TRUE,lib.loc="/usr/lib/R/site-library")
library("lmtest", quietly = TRUE,lib.loc="/usr/lib/R/site-library")

Procedures

Use the tabs below to access each section of our code.

Analysis of embryo development

Load development one electroporation vs two

day7blast <-readxl::read_excel(paste(path_excel,"2xCas10_day7_development_data.xlsx", sep=""))
day7blast <- day7blast[c(2,3,4)]
day7blast$totalPZ<-day7blast$B + day7blast$NB
day7blast$Group<-as.factor(day7blast$Group)

day8blast <-readxl::read_excel(paste(path_excel,"2xCas10_day8_development_data.xlsx", sep=""))
day8blast <- day8blast[c(2,3,4)]
day8blast$totalPZ<-day8blast$B + day8blast$NB
day8blast$Group<-as.factor(day8blast$Group)

GLM groups day 7

model.log7<-glm(cbind(B,NB) ~ Group, family = binomial(link = "logit"), data = day7blast)
emmeans_day7<-emmeans(model.log7,  ~ Group,regrid = "log", type='response')
#summary(emmeans_day7)$response

GLM groups day 8

model.log8<-glm(cbind(B,NB) ~ Group, family = binomial(link = "logit"), data = day8blast)
emmeans_day8<-emmeans(model.log8,  ~ Group,regrid = "log", type='response')
#summary(emmeans_day8)$response

Table S1 summary GLM

Table_S1<- as.data.table(data.frame( time= c(rep("164-166 hpf",3),rep("188-190 hpf",3)), 
                          group=c("Cas9D10A + targeting sgRNAs", "Cas9D10A + scramble sgRNAs", "Controls not electroporated", "Cas9D10A + targeting sgRNAs", "Cas9D10A + scramble sgRNAs","Controls not electroporated"),
                          PZ_7= c( day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Cas9D10A 1x") , 
                          day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Scramble 1x"), 
                          day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Control") ,
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Cas9D10A 1x") , 
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Scramble 1x"), 
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Control")  ),
                          percent_7= c(round(summary(emmeans_day7)$response[1]*100,1), 
                          round(summary(emmeans_day7)$response[4]*100,1) , 
                          round(summary(emmeans_day7)$response[3]*100,1) ,
                          round(summary(emmeans_day8)$response[1]*100,1) ,
                          round(summary(emmeans_day8)$response[4]*100,1) ,
                          round(summary(emmeans_day8)$response[3]*100,1)  ) ,
                          se_7= c(round(summary(emmeans_day7)$SE[1]*100,1), 
                          round(summary(emmeans_day7)$SE[4]*100,1) , 
                          round(summary(emmeans_day7)$SE[3]*100,1) ,
                          round(summary(emmeans_day8)$SE[1]*100,1) ,
                          round(summary(emmeans_day8)$SE[4]*100,1) ,
                          round(summary(emmeans_day8)$SE[3]*100,1)  ) ,
                          PZ_8= c( day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Cas9D10A 2x") , 
                          day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Scramble 2x"), 
                          day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Control") ,
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Cas9D10A 2x") , 
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Scramble 2x"), 
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Control")  ),
                          percent_8= c(round(summary(emmeans_day7)$response[2]*100,1), 
                          round(summary(emmeans_day7)$response[5]*100,1) , 
                          round(summary(emmeans_day7)$response[3]*100,1) ,
                          round(summary(emmeans_day8)$response[2]*100,1) ,
                          round(summary(emmeans_day8)$response[5]*100,1) ,
                          round(summary(emmeans_day8)$response[3]*100,1)) ,
                          se_8= c(round(summary(emmeans_day7)$SE[2]*100,1), 
                          round(summary(emmeans_day7)$SE[5]*100,1) , 
                          round(summary(emmeans_day7)$SE[3]*100,1) ,
                          round(summary(emmeans_day8)$SE[2]*100,1) ,
                          round(summary(emmeans_day8)$SE[5]*100,1) ,
                          round(summary(emmeans_day8)$SE[3]*100,1)  )
                          ))
                        
kbl(Table_S1, col.names=c("",   "", "N PZ", "%", "SE","N PZ", "%", "SE"), caption = "Table S1. Blastocyst developmental rates for the experiments comparing one versus two electroporation sessions with Cas9D10A and sgRNAs." ) %>% kable_classic() %>% add_header_above(c(" "=2, "single electroporation"=3, "double electroporation"=3 )) 
Table S1. Blastocyst developmental rates for the experiments comparing one versus two electroporation sessions with Cas9D10A and sgRNAs.
single electroporation
double electroporation
N PZ % SE N PZ % SE
164-166 hpf Cas9D10A + targeting sgRNAs 337 6.8 1.4 655 3.2 0.7
164-166 hpf Cas9D10A + scramble sgRNAs 146 17.1 3.1 209 17.7 2.6
164-166 hpf Controls not electroporated 182 25.3 3.2 182 25.3 3.2
188-190 hpf Cas9D10A + targeting sgRNAs 337 11.6 1.7 655 7.9 1.1
188-190 hpf Cas9D10A + scramble sgRNAs 146 31.5 3.8 209 28.2 3.1
188-190 hpf Controls not electroporated 182 30.8 3.4 182 30.8 3.4

Table S2 Contrasts groups day 7

Table_S2<-
as.data.table(
 rbind( data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 1x / Scramble 1x"=c(1,0,0,-1,0)),df=15)) ,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 1x / Control"=c(1,0,-1,0,0)),df=16) ) ,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Scramble 1x / Control"=c(0,0,-1,1,0)),df=10) ),
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 1x / Cas9D10A 2x"=c(1,-1,0,0,0)),df=31)),
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 2x / Scramble 2x"=c(0,1,0,0,-1)),df=26) ),
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 2x / Control"=c(0,1,-1,0,0)),df=26) ),
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Scramble 2x / Control"=c(0,0,1,0,-1)),df=11) )
      ) ) 

Table_S2<-cbind(Table_S2,BH=round(p.adjust(c(data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 1x / Scramble 1x"=c(1,0,0,-1,0)),df=15))$p.value ,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 1x / Control"=c(1,0,-1,0,0)),df=16))$p.value ,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Scramble 1x / Control"=c(0,0,-1,1,0)),df=10))$p.value,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 1x / Cas9D10A 2x"=c(1,-1,0,0,0)),df=31))$p.value,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 2x / Scramble 2x"=c(0,1,0,0,-1)),df=26))$p.value,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Cas9D10A 2x / Control"=c(0,1,-1,0,0)),df=26))$p.value,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("Scramble 2x / Control"=c(0,0,1,0,-1)),df=11))$p.value), method = "BH"),4))


kableExtra::kbl(Table_S2,  caption = "Table S2. Contrasts between experimental groups for blastocyst yield at 164-166 hpf." , digits=4) %>% kable_classic() 
Table S2. Contrasts between experimental groups for blastocyst yield at 164-166 hpf.
contrast odds.ratio SE df null t.ratio p.value BH
Cas9D10A 1x / Scramble 1x 0.3545 0.1092 15 1 -3.3657 0.0042 0.0074
Cas9D10A 1x / Control 0.2166 0.0596 16 1 -5.5584 0.0000 0.0001
Scramble 1x / Control 0.6109 0.1699 10 1 -1.7722 0.1068 0.1068
Cas9D10A 1x / Cas9D10A 2x 2.2114 0.6847 31 1 2.5633 0.0154 0.0216
Cas9D10A 2x / Scramble 2x 0.1540 0.0441 26 1 -6.5322 0.0000 0.0000
Cas9D10A 2x / Control 0.0979 0.0274 26 1 -8.3042 0.0000 0.0000
Scramble 2x / Control 1.5723 0.3913 11 1 1.8185 0.0963 0.1068

Table S3 Contrasts groups day 8

Table_S3<-
as.data.table(
 rbind( data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 1x / Scramble 1x"=c(1,0,0,-1,0)),df=15)) ,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 1x / Control"=c(1,0,-1,0,0)),df=16) ) ,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Scramble 1x / Control"=c(0,0,-1,1,0)),df=10) ),
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 1x / Cas9D10A 2x"=c(1,-1,0,0,0)),df=31)),
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 2x / Scramble 2x"=c(0,1,0,0,-1)),df=26) ),
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 2x / Control"=c(0,1,-1,0,0)),df=26) ),
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Scramble 2x / Control"=c(0,0,1,0,-1)),df=11) )
      ) ) 

Table_S3<-cbind(Table_S3,BH=round(p.adjust(c(data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 1x / Scramble 1x"=c(1,0,0,-1,0)),df=15))$p.value ,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 1x / Control"=c(1,0,-1,0,0)),df=16))$p.value ,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Scramble 1x / Control"=c(0,0,-1,1,0)),df=10))$p.value,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 1x / Cas9D10A 2x"=c(1,-1,0,0,0)),df=31))$p.value,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 2x / Scramble 2x"=c(0,1,0,0,-1)),df=26))$p.value,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Cas9D10A 2x / Control"=c(0,1,-1,0,0)),df=29))$p.value,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("Scramble 2x / Control"=c(0,0,1,0,-1)),df=11))$p.value), method = "BH"),4))

kableExtra::kbl(Table_S3,  caption = "Table S3. Contrasts between experimental groups for blastocyst yield at 188-190  hpf." , digits=4) %>% kable_classic() 
Table S3. Contrasts between experimental groups for blastocyst yield at 188-190 hpf.
contrast odds.ratio SE df null t.ratio p.value BH
Cas9D10A 1x / Scramble 1x 0.2845 0.0701 15 1 -5.1005 0.0001 0.0002
Cas9D10A 1x / Control 0.2945 0.0689 16 1 -5.2232 0.0001 0.0002
Scramble 1x / Control 1.0350 0.2483 10 1 0.1434 0.8888 0.8888
Cas9D10A 1x / Cas9D10A 2x 1.5176 0.3390 31 1 1.8676 0.0713 0.0998
Cas9D10A 2x / Scramble 2x 0.2192 0.0463 26 1 -7.1936 0.0000 0.0000
Cas9D10A 2x / Control 0.1940 0.0419 26 1 -7.5892 0.0000 0.0000
Scramble 2x / Control 1.1299 0.2512 11 1 0.5496 0.5936 0.6925

Load development DART vs controls

day7blast <-readxl::read_excel(paste(path_excel,"DART_day7_development_data_updated.xlsx", sep=""))
day7blast <- day7blast[c(2,3,4)]
day7blast$totalPZ<-day7blast$B + day7blast$NB
day7blast$Group<-as.factor(day7blast$Group)

day8blast <-readxl::read_excel(paste(path_excel,"DART_day8_development_data_updated.xlsx", sep=""))
day8blast <- day8blast[c(2,3,4)]
day8blast$totalPZ<-day8blast$B + day8blast$NB
day8blast$Group<-as.factor(day8blast$Group)

GLM groups day 7

model.log7<-glm(cbind(B,NB) ~ Group, family = binomial(link = "logit"), data = day7blast)
emmeans_day7<-emmeans(model.log7,  ~ Group,regrid = "log", type='response')
#summary(emmeans_day7)$response

GLM groups day 8

model.log8<-glm(cbind(B,NB) ~ Group, family = binomial(link = "logit"), data = day8blast)
emmeans_day8<-emmeans(model.log8,  ~ Group,regrid = "log", type='response')
#summary(emmeans_day8)$response

Table S4 summary GLM

Table_S4<- as.data.table(data.frame( time= c(rep("164-166 hpf",3),rep("188-190 hpf",3)), 
                          group=c("DART + targeting sgRNAs", "DART + scramble gRNAs", "Controls not electroporated", "DART + targeting sgRNAs", "DART + scramble gRNAs","Controls not electroporated"),
                          PZ= c( day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("DART") , 
                          day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Scramble DART"), 
                          day7blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Control") ,
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("DART") , 
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Scramble DART"), 
                          day8blast %>% group_by(Group) %>% summarize(total_PZ=sum(totalPZ)) %>%  deframe %>% getElement("Control")  ),
                          percent= c(round(summary(emmeans_day7)$response[2]*100,1), 
                          round(summary(emmeans_day7)$response[3]*100,1) , 
                          round(summary(emmeans_day7)$response[1]*100,1) ,
                          round(summary(emmeans_day8)$response[2]*100,1) ,
                          round(summary(emmeans_day8)$response[3]*100,1) ,
                          round(summary(emmeans_day8)$response[1]*100,1)  ) ,
                          se= c(round(summary(emmeans_day7)$SE[2]*100,1), 
                          round(summary(emmeans_day7)$SE[3]*100,1) , 
                          round(summary(emmeans_day7)$SE[1]*100,1) ,
                          round(summary(emmeans_day8)$SE[2]*100,1) ,
                          round(summary(emmeans_day8)$SE[3]*100,1) ,
                          round(summary(emmeans_day8)$SE[1]*100,1)  ) 
                          ))
                        
kbl(Table_S4, col.names=c("",   "", "N PZ", "%", "SE"), caption = "Table S4. Blastocyst developmental rates for experiments with CRISPR-DART targeting OCT4." ) %>% kable_classic() 
Table S4. Blastocyst developmental rates for experiments with CRISPR-DART targeting OCT4.
N PZ % SE
164-166 hpf DART + targeting sgRNAs 830 6.1 0.8
164-166 hpf DART + scramble gRNAs 312 17.0 2.1
164-166 hpf Controls not electroporated 348 23.0 2.3
188-190 hpf DART + targeting sgRNAs 832 13.0 1.2
188-190 hpf DART + scramble gRNAs 312 30.1 2.6
188-190 hpf Controls not electroporated 348 31.6 2.5

Table S5 Contrasts CRISPR DART

Table_S5<-
as.data.table(
 rbind( data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("DART + targeting sgRNAs / DART + scramble gRNAs "=c(0,1,-1)),df=35)) ,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("DART + targeting sgRNAs / Control"=c(-1,1,0)),df=37) ) ,
       data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("DART + scramble gRNAs / Control"=c(-1,0,1)),df=21)),
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("DART + targeting sgRNAs / DART + scramble gRNAs"=c(0,1,-1)),df=35)) ,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("DART + targeting sgRNAs / Control"=c(-1,1,0)),df=37) ) ,
       data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("DART + scramble gRNAs / Control"=c(0-1,0,1)),df=21))
       ) )

Table_S5<-cbind(time= c(rep("164-166 hpf",3),rep("188-190 hpf",3)), 
                Table_S5,
                BH=round(p.adjust(c(data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("DART / DART Scramble "=c(0,1,-1)),df=35))$p.value ,
                                    data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list("DART / Control"=c(-1,1,0)),df=37))$p.value ,
                                    data.frame(contrast(emmeans(model.log7,  ~ Group, type='response') , list(" DART Scramble / Control"=c(-1,0,1)),df=21))$p.value,
                                    data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("DART / DART Scramble "=c(0,1,-1)),df=35))$p.value ,
                                    data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list("DART / Control"=c(-1,1,0)),df=37))$p.value ,
                                    data.frame(contrast(emmeans(model.log8,  ~ Group, type='response') , list(" DART Scramble / Control"=c(0-1,0,1)),df=21))$p.value), method = "BH"),4))

kableExtra::kbl(Table_S5,  caption = "Table S5. Contrasts between experimental groups for blastocyst yield in experiments with CRISPR-DART targeting OCT4." , digits=4) %>% kable_classic() 
Table S5. Contrasts between experimental groups for blastocyst yield in experiments with CRISPR-DART targeting OCT4.
time contrast odds.ratio SE df null t.ratio p.value BH
164-166 hpf DART + targeting sgRNAs / DART + scramble gRNAs 0.3199 0.0668 35 1 -5.4566 0.0000 0.0000
164-166 hpf DART + targeting sgRNAs / Control 0.2193 0.0423 37 1 -7.8746 0.0000 0.0000
164-166 hpf DART + scramble gRNAs / Control 0.6855 0.1353 21 1 -1.9129 0.0695 0.0834
188-190 hpf DART + targeting sgRNAs / DART + scramble gRNAs 0.3460 0.0556 35 1 -6.6000 0.0000 0.0000
188-190 hpf DART + targeting sgRNAs / Control 0.3228 0.0499 37 1 -7.3100 0.0000 0.0000
188-190 hpf DART + scramble gRNAs / Control 0.9329 0.1575 21 1 -0.4110 0.6852 0.6852

Analysis from the time lapse culture

tab <- matrix(c(56, 7   , 29    ,20 ,28 ,13,    10,5), ncol=4, byrow=TRUE)
colnames(tab) <- c("total_n",   "blastocyst",   "morulae",  "8_cell")
rownames(tab) <- c('dart','control')
tab
##         total_n blastocyst morulae 8_cell
## dart         56          7      29     20
## control      28         13      10      5
binom.test(7,56, p=13/28)
## 
##  Exact binomial test
## 
## data:  7 and 56
## number of successes = 7, number of trials = 56, p-value = 1.065e-07
## alternative hypothesis: true probability of success is not equal to 0.4642857
## 95 percent confidence interval:
##  0.0517645 0.2407326
## sample estimates:
## probability of success 
##                  0.125
binom.test(29,56, p=10/28)
## 
##  Exact binomial test
## 
## data:  29 and 56
## number of successes = 29, number of trials = 56, p-value = 0.01692
## alternative hypothesis: true probability of success is not equal to 0.3571429
## 95 percent confidence interval:
##  0.3803158 0.6534451
## sample estimates:
## probability of success 
##              0.5178571
binom.test(20,56, p=5/28)
## 
##  Exact binomial test
## 
## data:  20 and 56
## number of successes = 20, number of trials = 56, p-value = 0.001336
## alternative hypothesis: true probability of success is not equal to 0.1785714
## 95 percent confidence interval:
##  0.2335548 0.4964069
## sample estimates:
## probability of success 
##              0.3571429

RNA-seq analysis

Scripts to process raw files

Alignment, filtering and counting

#!/bin/bash

path_to_index=/genome/hisat2/Bos_taurus.ARS-UCD1.2.105
splice_sites=/genome/annotation/hisat_splice.txt
fastq=/mnt/storage/lab_folder/crispr_sequencing/june_19_2023/fastq
output_alignment=/mnt/storage/lab_folder/crispr_sequencing/june_19_2023/alignment

#Alignment

for sample in $(ls $fastq/*.gz | cut -c 62-82 | sort | uniq); do
read1=$(ls $fastq/*.gz | grep $sample | grep "R1")
read2=$(ls $fastq/*.gz | grep $sample | grep "R2")
echo $sample
echo $read1
echo $read2
cd $output_alignment
/bioinfo/hisat2-2.2.1/hisat2  -p 30 -k 1 -x $path_to_index -1 $read1 -2 $read2 --bowtie2-dp 2 --score-min L,0,-1 --dta --known-splicesite-infile $splice_sites  -S sample_$(echo $read1 | cut -c 71-74)_alignment.sam --summary-file sample_$(echo $read1 | cut -c 71-74)_alignment_summary.txt

done;

#Filtering
samtools sort -@ 30 -O BAM -l 9 -o $(echo $sample | cut -c 66-76)_alignment.bam  $(echo $sample | cut -c 66-76)_alignment.sam
samtools index $(echo $sample | cut -c 66-76)_alignment.bam
samtools view --threads 30 -b -h -F 1796 -o $(echo $sample | cut -c 66-76)_alignment.filtered.bam $(echo $sample | cut -c 66-76)_alignment.bam
samtools sort -@ 30 -O BAM -l 9 -o $(echo $sample | cut -c 66-76)_alignment.filtered.sorted.bam  $(echo $sample | cut -c 66-76)_alignment.filtered.bam

alignment=$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.bam
undup_output=$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam

bammarkduplicates  I=$alignment O=$undup_output  markthreads=30 level=9  rmdup=1 index=1 dupindex=0  verbose=0

/bioinfo/samtools-1.17/samtools view -h $(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam | awk ' $6 ~ /[1-9][3-9][0-9][M]/ || $1 ~ /^@/' | samtools view -bS -o a_$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam
/bioinfo/samtools-1.17/samtools index a_$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam
/bioinfo/samtools-1.17/samtools view -e '[NM]<=10' -O BAM -o b_$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam a_$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam
/bioinfo/samtools-1.17/samtools index b_$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam

#Counting

path_to_gtf=/genome/annotation/Bos_taurus.ARS-UCD1.2.105.gtf

/bioinfo/subread-2.0.1-source/bin/featureCounts -s 0 -a $path_to_gtf -o /mnt/storage/lab_folder/crispr_sequencing/june_19_2023/counts/b_$(echo $sample | cut -c 66-76).count -F 'GTF' -t 'exon' -g 'gene_id' --ignoreDup -T 30 -p  /mnt/storage/lab_folder/crispr_sequencing/june_19_2023/alignment/b_$(echo $sample | cut -c 66-76)_alignment.filtered.sorted.undup.bam
done;

Create matrix with raw counts

files<-list.files(path_counts, recursive=T, pattern="count", full.names = TRUE)
files<-files[grep("summary", files, invert = TRUE)]
files<-files[grep("/b_sample", files, invert = FALSE)]
#length(files)
count_data<-data.frame(matrix(nrow=27607))
for (n in 1:length(files)) {
  count<-read.delim(files[n], header=TRUE, sep= "\t", stringsAsFactors = FALSE, comment.char= "#")
  count<-count[,c(1,7)]
  count_data<-cbind(count_data,count)
}
rownames(count_data)<-count_data[,2]
count_data<-count_data[,seq(from = 3, to = 49, by = 2)]
colnames(count_data)<- str_remove(str_remove(stringr::str_split_fixed(colnames(count_data), "[.]", 8)[,8], "b_"), "_alignment.filtered.sorted.undup.bam")

Load annotation

annotation.ensembl.symbol<-read.delim(paste(path_resources,"/2023_06_24_annotation.ensembl.symbol.txt.bz2", sep=""), header=TRUE, sep= "\t",row.names=1, stringsAsFactors = FALSE)
gene.length<-read.delim(paste(path_resources,"/2023_06_24_gene.length.txt.bz2", sep=""), header=TRUE, sep= "\t",row.names=1, stringsAsFactors = FALSE)
annotation.GO.biomart<-read.delim(paste(path_resources,"/2023_06_24_annotation.GO.biomart.txt.bz2", sep=""), header=TRUE, sep= "\t",row.names=1, stringsAsFactors = FALSE)
annotation.ensembl.transcript<-read.delim(paste(path_resources,"/2023_06_24_annotation.ensembl.transcript.txt.bz2", sep=""), header=TRUE, sep= "\t",row.names=1, stringsAsFactors = FALSE)

Filter matrix to retain ‘protein_coding’ and ‘lncRNA’

count_data_annotated<-merge(count_data_a,annotation.ensembl.symbol, by.x="row.names", by.y="ensembl_gene_id", all.x=TRUE, all.y=FALSE)
count_data_annotated<-count_data_annotated[count_data_annotated$gene_biotype %in% c('protein_coding', 'lncRNA'),]
count_data_annotated_length<-gene.length[gene.length$ensembl_gene_id %in% count_data_annotated$Row.names, 2]

calculate cpm and filter for lowly expressed genes

rownames(count_data_annotated)<-count_data_annotated$Row.names

cpm<-edgeR::cpm(count_data_annotated[,c(2:13)],normalized.lib.sizes = TRUE,log = FALSE)

keep<-rowSums(cpm>1) >3
cpm_filtered<-cpm[keep,]
genes_expressed<-rownames(cpm_filtered)

cpm_filtered<-cpm_filtered[rownames(cpm_filtered) %in% genes_expressed,]
count_filtered<-count_data_a[rownames(count_data_a) %in% genes_expressed,]

Calculate TPM

gene_length<-gene.length[gene.length$ensembl_gene_id %in% rownames(count_filtered), ]
x <- count_filtered / gene_length$transcript_length
tpm_all_data<- t( t(x) * 1e6 / colSums(x) )

Differential transcript abundance

edgeR

group<-rep(c("dart", "control"),c(5,7))

design<-model.matrix(~ group)

dart_vs_control<-DGEList(count=count_data_annotated[,c(2:13)])
dart_vs_control<-estimateDisp(dart_vs_control,design, robust=TRUE)

dart_vs_control_QLFit <- glmQLFit(dart_vs_control, design,robust=TRUE)
dart_vs_control_QLFit <- glmQLFTest(dart_vs_control_QLFit , coef="groupdart")
dart_vs_control_edgeR_results_QLF<- topTags(dart_vs_control_QLFit, adjust.method = "fdr", n=Inf)$table

DESeq2

group<-data.frame(colnames(count_data_annotated[,c(2:13)]), groups=as.factor(rep(c("dart", "control"),c(5,7))))

dart_vs_control<-DESeqDataSetFromMatrix(countData=count_data_annotated[,c(2:13)],colData=group, design= ~groups)

# Wald method
dart_vs_control_Wald<-DESeq(dart_vs_control, test="Wald")
dart_vs_control_Wald<-results(dart_vs_control_Wald, contrast=c("groups",  "dart", "control"), pAdjustMethod="fdr", tidy=TRUE)
dart_vs_control_Wald<-dart_vs_control_Wald[with(dart_vs_control_Wald, order(pvalue)),]

Merge both results

dart_vs_control_combined_results<-merge(dart_vs_control_edgeR_results_QLF, by.x='row.names', dart_vs_control_Wald, by.y='row')
dart_vs_control_combined_results<-dart_vs_control_combined_results[complete.cases(dart_vs_control_combined_results),]
dim(dart_vs_control_combined_results[(dart_vs_control_combined_results$FDR<0.1) & (dart_vs_control_combined_results$padj<0.1),])
## [1] 125  12

Plots

Fig 1A

dart_vs_control_combined_results_annotated<-merge(dart_vs_control_combined_results,annotation.ensembl.symbol, by.x="Row.names", by.y="ensembl_gene_id", all.x=TRUE, all.y=FALSE)
tpm_all_data_subset<-tpm_all_data[rownames(tpm_all_data) %in% dart_vs_control_combined_results_annotated[(dart_vs_control_combined_results_annotated$FDR<0.1 & dart_vs_control_combined_results_annotated$padj<0.1),]$Row.names , ]

mat_scaled = t(apply(log(tpm_all_data_subset+1), 1, scale))

colnames(mat_scaled)<-colnames(tpm_all_data_subset)

ha = HeatmapAnnotation(type = type, annotation_name_side = "left")

htmp <-Heatmap(mat_scaled,name = "Log2(TPM)\n scaled",
col = colorRamp2(c(-3, 0, 3), c("blue2", "white", "orange2")), 
show_row_names = FALSE, show_column_names = FALSE, 
top_annotation =  HeatmapAnnotation(group = rep(c("OCT4-KO", "control"), c(5,7)) , col=list(group= c( "OCT4-KO"="red", "control"="blue")) ) )

draw(htmp, legend_grouping = "original",merge_legend = TRUE)

Fig 1B

tpm_all_data_subset<-tpm_all_data[rownames(tpm_all_data) %in%  c("ENSBTAG00000025246", "ENSBTAG00000000060", "ENSBTAG00000047871", "ENSBTAG00000046092", "ENSBTAG00000021523", "ENSBTAG00000014549", "ENSBTAG00000016455", "ENSBTAG00000038103","ENSBTAG00000025441", "ENSBTAG00000051907", "ENSBTAG00000008501", "ENSBTAG00000001595"),]


data.graph<-data.frame()
for(i in 1:12){
ID<-rownames(tpm_all_data_subset)[i]
group<-rep(c("OCT4-KO", "control"), c(5,7))
TPM<-tpm_all_data_subset[i,]
data.graph<-rbind(data.frame(ID,group,TPM),data.graph)
}


data.graph$ID <- factor(data.graph$ID, levels = c(  "ENSBTAG00000025246", "ENSBTAG00000008501","ENSBTAG00000000060", "ENSBTAG00000047871", "ENSBTAG00000046092", "ENSBTAG00000021523", "ENSBTAG00000014549", "ENSBTAG00000016455", "ENSBTAG00000038103","ENSBTAG00000025441", "ENSBTAG00000051907",  "ENSBTAG00000001595"), labels = c("ZIC2","FOXD3", "PRDM14", "KLF17" ,"HDAC8" ,"STAT3" ,"GNL3" ,"CACHD1" ,"DPPA4" ,"HSPA1A" ,"CYBA" ,  "MT1E"))


ggplot(data=data.graph, aes(x=group, y=TPM , color=group))+
geom_jitter()+
facet_wrap(~ID, scale="free", nrow=2)+
scale_color_manual(values=c("blue","red"))+
scale_x_discrete(NULL)+
theme_bw()+
theme(
axis.text=element_text(color="black"),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.text.x = element_text(face="italic"),
strip.background =element_rect(fill="white"),
legend.position="bottom",
legend.margin=margin(unit(-1,"cm")),
plot.margin=margin(0,0,0,0)
)

sessionInfo()

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] lmtest_0.9-40               zoo_1.8-12                 
##  [3] emmeans_1.8.7               car_3.1-2                  
##  [5] carData_3.0-5               aod_1.3.2                  
##  [7] multcomp_1.4-25             TH.data_1.1-2              
##  [9] MASS_7.3-60                 survival_3.5-5             
## [11] mvtnorm_1.2-2               kableExtra_1.3.4           
## [13] data.table_1.14.8           circlize_0.4.15            
## [15] ComplexHeatmap_2.16.0       cowplot_1.1.1              
## [17] goseq_1.52.0                geneLenDataBase_1.36.0     
## [19] BiasedUrn_2.0.10            DESeq2_1.40.2              
## [21] SummarizedExperiment_1.30.2 Biobase_2.60.0             
## [23] MatrixGenerics_1.12.2       matrixStats_1.0.0          
## [25] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
## [27] IRanges_2.34.1              S4Vectors_0.38.1           
## [29] BiocGenerics_0.46.0         edgeR_3.42.4               
## [31] limma_3.56.2                lubridate_1.9.2            
## [33] forcats_1.0.0               stringr_1.5.0              
## [35] dplyr_1.1.2                 purrr_1.0.1                
## [37] readr_2.1.4                 tidyr_1.3.0                
## [39] tibble_3.2.1                ggplot2_3.4.2              
## [41] tidyverse_2.0.0            
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3       rstudioapi_0.14          jsonlite_1.8.7          
##   [4] shape_1.4.6              magrittr_2.0.3           magick_2.7.4            
##   [7] estimability_1.4.1       GenomicFeatures_1.52.1   farver_2.1.1            
##  [10] rmarkdown_2.23           GlobalOptions_0.1.2      BiocIO_1.10.0           
##  [13] zlibbioc_1.46.0          vctrs_0.6.3              Cairo_1.6-0             
##  [16] memoise_2.0.1            Rsamtools_2.16.0         RCurl_1.98-1.12         
##  [19] webshot_0.5.5            htmltools_0.5.5          S4Arrays_1.0.4          
##  [22] progress_1.2.2           curl_5.0.1               cellranger_1.1.0        
##  [25] sass_0.4.6               bslib_0.5.0              sandwich_3.0-2          
##  [28] cachem_1.0.8             GenomicAlignments_1.36.0 mime_0.12               
##  [31] lifecycle_1.0.3          iterators_1.0.14         pkgconfig_2.0.3         
##  [34] Matrix_1.5-4.1           R6_2.5.1                 fastmap_1.1.1           
##  [37] GenomeInfoDbData_1.2.10  clue_0.3-64              digest_0.6.32           
##  [40] colorspace_2.1-0         AnnotationDbi_1.62.2     RSQLite_2.3.1           
##  [43] labeling_0.4.2           filelock_1.0.2           fansi_1.0.4             
##  [46] timechange_0.2.0         abind_1.4-5              httr_1.4.6              
##  [49] mgcv_1.8-42              compiler_4.3.1           bit64_4.0.5             
##  [52] withr_2.5.0              doParallel_1.0.17        BiocParallel_1.34.2     
##  [55] DBI_1.1.3                highr_0.10               biomaRt_2.56.1          
##  [58] rappdirs_0.3.3           DelayedArray_0.26.6      rjson_0.2.21            
##  [61] tools_4.3.1              glue_1.6.2               restfulr_0.0.15         
##  [64] nlme_3.1-162             cluster_2.1.4            generics_0.1.3          
##  [67] gtable_0.3.3             tzdb_0.4.0               hms_1.1.3               
##  [70] xml2_1.3.5               utf8_1.2.3               XVector_0.40.0          
##  [73] foreach_1.5.2            pillar_1.9.0             splines_4.3.1           
##  [76] BiocFileCache_2.8.0      lattice_0.21-8           rtracklayer_1.60.0      
##  [79] bit_4.0.5                tidyselect_1.2.0         GO.db_3.17.0            
##  [82] locfit_1.5-9.8           Biostrings_2.68.1        knitr_1.43              
##  [85] svglite_2.1.1            xfun_0.39                statmod_1.5.0           
##  [88] stringi_1.7.12           yaml_2.3.7               evaluate_0.21           
##  [91] codetools_0.2-19         cli_3.6.1                xtable_1.8-4            
##  [94] systemfonts_1.0.4        munsell_0.5.0            jquerylib_0.1.4         
##  [97] readxl_1.4.3             Rcpp_1.0.11              dbplyr_2.3.2            
## [100] coda_0.19-4              png_0.1-8                XML_3.99-0.14           
## [103] parallel_4.3.1           blob_1.2.4               prettyunits_1.1.1       
## [106] bitops_1.0-7             viridisLite_0.4.2        scales_1.2.1            
## [109] crayon_1.5.2             GetoptLong_1.0.5         rlang_1.1.1             
## [112] rvest_1.0.3              KEGGREST_1.40.0