Links to other parts: czech.html, czech2.html, czech4.html.
Uncle John Returns posted the tables below on Twitter and wrote: "Turns out that those Czechs who received a dose of Moderna in 2021 were twice as likely to have a pre-existing chronic condition as those who had Pfizer with AZ higher still. [...] The source is a ministry of health file which has 1 row per dose (so 3.68 GB) with other data such as the age band, sex and place of residence. Available on the vaccinations tab here https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19." [https://x.com/UncleJo46902375/status/1814375808062112215]
The value of the indikace_chronicke_onemocneni
column (indication chronic illness) is always either 1 or empty. I thought that the percentage of people where the value of the "indication chronic illness" column was 1 was surprisingly low, because there's only one age group where the column is true for over 10% of people with a first dose, and the percentage seemed to be fairly high in young age groups relative to old age groups (and for some reason it's about 10 times higher in ages 65-69 than ages 80+):
> download.file("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-profese.csv","ockovani-profese.csv") # about 4 GiB > pro=fread("ockovani-profese.csv") > d=pro[poradi_davky==1&vekova_skupina!="nezařazeno",.(.N,chronic=sum(indikace_chronicke_onemocneni,na.rm=T)),.(type=vakcina,age=as.numeric(sub("[+-].*","",vekova_skupina)))] > d[,.(chronicpct=round(sum(chronic)/sum(N)*100,1)),age][order(age)]|>print(r=F) age chronicpct 0 1.5 5 0.0 12 0.1 16 0.6 18 1.3 25 1.7 30 1.9 35 2.4 40 3.0 45 3.9 50 5.1 55 6.6 60 8.6 65 11.1 70 5.3 75 2.1 80 1.1
Clare Craig said that the column didn't mean if the person had a chronic illness or not but that it referred to "individuals preferentially vaccinated due to chronic disease", even though she didn't cite any source for why that would be the case. [https://drclarecraig.substack.com/p/double-checking-the-claims-about] That might explain why the total percentage of people who have the indication of a chronic illness is so low.
On this page for Czech vaccination statistics, there's 285,232 vaccinated people who have the "indication" of chronic illness (which is identical to the number of rows in the ockovani-profese.csv
file where the indikace_chronicke_onemocneni
column is true and the dose number is 1): https://ockovani.opendatalab.cz/statistiky_ockovani#ockovani-indikace. But there are also 273,374 vaccinated people who are in the "priority group" for a chronic illness. I don't know how the priority group is different from the indication group. A tooltip for the indication of chronic illness included the following text in Czech: "Chronically ill (hemato-oncological disease, oncological disease (solid tumors), serious acute or long-term heart disease, serious long-term lung disease, diabetes mellitus, obesity, serious long-term kidney disease, serious long-term liver disease, status after transplantation or on the waiting list, hypertension, seriousltip neurological or neuromuscular disease, congenital or acquired cognitive deficit, rare genetic disease, severe weakening of the immune system, other serious diseases)." But the tooltip for the priority group of chronic illness simply said "A person with a chronic illness".
One explanation for why there's so few people who have the indication of chronic illness might be if they only include serious chronic illnesses. But it wouldn't explain why the percentage of people with an indication for chronic illness is much lower in ages 80+ than in ages 65-69. And the list of chronic illnesses in the tooltip also included hypertension, which alone should have a prevalence of more than the total percentage of people with the "indication" of chronic illness.
I found this PDF file which describes the "infectious disease information system" that is used in the Czech Republic ("Informační systém infekční nemoci", ISIN): https://vladci.cz/archive/covid/106/UZIS_2022-02_Struktura_NZIS_106.pdf. The PDF includes a list of fields which are contained in a database for COVID-19 cases, which also includes a module for COVID vaccinations. Here's a translation of the complete list of the so-called indications that were included in the vaccination module:
Indications (may change with regard to indications in the Central Reservation System - CRS):
- Professional
- Health workers of the ARO Department, ICU
- Health workers Emergency reception
- Ambulance
- Medical staff of the Infectious Diseases Department
- Medical staff of the Pulmonary Department
- Employees of public health authorities conducting epidemiological investigations
- Laboratory workers processing biological samples for examination on Covid-19
- Workers and clients in social services
- General practitioners, general practitioners for children and adolescents, dentists, pharmacists
- Workers of critical infrastructure - integrated rescue system, energy workers, government, crisis teams
- Other workers of public health protection authorities
- Other healthcare workers
- Employees of the Ministry of Defense
- Teaching staff
- Other workers in education
- Security forces
- Persons involved in the provision of health services
- THP workers in hospitals
- Caregivers (a person caring for a person in the III or IV degree dependencies)
- University academic staff
- Riskiness
- Hemato-oncological disease
- Oncological disease (solid tumors)
- Severe acute or long-term heart disease
- Serious long-term lung disease
- Diabetes mellitus
- Obesity
- Other serious illness
- Severe long-term kidney disease
- Severe long-term liver disease
- Status after transplant or on the waiting list
- Hypertension
- Severe neurological or neuromuscular disease
- Congenital or acquired cognitive deficit
- A rare genetic disease
- Severe weakening of the immune system
- Age
- People aged 80+
- People aged 70-79
- People aged 65-69
- People aged 60-64
- People aged 55-59
- People aged 50-54
- People aged 45-49
- People aged 40-44
- People aged 35-39
- People aged 30-34
- People aged 25-29
- People aged 20-24
- People aged 16-19
- People aged 12-15
- Other
In the file ockovani-registrace.csv
which contains one row for each vaccinated person, there's a column called povolani
(profession) which has the following values:
Count | Translation | Czech |
---|---|---|
6037598 | Based on age | Na základě dosaženého věku |
912596 | not specified | neuvedeno |
305888 | A person with a chronic illness | Osoba s chronickým onemocněním |
275259 | Teaching staff/non-teaching staff | Pedagogický pracovník/nepedagogický zaměstnanec |
79837 | Critical infrastructure | Kritická infrastruktura |
70863 | Healthcare worker according to §76 and §77 of Act 372/2011 Coll. | Zdravotnický pracovník dle §76 a §77 zákona 372/2011 Sb. |
39777 | Self-paying | Samoplátce |
30231 | A person with a chronic disease - in the care of a specialized center | Osoba s chronickým onemocněním - v péči specializovaného centra |
26026 | Employees of the Ministry of Defense | Zaměstnanci Ministerstva obrany |
25530 | University academic worker | Akademický pracovník VŠ |
21521 | Worker in social services | Pracovník v sociálních službách |
5486 | A person caring for a person in 3rd or 4th degree of dependence | Osoba pečující o osobu v III. nebo IV. stupni závislosti |
1813 |
Non-health care workers involved in the provision of health care and care for COVID-19 positive persons |
Nezdravotničtí pracovníci podílející se na poskytování zdravotní péče a péče o covid-19 pozitivní osoby |
1602 | THP workers in hospitals | THP pracovníci v nemocnicích |
The values of the column might indicate the criteria based on which a person qualified to be vaccinated. By far the most common value of the column was by age. So if for example a person who was chronically ill qualified to be vaccinated because of their age, maybe only the qualification for age but not chronic illness was listed in the column, since each person has only one value listed in the column. So if a similar system was used in the ockoni-profese.csv
file, it might explain why there's such a low percentage of people who are indicated to have a chronic illness in the oldest age groups.
In the table above the number of people with a chronic illness is 305,888 along with 30,231 people with a chronic illness who were in the care of a specialized center, so it's a bit higher than the file ockoni-profese.csv
where there's rows for 285,232 first doses where the indikace_chronicke_onemocneni
column is true (even though maybe there would be more unique people where the column would be true for at least one dose, but it's not possible to tell because the file doesn't have a way to identify which doses belong to the same person).
But anyway, in the following code I calculated an age-normalized proportion of people whose "indication chronic illness" column was true, so that my baseline was the proportion of the column by age group among all people who are included in the CSV file. The excess proportion of the column was about -26% for the first dose type Comirnaty but about 55% for SPIKEVAX:
> a=d[,.(N=sum(N),chronic=sum(chronic)),.(age,type)] > a=merge(a,d[,.(base=sum(chronic)/sum(N)),age])[,base:=base*N] > a[,.(pct=round((sum(chronic)/sum(base)-1)*100),N=sum(N)),type][order(-N)]|>print(r=F) type pct N Comirnaty -26 5527776 # Pfizer SPIKEVAX 55 526448 # Moderna VAXZEVRIA 252 447428 # AstraZeneca COVID-19 Vaccine Janssen -35 412320 Comirnaty 5-11 31 59001 Nuvaxovid -91 5424 # Novavax Comirnaty Omicron XBB.1.5. 6 1988 Comirnaty Original/Omicron BA.4/BA.5 -33 669 Comirnaty Original/Omicron BA.1 77 294 Sinovac -72 181 Covishield -100 117 Comirnaty 6m-4 -100 111 Comirnaty Omicron XBB.1.5. 5-11 -31 102 Sinopharm -100 38 COMIRNATY OMICRON XBB.1.5 6m-4 150 29 Covovax -33 29 Spikevax bivalent Original/Omicron BA.1 -100 15 Sputnik V -100 10 Nuvaxovid XBB 1.5 -100 8 COVAXIN -100 5 SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.4-5 310 5 Valneva -100 2 type pct N
The next plot shows the age-normalized proportion of the column by vaccine type and month of vaccination. I was expecting the late vaccinees who got the first dose in late 2021 to have a higher age-normalized proportion of the "indication chronic illness" column than people who got vaccinated during the rollout peak, but actually I got the opposite result. However it might be if the indication for a chronic illness was generally not entered for people who qualified to be vaccinated based on other criteria such as age.
# download.file("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-profese.csv","ockovani-profese.csv") library(data.table) t=fread("ockovani-profese.csv") yemo=\(x){u=unique(x);format(u,"%Y-%m")[match(x,u)]} name1=unique(t$vakcina);name2=rep("Other",length(name1)) name2[name1==""]="" name2[grep("comirnaty",ignore.case=T,name1)]="Pfizer" name2[grep("spikevax",ignore.case=T,name1)]="Moderna" name2[grep("nuvaxovid",ignore.case=T,name1)]="Novavax" name2[name1=="COVID-19 Vaccine Janssen"]="Janssen" name2[name1=="VAXZEVRIA"]="AstraZeneca" t$type=name2[match(t$vakcina,name1)] d=t[year(datum)<=2022&poradi_davky==1,.(.N,chronic=sum(indikace_chronicke_onemocneni,na.rm=T)),.( month=yemo(datum),type,age=vekova_skupina)][age!="nezařazeno"] a=d[,.(N=sum(N),chronic=sum(chronic)),.(age,type,month)] a=merge(a,d[,.(base=sum(chronic)/sum(N)),age])[,base:=base*N] a=a[,.(chronic=sum(chronic),N=sum(N),base=sum(base)),.(type,month)] a[,type:=factor(type,names(sort(a[,tapply(N,type,sum)],T)))] a=rbind(a,a[,.(chronic=sum(chronic),N=sum(N),base=sum(base),month="Total"),type]) a=rbind(a,a[,.(chronic=sum(chronic),N=sum(N),base=sum(base),type="Total"),month]) m1=a[,tapply(chronic,.(type,month),c)] m2=a[,tapply(base,.(type,month),c)] pop=a[,xtabs(N~type+month)] m=(m1-m2)/ifelse(m1>m2,m2,m1)*100 disp=round((m1/m2-1)*100) exp=.82;maxcolor=500 m[m==-Inf]=-maxcolor pheatmap::pheatmap(abs(m)^exp*sign(m),filename="i1.png",display_numbers=disp, gaps_row=nrow(m)-1,gaps_col=ncol(m)-1, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=19,cellheight=19,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90", number_color=ifelse((abs(m)^exp>.55*maxcolor^exp)&!is.na(m),"white","black"), breaks=seq(-maxcolor^exp,maxcolor^exp,,256), colorRampPalette(hsv(rep(c(7/12,0),5:4),c(.9,.75,.6,.3,0,.3,.6,.75,.9),c(.4,.65,1,1,1,1,1,.65,.4)))(256)) kimi=\(x){e=floor(log10(ifelse(x==0,1,abs(x))));e2=pmax(e,0)%/%3+1;p=!is.na(x) x[p]=paste0(sprintf(paste0("%.",ifelse(e[p]%%3==0,1,0),"f"),x[p]/1e3^(e2[p]-1)),c("","k","M","B","T")[e2[p]]);x} exp2=.6;maxcolor2=max(pop[-nrow(pop),-ncol(pop)]) pheatmap::pheatmap(pop^exp2,filename="i2.png",display_numbers=ifelse(pop<=10,round(pop),kimi(pop)), gaps_row=nrow(m)-1,gaps_col=ncol(m)-1, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=19,cellheight=19,fontsize=9,fontsize_number=8, border_color=NA,na_col="white", number_color=ifelse(pop^exp2>maxcolor2^exp2*.45,"white","black"), breaks=seq(0,maxcolor2^exp2,,256), sapply(seq(1,0,,256),\(i)rgb(i,i,i))) cap="\u00a0 Czech MoH data: Excess percentage of people indicated to have a chronic illness out of all people who have a first dose listed. The x-axis shows the month of vaccination. The baseline is the proportion of people who were indicated to have a chronic illness by age group among all people in the CSV file who have a first dose listed in 2022 or earlier. The direct method of age standardization was used, so that for each cell, the number of people in each age group was multiplied by the proportion of people in the same age group who were indicated to have a chronic illness in the whole CSV file, and the results for each age group were added together to get the total expected number of people who were indicated to have a chronic illness for the cell. Source: onemocneni-aktualne.mzcr.cz/api/v2/covid-19, tab for vaccines, CSV file translated as \"COVID-19: Overview of reported vaccinations by profession (place of vaccination, residence of the vaccinated person)\", column \"indikace_chronicke_onemocneni\"." system(paste0("w=`identify -format %w i1.png`;pad=20;magick -pointsize 40 -font Arial \\( -size $[w-pad*3]x caption:'",cap,"' -splice $[pad]x20 \\) \\( i1.png -crop -0+8 \\) \\( -size $[w-pad*3]x caption:'Number of people who received a first dose each month' -splice $[pad]x0 \\) \\( i2.png -crop -0+8 \\) -append 1.png")) system(paste0("w=`identify -format %w i1.png`;pad=20;magick -pointsize 40 -font Arial \\( -size $[w-pad*3]x caption:'",cap,"' -splice $[pad]x20 \\) \\( i1.png -crop -0+8 \\) \\( -size $[w-pad*3]x caption:'Number of people who received a first dose each month' -splice $[pad]x0 \\) \\( i2.png -crop -0+8 \\) -append 1.png"))
This shows the percentage of people by age group and vaccine type whose "indication chronic illness" column was true (where I again included only first doses):
> t=fread("ockovani-profese.csv") > name1=unique(t$vakcina);name2=rep("Other",length(name1)) > name2[name1==""]="" > name2[grep("comirnaty",ignore.case=T,name1)]="Pfizer" > name2[grep("spikevax",ignore.case=T,name1)]="Moderna" > name2[grep("nuvaxovid",ignore.case=T,name1)]="Novavax" > name2[name1=="COVID-19 Vaccine Janssen"]="Janssen" > name2[name1=="VAXZEVRIA"]="AstraZeneca" > t$type=name2[match(t$vakcina,name1)] > a=t[poradi_davky==1&vekova_skupina!="nezařazeno",.(.N,chronic=sum(indikace_chronicke_onemocneni,na.rm=T)),.(type,age=as.integer(sub("[-+].*","",vekova_skupina)))] > a[,age:=factor(age,unique(age)[order(as.numeric(sub("[-+].*","",unique(age))))])] > a[,type:=factor(type,a[,sum(N),type][order(-V1)]$type)] > a=rbind(a,a[,.(N=sum(N),chronic=sum(chronic),age="Total"),type]) > a=rbind(a,a[,.(N=sum(N),chronic=sum(chronic),type="Total"),age]) > a[,tapply(round(chronic/N*100),.(type,age),c)] 0 5 12 16 18 25 30 35 40 45 50 55 60 65 70 75 80 Total Pfizer 1 0 0 1 1 1 2 2 2 3 4 5 6 7 3 1 1 3 Moderna NA NA 0 1 3 4 4 5 5 7 8 10 12 16 7 3 2 7 AstraZeneca NA NA NA 14 17 16 19 20 21 25 28 31 34 39 12 4 3 17 Janssen NA NA 0 0 1 1 1 1 2 3 3 4 5 5 5 5 4 3 Novavax NA NA 0 0 0 0 0 0 0 0 1 0 1 2 0 1 2 0 Other NA 0 NA 0 0 0 0 0 0 3 0 8 0 4 0 0 0 1 Total 1 0 0 1 1 2 2 2 3 4 5 7 9 11 5 2 1 4
This shows third doses instead of first doses:
0 5 12 16 18 25 30 35 40 45 50 55 60 65 70 75 80 Total Pfizer 0 0 0 1 2 2 3 3 4 5 6 7 9 11 5 2 1 5 Moderna NA NA 0 2 3 4 4 5 5 7 8 10 13 17 8 3 2 7 AstraZeneca NA NA NA NA 0 0 0 12 9 9 16 33 28 30 3 6 5 12 Janssen NA NA NA NA 2 4 4 4 5 5 6 9 7 9 4 3 3 5 Novavax NA NA 0 100 7 0 0 3 0 0 6 0 0 6 5 0 12 3 Other NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0 NA 0 Total 0 0 0 1 2 3 3 3 4 5 6 8 9 12 6 2 1 5
In both of the tables above Novavax got a lower percentage than Pfizer, which might partially explain people with a Novavax vaccine have such low mortality.
When I looked at third doses instead of first doses and I aggregated Omicron and pediatric vaccine types together with the main vaccine type, my excess age-normalized proportion of the "indication chronic illness" column was about -6% for Pfizer but 38% for Moderna:
> a=t[,.(N=sum(N),chronic=sum(chronic)),.(age,type)] > a=merge(a,d[,.(base=sum(chronic)/sum(N)),age])[,base:=base*N] > a[,.(pct=round((sum(chronic)/sum(base)-1)*100),N=sum(N)),type][order(-N)]|>print(r=F) type pct N Pfizer -6 3805345 Moderna 38 569083 Janssen -5 2463 AstraZeneca 134 433 Novavax -48 318 Other -100 28
When I calculated a Moderna-Pfizer ratio for an age-normalized proportion of the chronic illness column grouped by the month of the first dose, the ratio remained at a fairly steady level of about 5 to 6 for people who were vaccinated in May to October 2021, but the ratio suddenly dropped to about 2.3 the next month and it remained around 1 to 1.5 for the next 4 months. So that might also partially explain why the Moderna-Pfizer mortality ratio dropped to around 1.0 in people who were vaccinated in the fourth quarter of 2021 (even though the drop in the mortality ratio was already in October and not November):
> a=d[,.(N=sum(N),chronic=sum(chronic)),.(age,type,month)] > a=merge(a,d[,.(base=sum(chronic)/sum(N)),age])[,base:=base*N] > a=a[,.(rat=sum(chronic)/sum(base)),.(type,month)] > m=a[,tapply(rat,.(type,month),c)] > round(m["Moderna",]/m["Pfizer",],2) 2020-12 2021-01 2021-02 2021-03 2021-04 2021-05 2021-06 2021-07 2021-08 2021-09 NA 0.62 0.61 2.65 1.74 2.87 5.68 5.03 6.86 5.84 2021-10 2021-11 2021-12 2022-01 2022-02 2022-03 2022-04 2022-05 2022-06 2022-07 5.44 2.32 1.51 1.12 1.45 1.40 0.43 0.67 1.35 0.00 2022-08 2022-09 2022-10 2022-11 2022-12 2023-01 2023-02 2023-03 2023-04 2023-05 1.26 1.47 1.51 0.00 0.00 3.74 0.00 0.00 0.00 NaN 2023-06 2023-07 2023-08 2023-09 2023-10 2023-11 2023-12 2024-01 2024-02 2024-03 NA NA NA 0.00 0.00 0.00 0.00 0.00 NA NA 2024-04 2024-05 2024-06 2024-07 NA NA NaN NA
This shows the age-normalized chronic illness ratio compared to the age-normalized mortality ratio:
download.file("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-profese.csv","ockovani-profese.csv") download.file("http://sars2.net/f/czbucketskeep.csv.gz","czbucketskeep.csv.gz") library(data.table);library(ggplot2) yemo=\(x){u=unique(x);format(u,"%Y-%m")[match(x,u)]} kim=\(x)ifelse(x>=1e3,ifelse(x>=1e6,paste0(x/1e6,"M"),paste0(x/1e3,"k")),x) t=fread("ockovani-profese.csv") d=t[poradi_davky==1,.(.N,chronic=sum(indikace_chronicke_onemocneni,na.rm=T)),.(type=vakcina,age=vekova_skupina,month=yemo(datum))][age!="nezařazeno"] name1=unique(d$type);name2=ifelse(name1=="","","Other") name2[grep("comirnaty",ignore.case=T,name1)]="Pfizer" name2[grep("spikevax",ignore.case=T,name1)]="Moderna" name2[grep("nuvaxovid",ignore.case=T,name1)]="Novavax" name2[name1=="COVID-19 Vaccine Janssen"]="Janssen" name2[name1=="VAXZEVRIA"]="AstraZeneca" d$type=name2[match(d$type,name1)] a=d[,.(N=sum(N),chronic=sum(chronic)),.(age,type,month)] a=merge(a,d[,.(base=sum(chronic)/sum(N)),age])[,base:=base*N] a=a[,.(chronic=sum(chronic)/sum(base),N=sum(N)),.(type,month)] p=dcast(a,month~type,value.var="chronic")[,.(month,chronic=Moderna/Pfizer)] p=cbind(p,dcast(a,month~type,value.var="N")[,.(Pfizer,Moderna)]) b=fread("czbucketskeep.csv.gz")[,age:=pmax(pmin(95,age%/%5*5),15)] me=merge(b[dose==1],b[dose<=1,.(base=sum(dead)/sum(alive)),age])[,base:=base*alive] me=me[,.(rate=sum(dead)/sum(base)),.(month=vaxmonth,type)] p=merge(p,dcast(me,month~type,value.var="rate")[,.(month,mort=Moderna/Pfizer)]) p1=data.frame(month=p$month,y=c(p$mort,p$chronic),type=rep(c("Mortality ratio","Chronic illness ratio"),each=nrow(p))) p2=data.frame(month=p$month,y=c(p$Pfizer,p$Moderna),type=rep(c("Pfizer","Moderna"),each=nrow(p))) p1$type=factor(p1$type,unique(p1$type)) p2$type=factor(p2$type,unique(p2$type)) cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10))) ymax=max(p$chronic,p$mort,na.rm=T) ystep=cand[which.min(abs(cand-ymax/6))] yend=ystep*ceiling(ymax/ystep) ymax2=max(p2$y,na.rm=T) ystep2=cand[which.min(abs(cand-ymax2/6))] yend2=ystep2*ceiling(ymax2/ystep2) secmult=yend/yend2 pal1=c("black","gray60") pal2=alpha(hsv(c(22,0)/36,1,.8),.5) exp=.7;xstart=1-exp;xend=length(p$month)+exp ggplot(p,aes(x=month))+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_hline(yintercept=1,linewidth=.3,linetype=2,lineend="square")+ geom_bar(data=p2,aes(y=y*secmult,fill=type),stat="identity",position="dodge",linewidth=.15)+ geom_line(data=na.omit(p1),aes(y=y,color=type,group=type),linewidth=.4)+ labs(title="Age-normalized Moderna-Pfizer ratio by month of vaccination",x=NULL,y="Moderna-Pfizer ratio")+ scale_x_discrete(expand=expansion(mult=0,add=exp))+ scale_y_continuous(limits=c(0,yend),breaks=seq(0,yend,ystep),expand=expansion(0),labels=\(x)ifelse(x>=1e3,paste0(x/1e3,"k"),x),sec.axis=sec_axis(trans=~./secmult,breaks=seq(0,yend2,ystep2),name="Monthly new doses",labels=kim))+ scale_color_manual(values=pal1)+ scale_fill_manual(values=pal2)+ coord_cartesian(clip="off")+ theme(axis.text=element_text(size=7,color="black"), axis.text.x=element_text(angle=90,vjust=.5,hjust=1,margin=margin(2.5)), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(3,"pt"), axis.ticks.length.x=unit(0,"pt"), axis.title=element_text(size=7,color="black"), axis.title.y.left=element_text(margin=margin(,3,,1)), axis.title.y.right=element_text(margin=margin(,,,4)), legend.direction="horizontal", legend.background=element_rect(fill=alpha("white",0)), legend.box.spacing=unit(0,"pt"), legend.key.height=unit(4,"pt"), legend.key.width=unit(15,"pt"), legend.key=element_rect(fill=alpha("white",0)), legend.margin=margin(0,2,4,2), legend.position="top", legend.spacing.x=unit(2,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.grid.major=element_blank(), panel.background=element_rect(fill="white"), plot.margin=margin(4,4,4,4), plot.title=element_text(size=7.5,margin=margin(1,0,4.5,0),face=2)) ggsave("0.png",width=4,height=2.2,dpi=300*4) system("magick 0.png -resize 25% 1.png")
This also shows the Moderna-Pfizer ratio for chronic illness by age group and month of vaccination:
The second file from the bottom under vaccinations here has data for hospitalizations by vaccine type and age group: https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19.
The file has about 7 million rows with one row for each vaccinated person. It has these columns:
kraj_bydliste_nazev name of region of resicence kraj_bydliste_kod code of region of residence orp_bydliste_nazev name of county of residence orp_bydliste_kod code of county of residence vekova_kategorie age category ockovaci_latka_nazev vaccine name ockovaci_latka_kod vaccine code prvni_davka date of first dose druha_davka date of second dose dokoncene_ockovani date when completed primary course posilujici_davka date of booster dose indikace_pred_ockovanim type of test before vaccination pozivita_pred_ockovanim positivity of test before vaccination pozivita_pred_ockovanim_datum date of positive test before vaccination hospitalizace_pred_ockovanim hospitalized before vaccination jip_pred_ockovanim ICU before vaccination indikace_po_ockovani type of test after vaccination pozivita_po_ockovani positivivity of test after vaccination pozivita_po_ockovani_datum date of positive test after vaccination hospitalizace_po_ockovani hospitalized after vaccination jip_po_ockovani ICU after vaccination
The types of test are diagnostic, preventative, epidemiological, or other (ostatní). Negative tests are not listed so the type of the test is only shown for positive tests. There's only up to one test before vaccination and one test after vaccination listed for each person, and similarly there's only up to one hospitalization before vaccination and one hospitalization after vaccination listed for each person.
I used the file to calculate an age-normalized hospitalization rate from the day of the first dose up to the present. I didn't account for aging of people over time, so I treated the age at vaccination as the age of each person. The file included data for hospitalizations up the the present month, so I used the date when I downloaded the file as the end of my observation period.
In order to calculate the expected number of deaths for each vaccine type, I multiplied the number of person-days the vaccine type had for each age group with the mortality rate of each age group among all people who had a first dose listed in the file.
My excess percentage of hospitalizations was about -6% for the first dose type "Comirnaty" and about -4% for the first dose type "SPIKEVAX", so there was not much difference, but AstraZeneca and particularly Janssen vaccines had a much higher excess hospitalization rate:
> download.file("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-pozitivni-hospitalizovani.csv","ockovani-pozitivni-hospitalizovani.csv") > h=fread("ockovani-pozitivni-hospitalizovani.csv") > d=h[,.(age=vekova_kategorie,date=prvni_davka,type=ockovaci_latka_nazev,hosp=hospitalizace_po_ockovani)][age!="-"] > maxdate=as.IDate("2024-7-20") > a=d[,.(hosp=sum(hosp,na.rm=T),pop=sum(as.double(maxdate-date+1))),.(type,age)] > a=merge(a,a[,.(base=sum(hosp)/sum(pop)),age])[,base:=base*pop] > a[,.(excesspct=round((sum(hosp)/sum(base)-1)*100),hospitalizations=sum(hosp),pyears=round(sum(pop/365))),type][order(-pyears)]|>print(r=F) type excesspct hospitalizations pyears Comirnaty -6 43053 17432607 # Pfizer SPIKEVAX -4 6234 1657122 # Moderna VAXZEVRIA 19 11921 1478098 # AstraZeneca COVID-19 Vaccine Janssen 57 3576 1185707 Nuvaxovid -50 7 12140 # Novavax Comirnaty Omicron XBB.1.5. -10 7 1349 Comirnaty Original/Omicron BA.4/BA.5 -100 0 1015 Sinovac -100 0 543 Comirnaty Original/Omicron BA.1 -100 0 454 Covishield -100 0 372 Comirnaty 6m-4 -100 0 143 Sinopharm -100 0 113 Covovax -100 0 87 Comirnaty Omicron XBB.1.5. 5-11 -100 0 62 Sputnik V -100 0 30 Spikevax bivalent Original/Omicron BA.1 -100 0 22 COVAXIN -100 0 16 COMIRNATY OMICRON XBB.1.5 6m-4 -100 0 15 SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.4-5 -100 0 4 Valneva -100 0 4 Nuvaxovid XBB 1.5 -100 0 2 type excesspct hospitalizations pyears
In the code below I additionally normalized the hospitalization rate for the month of vaccination. I now only included first doses given in 2021, because otherwise I would've been more likely to get anomalously high baseline mortality rates for combinations of vaccination month and age group that had a low number of person-days. However now Moderna got a slightly lower excess hospitalization percent than Pfizer which was the opposite of my previous calculation. And now the excess hospitalization percent of Janssen vaccines approximately halved from 57% to 28%, because late vaccinees had a higher hospitalization rate than early vaccinees and late vaccinees were overrepresented for Janssen vaccines:
> yemo=\(x){u=unique(x);format(u,"%Y-%m")[match(x,u)]} > d2=d[date<="2021-12-31"][,month:=yemo(date)] > a=d2[,.(hosp=sum(hosp,na.rm=T),pop=sum(as.double(maxdate-date+1))),.(type,age,month)] > a=merge(a,a[,.(base=sum(hosp)/sum(pop)),.(month,age)])[,base:=base*pop] > a=a[,.(excesspct=round((sum(hosp)/sum(base)-1)*100),hospitalizations=sum(hosp),pyears=round(sum(pop/365))),type] > a[order(-pyears)]|>print(r=F) type excesspct hospitalizations pyears Comirnaty -5 42775 17127673 SPIKEVAX -6 6210 1633228 VAXZEVRIA 21 11921 1478067 COVID-19 Vaccine Janssen 28 3562 1168387 Sinovac -100 0 539 Covishield -100 0 372 Sinopharm -100 0 109 Covovax -100 0 85 Sputnik V -100 0 28 COVAXIN -100 0 16
The next plot shows the excess hospitalization percent by age and the month of the first dose. There's a diagonal pattern across different age groups where there's a low excess hospitalization percent during the same month when the number of first doses peaked in the age group:
The next plot shows the excess hospitalization rate by vaccine type instead of age group. Janssen and AstraZeneca got a much higher total hospitalization rate than Pfizer and Moderna. I didn't normalize the total excess mortality by the month of vaccination so Janssen again got a very high total percentage of excess hospitalizations (because the late vaccinees who had a high hospitalization rate were overrepresented for Janssen vaccines):
Here's code for making the heatmaps:
library(data.table) h=fread("ockovani-pozitivni-hospitalizovani.csv") d=h[,.(age=vekova_kategorie,date=prvni_davka,type=ockovaci_latka_nazev,hosp=hospitalizace_po_ockovani)] d=d[age!="-"&date<"2022-01-01"] u=unique(d$date);d$month=format(u,"%Y-%m")[match(d$date,u)] a=d[,.(.N,hosp=sum(hosp,na.rm=T),pop=sum(as.double(as.IDate("2024-7-20")-date+1))),.(month,age)] a=merge(a,a[,.(base=sum(hosp)/sum(pop)),age])[,base:=base*pop] p=a[,.(N=sum(N),hosp=sum(hosp),base=sum(base),pop=sum(pop)),.(month,age)] p=rbind(p,p[,.(N=sum(N),hosp=sum(hosp),base=sum(base),pop=sum(pop),age="Total"),month]) p=rbind(p,p[,.(N=sum(N),hosp=sum(hosp),base=sum(base),pop=sum(pop),month="Total"),age]) m=p[,tapply((hosp-base)/ifelse(hosp>base,base,hosp)*100,.(age,month),c)] disp=p[,tapply((hosp/base-1)*100,.(age,month),round)] pop=p[,xtabs(N~age+month)] exp=.7;maxcolor=300;m[m==-Inf]=-maxcolor pheatmap::pheatmap(abs(m)^exp*sign(m),filename="i1.png",display_numbers=disp, gaps_row=nrow(m)-1,gaps_col=ncol(m)-1, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=19,cellheight=19,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90", number_color=ifelse((abs(m)^exp>.55*maxcolor^exp)&!is.na(m),"white","black"), breaks=seq(-maxcolor^exp,maxcolor^exp,,256), colorRampPalette(hsv(rep(c(7/12,0),5:4),c(.9,.75,.6,.3,0,.3,.6,.75,.9),c(.4,.65,1,1,1,1,1,.65,.4)))(256)) kimi=\(x){e=floor(log10(ifelse(x==0,1,abs(x))));e2=pmax(e,0)%/%3+1;p=!is.na(x) x[p]=paste0(sprintf(paste0("%.",ifelse(e[p]%%3==0,1,0),"f"),x[p]/1e3^(e2[p]-1)),c("","k","M","B","T")[e2[p]]);x} exp2=.6;maxcolor2=max(pop[-nrow(pop),-ncol(pop)]) pheatmap::pheatmap(pop^exp2,filename="i2.png",display_numbers=ifelse(pop<=10,round(pop),kimi(pop)), gaps_row=nrow(m)-1,gaps_col=ncol(m)-1, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=19,cellheight=19,fontsize=9,fontsize_number=8, border_color=NA,na_col="white", number_color=ifelse(pop^exp2>maxcolor2^exp2*.45,"white","black"), breaks=seq(0,maxcolor2^exp2,,256), sapply(seq(1,0,,256),\(i)rgb(i,i,i))) cap="\u00a0 Czech MoH data: Excess percentage for age-normalized hospitalization rate for people with a first dose listed in 2021. The x-axis shows the month of vaccination. The baseline is the hospitalization rate among people in the same age group in the CSV file who got a first dose in 2021. The hospitalization rate was calculated from the day of the first dose until July 20th 2024. Source: onemocneni-aktualne.mzcr.cz/api/v2/covid-19, tab for vaccinations, second file from bottom translated as \"COVID-19: Overview of persons with reported vaccinations with regard to SARS-CoV-2 infection\"." system(paste0("pad=24;w=`identify -format %w i1.png`;magick -pointsize 40 -font Arial \\( -size $[w*2-pad*3] caption:'",cap,"' -splice $[pad]x20 \\) -font Arial-Bold \\( \\( \\( -size $[w-pad*3] caption:'Excess hospitalization percentage' -splice $[pad]x20 \\) i1.png -append \\) \\( \\( -size $[w-pad*2] caption:'Number of people who received a first dose each month' -splice $[pad]x20 \\) i2.png -append \\) +append \\) -append 1.png"))
I have been telling Kirsch that if the reason why Moderna has higher ASMR than Pfizer would be because Moderna vaccines were killing a lot more people, you'd expect the Moderna-Pfizer ratio to be higher in the first days or weeks following vaccination and get closer to 1.0 over time, because deaths caused by vaccines would probably be much more common in the days or weeks following vaccination than more than a year after vaccination.
However this plot shows that the Moderna-Pfizer ratio remains at a consistent level of about 1.1 to 1.6 until it shoots up at the very end of the plot (when there's only a small number of people who got vaccinated early enough that they had not yet ran out of follow-up time):
In the black line the ratio was only normalized by age. In the gray line the ratio was additionally normalized by ongoing month, but it didn't change the results that much.
But anyway, you would at least think that there would be more vaccine deaths in weeks 0 to 29 after vaccination than in weeks 60 to 89. But because the Moderna-Pfizer ratio remains consistently elevated even more than a year after vaccination, it rather seems to be the product of some kind of confounding factors.
b=data.table::fread("http://sars2.net/f/czbucketskeep.csv.gz") b[,age:=pmax(pmin(95,age%/%5*5),15)] d=merge(b[dose==1],b[dose<=1,.(base2=sum(dead)/sum(alive)),.(month,age)])[,base2:=base2*alive] d=merge(d,b[dose<=1,.(base1=sum(dead)/sum(alive)),age],by="age")[,base1:=base1*alive] d=d[,.(rate1=sum(dead)/sum(base1),rate2=sum(dead)/sum(base2)),.(week,type)] a1=d[,xtabs(rate1~week+type)];r1=a1[,"Moderna"]/a1[,"Pfizer"] a2=d[,xtabs(rate2~week+type)];r2=a2[,"Moderna"]/a2[,"Pfizer"] week=unique(d$week) png("1.png",1300,800,res=198) par(mar=c(2.3,2.3,2.1,.8),mgp=c(0,.6,0)) tit="Age-normalized Moderna-Pfizer ratio by weeks after first dose" leg=c("Normalized by age only","Normalized by age and ongoing month") col=c("black","gray60") plot(week,r1,type="l",col=col[1],main=tit,xlab=NA,ylab=NA,lwd=1.5,cex.main=1.1) abline(h=1,lty=2) lines(week,r2,type="l",col=col[2],lwd=1.5) legend("bottom",legend=leg,col=col,lty=1,lwd=1.5) dev.off()
The following code calculates an age-normalized excess hospitalization percentage relative to a baseline of unvaccinated people. The observation period starts in January 2021 and ends on July 20th 2024.
I combined two different CSV files published by the Czech Ministry of Health: https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19.
I took hospitalizations by vaccine type from the same CSV file as earlier. It's missing the date of hospitalization but it only has a field which indicates if a person has been hospitalized after vaccination or not, so I had to calculate the hospitalization rate up to the present day and I wasn't able to choose which day my observation period ended.
I took hospitalizations in unvaccinated people from another CSV file published by the MoH. But I didn't find a good way to calculate the number of person-days for unvaccinated people up to the present day, so I took the number of person-days for unvaccinated people from one of my buckets files, so that I used the number of person-days in each age group in December 2022 as the subsequent population size up to July 2024 so that I adjusted the person-days for the length of each month.
system("wget onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-{hospitalizace-tyden,pozitivni-hospitalizovani}.csv sars2.net/f/czbucketsdaily.csv.gz") library(data.table) yemo=\(x){u=unique(x);format(u,"%Y-%m")[match(x,u)]} unvaxhosp=fread("ockovani-hospitalizace-tyden.csv") unvax=unvaxhosp[,.(hosp=hospitalizovani_bez_ockovani),.(date=tyden_od+3,age=pmax(25,pmin(80,as.numeric(sub("[-+].*","",vekova_skupina)))))][,spline(date,hosp/7,xout=min(date):max(date)),age][,.(hosp=sum(y)),.(age,month=yemo(`class<-`(x,"Date")))] maxdate=as.IDate("2024-7-20") unvaxpop=fread("czbucketsdaily.csv.gz")[dose==0][,.(pop=sum(alive)),.(month=yemo(date),age=pmax(pmin(age,80),25)%/%5*5)] unvaxpop=rbind(unvaxpop,unvaxpop[month=="2022-12",.(pop,month=seq(as.Date("2023-1-1"),as.Date("2024-7-1"),"month")),age][,pop:=pop/31*ifelse(month==as.Date("2024-7-1"),20,lubridate::days_in_month(month))][,month:=yemo(month)][]) vaxhosp=fread("ockovani-pozitivni-hospitalizovani.csv") vax=vaxhosp[,.(age=vekova_kategorie,date=prvni_davka,type=ockovaci_latka_nazev,hosp=hospitalizace_po_ockovani)][age!="-"] name1=unique(vax$type);name2=rep("Other",length(name1)) name2[name1==""]="" name2[grep("comirnaty",ignore.case=T,name1)]="Pfizer" name2[grep("spikevax",ignore.case=T,name1)]="Moderna" name2[grep("nuvaxovid",ignore.case=T,name1)]="Novavax" name2[name1=="COVID-19 Vaccine Janssen"]="Janssen" name2[name1=="VAXZEVRIA"]="AstraZeneca" vax$type=name2[match(vax$type,name1)] a=vax[,.(hosp=sum(hosp,na.rm=T),pop=sum(as.double(as.IDate("2024-7-20")-date+1))),.(age=pmax(25,pmin(80,as.numeric(sub("[+-].*","",age)))),type)] a=merge(a,merge(unvax,unvaxpop)[,.(hosp=sum(hosp),pop=sum(pop)),age][,.(base=hosp/pop,age)])[,base:=base*pop][] a[,.(excess=(sum(hosp)/sum(base)-1)*100),type][order(-excess),excess:=round(excess)][]|>print(r=F)
I only considered first doses in the code above. The output shows that the excess hospitalization percent was about -79% for people whose first dose was Pfizer, -77% for Moderna, -71% for AstraZeneca, and -68% for Janssen:
type excesspct Pfizer -79 Moderna -77 Novavax -91 AstraZeneca -71 Janssen -68 Other -100
So in other words unvaccinated people had about 5 times higher age-normalized hospitalization rate than people who got a Pfizer vaccine for the first dose, or Pfizer vaccines had about 79% efficacy in preventing hospitalization relative to the baseline of unvaccinated people.
Novavax got an excess hospitalization percent of -91% above, but it's because people only started getting Novavax vaccines in March 2022 after which there weren't that many people hospitalized for COVID. The CSV file I used to get hospitalization data by vaccine type is missing the date of hospitalization, so I wasn't able to adjust my excess mortality percent for the ongoing month.
For the rare vaccine types that I aggregated under the label "Other", there were zero hospitalizations so the excess hospitalization percent was -100%.
But anyway, my results roughly seem to match the record-level data which shows that during the COVID wave in the last two months of 2021, the spike in all-cause ASMR was higher for Janssen and AstraZeneca vaccines but more flat for Moderna and Pfizer.
One limitation of my analysis is that a large part of the person-days of vaccinated people are during periods with low hospitalizations from mid-2022 up to July 2024 and in the third quarter of 2021. But unvaccinated people have a lot of person-days in the first quarter of 2021 when there was a high number of hospitalizations but many people had not yet been vaccinated.
In June 2024 the Czech authors who uploaded the record-level data to GitHub published a paper where they analyzed data for adverse event reports they received through a freedom of information response: https://onlinelibrary.wiley.com/doi/abs/10.1111/eci.14271. The paper is behind a paywall, but I bought access to it and posted screenshots of the paper here: i/czech-batch-study.jpg. The batch data is available from this spreadsheet: https://github.com/PalackyUniversity/batch-dependent-safety/blob/main/data/SUKL-batches-AE.xlsx. The authors also published a blog post about the paper on the website of the Czech Association of Microbiologists, Immunologists and Statisticians: https://smis-lab.cz/2024/07/03/znepokojujici-rozdily-mezi-sarzemi-covidovych-vakcin/.
In the spreadsheet at GitHub and in the original Czech paper, the number of doses is the number of doses shipped and not the number of doses administered. So new batches where all doses have not yet been administered tend to get a lower rate of deaths per dose, and there might also be some older batches where all doses didn't end up being administered. Schmeling et al.'s infamous study of Danish batch data had the same problem, which might partially explain why they found that newer batches tended to get a lower rate of adverse events per dose than older batches. [https://onlinelibrary.wiley.com/doi/10.1111/eci.13998] Max Schmeling posted this reply to Jessica Rose: "I just wanted to add, that the lot size data, we obtained from The Danish Serum Institute, is in fact the number of doses pr. batch, that were shipped from the Danish Serum Institute to all the Danish vaccination centers. It is perhaps the data, that comes closest to being the real number of administered doses pr. batch, since the shipped doses exclude any doses, the Serum Institute might have in stock." [https://jessicar.substack.com/p/danish-study-on-lot-variation-and] However I don't know if in the Czech data the number of doses is the number of doses that some Czech distribution center received from abroad, or the number of doses that a Czech distribution center shipped to further locations in Czechia. However the latter option seems more likely based on one of the Excel files in the GitHub repository of Palacký University (P1_Prehled propustenych sarzi vakcin proti onemocneni covid-19.xlsx
), where a column for the date of each batch was translated as "Released on, not the same as the date of import" ("Propuštěno dne, není shodné s datem dovozu"), and a total number of doses is titled "Total released". The paper about the batch data says that batch data provided by SUKL included "the date of authorization of the release of each batch", and that "the dataset contained all batches of COVID-19 vaccines released for use from the beginning of the vaccination campaign to 1 March, 2023".
However with the preceding caveat in mind, when I simply used the number of doses released from each batch as the denominator, then in the spreadsheet posted at GitHub, Moderna got a higher rate of adverse event reports and deaths per million than Pfizer, but Pfizer got a higher rate of serious adverse event reports per million than Moderna:
library(data.table);library(readxl) f="SUKL-batches-AE.xlsx" download.file("https://github.com/PalackyUniversity/batch-dependent-safety/raw/main/data/SUKL-batches-AE.xlsx",f) s1=data.table(read_excel(f))[,.(doses=sum(doses_per_batch)),name] s2=data.table(read_excel(f,sheet=2))[,.(reports=sum(no_reports),dead=sum(deaths),serious=sum(serious)),name] me=merge(s1,s2,all=T) name=rep("Other",nrow(me)) name[grep("COMIRNATY",me$name)]="Pfizer" name[grep("MODERNA|SPIKEVAX",me$name)]="Moderna" name[grep("NUVAXOVID",me$name)]="Novavax" name[grep("JANSSEN",me$name)]="Janssen" name[grep("VAXZEVRIA|ASTRAZENECA",me$name)]="AstraZeneca" me$name=name me[is.na(me)]=0 me=me[,.(dead=sum(dead),reports=sum(reports),serious=sum(serious),doses=sum(doses)),name] me=me[,.(name,doses,reports,serious,dead,reportspermil=reports/doses*1e6,seriouspermil=serious/doses*1e6,deadpermil=dead/doses*1e6)] me[order(-doses)]|>dplyr::mutate_if(is.double,round,1)|>print(r=F)
name doses reports serious dead reportspermil seriouspermil deadpermil Pfizer 28343760 10712 5183 134 377.9 182.9 4.7 Moderna 3798150 1575 583 26 414.7 153.5 6.8 AstraZeneca 1113200 1512 695 44 1358.2 624.3 39.5 Janssen 776800 555 241 10 714.5 310.2 12.9 Novavax 632000 14 4 0 22.2 6.3 0.0 Other 64800 18 13 2 277.8 200.6 30.9
If Pfizer is used as the baseline, the number of deaths per dose was about 8 times higher for AstraZeneca and about 3 times higher for Janssen:
> x=data.frame(me,row.names=1);t((t(x)/unlist(x["Pfizer",])))[,5:7]|>round(2) reportspermil seriouspermil deadpermil Pfizer 1.00 1.00 1.00 AstraZeneca 3.59 3.41 8.36 Janssen 1.89 1.70 2.72 Moderna 1.10 0.84 1.45 Novavax 0.06 0.03 0.00 Other 0.73 1.10 6.53
The spreadsheet for the batch data didn't contain information about age or other confounding variables. But in the record-level data when I assigned a random birthday for each person and I counted the person-days for each age up to the end of 2022, the average age was about 50 for people who got a Pfizer vaccine for the first dose but about 56 for Moderna and 68 for AstraZeneca:
> b=fread("http://sars2.net/f/czbucketskeep.csv.gz") > b[dose==1,.(alive=sum(as.double(alive))),.(age,type)][,.(age=weighted.mean(age,alive)),type][order(age)]|>print(r=F) type age Novavax 41.5 Other 44.1 Janssen 47.3 Pfizer 49.7 Moderna 56.0 AstraZeneca 68.4
However when I took the average age for the batches from the record-level data, I got a correlation of only about 0.18 between age and the number of adverse event reports, about 0.19 between age and the number of serious adverse event reports, and about 0.18 between age and the rate of deaths in the adverse event reports:
download.file("http://sars2.net/f/czbucketsbatchkeep.csv.xz","czbucketsbatchkeep.csv.xz") b=fread("xz -dc czbucketsbatchkeep.csv.xz") ages=b[batch!="",.(alive=sum(alive)),.(age,batch)][,.(age=weighted.mean(age,alive)),batch] s1=data.table(read.xlsx(f))[,.(batch=sub("^\t","",batch_id),doses=doses_per_batch)] s2=data.table(read.xlsx(f,sheet=2))[,.(dead=sum(deaths),serious=sum(serious)),batch] me=merge(merge(s1,ages,all=T),s2,all=T) me[is.na(me)]=0 me[doses>0,cor(reports/doses,age)] # [1] 0.1784564 (correlation between age and the rate of adverse event reports) me[doses>0,cor(serious/doses,age)] # [1] 0.1874058 (correlation between age and the rate of serious adverse event reports) me[doses>0,cor(dead/doses,age)] # [1] 0.1825102 (correlation between age and the rate of deaths in adverse event reports)
However the reason why the correlation was fairly low was partially because there's a lot of noise for small batches, and when I only looked at batches with at least 100,000 doses, the correlation increased to about 0.33 for deaths and 0.26 for serious adverse events:
me[doses>=1e5,cor(dead/doses,age)] # [1] 0.3297154 me[doses>=1e5,cor(serious/doses,age)] # [1] 0.2552369
In the following code I calculated an age-normalized mortality rate for each batch in the record-level data. Its correlation was about -0.01 with both the rate of deaths in the spreadsheet and the rate of serious adverse events:
b=fread("xz -dc czbucketsbatchkeep.csv.xz") d=b[,.(dead=sum(dead),alive=sum(alive)),.(batch,age=ifelse(age<15,0,pmin(age,95)))] d=merge(d,d[,.(base=sum(dead)/sum(alive)),.(age)])[,base:=base*alive] d=d[,.(rate=sum(dead)/sum(base)),batch] f="SUKL-batches-AE.xlsx" s1=data.table(read.xlsx(f))[,.(batch=sub("^\t","",batch_id),doses=doses_per_batch)] s2=data.table(read.xlsx(f,sheet=2))[,.(dead=sum(deaths),serious=sum(serious)),batch] merge(d,merge(s1,s2)[doses>0,.(reported=dead/doses),batch])[,cor(rate,reported)] # [1] -0.009126758 merge(d,merge(s1,s2)[doses>0,.(reported=serious/doses),batch])[,cor(rate,reported)] # [1] -0.01475335
From this plot you can also see that the rate of serious adverse event reports has a correlation of close to zero with the excess mortality in the record-level data:
# download.file("http://sars2.net/f/czbucketsbatchkeep.csv.xz","czbucketsbatchkeep.csv.xz") # download.file("https://github.com/PalackyUniversity/batch-dependent-safety/raw/main/data/SUKL-batches-AE.xlsx","SUKL-batches-AE.xlsx") library(data.table);library(openxlsx);library(ggplot2);library(ggallin) kim=\(x)ifelse(x>=1e3,ifelse(x>=1e6,paste0(x/1e6,"M"),paste0(x/1e3,"k")),x) b=fread("xz -dc czbucketsbatchkeep.csv.xz") d=b[,.(dead=sum(dead),alive=sum(alive)),.(batch,month,type,age=ifelse(age<15,0,pmin(age,95)))] d=merge(d,d[,.(base=sum(dead)/sum(alive)),.(age,month)])[,base:=base*alive] d=d[,.(y=(sum(dead)/sum(base)-1)*100),.(batch,type)] f="SUKL-batches-AE.xlsx" s1=data.table(read.xlsx(f))[,.(doses=head(doses_per_batch,1)),.(batch=sub("^\t","",batch_id))] s2=data.table(read.xlsx(f,sheet=2))[,.(x=sum(serious)),batch] p=merge(d,merge(s1,s2)[doses>0,.(x=x/doses*1e6,doses),batch]) xstart=0;xend=max(p$x);ystart=-100;yend=max(p$y)*1.02;ystep=20 ybreak=seq(ystart,yend,ystep) p$type=factor(p$type,p[,sum(as.double(doses)),type][order(-V1)]$type) color=hsv(c(240,355,330,204,36)/360,c(.94,.75,.8,.67,.64),c(.76,.82,.5,.87,.95)) pointsize=10^(3:6) sub="All doses are included and not only first doses. On the y-axis excess mortality was calculated from the day of vaccination up to the end of 2022, where the baseline was the mortality rate among all people who were included in the record-level data for the same combination of age and ongoing month. On the y-axis people remain included under earlier batches after a dose from a new batch, so a death after doses from multiple batches is counted under each batch. Sources: github.com/PalackyUniversity/uzis-data-analysis, github.com/PalackyUniversity/batch-dependent-safety."|>stringr::str_wrap(128) ggplot(p,aes(x,y))+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_hline(yintercept=c(ystart,0,yend),linewidth=.3,lineend="square")+ geom_point(aes(color=type,size=doses),stroke=0)+ # geom_text(aes(label=batch,color=type),size=2.3,vjust=-.6,show.legend=F)+ ggrepel::geom_text_repel(aes(label=batch,color=type),size=2.3,show.legend=F,max.overlaps=Inf,segment.size=.2,min.segment.length=.2,force=10,force_pull=2,box.padding=.1)+ labs(title="Batches in Czech record-level data compared to serious adverse event reports",x="Serious adverse event reports per million doses",y="Excess mortality percent in record-level data",subtitle=sub)+ scale_x_continuous(trans=pseudolog10_trans,limits=c(xstart,xend),breaks=c(0,outer(c(1,3),10^(0:9))),labels=kim)+ scale_y_continuous(limits=c(ystart,yend),breaks=ybreak,labels=\(x)paste0(x,"%"))+ scale_color_manual(values=color,name="Type")+ scale_size_continuous(range=c(.7,2.7),limits=range(pointsize),breaks=pointsize,labels=kim,name="Doses")+ coord_cartesian(clip="off",expand=F)+ guides(color=guide_legend(order=1),size=guide_legend(order=2))+ theme(axis.text=element_text(size=8,color="black"), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(4,"pt"), axis.title=element_text(size=8,face="bold"), axis.title.x=element_text(margin=margin(4)), axis.title.y=element_text(margin=margin(,,,1)), legend.background=element_rect(fill="white",color="black",linewidth=.3), legend.key.height=unit(8,"pt"), legend.key.width=unit(6,"pt"), legend.key=element_rect(fill="white"), legend.margin=margin(6,8,6,8), legend.direction="horizontal", legend.box="horizontal", legend.position="top", legend.box.margin=margin(), legend.box.spacing=unit(0,"pt"), legend.spacing=unit(0,"pt"), legend.text=element_text(size=8), legend.title=element_text(size=8,face="bold",margin=margin(,2)), panel.background=element_rect(fill="white"), panel.grid=element_blank(), plot.margin=margin(5,12,5,5), plot.subtitle=element_text(size=8,margin=margin(,,4)), plot.title=element_text(size=9,face="bold",margin=margin(2,,5))) ggsave("0.png",width=7,height=4.8,dpi=380*4) system("magick 0.png -resize 25% 1.png")
In the following code I compared the number of doses administered from each batch in the record-level data to doses released in the spreadsheet for SUKL data. I only included doses administered up to July 4th 2023, which was the date when the authors of the batch study said they received the data from SUKL. A couple of batches had multiple rows in the spreadsheet because the batches included multiple sets of packages that were authorized on different dates, but I used the earliest date as the date of each batch. I didn't attempt to resolve incorrectly typed batch IDs, except I only converted all batch IDs to uppercase and I removed extra spaces from the batch IDs.
About a third of all batches had less than 10% of doses administered. There were less than 20% doses administered in almost all batches with an authorization date in 2022, which probably explains why the new batches got such a low ratio of deaths per doses released. 15 batches had more doses administered than doses released by SUKL:
download.file("https://github.com/PalackyUniversity/batch-dependent-safety/raw/main/data/SUKL-batches-AE.xlsx","SUKL-batches-AE.xlsx") t=data.table(readxl::read_excel("SUKL-batches-AE.xlsx",sheet=1)) t=t[,.(released=sum(doses_per_batch),date=min(date_authorization)),.(batch=batch_id,type=name)] rec=fread("CR_records.csv") k=grep("Datum_",colnames(rec));k2=k+1 d=data.table(date=`class<-`(unlist(rec[,..k],,F),"Date"),batch0=unlist(rec[,..k2],,F))[!is.na(date)] d=d[,batch:=toupper(trimws(batch0)),batch0][date<="2023-7-4",.(administered=.N),batch] t=merge(t,d) t[,pct:=round(administered/released*100)][order(pct)]|>print(nrow=Inf)
batch type released date administered pct 1: 000202A SPIKEVAX 39600 2022-01-20 67 0 2: 000256A SPIKEVAX 24000 2022-02-16 42 0 3: 000382A SPIKEVAX 57600 2022-05-18 33 0 4: 000384A SPIKEVAX 56400 2022-05-16 3 0 5: 000388A SPIKEVAX 56400 2022-05-09 17 0 6: 016G21A SPIKEVAX 51600 2021-10-13 111 0 7: 200106A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.1 121000 2022-10-05 6 0 8: 226011 SPIKEVAX 48000 2022-03-30 150 0 9: 36660TB COMIRNATY 253890 2022-04-20 1042 0 10: 400056A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.4-5 20400 2023-01-19 21 0 11: 400057A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.4-5 8450 2023-01-31 1 0 12: 4301MF012 NUVAXOVID 90000 2022-03-07 131 0 13: 4301MF015 NUVAXOVID 90000 2022-03-15 1 0 14: 4302MF002 NUVAXOVID 120000 2022-07-01 195 0 15: 4302MF016 NUVAXOVID 200000 2022-07-22 732 0 16: ABY4055 VAXZEVRIA 122400 2021-06-28 10 0 17: ACA4541 COVID-19 VACCINE JANSSEN 55200 2021-09-15 1 0 18: FL4574 COMIRNATY 306540 2021-11-16 15 0 19: FP9632 COMIRNATY 411840 2022-02-03 1245 0 20: FR8477 COMIRNATY DISP 455040 2022-04-25 2 0 21: GG3683 COMIRNATY BABY 52800 2022-10-31 180 0 22: GJ2636 COMIRNATY ORIGINAL/OMICRON BA.4-5 57600 2022-11-07 12 0 23: GJ2638 COMIRNATY ORIGINAL/OMICRON BA.4-5 259200 2022-11-07 1 0 24: GJ7179 COMIRNATY ORIGINAL/OMICRON BA.4-5 786240 2022-10-31 3090 0 25: GK4367 COMIRNATY ORIGINAL/OMICRON BA.4-5 1177920 2022-11-14 4615 0 26: GK7826 COMIRNATY ORIGINAL/OMICRON BA.4-5 126720 2022-10-24 2 0 27: MV1041A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.4-5 16900 2023-02-13 23 0 28: PCA0075 COMIRNATY 87750 2022-02-28 273 0 29: PCA0081 COMIRNATY 8190 2022-04-25 11 0 30: PCA0082 COMIRNATY 304200 2022-04-25 106 0 31: PCB0016 COMIRNATY 143910 2022-03-07 6 0 32: PCB0017 COMIRNATY 285480 2022-03-07 1110 0 33: SDYY1 COMIRNATY 36270 2022-05-23 17 0 34: 000162A SPIKEVAX 24600 2022-02-01 144 1 35: 150001 COMIRNATY 36270 2022-04-20 249 1 36: ACA5778 COVID-19 VACCINE JANSSEN 96000 2021-09-29 1140 1 37: FN4074 COMIRNATY PED 32400 2022-01-14 164 1 38: FR9187 COMIRNATY DISP 161280 2022-05-23 1871 1 39: GD6803 COMIRNATY ORIGINAL/OMICRON BA.1 659520 2022-09-12 4414 1 40: GJ2639 COMIRNATY ORIGINAL/OMICRON BA.4-5 357120 2022-10-18 3430 1 41: GL6799 COMIRNATY ORIGINAL/OMICRON BA.4-5 PED 28800 2022-11-21 360 1 42: PCA0003 COMIRNATY 212940 2021-10-25 2046 1 43: 000251A SPIKEVAX 108000 2022-04-21 2061 2 44: GE0694 COMIRNATY PED 57600 2022-10-03 1148 2 45: 000082A SPIKEVAX 157200 2021-11-16 3979 3 46: 000266A SPIKEVAX 19200 2022-03-23 502 3 47: 000371A SPIKEVAX 159600 2022-06-07 5035 3 48: 000397A SPIKEVAX 158400 2022-06-02 4224 3 49: 210155 VAXZEVRIA 148800 2021-06-29 4968 3 50: 223034 SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.1 28000 2022-10-14 706 3 51: FP8748 COMIRNATY DISP 354240 2022-03-18 10518 3 52: FR3566 COMIRNATY DISP 345600 2022-05-23 11495 3 53: MV1013A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.1 52800 2022-09-26 1472 3 54: 000090A SPIKEVAX 31200 2022-01-26 1325 4 55: FP9604 COMIRNATY 193050 2022-02-03 8588 4 56: FR5493 COMIRNATY DISP 247680 2022-02-10 10595 4 57: 000139A SPIKEVAX 117700 2021-12-07 6425 5 58: 1F1053A COMIRNATY 81900 2022-02-28 4371 5 59: GJ2631 COMIRNATY ORIGINAL/OMICRON BA.4-5 354240 2022-10-11 15942 5 60: 000358A SPIKEVAX 56400 2022-05-02 3122 6 61: 200090A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.1 48000 2022-09-14 2945 6 62: 000187A SPIKEVAX 39600 2022-01-14 2931 7 63: 4301MF011 NUVAXOVID 132000 2022-02-24 9483 7 64: 000275A SPIKEVAX 19200 2022-03-22 1460 8 65: 200038A SPIKEVAX BIVALENT ORIGINAL/OMICRON BA.1 40800 2022-09-07 3097 8 66: 1F1045A COMIRNATY 429390 2022-01-31 39278 9 67: FN5988 COMIRNATY PED 25200 2022-02-10 2367 9 68: FP8234 COMIRNATY 444600 2022-01-12 42560 10 69: GJ1988 COMIRNATY ORIGINAL/OMICRON BA.4-5 354240 2022-10-03 33815 10 70: 093F21A SPIKEVAX 52800 2021-11-01 5851 11 71: XD955 COVID-19 VACCINE JANSSEN 144000 2021-04-12 16440 11 72: 000239A SPIKEVAX 25200 2022-02-24 2927 12 73: 226004 SPIKEVAX 19200 2022-03-09 2487 13 74: FP8956 COMIRNATY DISP 587520 2022-02-21 82849 14 75: 1K082A COMIRNATY 236340 2022-02-28 38074 16 76: GD6798 COMIRNATY ORIGINAL/OMICRON BA.1 662400 2022-09-06 129668 20 77: GH9851 COMIRNATY ORIGINAL/OMICRON BA.4-5 650880 2022-09-26 132692 20 78: 216046 SPIKEVAX 117400 2021-12-15 24119 21 79: 216041 SPIKEVAX 116400 2021-11-30 30947 27 80: FN4071 COMIRNATY PED 43200 2022-01-10 11736 27 81: FN5519 COMIRNATY 625950 2021-12-30 185131 30 82: 000058A SPIKEVAX 187200 2021-11-05 59856 32 83: 3004738 SPIKEVAX 63600 2021-07-26 24769 39 84: ACA5775 COVID-19 VACCINE JANSSEN 55200 2021-09-21 21538 39 85: GH9722 COMIRNATY ORIGINAL/OMICRON BA.4-5 650880 2022-09-19 255011 39 86: FP1972 COMIRNATY DISP 233280 2022-01-24 94822 41 87: 216040 SPIKEVAX 116400 2021-11-24 53447 46 88: FN4072 COMIRNATY PED 187200 2021-12-10 86197 46 89: 000039A SPIKEVAX 30100 2021-11-18 14110 47 90: 3005238 SPIKEVAX 62400 2021-07-29 29931 48 91: ACA6111 COVID-19 VACCINE JANSSEN 69600 2021-10-06 35290 51 92: FM9088 COMIRNATY 152100 2022-01-10 81589 54 93: 300042698 COVID-19 VACCINE MODERNA 8400 2021-01-11 4938 59 94: PCB0001 COMIRNATY 305370 2021-11-01 182391 60 95: FA4597 COMIRNATY 7020 2021-05-14 4305 61 96: 300042721 COVID-19 VACCINE MODERNA 36000 2021-02-01 22193 62 97: 3004218 SPIKEVAX 81600 2021-06-29 51654 63 98: PCA0002 COMIRNATY 212940 2021-10-20 136568 64 99: 090F21A SPIKEVAX 52800 2021-10-22 34583 65 100: FL4213 COMIRNATY 305370 2021-11-11 198003 65 101: EW4815 COMIRNATY 286650 2021-04-12 198610 69 102: ABZ5320 COVID-19 VACCINE JANSSEN 24000 2021-09-06 16707 70 103: FF2832 COMIRNATY 391950 2021-08-06 287691 73 104: 3000493 COVID-19 VACCINE MODERNA 44400 2021-02-24 32938 74 105: 3000496 COVID-19 VACCINE MODERNA 55800 2021-03-09 41458 74 106: FA5829 COMIRNATY 5850 2021-05-04 4303 74 107: XE423 COVID-19 VACCINE JANSSEN 14400 2021-07-26 11055 77 108: 21C17-05 COVID-19 VACCINE JANSSEN 12000 2021-06-08 9371 78 109: FE7051 COMIRNATY 359190 2021-07-26 278701 78 110: 3001531 COVID-19 VACCINE MODERNA 88800 2021-03-23 70452 79 111: 3004493 SPIKEVAX 63600 2021-07-08 50310 79 112: FE8244 COMIRNATY 359190 2021-07-19 285346 79 113: FM7533 COMIRNATY 1074060 2021-12-09 846426 79 114: 21C16-05 COVID-19 VACCINE JANSSEN 12000 2021-06-01 9552 80 115: ABX4044 VAXZEVRIA 144000 2021-05-27 115756 80 116: ABY1328 VAXZEVRIA 28800 2021-06-21 23077 80 117: FM7785 COMIRNATY 57330 2021-12-06 45638 80 118: XD974 COVID-19 VACCINE JANSSEN 26400 2021-05-26 21512 81 119: 210105 VAXZEVRIA 19200 2021-06-22 15721 82 120: FG4509 COMIRNATY 391950 2021-08-02 320080 82 121: XD986 COVID-19 VACCINE JANSSEN 38400 2021-05-21 31673 82 122: 1F1020A COMIRNATY 754650 2021-09-29 623216 83 123: 3002920 COVID-19 VACCINE MODERNA 70800 2021-06-03 59077 83 124: 3003606 COVID-19 VACCINE MODERNA 75600 2021-06-18 63554 84 125: XE393 COVID-19 VACCINE JANSSEN 174400 2021-06-29 146119 84 126: FG3739 COMIRNATY 333450 2021-08-31 282062 85 127: 3003601 COVID-19 VACCINE MODERNA 70800 2021-06-14 61750 87 128: 3004670 SPIKEVAX 63600 2021-07-15 56079 88 129: EL0725 COMIRNATY 76050 2021-02-01 67048 88 130: EX0893 COMIRNATY 129870 2021-04-19 114437 88 131: EX3599 COMIRNATY 69030 2021-04-14 60465 88 132: FF0680 COMIRNATY 358020 2021-07-01 315866 88 133: FF3318 COMIRNATY 359190 2021-07-09 315524 88 134: FK0594 COMIRNATY 306540 2021-11-23 268872 88 135: EY3014 COMIRNATY 404820 2021-04-26 361281 89 136: 3002916 COVID-19 VACCINE MODERNA 70800 2021-05-27 63606 90 137: EJ6790 COMIRNATY 77220 2021-02-08 69716 90 138: ET7205 COMIRNATY 163800 2021-03-22 146818 90 139: EW2239 COMIRNATY 163800 2021-03-29 146766 90 140: 210102 VAXZEVRIA 38400 2021-06-14 34991 91 141: ABW4801 VAXZEVRIA 158400 2021-04-23 144504 91 142: EW2246 COMIRNATY 285480 2021-04-06 258562 91 143: FA4598 COMIRNATY 431730 2021-05-10 390988 91 144: FD4555 COMIRNATY 278460 2021-06-15 254568 91 145: FE6208 COMIRNATY 670410 2021-06-28 607290 91 146: SDCN1 COMIRNATY 77220 2021-10-05 70208 91 147: EP2163 COMIRNATY 91260 2021-02-15 84102 92 148: EY7015 COMIRNATY 424710 2021-05-03 389369 92 149: FC2473 COMIRNATY 432900 2021-05-24 398696 92 150: FD6840 COMIRNATY 897390 2021-06-21 823343 92 151: FL5324 COMIRNATY 211770 2021-10-13 194289 92 152: 21C11-05 COVID-19 VACCINE JANSSEN 12000 2021-05-07 11110 93 153: ABX3519 VAXZEVRIA 38400 2021-05-18 35887 93 154: ET1831 COMIRNATY 236340 2021-03-01 219680 93 155: ET3620 COMIRNATY 223470 2021-03-08 208756 93 156: ET6956 COMIRNATY 156780 2021-04-19 145733 93 157: FD1921 COMIRNATY 662220 2021-05-31 619097 93 158: FE1248 COMIRNATY 271440 2021-06-15 252428 93 159: SCTR1 COMIRNATY 335790 2021-09-06 311533 93 160: ABV2856 COVID-19 VACCINE ASTRAZENECA 19200 2021-02-05 18086 94 161: EJ6797 COMIRNATY 144300 2021-01-08 136208 94 162: EP2166 COMIRNATY 93600 2021-02-22 88414 94 163: FC0681 COMIRNATY 423540 2021-05-14 399653 94 164: FD0168 COMIRNATY 549900 2021-06-07 516021 94 165: 3001939 COVID-19 VACCINE MODERNA 52200 2021-04-26 49611 95 166: 3002546 COVID-19 VACCINE MODERNA 70800 2021-05-20 67509 95 167: 31100TB COMIRNATY 134550 2021-10-05 127842 95 168: ABX5105 VAXZEVRIA 28800 2021-05-11 27236 95 169: 3001659 COVID-19 VACCINE MODERNA 74400 2021-04-06 71200 96 170: ABV6096 COVID-19 VACCINE ASTRAZENECA 36000 2021-02-24 34691 96 171: 3002339 COVID-19 VACCINE MODERNA 70800 2021-05-12 68752 97 172: XD975 COVID-19 VACCINE JANSSEN 19200 2021-05-13 18818 98 173: 1F1005A COMIRNATY 224640 2021-09-15 222765 99 174: 1F1006A COMIRNATY 217620 2021-09-21 215683 99 175: EK9788 COMIRNATY 47775 2021-01-25 47106 99 176: FE2296 COMIRNATY 124020 2021-06-28 122996 99 177: 3002330 COVID-19 VACCINE MODERNA 70800 2021-05-04 71061 100 178: 3005242 SPIKEVAX 62400 2021-08-06 62551 100 179: ABV5811 COVID-19 VACCINE ASTRAZENECA 61100 2021-02-16 61223 100 180: 3001943 COVID-19 VACCINE MODERNA 48000 2021-04-27 48695 101 181: ABV4678 COVID-19 VACCINE ASTRAZENECA 38400 2021-02-10 38961 101 182: EL1491 COMIRNATY 69225 2021-01-05 69647 101 183: FA7082 COMIRNATY 2340 2021-05-14 2362 101 184: 3003659 COVID-19 VACCINE MODERNA 80400 2021-06-24 82070 102 185: 21C18-05 COVID-19 VACCINE JANSSEN 12000 2021-06-16 12449 104 186: ABW9941 VAXZEVRIA 14400 2021-04-12 15030 104 187: ABV7764 COVID-19 VACCINE ASTRAZENECA 9600 2021-03-23 10192 106 188: ABW2187 COVID-19 VACCINE ASTRAZENECA 98700 2021-03-29 104137 106 189: ABX3502 VAXZEVRIA 33600 2021-04-16 35618 106 190: SCWF7 COMIRNATY 8190 2021-09-21 8660 106 191: EL1484 COMIRNATY 19500 2020-12-29 20930 107 192: ABV5297 COVID-19 VACCINE ASTRAZENECA 9600 2021-03-16 10350 108 193: ABV8139 COVID-19 VACCINE ASTRAZENECA 15000 2021-03-08 16178 108 194: EJ6796 COMIRNATY 9750 2020-12-23 10549 108 195: ABW1277 COVID-19 VACCINE ASTRAZENECA 50400 2021-03-03 54872 109 196: 21C10-05 COVID-19 VACCINE JANSSEN 12000 2021-04-26 13277 111 197: 000023A SPIKEVAX 1200 2021-10-15 1463 122 198: EJ6134 COMIRNATY 19500 2021-01-08 33555 172 199: EX8680 COMIRNATY 0 2021-05-27 1507 Inf batch type released date administered pct
Here's a plot of the same data:
The number of doses per vial is listed as 5 for the 8 Pfizer lots with the earliest authorization date but as 6 for all later Pfizer lots (except for some pediatric lots which have 10 doses per vial). The reason why some of the early lots have more doses administered than released might be if the vials were divided into 6 instead of 5 doses. A note in a New Mexico COVID dashboard said: "Doses administered may occasionally exceed doses received; in some cases, providers have been able to use six doses of Pfizer – not five – from each vial (or 11 doses, not 10, for Moderna)." [http://web.archive.org/web/20210728001738/https://cvvaccine.nmhealth.org/public-dashboard.html] A Canadian article from April 2021 said: "One positive report Thursday is that provinces have had no trouble getting six doses out of every vial of Pfizer instead of five. Health Canada adjusted the number of doses per vial at Pfizer's request in February, but getting the extra doses requires the use of a special syringe that lets less vaccine go to waste." [http://web.archive.org/web/20210401223024/https://www.ctvnews.ca/health/coronavirus/five-million-canadians-now-have-at-least-one-dose-as-vaccine-rollout-gathers-steam-1.5371919] A New York Times article from December 2020 said: "Pharmacists have found that they can squeeze an additional dose from some of the glass vials that were supposed to contain five doses of the Pfizer vaccine". [https://www.nytimes.com/2020/12/16/health/Covid-Pfizer-vaccine-extra-doses.html] However there might also be other reasons why some batches have more doses administered than released, because Europe CDC's vaccine tracker included this note under Czechia: "The number of doses administered may be greater than the number of doses distributed to the country, possibly due to under-reporting of vaccine supplies by healthcare providers." [https://thedailybeagle.substack.com/p/solving-the-swedish-mystery?open=false#§countries-admit-to-fraudulent-shot-administration-data]
WelcomeTheEagle made a Tableau dashboard for Schmeling's data, so I asked him where he got the data, but he posted a screenshot which showed the percentage of adverse event reports per batch and wrote: "I never saw the raw data, only this summary. I'm still missing one green lot# (next to last). There is still blue, green, yellow lots coming into VAERS. Yellow lots are not harmless as the Dutch claim. They were just the newest lots at the time...." [https://x.com/welcometheeagle/status/1818662954604413265] I compiled the data from the screenshot into this file (where there's one batch missing that wasn't visible in the screenshot): f/schmeling.csv.
There's only 9 batches that are included in both Schmeling's study and the Czech spreadsheet, which are all Pfizer batches. I thought that if I would've calculated a ratio of adverse event reports per doses administered instead of per doses released for the Czech batches, it would've resulted in a big increase in the ratio of the newer batches, but actually the results didn't change that much. It might be because only 2 of the 9 batches had an authorization date after the first half of 2021, and they both had a very low number of adverse event reports. But the other 7 batches had a high percentage of doses administered out of doses released:
rec=fread("CR_records.csv") s1=data.table(readxl::read_excel("SUKL-batches-AE.xlsx",sheet=1)) s2=data.table(readxl::read_excel("SUKL-batches-AE.xlsx",sheet=2)) t=s1[,.(released=sum(doses_per_batch),date=min(date_authorization)),.(batch=batch_id,type=name)] t=merge(t,s2[,.(batch,no_reports)]) k=grep("Datum_",colnames(rec));k2=k+1 d=data.table(date=`class<-`(unlist(rec[,..k],,F),"Date"),batch0=unlist(rec[,..k2],,F))[!is.na(date)] d=d[,batch:=toupper(trimws(batch0)),batch0][date<="2023-7-4",.(administered=.N),batch] t=merge(t,d) t[,pct:=round(administered/released*100)] dk=fread("http://sars2.net/f/schmeling.csv") t=merge(t,dk[,.(batch,dk=adverse/doses)]) t[,.(batch,czdate=date,czadminpership=round(administered/released*100),czpership=round(no_reports/released*1e6),czperadmin=round(no_reports/administered*1e6),dkpership=round(dk*1e6))][order(czdate)]|>print(r=F)
Here's the output of the code above formatted as a table:
Pfizer batch ID |
Czech release date |
Czech doses administered per doses released |
Czech adverse event reports per million doses released |
Czech adverse event reports per million doses administered |
Danish adverse event reports per million doses released |
---|---|---|---|---|---|
EJ6796 | 2020-12-23 | 108% | 5538 | 5119 | 184615 |
EJ6134 | 2021-01-08 | 172% | 4359 | 2533 | 119319 |
EJ6797 | 2021-01-08 | 94% | 2190 | 2320 | 163825 |
EK9788 | 2021-01-25 | 99% | 2784 | 2823 | 100597 |
EJ6790 | 2021-02-08 | 90% | 1140 | 1262 | 35096 |
ET7205 | 2021-03-22 | 90% | 1038 | 1158 | 6838 |
FD4555 | 2021-06-15 | 91% | 269 | 295 | 2404 |
FN5519 | 2021-12-30 | 30% | 16 | 54 | 0 |
FM9088 | 2022-01-10 | 54% | 39 | 74 | 0 |
Previously I calculated the correlation with excess mortality in the record-level data for deaths per doses released, but now I looked at deaths per doses administered instead. But the correlation was still only about 0.05, and when I looked at adverse event reports or serious adverse event reports instead of deaths, the correlation became negative. The authors of the batch study wrote that they received the data from SUKL on July 4th 2023, so I only included doses administered up to that date:
rec=fread("CR_records.csv") buck=fread("xz -dc czbucketsbatchkeep.csv.xz") t=data.table(readxl::read_excel("SUKL-batches-AE.xlsx",sheet=2)) k=grep("Datum_",colnames(rec));k2=k+1 d=data.table(date=`class<-`(unlist(rec[,..k],,F),"Date"),batch0=unlist(rec[,..k2],,F))[!is.na(date)] d=d[,batch:=toupper(trimws(batch0)),batch0][date<="2023-7-4",.(administered=.N),batch] t=merge(t,d) b=buck[,.(dead=sum(dead),alive=sum(alive)),.(batch,age=ifelse(age<15,0,pmin(age,95)))] b=merge(b,b[,.(base=sum(dead)/sum(alive)),.(age)])[,base:=base*alive] t=merge(t,b[,.(rate=sum(dead)/sum(base)),batch]) t[,cor(rate,deaths/administered)] # deaths per doses administered # [1] 0.050869 t[,cor(rate,no_reports/administered)] # adverse event reports per doses administered # [1] -0.08735131 t[,cor(rate,serious/administered)] # serious adverse event reports per doses administered # [1] -0.2521826
Previously I got a higher rate of serious adverse events per dose for Pfizer than Moderna when I used doses released as the denominator. However based on the record-level data, the percentage of doses administered out of doses released was about 55% for Pfizer but 43% for Moderna. So now when I used doses administered as the denominator instead, the rate of serious adverse events per dose became higher for Moderna than Pfizer (but my calculation was still not normalized for differences in the age distribution):
library(data.table);library(readxl) rename=\(x){u=unique(x);o=rep("Other",length(u)) o[grep("comirnaty|pfizer",u,T)]="Pfizer" o[grep("moderna|spikevax",u,T)]="Moderna" o[grep("nuvaxovid|novavax",u,T)]="Novavax" o[grep("janssen",u,T)]="Janssen" o[grep("vaxzevria|astrazeneca",u,T)]="AstraZeneca" o[match(x,u)]} f="SUKL-batches-AE.xlsx" download.file("https://github.com/PalackyUniversity/batch-dependent-safety/raw/main/data/SUKL-batches-AE.xlsx",f) s1=data.table(read_excel(f))[,.(released=sum(doses_per_batch)),.(type=rename(name))] s2=data.table(read_excel(f,sheet=2))[,.(reports=sum(no_reports),serious=sum(serious),dead=sum(deaths)),.(type=rename(name))] rec=fread("CR_records.csv") k=grep("Datum_",colnames(rec));k2=k+3 d=data.table(date=`class<-`(unlist(rec[,..k],,F),"Date"),type=rename(unlist(rec[,..k2],,F))) d=d[!is.na(date)&date<="2023-7-4",.(admin=.N),type] o=merge(merge(s1,s2),d)[,.(type,released,admin,adminperreleasedpct=admin/released*100,reports,serious,dead)] o[,reportsperadmin:=reports/admin*1e6] o[,seriousperadmin:=serious/admin*1e6] o[,deadperadmin:=dead/admin*1e6] o[order(-admin)]|>mutate_if(is.double,round)|>print(r=F)
Here's the output formatted as a table:
Manufacturer |
Doses released |
Doses administered |
Administered per released percent | Reports |
Serious reports | Deaths |
Reports per million administered |
Serious per million administered |
Deaths per million administered |
---|---|---|---|---|---|---|---|---|---|
Pfizer | 28343760 | 15680944 | 55 | 10712 | 5183 | 134 | 683 | 331 | 9 |
Moderna | 3798150 | 1643159 | 43 | 1575 | 583 | 26 | 959 | 355 | 16 |
AstraZeneca | 1113200 | 887601 | 80 | 1512 | 695 | 44 | 1703 | 783 | 50 |
Janssen | 776800 | 415077 | 53 | 555 | 241 | 10 | 1337 | 581 | 24 |
Novavax | 632000 | 11190 | 2 | 14 | 4 | 0 | 1251 | 357 | 0 |
Other | 64800 | 765 | 1 | 18 | 13 | 2 | 23529 | 16993 | 2614 |
The Czech MoH has published CSV files which show the daily number of COVID deaths by vaccination status: https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19. But unfortunately the files are missing age groups, so in order to estimate the percentage of unvaccinated people in the age groups which accounted for the COVID deaths, I calculated a weighted average for the percentage of unvaccinated people in each age group where the weight was the total number of COVID deaths the age group had in 2020-2023. It gave me a figure of only about 15-16% unvaccinated people from December 2021 onwards.
In the code below the number of COVID deaths in each age group is compared to the percentage of unvaccinated people by age group in December 2022. You can see that most COVID deaths are in the 5-year age groups between 65 and 94 which all have below 20% unvaccinated people:
> icd=fread("http://sars2.net/f/czicd.csv.gz")[icd=="U07",.(coviddead=sum(dead)),.(age=ifelse(age<20,0,age))] > b=fread("http://sars2.net/f/czbucketsdaily.csv.gz")[date>="2022-12-01",.(alive=sum(alive)),.(age=ifelse(age<20,0,pmin(age,95)%/%5*5),unvax=dose==0)] > merge(icd,b[order(age)][,.(unvaxpct=round(alive[unvax==T]/sum(alive)*100)),age])|>print(r=F) age coviddead unvaxpct 0 14 79 20 8 28 25 37 33 30 50 36 35 97 35 40 210 30 45 420 26 50 625 24 55 1169 23 60 2172 20 65 3972 15 70 6771 12 75 7834 11 80 7154 11 85 6347 12 90 3626 19 95 1013 38
The following code calculates an excess mortality percentage among unvaccinated people relative to the weighted percentage of unvaccinated people in the whole population:
system("wget onemocneni-aktualne.mzcr.cz/api/v2/covid-19/{ockovani-umrti,umrti}.csv sars2.net/f/cz{icd,bucketsdaily}.csv.gz") u=fread("ockovani-umrti.csv") u=u[,.(totaldead=sum(zemreli_celkem),unvaxdead=sum(zemreli_bez_ockovani)),.(month=substr(datum,1,7))] u[,unvaxdeadpct:=unvaxdead/totaldead*100] um=fread("umrti.csv") u=rbind(um[,.(totaldead=.N,unvaxdead=.N,unvaxdeadpct=100),.(month=substr(datum,1,7))][month<2021],u) icd=fread("czicd.csv.gz")[icd=="U07",.(dead=sum(dead)),.(age=pmax(age,20))] b=fread("czbucketsdaily.csv.gz") uv=b[,.(alive=sum(as.double(alive))),.(age=pmin(95,pmax(age,20))%/%5*5,month=substr(date,1,7),dose)] uv=uv[,.(unvaxpct=100*alive[dose==0]/sum(alive)),.(month,age)] uv=merge(icd,uv)[,.(unvaxpoppct=weighted.mean(unvaxpct,dead)),.(month)] uv=rbind(uv,u[!month%in%uv$month,.(month,unvaxpoppct=uv[month=="2022-12",unvaxpoppct])]) me=merge(u,uv)[,ratio:=round(unvaxdeadpct/unvaxpoppct,1)] me[,4:5]=round(me[,4:5]);print(me,r=F)
The output shows that from May 2021 onwards the percentage of COVID deaths in unvaccinated people was about 2 to 4 times higher than the weighted percentage of the unvaccinated population size:
month totaldead unvaxdead unvaxdeadpct unvaxpoppct ratio 2020-03 35 35 100 100 1.0 2020-04 207 207 100 100 1.0 2020-05 76 76 100 100 1.0 2020-06 29 29 100 100 1.0 2020-07 35 35 100 100 1.0 2020-08 45 45 100 100 1.0 2020-09 248 248 100 100 1.0 2020-10 2929 2929 100 100 1.0 2020-11 5004 5004 100 100 1.0 2020-12 3409 3409 100 100 1.0 2021-01 4858 4843 100 97 1.0 2021-02 4167 4055 97 86 1.1 2021-03 6100 5821 95 66 1.4 2021-04 2586 2317 90 47 1.9 2021-05 677 580 86 30 2.8 2021-06 83 68 82 24 3.4 2021-07 16 12 75 22 3.5 2021-08 29 20 69 20 3.4 2021-09 48 31 65 19 3.4 2021-10 341 188 55 19 2.9 2021-11 2467 1388 56 17 3.2 2021-12 2999 1749 58 16 3.6 2022-01 1043 677 65 15 4.2 2022-02 1452 736 51 15 3.3 2022-03 920 439 48 15 3.2 2022-04 467 197 42 15 2.8 2022-05 112 47 42 15 2.8 2022-06 26 10 38 15 2.6 2022-07 198 68 34 15 2.3 2022-08 341 102 30 15 2.0 2022-09 271 93 34 15 2.3 2022-10 470 170 36 15 2.5 2022-11 214 81 38 15 2.6 2022-12 264 85 32 15 2.2 2023-01 159 48 30 15 2.1 2023-02 135 49 36 15 2.5 2023-03 215 67 31 15 2.1 2023-04 99 21 21 15 1.5 2023-05 34 9 26 15 1.8 2023-06 5 0 0 15 0.0 2023-07 3 1 33 15 2.3 2023-08 6 3 50 15 3.4 2023-09 30 12 40 15 2.7 2023-10 84 17 20 15 1.4 2023-11 143 44 31 15 2.1 2023-12 293 93 32 15 2.2 2024-01 120 32 27 15 1.8 2024-02 23 5 22 15 1.5 2024-03 8 3 38 15 2.6 2024-04 0 0 NaN 15 NaN 2024-05 2 0 0 15 0.0 2024-06 2 1 50 15 3.4 2024-07 0 0 NaN 15 NaN month totaldead unvaxdead unvaxdeadpct unvaxpoppct ratio
The date of hospitalization was missing from the CSV file I used to get hospitalizations by vaccine type in a previous section, but there's another file which includes the week of hospitalization but not vaccine type: https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-hospitalizace-tyden.csv.
I used the file to make the plot below, which shows that during the COVID wave in December 2021, unvaccinated people had about 6 times higher age-standardized hospitalization rate than vaccinated people. However the difference between the rates got smaller over time, which may have been because unvaccinated people acquired natural immunity over time. So by December 2023 the rate was only about 1.5 times higher in unvaccinated people than in vaccinated people:
library(data.table);library(ggplot2) # system("wget onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-hospitalizace-tyden.csv sars2.net/f/cz{bucketsdaily.csv.gz,census2021pop.csv}") t=fread("ockovani-hospitalizace-tyden.csv") t=t[,.(date=tyden_od+3,hosp=c(hospitalizovani_celkem-hospitalizovani_bez_ockovani,hospitalizovani_bez_ockovani),status=rep(c("Vaccinated","Unvaccinated"),each=.N),age=as.integer(sub("[-+].*","",vekova_skupina)))] t[age<25,age:=0] a=t[,spline(date,hosp/7,xout=min(t$date):max(t$date)),.(age,status)][,.(hosp=sum(y)),.(age,status,month=substr(as.IDate(x),1,7))] b=fread("czbucketsdaily.csv.gz")[,.(pop=sum(as.double(alive))),.(month=substr(date,1,7),status=ifelse(dose==0,"Unvaccinated","Vaccinated"),age=ifelse(age<25,0,pmin(age,90)%/%5*5))] last=b[month=="2022-12",2:4] b=rbind(b,cbind(last,month=rep(setdiff(a$month,b$month),each=nrow(last)))[,pop:=pop/31*days_in_month(paste0(month,"-1"))]) me=merge(a,b) me=merge(me,me[,.(base=sum(hosp)/sum(pop)),age])[,base:=base*pop] me=merge(me,fread("czcensus2021pop.csv")[,.(stdpop=sum(pop)),.(age=ifelse(age<25,0,pmin(age,95)%/%5*5))]) me[,month:=as.Date(paste0(month,"-1"))] p=me[,.(hosp=sum(hosp/pop*stdpop/sum(stdpop)*365e5)),.(status,month)] xstart=as.Date("2021-1-1");xend=as.Date("2024-8-1") xlab=c(rbind("",2021:2024),"") xbreak=sort(c(as.Date("2024-4-15"),xend,seq(xstart,as.Date("2024-1-1"),"6 month"))) cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10))) ymax=max(p$hosp,na.rm=T) ystep=cand[which.min(abs(cand-ymax/6))] yend=ystep*ceiling(ymax/ystep) yend=max(p$hosp) ystart=0;ybreak=seq(ystart,yend,ystep) color=hsv(c(22,0)/36,1,.8) ggplot(p,aes(x=month+15,y=hosp))+ geom_vline(xintercept=seq(xstart,xend,"3 month"),color="gray85",linewidth=.3,lineend="square")+ geom_vline(xintercept=seq(xstart,xend,"year"),color="gray65",linewidth=.3,lineend="square")+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_hline(yintercept=c(ystart,0,yend),linewidth=.3,lineend="square")+ geom_line(aes(color=status),linewidth=.4)+ geom_point(aes(color=status),size=.5)+ labs(title="Age-standardized hospitalization rate per 100,000 person-years",subtitle=paste0("The standard population is the 2021 census population by 5-year age groups.")|>stringr::str_wrap(76),x=NULL,y=NULL)+ scale_x_date(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=c(ystart,yend),breaks=seq(ystart,yend,ystep),labels=\(x)ifelse(x>=1e3,paste0(x/1e3,"k"),x))+ scale_color_manual(values=color)+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.ticks=element_line(linewidth=.3), axis.ticks.x=element_line(color=alpha("black",c(1,0))), axis.ticks.length=unit(3,"pt"), legend.direction="vertical", legend.background=element_rect(fill="white",color="black",linewidth=.3), legend.key.height=unit(11,"pt"), legend.key.width=unit(15,"pt"), legend.key=element_rect(fill="white"), legend.margin=margin(-1,5,3,3.5), legend.justification=c(1,1), legend.position=c(1,1), legend.spacing.x=unit(1.5,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.grid.major=element_blank(), panel.background=element_rect(fill="white"), plot.margin=margin(3,5,4,4), plot.subtitle=element_text(size=6.7,margin=margin(0,0,4,0)), plot.title=element_text(size=7.5,face="bold",margin=margin(2,0,4,0))) ggsave("0.png",width=3.8,height=2.1,dpi=300*4) system("magick 0.png -resize 25% 1.png")
The file ockovani-hospitalizace-tyden.csv
has hospitalizations by age group and vaccination status, but it's missing data for 2020. The file nakazeni-hospitalizace-testy.csv
has hospitalizations by age group and it includes data for 2020, but it's missing the vaccination status, and for some reason it has about 27% more hospitalizations in 2021-2023 than the other file:
> ho1=fread("ockovani-hospitalizace-tyden.csv") > ho2=fread("nakazeni-hospitalizace-testy.csv") > d1=ho1[,.(tyden=sum(hospitalizovani_celkem)),.(year=substr(tyden,1,4))] > d2=ho2[,.(testy=sum(nove_hospitalizace,na.rm=T)),.(year=substr(datum,1,4))] > merge(d1,d2)[,ratio:=testy/tyden][] year tyden testy ratio 1: 2020 5139 65095 12.666861 2: 2021 108847 133201 1.223745 3: 2022 55494 72212 1.301258 4: 2023 17133 24564 1.433724 5: 2024 1460 2266 1.552055
Therefore I wasn't able to merge the two files so I could've included hospitalizations from 2020 in the next two plots. And the next two plots are also likely to miss about 27% of hospitalizations in 2021-2023.
The hospitalization figures at OWID seem to roughly match the file nakazeni-hospitalizace-testy.csv
:
$ wget covid.ourworldindata.org/data/owid-covid-data.csv $ csvtk cut -f location,date,weekly_hosp_admissions owid-covid-data.csv|grep Czechia|grep 2021|awk -F, '{x+=$3}END{print x/7}' 134893
But anyway the following plot still shows that in ages 12-19 where there's the lowest percentage of vaccinated people, the total number of hospitalizations during the peak in February 2021 is higher than the number of hospitalizations at the start of 2021. But in ages 80+ where there's the highest percentage of vaccinated people, the total number of hospitalizations during the peak around February 2021 is much smaller than at the start of 2021.
And also if you add together the blue and red lines during the peak in March 2021, then the combined height of the lines is lower in old age groups that got vaccinated earlier.
However it's also weird that during the peak in February 2021, ages 12-19 have almost the same number of hospitalizations in vaccinated and unvaccinated people even though the percentage of vaccinated people is not that high.
The next plot shows the hospitalization rate per million people instead of the raw number of hospitalizations. Now vaccinated people in ages 12-19 have a big spike in the hospitalization rate around April 2021, but it's probably because vulnerable groups of people had been priorized to be vaccinated early. But otherwise unvaccinated people generally have a much higher hospitalization rate than vaccinated people up to around mid-2022, after which the ratio between vaccinated and unvaccinated people gradually gets smaller over time, which might be because unvaccinated people end up acquiring natural immunity over time:
library(data.table);library(ggplot2) ma=\(x,b=1,f=b){x[]=rowMeans(embed(c(rep(NA,b),x,rep(NA,f)),f+b+1),na.rm=T);x} agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F) ages=c(12,20,30,50,65,80) t=fread("ockovani-hospitalizace-tyden.csv") t=t[,.(date=tyden_od+3,hosp=c(hospitalizovani_celkem-hospitalizovani_bez_ockovani,hospitalizovani_bez_ockovani),status=rep(c("Vaccinated","Unvaccinated"),each=.N),age=agecut(as.integer(sub("[-+].*","",vekova_skupina)),ages))] a=t[,spline(date,hosp/7,xout=min(t$date):max(t$date)),.(age,status)][,.(hosp=sum(y)),.(age,status,date=as.IDate(x))] hosp=fread("nakazeni-hospitalizace-testy.csv") hosp=hosp[vekova_kategorie!="-"&datum<"2021-1-1",.(hosp=sum(nove_hospitalizace,na.rm=T),status="Unvaccinated"),.(age=agecut(as.numeric(sub("[-+].*","",vekova_kategorie)),ages),date=datum)] a=rbind(a[date>="2021-1-1"],hosp) b=fread("czbucketsdaily.csv.gz")[,.(pop=sum(as.double(alive))),.(date,status=ifelse(dose==0,"Unvaccinated","Vaccinated"),age=agecut(age,ages))] last=b[date=="2022-12-31",2:4] b=rbind(b,cbind(last,date=rep(as.IDate(setdiff(a$date,b$date)),each=nrow(last)))) me=merge(a,b) me=merge(me,me[,.(base=sum(hosp)/sum(pop)),age])[,base:=base*pop] me=me[,.(hosp=sum(hosp),pop=sum(pop)),.(status,date,age)] # first plot p=me[,.(y=ma(hosp/pop,10),date),.(z=status,age)] # # second plot # p=me[,.(y=ifelse(pop<2e3,NA,ma(hosp/pop*1e6,10)),date),.(z=status,age)] xstart=as.Date("2021-01-01");xend=as.Date("2024-1-1") xbreak=seq(xstart,xend,"6 month");xlab=c(rbind("",2021:2023),"") p=p[date>=xstart&date<=xend] p=merge(p,p[,.(max=max(y)),age]) v=merge(b[status=="Vaccinated",.(age,date,vax=pop)],b[,.(pop=sum(pop)),.(date,age)]) p=rbind(p,v[,.(y=vax/pop,z="Percent vaccinated",age,date,max=1)]) p=p[!is.na(age)] p$z=factor(p$z,c("Unvaccinated","Vaccinated","Percent vaccinated")) color=c(hsv(c(7,0)/12,1,.8),"black") ggplot(p,aes(x=date))+ facet_wrap(~age,ncol=1,strip.position="top")+ geom_vline(xintercept=seq(xstart,xend,"month"),linewidth=.2,color="gray92")+ geom_vline(xintercept=seq(xstart,xend,"3 month"),linewidth=.25,color="gray83")+ geom_vline(xintercept=seq(xstart,xend,"year"),linewidth=.25,color="gray66",lineend="square")+ geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+ geom_hline(yintercept=0:1,linewidth=.25,lineend="square")+ geom_line(aes(y=y/max,color=z),linewidth=.35)+ geom_label(data=p[!duplicated(age)],aes(label=sprintf("\n %s (%.0f) \n",age,max)),x=xend,y=1,lineheight=.5,hjust=1,vjust=1,size=2.3,fill=alpha("white",1),label.r=unit(0,"lines"),label.padding=unit(0,"lines"),label.size=.25)+ labs(x=NULL,y=NULL,title="Daily hospitalizations by vaccination status",subtitle="±10-day moving averages, not adjusted for population size. Displayed as percentage of the maximum value which is shown in parentheses."|>stringr::str_wrap(64))+ scale_x_date(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=0:1)+ scale_color_manual(values=color)+ coord_cartesian(clip="off",expand=F)+ guides(color=guide_legend(nrow=1,byrow=F))+ theme(axis.text=element_text(size=7,color="black"), axis.text.y=element_blank(), axis.ticks=element_blank(), axis.ticks.x=element_line(color=alpha("black",c(1,0)),linewidth=.25), axis.ticks.length=unit(0,"lines"), axis.ticks.length.x=unit(.15,"lines"), legend.background=element_blank(), legend.box.spacing=unit(0,"pt"), legend.direction="horizontal", legend.justification=c(1,.5), legend.key.height=unit(.7,"lines"), legend.key.width=unit(1.1,"lines"), legend.key=element_rect(fill=alpha("white",0)), legend.margin=margin(.15,0,0,0,"lines"), legend.position="bottom", legend.spacing.x=unit(.1,"lines"), legend.text=element_text(size=7,vjust=.5), legend.title=element_blank(), panel.spacing=unit(0,"lines"), panel.background=element_rect(fill="white"), panel.grid=element_blank(), plot.margin=margin(.3,.3,.3,.3,"lines"), plot.subtitle=element_text(size=6.8,margin=margin(0,0,.6,0,"lines")), plot.title=element_text(size=7.2,face="bold",margin=margin(.1,0,.5,0,"lines")), strip.background=element_blank(), strip.text=element_blank()) ggsave("1.png",width=3,height=4.7,dpi=450)
The file ockovani-profese.csv
has about 19 million rows with one row per vaccine dose. There's also a column for the name of the organization or individual who gave each vaccine dose.
Elderly people who live in a long-term care home have much higher age-normalized mortality rates than elderly people who don't live in a long-term care home, which is one of the confounding factors that is not account for by simple age-standardized mortality rates.
I wanted to check if Pfizer or Moderna had a higher percentage of vaccines given to elderly homes, so I searched for vaccinators whose name matched "Domov pro seniory", "Domov seniorů", or "Penzion pro seniory". There were only 7,708 matching rows, but it might be if people in elderly homes got vaccinated at a hospital, or if the name of the individual person who gave the vaccine was listed as the vaccinator and not the name of an elderly care facility:
> download.file("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-profese.csv","ockovani-profese.csv") # about 4 GiB > pro=fread("ockovani-profese.csv") > pro[,.N,zarizeni_nazev][grep("Domov pro seniory|Domov seniorů|Penzion pro seniory",zarizeni_nazev)][order(-N)]|>print(r=F) zarizeni_nazev N Domov pro seniory Zahradní Město - ordinace 1324 Domov pro seniory Háje, PL pro dospělé 1014 Domov pro seniory Chodov, praktický lékař 966 Domov pro seniory Malešice 959 Domov pro seniory Kobylisy, praktický lékař 536 Domov pro seniory Krč 512 Domov pro seniory Kamenec, Slezská Ostrava, p.o. 490 Domov seniorů Vysočany s.r.o. 435 Domov pro seniory Kladno 361 Domov seniorů U Přehrady, z. s. 322 Domov pro seniory Osoblaha, příspěvková organizace 185 Domov pro seniory Frýdek-Místek, p. o. 175 Domov pro seniory Holásecká, p. o. 103 Domov pro seniory Seniorcentrum Slavkov, p. o. 101 Domov pro seniory U Pramene Louny 86 Penzion pro seniory Frýdek-Místek, p. o. 59 Domov pro seniory Foltýnova, p. o. 50 Domov pro seniory Nopova,příspěvková organizace 18 Domov pro seniory Kosmonautů, p. o. 8 Domov seniorů Jankov, poskytovatel soc. služeb 4
But anyway, compared to a baseline of all vaccine types, the excess age-normalized proportion of rows where the vaccinator was one of the elderly homes listed above was about 50% for SPIKEVAX but about 8% for Comirnaty (which has now become another possible explanation for why Moderna gets a higher age-normalized mortality rate than Pfizer):
> d=pro[,.N,.(vaxtype=vakcina,vaccinator=zarizeni_nazev,age=vekova_skupina)] > d=d[,.(N=sum(N),match=sum(N*grepl("Domov pro seniory|Domov seniorů|Penzion pro seniory",vaccinator))),.(vaxtype,age)] > d=merge(d,d[,.(base=sum(match)/sum(N)),age])[,base:=base*N] > d[,.(excess=round((sum(match)/sum(base)-1)*100),match=sum(match),N=sum(N)),vaxtype][,matchpct:=match/N*100][order(-N)][N>1e4]|>print(r=F) vaxtype excess match N matchpct Comirnaty 8 6071 15037534 0.0403723110 # excess is normalized for age but matchpct is not SPIKEVAX 50 1111 1634938 0.0679536472 VAXZEVRIA -83 104 887562 0.0117174913 Comirnaty Original/Omicron BA.4/BA.5 -38 158 436729 0.0361780418 COVID-19 Vaccine Janssen -98 2 415094 0.0004818186 Comirnaty Omicron XBB.1.5. -36 172 375130 0.0458507717 Comirnaty Original/Omicron BA.1 10 89 121312 0.0733645476 Comirnaty 5-11 -100 0 112874 0.0000000000 Nuvaxovid -100 0 11311 0.0000000000
The file ockovani-profese.csv
includes a vaccinator ID and vaccinator name on all 19,047,050 rows. However it includes a different set of vaccinator IDs and vaccinator names than the file ockovaci-zarizeni.csv
which has information about each vaccinator. And regardless of whether I tried to join the two files by the names or the IDs, about 15 million out of about 19 million rows in ockovani-profese.csv
could not be joined:
File | Unique IDs | Unique names |
---|---|---|
ockovaci-zarizeni.csv
| 5571 | 5365 |
ockovani-profese.csv
| 5663 | 5439 |
Shared by both files | 5391 | 5168 |
Doses included by join | 3979039 | 4052183 |
The proportion of doses whose vaccinator ID was not missing from the file for vaccinator information was about 11% for Comirnaty, 52% for SPIKEVAX, and 81% for VAXZEVRIA (which again shows that the vaccine brands were not just distributed randomly like Kirsch has argued, since Pfizer vaccines had by far the highest percentage of doses whose vaccinator ID was missing from the ockovaci-zarizeni.csv
file):
> download.file("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani-zarizeni.csv","ockovani-zarizeni.csv") > zar=fread("ockovaci-zarizeni.csv") > d=pro[,.(joined=sum(zarizeni_kod%in%zar$zarizeni_kod),doses=.N),.(vaxtype=vakcina)] > d[,joinedpct:=round((joined/doses)*100)][order(-joined)][joined>1e4]|>print(r=F) vaxtype joined doses joinedpct Comirnaty 1606601 15037534 11 # Pfizer (by far lowest joined percentage) SPIKEVAX 850742 1634938 52 # Moderna VAXZEVRIA 714750 887562 81 # AstraZeneca (highest joined percentage) Comirnaty Omicron XBB.1.5. 278685 375130 74 COVID-19 Vaccine Janssen 220765 415094 53 Comirnaty Original/Omicron BA.4/BA.5 186579 436729 43 Comirnaty Original/Omicron BA.1 72163 121312 59 Comirnaty 5-11 40573 112874 36
In the file for vaccinator information, I believe the field prakticky_lekar
indicates if the vaccinator was a doctor who is a practicioner, but it might not necessarily mean a general practicioner. For the approximately 4 million out of 19 million rows in ockovani-profese.csv
where the vaccinator ID was not missing from the file for vaccinator information, the excess age-normalized percentage of doses where the prakticky_lekar
field was true was about -7% for Comirnaty but 7% for SPIKEVAX:
> vaccinators=zar[,.(vaccinator=zarizeni_kod,practioner=!is.na(zar$prakticky_lekar))] > d=merge(pro[,.N,.(vaxtype=vakcina,vaccinator=zarizeni_kod,age=vekova_skupina)],vaccinators) > d=d[,.(N=sum(N),match=sum(practioner*N)),.(vaxtype,age)] > d=merge(d,d[,.(base=sum(match)/sum(N)),age])[,base:=base*N] > d[,.(excesspct=round((sum(match)/sum(base)-1)*100),joined=sum(N)),vaxtype][order(-joined)][joined>=1e4]|>print(r=F) vaxtype excesspct joined Comirnaty -7 1606601 SPIKEVAX 7 850742 VAXZEVRIA 3 714750 Comirnaty Omicron XBB.1.5. 4 278685 COVID-19 Vaccine Janssen 5 220765 Comirnaty Original/Omicron BA.4/BA.5 3 186579 Comirnaty Original/Omicron BA.1 3 72163 Comirnaty 5-11 0 40573
When I looked at only first doses instead of all doses, Comirnaty now had so many doses missing that there were three other vaccine types that had a higher number of doses that could be joined by the vaccinator ID. But now the excess age-normalized percentage of doses where the prakticky_lekar
field was true was about -30% for Comirnaty but about 11% for SPIKEVAX:
> d=merge(pro[poradi_davky==1,.N,.(vaxtype=vakcina,vaccinator=zarizeni_kod,age=vekova_skupina)],vaccinators) > d=d[,.(N=sum(N),match=sum(practioner*N)),.(vaxtype,age)] > d=merge(d,d[,.(base=sum(match)/sum(N)),age])[,base:=base*N] > d[,.(excesspct=round((sum(match)/sum(base)-1)*100),joined=sum(N)),vaxtype][order(-joined)][joined>=1e4]|>print(r=F) vaxtype excesspct joined VAXZEVRIA 4 359921 SPIKEVAX 11 272755 COVID-19 Vaccine Janssen 7 218950 Comirnaty -30 209720 Comirnaty 5-11 0 21275
For about 7 million rows where the vaccinator ID was missing from the ockovaci-zarizeni.csv
file, the vaccinator ID was included in the file prehled-ockovacich-mist.csv
(which means overview of vaccination sites):
> pro=fread("ockovani-profese.csv") > mist=fread("prehled-ockovacich-mist.csv") > pro[zarizeni_kod%in%mist$nrpzs_kod,.N] [1] 6863264
However the prehled-ockovacich-mist.csv
file only includes the following columns but not the column which indicates if the vaccinator was a doctor practicioner:
Column | Translation | Sample value |
---|---|---|
ockovaci_misto_id | vaccination site ID | 1185422c-106a-4606-8828-2e53d31860e0 |
ockovaci_misto_nazev | vaccination site name | (Bez registrace) FYZIOklinika, Praha 11 Chodov, Machkova 1642/2 |
okres_nuts_kod | district NUTS code | CZ0100 |
operacni_status | operational status | 1 |
ockovaci_misto_adresa | vaccination site address | Machkova 1642/2, Praha 11, Chodov |
latitude | latitude | 50.03269 |
longitude | longitude | 14.51327 |
ockovaci_misto_typ | vaccination site type | WALKIN |
nrpzs_kod | NRPZS code | 24222101000 |
minimalni_kapacita | minimum capacity | 250 |
bezbarierovy_pristup | barrier-free access | 1 |
In an earlier section I combined Excel files published by the Czech Statistical Office in order to generate a CSV file that shows yearly deaths grouped by the underlying cause of death and age group: czech2.html#Deaths_by_ICD_code_region_age_group_and_year, https://csu.gov.cz/produkty/zemreli-podle-seznamu-pricin-smrti-pohlavi-a-veku-v-cr-krajich-a-okresech-fgjmtyk2qr.
The yearly number of deaths in the Excel files is otherwise identical to the record-level data except the record-level data is missing a single death in 2021:
> rec=fread("CR_records.csv") > reclev=rec[,.(reclev=.N),.(year=year(DatumUmrti))] > eurostat=fread("http://sars2.net/f/czpopdead.csv")[,.(eurostat=sum(dead)),year] > icd=fread("http://sars2.net/f/czicd.csv.gz")[,.(icd=sum(dead)),year] > merge(icd,merge(reclev,eurostat)) year icd reclev eurostat 1: 2020 129289 129289 129289 2: 2021 139891 139890 139891 3: 2022 120219 120219 120219
But anyway, I now used the data for deaths by underlying cause of death to make the plot below, where the left side shows ASMR grouped by the underlying cause of death, and the right side shows the excess ASMR percent relative to a 2013-2019 linear trend. I used the 2021 census population by 5-year age groups as the standard population.
There's almost no excess deaths in 2021 and 2022 if the deaths where the underlying cause of death was COVID are excluded. For example in 2021 there was about 25% total excess ASMR but it fell to only about 3% when COVID was excluded. So if vaccines were killing a huge number of people like Kirsch seems to suggest, then did the people all die of COVID?
Turbo cancer is nowhere to be seen either, because the excess percentage of deaths with an underlying cause of neoplasms is about -3% in 2021 and -1% in 2022. (But Ethical Skeptic would probably argue it's because I committed the crime of torfuscation, which means that I didn't account for mortality displacement by adjusting my baseline for the pull forward effect.)
In 2020-2022 there is however a clear increase in deaths under the ICD chapter I which consists of diseases of the circulatory system, even though the excess mortality for chapter I was the highest in 2020 which is difficult to blame on the vaccines.
library(data.table) cul=\(x,y)y[cut(x,c(y,Inf),y,T,F)] t=fread("http://sars2.net/f/czicd.csv.gz") icd=read.csv(sep=":",text="start:end:name A00:B99:Certain infectious and parasitic diseases C00:D48:Neoplasms D50:D89:Diseases of blood and blood-forming organs and certain immune disorders E00:E88:Endocrine, nutritional and metabolic diseases F01:F99:Mental and behavioural disorders G00:G98:Diseases of the nervous system H00:H93:Diseases of the eye and adnexa; Diseases of the ear and mastoid process I00:I99:Diseases of the circulatory system J00:J98:Diseases of the respiratory system K00:K92:Diseases of the digestive system L00:L98:Diseases of the skin and subcutaneous tissue M00:M99:Diseases of the musculoskeletal system and connective tissue N00:N98:Diseases of the genitourinary system O00:O99:Pregnancy, childbirth and the puerperium P00:P96:Certain conditions originating in the perinatal period Q00:Q99:Congenital malformations, deformations and chromosomal abnormalities R00:R99:Symptoms, signs and abnormal findings, not elsewhere classified U00:U99:Codes for special purposes V01:Y89:External causes of morbidity and mortality") u=unique(t$icd) w=apply(outer(u,icd$start,">=")&outer(u,icd$end,"<="),1,which) icdname=paste0(icd$start,"-",icd$end," (",icd$name,")")[w][match(t$icd,u)] icdname[t$icd=="U07"]="U07.1 (COVID-19)" t$cause=icdname d=t[,.(dead=sum(dead)),.(cause=factor(cause),age,year)] d=rbind(d,d[!grep("COVID",cause),.(dead=sum(dead),cause="Not COVID"),.(year,age)],d[,.(dead=sum(dead),cause="Total"),.(year,age)]) pop=fread("http://sars2.net/f/czpopdead.csv")[,.(pop=sum(pop)),.(age=cul(age,unique(t$age)),year)] std=fread("http://sars2.net/f/czcensus2021pop.csv")[,.(std=sum(pop)),.(age=cul(age,unique(t$age)))][,std:=std/sum(std)] d=merge(std,merge(d,pop)) a=d[,.(asmr=sum(dead/pop*std*1e5)),.(year,cause)] a=merge(a,a[year<=2019,.(year=2013:2022,trend=predict(lm(asmr~year),.(year=2013:2022))),cause],all=T) m1=a[,tapply(asmr,.(cause,year),c)] m2=a[,tapply(trend,.(cause,year),c)] m=(m1-m2)/ifelse(m1>m2,m2,m1)*100 disp=round((m1/m2-1)*100) hide=apply(is.na(m1),1,any);m[hide]=disp[hide]=NA maxcolor=100;m[is.infinite(m)]=maxcolor;exp=.8 pheatmap::pheatmap(abs(m)^exp*sign(m),filename="i1.png",display_numbers=disp, gaps_row=nrow(m)-2, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=18,cellheight=18,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90", number_color=ifelse(abs(m)^exp>maxcolor^exp*.4,"white","black"), breaks=seq(-maxcolor^exp,maxcolor^exp,,256), colorRampPalette(hsv(rep(c(7/12,0),5:4),c(.9,.75,.6,.3,0,.3,.6,.75,.9),c(.4,.65,1,1,1,1,1,.65,.4)))(256)) pal2=colorRampPalette(hsv(c(21,21,21,16,11,6,3,0,0,0)/36,c(0,.25,rep(.5,8)),c(rep(1,8),.5,0)))(256) m=m1 dig=floor(apply(log10(m),1,max,na.rm=T)) disp=m;disp[]=ifelse(dig[row(m)]<=1,sub("^0","",sprintf("%.*f",1-dig,m)),round(m)) maxcolor=max(m,na.rm=T);exp=.7 pheatmap::pheatmap(m^exp,filename="i2.png",display_numbers=disp, gaps_row=nrow(m)-2, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=18,cellheight=18,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90", number_color=ifelse(abs(m)^exp>.8*maxcolor^exp,"white","black"), breaks=seq(0,maxcolor^exp,,256),pal2) system("magick \\( i2.png -extent 780x \\) \\( i1.png -chop 14x \\) +append 1.png")
The plot below shows raw deaths and not ASMR. Now the only year when circulatory deaths are clearly elevated above the baseline is 2020, but in 2021 and 2022 circulatory deaths are close to the baseline (even though the method I used to calculate excess deaths in my previous plot was probably more accurate because it was based on ASMR and not raw deaths, and in my previous plot I also got fairly high excess mortality for circulatory deaths in 2021 and 2022):
library(data.table) cul=\(x,y)y[cut(x,c(y,Inf),y,T,F)] t=fread("http://sars2.net/f/czicd.csv.gz") codes=read.csv(sep=":",text="start:end:name C00:D48:Neoplasms I00:I99:Circulatory J00:J98:Respiratory V01:Y89:External causes U07:U07:COVID-19") u=unique(t$icd) w=apply(outer(u,codes$start,">=")&outer(u,codes$end,"<="),1,which) w[lengths(w)==0]=NA;w=unlist(w) icdname=paste0(codes$start,"-",codes$end," (",codes$name,")") p=t[,.(y=sum(dead)),.(z=factor(icdname[w[match(t$icd,u)]],icdname),x=year)][!is.na(z)] xstart=min(p$x);xend=max(p$x) xbreak=seq(xstart-.5,xend+.5,.5);xlab=c(rbind("",xstart:xend),"") ystart=0;yend=max(p$y);ystep=1e4;ybreak=seq(ystart,yend,ystep) color=c("#bb00bb","red",hsv(22/36,1,.7),hsv(4/36,.5,.5),hsv(9/36,1,.6)) lm=p[x<2020,.(x=2013:2022,y=predict(lm(y~x),.(x=2013:2022))),z] ggplot(p,aes(x,y))+ geom_vline(xintercept=c(xstart-.5,xend+.5),linewidth=.3,lineend="square")+ geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+ geom_line(aes(color=z),linewidth=.4)+ geom_line(data=lm,aes(color=z),linetype=2,linewidth=.4,alpha=.3)+ geom_line(data=lm[x<2020],aes(color=z),linetype=2,linewidth=.4)+ geom_point(data=p[!grep("Linear",z)],aes(color=z),size=.5,show.legend=F)+ labs(title="Yearly deaths in the Czech Republic by underlying cause of death",subtitle=paste0("Source: sars2.net/f/czicd.csv.gz (compiled from spreadsheets in dataset \"Zemřelí podle seznamu příčin smrti, pohlaví a věku v ČR, krajích a okresech - 2013-2022\"). The dashed line is a 2013-2019 linear regression."|>stringr::str_wrap(88)),x=NULL,y=NULL)+ scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=c(ystart,yend),breaks=ybreak,labels=\(x)ifelse(x>=1e3,paste0(x/1e3,"k"),x))+ scale_color_manual(values=color)+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.ticks=element_line(linewidth=.3), axis.ticks.x=element_line(color=alpha("black",c(1,0))), axis.ticks.length=unit(3,"pt"), legend.direction="vertical", legend.background=element_rect(fill=alpha("white",.8),color="black",linewidth=.3), legend.key.height=unit(9,"pt"), legend.key.width=unit(15,"pt"), legend.key=element_rect(fill="white"), legend.margin=margin(-2,5,3,4), legend.justification=c(0,.5), legend.position=c(0,.73), legend.spacing.x=unit(1.5,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.grid.major=element_blank(), panel.background=element_rect(fill="white"), plot.margin=margin(3,5,4,4), plot.subtitle=element_text(size=6.7,margin=margin(0,0,4,0)), plot.title=element_text(size=7.5,face="bold",margin=margin(2,0,4,0))) ggsave("0.png",width=4.2,height=2.9,dpi=400*4) system("magick 0.png -resize 25% 1.png")
Ethical Skeptic and Clare Craig have made plots which have showed that in US counties that later had a lower percentage of vaccinated people, there already tended to be a higher COVID mortality rate in 2020 before vaccination started. So I wanted to check if the same was true of the Czech Republic.
The Czech Republic is divided into 14 regions (kraj). They can be further divided into 77 LAU1 regions, which consist of 76 districts along with Prague. The LAU1 regions are the same as the NUTS4 regions. In the file umrti.csv
COVID deaths are classified under codes for the 77 NUTS4/LAU1 regions. But in the files ockovani-orp.csv
and ockovani-profese.csv
, the vaccinations classified under 208 ORP codes instead ("obce s rozšířenou působností", "municipality with extended jurisdiction"). I didn't find a way to map the ORP codes to NUTS4 codes. This file has information about the ORP codes, but it doesn't seem to include NUTS4 or LAU1 codes: https://data.gov.cz/dataset?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A9-sady%2F00025712%2F52c6e525fda1b5f842e169420a4d8d29.
So in the plot below I had to settle with plotting the kraj-level regions but not more fine-grained regions. When I compared the percentage of vaccinated people in 2024 against the yearly COVID ASMR, I got a negative correlation of -0.35 in 2020, -0.30 in 2021, and -0.23 in 2022, but in 2023 I got a positive correlation of 0.34. But the percentage of vaccinated people only ranges from about 60% to 65% across the regions, so the difference is so small that it probably won't have much effect on mortality from COVID (regardless of whether the effect would be due to vaccine efficacy or due to confounders):
library(data.table);library(ggplot2) # dlf=\(x,y,...){if(missing(y))y=sub(".*/","",x);for(i in 1:length(x))download.file(x[i],y[i],quiet=T,...)} # dlf("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani.csv") # dlf("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/umrti.csv") # dlf("https://csu.gov.cz/docs/107508/1cab7ab3-4d2e-4d0e-c2a8-425e3442218c/130181-23data2022.csv") # dlf("http://sars2.net/f/cznuts.csv") # dlf("http://sars2.net/f/czcensus2021pop.csv") vax=fread("ockovani.csv") um=fread("umrti.csv") pop=fread("130181-23data2022.csv") nuts=read.csv("cznuts.csv") covid=um[,.(dead=.N),.(year=year(datum),age=pmin(vek,95)%/%5*5,nuts=kraj_nuts_kod)] pop2=pop[uzemi_typ=="kraj"&!is.na(vek_txt)&pohlavi_txt!=""] pop2=pop2[,.(pop=sum(hodnota)),.(age=pmin(95,vek_txt)%/%5*5,name=uzemi_txt)] pop2=merge(pop2,nuts) covid=unique(rbind(covid,cbind(expand.grid(lapply(covid[,-4],unique)),dead=0)),by=1:3)[!is.na(age)] covid=merge(covid,pop2) std=fread("czcensus2021pop.csv")[,sum(pop),pmin(age,95)%/%5][,V1/sum(V1)] p=covid[,.(pop=sum(pop),asmr=sum(dead/pop*std[age/5+1]*1e5)),.(name,year,nuts)] p=merge(p,vax[,.(vax=sum(prvnich_davek)),.(nuts=kraj_nuts_kod)]) p[,vaxpct:=vax/pop*100] p$size=pmax(.2,log(p$pop,300)-1.2)^4+.2 xstart=58;xend=68;xstep=1;ystart=0;yend=400;ystep=50 xbreak=seq(xstart,xend,xstep) ybreak=seq(ystart,yend,ystep) cor=sapply(split(p,p$year),\(x)cor(x$asmr,x$vaxpct)) p$year=`levels<-`(factor(p$year),sprintf("%s (r ≈ %.2f)",2020:2024,cor)) smooth=do.call(rbind,lapply(split(p,p$year),\(x)data.frame(year=x$year[1],x=0:1000/10,y=predict(lm(asmr~vaxpct,x),list(vaxpct=seq(0,100,.1)))))) color=c("black",hcl(c(0,330,200,240)+15,c(130,110,120,100),c(50,60,50,40))) d=p[,.(x=vaxpct,y=asmr,year,name)][order(x)] ggplot(d,aes(x,y))+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+ # ggrepel::geom_text_repel(data=d[year==2021],aes(label=sub(" kraj$","",name),color=year),size=2.3,point.padding=2,show.legend=F,max.overlaps=Inf,segment.size=.3,min.segment.length=.1,direction="y",force=10,force_pull=2,box.padding=.1)+ geom_text(data=d[grep(2021,year)],aes(label=sub(" kraj$","",name),color=year),size=2.3,vjust=c(-.7,1.8)[c(2,1,2,2,1,2,2,1,2,2,1,2,1,2)],show.legend=F)+ geom_point(aes(color=year),stroke=0,size=p$size,alpha=1)+ geom_line(data=smooth,aes(group=year),color=color[smooth$year],linetype=2,linewidth=.4,show.legend=F)+ labs(x=NULL,y="COVID deaths per 100,000 people",title="Regions of Czech Republic: COVID ASMR by percentage of vaccinated people in 2024",caption="Point size indicates population size. ASMR was calculated by 5-year age groups using the 2021 census population as the standard population. Sources: nemocneni-aktualne.mzcr.cz/api/v2/covid-19, csu.gov.cz/produkty/ obyvatelstvo-podle-jednotek-veku-a-pohlavi-ve-spravnich-obvodech-obci-s-rozsirenou-pusobnosti.")+ scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=\(x)paste0(x,"%"))+ scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+ scale_color_manual(values=color)+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(.15,"lines"), axis.title=element_text(face="bold",size=7), legend.direction="vertical", legend.background=element_rect(fill="white",color="black",linewidth=.3), legend.key.height=unit(11,"pt"), legend.key.width=unit(15,"pt"), legend.key=element_rect(fill="white"), legend.margin=margin(0,8,5,4), legend.justification=c(1,1), legend.position=c(1,1), legend.spacing.x=unit(0,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.grid=element_blank(), panel.background=element_rect(fill="white"), plot.margin=margin(4,11,4,4), plot.caption=element_text(size=7,hjust=0), plot.subtitle=element_text(size=7,hjust=0), plot.title=element_text(size=8,face="bold",margin=margin(2,,5))) ggsave("0.png",width=5.4,height=3.2,dpi=370*4) system("magick 0.png -resize 25% 1.png")
I excluded Prague from the plot above, because it had about 97% vaccinated people even though in all other regions the percentage was below 65%. In many age groups Prague had more vaccinated people than the total population size, so maybe it includes people who lived outside of Prague but who only got vaccinated in Prague (or maybe the population estimates I used only included residents but the vaccination data also included non-residents):
> vax=fread("https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani.csv") > pop=fread("https://csu.gov.cz/docs/107508/1cab7ab3-4d2e-4d0e-c2a8-425e3442218c/130181-23data2022.csv") > v=vax[vekova_skupina!="nezařazeno"&kraj_nazev=="Hlavní město Praha",.(vax=sum(prvnich_davek)),.(age=as.numeric(sub("[-+].*","",vekova_skupina)))] > cul=\(x,y)y[cut(x,c(y,Inf),,T,F)] # cut left > p=pop[uzemi_txt=="Hlavní město Praha"&pohlavi_txt==""&!is.na(vek_txt),.(pop=sum(hodnota)),.(age=cul(vek_txt,sort(v$age)))] > me=merge(v,p) > me=rbind(me,me[,.(vax=sum(vax),pop=sum(pop),age="Total")]) > print(me[,pct:=round(vax/pop*100)][],r=F) age vax pop pct 0 66 72645 0 5 17732 102655 17 12 46141 57207 81 16 25165 25668 98 18 110186 90035 122 25 106575 79198 135 # ages 25-29 have about 35% more vaccinated people than total people 30 122900 103610 119 35 120213 109717 110 40 132965 115795 115 45 126063 118052 107 50 89726 85964 104 55 79771 78193 102 60 63867 63691 100 65 70931 69461 102 70 70938 67388 105 75 60680 60170 101 80 68902 57877 119 Total 1312821 1357326 97
On Twitter canceledmouse told me that I shouldn't be using the term "age-standardized mortality rate" if I'm not calculating ASMR using a standardized standard population like the WHO standard population. [https://x.com/canceledmouse/status/1816244644981924101, https://x.com/henjin256/status/1816446274087817629] He next wrote a Substack article about the same topic. [https://openvaet.substack.com/p/dunning-kruger-illustrated-asmr-explained] I posted the following comment to the article:
You wrote that on my website I "explained that the following was an ASMR without any reference population being loaded". However in the code you showed, I used the resident population estimates from the 2021 Czech census as my reference population.
I think the word "standard" in a "standard population" means that it's a signpost against which the age-standardized mortality rate is calculated, and not necessarily that it's some established standard that has been defined formally.
In English the word "standard" has multiple meanings, but in other languages the word for a standard population does not have a connotation of a standard in the sense of an ISO standard. For example the Finnish term for ASMR is "ikävakioitu kuolleisuusaste" which literally means something like "age-defaulted mortality degree", and the Finnish term for a standard population is "vakioväestö" which literally means "default population".
And anyway, if I calculate ASMR using a non-standard standard population, what am I supposed to call it if I'm not allowed to call it ASMR?
I asked ChatGPT: "what is a term for a weighted average of age-specific mortality rates where the weight is the number of people that are included in each age group during a reference year like 2020". But it answered: "The term for a weighted average of age-specific mortality rates where the weights are the number of people in each age group during a reference year is known as the Age-Standardized Mortality Rate (ASMR). Age-standardized mortality rates are used to compare mortality rates between populations that have different age structures. This method adjusts for age by applying the age-specific mortality rates of a study population to a standard age distribution. The weights, in this context, are the population numbers in each age group from a reference year (e.g., 2020)."
If you calculate ASMR for the Czech Republic at Mortality Watch, it uses the 2020 Czech population as the standard population by default (which is not any standardized standard): https://www.mortality.watch/explorer/?c=CZE&v=2.
You wrote that you "succeeded in having him admitting his mistake" because I said that I sometimes used a non-standardized standard population to calculate ASMR. But it wasn't any admission of a mistake, because I don't think the standard population for ASMR has to be any standardized standard population like ESP2013.
You wrote: "To calculate the date of birth, he picks a random birth date corresponding to the birth year, for each subject… which is a great way to ensure his code will be impossible to reproduce, and that the population & resulting summaries will change on each iteration of his script." However I'm setting a seed in my code before I generate the random birthdays: http://sars2.net/czech.html#Bucket_analysis.
You wrote: "From a second file provided by Eurostats, he picks his reference population - being the 2021 CZ data." In the plot I used the resident population estimates in the 2021 Czech census as my standard population. But I didn't get them from Eurostat but from the website of the Czech Statistical Office: https://vdb.czso.cz/vdbvo2/faces/en/index.jsf?page=vystup-objekt&z=T&f=TABULKA&skupId=4449&katalog=33517&pvo=SLD21022-VSE&pvo=SLD21022-VSE&str=v335.
You wrote: "He then employs various hacks & smoothing to make the gum stick". But what hacks did I employ? I just calculated ASMR the regular way, but my plot showed daily data so of course it made sense to display it as a moving average.
On Twitter you wrote: "I hate ASMR and never uses it, my point is that this calculation isn't an ASMR, and will never be an ASMR. Simply not the same math applied, making the chart titles misleading." But when I asked you to explain how my math was different from regular ASMR, you didn't explain it. And I didn't find any place where you explained it in this Substack post either. You just said that I applied "hacks" to calculate ASMR but you didn't explain what the hacks were. But if I'm using the regular formula to calculate ASMR but I simply use a non-standard standard population instead of a standardized standard population, the math should still be the same.
I'm calculating ASMR as a weighted average of age-specific mortality rates, where the weight is the number of people from each age group that is included in the standard population. For example here my standard population was the estimated Czech resident population on January 1st 2020 by single year of age (where ages 100 and above were aggregated together):
> library(data.table) > t=fread("http://sars2.net/f/czpopdead.csv") > t=merge(t,t[year==2020,.(std=pop/sum(pop),age)]) > t[,.(asmr=round(sum(dead/pop*std)*1e5)),year][year>=2015]|>print(r=F) year asmr 2015 1164 2016 1105 2017 1114 2018 1105 2019 1076 2020 1209 2021 1305 2022 1113
You pointed out that the record-level data might be missing vaccination records for people who died in case the database record for the vaccination could not be joined with the database record for the death.
That might be the case, but the yearly number of deaths and yearly number of vaccinations in the record-level data are both identical or nearly identical to other sources.
The Czech Statistical Office has published Excel files which show the yearly number of deaths by ICD code, age group, and region: https://csu.gov.cz/produkty/zemreli-podle-seznamu-pricin-smrti-pohlavi-a-veku-v-cr-krajich-a-okresech-fgjmtyk2qr. I combined the Excel files into a single CSV file here: http://sars2.net/czech2.html#Deaths_by_ICD_code_region_age_group_and_year.
The yearly number of deaths in the Excel files was identical to yearly number of deaths at Eurostat in 2020-2022. And both were otherwise identical to the record-level data except the record-level data was only missing a single death in 2021:
> rec=fread("curl -Ls github.com/skirsch/Czech/raw/main/data/CR_records.csv.xz|xz -dc") > reclev=rec[,.(reclev=.N),.(year=year(DatumUmrti))] > eurostat=fread("http://sars2.net/f/czpopdead.csv")[,.(eurostat=sum(dead)),year] > icd=fread("http://sars2.net/f/czicd.csv.gz")[,.(icd=sum(dead)),year] > merge(icd,merge(reclev,eurostat))|>print(r=F) year icd reclev eurostat 2020 129289 129289 129289 2021 139891 139890 139891 2022 120219 120219 120219This calculation also shows that in CSV files published by the Czech Ministry of Health, the number of vaccine doses given by up to the end of 2023 is nearly identical to the record-level data: http://sars2.net/czech2.html#Number_of_vaccine_doses_by_type_in_record_level_data_compared_to_MoH_data.
This PDF describes the national health information system used in the Czech Republic (NZIS) along with the "infectious disease information system" (ISIN) which contains information about COVID cases and vaccines: https://vladci.cz/archive/covid/106/UZIS_2022-02_Struktura_NZIS_106.pdf.
You wrote that I aggregated together ages 0-4 even though most deaths in ages 0-4 are in age 0. But that was actually a good point to criticize, and I have switched to treating 0 and 1-4 as separate age groups in some of my newer scripts for the Czech data.
You wrote that Eurostat's population estimates for January 1st 2023 were "census data". But I think the population estimates also incorporate other data which was not part of the publication for the 2021 census, so it's not necessarily accurate to call them "census data".
This dataset has resident population estimates on December 31st 2022 by age group, region, and sex: https://data.gov.cz/datová-sada?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatové-sady%2F00025593%2Fa129a5408e8e5fd99497e9a22c39775e. I now found that the total population size in the dataset is identical to Eurostat's population estimate for January 1st 2023:
> system("wget csu.gov.cz/docs/107508/a53bbc83-5e04-5a74-36f9-549a090a806e/130142-24data051724.zip ;unzip 130142-24data051724.zip") > pop=fread("130181-23data2022.csv") > pop[uzemi_typ=="stát"&is.na(vek_txt)&pohlavi_txt=="",hodnota] # region type is whole country, age is empty (all ages), and sex is empty (both males and females) [1] 10827529 > fread("http://sars2.net/f/czpopdead.csv")[year==2023,sum(pop)] [1] 10827529
You posted a plot which showed that in ages 10-24 unvaccinated people had higher ASMR than vaccinated people. But it might be because of the healthy vaccinee effect, because if for example you look at single years of age in the age range of 10-24, people who died from a suicide or drug overdose were probably less likely to be vaccinated than other people with the same age (but if you compare ages 10-24 as a whole then it's of course biased because people at the upper end of the age group are more likely to die of a suicide but they're also more likely to be vaccinated).
It might be a coincidence, but your plot actually showed that unvaccinated ASMR had a peak around February 2022, which was around the same time when hospitalizations for COVID peaked in ages 0-11, 12-17, and 18-27: http://sars2.net/czech2.html#Hospitalizations_peaked_in_2022_in_the_youngest_age_groups. In ages 80+ my moving average for daily hospitalizations peaked in November 2020, but it peaked in March 2021 in ages 60-79, 40-59, and 20-39, and it only peaked around February 2022 in ages 0-19. So in younger age groups that got vaccinated later, the hospitalizations also peaked later.
However if you look at only unvaccinated people, the hospitalization rate remained high in February 2022 even in the oldest age groups: http://sars2.net/czech3.html#Hospitalizations_by_age_group_and_vaccination_status.
There's not enough COVID deaths in ages 10-24 to reliably use COVID deaths as the outcome for estimating vaccine efficacy, but there's a lot more hospitalizations than deaths. In the file
nakazeni-hospitalizace-testy.csv
there's 4,686 hospitalizations in ages 0-11, 1,074 in 12-15, 666 in 16-17, and 2,899 in 18-24.
And I posted this comment about the exchange on Twitter which motivated the Substack post:
Jikkyleaks has blocked me so I don't know if he saw my reply on Twitter. He said that I keep quiet about the gp120 sequences, but I wrote about them here: https://sars2.net/pradhan.html#Arkmedics_BLAST_searches_for_Pradhan_et_al_s_inserts. And I still haven't finished writing my article about Modernagate, but I posted a short version of it here: https://kirschsubstack.com/p/how-can-millions-of-people-all-exhibiting/comment/59033015.
When Fredy13_Backup showed you Bobby_Network's old research on Reiner Füllmich, you were making it seem like Bobby couldn't be trusted because he once had something negative to say about Jikkyleaks. But I pointed out that the Modernagate story doesn't hold water, so Bobby_Network and McKernan were correct in criticizing it.
Bobby posted this reply to a thread where McKernan showed why Modernagate was a bogus story: "Kevin's steps & arguments are clearly highly consistent, incorporating this for my future efforts too. I'll also re-BLAST his steps, also known as Science101. I'm open for real counters, Yuri-Ebright-Basket-Case-Frames resemble rather Trust the Plan-LARPs." (https://x.com/Bobby_Network/status/1496386212399095811) Then Jikky replied: "You believe that the presence of the 19nt sequence in esoteric bacteria explains the presence in furin cleavage site in #SARSCOV2?" And then Bobby replied: "I'm not talking about bacteria to be the source template, neither is actual GoF discussed atm. At the heart of it all: Will your core paradigm & dogma #Modernagate result into free of doubt success at any court, while the 19nt seq can be found in Nature too? Clearly no, too weak." And he continued in another reply: "No court in the world will say Yes & Amen here, when in fact the precise 19nt is not patented and also occures in nature, sole focus on reasoning only this is too naive, we need a variety of attack vectors (ranked) to be fully convincing, and they might yield the desired success."
Then in another thread McKernan posted a screenshot that showed that Jikky had blocked him and wrote "Sorry to see it devolve into this." And Bobby posted a screenshot of McKernan's tweet and wrote: "Will Jikky resort to blocking #metoo, instead of discussing this through? The digital artist is looking for more watertight attack vectors VS @moderna_tx which could hold at any court through qualitative & quantitative persuasion. Am I now licking ass for my upcoming Moderna job?" (https://x.com/Bobby_Network/status/1496449231481184257) After that Jikky also blocked Bobby.
BTW yesterday when I was searching for Twitter accounts that had posted about Limeng Yan in Japanese, I found this bot that promotes Qanon and that posts in a mixture of Japanese, Arabic, and English: https://x.com/glorious_harmo (https://x.com/search?q=lang%3Aja+limeng+yan&f=live). I wonder if it's an acquaintance of Arkmedic, because its pinned tweet features the same image that Arkmedic uses as his profile picture at Gab and his banner image at Twitter, which features the text "WHERE WE GO ONE WE GO ALL" and the hashtag "#QAnon". It's now been about three years since even Alex Jones, Mike Flynn, and Robert David Steele all started to say that Q was a psyop. So why does Arkmedic still use that image as his profile picture? Does he still trust the plan?
It's common to use a non-standardized standard population to calculate ASMR:
The first plot below was produced when canceledmouse ran a simplified version of my R code, where the standard population was the 2021 census population by single year of age up to ages 100+. And in the second plot he used his own buggy Perl code to calculate ASMR using the WHO standard population, which consists of 5-year age groups up to ages 100+:
He posted his Perl code here: https://github.com/OpenVaet/mongol_asmr/blob/b5d39c7894d915453a5454e165a5f52ca3458fbf/calculate_asmr_v2.pl. His code had 4 different errors, but I was able to fix them all by changing this line:
$deaths_per_100k_ASMR = $deaths_per_100k * $standard_percent / $age_group_percent_of_total;
To this:
$deaths_per_100k_ASMR = $deaths_per_100k * $standard_percent / 100 * 365 / ($rolling_days + 1);
When he calculated an 11-day moving average of deaths, he added together the deaths on 11 days but he forgot to divide them by 11. And the percentage of each age group in the standard population should've been divided by 100 to get the fraction of the age group. And the deaths per person-days should've been multiplied by 365 to get deaths per person-years. And a division by the "age group percent of total" is not part of the ASMR formula.
Canceledmouse acknowledged the errors in his Substack comments and he fixed them in this commit: https://github.com/OpenVaet/mongol_asmr/commit/b5d39c7894d915453a5454e165a5f52ca3458fbf.
A more minor reason why his Perl script didn't reproduce the results of my R code was that he calculated an 11-day moving average for deaths where the window extended 10 days to the past, and he didn't use a moving average for the population size. But in my R code I calculated a 21-day centered moving average for both deaths and population size where the window extended 10 days into the past and 10 days into the future. So it partially explains why the lines in my plot look more smooth.
However after I fixed the 4 errors in his Perl script I mentioned above, the lines became even more smooth and the Perl script produced the same result as this R code:
library(data.table);library(ggplot2);library(stringr) ma=\(x,b=1,f=b){x[]=rowMeans(embed(c(rep(NA,b),x,rep(NA,f)),f+b+1),na.rm=T);x} agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F) std=fread("https://github.com/OpenVaet/mongol_asmr/raw/main/data/who_standard_population_age_group_5.csv") buck=fread("http://sars2.net/f/czbucketsdaily.csv.gz") d=buck[,.(dead=sum(dead),alive=sum(alive)),.(age=agecut(age,0:20*5),dose=ifelse(dose>0,"Vaccinated","Unvaccinated"),date)] d=unique(rbind(d,cbind(expand.grid(lapply(d[,1:3],unique)),alive=0,dead=0)),by=1:3)[order(date)] d=merge(d,std[,.(age=age_group,std=who_world_standard_percent/100)]) d=d[,.(date=date,alive,dead=ma(dead,10,0),std),.(age,dose)] sub="Moving average for deaths extends 10 days backwards, population is not moving average"|>str_wrap(80) # d=d[,.(date=date,alive=ma(alive,10),dead=ma(dead,10),std),.(age,dose)] # sub="Moving average for both deaths and population extends 10 days backwards and 10 days forwards"|>str_wrap(80) p=d[,.(y=sum(dead/alive*std*365e5,na.rm=T),pop=sum(alive)),.(x=date,z=factor(dose,unique(dose)))] p[pop<2e3,y:=NA] xstart=as.Date("2020-1-1");xend=as.Date("2023-1-1") yend=1800;ystep=200;yend2=100;ystep2=25;secmult=yend/yend2 color=hcl(c(210,120)+15,90,50);fill=hcl(c(210,120)+15,80,70) label=data.frame(x=xstart+.02*(xend-xstart),y=seq(yend,,-yend/15,nlevels(p$z))-yend/16,label=levels(p$z)) ggplot(p,aes(x,y))+ geom_hline(yintercept=c(0,yend),linewidth=.3,lineend="square")+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_area(data=merge(p,p[,.(popsum=sum(pop)),x]),aes(color=z,fill=z,y=pop/popsum*99.99*secmult),linewidth=.1,alpha=.22)+ geom_line(aes(color=z),linewidth=.4)+ geom_label(data=label,aes(x=x,y=y,label=label),fill=alpha("white",.7),label.r=unit(0,"pt"),label.padding=unit(1,"pt"),label.size=0,color=color[1:nrow(label)],size=2.7,hjust=0)+ labs(title="Age-standardized mortality rate in Czech record-level data",subtitle=sub,x=NULL,y="ASMR per 100k person-years")+ scale_x_date(limits=c(xstart,xend),breaks=seq(xstart,xend,"2 month"),date_labels="%b\n%y")+ scale_y_continuous(limits=c(0,yend),breaks=seq(0,yend,ystep),labels=\(x)ifelse(x<1e3,x,paste0(x/1e3,"k")),sec.axis=sec_axis(trans=~./secmult,breaks=seq(0,yend2,ystep2),name="Percentage of people with dose"))+ scale_color_manual(values=color)+ scale_fill_manual(values=fill)+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=6.5,color="black"), axis.ticks=element_line(linewidth=.3), axis.title.y.left=element_text(margin=margin(0,4,0,0)), axis.title.y.right=element_text(margin=margin(0,-1,0,4)), axis.ticks.length=unit(3,"pt"), axis.title=element_text(size=7.5), legend.position="none", panel.background=element_rect(fill="white"), plot.margin=margin(4,4,4,4), plot.subtitle=element_text(size=6.8,margin=margin(0,0,4,0)), plot.title=element_text(size=7.5,face="bold",margin=margin(2,0,4,0))) ggsave("0.png",width=4.3,height=2.8,dpi=400*4) system("magick 0.png -resize 25% 1.png")
The next GIF file shows the result of the code above so that it compares the two different ways of calculating the moving averages. When canceledmouse used an 11-day moving average with a backwards window for deaths but no moving average for the population size, the deaths were lagging 5 days behind the population size, so the ASMR of vaccinated people was underestimated in early 2021 when the population size of vaccinated people was increasing rapidly:
The reason why the y-axis values in the GIF file above are so low is because I used the WHO standard population as my standard population like canceledmouse. But the WHO standard population is a very poor fit to the Czech population structure, because the WHO standard population is based on the world population so it includes developing countries that have a much lower percentage of people in elderly age groups than European countries. So the WHO standard population doesn't give enough weight to the elderly age groups which account for most deaths in the Czech Republic:
> who=fread("https://github.com/OpenVaet/mongol_asmr/raw/main/data/who_standard_population_age_group_5.csv") > pop=fread("http://sars2.net/f/czcensus2021pop.csv")[,.(czech=sum(pop)),.(age=age%/%5*5)][,czech:=czech/sum(czech)*100] > pop$who=head(who$who,-1) > print(round(pop[,whopct:=who/czech*100],1),r=F) age czech who whopct 0 5.3 8.9 166.3 5 5.2 8.7 166.3 10 5.5 8.6 155.8 15 4.7 8.5 182.0 20 4.5 8.2 181.3 25 5.7 7.9 138.8 30 6.6 7.6 116.0 35 6.8 7.2 104.6 40 8.2 6.6 80.8 45 8.3 6.0 72.4 50 6.5 5.4 82.1 55 6.3 4.6 71.9 60 5.9 3.7 63.5 65 6.3 3.0 46.6 70 5.9 2.2 37.6 # ages 70-74 make up about 5.9% of the Czech 2021 census population but only about 2.2% of the WHO standard population 75 4.0 1.5 38.1 80 2.3 0.9 39.4 85 1.3 0.4 34.2 90 0.5 0.1 29.9 95 0.1 0.0 42.7 100 0.0 0.0 81.3 age czech who whopct
In the following code I calculated an excess number of deaths relative to the 2013 to 2019 linear trend. I didn't normalize the results for age, because there were too many combinations of age group and cause of death that had too few deaths to calculate a linear trend accurately within age groups. And when I used a the total mortality rate in 2013 to 2019 within age groups as the baseline instead of a linear trend, it exaggerated the excess mortality of causes of death which had an increasing trend in deaths.
I included only ICD codes for underlying causes of death that had at least 100 total deaths in 2020 to 2022, but out of them I got far the highest excess mortality percent was for J12 ("Viral pneumonia, not elsewhere classified"), which had about 2500% total excess deaths in 2020 to 2022.
Among the ten causes of death that had the highest excess mortality in 2020 to 2022, there was also intentional self-harm by jumping off a high place and intentional self-harm by handgun discharge.
For some reason cardiac arrest got about 5% excess deaths in 2020 but it fell to -73% in 2021 and -69% in 2022. I don't know if it's because cardiac deaths started to be classified under other ICD codes instead.
library(data.table) pop=fread("czpopdead.csv")[,.(pop=sum(pop)),year] t=fread("czicd.csv.gz")[,.(dead=sum(dead)),.(year,icd)]|>merge(pop) t=t[icd%in%t[,.(dead=sum(dead)),icd][dead>=1000,icd]] totaldead=t[year>=2020,.(totaldead=sum(dead)),.(icd)] t=merge(t[year<2020,.(year=2013:2022,base=predict(lm(dead/pop~year),.(year=2013:2022))),icd],t)[,base:=base*pop] t=rbind(t,t[year>=2020,.(dead=sum(dead),pop=sum(pop),base=sum(base),year="2020-2022"),icd]) t[,year:=factor(year,unique(year))] code=unique(fread("codes.csv")[,.(icd=V1,name=V6)],by="icd") t=merge(unique(fread("codes.csv")[,.(icd=V1,name=V6)],by="icd"),t) t=merge(t,totaldead) t[,label:=paste0(icd,":",name," (",totaldead,")")] m1=t[,tapply(dead,.(label,year),c)] m2=t[,tapply(base,.(label,year),c)] m=(m1-m2)/ifelse(m1>m2,m2,m1)*100 disp=(m1/m2-1)*100;disp=ifelse(disp>1e3,sprintf("%.1fk",disp/1e3),round(disp)) maxcolor=300;exp=.8 ord=order(-m[,ncol(m)]);m=m[ord,];disp=disp[ord,] pheatmap::pheatmap(abs(m)^exp*sign(m),filename="0.png",display_numbers=disp, gaps_row=nrow(m)-1,gaps_col=ncol(m)-1, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=16,cellheight=16,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90", number_color=ifelse(abs(m)^exp>maxcolor^exp*.4,"white","black"), breaks=seq(-maxcolor^exp,maxcolor^exp,,256), colorRampPalette(hsv(rep(c(7/12,0),5:4),c(.9,.75,.6,.3,0,.3,.6,.75,.9),c(.4,.65,1,1,1,1,1,.65,.4)))(256)) system("w=`identify -format %w 0.png`;pad=22;magick -pointsize 40 -font Arial -interline-spacing -2 \\( -size $[w-pad*2]x caption:'Excess percentage of deaths by ICD code relative to 2013-2019 linear trend. Based on raw deaths and not adjusted for age or population size. The total number of deaths in 2020-2022 is shown in parentheses. Only ICD codes that have at least 100 deaths in 2020-2022 aer included. Source: csu.gov.cz/produkty/zemreli-podle-seznamu-pricin-smrti-pohlavi-a-veku-v-cr- krajich-a-okresech-fgjmtyk2qr.' -splice $[pad]x20 \\) 0.png -append 1.png")
Steve Kirsch published a Substack post about the Czech data where he included two of my plots: https://kirschsubstack.com/p/a-summary-of-why-the-czech-republic. I posted the following comment to his post:
I'm not a data scientist but just a conspiracy theorist, and I don't necessarily believe that vaccines are either safe or effective, and I'm not vaccinated myself. I don't know that much about vaccines because I'm focused on researching other aspects of COVID. [Kirsch wrote that I'm an "independent data scientist who believes that the COVID vaccines are safe and effective".]
However based on my plot you posted which shows the monthly ASMR by vaccine type, it would seem like Moderna vaccines had high efficacy, because during the COVID wave in November to December 2021, there was almost no increase in ASMR for people who got a Moderna vaccine for the first dose, but the increase was slightly bigger for Pfizer and even bigger for AstraZeneca and Janssen.
You included this comment about my plot: "The gap size between the M and P shots didn't widen or narrow during COVID periods meaning both vaccines had the same protection against COVID death".
However in my plot if you calculate Moderna ASMR divided by Pfizer ASMR, the ratio falls from about 1.44 in October 2021 to 1.25 in November, because Pfizer had a bigger increase in ASMR at the start of the COVID wave in November 2021:
> std=fread("http://sars2.net/f/czcensus2021pop.csv")[,.(pop=sum(pop)),.(age=pmin(age,95)%/%5*5)][,pop/sum(pop)] > b=fread("http://sars2.net/f/czbucketskeep.csv.gz")[dose<=1&type!="Other"][,age:=pmin(age,95)%/%5*5] > a=b[,.(dead=sum(dead),alive=sum(as.double(alive))),.(type,month,age)] > a=rbind(a,a[,.(dead=sum(dead),alive=sum(alive),type="All people"),.(month,age)]) > p=a[,.(pop=sum(alive),asmr=sum(dead/alive*std[age/5+1]*365e5)),.(month,type)] > p[month>="2021-01",.(ratio=round(asmr[type=="Moderna"]/asmr[type=="Pfizer"],2)),month]|>print(r=F) month ratio 2021-01 1.74 2021-02 1.62 2021-03 1.84 2021-04 1.48 2021-05 1.28 2021-06 1.34 2021-07 1.49 2021-08 1.37 2021-09 1.48 2021-10 1.44 2021-11 1.25 2021-12 1.26 2022-01 1.25 2022-02 1.26 2022-03 1.28 2022-04 1.35 2022-05 1.33 2022-06 1.23 2022-07 1.28 2022-08 1.27 2022-09 1.29 2022-10 1.31 2022-11 1.34 2022-12 1.35 month ratioYou also wrote: "When comorbidities are highest (start of rollout) or lowest (everyone else is added to the mix diluting the comorbidity effect), the separation between the vaccine mortality curves remains constant which means it likely wasn't caused by comorbidities: there is no correlation between comorbidities and MR."
However in my triangle plot if you look at the age-normalized Moderna-Pfizer ratio by month of vaccination and not month of death, the ratio is about 2.3 for people who got the first dose in January 2021.
You wrote that "The result is 1.3 over a wide range of ages and time periods investigated."
But it's also not 1.3 over wide ranges of other periods, like for example for people who got vaccinated in January 2021.
My age-normalized Moderna-Pfizer ratio was also about 1.02 for people who got the first dose from October 2021 to January 2022, which included a sample size of about 700,000 people who received a first dose from either a Pfizer or Moderna vaccine:
> library(data.table) > b=fread("http://sars2.net/f/czbucketskeep.csv.gz")[,age:=pmin(age,100)] > d=merge(b[dose==1&vaxmonth>="2021-10"&vaxmonth<="2022-01"],b[dose<=1,.(base=sum(dead)/sum(alive)),age])[,base:=alive*base] > d=d[,.(dead=sum(dead),base=sum(base)),.(type,vaxmonth)] > d=rbind(d,d[,.(dead=sum(dead),base=sum(base),vaxmonth="2021-10 to 2022-01"),type]) > d[,.(ratio=(dead/base)[type=="Moderna"]/(dead/base)[type=="Pfizer"]),vaxmonth][order(vaxmonth)]|>print(r=F) vaxmonth ratio 2021-10 0.9325833 2021-11 1.0229378 2021-12 1.0174279 2022-01 1.0072136 2021-10 to 2022-01 1.0194936And the same ratio was only about 0.56 for people who got the first dose from February 2022 to December 2022. They had a sample size of about 40,000 people:
> rec=fread("curl -Ls github.com/skirsch/Czech/raw/main/data/CR_records.csv.xz|xz -dc") > rec[grep("Comirnaty|SPIKEVAX",OckovaciLatka_1),.N,Datum_1][Datum_1>="2022-2-1",sum(N)] [1] 43476
You wrote: "There were 110 different data points with sufficient counts to make a determination of the Mortality Rate Ratio (MRR) and Moderna was more deadly in 110/110 opportunities."
By 110 combinations I believe you were referring to different combinations of month of vaccination and month of death in my triangle plot (where I calculated an age-normalized Moderna-Pfizer ratio so the age group didn't have to be considered as an additional variable).
However there's over 300,000 people who got a first dose from either a Moderna or Pfizer vaccine in November 2021, so if you calculate the Moderna-Pfizer ratio grouped only by the month of vaccination and not also by the month of death, I think there is a sufficient sample size for people who were vaccinated in November. And their Moderna-Pfizer ratio was about 1.02.
You wrote: "There is a healthy vaccinee effect, but it only lasts 3 weeks; after that, you hit baseline mortality if the vaccine is safe." However in the Czech data the temporal/time-invariant HVE seems to last at least about 15-20 weeks: http://sars2.net/czech.html#Deaths_by_weeks_after_first_dose, http://sars2.net/czech2.html#Excess_mortality_by_weeks_after_vaccination. (The "temporal HVE" was a term coined by Jeffrey Morris, who has now renamed it to the "time-invariant HVE".)
You wrote that "the VA study showed the COVID shots didn't impact probability of being hospitalized".
However when I used hospitalization data published by the Czech MoH to calculate an age-normalized hospitalization ratio relative to a baseline of unvaccinated people, I got about -79% excess hospitalizations among people who got a Pfizer vaccine for the first dose, -77% for Moderna, -71% for AstraZeneca, and -68% for Janssen. So in other words unvaccinated people had about 5 times higher age-normalized hospitalizations per capita than people with a Pfizer vaccine: http://sars2.net/czech3.html#Efficacy_against_hospitalization_relative_to_a_baseline_of_unvaccinated_people.
In December 2021, unvaccinated people had about 6 times higher age-standardized hospitalization rate than vaccinated people. However the ratio between the rates got smaller over time, which may have been because unvaccinated people acquired natural immunity over time. So by December 2022 the rate was only about 1.7 times higher in unvaccinated people than in vaccinated people, and by December 2023 it was only about 1.5 times higher: http://sars2.net/czech3.html#Age_standardized_hospitalization_rate_by_vaccination_status. But if the reason why unvaccinated people had a higher hospitalization rate was solely due to confounders, you would expect the ratio between the unvaccinated and vaccinated hospitalization rates to remain more stable over time, since the demographics of the vaccinated and unvaccinated cohorts didn't change that much after 2021 because there were few new people getting vaccinated.