Part 1 is here: nopandemic.html.
Jean Fisch posted this thread on Twitter: [https://x.com/Jean%5f%5fFisch/status/1860944498743816408]
An old myth got revived in GER around "Bergamo mass deaths must be a fake because there was no mass death deficit afterwards"
What gets forgotten is that such a deficit is
a) small (4% of the excess with covid)
b) mostly taken care off in mortality trend based expected deathsFirst of all: why is the displacement small? Here is the simplest way to explain it
The average age of a covid death in Spring 2020 in Bergamo was 80
If you look at the annual probability of dying in Italy of an 80 yo, <5% with die aged 81 or 82 (ie in 2021 or 2022 for Bergamo)
If you get now confused because you get 83 years after googling "Life Expectancy Italy", that's because the 83 years is the expectancy at BIRTH
Once you reach 80, your life expectancy is 10 years and that explains why the stats say that only <5% of 80s old die annually at first
Let's go back to Bergamo: The huge spike of deaths led to 6,000 excess deaths on a usual basis of 10,000 (rounded)
So these 6,000, aged 80 on average, will be "missing" in the death statistics of the following years and expectations need to be corrected accordingly
How much does make out annually? It means approx. 250-300 fewer deaths to be expected in 2021-2023 (4-5% of 6,000)
So you see, despite a 60% excess in 2020, stats tell you that there are ONLY 2.5-3% fewer deaths (ie a tiny %) to be expected less annually in 2021-2023
But there is more: Most advanced stats and health institutions monitor excess mortality against trends from MORTALITY RATES, not deaths
The reason is that the rate of improvement of mortality rates has been more or less linear over the past years (so a good indicator)
What one then does is to multiply these expected mortality rates (ideally by age band) by population actuals to get expected deaths
This also has the advantage to automatically incorporate the changes in age pyramid seen lately in western democracies, especially in old age bands
But this has the additional advantage that this method for expected deaths immediately also incorporates a massive excess in one year
Why? Because (here for Bergamo) the population will be smaller the year there after and the expected deaths automatically corrected accordingly
And this is exactly what happens in Bergamo
After an "upwards trend up to 2019", expected deaths from mortality rates then go down after 2020 due to the sizeable excess
(also they are roughly the same as the one calculated using corrected deaths, as they should be)
In summary, there are two "ahas" on death displacement following excess
TAKEAWAY 1: Views that "Bergamo's huge death peak is fake because we didn't see a massive deficit in the subsequent years" are just an expression of a lack of understanding how mortality works
TAKEAWAY2: In principle, high excess in a year does not require tweaking the expected deaths thereafter if these were estimated from mortality trends
I mention this because I see this argument made regularly
FWIW, all my mortality analysis is based on mortality trend
END
I took the 2019 US life table for both sexes combined from the spreadsheet linked here: https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-19.pdf. I then calculated excess deaths for each age in NYC relative to a 2010-2019 linear trend, but the average life expectancy of the people who died in excess was about 15 years, even though the average age of the people who died in excess was about 73. I excluded ages 0 to 15 because they had some years with less than 10 deaths so the number of deaths was suppressed by CDC WONDER. And even when I excluded all ages below 50 to reduce the impact of drug deaths, the average life expectancy remained at about 13 years:
download.file("https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/NVSR/70-19/Table01.xlsx","Table01.xlsx") t=data.table(readxl::read_excel("Table01.xlsx",skip=2)) t=t[,.(age=as.numeric(sub("\\D.*","",t[[1]])),expectancy=ex)][1:101] # deaths where county of residence was one of 5 counties of NYC at CDC WONDER dead=fread("http://sars2.net/f/nycyearlydead.csv")[age>=16] base=dead[year%in%2010:2019,.(year=2020,base=predict(lm(dead~year),.(year=2020))),age] me=merge(t,merge(base,dead)[,.(excess=dead-base,age)]) me[,weighted.mean(age,excess)] # 73.04446 (average age of excess deaths) me[,weighted.mean(expectancy,excess)] # 15.1019 (life expectancy for ages 16 and above) me[age>=50,weighted.mean(expectancy,excess)] # 12.98951 (life expectancy for ages 50 and above)
Ethical Skeptic wrote: "The function we currently use for PFE is described by 6.6 years (345 weeks - April 2021 - Oct 2027) of Chi-squared arrival, with an anticipated x_mode (function peak) of mid-late 2023 at 6.02% of Excess Non-Covid Natural Cause Mortality." [https://theethicalskeptic.com/2024/11/20/the-state-of-things-pandemic/] And he linked this image:
However if the mean life expectancy of people who died in excess in NYC in 2020 was about 15 years, many of the people aren't expected to have died even 15 years after 2020, so the PFE adjustment should last much longer than until the end of October 2027.
Ethical Skeptic came up with the figure of 6.6 years because he said the average age of COVID deaths in Florida was 82, and he got a life expectancy of 6.6 years for age 82 from some unspecified source (even though in the 2019 US life expectancy table the life expectancy for age 82 is about 8.2 years): [https://theethicalskeptic.com/2024/11/20/the-state-of-things-pandemic/]
At CDC WONDER the average age of UCD COVID deaths was 73.8 in Florida and about 73.9 in the whole US. And in the 2019 life table the life expectancy for age 74 is about 13.1 years, so it's about twice as high as Ethical Skeptic's figure of 6.6 years. And the average of life expectancies for each age weighted by the number of COVID deaths for the age was about 14.5 years:
# 2019 life expectancy table for both sexes combined download.file("https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/NVSR/70-19/Table01.xlsx","Table01.xlsx") ex=read_excel("Table01.xlsx",skip=2,n_max=101)$ex # UCD COVID deaths by single year of age from CDC WONDER (last value is age 100 and above) coviddead=c(363,116,46,41,39,30,38,32,33,47,40,35,41,56,51,97,88,130,162,210,255,288,367,407,499,530,620,707,819,979,1112,1255,1329,1441,1607,1703,1850,2066,2237,2502,2713,2987,3253,3486,3942,4197,4610,4912,5686,6278,7075,7365,8017,8227,8942,9822,11084,11748,12866,14070,15174,16061,17259,18347,19229,20037,20443,20874,22012,22623,23586,24446,25783,27417,27760,26024,26237,27531,28188,27690,27437,27093,27200,27223,26753,26849,26190,25121,24827,24104,22756,21336,19309,17787,15069,12656,10416,8337,6321,4626,8773) weighted.mean(ex,coviddead) # 14.50457
The following code shows the average life expectancy by month for deaths with UCD COVID. I now calculated the life expectancy separately for males and females even though it didn't make much difference. CDC WONDER suppresses the number of deaths on rows with less than 10 deaths, so I got the number of COVID deaths from the fixed-width files for the NVSS data instead: stat.html#Download_fixed_width_and_CSV_files_for_the_NVSS_data_used_at_CDC_WONDER. The life expectancy was unsurprisingly much higher in 2021 than 2020 (which is because the percentage of COVID deaths in 2021 relative to 2020 is much higher in working-age people than in elderly people, which might be because in 2021 working-age people were less likely to be vaccinated than elderly people):
dlf=\(x,y=sub(".*/","",x),...)for(i in 1:length(x))download.file(x[i],y[i],quiet=T,...) dlf(paste0("https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/NVSR/70-19/Table0",2:3,".xlsx")) ex=data.table(age=0:100,sex=rep(c("M","F"),each=101)) ex$ex=c(sapply(2:3,\(i)read_excel(paste0("Table0",i,".xlsx"),skip=2,n_max=101)$ex)) t=do.call(rbind,lapply(2020:2022,\(x)fread(paste0(x,".csv.gz")))) a=t[age!=9999&ucod=="U071",.N,.(year,month=monthdth,sex,age=pmin(100,ifelse(age<2000,age-1000,0)))] merge(ex,a)[,weighted.mean(ex,N),.(year,month)][,xtabs(round(V1)~year+month)] # month # year 1 2 3 4 5 6 7 8 9 10 11 12 # 2020 18 19 15 13 12 14 15 14 13 12 12 12 # 2021 13 14 15 17 18 18 19 19 19 18 17 17 # 2022 15 14 14 14 12 12 12 12 11 11 11 11
Next I made the simulation below using data from ISTAT and Eurostat: http://dati.istat.it/Index.aspx?QueryId=42869&lang=en, https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/demo_r_mwk3_t?format=TSV. At first I also tried simulating different people having different levels of health, but I didn't find an accurate way to do it so I left it out of the simulation. When I simply assigned a fixed health multiplier to each person, the people with poor health tended to die at the start of the simulation which artificially elevated deaths at the start of the simulation compared to the end of the simulation. So I should've also modeled the health status of people getting worse over time so that over time new people would've gradually entered the verge of death, but it seemed too difficult to model realistically. But anyway, here in the red scenario where I simulated a spike in excess deaths in spring 2020, I got only about 3.4% less deaths in 2021-2024 than in the black scenario without the spike:
library(data.table);library(ggplot2) euro=fread("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/demo_r_mwk3_t?format=TSV") x=unlist(euro[euro[[1]]%like%"ITC46"])[-1];x=x[x!=":"] d=data.table(year=as.integer(substr(names(x),1,4)),week=as.integer(substr(names(x),7,8)))[,x:=.I] d$dead=as.integer(sub(" .*","",x)) d$trend=d[year%in%2013:2019,predict(lm(dead~x),d)] d=merge(d,d[year%in%2013:2019,.(weekly=mean(dead-trend)),week]) mult=d[year==2020&week%in%9:19,dead/(trend+weekly)] pop=c(7406,7625,7880,7909,8570,8867,9439,9725,9850,10350,10545,10991,11427,11617,11950,11864,11816,11905,11706,11809,11864,11769,11942,12225,11878,12078,12004,11527,11715,11477,11525,11993,11984,12231,12064,12631,12406,12205,12752,13256,13417,13627,14005,14351,14612,15773,16026,16577,17824,18126,18084,18163,17907,18161,18285,18240,18270,18805,18653,18690,17266,16377,15528,15325,14804,14145,13294,13184,12725,12372,12000,11680,11546,12114,11851,11748,10684,10850,8014,8288,7876,7855,7494,7977,7060,6379,5353,4372,4140,3445,2908,2471,2016,1701,1185,908,660,456,294,229,281) life=fread("http://sars2.net/f/bergamolifetable.csv") cmr=life[,sum(deaths)/(sum(deaths)+sum(survivors)),.(age=pmin(age,100))]$V1/52 simweeks=52*7 weekmult=rep(1,simweeks);weekmult[113:123]=mult people=data.table(age=rep(0:100,pop))[,age:=age+runif(.N)] people2=copy(people) sim=data.table(week=1:simweeks,dead=0,dead2=0) set.seed(0) for(i in 1:simweeks){ died=people[,rbinom(.N,1,pmin(1,cmr[age+1]*weekmult[i]))] people=people[died!=1][,age:=pmin(100,age+1/52)] sim$dead[i]=sum(died) } set.seed(0) for(i in 1:simweeks){ died=people2[,rbinom(.N,1,pmin(1,cmr[age+1]))] people2=people2[died!=1][,age:=pmin(100,age+1/52)] sim$dead2[i]=sum(died) } sim$base=sim[week<105,predict(lm(dead~week),sim)] lab=c("High-mortality event in 2020","No high-mortality event","2018-2019 linear trend") p=sim[,.(x=week,y=unlist(.SD[,-1]),z=factor(rep(lab,each=.N),lab))] sum=sim[week%in%105:156,colSums(.SD)] sum2=sim[week>156,colSums(.SD)] note=stringr::str_wrap(sprintf("The scenario with the high-mortality event has %.1f%% more deaths in 2020 and %.1f%% less deaths in 2021 to 2024.",(sum[2]/sum[3]-1)*100,(sum2[3]/sum2[2]-1)*100),32) xstart=1;xend=simweeks+1;xbreak=seq(xstart,xend,26);xlab=c(rbind("",2018:2024),"") ybreak=pretty(c(0,p$y),8);ystart=0;yend=max(p$y) ggplot(p,aes(x,y))+ geom_rect(xmin=113,xmax=123,ymin=0,ymax=yend,fill="#ffe5e5")+ geom_vline(xintercept=seq(xstart,xend,52),linewidth=.25,color="gray85",lineend="square")+ geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_line(aes(color=z,linetype=z),linewidth=.35)+ labs(title="Simulation for deaths per week in Bergamo",x=NULL,y=NULL)+ annotate(geom="label",x=.46*xend,y=yend*.44,label=note,hjust=0,label.r=unit(2,"pt"),label.padding=unit(3,"pt"),label.size=.2,size=2.4,lineheight=.9)+ scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+ scale_color_manual(values=c("#f66","black","gray60"))+ scale_linetype_manual(values=c("solid","solid","42"))+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.text.y=element_text(margin=margin(,1)), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(3,"pt"), axis.ticks.length.x=unit(0,"pt"), legend.direction="vertical", legend.justification=c(1,1), legend.key=element_blank(), legend.key.height=unit(9,"pt"), legend.key.width=unit(20,"pt"), legend.background=element_rect(color="black",linewidth=.3), legend.margin=margin(3,3,3,3), legend.position=c(1,1), legend.spacing.x=unit(1.5,"pt"), legend.spacing.y=unit(0,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.background=element_blank(), plot.margin=margin(4,4,2,4), plot.title=element_text(size=7.3,face="bold",margin=margin(1,,3))) ggsave("1.png",width=3.7,height=2,dpi=380*4) sub=paste0("Resident population estimates for Bergamo in 2019 were used as the intial population size for each age (from dati.istat.it/Index.aspx?QueryId=42869). The total initial population size was 1,111,228. Mortality rates from the 2019 life table for Bergamo at ISTAT are used throughout the simulation, except in the high-mortality scenario the mortality rates during weeks 9 to 19 of 2020 were multiplied by actual deaths in Bergamo on the same weeks of 2020 divided by a seasonality-adjusted 2013-2019 baseline (from Eurostat dataset demo_r_mwk3_t).") sub=paste0(sub," Each year is exactly 52 weeks long. Births were not simulated. Seasonal variation in mortality or differences in the health of people were not simulated. The same seed was used for both scenarios so the scenarios are identical in 2018 and 2019.") system(paste0("mar=120;w=`identify -format %w 1.png`;magick 1.png \\( -size $[w-mar*2]x -font Arial -interline-spacing -4 -pointsize 146 caption:'",gsub("'","'\\'",sub),"' -gravity southwest -splice $[mar]x70 \\) -append -resize 25% -colors 256 1.png"))
The difference between my simulated scenarios is easier to see in the next plot where I truncated the y-axis and I plotted the deaths as moving averages. But in the case of Bergamo the excess deaths were concentrated to such a brief period of time that the "PFE arrival function" is more like a square wave than a sine wave, and the depth of the PFE adjustment is more shallow than in plots by ES (or actually my red line is about 3.4% lower on average than the black line in 2021-2024, so it's not much lower than the average depth of Ethical Skeptic's PFE arrival function in 2021-2024, but the depth is still shallower relative to the percentage of excess deaths which was higher in Bergamo than the United States):
In the plot below which shows real data for deaths in Bergamo, there seems to be a clear reduction in deaths below the baseline in ages 90+ but not in younger age groups (but ages 90+ also had the biggest percentage reduction in population size in spring 2020):
library(data.table);library(ggplot2) agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F) ma=\(x,b=1,f=b){x[]=rowMeans(embed(c(rep(NA,b),x,rep(NA,f)),f+b+1),na.rm=T);x} dead=fread("comuni_giornaliero_31agosto24.csv.gz",na="n.d.")[NOME_PROVINCIA=="Bergamo"] dead=dead[,.(age=c(0,1,1:20*5)[CL_ETA+1],year=rep(2011:2024,each=.N),month=GE%/%100,day=GE%%100,dead=unlist(.SD[,38:51]))] ages=c(0,40,60,80,90) a=dead[,.(dead=sum(dead,na.rm=T)),.(age=agecut(age,ages),year,month,day)] a=a[,.(age,y=dead,x=as.Date(paste(year,month,day,sep="-")))] p=rbind(a,a[,.(y=sum(y),age="Total"),x])[year(x)>=2013][,yday:=yday(x)] p=merge(p[year(x)<2020,{i=unique(p$x);.(x=i,trend=predict(lm(y~x),.(x=i)))},age],p) p=merge(p[year(x)<2020,mean(y-trend),.(yday,age)],p) lab=c("Actual deaths","2013-2019 linear trend") p=p[,.(x,y=c(y,trend+V1),age,z=factor(rep(lab,each=.N),lab))][order(x)] p[,y:=ma(y,10),.(z,age)] xstart=as.Date("2018-1-1");xend=as.Date("2024-1-1") p=p[x>=xstart&x<=xend] xbreak=seq(xstart,xend,"6 month");xlab=ifelse(month(xbreak)==7,year(xbreak),"") ggplot(p,aes(x,y))+ facet_wrap(~age,scales="free",dir="v",ncol=2)+ geom_vline(xintercept=seq(xstart,xend,"year"),linewidth=.25,color="gray85",lineend="square")+ geom_line(aes(color=z),linewidth=.35)+ geom_label(data=p[,max(y,na.rm=T),age],aes(label=sprintf("\n %s \n",age),y=V1),x=xend,lineheight=.5,hjust=1,vjust=1,size=2.3,fill=alpha("white",1),label.r=unit(0,"pt"),label.padding=unit(0,"pt"),label.size=.25)+ labs(title="Daily deaths in Bergamo by age group, ±10-day moving average",subtitle="Source: istat.it/notizia/dati-di-mortalita-cosa-produce-listat",x=NULL,y=NULL)+ scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=c(0,NA),breaks=\(x)pretty(x,5))+ scale_color_manual(values=c("black","#58f"))+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.text.y=element_text(margin=margin(,1)), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(3,"pt"), axis.ticks.length.x=unit(0,"pt"), legend.box.spacing=unit(0,"pt"), legend.direction="horizontal", legend.justification=c(0,1), legend.key=element_blank(), legend.key.height=unit(9,"pt"), legend.key.width=unit(20,"pt"), legend.background=element_blank(), legend.margin=margin(,,3), legend.position="top", legend.spacing.x=unit(1.5,"pt"), legend.spacing.y=unit(0,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.background=element_blank(), panel.border=element_rect(fill=NA,linewidth=.3), panel.spacing.x=unit(2,"pt"), panel.spacing.y=unit(1,"pt"), plot.margin=margin(4,4,4,4), plot.subtitle=element_text(size=7,margin=margin(,,1)), plot.title=element_text(size=7.5,face="bold",margin=margin(,,3)), strip.background=element_blank(), strip.text=element_blank()) ggsave("1.png",width=5.8,height=4,dpi=380*4) system("magick 1.png -resize 25% -colors 256 1.png")
Here when I divided the number of excess deaths on weeks 9 to 19 of 2020 with the resident population estimate on January 1st 2020, it was about 7% in ages 90+ but only about 5% in ages 85-89 and 3% in ages 80-84:
euro=fread("estat_demo_r_mweek3.tsv.gz",na=":") meta=fread(text=euro[[1]],header=F) euro=cbind(meta[,.(age=V4)],euro[,-1])[meta[,V3=="T"&V5=="ITC46"&!V4%in%c("TOTAL","Y_GE85","UNK")]] euro[,age:=ifelse(age=="Y_LT5",0,as.integer(sub("\\D+(\\d+).*","\\1",age)))] euro=euro[,.(age,date=rep(names(.SD)[-1],each=.N),dead=as.integer(sub(" .*","",unlist(.SD[,-1],,F))))] euro[,year:=as.integer(substr(date,1,4))][,week:=as.integer(substr(date,7,8))] a=euro[week%in%9:19,.(dead=sum(dead)),.(age,year)][year>=2013] ex=merge(a[year==2020],a[,predict(lm(dead~year),.(year=2020)),age])[,.(age,excess=dead-V1)] # resident population estimate on January 1st 2020 from dati.istat.it/Index.aspx?QueryId=42869 pop=data.table(age=0:18*5,pop=c(35349,51917,58050,57058,57116,55881,59096,65303,76378,89485,91969,87414,72505,63069,61768,45496,40271,24711,15290)) merge(ex,pop)[,.(age,excess=round(excess),pop,poppct=round(excess/pop*100,2))]|>print(r=F) # age excess pop poppct # 0 -2 35349 -0.01 # 5 0 51917 0.00 # 10 0 58050 0.00 # 15 1 57058 0.00 # 20 0 57116 0.00 # 25 -1 55881 0.00 # 30 1 59096 0.00 # 35 -3 65303 0.00 # 40 3 76378 0.00 # 45 29 89485 0.03 # 50 57 91969 0.06 # 55 115 87414 0.13 # 60 194 72505 0.27 # 65 347 63069 0.55 # 70 609 61768 0.99 # 75 857 45496 1.88 # 80 1185 40271 2.94 # 85 1137 24711 4.60 # 90 1095 15290 7.16
In the next plot I did a similar simulation for NYC as the earlier simulation for Bergamo. Now the number of deaths in the red scenario where I simulated the COVID event was only about 3.1% lower in 2021-2024 than in the black scenario without the COVID event. I don't know why the reduction was so much smaller than in the simulation for Bergamo, even though the total percentage of excess deaths in the first half of 2020 was only slightly higher in Bergamo than NYC:
library(data.table);library(ggplot2) simmonths=12*7 monthmult=rep(1,simmonths) nyc=fread("http://sars2.net/f/nycmonthlydead.csv") monthmult[25:30]=merge(nyc[year<2020,mean(dead),month],nyc)[year==2020,dead/V1][1:6] pop=fread("http://sars2.net/f/uspopdead.csv")[year==2018] cmr=pop[,dead/pop]/12 popfrac=pop[,sum(pop)/8.8e6] people=data.table(age=pop[,rep(age,pop/popfrac)])[,age:=age+runif(.N)] people$health=1 # don't model health of people # sd=1;people[,health:=2^rnorm(.N,log(2)*sd^2/-2,sd)] # include health multipliers in model sim=data.table(month=1:simmonths,dead=0,dead2=0) people2=copy(people) set.seed(0) for(i in 1:simmonths){ died=people[,rbinom(.N,1,pmin(1,cmr[age+1]*health*monthmult[i]))] people=people[died!=1][,age:=pmin(100,age+1/12)] sim$dead[i]=sum(died) } set.seed(0) for(i in 1:simmonths){ died=people2[,rbinom(.N,1,pmin(1,cmr[age+1]*health))] people2=people2[died!=1][,age:=pmin(100,age+1/12)] sim$dead2[i]=sum(died) } sim$base=sim[month<25,predict(lm(dead~month),sim)] lab=c("With high-mortality event","No high-mortality event","2018-2019 linear trend") p=sim[,.(x=month,y=unlist(.SD[,-1]),z=factor(rep(lab,each=.N),lab))] sum=sim[month%in%25:36,colSums(.SD)] sum2=sim[month>36,colSums(.SD)] note=stringr::str_wrap(sprintf("The scenario with the high-mortality event has %.1f%% more deaths in 2020 and %.1f%% less deaths in 2021 to 2024.",(sum[2]/sum[3]-1)*100,(sum2[3]/sum2[2]-1)*100),32) xstart=1;xend=simmonths+1;xbreak=seq(xstart,xend,6);xlab=c(rbind("",2018:2024),"") ybreak=pretty(c(0,p$y),8);ystart=0;yend=max(p$y)+.02 ggplot(p,aes(x+.5,y))+ geom_vline(xintercept=seq(xstart,xend,12),linewidth=.3,color="gray83",lineend="square")+ geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+ geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+ geom_line(aes(color=z,linetype=z),linewidth=.35)+ geom_point(aes(color=z,alpha=z),stroke=0,size=.7)+ labs(title="Simulation for deaths per month in NYC",x=NULL,y=NULL)+ annotate(geom="label",x=42,y=yend*.48,label=note,hjust=0,label.r=unit(2,"pt"),label.padding=unit(3,"pt"),label.size=.2,size=2.4,lineheight=.9)+ scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+ scale_color_manual(values=c("#ff6666","black","gray60"))+ scale_alpha_manual(values=c(1,1,0))+ scale_linetype_manual(values=c("solid","solid","42"))+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(3,"pt"), axis.ticks.length.x=unit(0,"pt"), legend.direction="vertical", legend.justification=c(1,1), legend.key=element_blank(), legend.key.height=unit(9,"pt"), legend.key.width=unit(20,"pt"), legend.background=element_rect(color="black",linewidth=.3), legend.margin=margin(3,3,3,3), legend.position=c(1,1), legend.spacing.x=unit(1.5,"pt"), legend.spacing.y=unit(0,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.background=element_blank(), plot.margin=margin(4,4,2,4), plot.title=element_text(size=7.5,face="bold",margin=margin(1,,3))) ggsave("1.png",width=3.7,height=2,dpi=380*4) sub=paste0("The initial population size was 8.8 million so that the size of each age in the mid-2018 US resident population estimates was divided by about 37.3 (because population estimates for NYC by single year of age were not available). US mortality rates in the year 2018 are used for each single year of age throughout the simulation, except in the high-mortality scenario the monthly mortality rates in the first half of 2020 were multiplied by the number of deaths during the month in NYC divided by the average number of deaths during the same month in NYC in 2018 and 2019.") sub=paste0(sub," The length of each month is 1/12 years. Births are not simulated. Seasonal variation in mortality or differences in the health of people were not simulated. The same seed was used for both scenarios so the scenarios are identical in 2018 and 2019.") system(paste0("mar=120;w=`identify -format %w 1.png`;magick 1.png \\( -size $[w-mar*2]x -font Arial -interline-spacing -4 -pointsize 146 caption:'",gsub("'","'\\'",sub),"' -gravity southwest -splice $[mar]x70 \\) -append -resize 25% -colors 256 1.png"))
In Ethical Skeptic's plots which display weekly excess deaths from 2018 onwards, he fits a linear baseline against deaths from only 2018 and 2019. However his baseline is often too low especially in 2023 and 2024, because the long-term trend in deaths is curved upwards due to the aging population, so in addition to adjusting his baseline downwards to account for the PFE, he should also adjust his baseline upwards to account for the aging population. In my simulation above between the second half of 2020 and 2024, the difference between the black line and gray 2018-2019 baseline was about twice as big than the difference between the black and red lines. But basically Ethical Skeptic should also adjust his linear baseline so it would be slightly curved upwards like the black line in my simulation above.
Jean Fisch pointed out to Jessica Hockett that "the average age of a covid death was 80" and "80yo Italians have only an annual probabilitty of 4% of dying in each of the 3 subsequent years". [https://x.com/Jean%5f%5fFisch/status/1861413598343663657] However it might be more accurate to calculate the weighted average of the probability of dying over the next year where the weight is the number of COVID deaths for each age. I didn't find deaths in Bergamo by single year of age. But when I multiplied the likelihood of dying over the next year from the 2019 US life table with the number of excess deaths for each age in NYC in 2020, the overall probability of dying over the next year was about 6.3%:
# 2019 US life table for both sexes combined download.file("https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/NVSR/70-19/Table01.xlsx","Table01.xlsx") qx=setDT(readxl::read_excel("Table01.xlsx",skip=2,n_max=101))[,.(age=0:100,ex)] # yearly deaths from CDC WONDER where the county of occurrence is one of 5 counties of NYC # ages below 16 are excluded because they had years where the number of deaths was suppressed dead=fread("http://sars2.net/f/nycyearlydead.csv")[age>=16] base=dead[year%in%2010:2019,.(year=2020,base=predict(lm(dead~year),.(year=2020))),age] me=merge(qx,merge(base,dead)[,.(excess=dead-base),age]) me[,weighted.mean(qx,excess)] # 0.06320128 (about 6.3% probability of dying over the next year)
In the next code for Bergamo I used 5-year age groups up to 90+ because I didn't find data for deaths by single year of age. But when I took a weighted average of 2019 mortality rates where the weight was the number of excess deaths for each age group on weeks 9 to 19 of 2020, the resulting likelihood of dying over the next year was about 7.0%:
euro=fread("estat_demo_r_mweek3.tsv.gz",na=":") meta=fread(text=euro[[1]],header=F) euro=cbind(meta[,.(age=V4)],euro[,-1])[meta[,V3=="T"&V5=="ITC46"&!V4%in%c("TOTAL","Y_GE85","UNK")]] euro[,age:=ifelse(age=="Y_LT5",0,as.integer(sub("\\D+(\\d+).*","\\1",age)))] euro=euro[,.(age,date=rep(names(.SD)[-1],each=.N),dead=as.integer(sub(" .*","",unlist(.SD[,-1],,F))))] euro[,year:=as.integer(substr(date,1,4))][,week:=as.integer(substr(date,7,8))] a=euro[week%in%9:19,.(dead=sum(dead)),.(age,year)][year>=2013] ex=merge(a[year==2020],a[,predict(lm(dead~year),.(year=2020)),age])[,.(age,excess=dead-V1)] # from dati.istat.it/Index.aspx?QueryId=42869 pop=data.table(age=0:18*5,pop=c(39390,48231,56530,59100,59678,58801,59797,63250,70012,84326,90600,92658,79300,65720,59191,49584,38262,23689,13109)) cmr=merge(pop,eu[year==2019,sum(dead),age])[,V1/pop] weighted.mean(cmr,ex$excess) # 0.07036192
In December 2024 Rancourt published a paper titled "Medical Hypothesis: Respiratory epidemics and pandemics without viral transmission". [https://correlation-canada.org/respiratory-epidemics-without-viral-transmission/] In the paper he wrote: "Rancourt et al. (2021a) showed this in detail, into 2021 (their Figures 34a through 34i). They also pointed out that more than half of the deaths assigned as COVID-19 deaths could include life-threatening co-occurring bacterial pneumonia, according to CDC tabulations of death certificates, and that prescriptions of antibiotics were significantly reduced in the same period."
The figures from his old paper showed data from CDC WONDER, where during COVID waves the number of deaths with MCD pneumonia was about half or more of the number of deaths with MCD COVID: [http://researchgate.net/publication/355574895]
The figures included various forms of pneumonia and not only bacterial pneumonia, even though the paper didn't seem to specify which ICD codes were included under the label of pneumonia.
The death record data used at CDC WONDER can be downloaded from here: https://cdc.gov/nchs/nvss/mortality_public_use_data.htm. In the files for 2020-2022, about 51% of deaths with UCD U07.1 have MCD J18.9 (pneumonia, unspecified), which is by far the most common MCD for deaths with UCD COVID. The most common MCD for specifically bacterial pneumonia is J15.9 (Bacterial pneumonia, unspecified), which is only included for about 1.2% deaths with UCD COVID, followed by J15.2 (Pneumonia due to staphylococcus, 0.5%), J15.1 (Pneumonia due to Pseudomonas, 0.3%), and J15.0 (Pneumonia due to Klebsiella pneumoniae, 0.1%):
system("wget https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/mortality/mort202{0..2}us.zip;unzip mort202?us.zip") library(readr);library(data.table) t=do.call(rbind,lapply(Sys.glob(paste0("VS",20:22,"MORT.*")),\(i)setDT(read_fwf(i,fwf_cols(restatus=c(20,20),age=c(70,73),ucod=c(146,149),enicon_1=c(167,170),enicon_2=c(174,177),enicon_3=c(181,184),enicon_4=c(188,191),enicon_5=c(195,198),enicon_6=c(202,205),enicon_7=c(209,212),enicon_8=c(216,219),enicon_9=c(223,226),enicon_10=c(230,233),enicon_11=c(237,240),enicon_12=c(244,247),enicon_13=c(251,254),enicon_14=c(258,261),enicon_15=c(265,268),enicon_16=c(272,275),enicon_17=c(279,282),enicon_18=c(286,289),enicon_19=c(293,296),enicon_20=c(300,303)))))) t[age==9999,age:=NA][,age:=ifelse(age<2000,age%%1000,0)] sub=t[restatus!=4&ucod=="U071"] icd=setkey(fread("http://sars2.net/f/wonderbyicd.csv.gz")[rowid(code)==1][,code:=sub(".","",code,fixed=T)],"code") mcd=na.omit(sub[,.(id=.I,age,pos=rep(1:20,each=.N),code=unlist(.SD[,-(1:3)],,F))]) mcd$cause=icd[mcd$code,cause] o=merge(mcd[rowid(id,code)==1,.(pct=.N/sub[,.N]*100,age=mean(age)),code],icd) o[order(-pct)][1:200,writeLines(sprintf("%5.1f %4.1f %4-s %s",pct,age,code,cause))]
100.0 73.5 U071 COVID-19 50.7 72.7 J189 Pneumonia, unspecified 25.3 72.8 J960 Acute respiratory failure 16.3 72.8 J969 Respiratory failure, unspecified 15.1 75.6 I10 Essential (primary) hypertension 11.7 71.8 I469 Cardiac arrest, unspecified 10.9 67.5 J80 Adult respiratory distress syndrome [...] 3.0 73.3 J129 Viral pneumonia, unspecified [...] 1.2 71.9 J159 Bacterial pneumonia, unspecified [...] 1.0 78.5 J690 Pneumonitis due to food and vomit [...] 0.6 69.3 A499 Bacterial infection, unspecified [...] 0.5 64.4 J152 Pneumonia due to staphylococcus [...] 0.3 66.1 J151 Pneumonia due to Pseudomonas [...] 0.2 68.5 A498 Other bacterial infections of unspecified site [...] 0.2 72.6 J181 Lobar pneumonia, unspecified [...] 0.2 69.4 J180 Bronchopneumonia, unspecified [...] 0.1 64.7 J150 Pneumonia due to Klebsiella pneumoniae [...] 0.1 72.8 J128 Other viral pneumonia
There's a total of 954,269 records with UCD COVID but only 21,350 records had any of the ICD codes that specifically designate bacterial pneumonia (J13-J15). There's of course a lot of people who had the MCD for unspecified pneumonia even though they actually had bacterial pneumonia or pneumonia that was due to a bacterial coinfection along with SARS-CoV-2.
But in any case pneumonia is a symptom of COVID itself, so the looking at the prevalence of any form of pneumonia among people who died from COVID is not a reasonable way to estimate the prevalence of bacterial pneumonia among people who died from COVID.
USMortality posted this tweet: [https://x.com/USMortality/status/1864219839725810048]
His mortality data does not even include China or Iran, and in China the peak in COVID cases was around early February so the peak in COVID deaths may have been a week or two later. [stat.html#Plot_COVID_cases_by_province_and_day_in_early_2020_in_China]
I posted this plot to him and asked that if you make a SEIR model with parameters similar to the COVID pandemic in 2020, then if you introduce a single infection in October 2019 then how long does it take for the deaths to rise clearly above zero relative to the eventual peak in deaths?
library(deSolve);library(ggplot2) seir=\(time,state,parameters){ with(as.list(c(state,parameters)),{ dS<--beta*S*I/N dE=beta*S*I/N-sigma*E dI=sigma*E-gamma*I dR=gamma*I dD=death_rate*I # deaths are a fraction of infected list(c(dS,dE,dI,dR,dD)) }) } N=8e9 # population size beta=0.3 # transmission rate sigma=1/5 # incubation period 5 days gamma=1/8 # infectious period 8 days death_rate=.0001 # infection fatality rate 0.01% xstart=0;xend=400;xstep=20;days=0:xend init=c(S=N-1,E=0,I=1,R=0,D=0) par=c(beta=beta,sigma=sigma,gamma=gamma,death_rate=death_rate,N=N) p=as.data.frame(ode(y=init,times=days,func=seir,parms=par)) p$dead=diff(c(0,p$D)) sub=paste0("A single infection was introduced on the first day of the simulation. Population size is 8 billion, transmission rate beta is 0.3, incubation rate sigma is 1/5 days, recovery rate gamma is 1/8 days, infection fatality rate is 0.01%. Total deaths ",formatC(sum(p$dead),digits=0,format="f",big.mark=","),".") ybreak=pretty(p$dead);ystart=0;yend=max(p$dead)*1.04 ggplot(p,aes(x=time,y=dead))+ geom_vline(xintercept=c(xstart,xend),lineend="square",linewidth=.3)+ geom_hline(yintercept=c(ystart,yend),lineend="square",linewidth=.3)+ geom_line(color="red",linewidth=.8)+ annotate(geom="label",label=sub,x=10,y=77e3,fill="white",label.r=unit(2,"pt"),label.padding=unit(3,"pt"),label.size=.2,size=2.5,hjust=0,lineheight=.9)+ labs(x="Days since start of simulation",y=NULL,title="SEIR model for daily deaths")+ scale_x_continuous(limits=c(xstart,xend),breaks=seq(xstart,xend,xstep))+ scale_y_continuous(limits=c(ystart,yend),breaks=ybreak,labels=kim)+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(3,"pt"), axis.title=element_text(size=7.5), panel.background=element_blank(), panel.grid=element_blank(), plot.margin=margin(4,10,4,4,"pt"), plot.title=element_text(face=2,size=8.4,margin=margin(1,,4))) ggsave("1.png",width=4,height=2.3,dpi=400*4) system("magick 1.png -resize 25% PNG8:1.png")
A lot of people from the no-pandemic crowd claim that there were no detectable excess deaths before the day of the WHO pandemic declaration anywhere in the world, even though they don't even have mortality data for China. USMortality probably knows better because he has done a lot of work aggregating worldwide mortality data. But I pointed out to him anyway that in Bergamo the deaths had already reached about halfway to the peak on the day of the WHO pandemic declaration:
library(data.table);library(ggplot2) agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F) ma=\(x,b=1,f=b){x[]=rowMeans(embed(c(rep(NA,b),x,rep(NA,f)),f+b+1),na.rm=T);x} dead=fread("comuni_giornaliero_31agosto24.csv.gz",na="n.d.")[NOME_PROVINCIA=="Bergamo"] dead=dead[,.(age=c(0,1,1:20*5)[CL_ETA+1],year=rep(2011:2024,each=.N),month=GE%/%100,day=GE%%100,dead=unlist(.SD[,38:51]))] p=dead[,.(y=sum(dead)),.(x=yea(year,month,day))] xstart=as.Date("2019-1-1");xend=as.Date("2022-1-1") p=p[x>=xstart&x<=xend] xbreak=seq(xstart+182,xend,"year");xlab=year(xbreak) annox=as.Date("2020-3-11");annoy=p[x==annox,y] ggplot(p,aes(x,y))+ geom_vline(xintercept=as.Date("2020-3-11"),color="blue",linewidth=.3,linetype="42",lineend="square")+ geom_vline(xintercept=seq(xstart,xend,"month"),linewidth=.25,color="gray85",lineend="square")+ geom_vline(xintercept=seq(xstart,xend,"year"),linewidth=.3,color="black",lineend="square")+ geom_line(linewidth=.35)+ annotate(geom="label",x=annox-20,hjust=1,y=annoy,label=paste0(annoy," deaths on day\nof WHO pandemic\ndeclaration (March 11th)"),size=2.4,label.size=.15,color="blue",fill="white",label.r=unit(2,"pt"),label.padding=unit(3,"pt"),label.size=.2,lineheight=.9)+ annotate(geom="segment",x=annox-20,xend=annox,,y=annoy,yend=annoy,color="blue",linewidth=.3)+ labs(title="Daily deaths in Bergamo",subtitle="Source: istat.it/notizia/dati-di-mortalita-cosa-produce-listat",x=NULL,y=NULL)+ scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+ scale_y_continuous(limits=c(0,NA),breaks=pretty)+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.text.y=element_text(margin=margin(,1)), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(3,"pt"), axis.ticks.length.x=unit(0,"pt"), legend.box.spacing=unit(0,"pt"), legend.direction="horizontal", legend.justification=c(0,1), legend.key=element_blank(), legend.key.height=unit(9,"pt"), legend.key.width=unit(20,"pt"), legend.background=element_blank(), legend.margin=margin(,,3), legend.position="top", legend.spacing.x=unit(1.5,"pt"), legend.spacing.y=unit(0,"pt"), legend.text=element_text(size=7), legend.title=element_blank(), panel.background=element_blank(), panel.border=element_rect(fill=NA,linewidth=.3), panel.spacing.x=unit(2,"pt"), panel.spacing.y=unit(1,"pt"), plot.margin=margin(4,4,4,4), plot.subtitle=element_text(size=7,margin=margin(,,3)), plot.title=element_text(size=7.5,face="bold",margin=margin(,,3))) ggsave("1.png",width=4,height=2.5,dpi=380*4) system("magick 1.png -resize 25% -colors 256 1.png")
USMortality told me: "It's almost impossible for a single point emergence to cause peaks in the exact same week all over the world!" But I pointed out that the peak in deaths was not even close to the same week all over the world, and I posted this plot of countries with at least 80% monthly excess deaths at OWID in 2020:
download.file("https://covid.ourworldindata.org/data/owid-covid-data.csv","owid-covid-data.csv") download.file("https://gist.githubusercontent.com/tadast/8827699/raw/61b2107766d6fd51e2bd02d9f78f6be081340efc/countries_codes_and_coordinates.csv","latlongiso3.csv") library(data.table) yemo=\(x){u=unique(x);format(u,"%Y-%m")[match(x,u)]} t=fread("owid-covid-data.csv") lat=r("f/countrylatlongiso3")[,.(iso_code=`Alpha-3 code`,lat=`Latitude (average)`,long=`Longitude (average)`)] t=merge(t,lat) o[,continent:=factor(continent,c("North America","South America","Europe","Africa","Asia"))] a=o[date<="2020-12-31",.(excess=mean(excess_mortality,na.rm=T)),.(location,month=yemo(date),lat,long,continent)] a=a[location%in%a[excess>=80,location]] a[,location:=factor(location,a[rowid(location)==1,location[order(continent,-lat+long)]])] m=a[,xtabs(excess~location+month)] disp=round(m) maxcolor=max0(m) pal=colorRampPalette(hex(HSV(c(210,210,210,210,0,0,0,0,0),c(.9,.75,.6,.3,0,.3,.6,.75,.9),c(.4,.65,1,1,1,1,1,.65,.4))))(256) pheatmap::pheatmap(m,filename="0.png",display_numbers=disp, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=16,cellheight=16,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90", number_color=ifelse(abs(m)>maxcolor*.6,"white","black"), breaks=seq(-maxcolor,maxcolor,,256), pal) sub="Monthly excess percentage of deaths at OWID. Weekly data for excess deaths was converted to monthly by taking the average of weeks which ended during the month. The excess deaths are based on raw deaths and not CMR or ASMR, and they have a 2015-2019 linear trend that was adjusted for seasonal variation in mortality. Source: covid.ourworldindata.org/data/owid-covid-data.csv." system(paste0("mar=22;w=`identify -format %w 0.png`;convert \\( -size $[w-mar*2]x -font Arial -interline-spacing -3 -pointsize 38 caption:'",sub,"' -splice $[mar]x10 \\) 0.png -append 1.png"))
In China reported COVID deaths already peaked around mid-February:
cases=fread("https://github.com/cheongsa/Coronavirus-COVID-19-statistics-in-China/raw/master/China_daily_new_infections.csv",header=T) dead=fread("https://github.com/cheongsa/Coronavirus-COVID-19-statistics-in-China/raw/master/China_daily_new_deaths.csv",header=T) m=as.matrix(rowsum(cases[,-(1:4)],cases[[2]])) m[is.na(m)]=0 colnames(m)=sub(" .*","",colnames(m)) rownames(m)=sub(" (Uygur |Zhuang |Hui )?(Autonomous Region|Special Administrative Region|Municipality)","",rownames(m)) m=rbind(m,"Total cases"=colSums(m),"Deaths (Hubei)"=colSums(dead[dead$Pro=="Hubei",-(1:4)],na.rm=T),"Total deaths"=colSums(dead[,-(1:4)],na.rm=T)) disp=ifelse(m>=1e4,sprintf("%.0fk",m/1e3),ifelse(m>=1e3,sprintf("%.1fk",m/1e3),m)) max=apply(m,1,max,na.rm=T) max[is.infinite(max)]=NA m=m/max pheatmap::pheatmap(m,filename="1.png",display_numbers=disp,gaps_row=nrow(m)-3, cluster_rows=F,cluster_cols=F,legend=F,cellwidth=16,cellheight=13,fontsize=9,fontsize_number=8, border_color=NA,na_col="gray90",number_color=ifelse(!is.na(m)&m>.45,"white","black"), breaks=seq(0,1,,256),sapply(255:0/255,\(i)rgb(i,i,i))) system("w=`identify -format %w 1.png`;convert \\( -size $[w-44]x -pointsize 40 -font Arial caption:'COVID cases in China (github.com/cheongsa/Coronavirus-COVID-19-statistics-in-China). Each row has its own color scale where the maximum value is shown in black. On February 12th Hubei health authorities began including clinically diagnosed cases, such as cases with a chest CT scan showing signs of pneumonia even if a positive PCR test was not available, so many cases and deaths were added retroactively.' -splice 22x20 \\) 1.png -append 1.png")
Bergamo is a NUTS3-level region within Lombardy which is a NUTS2-level region. But on Eurostat when I sorted NUTS2 regions by the peak weekly excess deaths in 2020, the two highest-ranking regions were Madrid and Castilla-La Mancha in central Spain, and Lombardy only came on third place:
download.file("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/demo_r_mwk3_t?format=TSV","demo_r_mwk3_t.tsv") download.file("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/demo_r_mwk2_ts?format=TSV","estat_demo_r_mwk2_ts.tsv") download.file("https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2021.xlsx","NUTS2021.xlsx") library(data.table);library(ggplot2) ua=\(x,y,...){u=unique(x);y(u,...)[match(x,u)]} isoweek=\(year,week,weekday=7){d=as.Date(paste0(year,"-1-4"));d-(as.integer(format(d,"%w"))+6)%%7-1+7*(week-1)+weekday} nuts=setDT(readxl::read_excel("NUTS2021.xlsx",sheet=2)) nuts=na.omit(nuts[,.(name=unlist(.SD[,2:4],,F),nuts=.SD[[1]])])[!name%like%"Extra-Regio"] euro=fread("estat_demo_r_mwk2_ts.tsv",na=":") euro=euro[euro[[1]]%like%"^W,T"] eu=melt(euro,id=1,,"week","dead") eu[,week:=ua(week,\(x)isoweek(as.integer(substr(x,1,4)),as.integer(substr(x,7,8)),4))] eu=eu[,.(nuts=sub(".*,","",eu[[1]]),dead=as.integer(sub(" .*","",dead)),week)] eu=merge(eu[year(week)==2019,.(base=mean(dead)),nuts],eu) eu=merge(nuts,eu) eu[,name:=paste0(nuts,": ",sub("/.*","",name))] eu=eu[!nuts%like%"LI00|ES64"&nchar(nuts)==4] pick=eu[year(week)==2020,max(dead/base),nuts][order(-V1)][1:24,nuts] p=eu[nuts%in%pick][,nuts:=factor(nuts,pick)] p[,pct:=dead/base*100] xstart=as.Date("2020-1-1");xend=as.Date("2021-1-1");xbreak=seq(xstart,xend,"month") ybreak=p[,pretty(c(0,dead))] p=p[week>=xstart&week<=xend] colors=read.csv(text="country,color ES,c6111a IT,009246 FR,0055a4 BE,000000 SE,ffcd00 CH,ff8888 PL,888888 BG,005000 NL,ff9b00 SK,cc44cc LI,8888ff") p[,country:=factor(substr(nuts,1,2),colors$country)] label=p[,.(pct=max(pct),hjust=as.integer(week[which.max(dead/base)]<=as.Date("2020-7-1"))),.(nuts,name,country)] ggplot(p,aes(x=week,y=pct))+ facet_wrap(~nuts,ncol=4,strip.position="top",scales="free")+ geom_vline(xintercept=seq(xstart,xend,"year"),linewidth=.25,color="gray85",lineend="square")+ geom_hline(yintercept=100,linetype="22",linewidth=.3,color="gray50")+ geom_line(aes(color=country),linewidth=.35)+ geom_label(data=label,aes(hjust=hjust,x=c(xstart+3,xend-3)[hjust+1],label=name,color=country,y=pct),vjust=1.2,fill=alpha("white",.85),label.r=unit(0,"pt"),label.padding=unit(1,"pt"),label.size=0,size=2.4)+ labs(title="Weekly deaths in 2020 as percentage of average weekly deaths in 2019 (Eurostat dataset demo_r_mwk2_ts)",subtitle="Sorted by peak weekly percentage. Two small regions were excluded because they had peaks due to noise caused by a small sample size.",x=NULL,y=NULL)+ scale_x_date(limits=c(xstart,xend),breaks=xbreak)+ scale_y_continuous(limits=c(0,NA),breaks=\(x)pretty(x,7)[-1])+ scale_color_manual(values=paste0("#",colors$color))+ coord_cartesian(clip="off",expand=F)+ theme(axis.text=element_text(size=7,color="black"), axis.text.x=element_blank(), axis.text.y=element_text(margin=margin(,1)), axis.ticks=element_line(linewidth=.3), axis.ticks.length=unit(-2,"pt"), legend.position="none", panel.background=element_blank(), panel.border=element_rect(fill=NA,linewidth=.3), panel.spacing.x=unit(2,"pt"), panel.spacing.y=unit(2,"pt"), plot.margin=margin(4,4,2,4), plot.subtitle=element_text(size=7,margin=margin(,,3)), plot.title=element_text(size=7.5,face="bold",margin=margin(1,,2)), strip.background=element_blank(), strip.text=element_blank()) ggsave("1.png",width=7.4,height=6,dpi=380*4) system("magick 1.png -resize 25% -colors 256 1.png")