Comments to Ethical Skeptic - sars2.net

Contents

Plot for cardiac deaths in ages 0 to 54

Ethical Skeptic posted this tweet where he commented on a plot by Truth In Numbers: [https://x.com/EthicalSkeptic/status/1757181136306848244]

  1. He used the wrong ICD-codes: used 'heart disease in older persons' (many of whom died in 2020/21 from Covid), not 'sudden cardiac death in younger persons' (see panel 1 chart done correctly).

  2. Used an ICD R99/999 and Lag depleted 2023 data set (see data in panel 2). This is simply malicious.

  3. Diluted the group with population growth (this cohort shrank actually, but he used total population - cheating, idiocy or both).

  4. Did not use pull forward effect, as a professional does.

  5. Quashed the chart into a full range y-axis magnitude to make it appear flat.

  6. Put in a fictitious regression line, when the weekly line shows clearly an increase (save for his corrupted 2023 numbers)

He didn't specify how he calculated the baseline for excess deaths. But if he only fitted the baseline based on data from 2018 and 2019, then it might explain why the line for excess deaths appears to be so flat in 2018 and 2019, and the line would've probably looked less flat if he used a longer fitting period so it wouldn't have fitted as closely to the data in 2018-2019. His plot is also missing data before 2018 so it's not clear what the trend was before COVID.

I did a query at CDC WONDER where I used the same ICD codes as Ethical Skeptic but I did one query where multiple causes of death included one of the cardiac-related ICD codes he used in his plot, and I did another query for the two unknown causes of death he included in his plot (R96 and R99): https://wonder.cdc.gov/mcd.html. I excluded COVID deaths by doing a query where the multiple causes of death included both one of the cardiac ICD codes and a code for COVID, and then I subtracted the weekly COVID deaths from the cardiac total (but I didn't bother subtracting COVID deaths from the unknown causes of death).

I noticed that there was a spike in cardiac deaths in early 2018, but for some reason Ethical Skeptic omitted approximately the first two months of 2018 from his plot. The peak in deaths had a similar height as the peaks in the winters after COVID, so at first I thought he may have omited in early 2018 from his plot which made the spikes after COVID seem higher in comparison. But then I noticed that he also omitted approximately the first two months of 2018 from his other plots, so it might have something to do with his "lag adjustment" method.

Ethical Skeptic omitted the last 26 weeks of 2023 from his plot because there was a big increase in deaths where the cause had not yet been specified. But his plot had an increase in deaths in the first half of 2023, but it seems to be due to deaths where the cause has not yet been specified and not due to cardiac deaths:

library(tempdisagg);library(ggplot2);library(stringr)

wonder=\(x){t=readLines(x);t=paste(t[1:(grep("^\"(---|Total)",t)[1]-1)],collapse="\n");read.table(sep="\t",text=t,header=T,na.strings="Not Applicable")[-1]}

old=wonder("Multiple Cause of Death, 1999-2020.txt")
old=td(data.frame(as.Date(paste(old$Month.Code,1),"%Y/%m %d"),old$Deaths)~1,,"daily","fast")$values

new=wonder("Provisional Mortality Statistics, 2018 through Last Week.txt")
covid=wonder("wondercovid.txt")
mt=match(covid$MMWR.Week.Code,new$MMWR.Week.Code)
new$Deaths[mt]=new$Deaths[mt]-covid$Deaths
new=td(data.frame(as.Date(sub(".*ending ","",new$MMWR.Week),"%B %d, %Y")+3,new$Deaths)~1,,"daily","fast")$values

ill=wonder("ill.txt")
ill=td(data.frame(as.Date(sub(".*ending ","",ill$MMWR.Week),"%B %d, %Y")+3,ill$Deaths)~1,,"daily","fast")$values
illold=wonder("illold.txt")
illold=td(data.frame(as.Date(paste(illold$Month.Code,1),"%Y/%m %d"),illold$Deaths)~1,,"daily","fast")$values

xy=rbind(old[!old$time%in%new$time,],new)
xy$z="MCoD potentially involving sudden cardiac death, COVID excluded"

xy=rbind(xy,cbind(rbind(ill,illold[!illold$time%in%ill$time,]),z="Other sudden death with unknown cause, other ill-defined and unspecified causes of mortality"))

colnames(xy)=c("x","y","z")
xy$z=factor(xy$z,unique(xy$z))

xstart=as.Date("2011-1-1")
xend=as.Date("2024-1-1")
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",2011:2023),"")
cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymax=max(xy$y,na.rm=T)
ystart=0
ystep=cand[which.min(abs(cand-ymax/5))]
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

color=c("#aa0000","#888888")

sub="Data from wonder.cdc.gov/mcd.html. Weekly data from 2018-2023 and monthly data from 1999-2017 was interpolated to daily data. Multiple causes of death that potentially involve sudden cardiac death consist of I45 (Other conduction disorders), I46 (Cardiac arrest), I47 (Paroxysmal tachycardia), I48 (Atrial fibrillation and flutter), I49 (Other cardiac arrhythmias), I50 (Heart failure), I51 (Complications and ill-defined descriptions of heart disease), ROO (Abnormalities of heart beat), R01 (Cardiac murmurs and other cardiac sounds), R09.2 (Respiratory arrest), R09.8 (Other specified symptoms and signs involving the circulatory and respiratory systems). Ill-defined and unknown causes of death consist of multiple causes of death R96 (Other sudden death, cause unknown) and R99 (Other ill-defined and unspecified causes of mortality)."

labels=data.frame(x=xstart+.02*(xend-xstart),y=seq(yend*.93,,-yend/13,nlevels(xy$z)),label=levels(xy$z))

ggplot(xy,aes(x=x,y=y,color=z))+
geom_hline(yintercept=c(ystart,0,yend),color="gray65",linewidth=.25)+
geom_vline(xintercept=c(xstart,xend),color="gray65",linewidth=.25)+
geom_line(aes(color=z),linewidth=.3)+
geom_label(data=labels,aes(x=x,y=y,label=label),fill=alpha("white",.8),label.r=unit(0,"lines"),label.padding=unit(.04,"lines"),label.size=0,color=color[1:nrow(labels)],size=2.4,hjust=0)+
labs(title=str_wrap("CDC Wonder, ages 0-54: daily number of deaths with multiple cause of death potentially involving sudden cardiac death. MCoD U07.1 (COVID) and U09.9 (long COVID) excluded.",88),caption=str_wrap(sub,95),x=NULL,y=NULL)+
coord_cartesian(clip="off",expand=F)+
scale_x_date(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
theme(axis.text=element_text(size=6.5,color="black"),
  axis.ticks=element_line(linewidth=.25,color="gray65"),
  axis.ticks.x=element_line(color=alpha("gray65",c(1,0))),
  axis.ticks.length=unit(.15,"lines"),
  axis.title=element_text(size=8),
  legend.position="none",
  panel.background=element_rect(fill="white"),
  panel.grid=element_blank(),
  plot.background=element_rect(fill="white"),
  plot.margin=margin(.4,.65,.4,.4,"lines"),
  plot.caption=element_text(size=7,hjust=0,margin=margin(.6,0,0,0,"lines")),
  plot.title=element_text(size=7.5))
ggsave("1.png",width=4.6,height=3.3,dpi=450)

In Ethical Skeptic's plot the excess deaths looked flat in 2018-2019, but it might partially be because he fitted the trend against data from 2018-2019, because if you fit a seasonality-adjusted trend against only two years of data then it's easy to get the trend to match the data closely. In the plot below where I fitted the trend against data from 2021-2022, I also got my excess deaths to look mostly flat in 2021-2022:

library(tempdisagg);library(ggplot2);library(stringr)

wonder=\(x){t=readLines(x);t=paste(t[1:(grep("^\"(---|Total)",t)[1]-1)],collapse="\n");read.table(sep="\t",text=t,header=T,na.strings="Not Applicable")[-1]}

old=wonder("Multiple Cause of Death, 1999-2020.txt")
old=td(data.frame(as.Date(paste(old$Month.Code,1),"%Y/%m %d"),old$Deaths)~1,,"daily","fast")$values

new=wonder("Provisional Mortality Statistics, 2018 through Last Week.txt")
covid=wonder("wondercovid.txt")
mt=match(covid$MMWR.Week.Code,new$MMWR.Week.Code)
new$Deaths[mt]=new$Deaths[mt]-covid$Deaths
new=td(data.frame(as.Date(sub(".*ending ","",new$MMWR.Week),"%B %d, %Y")+3,new$Deaths)~1,,"daily","fast")$values

ill=wonder("ill.txt")
ill=td(data.frame(as.Date(sub(".*ending ","",ill$MMWR.Week),"%B %d, %Y")+3,ill$Deaths)~1,,"daily","fast")$values
illold=wonder("illold.txt")
illold=td(data.frame(as.Date(paste(illold$Month.Code,1),"%Y/%m %d"),illold$Deaths)~1,,"daily","fast")$values

d=rbind(old[!old$time%in%new$time,],new)|>"colnames<-"(c("x","y"))
xy=cbind(d,z="Actual deaths")

ma=\(x,b=1,f=b)rowMeans(embed(c(rep(NA,b),x,rep(NA,f)),f+b+1),na.rm=T)

prediction=d$x>="2018-1-1"&d$x<="2019-12-31"
linear=predict(lm(y~x,d[prediction,]),d)
days=substr(d$x,6,10)
daily=tapply(d$y[prediction]-linear[prediction],days[prediction],mean)
daily=ma(rep(daily,3),10)[(length(daily)+1):(2*length(daily))]|>setNames(names(daily))
excess=d$y-(linear+daily[days])
xy=rbind(xy,data.frame(x=d$x,y=excess,z="Excess deaths using seasonality-adjusted linear trend fitted against data from 2018-2019"))

prediction=d$x>="2021-01-01"&d$x<="2022-12-31"
linear=predict(lm(y~x,d[prediction,]),d)
days=substr(d$x,6,10)
daily=tapply(d$y[prediction]-linear[prediction],days[prediction],mean)
daily=ma(rep(daily,3),10)[(length(daily)+1):(2*length(daily))]|>setNames(names(daily))
excess=d$y-(linear+daily[days])
xy=rbind(xy,data.frame(x=d$x,y=excess,z="Excess deaths using seasonality-adjusted linear trend fitted against data from 2021-2022"))

xy$z=factor(xy$z,unique(xy$z))

xstart=as.Date("2011-1-1");xend=as.Date("2024-1-1")
xy=xy[xy$x>=xstart&xy$x<=xend,]

xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",2011:2023),"")
cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymax=max(xy$y,na.rm=T);ymin=min(xy$y,na.rm=T)
ystep=cand[which.min(abs(cand-(ymax-ymin)/5))]
ystart=ystep*floor(ymin/ystep)
yend=ystep*ceiling(ymax/ystep)
ystart=-120;yend=220
ybreak=seq(-100,yend,ystep)

color=c("#aa0000","#889944","#44aa44")

labels=data.frame(x=xstart+.02*(xend-xstart),y=seq(ystart+(yend-ystart)*.72,,-(yend-ystart)/15,nlevels(xy$z)),label=levels(xy$z))

sub=str_wrap("Data from wonder.cdc.gov/mcd.html. Weekly data from 2018-2023 and monthly data from 1999-2017 was interpolated to daily data. Multiple causes of death that potentially involve sudden cardiac death consist of I45 (Other conduction disorders), I46 (Cardiac arrest), I47 (Paroxysmal tachycardia), I48 (Atrial fibrillation and flutter), I49 (Other cardiac arrhythmias), I50 (Heart failure), I51 (Complications and ill-defined descriptions of heart disease), ROO (Abnormalities of heart beat), R01 (Cardiac murmurs and other cardiac sounds), R09.2 (Respiratory arrest), and R09.8 (Other specified symptoms and signs involving the circulatory and respiratory systems).",95)
sub=paste0(sub,"\n\n",str_wrap("The trend was calculated by first doing linear regression for daily data, then calculating the average difference to the trend for each 365 or 366 days of the year, then taking a 21-day centered moving average of the vector for daily differences, and then adding the difference to the trend on the corresponding day of each year.",95))

ggplot(xy,aes(x=x,y=y,color=z))+
geom_hline(yintercept=c(ystart,0,yend),color="gray65",linewidth=.25)+
geom_vline(xintercept=c(xstart,xend),color="gray65",linewidth=.25)+
geom_line(aes(color=z),linewidth=.3)+
geom_label(data=labels,aes(x=x,y=y,label=label),fill=alpha("white",.8),label.r=unit(0,"lines"),label.padding=unit(.04,"lines"),label.size=0,color=color[1:nrow(labels)],size=2.4,hjust=0)+
labs(title=str_wrap("CDC Wonder, ages 0-54: daily number of deaths with multiple cause of death potentially involving sudden cardiac death. MCoD U07.1 (COVID) and U09.9 (long COVID) excluded.",88),x=NULL,y=NULL,caption=sub)+
coord_cartesian(clip="off",expand=F)+
scale_x_date(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
theme(axis.text=element_text(size=6.5,color="black"),
  axis.ticks=element_line(linewidth=.25,color="gray65"),
  axis.ticks.x=element_line(color=alpha("gray65",c(1,0))),
  axis.ticks.length=unit(.15,"lines"),
  axis.title=element_text(size=8),
  legend.position="none",
  panel.background=element_rect(fill="white"),
  panel.grid=element_blank(),
  plot.background=element_rect(fill="white"),
  plot.margin=margin(.4,.65,.4,.4,"lines"),
  plot.caption=element_text(size=6.7,hjust=0,margin=margin(.6,0,0,0,"lines")),
  plot.title=element_text(size=7.5))
ggsave("1.png",width=4.6,height=4.5,dpi=450)

Added later: Ethical Skeptic later explained to me was that the reason why his plot seemed to be missing approximately the first two months of data was that his plot showed a moving average, so the first two months of data were actually included in the moving average for the point that was plotted around the start of March 2018.

Updated plot for cardiac deaths in ages 0 to 54 with drug deaths included

After Ethical Skeptic had published the plot that I wrote about in the previous section of this HTML file, he published an updated version of the plot where he included deaths due to accidental drug poisoning, and he included this note at the bottom of the plot: "X42-44 ((narcotics, heroin, fentanyl, meth, other drug) excess mortality surge removed (for 2020 only - lockdown impact is not trend data)". [https://theethicalskeptic.substack.com/p/the-state-of-things-pandemic-week-706] The plot now had a sudden increase in deaths between December 2020 and January 2021 which was missing from the earlier version, but it's because he included drug overdose deaths in 2021 to 2023 but not in 2020:

The reason why the increase between 2020 and 2021 in the plot above is not instant but it's spread over several weeks is because Ethical Skeptic told me that he used a 7-week moving average in the plot (which probably has a centered window which extends 3 weeks backwards and forwards, since in some of his earlier plots he has used a moving average with a window that extended 3 weeks backwards and 2 weeks forwards). [https://theethicalskeptic.substack.com/p/the-state-of-things-pandemic-week-706/comment/53237226, https://www.covid-datascience.com/post/evaluating-claims-of-excess-cancer-deaths-in-the-usa-during-the-pandemic]

It's not clear if the plot above includes only X44 deaths or also X42 and X43 deaths, because the line at the bottom indicates that X42 to X44 deaths were removed in 2020, but the second line from the bottom seems to indicate that the plot only includes X44 deaths. In the plot below I got about equally many X44 deaths as X42 and X43 deaths combined.

The plot below also shows that there's spikes in cardiac deaths during COVID waves, so most of the excess deaths in Ethical Skeptic's plot can probably be attributed to either drug deaths or COVID deaths:

library(data.table);library(ggplot2)

t=fread("http://sars2.net/f/wonderskeptic.csv")
p=t[,.(x=date-3,y=as.double(dead),z=factor(cause,unique(cause)))]

xstart=as.Date("2018-1-1");xend=as.Date("2025-1-1")
xbreak=seq(xstart,xend,"6 month");xlab=c(rbind("",2018:2024),"")
ystart=0;yend=3500;ystep=500;ybreak=seq(ystart,yend,ystep)

color=c("#bb0000",hcl(225,100,40),hcl(90,120,50),hcl(280,120,50),hcl(300,40,30),"gray50")

ggplot(p,aes(x,y,color=z))+
coord_cartesian(clip="off",expand=F)+
geom_hline(yintercept=seq(ystart,yend,ystep),color="gray90",linewidth=.25)+
geom_vline(xintercept=seq(xstart,xend,"3 month"),color="gray90",linewidth=.25)+
geom_vline(xintercept=seq(xstart,xend,"year"),linewidth=.25)+
geom_hline(yintercept=c(ystart,yend),linewidth=.25,lineend="square")+
geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+
geom_line(linewidth=.3)+
labs(title="CDC WONDER, weekly deaths by multiple cause of death in ages 0 to 54"|>stringr::str_wrap(80),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
guides(colour=guide_legend(ncol=2,byrow=F))+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks=element_blank(),
  axis.ticks.length=unit(0,"pt"),
  axis.title=element_text(size=7),
  legend.background=element_blank(),
  legend.box.just="left",
  legend.key=element_blank(),
  legend.spacing.x=unit(2,"pt"),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.position=c(0,1),
  legend.justification=c(0,1),
  legend.box.background=element_rect(fill=alpha("white",.85),color="black",linewidth=.3),
  legend.margin=margin(-2,5,4,5),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  panel.grid=element_blank(),
  plot.margin=margin(5,5,5,5),
  plot.subtitle=element_text(size=7),
  plot.title=element_text(size=7.3,face=2))
ggsave("1.png",width=4.3,height=2.7,dpi=400*4)
system("magick 1.png -resize 25% 1.png")

In the plot above I didn't exclude deaths with MCD COVID. In 2020 to 2023 in ages 0 to 54, only about 0.5% of deaths with MCD X42 to X44 also had MCD COVID, but about 9% of deaths with MCD I45 to I51 also had MCD COVID.

When someone asked Ethical Skeptic why there was a huge increase in deaths in early 2021 even though only few people in ages 0 to 54 had been vaccinated, he answered that it was the vaccines but he didn't even mention that he added in drug deaths to his plot in 2021 but not 2020:

And when someone asked Ethical Skeptic why he included drug deaths in his plot, he said that the drug deaths were somehow associated with the vaccines:

In one tweet he further explained that the reason why he included a certain proportion of drug deaths is because if someone died from a cardiac cause induced by a drug overdose, only the drug overdose would be listed as MCD because he claims there is only one MCD listed for deaths from external causes: "The reason we can and ethically should add the incremental vaccine-caused portion of these deaths (from Chart 4c) into this analysis, is because this is a multiple cause of death (MCoD) analysis, not an underlying cause of death (UCoD) analysis. MCoD analysis is used to detect prevalence and trend, when an overlap between causes is determined. However, when death is ascribed to a Non-Natural Cause, no MCoD overlap is afforded that pool of mortality, as only one cause of death is allowed - thereby depleting the sudden cardiac death trend numbers artificially. This has been abused (as one can ascertain from Chart 4c) - therefore, competent analysis on this must include the incremental portion of 'unknown drug overdose' which is clearly sensitive to the introduction of the vaccine." [https://x.com/EthicalSkeptic/status/1778166029601943622] However for example in 2021 in all ages, CDC WONDER returned 4702 deaths where the MCD included X42-X44 (which are the ICD codes for accidental drug overdoses that Ethical Skeptic included in his plot) and where the MCD also included I45-I51, R00, or R01 (which were the cardiac ICD codes in his plot). So I don't think he is right that only one cause of death is allowed for deaths from non-natural causes.

The GIF file below shows that an earlier version of the plot that Ethical Skeptic posted on Twitter was even more misleading, because it didn't include the line at the bottom which said: "X42-44 ((narcotics, heroin, fentanyl, meth, other drug) excess mortality surge removed (for 2020 only - lockdown impact is not trend data)": [https://x.com/EthicalSkeptic/status/1775711298061275434]

X42 is "Accidental poisoning by and exposure to narcotics and psychodysleptics [hallucinogens], not elsewhere classified", X43 is "Accidental poisoning by and exposure to other drugs acting on the autonomic nervous system", and X44 is "Accidental poisoning by and exposure to other and unspecified drugs, medicaments and biological substances". In the year 2021 CDC WONDER returned 45,053 deaths with MCD X42, 35 deaths with MCD X43, and 45,710 deaths with MCD X44.

Someone on Twitter also asked: "Wonder what that graph looks like without that bottom code (and similar others)" (where the bottom line meant the ICD code X44, because the X42 and X43 codes were not included in the version of the post that was posted on Twitter). [https://x.com/All_Im_sayn/status/1775728867543535904] But Ethical Skeptic replied: "It is 268 deaths per week out of the 929. Decrement that, and the injury-death tally is far below the actual vaccine impact - by conducting cherry picking to force a desired conclusion - and it does not even accomplish the task." 929 refers to the number of excess deaths on week 39 of 2023, so I don't know if the number of excess drug deaths would've been similar on other weeks. And I don't know if the figure of 268 only included drug deaths under X44 or also X42 and X43 (which would likely be around double of X44 alone).

One of the three "vaccine uptake periods" that are highlighted in Ethical Skeptic's plot is in mid-2022. I don't know if it's supposed to refer to the period when people got second boosters, because the period in winter 2021-2022 when people got first boosters is not highlighted. in the CDC dataset which shows the number of vaccines administered by age group, the percentage of people in ages 25 to 49 who had received the first booster didn't increase that much after February 2022, but the CDC dataset doesn't even include data for second boosters administered in ages 25 to 49 (even though the total percentage of people in ages 25-49 who ever got a second booster is fairly low): [https://data.cdc.gov/Vaccinations/COVID-19-Vaccination-Age-and-Sex-Trends-in-the-Uni/5i5k-6cmh]

> t=data.table::fread("COVID-19_Vaccination_Age_and_Sex_Trends_in_the_United_States__National_and_Jurisdictional_20240131.csv")
> t=t[Demographic_Category=="Ages_25-49_yrs"&Location=="US"]
> a=aggregate(t[,9:12],list(sub("(..)/../..(..).*","\\2-\\1",t$Date)),mean,na.rm=T)
> colnames(a)=sub("_(Pop|Vax|pct).*","",colnames(a))
> round(t(data.frame(a,row.names=1)))
                   20-12 21-01 21-02 21-03 21-04 21-05 21-06 21-07 21-08 21-09 21-10 21-11 21-12
Administered_Dose1     1     6    10    20    39    51    56    60    64    69    72    74    77
Series_Complete        0     1     6    11    23    40    49    53    56    60    63    65    67
Booster_Doses        NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0     1     3    10    23
Second_Booster       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                   22-01 22-02 22-03 22-04 22-05 22-06 22-07 22-08 22-09 22-10 22-11 22-12 23-01
Administered_Dose1    80    81    82    82    83    83    83    83    84    84    85    85    85
Series_Complete       68    69    70    70    70    70    71    71    71    71    72    72    72
Booster_Doses         33    38    39    40    40    41    41    41    42    42    43    43    43
Second_Booster       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                   23-02 23-03 23-04 23-05
Administered_Dose1    85    85    85    86
Series_Complete       72    72    72    72
Booster_Doses         43    43    43    43
Second_Booster       NaN   NaN   NaN   NaN

However actually the reason why Ethical Skeptic highlighted mid-2022 as a vaccine uptake period might have been because his plot had a noticeable spike in all-cause deaths around the same time. But in my plot above which showed similar data grouped by cause of death, there wasn't any cause of death which had a clear increase in deaths around mid-2022, apart from a small increase in COVID deaths (which weren't even included in Ethical Skeptic's plot). But it might be because his plot displayed excess deaths and not raw deaths like my plot, so the increase in mid-2022 might have been an artifact of the way he calculated his baseline.

In his blog Ethical Skeptic had the chutzpah to say that it would equate to obfuscation if he also added drug deaths to his plot in 2020. He wrote: "It should be noted that an initial mid-2020 surge in excess deaths due to overdoses (X42-44) during the lockdown period has been removed from this data. This is a death-surge which is non-Natural/non-Vaccine in its basis and can be viewed as to its arrival shape and magnitude in Chart 4c above. This 2020 surge is not trend data, hence its removal in favor of the true arrival trend attributable to the mRNA vaccine. There exists a slight elevation in these deaths which remains on the chart, attributable to Covid-19, but the permanent surge in mortality arrived as a result of the Covid vaccine beginning in December (Week 52 - 14 days after the vaccine start) 2020. To conflate that pre-vaccine surge with the vaccine arrival data, would constitute obfuscation." [https://theethicalskeptic.com/2024/04/04/the-state-of-things-pandemic-week-50-2023/]

Added later: Later Ethical Skeptic posted an updated version of his plot which was now more transparent and honest, because he added another line for 2020 where he didn't omit the excess drug deaths from his plot (even though I don't necessarily agree with his attribution of the excess drug deaths in 2020 to COVID or the lockdowns): [https://x.com/EthicalSkeptic/status/1846690187297919231]

Plot for X44 drug deaths

Ethical Skeptic included this plot in one of his articles: [https://theethicalskeptic.com/2024/04/04/the-state-of-things-pandemic-week-50-2023/]

Chart 4c - Excess Mortality from Overdose from Unspecified Drug and Climate Change - Yes, the combination of lockdowns and Covid-19 resulted in a temporary increase in undetermined drug overdoses. However, this is not 'trend' data. The trend begins at the end of 2020 with the introduction of the vaccine (which peaks the very week of vaccine maximum uptake). Vaccine-induced myocarditis rendered this population vulnerable to sudden cardiac death - which was then falsely attributed to drug overdose. This excess attribution, as of Week 25 2023, is in the range of 270-300 deaths per week, for 'unspecified drug' and 'climate'. A query-determined non-overlapping portion of this tally is added back into the Sudden Cardiac Death mortality totals show in DFT Chart 10 below. It suggests an alarming increase in young person Sudden Cardiac Death.

In the year 2021, I got 556 deaths for MCD X30 ("Exposure to excessive natural heat (hyperthermia)") but 35,671 deaths for MCD X44 ("Accidental poisoning by and exposure to other and unspecified drugs, medicaments and biological substances"). So in the plot above the deaths associated with hyperthermia have little impact on total mortality.

There's about as many deaths with MCD X42 ("Accidental poisoning by and exposure to narcotics and psychodysleptics [hallucinogens], not elsewhere classified") as MCD X44 ("Accidental poisoning by and exposure to other and unspecified drugs, medicaments and biological substances"). So I don't know why Ethical Skeptic only included X44 deaths in his plot. But it might be because X42 had a higher peak in 2020 than 2021 and the peak in 2020 cannot be attributed to the vaccines, but X44 had a higher peak in 2021 than 2020 so it's easier to blame the deaths on vaccines. Or it might also be because X44 had about 81% more deaths in 2021-2022 than 2018-2019 but X42 only had about 50% more deaths in 2021-2022 than 2018-2019. Or it might also be because X44 deaths have a more flat trend in 2018-2019 so the level of deaths in 2021-2023 looks like it's far above the trend from his plot, but X42 has a trend that is more sloped upwards in 2018-2019 so the level of deaths in 2022 doesn't seem that far off the trend even if you're not seeing past data before 2018:

The plot by Ethical Skeptic started from 2018 and it only included X44 deaths, so it looked like there was an anomalous sustained increase in drug deaths in 2021-2023. But from my plot above where you can see a more long-term trend and where X44 deaths are also included, it looks like 2022 isn't that far off the long-term trend but rather the level of deaths in mid-2021 looks elevated, and there's still probably a lot of deaths missing in 2023 because drug-related deaths have a long registration delay.

Before COVID the long-term trend in drug deaths seemed to be curved upwards, so the 2015-2019 linear trend in my plot above might be too low, and in the plot below where I used a 2010-2019 polynomial trend instead, the X42 deaths were actually below the trend in 2022:

Effect of fitting a baseline against data from only 2018 and 2019

When I asked Ethical Skeptic how he calculates his baseline in the plots which show weekly data from 2018 onwards, he posted this reply: [https://theethicalskeptic.substack.com/p/the-state-of-things-pandemic-week-706/comment/53292593]

I'm not analyzing generational or population-diluted trends (potentially misleading statistics), I am detecting deviation from trend dynamics (detecting system inflection).

I cannot use Wonder 2017 and earlier because the numbers are boosted by the small county single-record effect. This is why Wonder 2018 and beyond is a new database (shows as a variance in your chart which does not actually exist), under that HIPAA constraint. Nor can I use 2002 - 2017 data because this contain a dissipating generational effect of those Boomers who suffered cardiac arrest from specific past health impacts which are no longer salient to this age bracket. That trend line must be moved to the 65+ bracket now. This has caused a slight downtrend into 2019, which I do accommodate, yes.

I compared that baseline to a linear regression of the old growth rate (2014-2017) before hand to ensure it was 'in context'. However, I did not use a pure linear regression of 2018 and 19 for this reason. I fit the baseline to the variances linearly, but not by least-squares regression (how I avoided the early 2018 surge). There is of course not enough data to then use a non-linear function beyond that as well. That would be whipsawing fiction.

But much of this is moot, because none of these manipulations serve to make this dramatic an inflection disappear. These are trend manipulations. Four years from now, trend models will have to be adjusted again, but the inflection will still be there. The argument will be over just how much excess death there indeed is... but I anticipate that it will be moot. If you have a strong inflection like this and your trend analysis makes it disappear quickly, odds are the trend analysis is biased.

As a result, I don't have either a Jan 2018 high or 2019 lowering bias in my baseline.

I don't understand how his method of calculating a linear baseline is different from a regular linear regression or how he accounted for the low number of deaths in 2019, but he blocked me before I was able to ask him to clarify it.

There was an exceptionally low number of deaths in 2019 and there was a high number of deaths in early 2018, so if you fit a regular linear baseline using only data from the years 2018 and 2019, the slope of the baseline will usually be too low (but I don't know to what extent Ethical Skeptic's undocumented regression method suffers from the same problem):

Ethical Skeptic included the following point on his list of crimes committed by other analysts: "16. Add volatility of peak winter mortality periods into the trend regression, and not index off the summer base. That way a rough flu year will raise the baseline artificially to your advantage." [https://x.com/EthicalSkeptic/status/1773066903000396096] So I don't know if it means that he omits weeks with high mortality in early 2018 when he calculates the baseline.

The plot below shows that when I fitted a seasonality-adjusted baseline against only one year of weekly data so that the baseline was adjusted to match the actual deaths each week, there was 0% excess deaths during each week of the fitting period, so then the standard deviation of the residuals during the fitting period was zero, so I got infinite z-scores when I divided excess deaths with the standard deviation of the residuals. And similarly when I fitted a seasonality-adjusted baseline against only two years of data, I subsequently got higher z-scores than with a 4-year fitting period. The plot also demonstrates that shorter seasonality-adjusted baselines tend to produce a lower standard deviation for the residuals during the fitting period, and they also tend to produce higher z-scores after the fitting period:

library(ggplot2)

isoweek=\(year,week,weekday=1){d=as.Date(paste0(year,"-1-4"));d-(as.integer(format(d,"%w"))+6)%%7-1+7*(week-1)+weekday}
ma=\(x,b=1,f=b)rowMeans(embed(c(rep(NA,b),x,rep(NA,f)),f+b+1),na.rm=T)

t=read.csv("https://www.mortality.org/File/GetDocument/Public/STMF/Outputs/NZL_NPstmfout.csv")|>subset(Sex=="b")
t$date=isoweek(t$Year,t$Week,4)

xy=do.call(rbind,lapply(c(2011,2012,2014),\(i){
  pred=t$Year%in%2011:i
  trend=predict(lm(Total~date,t[pred,]),t)
  weekly=tapply(t$Total[pred]-trend[pred],t$Week[pred],mean)
  ma(rep(weekly,3),0)[53:154]
  o=data.frame(x=t$date,z=paste0("2011-",i),dead=t$Total)
  o$trend=trend+weekly[pmin(t$Week,52)]
  o$y=ma(o$dead-o$trend,1)
  o$sd=sd(o$dead[pred]-o$trend[pred])
  o}))

xstart=as.Date("2011-1-1");xend=as.Date("2016-1-1")
xy=xy[xy$x>=xstart&xy$x<=xend,]

xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",2011:2015),"")
cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymax=max(xy$y,na.rm=T);ymin=min(xy$y,na.rm=T)
ystep=cand[which.min(abs(cand-(ymax-ymin)/5))]
ystart=ystep*floor(ymin/ystep)
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

color=c(hcl(c(210,260,310)+15,100,50))

sd=sprintf("%.1f",tapply(xy$sd,xy$z,unique))
sigma=c("Inf",sprintf("%.1f",tapply(abs(xy$y/xy$sd),xy$z,max))[-1])
lab=paste0(unique(xy$z)," (SD ",sd,", max abs sigma ",sigma,")")
xy$z=lab[factor(xy$z)]

tit="Weekly excess deaths in New Zealand. The baseline is a linear regression which was adjusted for seasonality by for example adding the average difference between the actual deaths and the baseline on each week 1 to the baseline for week 1, and similarly for other weeks."

ggplot(xy,aes(x=x,y=y,color=z))+
geom_hline(yintercept=c(ystart,0,yend),linewidth=.25)+
geom_vline(xintercept=c(xstart,xend),linewidth=.25)+
geom_line(aes(color=z),linewidth=.3)+
labs(title=stringr::str_wrap(tit,85),x=NULL,y=NULL)+
coord_cartesian(clip="off",expand=F)+
scale_x_date(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
guides(colour=guide_legend(title="Fitting period",override.aes=list(linewidth=.4)))+
theme(axis.text=element_text(size=6.5,color="black"),
  axis.ticks=element_line(linewidth=.25),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks.length=unit(.15,"lines"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.box.just="left",
  legend.key=element_rect(fill="white"),
  legend.spacing.x=unit(.15,"lines"),
  legend.key.size=unit(.8,"lines"),
  legend.position=c(0,0),
  legend.justification=c(0,0),
  legend.box.background=element_rect(fill=alpha("white",.85),color="black",linewidth=.25),
  legend.margin=margin(.3,.4,.3,.4,"lines"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_text(size=8),
  panel.background=element_rect(fill="white"),
  panel.grid=element_blank(),
  plot.background=element_rect(fill="white"),
  plot.margin=margin(.4,.65,.4,.4,"lines"),
  plot.caption=element_text(size=6.7,hjust=0,margin=margin(.6,0,0,0,"lines")),
  plot.title=element_text(size=7.8))
ggsave("1.png",width=4.6,height=3.4,dpi=450)

Ethical Skeptic has published three articles in a series called "Houston We Have a Problem" where he explains some of his methodology. [https://theethicalskeptic.com/2024/04/04/the-state-of-things-pandemic-week-50-2023/] In the third part he included the image below where he described how he calculates his baseline, but it sounds like he's just doing a regular linear regression and he's not applying any special method to account for the low number of deaths in 2019 or for the high number of deaths in early 2018 (even though here he seems to describe the methodology he uses to make his plots that start from 2014, so it could be that he uses different methodology in the plots that start from 2018):

COVID deaths per capita compared to percentage of vaccinated population in US counties

Ethical Skeptic posted these tweets: [https://x.com/EthicalSkeptic/status/1531517512906592256]

Covid Deaths per 100 K population trend down as the percentage of the population vaccinated rises.

Vaccines working right?

No, this is the function which existed prior to the vaccines being rolled out (22 Jan 2020 - 4 Apr 2021).

Watch for charlatans exploiting this effect.

In actuality, when we compare both periods before & after the launch of the vax (MMWR Wk 14 2021) - we observe that there exists no difference between pre & post vax death rates by US county. It is EXACTLY the same function pre & post vaccine.

Any benefit MUST show here.

However for some reason he extended his pre-vaccination period up to April 4th 2021, even though about 30% of the US population was already vaccinated at that point (and the majority of people were vaccinated in the elderly age groups which account for most COVID deaths):

$ curl https://data.cdc.gov/api/views/rh2h-3yt2/rows.csv>statesvax.csv
$ csvtk filter2 -f '$Date=="04/03/2021"&&$date_type=="Report"&&$Location=="US"' statesvax.csv|csvtk cut -f Date,Location,Administered_Dose1_Pop_Pct
Date,Location,Administered_Dose1_Pop_Pct
04/03/2021,US,31.4

He also didn't explain how he calculated the curved trend line, and it looks too low near the end of the x-axis, and for some reason he only drew the trend up to the point where the value of the x-axis was about 85%.

In the plot below I tried plotting the percentage of vaccinated population on March 1st 2022 on the x-axis like Ethical Skeptic, but I compared COVID deaths in 2020 against COVID deaths in 2021-2022 so that I didn't include the first three months of 2021 as part of the pre-vaccination period. When I splitted a spline trend to the data, it was a lot steeper in 2021-2022 than 2020. And even in 2020 the overall trend looks more flat in my plot, because I drew the trend line up to the end of the x-axis and I didn't smooth out the trend line as heavily as Ethical Skeptic. The trend got even flatter when I weighted it by population size, because larger counties have a higher percentage of vaccinated people than smaller counties, and in 2020 there were many large northeastern counties that had a high number of COVID deaths per capita:

library(data.table);library(ggplot2)

# download.file("https://github.com/nytimes/covid-19-data/raw/master/us-counties-2020.csv","us-counties-2020.csv")
# download.file("https://github.com/nytimes/covid-19-data/raw/master/us-counties-2022.csv","us-counties-2022.csv")
# download.file("https://www2.census.gov/programs-surveys/popest/datasets/2020-2022/counties/totals/co-est2022-alldata.csv","uscountypop.csv")

# download manually:
# https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh/about_data

t=fread("COVID-19_Vaccinations_in_the_United_States_County_20240227.csv")
t=t[Date=="03/01/2022",c("FIPS","Administered_Dose1_Recip")]
colnames(t)=c("fips","vax")
t$fips=as.integer(t$fips)

pop=fread("uscountypop.csv")[,.(fips=as.integer(sprintf("%d%03d",STATE,COUNTY)),pop=POPESTIMATE2022)]

nyt=fread("us-counties-2022.csv")[date=="2022-12-31",.(fips,deaths)]
nyt2=fread("us-counties-2020.csv")[date=="2020-12-31",.(fips,deaths)]
nyt=merge(nyt,nyt2,by="fips")[,.(fips,deaths=(deaths.x-deaths.y)/2)]|>na.omit()

states=read.csv("https://github.com/cphalpert/census-regions/raw/master/us%20census%20bureau%20regions%20and%20divisions.csv")[,c(2,3)]
fips=read.csv("https://gist.githubusercontent.com/dantonnoriega/bf1acd2290e15b91e6710b6fd3be0a53/raw/11d15233327c8080c9646c7e1f23052659db251d/us-state-ansi-fips.csv",strip.white=T)[,c(3,2)]
states=merge(states,fips,by=1)
states=setNames(states[,2],states[,3])

me=merge(merge(t,nyt),pop)[vax>0]|>na.omit()
xy=data.frame(x=pmin(100,me$vax/me$pop*100),y=pmin(500,me$deaths/me$pop*1e5),pop=me$pop)
xy$size=pmax(.3,log(me$pop,20)-2.8)
xy$region=factor(states[sub("...$","",me$fips)],c("Northeast","Midwest","South","West"))

xstart=10;xend=100;xstep=10;ystart=0;yend=500;ystep=100
xbreak=seq(xstart,xend,xstep)
ybreak=seq(ystart,yend,ystep)

smooth1=as.data.frame(predict(smooth.spline(xy$x,xy$y,spar=.85,w=xy$pop/max(xy$pop)),seq(0,100,.1)))
smooth2=as.data.frame(predict(smooth.spline(xy,spar=.85),seq(0,100,.1)))

leg1=data.frame(x=11.4,y=seq(yend*.94,,-yend/15,2),label=c("Smoothed spline (weighted by population size, spar=.85)","Smoothed spline (not weighted, spar=.85)"))
leg2=data.frame(x=xend-1.4,y=seq(yend*.94,,-yend/15,4),label=levels(xy$region))

pal1=c("#ee66aa","#993377")
pal2=c(hcl(225,100,50),hcl(55,50,50),"black",hcl(135,100,50))

ggplot(xy,aes(x,y))+
geom_label(data=leg1,aes(x=x,y=y,label=label),fill=alpha("white",.85),label.r=unit(0,"lines"),label.padding=unit(.04,"lines"),label.size=0,size=2.3,hjust=0,color=pal1)+
geom_label(data=leg2,aes(x=x,y=y,label=label),fill=alpha("white",.85),label.r=unit(0,"lines"),label.padding=unit(.04,"lines"),label.size=0,size=2.3,hjust=1,color=pal2)+
geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+
geom_hline(yintercept=c(ystart,yend),linewidth=.25,lineend="square")+
geom_point(aes(color=region),stroke=0,size=xy$size,alpha=.3)+
geom_line(data=smooth1,color=pal1[1],linewidth=.35)+
geom_line(data=smooth2,color=pal1[2],linewidth=.35)+
labs(x="Percentage of vaccinated population on March 1st 2022",y="COVID deaths per 100k person-years in 2020",title="US counties: COVID deaths vs percentage of vaccinated population on March 1st 2022",caption="Point size indicates population size. Y-axis values are truncated to 500. Population estimates are for mid-2022. Sources:
data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh/about_data,
www2.census.gov/programs-surveys/popest/datasets/2020-2022/counties/totals/co-est2022-alldata.csv,
github.com/nytimes/covid-19-data.")+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=pal2)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=5.5,color="black"),
  axis.ticks=element_line(linewidth=.25),
  axis.ticks.length=unit(.15,"lines"),
  axis.title=element_text(size=6.5),
  legend.position="none",
  panel.grid=element_blank(),
  panel.background=element_rect(fill="white"),
  plot.margin=margin(.4,.8,.4,.4,"lines"),
  plot.caption=element_text(size=6.3,hjust=0),
  plot.title=element_text(size=7.3))
ggsave("1.png",width=5.2,height=3.1,dpi=450)

Does the provisional dataset have different suppression behavior than the final dataset?

Ethical Skeptic wrote: "I cannot use Wonder 2017 and earlier because the numbers are boosted by the small county single-record effect. This is why Wonder 2018 and beyond is a new database (shows as a variance in your chart which does not actually exist), under that HIPAA constraint." [https://theethicalskeptic.substack.com/p/the-state-of-things-pandemic-week-706/comment/53292593]

He also wrote: "Please note however, that the provisional mortality figures for 2018-2024 are lower than the figures prior to 2018 because the NVSS suppresses county level data with fewer than nine records. We do not adjust 2018 and later mortality figures upward for this. Therefore, these excess mortality projections are lower than the reality as a result. For this reason, all inflection and DFT charts begin with the 2018 suppressed data, in order to avoid false 'downtrends' or inflections in the data." [https://theethicalskeptic.com/2024/04/04/the-state-of-things-pandemic-week-50-2023/]

The link in the text I quoted was a screenshot of this part of the help article for the provisional deaths dataset: [https://wonder.cdc.gov/wonder/help/mcd-provisional.html]

1. What are the Assurance of Confidentiality constraints for vital statistics?

All statistics representing one through nine (1-9) persons are suppressed, in the provisional mortality online database for years 2018 and later.

Data reports for years 1989 and later must meet the NCHS data use restrictions. Vital statistics data are suppressed due to confidentiality constraints, in order to protect personal privacy. The term "Suppressed" replaces sub-national death counts, births counts, death rates and associated confidence intervals and standard errors, as well as corresponding population figures, when the figure represents one to nine (1-9) persons.

Ethical Skeptic may have been confused by the text which said that the deaths are suppressed "in the provisional mortality online database for years 2018 and later", as if there would have been a change in policy in the year 2018, even though I think it just means that the deaths are suppressed throughout the provisional mortality dataset which starts in 2018. Similar text is also included in the help article for the final mortality data in 1999-2020, which says that the suppression starts in the year 1999: "The term 'Suppressed' replaces death counts, births counts, death rates and associated confidence intervals and standard errors, as well as corresponding population figures, when the figure represents one to nine (1-9) persons for deaths in 1999 and later years." [https://wonder.cdc.gov/wonder/help/mcd.html]

CDC WONDER suppresses the number of deaths on rows with 1-9 deaths both in the datasets for provisional and final deaths, and I haven't noticed any differences in the suppression behavior between the datasets. For example I did a request for provisional mortality statistics here: https://wonder.cdc.gov/mcd-icd10-provisional.html. I grouped the results by residence county and month, I set the state of residence to Rhode Island, I set the year to 2020, and I set underlying cause to intentional self-harm (X60-X84). Rhode Island has 5 counties, but 2 of them had 1-9 deaths so the number of deaths was suppressed, but the suppressed counties were still included in the total number of deaths across all counties. And I got identical results when I did a similar request in the 1999-2020 final dataset:

If there was some major change to the data suppression behavior in the dataset that starts in 2018, then I think it would be highlighted more clearly in the documentation.

Ethical Skeptic wrote that the reason why "Wonder 2018 and beyond is a new database" was because of the change to the suppression behavior. But actually it might be because the data from 2018 onwards includes more race categories than the earlier data, and it includes weekly data in addition to monthly data, and it allows grouping the results by location of occurrence and not only location of residence. There's also a separate dataset for 1999-2004 which only includes 3 different race groups, where monthly deaths are missing, and where Hispanic origin is not indicated:

The natality data at CDC WONDER is also split off into different datasets for different ranges of years, which have different sets of racial categories and other available variables: [https://wonder.cdc.gov/natality.html]

Deaths associated with pregnancy and childbirth

Ethical Skeptic posted the following plot where there was a spike in pregnancy-associated deaths that coincided with the Delta wave, but he suggested that it might have been caused by vaccines instead of COVID. He also got a steadily decreasing number of excess deaths in 2022-2023, but he suggested it might have been because the birth rate was reduced by vaccines: [https://x.com/EthicalSkeptic/status/1778268480619331786]

Not particularly good news here.

1. We don't use the 'placental disorder unspecified' like the UK.

2. Expanding query to all Maternal deaths O00-O99 for Pregnancy, Childbirth & Puerperium reveals

a. large vaccine impact
b. distressing news about implied childbirth rate

Ethical Skeptic told me that in these plots which display weekly deaths from 2018 onwards, he fits the baseline using only data from 2018 and 2019. He said that his baseline is linear but it's somehow different from a regular least squares linear regression, but he didn't explain how it's different. [https://theethicalskeptic.substack.com/p/the-state-of-things-pandemic-week-706/comment/53237226]

The year 2019 had an unusually low number of deaths, so normally if you fit a linear baseline against only data from 2018 and 2019, the slope of the baseline is pointed too much downwards. However in the case of deaths with an underlying cause related to pregnancy and childbirth, 2019 actually had a higher number of deaths than 2018, so in the plot below when I fitted a linear baseline against deaths from 2018 and 2019 only, the slope of the baseline was pointed too much upwards compared to a longer-term linear trend. So it probably also explains why Ethical Skeptic's plot had a steadily decreasing trend in excess deaths in 2022-2023. But when I used a 2013-2019 baseline instead, my excess deaths remained roughly flat in 2022-2023:

library(ggplot2)

t=read.csv("http://sars2.net/f/wonderpregnancymonthly.csv")

x=as.Date(paste0(t$year,"-",t$month,"-15"))
xy=data.frame(x,y=t$dead,z="Actual deaths")

trend1=predict(lm(y~x,xy[data.table::year(xy$x)%in%2018:2019,]),xy)
trend2=predict(lm(y~x,xy[data.table::year(xy$x)%in%2013:2019,]),xy)

rbd=\(x,...)rbind(x,setNames(data.frame(...),names(x)))
xy=rbd(xy,x,trend1,"2018-2019 linear regression")
xy=rbd(xy,x,trend2,"2013-2019 linear regression")
xy=rbd(xy,x,t$dead-trend1,"Excess deaths (2018-2019)")
xy=rbd(xy,x,t$dead-trend2,"Excess deaths (2013-2019)")

xy$z=factor(xy$z,unique(xy$z))

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymax=max(xy$y,na.rm=T);ymin=min(xy$y,na.rm=T)
ystep=cand[which.min(abs(cand-(ymax-ymin)/6))]
yend=ystep*ceiling(ymax/ystep)
ystart=ystep*floor(ymin/ystep)
ybreak=seq(ystart,yend,ystep)

xstart=as.Date("1999-1-1");xend=as.Date("2024-1-1")
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",seq(1999,2023)),"")

color=c("black",hcl(210,60,60),hcl(260,100,40))

kim=\(x)ifelse(x>=1e3,ifelse(x>=1e6,paste0(x/1e6,"M"),paste0(x/1e3,"k")),x)

ggplot(xy,aes(x,y))+
geom_vline(xintercept=seq(xstart,xend,"year"),color="gray93",linewidth=.25,lineend="square")+
geom_vline(xintercept=as.Date(paste0(c(2013,2018,2020),"-1-1")),color="gray50",linetype=2,linewidth=.25,lineend="square")+
geom_hline(yintercept=c(ystart,0,yend),color="black",linewidth=.25,lineend="square")+
geom_vline(xintercept=c(xstart,xend),color="black",linewidth=.25,lineend="square")+
geom_line(aes(y=y,linetype=z,color=z,alpha=z,linewidth=z))+
labs(title=stringr::str_wrap("CDC WONDER: Monthly deaths with underlying cause O00-O99 (Pregnancy, childbirth and the puerperium)",70),x=NULL,y=NULL)+
coord_cartesian(clip="off",expand=F)+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color[c(1,2,3,2,3)])+
scale_linetype_manual(values=c(1,2,2,1,1))+
scale_linewidth_manual(values=c(.3,.3,.3,.23,.23))+
scale_alpha_manual(values=c(1,1,1,.6,.6))+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.text.x=element_text(angle=90,vjust=.5,hjust=1),
  axis.ticks=element_line(linewidth=.25,color="black"),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.box.just="left",
  legend.key=element_rect(fill="white"),
  legend.spacing.x=unit(.15,"lines"),
  legend.key.size=unit(.65,"lines"),
  legend.key.width=unit(.8,"lines"),
  legend.position=c(0,1),
  legend.justification=c(0,1),
  legend.box.background=element_rect(fill="white",color="black",linewidth=.3),
  legend.margin=margin(-.2,.3,.2,.3,"lines"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_rect(fill="white"),
  plot.margin=margin(.4,.4,.4,.4,"lines"),
  plot.title=element_text(size=7.7,margin=margin(.1,0,.6,0,"lines")))
ggsave("1.png",width=4.2,height=2.8,dpi=450)

In the previous plot the biggest spike in deaths was in the third quarter of 2021, so it's probably associated with the Delta wave in August-September 2021. Southeastern states had the highest excess mortality from all causes during the Delta wave, but from plot below which shows excess mortality for causes related to childbirth, you can see that in the third quarter of 2021 the excess mortality was by far the highest in southern states. But in northeastern states which had a high percentage of vaccinated people, the excess mortality in 2021 was by far the lowest:

library(colorspace)

t=read.csv("http://sars2.net/f/wonderpregnancyregionquarter.csv")
y=read.csv("http://sars2.net/f/wonderpregnancyregionyearly.csv")|>subset(year%in%2013:2019)

dim=data.frame(cause=y$cause,region=factor(y$region,unique(y$region)),date=y$year)
dim=rbind(dim,data.frame(cause=t$cause,region=t$region,date=paste0(t$year,"Q",t$quarter)))

a=tapply(c(y$dead,t$dead),dim,sum)

for(i in 1:2){
  m=a[i,,]
  m[,1:7]=m[,1:7]/4
  ave=rowMeans(m[,1:7])
  disp=round((m/ave-1)*100)
  m=(m-ave)/ifelse(m>ave,ave,m)

  maxcolor=2
  pal=hex(HSV(c(210,210,210,210,0,0,0,0,0),c(1,.8,.6,.3,0,.3,.6,.8,1),c(.3,.65,1,1,1,1,1,.65,.3)))

  pheatmap::pheatmap(m,filename=paste0("i",i,".png"),display_numbers=disp,
    cluster_rows=F,cluster_cols=F,legend=F,cellwidth=18,cellheight=18,fontsize=9,fontsize_number=8,
    border_color=NA,na_col="gray90",
    number_color=ifelse(abs(m)>maxcolor*.6,"white","black"),
    breaks=seq(-maxcolor,maxcolor,,256),
    colorRampPalette(pal)(256))
}

system("f=i1.png;w=`identify -format %w $f`;convert -interline-spacing -2 -gravity northwest -font Arial -pointsize 38 -size $[w-44]x \\( -splice 22x14 caption:'CDC WONDER: Excess percentage of deaths for cause O00-O99 (Pregnancy, childbirth and the puerperium). The baseline is the average number of deaths per quarter in 2013-2019, and all quarters have the same baseline which is not adjusted for seasonal variation in mortality.' \\) -pointsize 38 -font Arial-Bold \\( -splice 22x14 caption:'Underlying cause of death' \\) i1.png \\( -splice 22x14 caption:'Multiple cause of death' \\) i2.png -append 1.png")

If you look at the bottom half of the plot above which shows excess MCD deaths, for some reason the excess MCD deaths remained far below zero for most of 2020-2023. I don't know why, but there might have been some change to coding practices. I don't think I made any error in my plot either, because I got the similar results when I looked at yearly data in the whole US, where 2018 and 2019 had a much higher number of MCD deaths than 2020:

The plot below shows that in the southern states which had a high number of pregnancy-associated deaths in the third quarter of 2021, deaths with UCD COVID also accounted for a large percentage of all deaths in ages 15-44 in the third quarter of 2021. I included both sexes in the plot, because females alone would've had some months with less than 10 deaths per census region so the number of deaths would've been suppressed:

The plot below also shows that in the third quarter of 2021, the percentage of vaccinated people was the highest in the northeastern states which had a low percentage of excess deaths associated with pregnancy, but the percentage of vaccinated people was much lower in the southern states which had a high percentage of excess deaths associated with pregnancy (R code). So if the deaths were caused by the vaccines, then why wasn't there a higher percentage of excess deaths in the states where more people were vaccinated?

In Ethical Skeptic's plot for deaths associated with childbirth, there was a steady decrease in excess deaths in 2022 and 2023, but I think it's because he fitted his baseline using only data from 2018 and 2019 so the slope of his baseline was pointed too much upwards. He speculated that the decrease in excess deaths could've been because vaccines had caused a decrease in birth rate instead, but I don't think he's right, because in my next plot which shows the monthly number of births at CDC WONDER, the excess percentage of births relative to the prepandemic trend was about -3% in 2020 and 0% in 2021, but in 2022 and 2023 the number of births was actually above the trend:

t=read.csv("http://sars2.net/f/wondernatality.csv")
t$date=as.Date(paste0(t$year,"-",t$month,"-16"))
t$born=t$born/lubridate::days_in_month(t$date)

xstart=as.Date("2016-1-1");xend=as.Date("2024-1-1")
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind(NA,2016:2023),NA)
t=t[t$date>=xstart&t$date<xend,]
ylim=extendrange(c(t$born,t$trend))
ybreak=pretty(range(t$born))

t$trend=predict(lm(born~date,subset(t,date>=xstart&date<as.Date("2020-1-1"))),t)

yearly=aggregate(t[,c("born","trend")],t[,1,drop=F],sum)
yearly=sprintf("%.1f%%",(yearly$born/yearly$trend-1)*100)

png("1.png",1850,1100,res=300)
par(mar=c(4.2,3,2.1,.8),mgp=c(0,.6,0),adj=0,lend="square")

tit="CDC WONDER: monthly births divided by number of days in month"
sub="Source: wonder.cdc.gov/natality.html. The gray numbers show the yearly
excess percent relative to the baseline."
leg=c("Births per month","2016-2019 linear trend")

plot(t$date,t$born,type="n",main=tit,xlab=NA,ylab=NA,xaxs="i",yaxs="i",yaxt="n",xaxt="n",ylim=ylim,xlim=c(xstart,xend),cex.main=1)
mtext(text=sub,side=1,line=2.7,adj=0)

axis(1,at=xbreak,labels=xlab,tck=0,padj=-.6)
axis(2,at=ybreak,labels=paste0(ybreak/1e3,"k"),las=1)
abline(v=seq(xstart,xend,"year"),col="gray90",lwd=1)

lines(t$date,t$born,lwd=1.5)
points(t$date,t$born,pch=20,cex=.6)
lines(t$date,t$trend,type="l",lty=2,lwd=1.5)
text(xbreak[c(F,T)],ylim[1],yearly,col="gray60",hjust=.5,offset=.4,pos=3,cex=.9)
rect(xstart,ylim[1],xend,ylim[2],lwd=1)
legend("topright",legend=leg,lty=c(1,2),lwd=1.5)

dev.off()

Cancer ASMR compared to historical ASMR starting from 1950

Ethical Skeptic posted this tweet: [https://x.com/EthicalSkeptic/status/1803267649385668791]

I've taken a look at the cancer long term Age Standardized Per Capita stats, aannd....

SV40 cancers only took 50 years to work their way out of the population. Just a blip on an epochal chart really.

Just a minor spike the last 3 years as well. Nothing to worry about.

Really. Honestly. For sure... 🤥

Based on the caption of the plot above, Ethical Skeptic seems to have taken data for 2020 onwards from CDC WONDER but data before 2020 from a dataset at Statista.com. The dataset at Statista.com was behind a paywall so I wasn't able to download it. The description of the dataset said: [https://www.statista.com/statistics/184566/deaths-by-cancer-in-the-us-since-1950/]

This statistic was assembled from several editions of "Health, United States".
* Malignant neoplasms.
Numbers are for malignant neoplasms. All rates are age-adjusted. Age-adjusted rates are calculated using the year 2000 standard population. Prior to 2001, age-adjusted rates were calculated using standard million proportions based on rounded population numbers. Starting with 2001 data, unrounded population numbers are used to calculate age-adjusted rates.

In Ethical Skeptic's plot the ASMR decreased until 2021, but the ASMR increased from 2021 to 2022, from 2022 to 2023, and from 2023 to 2024. However in the plot below where I used data from CDC WONDER, I wasn't able to reproduce the increase in ASMR since 2021, because got lower ASMR in 2022 and 2023 than 2021 regardless of whether I looked at only malignant neoplasms or all neoplasms:

library(data.table);library(ggplot2)

t=fread("http://sars2.net/f/wondercanceryearly.csv")[,cause:=sub("^UCD ","",cause)]
pop=fread("http://sars2.net/f/uspopdead.csv")[,dead:=NULL]
t=merge(t,pop)
t=merge(pop[year==2020,.(age,std=pop/sum(pop))],t)

p=t[year%in%2010:2023,.(asmr=sum(dead/pop*std)*1e5),.(cause,year)]
p[,cause:=factor(cause,sort(unique(p$cause),T))]

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymin=min0(p$asmr);ymax=max0(p$asmr)
ystep=cand[which.min(abs(cand-(ymax-ymin)/6))]
ystart=ystep*floor(ymin/ystep)
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

xstart=2010;xend=2023
xbreak=seq(xstart-.5,xend+.5,.5)
xlab=c(rbind("",xstart:xend),"")

sub="The age-standardized mortality rates were calculated by single year of age so that the
2020 resident population estimates were used as the standard population. Mid-year
resident population estimates are from www2.census.gov/programs-surveys/popest/
datasets/{2010-2020/national/asrh/nc-est2020-agesex-res.csv,2020-2023/national/
asrh/nc-est2023-agesex-res.csv}. In order to get rid of a sudden jump in the
population estimates at the point when the estimates were merged, the 2010-2020
estimates were multiplied by a linear slope so that they were equal to the
2020-2023 estimates in 2020. The data was retrieved from CDC WONDER in October
2024, when only a small number of deaths were still missing in 2023 because of
a registration delay."

ggplot(p,aes(x=year,y=asmr,color=cause))+
geom_vline(xintercept=seq(xstart-.5,xend+.5,5),linewidth=.3,color="gray80")+
geom_vline(xintercept=c(xstart-.5,xend+.5),linewidth=.3,lineend="square")+
geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+
geom_line(linewidth=.3)+
geom_point(size=.4)+
labs(x=NULL,y="Deaths per 100,000 people",title="CDC WONDER: ASMR for underlying cause of death cancer",subtitle=sub)+
scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=c("black","gray60"))+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=6.5,color="black"),
  axis.ticks=element_line(linewidth=.3),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks.length=unit(3,"pt"),
  axis.title=element_text(size=7),
  legend.background=element_blank(),
  legend.box.background=element_rect(fill="white",color="black",linewidth=.3),
  legend.box.just="left",
  legend.justification=c(1,1),
  legend.key.width=unit(15,"pt"),
  legend.key.height=unit(11,"pt"),
  legend.key=element_rect(fill=alpha("white",0)),
  legend.margin=margin(-2,5,4,5),
  legend.position=c(1,1),
  legend.spacing.x=unit(.15,"lines"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  panel.grid=element_blank(),
  plot.margin=margin(5,5,5,5),
  plot.subtitle=element_text(size=6.6,margin=margin(,,4)),
  plot.title=element_text(size=7.4,face=2,margin=margin(2,,4)))
ggsave("1.png",width=4,height=3.4,dpi=400*4)
system("magick 1.png -resize 25% 1.png")

In the plot above I used population estimates that I downloaded from the website of the US Census Bureau, because CDC WONDER doesn't return population estimates for ages 85 and above and it still uses 2022 population estimates for 2023. I used vintage 2023 estimates for 2020-2023 and vintage 2020 estimates for 2010-2019.

However in the next plot I used population estimates returned by CDC WONDER and I calculated ASMR by 10-year age groups and not by single year of age. CDC WONDER doesn't return population sizes for the oldest age groups when the results are grouped by 5-year age groups or by single year of age, but when the results are grouped by 10-year age groups then the oldest age group is 85+ which does have population sizes available. So therefore Ethical Skeptic might have calculated the ASMR by 10-year age groups in case he simply relied on the population sizes returned by CDC WONDER.

As of October 2024, CDC WONDER still used 2022 population estimates for 2023, which overestimates the CMR of elderly age groups in 2023 because the population sizes of elderly age groups were higher in 2023 than 2022, and therefore it also overestimates the total ASMR in 2023.

CDC WONDER also uses vintage 2021 population estimates for 2021, vintage 2022 estimates for 2022, and so on.

However even when I used the population sizes returned by CDC WONDER, I still wasn't able to reproduce the plot by Ethical Skeptic, because I got lower ASMR in 2022 and 2023 than in 2021 regardless of whether I looked at UCD malignant neoplasms or UCD neoplasms:

library(data.table);library(ggplot2);library(stringr)

t=fread("http://sars2.net/f/nzpopdead.csv")
a=t[,.(dead=sum(dead),pop=sum(pop,na.rm=T)),.(age=pmin(age,95),year)]

nzcmr=fread("http://sars2.net/f/nzcmr.csv")
p=nzcmr[year>=1970,.(x=year,y=cmr,z="Actual CMR")]

lm=a[year%in%2010:2019,.(year=2010:2023,trend=predict(lm(dead/pop~year),.(year=2010:2023))),age]
p=rbind(p,merge(a,lm)[,.(dead=sum(trend*pop),pop=sum(pop)),year][,.(y=dead/pop*1e5,z="2010-2019 trend in CMR by single year of age multiplied by population size"),.(x=year)])

p=rbind(p,nzcmr[year%in%1970:2019,.(x=1970:2023,y=predict(lm(cmr~year),.(year=1970:2023)),z="1970-2019 linear trend in CMR (like Ethical Skeptic's B baseline)")])

owid=a[,.(dead=sum(dead)),year][year%in%2015:2019,.(year=2010:2023,owid=predict(lm(dead~year),.(year=2010:2023)))]
p=rbind(p,merge(a[,.(pop=sum(pop)),year],owid)[,.(x=year,y=owid/pop*1e5,z="2015-2019 linear trend in deaths (like OWID)")])

p$z=factor(p$z,unique(p$z))

xstart=1970;xend=2023

p=p[x>=xstart&x<=xend]

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymin=min(p$y,na.rm=T);ymax=max(p$y,na.rm=T)
ystep=cand[which.min(abs(cand-(ymax-ymin)/6))]
yend=ymax
ystart=ystep*floor(ymin/ystep)
ybreak=seq(ystart,yend,ystep)
ystart=ymin

color=c("black","#00bb00","#0055ff","#ee0000",hsv(30/36,1,.8))

cap="    Source: infoshare.stats.govt.nz."
cap=paste0(cap,"\n    ","In the green baseline, for example the CMR in 2023 was calculated by first calculating a trend in CMR for each age in 2010-2019, then the trend projected to 2023 was multiplied with the population size of each age in 2023 to get the expected deaths for each age group, and the expected deaths of all age groups were added together and then divided by the population size in 2023."|>str_wrap(90))
cap=paste0(cap,"\n    ","The red line was calculated by doing a linear regression for the raw number of deaths and by then dividing the projected trend with the population size to convert it to CMR."|>str_wrap(90))

ggplot(p,aes(x,y))+
geom_vline(xintercept=c(xstart-.5,xend+.5),linewidth=.3,lineend="square")+
geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+
geom_line(aes(color=z),linewidth=.4)+
geom_point(data=p[grep("Actual",z)],aes(color=z),size=.5,show.legend=F)+
labs(title="Crude mortality rate in New Zealand (deaths per 100,000 people)",subtitle=cap,x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=seq(1900,2100,5))+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks=element_line(linewidth=.3),
  axis.ticks.length=unit(3,"pt"),
  legend.direction="vertical",
  legend.background=element_rect(fill=alpha("white",.8)),
  legend.key.height=unit(9,"pt"),
  legend.key.width=unit(15,"pt"),
  legend.key=element_rect(fill="white"),
  legend.box.spacing=unit(2,"pt"),
  legend.margin=margin(,,3),
  legend.justification=c(0,1),
  legend.position="bottom",
  legend.spacing.x=unit(1.5,"pt"),
  legend.text=element_text(size=7),
  legend.title=element_blank(),
  panel.grid.major=element_blank(),
  panel.background=element_rect(fill="white"),
  plot.margin=margin(3,5,4,4),
  plot.subtitle=element_text(size=6.7,margin=margin(0,0,4,0)),
  plot.title=element_text(size=7.5,face="bold",margin=margin(2,0,4,0)))
ggsave("1.png",width=4.2,height=3.7,dpi=400*4)
system("magick 1.png -resize 25% 1.png")

I don't know if Ethical Skeptic applied some of his usual adjustments to his plot, like his "excess MCD normalization" trick where he adds in a gradually increasing proportion of MCD deaths to his plot in addition to UCD deaths. Or maybe he included a part of deaths where the cause has not yet been resolved under cancer deaths. Or maybe he added in deaths that he estimated to be missing because of a registration delay, because he got even higher ASMR in 2024 than 2023.

Baseline that started from the 1970s for CMR in New Zealand

One of the terms invented by Ethical Skeptic is a "paltered" baseline, which means that 2020 or later years are included in the fitting period of a baseline. The baseline that OWID uses to calculate excess deaths is fitted against data from 2015 to 2019. [https://ourworldindata.org/excess-mortality-covid#how-is-excess-mortality-measured] However for some reason in this tweet Ethical Skeptic said that the baseline used by OWID was "paltered" even though it doesn't fit his usual definition of what the term paltered means: [https://twitter.com/EthicalSkeptic/status/1720267405883027598]

For some reason in the plot above Ethical Skeptic calculated CMR for segments of 5 years grouped together. The A baseline roughly looks like it's fitted against the last 5-year segment displayed in the plot (or it's hard to tell if the last segment is actually a 4-year segment like in my plot below). But OWID's baseline is not even calculated based on CMR but based on the raw number of deaths. His plot would've been more clear if it would've showed the CMR for each year and not for the 5-year blocks, and if he would've calculated the 2015-2019 linear trend in deaths based on the raw number of deaths and then converted it to CMR by dividing it by the population size, so then it would've been closer to the baseline that is actually used by OWID.

I wasn't able to reproduce Ethical Skeptic's plot perfectly, but I think he used a method similar to this to calculate CMR for the 5-year blocks:

library(data.table);library(ggplot2)

t=fread("http://sars2.net/f/nzcmr.csv")
p=t[,.(x=year,y=cmr,z="Actual CMR")]
v=t[,.(y=mean(cmr),z="Average of year and next 4 years"),.(x=(year)%/%5*5)]
lin1=p[,.(x,y=v[x>=1970,predict(lm(y~x),p)],z="Linear regression of blocks for 1970-2023")]
lin2=p[,.(x,y=v[x>=1970&x<=2015,predict(lm(y~x),p)],z="Linear regression of blocks for 1970-2019")]
p=rbind(p,v,lin1[x<=2020],lin2[x<=2020])
p$y=100*p$y

xstart=1969;xend=2023
p=p[x>=xstart&x<=xend]

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymax=max(p$y,na.rm=T);ymin=min(p$y,na.rm=T)
ystep=cand[which.min(abs(cand-(ymax-ymin)/6))]
yend=ystep*ceiling(ymax/ystep)
yend=ymax
ystart=ystep*floor(ymin/ystep)
ybreak=seq(ystart,yend,ystep)
ystart=ymin

color=c("black","blue","#77f","#ccf")

ggplot(p,aes(x,y))+
geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+
geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+
geom_line(aes(color=z,linetype=z),linewidth=.4)+
geom_point(data=p[!grep("Linear",z)],aes(color=z),size=.5,show.legend=F)+
labs(title="Crude mortality rate in New Zealand (deaths per 100,000 people)",subtitle=paste0("Source: infoshare.stats.govt.nz/Default.aspx
(\"Population\" > \"Death Rates - DMM\" > \"Crude death rate (Maori and total population) (Annual-Dec)\"). The linear regression was performed based on the 5-year blocks so that the x value for each block was the same year where the block is plotted here. The point for 2020 is an average of the CMR in 2020-2023. The point for 2020 is included in the darker blue dashed line but not in the lighter blue dashed line.")|>stringr::str_wrap(88),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=seq(1900,2100,5))+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak,labels=\(x)ifelse(x>=1e3,paste0(x/1e3,"k"),x))+
scale_color_manual(values=color)+
scale_linetype_manual(values=c(1,1,2,2))+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks=element_line(linewidth=.3),
  axis.ticks.length=unit(3,"pt"),
  legend.direction="vertical",
  legend.background=element_rect(fill=alpha("white",.8),color="black",linewidth=.3),
  legend.key.height=unit(9,"pt"),
  legend.key.width=unit(15,"pt"),
  legend.key=element_rect(fill="white"),
  legend.margin=margin(-2,5,3,4),
  legend.justification=c(1,1),
  legend.position=c(1,1),
  legend.spacing.x=unit(1.5,"pt"),
  legend.text=element_text(size=7),
  legend.title=element_blank(),
  panel.grid.major=element_blank(),
  panel.background=element_rect(fill="white"),
  plot.margin=margin(3,5,4,4),
  plot.subtitle=element_text(size=6.7,margin=margin(0,0,4,0)),
  plot.title=element_text(size=7.5,face="bold",margin=margin(2,0,4,0)))
ggsave("0.png",width=4.2,height=2.9,dpi=400*4)
system("magick 0.png -resize 25% 1.png")

In Ethical Skeptic's plot the B baseline looks closer to my lighter blue dashed line which was fitted against the 5-year blocks for 1970-2023 than my darker blue dashed line which was fitted against the 5-year blocks for 1970-2019, or maybe the B baseline somewhere in between my two baselines. I don't know if he just drew the B baseline by hand, but it doesn't look steep enough to be fitted against data from 1970-2019 but it looks more like it's fitted against data up to the present in his plot. (Which is not really an important point, but it would be ironic if he was simultaneously guilty of paltering his own baseline while he falsely accused OWID of paltering.)

For some reason the y-axis of Ethical Skeptic's plot says "deaths per 1000 people" but the values on the y-axis are over 6,000, so I don't know if it's supposed to mean deaths per million people.

But in any case, the age structure of the New Zealand population has completely changed since the 1970s, so it doesn't make any sense to just a linear baseline for CMR that starts from the 70s, because the actual trend in CMR is curved due to the changing age structure. Here I only had population data that went back to 1992, but the change since the 70s would've been even more dramatic:

library(data.table)

agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F)

t=fread("http://sars2.net/f/nzpopdead.csv")

m=t[,.(pop=sum(pop,na.rm=T)),.(age=agecut(age,0:9*10),year)][year>=1992,xtabs(pop~age+year)]
m=t(t(m)/colSums(m))*100

maxcolor=max(m)

disp=m;disp[]=sprintf("%.*f",ifelse(m>10,0,1),m)

pheatmap::pheatmap(m,filename="1.png",display_numbers=disp,
  cluster_rows=F,cluster_cols=F,legend=F,cellwidth=14,cellheight=14,fontsize=9,fontsize_number=8,
  border_color="#cccccc",number_color="black","white")

system("w=`identify -format %w 1.png`;pad=20;magick -pointsize 42 -font Arial \\( -size $[w-pad] caption:'New Zealand resident population estimates: Percentage of each age group' -splice 20x24 \\) 1.png -append 1.png")

Ethical Skeptic said that these were signs of a data charlatan, so I don't know if he was speaking about himself: [https://x.com/EthicalSkeptic/status/1758546649146593711]

Out of the three baselines in the next plot, I think the green baseline is the most accurate but it's far above the actual CMR in 2020:

library(data.table);library(ggplot2);library(stringr)

t=fread("http://sars2.net/f/nzpopdead.csv")
a=t[,.(dead=sum(dead),pop=sum(pop,na.rm=T)),.(age=pmin(age,95),year)]

nzcmr=fread("http://sars2.net/f/nzcmr.csv")
p=nzcmr[year>=1970,.(x=year,y=cmr,z="Actual CMR")]

lm=a[year%in%2010:2019,.(year=2010:2023,trend=predict(lm(dead/pop~year),.(year=2010:2023))),age]
p=rbind(p,merge(a,lm)[,.(dead=sum(trend*pop),pop=sum(pop)),year][,.(y=dead/pop*1e5,z="2010-2019 trend in CMR by single year of age multiplied by population size"),.(x=year)])

#lm=a[year%in%2010:2019,.(year=1970:2023,lm=predict(lm(dead/pop~poly(year,2)),.(year=1970:2023))),age]
#p=rbind(p,merge(a,lm)[,.(dead=sum(lm*pop),pop=sum(pop)),year][,.(y=dead/pop*1e5,z="1970-2019 quadratic trend in CMR for 5-year age groups"),.(x=year)])

p=rbind(p,nzcmr[year%in%1970:2019,.(x=1970:2023,y=predict(lm(cmr~year),.(year=1970:2023)),z="1970-2019 linear trend in CMR (like Ethical Skeptic's B baseline)")])

owid=a[,.(dead=sum(dead)),year][year%in%2015:2019,.(year=2010:2023,owid=predict(lm(dead~year),.(year=2010:2023)))]
p=rbind(p,merge(a[,.(pop=sum(pop)),year],owid)[,.(x=year,y=owid/pop*1e5,z="2015-2019 linear trend in deaths (like OWID)")])

# p=a[year>=1991,.(y=sum(dead)/sum(pop)*1e5,z="Actual CMR"),.(x=year)]

# p=rbind(p,merge(a,a[year==2020,.(age,std=pop/sum(pop))])[year>=1992,.(y=sum(dead/pop*std*1e5),z="ASMR using 2020 population"),.(x=year)])

p$z=factor(p$z,unique(p$z))

xstart=1970;xend=2023

p=p[x>=xstart&x<=xend]

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymin=min(p$y,na.rm=T);ymax=max(p$y,na.rm=T)
ystep=cand[which.min(abs(cand-(ymax-ymin)/6))]
# yend=ystep*ceiling(ymax/ystep)
yend=ymax
ystart=ystep*floor(ymin/ystep)
ybreak=seq(ystart,yend,ystep)
ystart=ymin

color=c("black","#00bb00","#0055ff","#ee0000",hsv(30/36,1,.8))

cap="    Source: infoshare.stats.govt.nz/Default.aspx."
cap=paste0(cap,"\n    ","In the green baseline, for example the CMR in 2023 was calculated by first calculating a trend in CMR for each age in 2010-2019, then the trend projected to 2023 was multiplied with the population size of each age in 2023 to get the expected deaths for each age group, and the expected deaths of all age groups were added together and then divided by the population size in 2023."|>str_wrap(90))
cap=paste0(cap,"\n    ","The red line was calculated by doing a linear regression for the raw number of deaths and by then dividing the projected trend with the population size to convert it to CMR."|>str_wrap(90))

ggplot(p,aes(x,y))+
geom_vline(xintercept=c(xstart,xend),linewidth=.3,lineend="square")+
geom_hline(yintercept=c(ystart,yend),linewidth=.3,lineend="square")+
geom_line(aes(color=z),linewidth=.4)+
geom_point(data=p[grep("Actual",z)],aes(color=z),size=.5,show.legend=F)+
labs(title="Crude mortality rate in New Zealand (deaths per 100,000 people)",subtitle=cap,x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=seq(1900,2100,5))+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks=element_line(linewidth=.3),
  axis.ticks.length=unit(3,"pt"),
  legend.direction="vertical",
  legend.background=element_rect(fill=alpha("white",.8)),
  legend.key.height=unit(9,"pt"),
  legend.key.width=unit(15,"pt"),
  legend.key=element_rect(fill="white"),
  legend.box.spacing=unit(2,"pt"),
  legend.margin=margin(,,3),
  legend.justification=c(0,1),
  legend.position="bottom",
  legend.spacing.x=unit(1.5,"pt"),
  legend.text=element_text(size=7),
  legend.title=element_blank(),
  panel.grid.major=element_blank(),
  panel.background=element_rect(fill="white"),
  plot.margin=margin(3,5,4,4),
  plot.subtitle=element_text(size=6.7,margin=margin(0,0,4,0)),
  plot.title=element_text(size=7.5,face="bold",margin=margin(2,0,4,0)))
ggsave("0.png",width=4.2,height=3.7,dpi=400*4)
system("magick 0.png -resize 25% 1.png")

And also if you look at ASMR instead of CMR, then there was clearly negative excess mortality in 2020 even relative to a long-term trend like my 1992 to 2019 quadratic trend in this plot:

library(data.table)

t=fread("http://sars2.net/f/nzpopdead.csv")
a=t[,.(dead=sum(dead),pop=sum(pop,na.rm=T)),.(age=pmin(95,age),year)]
p=merge(a,a[year==2020,.(age,std=pop/sum(pop))])[year>=1992,.(asmr=sum(dead/pop*std*1e5)),year]
p$trend=p[year%in%2012:2019,predict(lm(asmr~year),.(year=p$year))]
p$poly=p[year%in%1992:2019,predict(lm(asmr~poly(year,2)),.(year=p$year))]

png("1.png",2000,1200,res=300)
par(mar=c(2.2,3,2.1,.8),mgp=c(0,.6,0),las=1)
tit="Age-standardized mortality rate in New Zealand"
leg=c("Actual ASMR","2012-2019 linear trend","1992-2019 quadratic trend")
col=c("black","#0000ff","#00aa00")
plot(p$year,p$asmr,type="l",main=tit,xlab=NA,ylab=NA,ylim=range(p[,-1]),lwd=1.5,xaxs="i")
lines(p$year,p$trend,type="l",col=col[2],lwd=1.5)
lines(p$year,p$poly,type="l",col=col[3],lwd=1.5)
legend("topright",legend=leg,col=col,lty=1,lwd=1.5)
dev.off()

Deaths from malignant neoplasms in ages 0 to 54

In this plot that shows deaths with MCD malignant neoplasms but not MCD COVID, there's has been a steady increase in excess deaths since 2020 or 2021: [https://x.com/EthicalSkeptic/status/1834266396714246408]

At first I thought the steady increase might have been if Ethical Skeptic calculated a linear trend for his baseline using only data from 2018 and 2019, which might have caused his baseline to point too much downwards because there was an unusually low number of deaths in 2019 if you look at deaths from all causes in all ages. However in the case of this data for cancer deaths in ages 0-54, the number of deaths in 2019 fell close to a longer-term prepandemic trend, so a 2018 to 2019 linear trend happens to be an unusually good approximation of the longer-term trend before COVID.

The plot below shows raw deaths but I used a method to calculate the baseline which accounted for the changing age composition of the population, but my baseline was still roughly linear and it didn't have any major inflection point, so it was similar to a simple baseline that would've been produced by fitting a linear trend against raw deaths. In many cases a prepandemic linear trend fitted against raw deaths is inaccurate, but that doesn't seem to be the case here:

library(data.table);library(ggplot2);library(lubridate)

cul=\(x,y)y[cut(x,c(y,Inf),,T,F)]

t=fread("http://sars2.net/f/wondermalignant5year.csv")[cause%like%"Multiple",.(age,date=month,dead)]
pop=fread("http://sars2.net/f/uspopdeadmonthly.csv")
t=merge(t,pop[,.(pop=sum(pop)),.(date,age=cul(age,c(0,1,1:20*5)))],all=T)

t=t[,date:=as.Date(paste0(date,"-1"))]
t[,dead:=nafill(dead/days_in_month(date),,0)]
t=t[year(date)%in%2011:2023&age<55]

months=sort(unique(t$date))
t=merge(t,t[year(date)<2020,.(date=months,base=predict(lm(dead/pop~date),.(date=months))),age],by=names(t)[1:2])
t[,base:=base*pop]

p=t[,.(base=sum(base),dead=sum(dead)),date]

and=fread("http://sars2.net/f/wondermalignantmcdandcovidmcd.csv")[age=="under55"&date!="2024-09"]
and=and[,date:=as.Date(paste0(date,"-1"))][,.(dead=dead/days_in_month(date))]
p[date%in%and$date,dead:=dead-and$dead]

xstart=as.Date("2011-1-1");xend=as.Date("2024-1-1")
p=p[date>=xstart&date<=xend]
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",2011:2023),"")

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymin=p[,min(base,dead,na.rm=T)];ymax=p[,max(base,dead,na.rm=T)]
ystep=cand[which.min(abs(cand-(ymax-ymin)/7))]
ystart=ystep*floor(ymin/ystep)
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

color=c("black","gray60")

ggplot(p,aes(x=date+14,y=dead))+
geom_vline(xintercept=seq(xstart,xend,"year"),color="gray90",linewidth=.25)+
geom_hline(yintercept=c(ystart,yend),linewidth=.25,lineend="square")+
geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+
geom_line(aes(y=base),color="gray60",linewidth=.3)+
geom_line(linewidth=.3)+
geom_point(stroke=0,size=.6)+
labs(title="CDC WONDER, ages 0 to 54, MCD malignant neoplasms but not MCD COVID: Monthly deaths divided by number of days in month"|>stringr::str_wrap(68),subtitle="In order to calculate the gray line which shows the expected number of deaths, first a linear trend in CMR was calculated for each 5-year age group in 2011-2019, and then the population size of each age group was multiplied by the value of the projected trend."|>stringr::str_wrap(94),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks=element_line(linewidth=.25,color="black"),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.position="none",
  panel.background=element_blank(),
  plot.margin=margin(5,5,5,5),
  plot.subtitle=element_text(size=7,margin=margin(,,4)),
  plot.title=element_text(size=7.4,face=2,margin=margin(1,,4)))
ggsave("1.png",width=4.3,height=2.8,dpi=450)

In the plot above I also got a steady increase in excess deaths which started around 2020 or 2021, so the increase in excess deaths doesn't just seem to be an artifact produced by an inappropriate baseline. However the inflection point in deaths seems to be around late 2020 or early 2021 so I don't know if it can be blamed on the vaccines, because cancers normally take time to develop and there weren't yet that many young people who had been vaccinated during the first quarter of 2021.

When I looked at all ages instead of only ages 0 to 54, there was not such a clear increase in deaths above the baseline in 2022-2023 (even though I guess Ethical Skeptic might argue it was because I didn't adjust sufficiently for the PFE in my baseline, because my baseline only had a slight downwards adjustment due to the reduction in population size caused by excess deaths since 2020):

In the plot above the baseline was not adjusted for seasonal variation in mortality, but the reason why my baseline has the wavy pattern is that the population sizes of elderly age groups are slightly higher in the winter. I calculated the baseline using monthly resident population estimates published by the US Census Bureau.

When Ed Dowd's group published a paper about deaths from neoplasms in ages 15 to 44 which used a 2010-2019 linear baseline for CMR, I pointed out that the 1999-2019 trend in the mortality rate seems to have been curved regardless of whether you look at CMR or ASMR, so the 2010-2019 linear baseline might have been too steep and a flattening of the trend might have been expected during COVID. However that doesn't seem to be the case for deaths from malignant neoplasms in ages 0 to 54, where the 1999-2019 quadratic trend in ASMR is close to linear:

library(data.table);library(ggplot2);library(lubridate)

pop1=fread("https://www2.census.gov/programs-surveys/popest/datasets/2000-2010/intercensal/national/us-est00int-alldata.csv")
pop1=pop1[MONTH==7&AGE!=999&YEAR<2010,.(age=AGE,year=YEAR,pop=TOT_POP)]
pop2=fread("https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/asrh/nc-est2020-agesex-res.csv")
pop2=pop2[SEX==0&AGE!=999][,.(age=AGE,year=rep(2010:2019,each=.N),pop=unlist(.SD[,5:14],,F))]
pop3=fread("https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/national/asrh/nc-est2023-agesex-res.csv")
pop3=pop3[SEX==0&AGE!=999][,.(age=AGE,year=rep(2020:2023,each=.N),pop=unlist(.SD[,4:7],,F))]
pop=rbind(pop1,pop2,pop3)

t=fread("http://sars2.net/f/wondercanceryearly.csv")[year<2024]
t=merge(t,pop)
t=merge(pop[year==2020,.(age,std=pop/sum(pop))],t)

t1=cbind(t[(cause%like%"Neo"&age%in%15:44)],group="Neoplasms (C00-D48), ages 15 to 44")
t2=cbind(t[(cause%like%"Malig"&age<55)],group="Malignant neoplasms (C00-C97), ages 0 to 54")
t=rbind(t1,t2)[,group:=factor(group,unique(group))]

p=t[,.(y=sum(dead/pop*std*1e5),z="Actual ASMR"),.(year,group)]
p1=p[year%in%2011:2019,.(year=2000:2023,z="2011-2019 linear trend",y=predict(lm(y~year),.(year=2000:2023))),group]
p2=p[year<2020,.(year=2000:2023,z="2000-2019 quadratic trend",y=predict(lm(y~poly(year,2)),.(year=2000:2023))),group]
p=rbind(p,p1,p2)[,z:=factor(z,unique(z))]

xstart=2000;xend=2023;xbreak=seq(xstart-.5,xend+.5,.5);xlab=c(rbind("",xstart:xend),"")

color=c("black","#0022cc","#009900")

ggplot(p,aes(x=year,y=y))+
facet_wrap(~group,ncol=1,dir="v",scales="free")+
geom_label(data=p[,max(y),group],aes(label=paste0("\n   ",group,"   \n"),y=V1),x=xend+.5,lineheight=.5,hjust=1,vjust=1,size=grid::convertUnit(unit(7,"pt"),"mm"),fill=alpha("white",.8),label.r=unit(0,"lines"),label.padding=unit(0,"lines"),label.size=.3)+
geom_line(aes(color=z),linewidth=.3)+
geom_point(aes(color=z),data=p[z=="Actual ASMR"],stroke=0,size=.8,show.legend=F)+
geom_point(data=p[,max(y)/p[,max(y)/min(y),group][,max(V1)],group],aes(y=V1,x=xstart),stroke=0,size=0)+
labs(title="CDC WONDER: ASMR per 100,000 people for underlying cause of death cancer",caption="ASMR was calculated by single year of age so that the 2020 resident population estimates were used as the standard population."|>stringr::str_wrap(90),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=xbreak,labels=xlab)+
scale_color_manual(values=color)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.text.x=element_text(angle=90,vjust=.5,hjust=1.2),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks=element_line(linewidth=.25,color="black"),
  axis.ticks.length=unit(3,"pt"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.box.spacing=unit(0,"pt"),
  legend.direction="horizontal",
  legend.justification="right",
  legend.key=element_rect(fill=alpha("white",0)),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.margin=margin(,,3),
  legend.position="top",
  legend.spacing.x=unit(1.5,"pt"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  panel.border=element_rect(fill=NA,linewidth=.3),
  panel.spacing=unit(0,"pt"),
  plot.caption=element_text(hjust=0,size=7,margin=margin(2)),
  plot.margin=margin(4,5,4,5),
  plot.subtitle=element_text(size=7,margin=margin(,,4)),
  plot.title=element_text(size=7.4,face=2,margin=margin(1,,3)),
  strip.background=element_blank(),
  strip.text=element_blank())
ggsave("1.png",width=4.4,height=3.7,dpi=400*4)
system("magick 1.png -resize 25% 1.png")
system("qlmanage -p ~/1.png&>/dev/null")

So in conclusion I didn't find any major flaw in Ethical Skeptic's plot for cancer deaths in ages 0 to 54, and there seems to be a genuine inflection point in mortality around the year 2020 or 2021.

Deaths from malignant neoplasms with excess MCD normalization

Ethical Skeptic posted this plot for deaths from malignant neoplasms where he employed a trick he calls "excess-MCOD normalization". I believe it means that he plotted UCD deaths up to week 9 of 2022, after which he added in a gradually increasing proportion of MCD deaths to his plot, so that the proportion reached the maximum level on week 20 of 2023 after which it remained at the maximum level: [https://theethicalskeptic.com/2024/04/04/the-state-of-things-pandemic-week-50-2023/]


Chart 6 - Excess Cancer Mortality (USA) - stands at 5.9% or 747 persons per week over the last three weeks on average. More alarmingly however, is the novel compound annualized growth rate (CAGR) of 2.2%, a significant departure from the old growth rate of 0.23%, normalized from the years 2014-2019 (dark orange line). This chart is expertly adjusted for reporting lag (procedure here), Pull Forward Effect (345 week PFE method is depicted in Charts 3b and 6), and excess MCoD attributions (critical data mining method depicted in Chart 5 above). In other words it is showing the true cancer UCoD rate (shorted by pressure on physicians to not report cancers unless 'proven' by multiple tests). Despite all these shorts in the data collection, the raw (and misleading) UCoD data nonetheless shows a rise - which can be seen by clicking here.

He justified the use of his trick by showing that the ratio of MCD to UCD deaths for malignant neoplasms has remained elevated since 2020, so he assumes that some cancer deaths which should've been classified as UCD were classified as MCD instead:

The green-to-red line in Chart 5 below shows the CDC Wonder Excess Attributions of underlying cause of death Cancer to multiple cause of death Cancer, per week for the two years prior, as well as years-of and years-post the pandemic. The dotted baseline is normalized from the years 2018-2019, and indexed to the last 7 week average as compared to those same 7 weeks of 2018/19, so it is not a ‘regression’. Since this is a relative index, it should exhibit no trend or growth rate (outside the context of Covid mortality peak periods of course). The current excess of 321 deaths reflects cancer deaths which are concealed from the underlying cause of death ICD code, and must be added back into the mix in order to make Excess Cancer Mortality comparable to its past baseline. We reconcile this into Chart 6 below, applying it only after the pandemic period ended.


Chart 5 - Cancer Concealment by Excess MCOD Attribution - stands at 394 concealed cancer deaths per week. In this practice, cancers are moved from the underlying cause of death line in Part I of the death certificate, to one of the multiple cause of death lines above the UCoD line, in excess of past practice. Once Covid has abated, this balance should trend back to zero. It has not. Therefore, in order to capture the true cancer mortality increase, these cancer deaths falsely attributed to Covid and other underlying causes must be added back into the total in the Excess Cancer Mortality chart below. These missing deaths are phased back in from Week 10 of 2022 (point of large decline in Covid mortality) onward.

However the x-axis in the plot above started from 2018 so you couldn't see that there had been an increasing trend in the MCD to UCD ratio since approximately 2015. And in my next plot if you look at the light gray line where I subtracted deaths with MCD COVID from the MCD cancer deaths, the MCD to UCD ratio since 2020 was below the prepandemic trend on average, so basically all of the increase in the MCD to UCD ratio since 2020 can be explained by either a continuation of the prepandemic trend or by deaths with MCD COVID:

library(data.table);library(ggplot2)

t=fread("http://sars2.net/f/wondermalignant5year.csv")
t=t[,.(dead=sum(dead)),.(month,cause)]

and=fread("http://sars2.net/f/wondermalignantmcdandcovidmcd.csv")

p=t[grep("Multiple",cause),.(x=month,y=dead,z="MCD COVID not subtracted")]
p=rbind(p,merge(p,and[,.(x=month,and=dead)])[,.(x,y=y-and,z="MCD COVID subtracted")])
p=merge(p,t[grep("Under",cause),.(x=month,under=dead)])
p[,y:=y/under*100]

p=p[x!="2024-09"]
p[,x:=as.Date(paste0(x,"-1"))]

xstart=as.Date("1999-1-1");xend=as.Date("2025-1-1")
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",1999:2024),"")

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymin=p[,min(y,na.rm=T)];ymax=p[,max(y,na.rm=T)]
ystep=cand[which.min(abs(cand-(ymax-ymin)/7))]
ystart=ystep*floor(ymin/ystep)
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

color=c("black","gray60")

ggplot(p,aes(x=x+14,y=y,color=z))+
geom_vline(xintercept=seq(xstart,xend,"year"),color="gray90",linewidth=.25)+
geom_hline(yintercept=c(ystart,yend),linewidth=.25,lineend="square")+
geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+
geom_line(linewidth=.3)+
labs(title="CDC WONDER: Monthly deaths with cause C00-C97 (malignant neoplasms): MCD deaths as percentage of UCD deaths"|>stringr::str_wrap(70),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak,labels=\(x)paste0(x,"%"))+
scale_color_manual(values=color)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.text.x=element_text(angle=90,vjust=.5,hjust=1),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks=element_line(linewidth=.25,color="black"),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.box.just="left",
  legend.justification=c(0,1),
  legend.key=element_blank(),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.position=c(0,1),
  legend.spacing.x=unit(2,"pt"),
  legend.box.background=element_rect(fill="white",color="black",linewidth=.3),
  legend.margin=margin(-2,5,4,4,"pt"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  plot.margin=margin(5,5,5,5),
  plot.title=element_text(size=7.6,margin=margin(2,,4)))
ggsave("1.png",width=4.2,height=2.8,dpi=450)

In the plot below if you look at the black line for UCD deaths, it remained near the baseline in 2020 and 2021 but by 2023 it had risen clearly above the baseline. And similarly if you look at the light gray line for MCD deaths with COVID subtracted, it also remained close to the baseline in 2020 and 2021 but it rose clearly above the baseline in 2023. So there seems to be a genuine increase in cancer deaths above the trend in 2023 even if you only look at UCD deaths, even though it's much smaller than the increase Ethical Skeptic gets when he applies his excess MCD normalization and PFE adjustment tricks to the data:

library(data.table);library(ggplot2);library(lubridate)

cul=\(x,y)y[cut(x,c(y,Inf),,T,F)]

t=fread("http://sars2.net/f/wondermalignant5year.csv")

pop=fread("http://sars2.net/f/uspopdeadmonthly.csv")
t=merge(t,pop[,.(pop=sum(pop)),.(month=date,age=cul(age,c(0,1,1:20*5)))])

t=t[,month:=as.Date(paste0(month,"-1"))]
t[,dead:=dead/days_in_month(month)]

months=sort(unique(t$month))
base=t[!grepl("COVID",cause)&year(month)%in%2011:2019][,dead:=nafill(dead,,0)]
t=merge(base[,.(month=months,base=predict(lm(dead/pop~month),.(month=months))),.(age,cause)],t)
t[,base:=base*pop]

p=t[,.(base=sum(base),dead=sum(dead)),.(date=month,cause)]

and=fread("http://sars2.net/f/wondermalignantmcdandcovidmcd.csv")
and=and[,month:=as.Date(paste0(month,"-1"))][,.(date=month,andcovid=dead/days_in_month(month))]
p=rbind(p,merge(and,p[cause%like%"Multiple"])[,.(date,base=NA,cause="Multiple cause of death C00-C97 but not COVID",dead=dead-andcovid)])

xstart=as.Date("2015-1-1");xend=as.Date("2024-1-1")
p=p[date>=xstart&date<=xend]
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",2015:2023),"")

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymin=p[,min(base,dead,na.rm=T)];ymax=p[,max(base,dead,na.rm=T)]
ystep=cand[which.min(abs(cand-(ymax-ymin)/7))]
ystart=ystep*floor(ymin/ystep)
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

color=c("gray60","gray85","black")

ggplot(p,aes(x=date+14,y=dead,color=cause))+
geom_vline(xintercept=seq(xstart,xend,"year"),color="gray90",linewidth=.25)+
geom_hline(yintercept=c(ystart,yend),linewidth=.25,lineend="square")+
geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+
geom_line(linewidth=.3)+
geom_point(stroke=0,size=.6,show.legend=F)+
geom_line(aes(y=base),linetype=2,linewidth=.3)+
labs(title="CDC WONDER: Monthly deaths with cause C00-C97 (malignant neoplasms) divided by number of days in month"|>stringr::str_wrap(70),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks=element_line(linewidth=.25,color="black"),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.box.just="left",
  legend.justification=c(0,1),
  legend.key=element_blank(),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.position=c(0,1),
  legend.spacing.x=unit(2,"pt"),
  legend.box.background=element_rect(fill="white",color="black",linewidth=.3),
  legend.margin=margin(-2,5,4,4,"pt"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  plot.margin=margin(5,5,5,5),
  plot.title=element_text(size=7.6,face=2,margin=margin(2,,4)))
ggsave("1.png",width=4.3,height=2.8,dpi=450)

sub="\u00a0     The dashed line is a baseline for expected deaths, which was calculated by first doing a linear regression for CMR for each 5-year age group in 2011-2019, then multiplying the monthly population estimates of each age group by the value of the projected linear trend, and then adding together the results for each age group. The baseline is not adjusted for seasonal variation in mortality, but the baseline is slightly higher in winter because the population estimates for elderly age groups were higher in winters.
      Monthly resident population estimates by single year of age are from www2.census.gov/
programs-surveys/popest/datasets/2020-2022/national/asrh/nc-est2022-alldata-r-file0{1..8}.csv, and from the corresponding files in the 2010-2020 directory. In order to get rid of a sudden jump in population size after the switch from the 2010-2020 to 2020-2022 estimates, the 2010-2020 estimates were multiplied by a slope so that they matched the 2020-2022 estimates at the point where the estimates were merged."

system(paste0("f=1.png;mar=30;w=`identify -format %w $f`;magick \\( $f -chop 0x10 \\) \\( -size $[w-mar*2]x -font Arial -interline-spacing -3 -pointsize 42 caption:'",sub,"' -gravity southwest -splice $[mar]x20 \\) -append 1.png"))

However I don't think the method of excess MCD normalization is justified, because there was already an increasing trend in the MCD to UCD ratio before COVID, and because deaths with MCD COVID account for basically all of the increase in the MCD to UCD ratio above the prepandemic trend.

And if there would be some other cause of death which had a reduced MCD to UCD ratio since 2020 compared to the ratio before 2020, should people be applying a "negative excess MCD normalization" where they subtract a part of deaths when they plot UCD deaths?

I think Ethical Skeptic hasn't documented what precise calculation he used to determine the magnitude of the excess MCD normalization, or how he picked week 10 of 2022 as the date when he started applying the normalization or week 20 of 2023 as the date when he stopped increasing the amount of added deaths. If he didn't pick the variables programmatically but based on some kind of subjective criteria, it would be difficult for other people to reproduce his work systematically or so that they would be able to apply the same methodology to other causes of death.

In any case in his plot which has a line that shows the number of deaths with excess MCD normalization, he should've also included another line which would've showed the number of deaths without the normalization, so it would've been clearer to his followers what the magnitude of the normalization was and when it started.

(Ethical Skeptic uses the acronym "MCoD" or "MCOD" to refer to multiple cause of death but CDC WONDER uses the acronym "MCD", so I'm following the convention of CDC WONDER on this website.)

Files for population estimates by single year of age up to ages 100+

One annoying feature at CDC WONDER is that it doesn't return population sizes for ages 85 and above, even though the US Census Bureau has published population estimates by single year of age up to ages 100+ from 2010 onwards. The 2000-2010 population estimates by single year of age only go up to ages 85+ however, so in the following code I only included population estimates for 2010 and later years.

The code below downloads the 2010-2020 resident population estimates that are based on the 2010 census and the 2020-2023 resident population estimates that are based on the 2020 census: https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/asrh/, https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/national/asrh/.

In order to get rid of a sudden jump in the population estimates at the point when the new and old estimates are merged, the old estimates are multiplied by a linear slope so that they are the same in 2020 as the new estimates. For example for the age 80, the new population estimate for 2020 is 1,450,396 and the old estimate is 1,526,957 so their ratio is about 0.9499, so I'm multiplying the old estimate by about 1-(6/10)*(1-0.9499) for 2016, by about 1-(7/10)*(1-0.9499) for 2017, and so on. (I'm not sure if COVID deaths in the first half of 2020 are accounted for in the mid-year population estimates for 2020, but if they are then a slight problem with my approach is that COVID deaths are reflected in the slope by which the 2010-2019 estimates are multiplied.)

CDC WONDER uses vintage 2021 population estimates for 2021, vintage 2022 estimates for 2022, and so on, but here I'm using vintage 2023 population estimates for 2020 to 2023 and vintage 2020 estimates for 2010 to 2019.

I also added a column for the yearly number of deaths for each age at CDC WONDER as of October 2024.

new=fread("https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/national/asrh/nc-est2023-agesex-res.csv")
old=fread("https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/asrh/nc-est2020-agesex-res.csv")

old=old[SEX==0&AGE!=999,.(age=AGE,pop=unlist(.SD[,-(1:4)]),year=rep(2010:2020,each=.N))]
new=new[SEX==0&AGE!=999,.(age=AGE,pop=unlist(.SD[,-(1:3)]),year=rep(2020:2023,each=.N))]

me=merge(merge(old[year==2020,.(age,old=pop)],new[year==2020,.(age,new=pop)])[,.(age,ratio=new/old)],old)
pop=rbind(me[year!=2020,.(year,age,pop=round(pop*(1-(((year-2010)/10))*(1-ratio))))],new)

dead=fread("usyearlydead.csv") # from CDC WONDER
o=merge(pop,dead,all=T)
fwrite(o[order(year,age)],"uspopdead.csv",na="")

I uploaded the output here: f/uspopdead.csv. This is a similar file for monthly data: f/uspopdeadmonthly.csv. Both files are used in some of my scripts on this page.

This shows that particularly ages 90+ have a huge jump between the switch from the estimates based on the 2010 census to the estimates based on the 2020 census:

library(data.table);library(ggplot2)

agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F)

kim=\(x)ifelse(x>=1e3,ifelse(x>=1e6,paste0(x/1e6,"M"),paste0(x/1e3,"k")),x)

new=fread("https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/national/asrh/nc-est2023-agesex-res.csv")
old=fread("https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/asrh/nc-est2020-agesex-res.csv")

old=old[SEX==0&AGE!=999,.(age=AGE,pop=unlist(.SD[,-(1:4)]),year=rep(2010:2020,each=.N))]
new=new[SEX==0&AGE!=999,.(age=AGE,pop=unlist(.SD[,-(1:3)]),year=rep(2020:2023,each=.N))]

me=merge(merge(old[year==2020,.(age,old=pop)],new[year==2020,.(age,new=pop)])[,.(age,ratio=new/old)],old)
p=rbind(me[year!=2020,.(year,age,pop=round(pop*(1-(((year-2010)/10))*(1-ratio))))],new)
p$group="2011-2019 estimates multiplied by a linear slope"

p=rbind(p,rbind(old[year!=2020],new)[,group:="Unadjusted"])
p[,group:=factor(group,unique(group))]

p=p[,.(pop=sum(pop)),.(group,year,age=agecut(age,0:9*10))]

xstart=2010;xend=2023
p=p[year%in%xstart:xend]

levels(p$age)=p[year==2019,sprintf("%s (%.1f%%)",age,(pop[group!="Unadjusted"]/pop[group=="Unadjusted"]-1)*100)]

ggplot(p,aes(x=year,y=pop,color=group))+
coord_cartesian(clip="off",expand=F)+
facet_wrap(~age,ncol=2,dir="v",scales="free_y")+
geom_vline(xintercept=c(2014.5,2019.5),color="gray80",linewidth=.23)+
geom_line(linewidth=.3)+
geom_point(stroke=0,size=.7)+
labs(title="Mid-year resident population estimates by US Census Bureau",subtitle="The number after the age group shows the percentage change between the unadjusted and adjusted estimates for 2019."|>stringr::str_wrap(78),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=seq(xstart-.5,xend+.5,.5),labels=c(rbind("",xstart:xend),""))+
scale_y_continuous(labels=kim,breaks=\(x)pretty(x,3))+
scale_color_manual(values=c("gray60","black"))+
theme(axis.text=element_text(size=7,color="black"),
  axis.text.x=element_text(angle=90,vjust=.5,hjust=1),
  axis.ticks=element_line(linewidth=.2,color="black"),
  axis.ticks.length=unit(3,"pt"),
  axis.ticks.length.x=unit(0,"pt"),
  legend.background=element_blank(),
  legend.box.spacing=unit(0,"pt"),
  legend.justification="right",
  legend.key=element_blank(),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.margin=margin(,,-2),
  legend.position="top",
  legend.spacing.x=unit(2,"pt"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  panel.border=element_rect(fill=NA,linewidth=.23),
  panel.spacing=unit(-2,"pt"),
  panel.spacing.x=unit(3,"pt"),
  plot.margin=margin(5,5,5,5),
  plot.subtitle=element_text(size=7,margin=margin(,,3)),
  plot.title=element_text(size=7.4,face=2,margin=margin(1,,4)),
  strip.background=element_blank(),
  strip.text=element_text(size=7,vjust=-.3))
ggsave("1.png",width=4.4,height=4.4,dpi=350*4)
system("magick 1.png -resize 25% 1.png")

The decrease in the population size of ages 90+ might partially be due to COVID deaths in case COVID deaths are accounted for in the mid-year population estimates for 2020. But COVID deaths would probably only explain a small part of the decrease, because in the first half of 2020 CDC WONDER returned 21,600 deaths with UCD COVID in ages 90+, which is only about 0.9% of the total population size of ages 90+ based on the 2020 resident population estimates. In case the population estimates are calculated based on yearly data for deaths, it might also be that the mid-year population estimates for 2020 are affected by COVID deaths in the second half of 2020, but even in the whole of 2020, ages 90+ had 59,739 deaths with UCD COVID which was only about 2.5% of the total population size.

Cancer deaths by age group

Ethical Skeptic posted this tweet: [https://x.com/EthicalSkeptic/status/1841879475241300326]

Competence Question: If young age-bracket cancer mortality is up 20+%, while old age-bracket cancer mortality is down 3 to 6% due to PFE pool depletion, what is the underlying rate of cancer increase?

Answer: The young age bracket rate. Especially in the circumstance where the former young rate was negative (prior to inflection).

Beware of analysts who don't comprehend this.

In his plots which display weekly deaths starting from 2018, I believe Ethical Skeptic calculates some kind of a linear baseline based on data from 2018 and 2019 only, because he erroneously believes that CDC WONDER has different suppression behavior for the data before 2018 than the data from 2018 onwards. When I asked him that isn't a trend that is only fitted against data from 2018 and 2019 biased because there was a low number of deaths in 2019 and a high number of deaths in the first two month of 2018, he answered that the method he uses to fit the baseline is somehow different from a regular linear regression, but he didn't explain how: "However, I did not use a pure linear regression of 2018 and 19 for this reason. I fit the baseline to the variances linearly, but not by least-squares regression (how I avoided the early 2018 surge)." [https://theethicalskeptic.substack.com/p/the-state-of-things-pandemic-week-706/comment/53237226]

But in any case, fitting a linear trend based on raw deaths is inaccurate, and it's especially inaccurate when you're fitting the trend based on only two years of data. And it's especially inaccurate if you're plotting deaths within an age group 75 to 84, because its population size has increased dramatically since 2021 because the baby boomers started turning 75 in 2021.

In the next plot I believe the light gray baseline is somewhat similar to the method used by Ethical Skeptic, except he fits his baseline using weekly and not yearly data, and his method of regression might be somehow different from the regular linear regression I used here. But anyway I calculated the black baseline using a more accurate method, where for each single year of age I first calculated the linear trend in crude mortality rate in 2010 to 2019, and then I multiplied the value of the projected linear trend with the population size to get the expected deaths for each age. My method of calculating the baseline gave me close to 0% excess mortality even in ages 85+. But my plot also shows that the 2018 to 2019 linear trend is way too high in ages 85+, which probably explains why Ethical Skeptic got such low excess mortality in ages 85+:

library(data.table);library(ggplot2)

agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F)

kim=\(x)ifelse(x>=1e3,ifelse(x>=1e6,paste0(x/1e6,"M"),paste0(x/1e3,"k")),x)

t=fread("http://sars2.net/f/wondermalignantyearly.csv")
t=t[year%in%2010:2023&cause%like%"MCD"]
t=merge(t[!cause%like%"COVID"],t[cause%like%"COVID",.(year,age,and=dead)],all=T)

# Some ages below 45 have less than 10 deaths with both MCD cancer and
# MCD COVID per year, so the deaths are suppressed when the results
# are grouped by single year of age, but here I retrieved deaths for
# ages 0-44 aggregated together to avoid the supression.
t[age<45,and:=0][age==0&year>=2020,and:=c(311,495,477,130)]

t=t[,.(year,age,dead=dead-nafill(and,,0))]

t=merge(t,fread("http://sars2.net/f/uspopdead.csv")[,dead:=NULL])

t=merge(t,t[year%in%2010:2019,.(year=2010:2023,trend=predict(lm(dead/pop~year),.(year=2010:2023))),age])

p=t[,.(dead=sum(dead),base=sum(trend*pop)),.(year,group=agecut(age,c(0,45,55,65,75,85)))]

p=p[p[year%in%2018:2019,.(year=2010:2023,linear=predict(lm(dead~year),.(year=2010:2023))),group],on=c("group","year")]

minmax=p[,.(min=min(dead,base,linear),max=max(dead,base,linear)),group]
p=merge(minmax[,.(group,min=max/minmax[,max(max/min)])],p)

xstart=2010;xend=2023

p=p[,.(group,year,min,y=c(dead,base,linear),z=rep(c("Actual deaths","2010-2019 trend in CMR by age","2018-2019 linear trend for raw deaths"),each=.N))]
p[,z:=factor(z,unique(z))]

ggplot(p,aes(year,y))+
facet_wrap(~group,ncol=2,dir="v",scales="free_y",strip.position="top")+
geom_vline(xintercept=c(2014.5,2019.5),color="gray85",linewidth=.25)+
geom_line(aes(color=z,linetype=z),linewidth=.3)+
geom_point(data=p[z=="Actual deaths"],stroke=0,size=.6,show.legend=F)+
geom_point(aes(y=min),size=0,stroke=0)+
labs(title="CDC WONDER: Yearly deaths with MCD malignant neoplasms but not MCD COVID",x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=seq(xstart-.5,xend+.5,.5),labels=c(rbind("",xstart:xend),""))+
scale_y_continuous(labels=kim)+
scale_color_manual(values=c("black","black","gray60"))+
scale_linetype_manual(values=c(1,2,2))+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.text.x=element_text(angle=90,vjust=.5,hjust=1),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks=element_line(linewidth=.3,color="black"),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.box.spacing=unit(0,"pt"),
  legend.key=element_blank(),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.position="top",
  legend.margin=margin(2,,),
  legend.justification="left",
  legend.spacing.x=unit(2,"pt"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  panel.border=element_rect(fill=NA,linewidth=.3),
  panel.spacing=unit(0,"pt"),
  panel.spacing.x=unit(3,"pt"),
  plot.margin=margin(5,5,5,5),
  plot.title=element_text(size=7.6,face=2,margin=margin(2,,2)),
  strip.background=element_blank(),
  strip.text=element_text(size=7))
ggsave("1.png",width=5,height=4.1,dpi=380*4)
system("magick 1.png -resize 25% 1.png")

My method of calculating the baseline implements a weak adjustment for the pull-forward effect in elderly age groups, because when I'm multiplying the trend in CMR with the population size, the baseline deaths are reduced from 2020 onwards because the population size has been lowered by COVID deaths. But my adjustment is usually not nearly as strong as Ethical Skeptic's PFE adjustment. And my adjustment doesn't have much effect in young age groups, because their population size wouldn't be affected that much even if they had double the normal number of deaths.

In the next plot where I plotted ASMR instead of raw deaths and I used the linear trend in ASMR in 2010 to 2019 as the baseline, ages 85+ got positive excess ASMR each year from 2020 to 2023:

library(data.table);library(ggplot2)

agecut=\(x,y)cut(x,c(y,Inf),paste0(y,c(paste0("-",y[-1]-1),"+")),T,F)

t=fread("http://sars2.net/f/wondermalignantyearly.csv")
t=t[cause%like%"MCD"&year%in%2010:2023]
t=merge(t[!cause%like%"COVID"],t[cause%like%"COVID",.(year,age,and=dead)],all=T)
t[age<45,and:=0][age==0&year>=2020,and:=c(311,495,477,130)]
t=t[,.(dead=sum(dead-nafill(and,,0))),.(year,age)]

pop=fread("http://sars2.net/f/uspopdead.csv")[,dead:=NULL]
t=merge(pop[year==2020,.(age,std=pop/sum(pop))],merge(t,pop))

p=t[,.(asmr=sum(dead/pop*std*1e5)),.(year,group=agecut(age,c(0,45,55,65,75,85)))]
p=merge(p,p[year%in%2010:2019,.(year=2010:2023,trend=predict(lm(asmr~year),.(year=2010:2023))),group])

minmax=p[,.(max=max(asmr,trend),min=min(asmr,trend)),group]
p=merge(minmax[,.(group,min=max/minmax[,max(max/min)])],p)

xstart=2010;xend=2023

ggplot(p,aes(x=year,y=asmr))+
facet_wrap(~group,ncol=2,dir="v",scales="free_y")+
geom_vline(xintercept=c(2014.5,2019.5),color="gray80",linewidth=.25)+
geom_line(linewidth=.3)+
geom_line(aes(y=trend),linetype=2,linewidth=.3)+
geom_point(stroke=0,size=.6)+
geom_point(aes(y=min),size=0,stroke=0)+
labs(title="CDC WONDER: ASMR per 100k for deaths with MCD malignant neoplasms but not MCD COVID",subtitle="The dashed line is a linear trend fitted against the ASMR in 2010 to 2019. The ASMR was calculated by single year of age so that the 2020 resident population estimates were used as the standard population."|>stringr::str_wrap(104),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart-.5,xend+.5),breaks=seq(xstart-.5,xend+.5,.5),labels=c(rbind("",xstart:xend),""))+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.text.x=element_text(angle=90,vjust=.5,hjust=1),
  axis.ticks=element_line(linewidth=.3,color="black"),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.position="none",
  panel.background=element_blank(),
  panel.border=element_rect(fill=NA,linewidth=.3),
  panel.spacing=unit(0,"pt"),
  panel.spacing.x=unit(3,"pt"),
  plot.margin=margin(5,5,5,5),
  plot.subtitle=element_text(size=7,margin=margin()),
  plot.title=element_text(size=7.4,face=2,margin=margin(2,,4)),
  strip.background=element_blank(),
  strip.text=element_text(size=7))
ggsave("1.png",width=5,height=4.4,dpi=380*4)
system("magick 1.png -resize 25% 1.png")

So basically my plots show that if you calculate the baseline in a way that accounts for the changing population size of age groups, then you don't necessarily even need to adjust for the pull-forward effect.

Non-COVID deaths from natural causes in ages 0 to 24

Ethical Skeptic posted this tweet: [https://x.com/EthicalSkeptic/status/1846676664589148455]

Excess Non-Covid Natural Cause Mortality Ages 0-24

Elevated 30% or 8-sigma

We broke out of a 2023 plateau. This rise appears to be related to annual winter virus dynamics, perhaps indicating immune system impairment in our youngest population.

Not good news for anyone. 😞

Ethical Skeptic didn't exclude deaths under ICD chapter R from his plot, so some of his excess deaths in 2024 and to a less extent late 2023 might be due to deaths from external causes where the cause has not yet been resolved. His plot included the first 15 weeks of 2024. He posted the plot in October 2024 but I don't know if he retrieved the data earlier.

In the plot below where I plotted R deaths separately from A to Q deaths, there was a slight elevation in deaths under R codes during the first 3 to 4 months of 2024 but it was not that massive. But I wasn't able to reproduce Ethical Skeptic's steady increase in excess deaths that started in 2021, and my excess deaths peaked in December 2022 and not in early 2024 like in his plot:

library(data.table);library(ggplot2);library(lubridate)

cul=\(x,y)y[cut(x,c(y,Inf),,T,F)]

t=fread("http://sars2.net/f/wondernatural.csv")[age<25]

pop=fread("http://sars2.net/f/uspopdeadmonthly.csv")
t=merge(t,pop[,.(pop=sum(pop)),.(date,age=cul(age,c(0,1,5,15,25)))])

t=t[,date:=as.Date(paste0(date,"-1"))]
t[,dead:=dead/days_in_month(date)]

months=sort(unique(t$date))
base=t[cause=="A-Q"&year(date)%in%2011:2019][,dead:=nafill(dead,,0)]
t=merge(base[,.(date=months,base=predict(lm(dead/pop~date),.(date=months))),.(age,cause)],t,all=T)
t[,base:=base*pop]

p=t[,.(base=sum(base),dead=sum(dead)),.(date,cause)]
p[,cause:=factor(cause,unique(cause))]
levels(p$cause)=c("A-Q (natural causes)","R (unresolved cause; symptoms and signs; abnormal findings)")

xstart=as.Date("2015-1-1");xend=as.Date("2025-1-1")
p=p[date>=xstart&date<=xend]
xbreak=seq(xstart,xend,"6 month")
xlab=c(rbind("",2015:2024),"")

cand=c(sapply(c(1,2,5),\(x)x*10^c(-10:10)))
ymax=p[,max(base,dead,na.rm=T)]
ystep=cand[which.min(abs(cand-ymax/7))]
ystart=0
yend=ystep*ceiling(ymax/ystep)
ybreak=seq(ystart,yend,ystep)

color=c("black","gray60")

ggplot(p,aes(x=date+14,y=dead,color=cause))+
geom_vline(xintercept=seq(xstart,xend,"year"),color="gray90",linewidth=.25)+
geom_hline(yintercept=c(ystart,yend),linewidth=.25,lineend="square")+
geom_vline(xintercept=c(xstart,xend),linewidth=.25,lineend="square")+
geom_line(linewidth=.3)+
geom_point(stroke=0,size=.6,show.legend=F)+
geom_line(aes(y=base),linetype=2,linewidth=.3)+
labs(title="CDC WONDER, ages 0 to 24: Monthly deaths by underlying cause of death divided by number of days in month"|>stringr::str_wrap(70),x=NULL,y=NULL)+
scale_x_continuous(limits=c(xstart,xend),breaks=xbreak,labels=xlab)+
scale_y_continuous(limits=c(ystart,yend),breaks=ybreak)+
scale_color_manual(values=color)+
coord_cartesian(clip="off",expand=F)+
theme(axis.text=element_text(size=7,color="black"),
  axis.ticks.x=element_line(color=alpha("black",c(1,0))),
  axis.ticks=element_line(linewidth=.25,color="black"),
  axis.ticks.length=unit(.2,"lines"),
  axis.title=element_text(size=8),
  legend.background=element_blank(),
  legend.justification=c(0,.5),
  legend.key=element_blank(),
  legend.key.height=unit(10,"pt"),
  legend.key.width=unit(17,"pt"),
  legend.position=c(0,.5),
  legend.spacing.x=unit(2,"pt"),
  legend.box.background=element_rect(fill="white",color="black",linewidth=.3),
  legend.margin=margin(-2,5,4,4,"pt"),
  legend.text=element_text(size=7,vjust=.5),
  legend.title=element_blank(),
  panel.background=element_blank(),
  plot.margin=margin(5,5,5,5),
  plot.title=element_text(size=7.6,face=2,margin=margin(2,,4)))
ggsave("1.png",width=4.3,height=2.8,dpi=450)

sub="\u00a0     This plot does not include deaths with underlying cause V01-Y89 (external causes of morbidity and mortality) or U00-U99 (codes for special purposes; includes COVID).
      The dashed line is a baseline for expected deaths, which was calculated by first doing a linear regression for CMR for each age group in 2011-2019, then multiplying the monthly population estimates of each age group by the value of the projected linear trend, and then adding together the results for each age group.
      Monthly resident population estimates by age are from www2.census.gov/
programs-surveys/popest/datasets/2020-2022/national/asrh/nc-est2022-alldata-r-file0{1..8}.csv, and from the corresponding files in the 2010-2020 directory. In order to get rid of a sudden jump in population size after the switch from the 2010-2020 to 2020-2022 estimates, the 2010-2020 estimates were multiplied by a slope so that they matched the 2020-2022 estimates at the point where the estimates were merged. The population estimates for 2024 are probably not accurate because they were derived by a linear extrapolation of past population estimates."

system(paste0("f=1.png;mar=30;w=`identify -format %w $f`;magick \\( $f -chop 0x10 \\) \\( -size $[w-mar*2]x -font Arial -interline-spacing -3 -pointsize 42 caption:'",sub,"' -gravity southwest -splice $[mar]x20 \\) -append 1.png"))

I think Ethical Skeptic's heavy adjustment for the pull forward effect is not warranted in young age groups, because do ages 0 to 24 really have that many people who were on the verge of death so they would've died soon anyway if they hadn't died of COVID? There weren't even that many COVID deaths in ages 0 to 24.

In my plot above where I didn't use a PFE-adjusted baseline, there seems to have been a reduced number of deaths in late 2020, but after that deaths from natural causes remained close to the baseline until June 2022 after which they remained consistently above the baseline. So I would say the inflection rather occurred in 2022 and not in early 2021, even though I don't know how to explain the inflection.

But anyway, the increase above the baseline doesn't seem that impressive impressive in my plot where the monthly number of excess deaths peaks at only about 15% above the baseline.

When Ethical Skeptic fits a seasonality-adjusted baseline against only two years of data, it's easy for him to get close to zero excess deaths during the fitting period because the baseline adapts to whatever the number of deaths happened to be during the fitting period, so therefore he gets a low standard deviation for his residuals during the fitting period and he subsequently gets high sigma values after his fitting period. But he would've probably gotten lower sigma values if he would've employed a longer fitting period or if wouldn't have adjusted his baseline for seasonality.

In a similar plot Ethical Skeptic posted on July 8th 2023 UTC, he plotted deaths up to week ending June 17th 2023. So in his old plot he only excluded approximately the last 3 weeks of data instead of about 18 weeks like in the newer plot (assuming that he retrieved the data from CDC WONDER shortly before he tweeted the plots). So therefore his old plot had an even more R99 deaths at the end of the x-axis: [https://x.com/EthicalSkeptic/status/1677513127468974081]

But in his newer plot many of the R99 deaths in 2023 had now been assigned under external causes, so there was no longer the huge increase in excess deaths in 2023: