The Hungarian physician Dr. Ignaz Semmelweis worked with childbed fever patients at the Vienna General Hospital. Childbed fever is a deadly disease that affects women who have just given birth; in the early 1840s, as many as 10% of the women giving birth at the Vienna General Hospital died from it.

Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request: nobody in Vienna knew about bacteria at the time.

In this R Markdown document, we reanalyze the data that led Semmelweis to discover the importance of handwashing, and we measure its impact on the hospital.

Datasets

Inspecting Yearly Dataset

The yearly dataset contains the number of births and deaths at each of the two clinics of the Vienna General Hospital for the years 1841 to 1846.

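library(dplyr)    # provides %>%, mutate(), group_by() and summarise()
library(ggplot2)  # provides ggplot() and scale_x_date()
yearly = read.csv("yearly_deaths_by_clinic.csv", sep = ",", header = TRUE)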
head(yearly)
##   year births deaths   clinic
## 1 1841   3036    237 clinic 1
## 2 1842   3287    518 clinic 1
## 3 1843   3060    274 clinic 1
## 4 1844   3157    260 clinic 1
## 5 1845   3492    241 clinic 1
## 6 1846   4010    459 clinic 1

Inspecting Monthly Dataset

The monthly dataset contains the number of births and deaths per month for Clinic 1, the clinic where most deaths occurred.

monthly = read.csv("monthly_deaths.csv", sep = ",", header = TRUE)
head(monthly)
##         date births deaths
## 1 1841-01-01    254     37
## 2 1841-02-01    239     18
## 3 1841-03-01    277     12
## 4 1841-04-01    255      4
## 5 1841-05-01    255      2
## 6 1841-06-01    200     10

Calculating the Proportion of Deaths

Yearly Proportion Deaths

yearly = yearly %>%
  mutate(proportion_deaths = deaths/births)
head(yearly)
##   year births deaths   clinic proportion_deaths
## 1 1841   3036    237 clinic 1        0.07806324
## 2 1842   3287    518 clinic 1        0.15759051
## 3 1843   3060    274 clinic 1        0.08954248
## 4 1844   3157    260 clinic 1        0.08235667
## 5 1845   3492    241 clinic 1        0.06901489
## 6 1846   4010    459 clinic 1        0.11446384
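Each `proportion_deaths` value is the ratio for a single year. Averaging those ratios gives every year the same weight, whereas dividing total deaths by total births weights each year by its number of births. The sketch below compares the two for the six Clinic 1 rows printed above; the data frame name `clinic1` is hypothetical, and the numbers are copied from that output rather than read from the CSV.

```r
# Clinic 1 rows copied from the head(yearly) output above
clinic1 <- data.frame(
  year   = 1841:1846,
  births = c(3036, 3287, 3060, 3157, 3492, 4010),
  deaths = c(237, 518, 274, 260, 241, 459)
)

# Unweighted: every year counts equally, whatever its number of births
mean_of_proportions <- mean(clinic1$deaths / clinic1$births)

# Pooled: total deaths over total births, so busier years carry more weight
pooled_proportion <- sum(clinic1$deaths) / sum(clinic1$births)

round(c(mean = mean_of_proportions, pooled = pooled_proportion), 4)
```

Here the two agree closely (about 0.0985 versus 0.0992) because births vary only modestly from year to year. The same caveat applies to `mean_proportion_deaths` later on, which is an unweighted mean over months.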

Monthly Proportion Deaths

monthly = monthly %>%
  mutate(proportion_deaths = deaths / births)
head(monthly)
##         date births deaths proportion_deaths
## 1 1841-01-01    254     37       0.145669291
## 2 1841-02-01    239     18       0.075313808
## 3 1841-03-01    277     12       0.043321300
## 4 1841-04-01    255      4       0.015686275
## 5 1841-05-01    255      2       0.007843137
## 6 1841-06-01    200     10       0.050000000
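The `%>%` pipe and `mutate()` used above come from dplyr. If dplyr is not loaded, base R's `transform()` adds the same column. A minimal sketch on the six monthly rows printed above; the name `monthly_head` is hypothetical and the values are copied from that output.

```r
# First six rows of the monthly data, copied from the head(monthly) output above
monthly_head <- data.frame(
  date   = c("1841-01-01", "1841-02-01", "1841-03-01",
             "1841-04-01", "1841-05-01", "1841-06-01"),
  births = c(254, 239, 277, 255, 255, 200),
  deaths = c(37, 18, 12, 4, 2, 10)
)

# Base-R equivalent of mutate(proportion_deaths = deaths / births)
monthly_head <- transform(monthly_head, proportion_deaths = deaths / births)
monthly_head
```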

Visualizing the Yearly Proportion of Deaths

ggplot(yearly, aes(x = year, y = proportion_deaths, color = clinic)) +
  geom_line() +
  labs(x = "Year", y = "Proportion Deaths")

Visualizing the Monthly Proportion of Deaths

ggplot(monthly, aes(x = as.Date(date), y = proportion_deaths, group = 1)) +
  geom_line() +
  labs(x = "Date", y = "Proportion Deaths") +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "12 month")

Marking When Handwashing Started

handwashing_start = as.Date("1847-06-01")
monthly = monthly %>%
  # store dates as Date so the comparison is explicitly chronological
  mutate(date = as.Date(date),
         handwashing_started = date >= handwashing_start)
head(monthly)
##         date births deaths proportion_deaths handwashing_started
## 1 1841-01-01    254     37       0.145669291               FALSE
## 2 1841-02-01    239     18       0.075313808               FALSE
## 3 1841-03-01    277     12       0.043321300               FALSE
## 4 1841-04-01    255      4       0.015686275               FALSE
## 5 1841-05-01    255      2       0.007843137               FALSE
## 6 1841-06-01    200     10       0.050000000               FALSE
ggplot(monthly, aes(x = as.Date(date), y = proportion_deaths, group = 1,
                    color = handwashing_started)) +
  geom_line() +
  labs(x = "Date", y = "Proportion Deaths") +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "12 month")
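The `handwashing_started` flag rests on a date comparison. `as.Date()` parses ISO 8601 strings such as `"1847-06-01"`, and comparing `Date` values is chronological, so `>=` marks June 1847 itself and every later month as `TRUE`. A minimal sketch on a few months around the decree; the vector name `example_months` is illustrative only.

```r
handwashing_start <- as.Date("1847-06-01")

# Hypothetical months around the 1 June 1847 decree, parsed from ISO 8601 strings
example_months <- as.Date(c("1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01"))

# TRUE from the decree month onward (>= includes June 1847 itself)
started <- example_months >= handwashing_start
data.frame(month = example_months, handwashing_started = started)
```

Converting `date` to the `Date` class once, rather than keeping the text column that `read.csv()` returns, also means `scale_x_date()` no longer depends on the `as.Date()` call inside each `aes()`.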

Calculating the mean proportion of deaths before and after handwashing

monthly_summary = monthly %>%
  group_by(handwashing_started) %>%
  summarise(mean_proportion_deaths = mean(proportion_deaths))
monthly_summary  
## # A tibble: 2 × 2
##   handwashing_started mean_proportion_deaths
##   <lgl>                                <dbl>
## 1 FALSE                               0.105 
## 2 TRUE                                0.0211
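The two group means can be expressed as an absolute and a relative reduction. This sketch uses the rounded values printed in the tibble above, so its results are approximate: the mean monthly proportion of deaths falls by about 0.084 (roughly 8 percentage points), which removes about 80% of the pre-handwashing level.

```r
# Means as printed (rounded) in the monthly_summary output above
before <- 0.105   # handwashing_started == FALSE
after  <- 0.0211  # handwashing_started == TRUE

absolute_drop <- before - after      # difference in proportion units
relative_drop <- 1 - after / before  # share of the pre-handwashing rate removed

round(c(absolute = absolute_drop, relative = relative_drop), 3)
```

With `monthly_summary` in memory, `diff(monthly_summary$mean_proportion_deaths)` returns the unrounded after-minus-before difference, because `summarise()` orders the `FALSE` group before the `TRUE` group.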