
In this post I outline how count data may be modelled using a negative binomial distribution, which represents trends in time-series count data more accurately than linear methods do. I also show how to use ANOVA to identify the point at which one group of variables (memory) gains more explanatory power than another (news), and how confidence intervals may be calculated and plotted around the predicted values. The resulting illustration gives a robust visualisation of how the Beslan hostage crisis has taken on features of a memory event.
Recently I wrote up a piece about quantifying memory and news, and proposed that two distinct linear models might be the way to go about it. However, the problem with linear models is that, by their nature, they do not take into account the ways in which trends may be non-linear. They can also lead to nonsensical predictions, such as negative counts.
Generally, then, linear models should be avoided when modelling count data. What are the alternatives? Typically, a Poisson distribution would be the ideal way to capture the probability of a clustering of observations being non-random. However, the Poisson distribution assumes that the mean equals the variance, and this assumption is very frequently violated by news data: a story will have a small number of large values followed by a large number of small values, resulting in a low mean and a high variance. Instead, a negative binomial distribution may be used, which takes a parameter theta specifying the degree to which the variance exceeds the mean. Coefficient estimates from a negative binomial model are usually similar to Poisson estimates, but because the standard errors account for the extra variance, p-values tend to be more conservative.
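For reference, these are the mean-variance relationships the two models assume, with θ parameterised as in `negative.binomial(theta)` from MASS:

```latex
\begin{aligned}
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu \\
\text{Negative binomial:} \quad & \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}
\end{aligned}
```

As θ grows large the negative binomial variance approaches the Poisson case, while small values of θ indicate strong overdispersion.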
One strength of R is its support for so-called generalised linear models (GLMs). The negative binomial family is provided by the MASS package; theta may be estimated with glm.nb() and then passed to negative.binomial() in an ordinary glm() call:
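library(MASS)  # glm.nb() and negative.binomial()
fmla <- as.formula("count ~ date + e_news + a1 + a2 + elections")
# estimate theta by maximum likelihood, then refit with theta held fixed
theta <- glm.nb(fmla, data = mdb)$theta
results <- glm(fmla, data = mdb, family = negative.binomial(theta))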
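Because `mdb` is not reproduced here, the following is a minimal sketch of the same two-step fit on simulated data. The data frame `sim`, its columns and the true parameter values are hypothetical, chosen only to produce an overdispersed, decaying count series.

```r
library(MASS)  # provides glm.nb() and negative.binomial()

set.seed(1)
# Hypothetical monthly series: counts decay over time and are overdispersed
# (true theta = 3). Simulated data, not the Beslan series used in this post.
sim <- data.frame(month = 1:120)
sim$count <- rnbinom(nrow(sim), mu = exp(4 - 0.03 * sim$month), size = 3)

fmla_sim <- count ~ month
# Step 1: estimate theta by maximum likelihood
theta_hat <- glm.nb(fmla_sim, data = sim)$theta
# Step 2: refit as an ordinary glm with theta held fixed
fit_sim <- glm(fmla_sim, data = sim, family = negative.binomial(theta_hat))

theta_hat
coef(fit_sim)
```

The refitted coefficients should agree closely with those of `glm.nb()` itself; the advantage of the second step is that the result is a plain `glm` object with theta fixed.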
The fitted model can then be passed to anova() to identify which variables contribute most to the model (terms are added sequentially, so the order in the formula matters):
anova(results)
Analysis of Deviance Table
Model: Negative Binomial(10.85), link: log
Response: count
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev
NULL                         96        994
date       1      731        95        263
e_news     1       69        94        194
a1         1       52        93        142
a2         1        1        92        141
elections  1       12        91        129
Details about the coding of the variables, and the logic behind models contrasting stories as news or memory events, may be found here. From the ANOVA results I can identify which group of variables contributes most to describing the data distribution: the memory variables or the news variables. As I am interested in distributions where memory effects are apparent, and these develop only over time, I loop through the data, deleting the first month's values on each pass, until either too little data remains to fit the model or the memory variables have greater explanatory power than the news variables:
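mdb2 <- mdb  # working copy that is trimmed from the start on each pass
repeat {
  # re-estimate theta and refit the model on the remaining data
  aov3 <- glm(fmla, data = mdb2,
              family = negative.binomial(glm.nb(fmla, data = mdb2)$theta))
  tab <- anova(aov3)
  dev <- setNames(tab[["Deviance"]], rownames(tab))
  news <- sum(dev[grep("date|news", names(dev))])  # date and e_news
  memory <- sum(dev[grep("a1|a2", names(dev))])    # anniversary variables
  # stop once the memory variables explain at least as much deviance as the
  # news variables, or when too few rows remain to fit the model
  # (the minimum of 10 rows is an arbitrary choice)
  if (memory >= news || nrow(mdb2) <= 10) break
  # otherwise drop the earliest month and try again
  mdb2 <- mdb2[mdb2$date > min(mdb2$date), ]
}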
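The stopping rule compares the summed sequential deviance of two groups of terms. As an illustration, here is a small hypothetical helper, `group_deviance()` (not part of the original code), applied to the deviance column of the final table in this post. It works on any `anova.glm` result because that is a data frame with a `Deviance` column and term names as row names.

```r
# Hypothetical helper: sum the sequential deviance of the terms whose
# names match a regular expression (the NULL row has NA deviance).
group_deviance <- function(tab, pattern) {
  dev <- tab[["Deviance"]]
  sum(dev[grepl(pattern, rownames(tab))], na.rm = TRUE)
}

# Deviance column copied from the final deviance table in this post
tab <- data.frame(
  Deviance = c(NA, 19.18, 0.73, 24.54, 0.65, 0.29),
  row.names = c("NULL", "date", "e_news", "a1", "a2", "elections")
)

news <- group_deviance(tab, "date|news")  # 19.18 + 0.73 = 19.91
memory <- group_deviance(tab, "a1|a2")    # 24.54 + 0.65 = 25.19
memory >= news                            # TRUE, so the loop stops here
```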
From the negative binomial model, predictions and confidence intervals can be created for the period identified as potentially significant for memory. The 95% confidence interval either side of the predicted value is approximated by adding and subtracting 1.96 standard errors. I want to plot all of the data but only the predicted values for the period identified as significant, so next I remove the predictions for the months before that period. Finally I create a data frame holding the interval for which no estimates were calculated (this will be used to shade out that part of the plot in ggplot).
library(lubridate)  # needed for months(1) below
estimate <- predict.glm(aov3, newdata = mdb, type = "response", se.fit = TRUE)
mdb$estimate <- estimate$fit
mdb$se <- estimate$se.fit
# blank out predictions before the period with memory potential
mdb$estimate[mdb$date < min(mdb2$date)] <- NA
mdb$se[mdb$date < min(mdb2$date)] <- NA
mdb$date <- as.Date(mdb$date)
# approximate 95% interval on the response scale
mdb$upper <- mdb$estimate + (1.96 * mdb$se)
mdb$lower <- mdb$estimate - (1.96 * mdb$se)
mdb$lower[mdb$lower < 0] <- 0  # counts cannot be negative
# region to shade: from one month before the first date to the start of mdb2
rect <- data.frame(one = min(mdb$date) - months(1), two = min(as.Date(mdb2$date)))
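A common alternative to the ±1.96 × SE interval on the response scale is to build the interval on the log (link) scale and exponentiate its endpoints, which keeps the lower bound positive without clamping it at zero. This is a different technique from the one used above; a minimal sketch on simulated data follows (the data frame `sim` and its columns are hypothetical).

```r
library(MASS)  # glm.nb() and negative.binomial()

set.seed(2)
# Simulated, hypothetical data (not the Beslan series)
sim <- data.frame(month = 1:60)
sim$count <- rnbinom(nrow(sim), mu = exp(2 - 0.04 * sim$month), size = 2)

theta_hat <- glm.nb(count ~ month, data = sim)$theta
fit <- glm(count ~ month, data = sim, family = negative.binomial(theta_hat))

# Predict on the log (link) scale, where the normal approximation is more reasonable
pr <- predict(fit, newdata = sim, type = "link", se.fit = TRUE)
sim$estimate <- exp(pr$fit)
# Back-transform the interval endpoints; exp() keeps them strictly positive
sim$upper <- exp(pr$fit + 1.96 * pr$se.fit)
sim$lower <- exp(pr$fit - 1.96 * pr$se.fit)

head(sim)
```

The resulting interval is asymmetric around the estimate, which suits counts that are close to zero.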
In the plot below I visualise the square root of the number of articles about the Beslan hostage tragedy in the Russian press. The square root is used to stop the high initial interest from obscuring the trend that emerged over time. To create the plot I:
- add a ribbon representing the confidence interval
- plot the observed values
- add a dotted line representing the fitted values
- edit the formatting and add a title
- add a shaded rectangle over the area I wish to ignore:
library(ggplot2)
ggplot(mdb, aes(date, sqrt(count))) +
  # confidence interval (only defined for the memory period)
  geom_ribbon(aes(ymin = sqrt(lower), ymax = sqrt(upper)), colour = "red",
              fill = "light grey", alpha = 0.4, linetype = 2) +
  # observed values
  geom_point(colour = "dark green", size = 3, alpha = 0.8) +
  # fitted values
  geom_line(aes(date, sqrt(estimate))) +
  theme_bw() +
  ggtitle(paste0("Regression graph for the Beslan hostage crisis, exhibiting possible features of a memory event since ",
                 as.Date(min(mdb2$date)))) +
  # drawn once from rect rather than once per row of mdb, hence the higher alpha
  geom_rect(data = rect, aes(xmin = one, xmax = two, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "light grey", colour = "grey",
            linetype = 2, alpha = 0.7)
anova(aov3)
Analysis of Deviance Table
Model: Negative Binomial(6.449), link: log
Response: count
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev
NULL                         70      135.3
date       1    19.18        69      116.1
e_news     1     0.73        68      115.4
a1         1    24.54        67       90.9
a2         1     0.65        66       90.2
elections  1     0.29        65       89.9
Notice in the above table how the anniversary variables exceed the explanatory power of the news and date variables. This indicates that by the end of 2006 Beslan was increasingly featuring as a memory event and less as a news story. Also notice that the remaining deviance is quite large: this model apparently fits the data less well than the model for the entire series (which reduced the deviance from 994 to 129, explaining roughly 87% of it), but that earlier figure is inflated by the accurate prediction of a few outliers.
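The share of the null deviance explained by each model can be computed directly from the two deviance tables in this post. With a fitted glm object the same quantity is `1 - deviance(model) / model$null.deviance`; here the table values are simply copied in (the first table's values are rounded).

```r
# Null and residual deviances copied from the two deviance tables
full <- c(null = 994, resid = 129)              # whole series
memory_period <- c(null = 135.3, resid = 89.9)  # period selected by the loop

# proportion of the null deviance explained by the model
explained <- function(d) unname(1 - d["resid"] / d["null"])

round(explained(full), 3)           # about 0.87
round(explained(memory_period), 3)  # about 0.336
```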