Topic Modelling Media Coverage of Memory Conflicts

Ostensibly this is a blog about memory conflict. It has become more of a repository of script snippets and visualisations, but here I get back to my roots and apply topic modelling to the representation of memory in the Russian media. 

Topic models are discussed really well elsewhere, and rather superficially by me here. In my topic model for the Russian media over the period of 2003-2013 I found seven or eight topics about history and memory. One of them was clearly about Katyn and about Stalinist repression.
The terms dominating this topic were:


трагедия жертва память поляк семья гибель родственник репрессия расстрел расследование катастрофа власть ответственность реабилитация брат член пострадавший соболезнование годовщина (tragedy, victim, memory, Pole, family, death, relative, repression, execution [rasstrel], investigation, catastrophe, power [vlast'], responsibility, rehabilitation, brother, member, victim [postradavshii], condolences, anniversary)


It was interesting to note how present the language of Katyn was in the larger topic about Stalinist repressions (e.g. Pole, brother [the Kaczynskis were identical twins]). But in a way, this is perhaps not surprising. Stalinist crimes against Russians are not frequently written about by the media, while Katyn, due to international pressure from Poland, has for a number of reasons been newsworthy.


Very briefly on topic models and methods
This is what a topic looks like. To achieve this, topic modelling takes a collection of documents and attempts to assign every word in every text to one of a number of distinct topics. Each document is modelled as a mixture of topics in different proportions, and each topic consists of a collection of words, each with a probability of featuring in passages about the topic. Yes - it is a bit complicated. To do mine I followed Matthew Jockers' 'secret recipe' for topic modelling: I took the root form of words, attempted to keep only nouns, excluded nouns tagged as being people or place names, and calculated 500 topics. The motivation for this was twofold: firstly, using someone else's schema saves me time, and secondly, it means I can't cherry-pick results.
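The noun-filtering step of that recipe can be illustrated with a minimal sketch. It assumes the texts have already been lemmatised and tagged by some external tool; the tuple layout, the tag name "NOUN" and the function name keep_common_nouns are hypothetical choices for illustration, not the pipeline actually used for this model.

```python
from collections import Counter

# Hypothetical pre-tagged tokens: (lemma, part_of_speech, is_named_entity).
# In practice these would come from a Russian lemmatiser and tagger.
tagged_doc = [
    ("память", "NOUN", False),
    ("жертва", "NOUN", False),
    ("помнить", "VERB", False),
    ("Катынь", "NOUN", True),   # place name, so it is dropped
    ("трагедия", "NOUN", False),
    ("память", "NOUN", False),
]


def keep_common_nouns(tokens):
    """Return lemmas that are nouns and are not tagged as person or place names."""
    return [lemma for lemma, pos, is_name in tokens
            if pos == "NOUN" and not is_name]


filtered = keep_common_nouns(tagged_doc)
counts = Counter(filtered)  # bag of nouns that would be passed to the topic model
print(filtered)
```

Only the filtered lemmas would then be handed to the topic modelling tool, which is why verbs and names do not appear among the terms of a topic.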


Katyn
Katyn is to my mind the most explosive Eastern European memory conflict in recent years. Katyn - the symbolic site of Polish WW2 suffering at the hands of the Soviets - was the subject of an epic film by Andrzej Wajda, and long a thorn in Polish-Russian relations. If the release of Wajda's film caused an upset, that was nothing compared to President Kaczynski's plane crashing on the way to participate in the annual commemorations at Katyn in 2010. It was a tragedy of grand scale, and the irony that Kaczynski, the champion of the Polish campaign for greater international recognition of Katyn, should be killed in this way was lost on no one. It was almost inevitable that the incident would give rise to conspiracy theories of Russian involvement - a narrative frequently referred to as Katyn-2.

This is ground well covered by now - [Shameless plug: consider for instance the Memory at War Project's book about Katyn, or hold on for my article on how Katyn was mobilised during Polish elections - coming to a journal near you in 2014]. The issue here is that this major international incident forced the Russian media to write about Katyn. Previously most texts printed in state-owned media had been reasonably evasive about what exactly happened at Katyn. This did not change - press agency reports featured the line 'President Kaczynski was in Smolensk to participate in a commemorative event (traurnie meropriiatie)'. These evasive constructions are still favoured: consider this text from Komsomol'skaia Pravda earlier this year:


'On Wednesday 10 April 2013 Poland and Russia remember the members of the Polish delegation who died in the plane crash three years ago. In April 2010 the Polish plane TU-154 crashed while attempting to land at the airfield 'Smolensk-Severnii'. All the passengers and members of the crew - 96 persons, including the president of the republic, Lech Kaczynskii, as well as Polish politicians, religious and social figures (deiateli), died.'



Curiously there is no mention of why the Polish delegation was flying to Smolensk. But, that's just the intro, right? Further on we are sure to read about the Polish officers, executed by the NKVD?

Not so much. The closest we get is the following:

'The Polish Minister of Culture, Bogdan Zdroevskii, upon arriving in Smolensk to participate in commemorative events in 2012, for the first time spoke about the project to establish a memorial to the victims of the catastrophe near Smolensk.'

I could go on, but for now I encourage the reader to explore this independently, or take it on faith: one of the biggest differences between state media and the Russian independent media was the willingness to print who had done what to whom at Katyn. For now, let's explore what topic modelling can tell us about Russian media coverage of Katyn.

Topic distribution in texts about Katyn
I collected all the texts about Katyn in my database of Russian media sources. There were 140 texts printed in the independent media, compared to 330 in the state-owned sources. Considering that the state-owned publications are much larger, at a ratio of roughly 4:1, Katyn is relatively more frequently written about in independent media, but the difference is not dramatic.
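The comparison above is simple arithmetic, sketched here under the assumption that the 4:1 ratio describes the overall number of texts in each group. The variable names are illustrative, and since the ratio itself is only approximate, so is the result.

```python
independent_texts = 140   # Katyn texts in independent media (from the post)
state_texts = 330         # Katyn texts in state-owned media (from the post)
size_ratio = 4.0          # state-owned corpus is roughly four times larger (approximate)

# Scale the state-owned count down to the size of the independent corpus.
state_texts_scaled = state_texts / size_ratio

# How many times more often Katyn appears in independent media, per text published.
relative_rate = independent_texts / state_texts_scaled

print(state_texts_scaled)        # 82.5
print(round(relative_rate, 2))   # 1.7
```

On these figures, a text in the independent media is somewhat under twice as likely to be about Katyn as a text in the state-owned media.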

After calculating a 500-topic model using MALLET I manually labelled the topics. Because no text is made up of a single topic, we can identify the topics most commonly[1] used to discuss Katyn. These are:



These topics are calculated based on all the texts. The main topic appears to be about Katyn and Stalinist repressions, so it is no surprise it should feature strongly. The other topics are generalizable to many other subjects, but we can understand why they have been identified in texts about Katyn: President Kaczynski died in a plane crash on the way to commemorations at Katyn; Andrzej Wajda’s film about Katyn was nominated for an Oscar; and, more generally, Katyn features in debates about the Second World War. Indeed, some Polish politicians have demanded Katyn be legally recognised as genocide. The final topic, labelled ‘narod and power’, reflects intellectual debates questioning the ideological motivation of political elites.

The reader should bear in mind here that these labels were assigned based on the entire database of texts, and not selected with the Katyn example in mind.
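The ranking described in footnote [1], where topics are ordered by how prominent they are in the Katyn texts compared to the dataset as a whole, could be sketched as follows. This is an assumption about how such a ranking might be computed, not the script behind the figures in this post; the topic names and proportions are invented toy values, and the function names are hypothetical.

```python
def mean_proportions(docs):
    """Average each topic's proportion over a list of {topic: proportion} dicts."""
    totals = {}
    for doc in docs:
        for topic, share in doc.items():
            totals[topic] = totals.get(topic, 0.0) + share
    return {topic: total / len(docs) for topic, total in totals.items()}


def rank_by_lift(subset, corpus):
    """Rank topics by mean proportion in the subset relative to the whole corpus."""
    subset_mean = mean_proportions(subset)
    corpus_mean = mean_proportions(corpus)
    lift = {
        topic: subset_mean.get(topic, 0.0) / corpus_mean[topic]
        for topic in corpus_mean
        if corpus_mean[topic] > 0
    }
    return sorted(lift.items(), key=lambda item: item[1], reverse=True)


# Invented toy data: each document is a mixture of three topics.
corpus = [
    {"repression": 0.6, "film": 0.1, "sport": 0.3},
    {"repression": 0.4, "film": 0.4, "sport": 0.2},
    {"repression": 0.0, "film": 0.2, "sport": 0.8},
    {"repression": 0.0, "film": 0.1, "sport": 0.9},
]
katyn_texts = corpus[:2]  # pretend the first two documents matched the keyword

ranked = rank_by_lift(katyn_texts, corpus)
for topic, score in ranked:
    print(topic, round(score, 2))
```

A score above 1 means the topic is over-represented in the keyword subset. Dividing by the corpus-wide mean is what stops a topic that is simply common everywhere from dominating the list.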

Politically motivated subject selection
Let’s zoom in a bit further by adding a few more topics and a division by political orientation:

As it turns out, the more liberal and more pro-Kremlin newspapers feature uneven topic selection: the state-owned newspapers’[2][3] coverage of Katyn is overwhelmingly about the plane crash, and about Wajda’s film – as can be seen in the high proportion of texts featuring the language of film making, film festivals, and prize awards. Conversely, the liberal sources write more about Katyn in the context of Stalinist repressions. The starkest difference, though, is in the category ‘the narod and power’ – pointing to the role of Katyn in debates about 'the nation', ideology and politics. Topic selection, then, points to a divide between intellectual, ideological, and to a lesser degree historical subjects in independent media, and cultural and current affairs subjects in state-owned media.

In this way, combining keywords and topic models allows us to identify the type of discourses mobilised in conjunction with a particular topic. This example scratches the surface. We could have looked at example texts from the different categories. The main point here is to show how we can identify topic selection for a given subject, and contrast the proportions of each topic based on different criteria, as well as to hint at how topic modelling can identify memory discourses.
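Contrasting topic proportions across groups of sources, as done above for state-owned and independent media, amounts to averaging per-document topic proportions within each group. The sketch below shows the idea with invented numbers; the group labels, topic names and the function mean_by_group are illustrative assumptions rather than the code used for the charts.

```python
from collections import defaultdict


def mean_by_group(docs):
    """docs: list of (group, {topic: proportion}); returns {group: {topic: mean}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for group, topics in docs:
        counts[group] += 1
        for topic, share in topics.items():
            sums[group][topic] += share
    # Dividing by the number of documents in the group treats a topic that is
    # absent from a document as having proportion zero there.
    return {
        group: {topic: total / counts[group] for topic, total in topics.items()}
        for group, topics in sums.items()
    }


# Invented toy data: (source type, topic proportions for one text).
docs = [
    ("state", {"plane crash": 0.5, "repression": 0.1}),
    ("state", {"plane crash": 0.3, "repression": 0.1}),
    ("independent", {"plane crash": 0.2, "repression": 0.4}),
    ("independent", {"plane crash": 0.2, "repression": 0.6}),
]

means = mean_by_group(docs)
for group, topics in means.items():
    print(group, {topic: round(value, 2) for topic, value in topics.items()})
```

The same grouping key could be swapped for any other criterion, such as publication or year, without changing the rest of the calculation.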

In the next post I wade into a debate about the best way to visualise the topic model as a whole.



[1] Ranked as most frequent compared to proportion in entire dataset
[2] State-owned: Izvestiia, Rossiiskaia Gazeta. Independent: Gazeta.ru, Novaia Gazeta. 
[3] I know these binary categories are not perfect. But imperfect comparisons trump no comparison. 
