Working with Russian characters can be mind-numbingly frustrating. This is true for R, as for other applications, so below I've written out my top five tricks for making Russian input work in R; I believe they should be transferable to most other languages.
Having forced any number of programs to accept Russian characters in the past, I have come to appreciate UTF-8 as the only sensible encoding system for non-Latin scripts. R operates with UTF-8 as default, so using Russian or other foreign scripts should be straightforward, right?
Wrong. There is no end to the annoyance experienced when attempting to import data into R by appending
encoding = "utf-8"
to the end of every line. Sometimes it will work, but rarely both for the characters displayed on screen and for those R writes out. So, annoyingly, characters that display as Russian in a data.frame will magically appear as gobbledygook when written to an output file, or even a plot. Infuriating. The solution is brutal in its simplicity: don't rely on R's UTF-8 handling to display characters for you. Instead, start each session in the appropriate locale, using the line
Sys.setlocale("LC_CTYPE", "russian")
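As a minimal sketch of the locale switch, the snippet below checks whether the request was honoured and restores the previous setting afterwards. The name "russian" is the Windows one used above; "ru_RU.UTF-8" is an assumption for Linux and macOS (it is the name suggested in the comments further down and must be installed on the system), and the variable names are purely illustrative.

```r
# Remember the current setting so it can be restored afterwards.
old_ctype <- Sys.getlocale("LC_CTYPE")

# "russian" is the Windows locale name; "ru_RU.UTF-8" is assumed for
# Linux/macOS and only works if that locale is installed.
target <- if (.Platform$OS.type == "windows") "russian" else "ru_RU.UTF-8"

# Sys.setlocale() returns an empty string (with a warning) on failure.
result <- suppressWarnings(Sys.setlocale("LC_CTYPE", target))
if (identical(result, "")) {
  message("Locale '", target, "' is not available on this system")
} else {
  message("LC_CTYPE is now: ", result)
}

# Put things back the way they were.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```

Checking the return value matters because a failed request leaves the session in its old locale and only raises a warning, which is easy to miss in a long script.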
Now that solves all the problems, right? Almost. Often when scraping data or when inputting data (e.g. through Shiny apps), strings need to be marked as UTF-8 as follows:
> Encoding(annoyingMisbehavingString) <- "UTF-8"
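A small sketch of what this assignment does, under the assumption that the misbehaving string arrives as raw UTF-8 bytes with no encoding mark. Encoding<- only relabels the bytes and does not convert them, so the example checks with validUTF8() first. The Windows-1251 part is an illustration of the opposite case; the encoding name "CP1251" passed to iconv() is an assumption that holds on most platforms but is not guaranteed.

```r
# UTF-8 bytes for the word "катынь", built from raw values so the example
# does not depend on how this script itself was saved.
utf8_bytes <- as.raw(c(0xd0, 0xba, 0xd0, 0xb0, 0xd1, 0x82,
                       0xd1, 0x8b, 0xd0, 0xbd, 0xd1, 0x8c))
x <- rawToChar(utf8_bytes)
Encoding(x)  # no UTF-8 mark yet

# Only relabel when the bytes really are valid UTF-8.
if (validUTF8(x)) Encoding(x) <- "UTF-8"
Encoding(x)
nchar(x, type = "chars")  # counts characters, not bytes

# The same word as Windows-1251 bytes: not valid UTF-8, so relabelling
# would be wrong here and the text has to be converted instead.
cp1251 <- rawToChar(as.raw(c(0xea, 0xe0, 0xf2, 0xfb, 0xed, 0xfc)))
validUTF8(cp1251)
converted <- iconv(cp1251, from = "CP1251", to = "UTF-8")
```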
Be careful with this one, though. Encoding text that already is utf-8 as utf-8 will not work well. Finally, if you ever want to save .R scripts with non-Latin characters in them, do so with care. When you reopen the files the strings will be scrambled, for some reason not quite clear to me. If you run the script with source(), any command reliant on the non-Latin string (e.g. grep) will return errors or no hits. The solution is to use a different function altogether:
eval(parse("iPolarCalc.R", encoding = "UTF-8"))
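To make the eval(parse()) approach concrete, here is a self-contained sketch that writes a throwaway script containing a Cyrillic string, reads it back with the encoding stated explicitly, and checks that a fixed-string search finds the word. The temporary file, the environment `env`, and the variable names are all made up for the illustration; the \u escapes are used only so that the example itself stays pure ASCII.

```r
# The word "катынь", written with \u escapes.
word <- "\u043a\u0430\u0442\u044b\u043d\u044c"

# Write a one-line script to a temporary file as UTF-8 bytes.
script <- tempfile(fileext = ".R")
con <- file(script, open = "wb")
writeLines(paste0("pattern <- \"", word, "\""), con, useBytes = TRUE)
close(con)

# Parse the file, telling R that its strings are UTF-8, and evaluate it
# in a separate environment so nothing leaks into the workspace.
env <- new.env()
eval(parse(script, encoding = "UTF-8"), envir = env)

# The string survives the round trip and can be used for matching.
same  <- identical(env$pattern, word)
found <- grepl(env$pattern, paste("\u0433\u043e\u0440\u043e\u0434", word),
               fixed = TRUE)
unlink(script)
```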
And that's about it. For Windows systems at least.
=====
Update: 06/02/2013
Except encoding issues never really end. Enter the latest problem: displaying Cyrillic characters with knitr.
Knitr is great. It will take R code and combine it with markdown, allowing you to create ready-formatted webpages with calculations and graphics created on the fly from R. But it doesn't work properly with non-ASCII characters. The solution: don't use RStudio's built-in knit to HTML (Ctrl-Shift-H). Instead, save the rmd file in your working directory and run these lines:
library(knitr)     # provides knit()
library(markdown)  # provides markdownToHTML()
knit("test.rmd", encoding = "utf-8")
markdownToHTML("test.md", "test.html")
browseURL(paste("file://", file.path(getwd(), "test.html"), sep = ""))
=====
Update: 21/11/2013
Here's my latest discovery: you know when you have foreign characters in a URL? Chances are you didn't notice, because most browsers can handle this. Paste this into your browser, and you will get search results for the Katyn massacre:
https://www.google.co.uk/search?q=катынь
However, this is all smoke and mirrors: copy the address back out of the browser's address bar, paste it into Notepad, and you will see this:
https://www.google.co.uk/search?q=%D0%BA%D0%B0%D1%82%D1%8B%D0%BD%D1%8C
What does this have to do with R? Well, we need some way to convert the former to the latter if we want to access URLs with foreign characters in them. To do that, use curlEscape() from the RCurl package:
> curlEscape("катынь")
[1] "%D0%BA%D0%B0%D1%82%D1%8B%D0%BD%D1%8C"
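If RCurl is not installed, the same percent-encoding can be sketched with base R's URLencode() from the utils package. This is an alternative to curlEscape(), not what the post originally used. The query is written with \u escapes so the example does not depend on the file's encoding, and the variable names are illustrative.

```r
# The search term "катынь", written with \u escapes.
query <- "\u043a\u0430\u0442\u044b\u043d\u044c"

# Percent-encode the UTF-8 bytes; reserved = TRUE also escapes characters
# such as "/" and "?" so the result is safe to use as a query value.
escaped <- utils::URLencode(enc2utf8(query), reserved = TRUE)
url <- paste0("https://www.google.co.uk/search?q=", escaped)

# Going the other way: URLdecode() returns raw bytes as text, so mark
# the result as UTF-8 before comparing or printing it.
decoded <- utils::URLdecode(escaped)
Encoding(decoded) <- "UTF-8"
identical(decoded, query)
```

Encoding only the query value, rather than the whole URL, keeps the "://" and "?" separators intact.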
Perfect.
I have an SPSS file in Russian encoding, apparently it's 1251, and I can't read it either in R or in SPSS 21.
Sys.setlocale("LC_CTYPE", "russian") doesn't work on my Mac machine for some reason. Is there any other way of solving this issue? Or, perhaps, there is something that I'm not doing right?
Hi Valery, the short answer is I don't know, because I don't know how to use a mac. But this post seems to have something that may be of interest:
http://stackoverflow.com/questions/17031002/get-weekdays-in-english-in-rstudio
I would guess you are looking for "ru_RU.UTF-8". Best, R
Sys.setlocale("LC_CTYPE", "ru_RU.UTF-8")
Worked like a charm!
Hi Rolf, I have to deal with Vietnamese data and I would like to set locale in R to be "en_US.UTF-8" but it doesn't work.
My code is: Sys.setlocale(category="LC_ALL", locale = "en_US.UTF-8")
However, the warning message in console is:
In Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
And the locale did not change to "en_US.UTF-8"
I have tried several ways but nothing worked. Could you help me to set my locale to be "en_US.UTF-8"?
data_heatmap can't handle Russian letters as well, see https://github.com/yihui/knitr/issues/436#issuecomment-32781891
Oh, it's bad, feel like back in 1999. Windows did not want to change my world after seeing Sys.setlocale("LC_CTYPE", "russian")
Hi Rolf! Thank you very much for your quite useful advices on dealing with Russian characters in R programming language. You saved a lot of time on this matter. But never knows what to expect from text in Cyrillic.
Thank you, this is great!
Thanks a lot! Solved all my problems)
Rather old, and spammed thread, but I wonder about "Encoding text that already is utf-8 as utf-8 will not work well."
I have UTF-8 encoded text which is not recognised as such, Encoding(var) gives me "unknown". Encoding(var) <- "UTF-8" does work, and the text is displayed as intended. (On a 1252 locale, that is.)