- Exploratory Data Analysis
- Corpus and linguistic annotation
- Crash course for R: Visualization
Shu-Kai Hsieh (謝舒凱), Graduate Institute of Linguistics, NTU
Reference course materials
Good annotations support good applications
ngramr: R package to query the Google Ngram Viewer
require(ngramr)
require(ggplot2)
ggram(c("monarchy", "democracy"), year_start = 1500, year_end = 2000,
corpus = "eng_gb_2012", ignore_case = TRUE,
geom = "area", geom_options = list(position = "stack")) +
labs(y = NULL)
# rownames(corpuses)
ggram(c("情人", "太太"), year_start = 1500, year_end = 2000,
corpus = "chi_sim_2012", ignore_case = TRUE,
geom = "area", geom_options = list(position = "stack")) +
labs(y = NULL)
How does Corpus() in the tm package differ?
First sense:
Second sense:
xml, json
ggplot2, googleVis, rCharts, and other packages.
plot(cars)
gvisGeoMap() from googleVis
We want to use visualization techniques to mine texts for information, trends, and changing patterns. For example:
Basic possibilities
A word cloud is a graphical representation in which the font size of each word corresponds to its frequency relative to the other words. The larger the word, the higher its frequency.
The tm, wordcloud, and RColorBrewer packages are enough to do this.
#windowsFonts(JP = windowsFont("MS Mincho"))
#par(family = "JP")
par(family = "STKaiti")
wordcloud(doc, scale= c(2,0.5))
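The frequency-to-size idea can be sketched in a few lines of base R. This is a minimal illustration, not the course's own code: the `tokens` vector is made-up toy data, and the linear mapping from relative frequency onto the `scale` range is an assumption about the general idea rather than a statement about the package's internals. The drawing step runs only in an interactive session and only if the wordcloud package is installed.

```r
# Toy token vector, invented for illustration only.
tokens <- c("corpus", "annotation", "corpus", "text", "corpus",
            "annotation", "visualization")

# Count each word and sort from most to least frequent.
freq   <- sort(table(tokens), decreasing = TRUE)
words  <- names(freq)
counts <- as.integer(freq)

# Map relative frequency linearly onto a font-size range
# c(largest, smallest), the same shape as wordcloud's scale argument.
scale_range <- c(2, 0.5)
rel   <- counts / max(counts)
sizes <- setNames(scale_range[2] + rel * (scale_range[1] - scale_range[2]),
                  words)
print(sizes)

# Draw the cloud only when a plotting session and the package are available.
if (interactive() && requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words, counts, scale = scale_range, min.freq = 1)
}
```

Passing words and counts explicitly, instead of a raw document object, makes it easier to inspect or filter the frequencies before plotting.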
There are also variants that change the symbols.
The tm package provides the TermDocumentMatrix() function, which constructs a term-document matrix:
colnames(data) <- c("bush", "obama")
comparison.cloud(data, max.words = 250, title.size = 2, colors = brewer.pal(3, "Set1"))
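The `data` object in the call above is a matrix with one row per term and one column per document. The sketch below builds such a matrix in base R so the shape is visible without any packages. The helper `build_tdm_base` and the two token strings are hypothetical placeholders, not actual speech text. The optional second part shows one way the same structure might be obtained with tm, assuming that package is installed.

```r
# Placeholder token strings, invented for illustration; not real speeches.
texts <- c(bush  = "tax security freedom tax security tax",
           obama = "health jobs tax health jobs health")

# Hypothetical helper: count terms per document into a term-by-document matrix.
build_tdm_base <- function(texts) {
  tokens <- strsplit(tolower(texts), "[^a-z]+")
  terms  <- sort(unique(unlist(tokens)))
  terms  <- terms[nzchar(terms)]
  m <- vapply(tokens,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
  dimnames(m) <- list(terms, names(texts))
  m
}

data <- build_tdm_base(texts)
print(data)

# Optional: the same kind of matrix via tm, if it is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus  <- tm::VCorpus(tm::VectorSource(texts))
  data_tm <- as.matrix(tm::TermDocumentMatrix(corpus))
  colnames(data_tm) <- names(texts)
}
```

A matrix of this shape, with document names as column names, is what comparison.cloud() takes as its first argument.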
Visualization of textual data (Ludovic Lebart and Marie Piron)
(Preview) The midterm Hackathon will be a hands-on project based mainly on data from the political domain.
Graham Wilcock. 2009. Introduction to Linguistic Annotation and Text Analytics.
Atmajitsinh Gohil. 2015. R Data Visualization Cookbook.