- Corpus and NLP for Text Analytics
- Crash course for R: Regular Expression
Shu-Kai Hsieh (謝舒凱), Graduate Institute of Linguistics, NTU
NLTK
But it is possible to use libraries written in lower-level, and hence faster, languages while writing your code in R, taking advantage of its functional programming style and its many other libraries for data analysis.
See nlp.Rmd
# install from CRAN
install.packages("coreNLP")
# download and unzip the Stanford CoreNLP distribution
# (or: wget http://nlp.stanford.edu/software/stanford-corenlp-full-2015-04-20.zip)
download.file("http://nlp.stanford.edu/software/stanford-corenlp-full-2015-04-20.zip",
              destfile = "stanford-corenlp-full-2015-04-20.zip")
unzip("stanford-corenlp-full-2015-04-20.zip")
library(coreNLP)
initCoreNLP("stanford-corenlp-full-2015-04-20/")
FB <- "Facebook is looking for new ways to get users to share more, rather than just consume content, in a push that seemingly puts it in more direct rivalry with Twitter."
output <- annotateString(FB)
getToken(output)[, c(1:3, 6:7)]  # token-level annotations (selected columns)
getParse(output)                 # constituency parse
getDependency(output)            # dependency relations
getSentiment(output)             # sentence-level sentiment
getCoreference(output)           # coreference chains
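The annotation object becomes most useful once you post-process it. As a sketch, assume `getToken()` returns a data frame whose columns include `token`, `lemma`, and `POS` (here we stand in a small hand-made data frame for the real output, so the example runs without the CoreNLP download):

```r
# Hypothetical stand-in for getToken(output); the real data frame
# also carries sentence and character-offset columns.
tokens <- data.frame(
  token = c("Facebook", "is", "looking", "for", "new", "ways"),
  lemma = c("Facebook", "be", "look", "for", "new", "way"),
  POS   = c("NNP", "VBZ", "VBG", "IN", "JJ", "NNS"),
  stringsAsFactors = FALSE
)

# Count how often each part-of-speech tag occurs
pos_counts <- table(tokens$POS)
pos_counts

# Keep only the noun tokens (Penn Treebank tags starting with "NN")
nouns <- tokens$token[grepl("^NN", tokens$POS)]
nouns
```

The same two idioms (`table()` on a tag column, `grepl()` on tag prefixes) apply unchanged to the full data frame that `getToken()` produces.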
qdap
: (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. There are plenty of data available on the web; students are invited to come up and practice.
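qdap bundles ready-made functions for this kind of bookkeeping (e.g. per-speaker word statistics via `word_stats()`). To see the underlying idea in base R first, here is a minimal sketch on a made-up two-speaker dialogue (the data and the speaker labels are hypothetical):

```r
# Toy dialogue: who said what (hypothetical data)
dialogue <- data.frame(
  speaker = c("A", "B", "A", "B"),
  text    = c("Do you like this perfume?",
              "Yes, it is quite strong.",
              "Strong but sweet, I think.",
              "Agreed."),
  stringsAsFactors = FALSE
)

# Words per utterance: split on whitespace and count
n_words <- sapply(strsplit(dialogue$text, "\\s+"), length)

# Total word count per speaker -- the kind of statistic
# qdap automates for you
words_by_speaker <- tapply(n_words, dialogue$speaker, sum)
words_by_speaker
```

This is exactly the shape of analysis qdap is designed for, with the manual splitting and tallying replaced by one function call.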
comments <- read.table("perfumes_comments.csv", header = TRUE,
sep = "\t", dec = ".", quote = "\"")
summary(comments)
# random rows of the data set
x <- sample(nrow(comments), 10, replace = FALSE)
comments[x,]
strong <- grepl("strong", comments$Comment, ignore.case = TRUE)
sum(strong) / nrow(comments)  # proportion of comments mentioning "strong"
sweet <- grepl("sweet|soft", comments$Comment, ignore.case = TRUE)
sum(sweet) / nrow(comments)   # proportion mentioning "sweet" or "soft"
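The same `grepl()` approach extends to richer regular expressions. A small self-contained sketch (on a toy comment vector, not the perfume data) using word boundaries and match extraction:

```r
# Toy comments (hypothetical; the real data come from the CSV above)
comm <- c("Very strong and sweet scent",
          "Too soft for me",
          "A strongly floral note",
          "sweet, soft, lovely")

# \\b keeps "strong" from also matching "strongly"
exact_strong <- grepl("\\bstrong\\b", comm, ignore.case = TRUE)
sum(exact_strong) / length(comm)  # proportion with the exact word

# Extract every occurrence of "sweet" or "soft"
hits <- regmatches(comm, gregexpr("sweet|soft", comm, ignore.case = TRUE))
unlist(hits)
```

Note the difference from the plain substring match used above: without `\\b`, comment 3 ("strongly") would be counted as mentioning "strong".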