Just Another Notebook
Monday, May 7, 2012
What does “A to B is supported” mean in Speller data source?
If we say this data supports the transition from A to B, we mean
the following: the probability of B is higher than the probability of A
according to this data.
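Read as a rule, the criterion is simply that B is more probable than A under the data. Here is a minimal sketch of that check, assuming the data can be reduced to occurrence counts per string and that "probability" means relative frequency. The post does not say how the Speller data source actually estimates probabilities, so the function name `is_supported` and the counts below are hypothetical.

```python
from collections import Counter


def is_supported(counts, a, b):
    """Return True when b has a higher relative frequency than a.

    Assumption: probability is estimated as count / total count.
    """
    total = sum(counts.values())
    if total == 0:
        return False  # no data, so nothing is supported
    return counts[b] / total > counts[a] / total


# Hypothetical counts, invented for illustration only.
counts = Counter({"teh": 3, "the": 970, "then": 27})

print(is_supported(counts, "teh", "the"))  # True
print(is_supported(counts, "the", "teh"))  # False
```

Under this reading the relation is directional: "teh" to "the" is supported, while "the" to "teh" is not.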
Tuesday, May 1, 2012
Natural Language Processing: Word Tokenization Best Practice
Some tokenizers split on all non-alphabetical characters, which includes the apostrophe. This means that words like "Ophelia's" and "dimm'd" will be tokenized to "Ophelia s" and "dimm d".
This is a common problem in NLP tasks: the simplest method works for most cases but fails on a variety of edge cases. One way to fix this is to split on whitespace characters only, or to remove all punctuation first.
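The difference between these approaches is easy to see on a short example. The sketch below uses only Python's standard `re` module; the function names and the sample sentence are made up for illustration, and the third tokenizer (keeping apostrophes that sit between letters) is one possible refinement, not something the post prescribes.

```python
import re


def tokenize_naive(text):
    """Split on every non-alphabetical character, apostrophes included."""
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


def tokenize_whitespace(text):
    """Split on whitespace only; punctuation stays attached to words."""
    return text.split()


def tokenize_keep_apostrophes(text):
    """Match runs of letters, allowing apostrophes between letters."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)


sample = "Ophelia's song, dimm'd by grief"

print(tokenize_naive(sample))
# ['Ophelia', 's', 'song', 'dimm', 'd', 'by', 'grief']
print(tokenize_whitespace(sample))
# ["Ophelia's", 'song,', "dimm'd", 'by', 'grief']
print(tokenize_keep_apostrophes(sample))
# ["Ophelia's", 'song', "dimm'd", 'by', 'grief']
```

Splitting on whitespace keeps "Ophelia's" and "dimm'd" intact but leaves the comma attached to "song," which is the kind of edge case that a simple fix tends to trade for another.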
For an example, see Dan Jurafsky's NLP course: https://class.coursera.org/nlp/lecture/preview