Designed to support the creation of new works of computational literature, the RiTa library provides tools for artists and writers working with natural language in programmable media.
The RiTa library has numerous features for text analysis and generation. For example, it has built-in support for generating text with algorithms and systems (Markov chains, context-free grammars) that I’ll cover in later tutorials.
For now I want to look at two features: the RiString object and the lexicon.
RiString allows you to analyze a piece of text. You can tokenize it, count syllables, determine parts of speech, and so on. One particularly useful function is features(), which returns an object describing the sentence, including its phonemes, syllables, and stresses.
Note that the part-of-speech tags come from the Penn Treebank Project.
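Penn Treebank tags are terse abbreviations, so a small lookup table can make part-of-speech output easier to read. The sketch below covers only a handful of the tags, and describeTag() is a hypothetical helper, not part of RiTa; it lowercases its input on the assumption that tags may arrive in either case.

```javascript
// A few Penn Treebank part-of-speech tags and their meanings
// (a small subset of the full tag set).
const PENN_TAGS = {
  nn: "noun, singular or mass",
  nns: "noun, plural",
  nnp: "proper noun, singular",
  vb: "verb, base form",
  vbd: "verb, past tense",
  vbg: "verb, gerund or present participle",
  jj: "adjective",
  rb: "adverb",
  dt: "determiner",
  in: "preposition or subordinating conjunction",
};

// Hypothetical helper: look up a tag regardless of its case.
function describeTag(tag) {
  return PENN_TAGS[tag.toLowerCase()] || "unknown tag";
}

console.log(describeTag("NNS")); // noun, plural
console.log(describeTag("vbd")); // verb, past tense
```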
RiTaJS also has a lexicon built into it. A lexicon is another word for “vocabulary” and operates like a machine-readable dictionary. The RiTa lexicon contains about 40,000 words along with associated spelling and phonemic data, and the library provides many hooks into it. For example, you can ask it for random words of a given part of speech or with a certain syllable count. It can also provide rhyming words and words that “sound similar.” To use the lexicon, you simply need to create a lexicon object.
Once you have the object you can query it anywhere in your code.
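To make the idea of a machine-readable dictionary concrete, here is a toy lexicon in plain JavaScript. It is not RiTa's API or data: the entries, the syllables field, and the wordsMatching() and randomWord() helpers are hypothetical, just enough to show how a lexicon that stores part-of-speech and syllable information can answer a query like "a random two-syllable noun."

```javascript
// A tiny, hypothetical lexicon: each entry stores a word, a Penn Treebank
// part-of-speech tag, and a syllable count. RiTa's real lexicon is far
// larger and stores phonemic data rather than a bare count.
const lexicon = [
  { word: "apple", pos: "nn", syllables: 2 },
  { word: "river", pos: "nn", syllables: 2 },
  { word: "cat", pos: "nn", syllables: 1 },
  { word: "wander", pos: "vb", syllables: 2 },
  { word: "quiet", pos: "jj", syllables: 2 },
];

// Return every word with the given part of speech and syllable count.
function wordsMatching(pos, syllables) {
  return lexicon
    .filter((entry) => entry.pos === pos && entry.syllables === syllables)
    .map((entry) => entry.word);
}

// Pick one matching word at random, or null if nothing matches.
function randomWord(pos, syllables) {
  const matches = wordsMatching(pos, syllables);
  if (matches.length === 0) return null;
  return matches[Math.floor(Math.random() * matches.length)];
}

console.log(wordsMatching("nn", 2)); // [ 'apple', 'river' ]
console.log(randomWord("nn", 2));    // either "apple" or "river"
```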
Once the NLP-Compromise library is loaded, you can create a variable on which to call all of its functions.
NLP-Compromise works by allowing you to chain together a series of functions that build and/or adjust a block of text. For example, if you want to work with a noun you would say:
But typically you’ll see these actions chained together:
NLP-Compromise can negate statements, conjugate verbs (and thereby alter tense), provide articles and pronouns, and more.
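Chaining works because each method returns an object you can call the next method on. The sketch below shows that pattern with a hypothetical Phrase class; none of its methods belong to NLP-Compromise, and it only builds and adjusts text rather than doing any real language analysis.

```javascript
// A hypothetical chainable text builder (not NLP-Compromise's API).
// Each method modifies the phrase and returns `this`, so calls can be chained.
class Phrase {
  constructor(text = "") {
    this.words = text ? text.split(" ") : [];
  }

  // Append a word to the end of the phrase.
  add(word) {
    this.words.push(word);
    return this;
  }

  // Capitalize the first letter of the first word.
  capitalize() {
    if (this.words.length > 0) {
      const first = this.words[0];
      this.words[0] = first.charAt(0).toUpperCase() + first.slice(1);
    }
    return this;
  }

  // Attach punctuation to the last word.
  punctuate(mark) {
    if (this.words.length > 0) {
      this.words[this.words.length - 1] += mark;
    }
    return this;
  }

  toString() {
    return this.words.join(" ");
  }
}

// Step by step...
const p = new Phrase("the cat");
p.add("sleeps");
p.capitalize();
console.log(p.toString()); // The cat sleeps

// ...or chained together in a single expression.
const chained = new Phrase("the dog").add("barks").capitalize().punctuate("!");
console.log(chained.toString()); // The dog barks!
```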
Data can come from many different places: websites, news feeds, spreadsheets, databases, and so on. Let's say you've decided to make a map of the world's flowers. After searching online you might find a PDF version of a flower encyclopedia, or a spreadsheet of flower genera, or a JSON feed of flower data, or a REST API that provides geolocated lat/lon coordinates, or some web page someone put together with beautiful flower photos, and so on and so forth. The question inevitably arises: “I found all this data; which should I use, and how do I get it?”
In this case, someone else has done all the work for you. They've gathered data about flowers and built a library with a set of functions that hands you the data in an easy-to-understand format. This library, sadly, does not exist (not yet), but there are some that do.
Let's take another scenario. Say you’re looking to build a visualization of Major League Baseball statistics. You can't find a library to give you the data but you do see everything you’re looking for at mlb.com. If the data is online and your web browser can show it, shouldn't you be able to get the data? Passing data from one application (like a web application) to another is something that comes up again and again in software engineering. A means for doing this is an API or “application programming interface”: a means by which two computer programs can talk to each other. Now that you know this, you might decide to search online for “MLB API”. Unfortunately, mlb.com does not provide its data via an API. In this case you would have to load the raw source of the website itself and manually search for the data you’re looking for. While possible, this solution is much less desirable given the considerable time required to read through the HTML source as well as program algorithms for parsing it.
This is how it might look if you typed it into your code directly (quotes around the property names are no longer necessary, as long as each name is a valid JavaScript identifier).
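One way to see the relationship between JSON text and a JavaScript object is to parse one into the other. The data below is made up for illustration. JSON.parse() turns a JSON string, in which every property name must be double-quoted, into an ordinary object; the same object can be written as a literal with unquoted names, since name, age, and pet are valid identifiers.

```javascript
// JSON text: a string in which property names must be in double quotes.
const jsonText = '{ "name": "Ada", "age": 3, "pet": true }';

// Parsing the text produces an ordinary JavaScript object.
const parsed = JSON.parse(jsonText);

// The same (hypothetical) data typed directly into code as an object
// literal; quotes around these property names are optional.
const literal = { name: "Ada", age: 3, pet: true };

console.log(parsed.name === literal.name); // true
console.log(JSON.stringify(parsed) === JSON.stringify(literal)); // true
```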
An object can contain, as part of itself, another object. Below, the value of “brother” is an object containing two name/value pairs.
For comparison with a data format like XML, the preceding JSON data would look like the following (for simplicity I'm avoiding the use of XML attributes).
You might also find an array as part of an object. Below, the value of “favorite colors” is an array of strings.
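Putting both ideas together, here is a hypothetical person object whose "brother" value is an object and whose "favorite colors" value is an array. Because "favorite colors" contains a space, that key has to stay quoted and must be read with bracket notation rather than a dot.

```javascript
// Hypothetical data: an object containing another object and an array.
const person = {
  name: "Ada",
  brother: {
    name: "Charles",
    age: 6,
  },
  "favorite colors": ["purple", "green", "gold"],
};

// Dot notation reaches into the nested object.
console.log(person.brother.name); // Charles

// A key containing a space needs quotes and bracket notation.
const colors = person["favorite colors"];
for (let i = 0; i < colors.length; i++) {
  console.log(colors[i]);
}
```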
A great place to find a selection of JSON data sources to play with is corpora, a GitHub repository maintained by Darius Kazemi. For example, here’s a JSON file containing information about birds in Antarctica.
loadJSON() can be called in preload() or used with a callback. I'm using callbacks in just about all my examples, so let's follow that syntax here.
The data from the JSON file is passed into the data argument of the gotData callback. Then it becomes a bit of detective work. How is the data structured: a single object? An array of objects? An object full of arrays of objects? Let’s look at a snippet from the birds of Antarctica file.
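Part of that detective work can be done in code. In the sketch below, the hypothetical kind() helper distinguishes arrays from other objects (typeof alone reports "object" for both), and gotData() returns a summary of each top-level key's type. The sample object is a trimmed, illustrative stand-in shaped the way this tutorial describes the birds file, not its actual contents; the "family" key and the example.com URL are assumptions. In a p5 sketch the object would arrive from loadJSON() instead.

```javascript
// Report the type of a value: "array", "null", or what typeof says.
function kind(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// The callback: summarize each top-level key of the loaded data and its type.
function gotData(data) {
  const summary = {};
  for (const key of Object.keys(data)) {
    summary[key] = kind(data[key]);
  }
  return summary;
}

// Illustrative stand-in for the loaded file ("family" is an assumed key name).
const sample = {
  description: "Birds of Antarctica",
  source: "https://example.com/birds",
  birds: [{ family: "Penguins", members: ["Emperor penguin", "Adelie penguin"] }],
};

console.log(gotData(sample));
// { description: 'string', source: 'string', birds: 'array' }
console.log(kind(sample.birds[0]), kind(sample.birds[0].members)); // object array
```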
If the JSON file is loaded into the variable data, the way you access that data is no different than if you had written:
For example, if you wanted to display the description and link it to the source you would say:
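In a p5 sketch that uses the p5.dom functions, createA(data.source, data.description) would put such a link on the page. Outside the browser, the same idea can be sketched as a function that builds the anchor's HTML as a string, escaping characters that would otherwise be read as markup. descriptionLink(), escapeHTML(), and the sample values are hypothetical.

```javascript
// Escape characters that would otherwise be read as HTML markup.
function escapeHTML(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a link whose text is the description and whose target is the source.
function descriptionLink(data) {
  return `<a href="${escapeHTML(data.source)}">${escapeHTML(data.description)}</a>`;
}

// Illustrative values, not the real file's contents.
const sample = { description: "Birds of Antarctica", source: "https://example.com/birds" };
console.log(descriptionLink(sample));
// <a href="https://example.com/birds">Birds of Antarctica</a>
```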
Since birds is an array of objects, you can use a for loop just as you always do with arrays. Each element of the array is itself an object, with properties that can be accessed, such as members (which is also an array!).
Here’s what this looks like:
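The nested structure calls for a nested loop: an outer loop over the array of bird objects and an inner loop over each object's members array. This is a minimal sketch with illustrative data in the shape described above (the "family" key is an assumption); listBirds() is hypothetical and collects lines into an array, where a p5 sketch would more likely create a DOM element (for example with createP()) for each one.

```javascript
// Illustrative stand-in for the loaded data ("family" is an assumed key name).
const data = {
  birds: [
    { family: "Penguins", members: ["Emperor penguin", "Adelie penguin"] },
    { family: "Petrels", members: ["Snow petrel"] },
  ],
};

// Walk the outer array of family objects, then each family's members array.
function listBirds(data) {
  const lines = [];
  const birds = data.birds;
  for (let i = 0; i < birds.length; i++) {
    lines.push(birds[i].family);
    const members = birds[i].members;
    for (let j = 0; j < members.length; j++) {
      lines.push("  " + members[j]);
    }
  }
  return lines;
}

console.log(listBirds(data).join("\n"));
// Penguins
//   Emperor penguin
//   Adelie penguin
// Petrels
//   Snow petrel
```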
What makes something an API versus just some data you found, and what are some pitfalls you might run into when using an API?
An API (Application Programming Interface) is an interface through which one application can access the services of another. These can come in many forms. Openweathermap.org is an API that offers its data in JSON, XML, and HTML formats. The key element that makes this service an API is exactly that offer; openweathermap.org's sole purpose in life is to offer you its data. And not just offer it, but allow you to query it for specific data in a specific format. Let's look at a short list of sample queries.
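Queries like these are just URLs with parameters appended. A minimal sketch, assuming OpenWeatherMap's current-weather endpoint and its q (city), units, and mode (response format, e.g. xml or html instead of the default JSON) parameters as described in its public documentation; weatherQuery() is a hypothetical helper, and the service may also expect an appid key, so check the current docs before relying on it. URLSearchParams takes care of joining the pieces with ? and & and of escaping (it encodes a space as +).

```javascript
// Assumed endpoint for OpenWeatherMap's current-weather data (verify in the docs).
const WEATHER_URL = "http://api.openweathermap.org/data/2.5/weather";

// Hypothetical helper: assemble a query URL from a city, units, and format.
function weatherQuery(city, units, mode) {
  const params = new URLSearchParams({ q: city, units: units, mode: mode });
  return WEATHER_URL + "?" + params.toString();
}

console.log(weatherQuery("London", "metric", "json"));
// http://api.openweathermap.org/data/2.5/weather?q=London&units=metric&mode=json
console.log(weatherQuery("New York", "imperial", "xml"));
// http://api.openweathermap.org/data/2.5/weather?q=New+York&units=imperial&mode=xml
```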
One thing to note about openweathermap.org is that, when this was written, it did not require you to tell the API any information about yourself: you simply sent a request to a URL and got the data back. (The service has since started requiring an API key.) Other APIs, however, require you to sign up and obtain an access token. The New York Times API is one such example. Before you can make a request, you'll need to visit The New York Times Developer site and request an API key. Once you have that key, you can store it in your code as a string.
You also need to know what the URL is for the API itself. This information is documented for you on the developer site, but here it is for simplicity:
For example, inside a search() function you might say:
This isn't just guesswork. Figuring out how to put together a query string requires reading through the API's documentation. For The New York Times, it’s all outlined on the Times' developer website. Once you have your query, you can join all the pieces together and pass the result to loadJSON(). Here is a tiny example that simply displays the most recent headline.
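The pieces described above can also be sketched in plain JavaScript. This assumes the Article Search endpoint https://api.nytimes.com/svc/search/v2/articlesearch.json with its q and api-key parameters, and a response whose articles sit in response.docs, each with a headline.main string; verify both against the developer site. searchURL() and firstHeadline() are hypothetical helpers, and YOUR_API_KEY is a placeholder. In a p5 sketch you would pass searchURL(term) to loadJSON() along with a callback that calls firstHeadline(data).

```javascript
// Placeholder: replace with the key from The New York Times Developer site.
const apiKey = "YOUR_API_KEY";

// Assumed Article Search endpoint (verify against the current documentation).
const url = "https://api.nytimes.com/svc/search/v2/articlesearch.json";

// Join the base URL, the encoded search term, and the key into one request URL.
function searchURL(term) {
  const query = "?q=" + encodeURIComponent(term) + "&api-key=" + apiKey;
  return url + query;
}

// Assumed response shape: { response: { docs: [ { headline: { main } } ] } }
function firstHeadline(data) {
  const docs = data.response.docs;
  return docs.length > 0 ? docs[0].headline.main : null;
}

console.log(searchURL("climate change"));
// https://api.nytimes.com/svc/search/v2/articlesearch.json?q=climate%20change&api-key=YOUR_API_KEY
```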
Some APIs require a deeper level of authentication beyond an API access key. Twitter, for example, uses an authentication protocol known as “OAuth” to provide access to its data. Writing an OAuth application requires more than just passing a string into a request. There are some examples this week that use server-side programming in Node to perform the authentication.
Certain characters are invalid in URLs. For example, let’s say you were querying Wordnik for the words “bath towel”. You would have to send bath%20towel instead. You could do this yourself with a regex, or use URI encoding with encodeURI().
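Both approaches can be compared side by side. The regex version only handles spaces, while encodeURI() percent-encodes any character that isn't allowed in a URL, including non-ASCII characters (as their UTF-8 bytes).

```javascript
const words = "bath towel";

// Option 1: replace spaces yourself with a regular expression.
const byRegex = words.replace(/ /g, "%20");

// Option 2: let encodeURI() percent-encode the characters for you.
const byEncodeURI = encodeURI(words);

console.log(byRegex);     // bath%20towel
console.log(byEncodeURI); // bath%20towel

// encodeURI() also handles characters the regex above ignores, such as "é".
console.log(encodeURI("café")); // caf%C3%A9
```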
encodeURI() does not encode the reserved characters ; , / ? : @ & = + $ #. This is as it should be, since these characters have special meanings in URLs. However, if you want a $ or / to be part of some text you are passing as the value in a key/value pair, you need to encode these characters. For this, use encodeURIComponent().
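The difference shows up as soon as a search term contains reserved characters. In the sketch below (the term and the parameter name q are only illustrations), encodeURI() leaves the & in place, so a server would read it as the start of a new parameter, while encodeURIComponent() escapes it along with the $ and /.

```javascript
const term = "AT&T $5/day";

// encodeURI() leaves & $ / alone, so the & would split the query in two.
console.log(encodeURI(term));          // AT&T%20$5/day

// encodeURIComponent() escapes them, so the whole term stays one value.
console.log(encodeURIComponent(term)); // AT%26T%20%245%2Fday

// A hypothetical query string for a search parameter named "q".
const query = "?q=" + encodeURIComponent(term);
console.log(query); // ?q=AT%26T%20%245%2Fday

// decodeURIComponent() reverses the encoding.
console.log(decodeURIComponent(encodeURIComponent(term)) === term); // true
```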