One of the first things we’ll explore in this class is git, GitHub, and GitHub Pages. GitHub Pages is by no means required for hosting your projects, but it’s free, fast, and lives on GitHub, so it is worth considering, at least during the homework and experimentation stage of a project. Here are some resources for getting started.
The framework we’ll be exploring is p5.js. Here are some links to get you started.
The key functions and topics I will discuss in class are:
changed(), and more? What about
selectAll(), id vs class vs tag
style() (and when to use a CSS file).
style(), source code
p5.Element and the native JS
loadStrings(). It loads a local file (accessed by its relative path to the html file). The simplest way to get the data is to use
preload() which guarantees that the data is read before
Note the naming of the variable
lines. One of the odd nuances of
loadStrings() is that it loads all the text into an array, with each “line” of text as a separate element of the array, i.e. the text file:
comes in as:
This is convenient in many cases, but for us right now, we just want all the text as one giant string. Therefore, we can use the
join() function to put it back together. This can’t happen until
setup() however, since the data isn’t guaranteed to be loaded until then.
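Putting these pieces together, a minimal sketch might look like the following. The file name data.txt and the variable name txt are placeholders of my own; everything else is a standard p5.js function.

```javascript
// p5.js sketch: load a text file in preload(), then join it in setup().
// 'data.txt' is a placeholder name for a file sitting next to index.html.
let lines; // array of strings, one per line of the file
let txt;   // the whole file as one string

function preload() {
  // p5 waits for this load to finish before calling setup()
  lines = loadStrings('data.txt');
}

function setup() {
  noCanvas();
  // Put the lines back together, restoring the line breaks
  txt = join(lines, '\n');
  createP(txt);
}
```

Joining with '\n' keeps the original line breaks; joining with a space would flatten the text into a single line instead.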
Loading data works in a similar way. There is a moment where the code asks to load the file, and then an event that follows later when the data is actually loaded (there could also be an “error” event if there is a problem with the file.)
loadStrings() works exactly this way when you pass it two arguments: the name of the file, and a function that will be executed when the data from the file is ready (the callback).
loadStrings() but this will make the code a bit harder to follow. Let’s take a look at the
The function takes a single argument: lines. lines contains all of the text from the file in an array of strings (unless there was an error).
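Here is a rough sketch of the callback approach. The callback name fileLoaded() and the file name data.txt are placeholders I picked for illustration.

```javascript
// p5.js sketch: load a file with a callback instead of preload().
// 'data.txt' and fileLoaded() are placeholder names.
let txt = '';

function setup() {
  noCanvas();
  // Ask for the file; fileLoaded() runs later, once the data is ready
  loadStrings('data.txt', fileLoaded);
}

// Callback: receives the contents of the file as an array of strings
function fileLoaded(lines) {
  txt = lines.join('\n');
  createP(txt);
}
```

Because the text only exists once fileLoaded() has run, anything that depends on it belongs inside the callback (or has to be triggered from it).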
The next way of getting text in I want to examine is with a user-selected file. This can be accomplished in one of two ways: a select file button (as below) or a “drop zone” (an area of the page onto which a user can drag and drop a file).
The choose file button can be generated fairly easily with p5.js using
createFileInput() requires only a single argument, a callback for when the file(s) are loaded. A second argument
'multiple' is optional if you want to allow the user to select multiple files. In the case of multiple files, the callback is triggered once for each file.
The argument passed to the
gotFile() callback is a
p5.File object. It contains metadata about the file such as its name, type, and size, as well as the actual contents of the file, its “data.” All of these are available as properties of the
p5.File object and accessible with dot syntax. The following more fleshed out version of the callback creates DOM elements displaying the metadata and contents. Note how a different action can be performed depending on the file’s type.
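As a sketch of what such a callback could look like, the version below shows the metadata and then branches on the file’s type. The wording of the messages is my own, and I am assuming only text and image files need special handling.

```javascript
// p5.js sketch: a "choose file" button whose callback reports on each file.
function setup() {
  noCanvas();
  // 'multiple' lets the user pick several files; gotFile() runs once per file
  createFileInput(gotFile, 'multiple');
}

function gotFile(file) {
  // Metadata is available as properties of the p5.File object
  createP(file.name + ' (' + file.type + ', ' + file.size + ' bytes)');

  if (file.type === 'text') {
    // For text files, file.data holds the contents as a string
    createP(file.data);
  } else if (file.type === 'image') {
    // For images, file.data is a data URL that can be used as an image source
    createImg(file.data, file.name);
  } else {
    createP('This sketch only displays text and image files.');
  }
}
```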
Another, often more convenient, way to accept files from a user is to allow the user to “drag and drop” files in the page itself. To do this, you first need to create and style a div that will act as the “drop zone”. For example:
There’s nothing particularly special about the CSS for the above drop zone, just some padding and a dotted line.
This could all be generated in p5 using
drop_zone, the p5
select() function can be used to grab the DOM element.
Note the use of the hash sign
# to indicate DOM element id. Once you have a DOM element to act as a drop zone there are three events you can handle —
dragLeave() are just like
mouseOut(), only instead of just hovering over the element, the events are triggered only if the user is dragging a file over the element. This can be useful for giving the user some feedback as to what is going on:
The event we care most about is
drop(). This event requires two callbacks: one to handle the moment the user drops the file(s), and one that is triggered when each file is loaded and ready to be accessed. In the code below, the arguments are in the reverse order: first the callback for handling the files, and second the callback for the moment of drop.
Note how I am re-using the exact same
gotFile() function that we had with the “choose files” button.
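To tie the three events together, here is a compact sketch. It assumes a div with the id drop_zone already exists in the HTML; highlight() and unhighlight() are placeholder names, the colors are arbitrary, and gotFile() is a shortened stand-in for the fuller callback described earlier.

```javascript
// p5.js sketch: turn an existing <div id="drop_zone"> into a file drop zone.
// highlight(), unhighlight() and the colors are placeholder choices.
let dropZone;

function setup() {
  noCanvas();
  dropZone = select('#drop_zone');
  dropZone.dragOver(highlight);        // a file is being dragged over the div
  dropZone.dragLeave(unhighlight);     // the dragged file has left the div
  dropZone.drop(gotFile, unhighlight); // (per-file callback, moment-of-drop callback)
}

function highlight() {
  dropZone.style('background-color', '#AAA');
}

function unhighlight() {
  dropZone.style('background-color', '#FFF');
}

// Shortened stand-in for the file callback
function gotFile(file) {
  createP('Received ' + file.name + ' (' + file.size + ' bytes)');
}
```

Passing unhighlight() as the second argument to drop() resets the feedback as soon as the user lets go of the file.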
If you want to get a large body of text typed in by a user,
createInput() isn’t a great choice. It’s meant more for just a couple of words or a single sentence:
Type a sentence:
For a larger body of text the
<textarea> element can be generated using
size() function can be used to adjust the area’s default size.
It’s also possible to simply use a
p element and assign the attribute
contenteditable. This makes any DOM element editable by the user (and you can then capture the content of that element with the
html() function.) For example:
Note how you can edit this text below:
this will be editable
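Both approaches can be sketched side by side. I am assuming here that the textarea is made with createElement(); the submit button, the report() callback, and the placeholder text are my own additions so there is a moment at which to read the text back.

```javascript
// p5.js sketch: two ways to collect a longer body of text from the user.
let area;     // a <textarea> element
let editable; // a <p> made editable with the contenteditable attribute

function setup() {
  noCanvas();
  area = createElement('textarea', 'Type a paragraph here.');
  area.size(400, 200); // width and height in pixels

  editable = createP('this will be editable');
  editable.attribute('contenteditable', 'true');

  const button = createButton('submit');
  button.mousePressed(report);
}

function report() {
  // value() reads form elements; html() reads the content of other elements
  createP('textarea: ' + area.value());
  createP('paragraph: ' + editable.html());
}
```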
As always, these elements can also be written into the HTML directly and accessed in p5 with
I should note that almost everything I am doing this week could be improved or expanded with regular expressions, but I am explicitly saving that as a topic for next week.
A String, at its core, is really just a fancy way of storing an array of characters. With the String object, we might find ourselves writing code like.
substring(), and the
indexOf() locates a sequence of characters within a string. For example, run this code and examine the result:
indexOf() returns a
0 for the first character, and a
-1 if the search phrase is not part of the String.
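As a quick illustration of those return values, here is a small example in plain JavaScript. The sentence and variable names are just ones I made up.

```javascript
const sentence = 'The quick brown fox jumps over the lazy dog.';

const atStart = sentence.indexOf('The');    // 0: the match starts at the first character
const inMiddle = sentence.indexOf('quick'); // 4: the index where "quick" begins
const missing = sentence.indexOf('cat');    // -1: "cat" is not in the string

console.log(atStart, inMiddle, missing);
```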
After you find a certain search phrase within a string, you might want to pull out part of the string and save it in a different variable. This is called a “substring” and you can use JavaScript’s
substring() function to take care of this task. Examine and run the following code:
Note that the substring begins at the specified beginning index (the first argument) and extends to the character at the end index (the second argument) minus one. Thus the length of the substring is end index minus beginning index.
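To see the “end index minus one” rule in action, here is a short example with a made-up sentence, using indexOf() to find both indices.

```javascript
const phrase = 'The quick brown fox jumps over the lazy dog.';

const start = phrase.indexOf('quick'); // 4
const end = phrase.indexOf('fox');     // 16

// Characters 4 through 15 are kept; the character at index 16 is not
const part = phrase.substring(start, end); // 'quick brown '

console.log('[' + part + ']', part.length); // length is end - start, i.e. 12
```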
At any given point, you might also want to access the length of the string. This is accomplished with the length property.
It’s also important to note that you can concatenate (i.e. join) a string together using the
+ operator. With numbers plus means add, with strings (or characters), it means concatenate, i.e.
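A few lines in plain JavaScript show both the length property and the two meanings of +. The values are arbitrary examples of mine.

```javascript
const word = 'rainbow';
const wordLength = word.length; // 7 (a property, so no parentheses)

const sum = 1 + 2;                          // 3: numbers are added
const digits = '1' + '2';                   // '12': strings are concatenated
const greeting = 'Hello' + ', ' + 'world';  // 'Hello, world'
const mixed = 'Total: ' + 5;                // 'Total: 5': the number is converted to a string

console.log(wordLength, sum, digits, greeting, mixed);
```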
One string-related function that will prove very useful in our text analysis programs is split().
split() separates a group of strings embedded into a longer string into an array of strings.
Now the built-in
Examine the following code:
To perform the reverse of split, the p5 function
join() is used.
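Here is a small round trip, written with the native JavaScript methods rather than the p5 functions (p5’s split() and join() do the same job, taking the string or array as their first argument). The sample strings are my own.

```javascript
// Split a sentence into words wherever there is a space
const spaced = 'The quick brown fox';
const words = spaced.split(' '); // ['The', 'quick', 'brown', 'fox']

// Split comma-separated values, then convert each piece to a number
const csv = '5,10,15';
const numbers = csv.split(',').map(Number); // [5, 10, 15]

// join() reverses split(): the array goes back to a single string
const rejoined = words.join(' '); // 'The quick brown fox'

console.log(words, numbers, rejoined);
```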
I’ll end this week by looking at a basic example of text analysis. I’ll read in a file, examine some of its statistical properties, and display a report. The example will compute the Flesch Index (aka Flesch-Kincaid Reading Ease test), a numeric score that indicates the readability of a text. The lower the score, the more difficult the text. The higher, the easier. For example, texts with a score of 90-100 are, say, around the 5th grade level, whereas 0-30 would be for “college graduates”.
The Flesch Index is computed as a function of total words, total sentences, and total syllables. It was developed by Dr. Rudolf Flesch and modified by J. P. Kincaid (thus the joint name). Most word processing programs will compute the Flesch Index for you, which provides us with a nice method to check our results.
Flesch Index = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
The pseudo-code looks something like this:
The examples above on this page demonstrate how to read in text from a file and store it in a String object. Now, all I have to do is examine that string, count the total words, sentences, and syllables, and apply the formula as a final step.
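The final step is easy to sketch on its own. The function below uses the standard published form of the Flesch Reading Ease formula, in which both the words-per-sentence term and the syllables-per-word term are subtracted; the function and parameter names are mine.

```javascript
// Flesch Reading Ease from three totals (names are illustrative)
function fleschIndex(totalWords, totalSentences, totalSyllables) {
  return 206.835
    - 1.015 * (totalWords / totalSentences)
    - 84.6 * (totalSyllables / totalWords);
}

// 100 words in 5 sentences with 150 syllables:
// 206.835 - 1.015 * 20 - 84.6 * 1.5, which is roughly 59.6
const score = fleschIndex(100, 5, 150);
console.log(score.toFixed(1));
```

Longer sentences and longer words both pull the score down, which matches the idea that a lower score means a harder text.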
The first thing I’ll do is count the number of words in the text. We’ve seen in some of the examples above that we can accomplish this by using
split() to split a String up into an array wherever there is a space. For this example, however, we are going to want to split by more than a space. A new word occurs whenever there is a space or some sort of punctuation.
Note again how
splitTokens() will split using any of the listed characters as a delimiter. Next week, I will cover how to use regular expressions to split text.
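To make the behavior concrete without relying on p5, here is a hand-rolled stand-in that does roughly what splitTokens() does: every character in the delimiter string ends the current token, and empty tokens are thrown away. The function name tokenize() and the particular delimiter list are my own choices.

```javascript
// Plain-JavaScript stand-in for splitting on any of several delimiters
function tokenize(text, delimiters) {
  const tokens = [];
  let current = '';
  for (const ch of text) {
    if (delimiters.includes(ch)) {
      // A delimiter ends the current token (if there is one)
      if (current.length > 0) tokens.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current.length > 0) tokens.push(current); // last token, if any
  return tokens;
}

const wordDelimiters = ' .,:;?!\n';
const tokens = tokenize('Hello, world! How are you?', wordDelimiters);
console.log(tokens);        // ['Hello', 'world', 'How', 'are', 'you']
console.log(tokens.length); // 5 words
```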
Now that I have split up the text, I can march through all the words (tokens) and count their syllables.
Syllables = total # of vowels in a word
(not counting vowels that appear after another vowel and when ‘e’ is found at the end of the word)
The code looks like this:
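A minimal sketch of that rule in plain JavaScript follows. The function name countSyllables(), the decision to treat y as a vowel, and the floor of one syllable per word are my assumptions, not part of the rule as stated.

```javascript
// Rough syllable count: vowels, skipping vowels that follow another vowel
// and skipping an 'e' at the very end of the word.
function countSyllables(word) {
  const vowels = 'aeiouy'; // assumption: y counts as a vowel
  const w = word.toLowerCase();
  let count = 0;
  let previousWasVowel = false;

  for (let i = 0; i < w.length; i++) {
    const isVowel = vowels.includes(w[i]);
    const isFinalE = w[i] === 'e' && i === w.length - 1;
    if (isVowel && !previousWasVowel && !isFinalE) {
      count++;
    }
    previousWasVowel = isVowel;
  }

  // Assumption: every word has at least one syllable ("the" would otherwise be 0)
  return Math.max(count, 1);
}

console.log(countSyllables('cake'));        // 1
console.log(countSyllables('banana'));      // 3
console.log(countSyllables('readability')); // 5
```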
Again as you will see next week, the above could be vastly improved using Regular Expressions, but it’s nice as an exercise to learn how to do all the string manipulation manually before you move on to more advanced techniques.
Counting sentences is a bit simpler. I’ll just split the content using periods, question marks, exclamation points, etc. (“.:;?!”) as delimiters and count the total number of elements in the resulting array. This isn’t terribly accurate; for example, “My e-mail address is email@example.com.” will be counted as more than one sentence, because the period inside the address is treated as a sentence break. Nevertheless, as a first pass, this will do.
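The same first-pass idea can be sketched in plain JavaScript. The function name countSentences() is mine, and unlike a straight split this version also ignores pieces that contain nothing but whitespace.

```javascript
// First-pass sentence count: each run of text ended by . : ; ? or ! is one sentence
function countSentences(text) {
  const delimiters = '.:;?!';
  let count = 0;
  let hasContent = false; // have we seen a non-space character since the last delimiter?

  for (const ch of text) {
    if (delimiters.includes(ch)) {
      if (hasContent) count++;
      hasContent = false;
    } else if (ch.trim() !== '') {
      hasContent = true;
    }
  }
  if (hasContent) count++; // trailing text with no closing punctuation
  return count;
}

console.log(countSentences('Hello there. How are you? Fine!')); // 3
console.log(countSentences('Dr. Smith arrived.'));              // 2, although it is one sentence
```

The second call shows the same kind of over-counting with an abbreviation: the period after “Dr” is taken as the end of a sentence.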
Now, all we need to do is apply the formula, generate a report as a string (which can be inserted into a DOM element using
In class, we’ll do an exercise around mashing up text manually. Here are links to further reading and information about the techniques we discussed, as well as online versions of the algorithms. For your homework you can choose to work with one of these methods manually or programmatically.