There are other useful Unix commands, and you might want to take a look at Unix for Poets. But for our purpose, we just want to be able to navigate the directory tree and have quick peeks into files. Sometimes a file might be really big, and you don't want to open it in Word ;-)

The "grep" command ("Select-String" in PowerShell) might be quite useful to you, but we will not dwell on it too much. Regular expressions are quite handy but tricky too (see the xkcd comic). Just for the fun of it, I will quickly illustrate what you can do with regular expressions.

"grep" stands for Globally search a Regular Expression and Print. It searches plain-text data for lines matching a regular expression. We can first try regular expressions in an interactive mode: we type "grep" followed by the regular expression we want to look for, then hit enter. We can then type some text (one line) and hit enter again. If the regular expression matches the text, the line is echoed back to us.

It can be useful to add some colors, to see exactly what gets matched. To do this, we add the option "--color=auto".

Using "grep", we can get an idea of how much people laugh in these conversations. Laughter is coded explicitly in the transcriptions as [laughter]. Using "wc", we know how many lines a file contains. If we can count how often [laughter] appears in these lines, then we will have an idea of the proportion of utterances containing laughter.

Let's make sure we are working in the same directory here. We need to use the backslash character to escape the bracket character, which has a special meaning in regular expressions. But here we want its literal meaning as a character. (If we type [xyz] without escaping the brackets, it will match any line that contains the letter x, y or z.)

How many total lines do we have in the "065" directory? Comparing the number of lines containing [laughter] with the total number of lines, roughly 8% of the utterances in the files from the "065" directory contain laughter. Perhaps we want to do this on all the files in all the directories.
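The laughter count above can be sketched as a small shell script. This is a minimal sketch under stated assumptions: the files a.txt and b.txt and their contents are hypothetical toy data, not lines from the real corpus, and the tag is assumed to be written exactly as [laughter]. It contrasts the unescaped character class with the escaped literal, then divides matching lines by total lines.

```shell
#!/bin/sh
# Hypothetical toy data: two tiny "transcription" files in a temporary
# directory, standing in for the real "065" files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'A: hello there\nB: [laughter] yes\nA: ok\n' > a.txt
printf 'B: right\nA: [laughter]\n' > b.txt

# Unescaped, [laughter] is a character class: it matches any line containing
# at least one of the letters l, a, u, g, h, t, e or r.
class_hits=$(cat -- *.txt | grep -c '[laughter]')

# Escaping the opening bracket makes it a literal "[" (a lone "]" is already
# literal), so only lines with the actual tag are counted.
# In a terminal, adding --color=auto would highlight the matched text.
laughs=$(cat -- *.txt | grep -c '\[laughter]')

# Total number of lines (utterances); tr strips the padding some wc versions add.
total=$(cat -- *.txt | wc -l | tr -d ' ')

echo "lines matching the class: $class_hits"
echo "lines with [laughter]:    $laughs"
echo "total lines:              $total"
# Shell arithmetic is integer-only, so the percentage is truncated.
echo "percent:                  $((100 * laughs / total))%"

cd / && rm -rf -- "$dir"
```

On this toy data, the character class matches 4 of the 5 lines, while the escaped pattern finds the 2 lines that really contain the tag, i.e. 40%.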
Now I want to know how many words I have in this whole directory "065", and not only in one particular file. We will use *, which serves as a wildcard character: *.txt means "anything that ends with .txt".

Can we know how many files we have in this "065" directory? We can list the directory content and count the output. We use the vertical bar "|" to connect two commands together, so that the output of one command becomes the input of the next. Two (or more) commands connected in this way form what's called a pipe. So we have 100 files in the "065" directory.

Now count the number of files in the "058" directory.
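A minimal sketch of the wildcard and the pipe, again on a hypothetical toy directory (the file names and contents are made up for illustration). Note that "ls | wc -l" counts every entry in the directory, so if a directory holds anything besides transcriptions, "ls *.txt | wc -l" is the safer variant.

```shell
#!/bin/sh
# Hypothetical toy directory standing in for "065"; names and contents are made up.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'one two three\n' > a.txt
printf 'four five\nsix\n' > b.txt
printf 'seven\n' > c.txt

# The shell expands *.txt to every name ending in .txt, so wc receives all
# files at once and prints one count per file plus a "total" line.
wc -w *.txt

# Only the grand total: concatenate the files and count words in the stream.
words=$(cat -- *.txt | wc -w | tr -d ' ')

# A pipe: the output of ls becomes the input of wc -l, one line per name.
# Note that ls lists every entry, not only .txt files.
files=$(ls | wc -l | tr -d ' ')

echo "total words: $words"
echo "files:       $files"

cd / && rm -rf -- "$dir"
```

Here the toy directory has 3 files and 7 words in total; on the real data, the same pipe is what reports the 100 files.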