05-3

H79.2778 - Reading and Writing Electronic Text | W4-Barney is dead

I with you
You mood me
death a drawn sleep,
than a death big hug
and a than with me to some
least you say you have me too




I which you
You which me
nearly a making persist
areas a either big hug
and a “down “down me to death
sleep, you say you every me too




being when
from every
having weight amount,
sleep person loss mood
when Major other loss
nearly lose when with abuse Many




I being they
with twice can
plays, a think, colitis
gains a person lose when
when a noted being may and they
abate. that with make prior for with




Program does: My program this week is based on alpha_replace. It attempts to exploit the absurd innocence of Barney by crossing his song with a dictionary built out of the DSM-IV section on major depression.

Program gets: input from an OCR version of the DSM chapter, which it breaks into dictionary entries keyed by the number of letters in each word; each key is assigned a list of the words of that length. The program also reads in a second file, the original song, from standard input.
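As a rough illustration of the structure described above, here is a minimal sketch that groups the words of a short sample string by length. The sample text and the function name `build_length_index` are made up for this example; the real program reads the OCR'd DSM chapter from a file.

```python
from collections import defaultdict

def build_length_index(text, min_len=3):
    """Group the words of `text` by length (hypothetical helper, for illustration only)."""
    index = defaultdict(list)
    for word in text.split():
        if len(word) >= min_len:          # skip very short words
            index[len(word)].append(word)
    return dict(index)

# made-up sample, not the actual DSM text
sample = "loss of sleep and weight nearly every day"
index = build_length_index(sample)
print(index)
```

In this sample the key 5 ends up holding the list ['sleep', 'every'], and the two-letter word "of" is left out altogether.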

Program outputs: a morbid ditty, made by replacing each word of the original song that is more than 2 characters long with a word from the dictionary (the word taken from the dictionary is one character longer than the source word). Filtering is performed on the values in the dictionary to remove frequently used words such as “Major”. Due to the capricious nature of the OCR translation, further filtering is needed to remove the OCR’s faulty output, as well as other characters and section references in the text that impair the poetic nature of the outcome; regular expressions would probably help here…
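One way the regular-expression filtering mentioned above might look. This is only a sketch, assuming that a usable word consists of ASCII letters (optionally joined by an apostrophe or hyphen) and that any token containing a digit is OCR or section-number debris. The patterns, the function name `clean_tokens`, and the sample line are assumptions for illustration, not part of the program below.

```python
import re

EDGE_PUNCT = re.compile(r"^[^A-Za-z]+|[^A-Za-z]+$")         # non-letters at either end of a token
WORD_ONLY = re.compile(r"^[A-Za-z]+(?:['-][A-Za-z]+)*$")    # letters, optionally joined by ' or -
HAS_DIGIT = re.compile(r"\d")

def clean_tokens(line):
    """Return the word-like tokens of a line, dropping stray punctuation and codes."""
    tokens = []
    for raw in line.split():
        if HAS_DIGIT.search(raw):         # assumption: digits mean a code or page reference
            continue
        token = EDGE_PUNCT.sub("", raw)   # trim quotes, commas, brackets from the edges
        if WORD_ONLY.match(token):
            tokens.append(token)
    return tokens

# made-up OCR-style line
sample = "sleep, (nearly) “down every 296.2x day."
print(clean_tokens(sample))
```

Cleaning each token before it goes into the dictionary would keep fragments such as “down or sleep, from leaking into the poems with their punctuation attached.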

What it does not do: manage to bring out enough of the Barney-esque in the end result, so it does not really achieve its goal. That would take more precision in the selection process at the string level, rather than just a random choice. In fact, most of the poems this code generates are useless at the moment; hopefully I’ll manage to improve on that in the future. The biggest challenge, however, was the attempt to exclude specific strings from the list of each key’s values. That was a partial success, since I didn’t manage to exclude substrings, only full words within the list.
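The substring exclusion described above could be sketched roughly as follows. It assumes the dictionary has the shape the program builds (a word length mapped to a list of words); the function name `drop_matching` and the sample data are hypothetical.

```python
def drop_matching(index, ignore):
    """Return a copy of `index` without any word that contains an ignored substring."""
    cleaned = {}
    for length, words in index.items():
        # case-insensitive substring test, so "major" also catches "(Major"
        kept = [w for w in words
                if not any(bad.lower() in w.lower() for bad in ignore)]
        if kept:                          # leave out lengths with no words left
            cleaned[length] = kept
    return cleaned

# made-up sample data in the same shape as the program's dictionary
sample_index = {5: ["Major", "sleep"], 6: ["weight", "(Major", "major)"]}
filtered = drop_matching(sample_index, ["major", ")"])
print(filtered)
```

Because the test is `bad in w` rather than `w in ignore`, a word is dropped even when the ignored string is only part of it, which is the case the full-word check misses.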




# builds a dictionary from a source text, keyed by word length, and uses it to
# replace words in a different text with source words one character longer
# usage: python len_replace8.py major_dep.txt < barney.txt
import sys
import random

source_len = dict()                 # maps a word length to the list of source words of that length
source_file = sys.argv[1]           # first argument passed on command line, sys.argv[0] will be the name of the script file

ignoreList = [")", "Major", "major"]    # words to keep out of the dictionary


# read each line from source file; split each line into words; store each
# word in the source_len dictionary, according to how many letters it has
with open(source_file) as source:       # reading lines from major_dep.txt
    for line in source:
        for word in line.strip().split():   # loops through each of the words
            # only keep large words that are not on the ignore list
            # (this matches full words only, not substrings)
            if len(word) > 2 and word not in ignoreList:
                lengthWord = len(word)      # get length of word
                # if we've already seen this length, append to its list;
                # otherwise start a new list. looks like {3: ['the', 'big'], ...}
                if lengthWord in source_len:
                    source_len[lengthWord].append(word)
                else:
                    source_len[lengthWord] = [word]

# source_len will be a dictionary whose keys are word lengths (integers) and
# whose values are lists of words.
# uncomment this to see what the data structure created above looks like
#print(source_len)

# read each line from standard input; split line into words; for each word
# longer than 2 chars, get a random word one char longer from source_len
for line in sys.stdin:
    words = line.strip().split()
    output = []
    for word in words:
        # candidate replacements are one char longer than the original
        source_words = source_len.get(len(word) + 1)
        if len(word) > 2 and source_words:
            # randomly choose a string from the value list
            output.append(random.choice(source_words))
        else:
            # keep short words, and words with no candidate of the right
            # length, to preserve some of the original structure
            output.append(word)
    print(" ".join(output))