It’s a bird, it’s a plane, it’s a python?
Introductions Are In Order
Sometimes at work I have the need to get distracted. You can only work on the same programs so much before you just want to tear them to little bits and start over from scratch (programming rule #4). So in order to combat this impulse I find little side programs I can create that I can somewhat realistically say are work related. That’s my story and I’m sticking to it! This one involves an experiment in the wonderful language of Python. It’s a fun language, in my opinion. Very powerful and flexible. It takes a little bit of getting used to, but it’s my scripting language of choice. But watch out, it’s pooooiisssoonnous.
We were using Apache Solr on a project (which I lurve) but had some constraints adjusting the log level. We’d typically see a gig or so a day of log traffic that we really didn’t need. Solr uses SLF4J logging. It’s a handy package that abstracts the logging calls, and you instead provide another library that has the actual implementation. AFAIK there are only JDK and Log4j implementations, but I haven’t really looked. By default, the logging library provided is JDK, which is all well and good. JDK logging relies on the container to control what is being logged and where. The problem was, we use Weblogic. Not that I have a whole lot against Weblogic, but in order to hook into its logging mechanism, the implementation needs to extend the commons logging API that Weblogic provides. We could have gone Log4j, but didn’t really have the bandwidth to spare. Or more accurately, since the project was being transitioned off, we didn’t have the ability to make code changes.
Enter the python (it’s like a baby dragon or something). Solr comes with a very nice logging console where, at runtime, you can tweak log levels. It only affects JDK logging (conveniently) and gets reset with a server bounce (which happens regularly). So I came up with the program concept of using a script to set these levels somehow. I took a look at the server, and lo and behold (for fun go look up the definition of lo) there was a python staring right at me. So I told my co-workers to move slowly; they’re not poisonous, but can constrict you to death with powerful squeezes. After a few turns, and a lucky critical, we took care of the python. I returned to my search and found that the server had Python installed and available for use. I had my language of choice and started looking at what I could do. The result? A pretty nifty command line utility that can submit a Web form with parameters of your own design. It’s so good, it has almost completely overflowed the good and rolled over to evil. Luckily we use unsigned values here. Take that, evil!
Disclaimer!
The following code can be found on GitHub here and is subject to the Apache License, Version 2.0.
It’s All About Class
So I broke out the gvim and got to work. I borrowed and tweaked a Web form class I found while browsing, but of course, cannot find again. If I do I will link credit; j/k found it. So let’s take a look at that class. It’s a pretty lightweight class. The two main methods to look at are opener and POST. opener creates the URL opener, which comes from the Python module urllib2. As far as I can tell the opener is the workhorse. It sets up the header information (like saying what your user agent is in case the website is doing evil things) and the like. POST is the method into which all the data (url, form values) and the opener are passed in order to make the connection as a POST submission. You’ll notice there is a GET method available if you are so inclined. I never tested it, but I’m sure it works just fine.
# Imports for the whole script (Python 2)
import cookielib
import getopt
import sys
import urllib
import urllib2

class WebForm:
    def __init__(self):
        pass

    def opener(self, ref):
        """Creates an opener to store cookies,
        and keep a referer to the site
        Added user-agent to spoof browser"""
        self.reference = ref
        cj = cookielib.CookieJar()
        # Stored as urlOpener so the attribute does not shadow this method
        self.urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        self.urlOpener.addheaders.append(("User-agent", "Mozilla/4.0"))
        self.urlOpener.addheaders.append(("Referer", ref))
        return self.urlOpener

    def GET(self, opnr, url):
        getReq = opnr.open(url)
        return getReq.read()

    def POST(self, opnr, url, data):
        # Passing a data argument to open() turns the request into a POST
        enData = urllib.urlencode(data)
        getReq = opnr.open(url, enData)
        return getReq.read()
# end class WebForm
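For anyone following along on Python 3, where urllib2 and cookielib no longer exist under those names, here is a minimal sketch of the same opener setup using urllib.request and http.cookiejar. The function names build_opener_for and encode_form are made up for this example and are not part of the original script, and nothing here touches the network. I assign the header list instead of appending to it because, as far as I can tell, the default User-agent entry would otherwise come first and win.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener_for(ref):
    """Build a cookie-aware opener with User-agent and Referer headers."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default header list rather than appending to it
    opener.addheaders = [("User-agent", "Mozilla/4.0"), ("Referer", ref)]
    return opener


def encode_form(values):
    """URL-encode a dict of form values into bytes usable as a POST body."""
    return urllib.parse.urlencode(values).encode("ascii")


# Placeholder URL in the same shape as the sample input in this post
opener = build_opener_for("http://hostname1:port/solr-master/admin/logging")
body = encode_form({"submit": "set"})
# opener.open(url, body) would send a POST; opener.open(url) would send a GET.
```

The last comment is the whole trick behind the GET and POST methods: the only difference between them is whether a data argument is handed to open.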
I wanted some space here. What!
Main Attraction
Next up to the plate is the main function. It’s a bit incomplete at the moment. It works in general, but I was trying to get all fancy and make a fully fledged command line tool, so there are lots of not implemented sections. If that irks you, you might want to look away. The main showcase here is the argument handling I was working on. This is command line after all. Don’t forget the rum, arrrrggv!
def main(argv):
    #Get the arguments
    try:
        opts, args = getopt.getopt(argv, "f:ghu:v:", ["file=", "generate", "help", "urls=", "valuepairs="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    #init the urls list and values dict
    urls = []
    values = {"submit":"set"}
    #Process the args
    for opt, arg in opts:
        if opt in ("-f", "--file"):
            parsed = readFile(arg)
            urls = parsed[0]
            values = genParings(parsed[1], values)
        elif opt in ("-g", "--generate"):
            print "-g Not Implemented"
            sys.exit(3)
        elif opt in ("-h", "--help"):
            usage()
            sys.exit(2)
        elif opt in ("-u", "--urls"):
            print "-u Not Implemented"
            sys.exit(3)
        elif opt in ("-v", "--valuepairs"):
            print "-v Not Implemented"
            sys.exit(3)
    #Go through each URL and submit the same logging values for each
    for url in urls:
        print "Accessing url: " + url + " for values: "
        print values
        s = WebForm()
        urlOpen = s.opener(url)
        #POST url-encodes the values itself, so pass the dict as is
        request = s.POST(urlOpen, url, values)
        #Superfluous: dump the response to a file for inspection
        f = open("test.txt", "w")
        f.write(request)
        f.close()
# end main(argv)
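If the getopt call looks cryptic, here is a small standalone sketch of what it hands back, using the same option strings as main. The argument list is invented for the example. A colon after a short option (or an equals sign after a long one) means the option takes a value.

```python
import getopt

SHORT_OPTS = "f:ghu:v:"
LONG_OPTS = ["file=", "generate", "help", "urls=", "valuepairs="]

# Pretend command line: script.py -f input.txt --help leftover
opts, args = getopt.getopt(["-f", "input.txt", "--help", "leftover"],
                           SHORT_OPTS, LONG_OPTS)
print(opts)  # list of (option, value) tuples; flags get an empty value
print(args)  # whatever is left once option parsing stops

# An unknown option raises getopt.GetoptError
error_message = None
try:
    getopt.getopt(["-x"], SHORT_OPTS, LONG_OPTS)
except getopt.GetoptError as err:
    error_message = str(err)
print(error_message)
```

Note that getopt stops parsing at the first argument that is not an option, so anything after it lands in args untouched.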
A bit of a side note here, as this was difficult for me to search for correctly. If you are trying to initialize an empty array (a list, in Python terms), you do something like:
urls = []
In other words, just set it to an empty looking array (not all that unfamiliar from JavaScript). Maybe my googlefu was just not up to the task at the time, but that was damn hard to find. I found a number of posts that seemed to ask the question, and they were met with much inquiry on why you would possibly want to do that. I did not grok. I cannot imagine it is uncustomary in Python to use for loops to add elements to an array? I must be missing something. Or I found trolls. Good thing fire works on everything.
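My best guess at what those forum answers were hinting at: Python folks often build the list in one go with a list comprehension instead of starting empty and appending. Both work. A quick sketch, with made-up host names in the same shape as the sample input:

```python
hosts = ("hostname1", "hostname2")

# Start empty and append in a for loop
urls = []
for host in hosts:
    urls.append("http://" + host + ":port/solr-master/admin/logging")

# Same result built in a single expression with a list comprehension
same_urls = ["http://" + host + ":port/solr-master/admin/logging" for host in hosts]

print(urls == same_urls)
```

When the list is filled from several branches of a loop, as in readFile, starting with an empty list and appending is still the simpler choice.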
Help, Help I’m Being Repressed!
Okay, not really. I am actually being helped by tiny little helper functions. I’ve always been a fan of breaking out functionality into classes, unless I am in a performance pinch. Then that extra stack overhead can be problematic. I will still find any excuse to use recursion. It is so elegant. I’m all about elegance. But beyond that little side tangent, there is unfortunately no recursion here. Maybe I will find a nice example of mine later.
Where I was meandering towards is breaking out functionality into methods. Now, you can go overboard with this. You don’t need to have a hundred little functions that are only a few lines each. But at the same time, if it looks like something you could treat as a function, even if you only call it once, I believe it is a good idea to break it out. It makes for easier to follow code, especially if your function names are meaningful. You do not necessarily get as much “that’s a lot of lines in that function” syndrome. On the flip side it means you may end up writing a lot of spaghetti code. And while spaghetti is delicious, it has no place in code.
Maybe this all has something to do with a memory of one of my favorite instructors telling us how to declare functions in SML. You see here children, you define a function by starting with fun, because functions are fun. Maybe I just took that to heart.
First up to the plate is a teeny tiny function called genParings that generates a values dictionary / map from a list of entries. Tokenize based off a predefined delimiter and add to the map. The parings are used to select a given radio button (or any HTML element really) and give it a value that will be passed on form submit. Short and sweet. I do not remember actually looking it up, and I cannot remember if Python supports call by reference or not. I went ahead and passed and returned the parings map just in case. I suppose it would be easy enough to verify, but I don’t exactly have the system to work with here! D:
def genParings(list, parings):
    for entry in list:
        pairs = entry.split(":")
        parings[pairs[0]] = pairs[1]
    return parings
# end genParings(list, parings)
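On the call by reference question: Python passes a reference to the object itself, so changing a dict inside a function is visible to the caller, while reassigning the parameter name is not. That means returning the map from genParings is harmless but not strictly needed. A small sketch to show it; add_pairing and rebind are names invented for this example:

```python
def add_pairing(parings, entry):
    """Mutates the dict the caller passed in."""
    name, value = entry.split(":", 1)
    parings[name] = value


def rebind(parings):
    """Only rebinds the local name; the caller's dict is untouched."""
    parings = {"replaced": "yes"}
    return parings


values = {"submit": "set"}
add_pairing(values, "org.apache.solr.core.SolrCore:WARNING")
rebind(values)
print(values)  # still holds submit and the SolrCore entry, no "replaced" key
```

The split(":", 1) is a small deviation from genParings: it splits on the first colon only, so a value containing a colon stays in one piece.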
Next up to the plate is the readFile method. This is used to take in the file input, parse it, and break it out into a list that contains the list of urls to run against, and the parings to apply to each url. It’s a nice example of how simple it is to read a file in Python. You’ll notice the open command takes a parameter r. It’s a simple way to say this is a read-only file handle. You can do a w for write (which truncates the file first), and an r+ for read and write.
# Not the coolest nor fault tolerant approach, but only requires one pass
def readFile(fileName):
    print "Reading file: " + fileName
    inputFile = open(fileName, "r")
    addToUrls = False
    addToParings = False
    urls = []
    parings = []
    #Process the lines in the file
    for line in inputFile:
        line = line.strip(" \n\r\t")
        if not line:
            #Skip blank lines so they do not end up as urls or parings
            continue
        if line.lower() == "#url list":
            addToUrls = True
        elif line.lower() == "#paring list":
            addToUrls = False
            addToParings = True
        elif addToUrls:
            urls.append(line)
        elif addToParings:
            parings.append(line)
    inputFile.close()
    return [urls, parings]
# end readFile(fileName)
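Here is a sketch of the same one pass parsing idea, split so the parsing works on any iterable of lines and can be tried without a file on disk. parse_input is a hypothetical name, not part of the original script, and the sample text is fed in through io.StringIO as a stand-in for a real file.

```python
import io


def parse_input(lines):
    """Sort lines into urls and parings based on the section markers."""
    urls, parings = [], []
    target = None  # the list the current section feeds into
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        marker = line.lower()
        if marker == "#url list":
            target = urls
        elif marker == "#paring list":
            target = parings
        elif target is not None:
            target.append(line)
    return urls, parings


sample = io.StringIO(
    u"#url list\n"
    u"http://hostname1:port/solr-master/admin/logging\n"
    u"\n"
    u"#paring list\n"
    u"org.apache.solr.core.SolrCore:WARNING\n"
)
urls, parings = parse_input(sample)
print(urls)
print(parings)
```

With a real file you would wrap the call as `with open(fileName, "r") as handle: parse_input(handle)`, which also closes the file for you when the block ends.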
Last up to the plate, as helper functions go anyway, is the usage function! Important to have that usage and such.
def usage():
    print "Solr Logging Utility Usage: All your logging are belong to us!"
    print ""
    print "\t-f [] or --file [] \t\tTakes a file path to use as input for processing"
    print "\t-g or --generate \t\tGenerates a sample input file"
    print "\t-h or --help \t\t\tProvides usage help"
    print "\t-u [] or --urls [] \t\tTakes a list of URLs to submit logging actions to"
    print "\t-v [] or --valuepairs [] \tTakes a list of logging name : logging value parameters to submit"
# end usage()
Lastly, I leave you with a little Python trick. I was a bit confused about how this worked, but Python has intriguing abilities. It’s a clever, lightweight trick that exists in Python so that our Python files can act as either reusable modules or standalone programs. When the Python interpreter reads a source file, it executes all of the code found in it. Before executing the code, it will define a few special variables, one of those being __name__. If the Python interpreter is running that module directly, it sets the special __name__ variable to the value “__main__”. Otherwise, when the file is imported, __name__ is set to the module’s name.
if __name__ == "__main__":
    main(sys.argv[1:])
# end if self!
A quick note about how I called main. You’ll notice the sys.argv[1:] argument. We start at 1 because sys.argv[0] is the name of the script you are running. [x:y] is a list operation that takes a slice of the list from position x up to, but not including, position y. In our case y is not provided, so the slice goes to the end of the list.
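A quick sketch of the slicing, using a hand-made list in place of the real sys.argv. The script name solrlog.py is invented for the example.

```python
# Stand-in for sys.argv; "solrlog.py" is a hypothetical script name
argv = ["solrlog.py", "-f", "input.txt", "-h"]

script_name = argv[0]    # "solrlog.py"
arguments = argv[1:]     # everything after the script name
first_two = argv[1:3]    # positions 1 and 2; position 3 is not included

print(script_name)
print(arguments)
print(first_two)
```

A slice always makes a new list, so main can do whatever it likes with its argv without touching sys.argv.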
I am Bender, please insert … input!
So I lied about the whole lastly part. I figured I should possibly provide a bit of sample input, you know, to sample. Not much to it if you followed the code above, but it gives you an idea of how it is supposed to work. We had a clever, clever migrating master/slave setup going on so I had a number of urls to hit. Good times.
#url list
http://hostname1:port/solr-master/admin/logging
http://hostname2:port/solr-master/admin/logging
#paring list
org.apache.solr.core.SolrCore:WARNING
So there’s the real lastly. It’s a bit of a rough work, but I did it under bizarre constraints and wanted to experiment with some python. It might have typos since I had to retype this by hand without testing. Soo yeah. Tune in next time for more shenanigans. There might be some punch and pie.