I really wanted to play fantasy baseball this year, and of course nobody invites me to their league, so I decided to commission one. I'm a stats geek and I just couldn't go with Yahoo's default suggestions for scoring, so I added a few of my own. One I'm really coming to regret is innings pitched (IP). Everybody else in my league is totally dedicated to their craft, so they're swapping pitchers on and off their benches just in time for their scheduled starts, killing me in IP. I always think of this stuff around 8 PM, after the guy has already been pitching for an hour and Yahoo won't let me make any more changes.
I thought, what if there was an app that notified me when one of the pitchers on my fantasy team was set to pitch that day? It would notify me several hours ahead of time, so I could swap that pitcher in. Even in a non-IP scoring league, that could be useful if you had several good pitchers and wanted to play them all to rack up strikeouts or wins.
The first step, of course, is finding out who's pitching that day. I didn't see any web services devoted to that, but there are plenty of websites from major sports news sources. I had just learned how to pick out parts of HTML trees with a Python library called Beautiful Soup. So I figured that if I could find a site that was pretty easy to parse, I could make a service out of that. I googled "pitching schedule" and clicked on the first link.
Gee, looks like a bunch of tables, huh? So if I could just find some CSS classes that indicated where the headings vs. the data were, then maybe I'd have something. When I opened the source code of the page I found just that.
Basically, the team headings (i.e., the "Texas Rangers at Baltimore Orioles" above) had their own unique CSS class called "stathead". I could tell Beautiful Soup to find all the tr's with that class, then find the data underneath each one by stepping through the tr's that follow it until I got the one I wanted (the 4th row for the first pitcher and the 5th for the second, counting the heading row as the first).
# Python 2 script (urllib2 and print statements).
from urllib2 import urlopen
from bs4 import BeautifulSoup

base_url = "http://espn.go.com/mlb/probables"
soup = BeautifulSoup(urlopen(base_url).read())

# Each game starts with a heading row that carries the "stathead" class.
teamsHeadings = soup.findAll("tr", attrs={"class": "stathead"})
for t in teamsHeadings:
    # The first cell after the heading row's start names the two teams.
    teams = t.find_next("td")
    print "Teams: " + teams.string.strip()
    # The cell after that holds the game time.
    _time = teams.find_next("td")
    print "Time: " + _time.string.strip()
    # The pitchers are links in the third and fourth rows after the heading row.
    firstPitcher = teams.find_parent("tr").find_next("tr").find_next("tr").find_next("tr").find("td").find("a")
    secondPitcher = teams.find_parent("tr").find_next("tr").find_next("tr").find_next("tr").find_next("tr").find("td").find("a")
    print "Matchup: " + firstPitcher.string.strip() + " vs. " + secondPitcher.string.strip()
    print ""
I just did simple test output. It wouldn't be hard to turn this into XML or JSON and make something out of it if I was sufficiently motivated to field a decent fantasy team.
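For what it's worth, here's a rough sketch of what that next step might look like. It assumes the scraping loop collected (teams, time, first pitcher, second pitcher) tuples instead of printing them. The sample rows, the function names `to_matchups` and `starters_on_roster`, and the roster are all made up for illustration; only the team heading comes from the page above. It uses nothing but the standard library.

```python
import json

# Hypothetical sample rows, shaped like what the scraping loop could collect
# instead of printing. The time and pitcher names are placeholders, not real data.
SAMPLE_ROWS = [
    ("Texas Rangers at Baltimore Orioles", "7:05 PM", "Pitcher A", "Pitcher B"),
    ("Team C at Team D", "8:10 PM", "Pitcher E", "Pitcher F"),
]


def to_matchups(rows):
    """Turn (teams, time, first, second) tuples into JSON-friendly dicts."""
    return [
        {"teams": teams, "time": time, "pitchers": [first, second]}
        for teams, time, first, second in rows
    ]


def starters_on_roster(matchups, roster):
    """Return one entry per roster pitcher who shows up as a probable starter."""
    wanted = {name.lower() for name in roster}  # case-insensitive name match
    hits = []
    for matchup in matchups:
        for pitcher in matchup["pitchers"]:
            if pitcher.lower() in wanted:
                hits.append({
                    "pitcher": pitcher,
                    "teams": matchup["teams"],
                    "time": matchup["time"],
                })
    return hits


matchups = to_matchups(SAMPLE_ROWS)

# JSON output that some other service or app could consume.
print(json.dumps(matchups, indent=2))

# Hypothetical fantasy roster: flag anyone who is scheduled to start.
my_roster = ["Pitcher B", "Pitcher Z"]
for hit in starters_on_roster(matchups, my_roster):
    print("Start {pitcher}: {teams}, {time}".format(**hit))
```

Matching on the exact name string is the weak spot here. A real version would probably need some kind of player ID, since the schedule page and the fantasy site may not spell names the same way.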
PS: I really recommend Homebrew if you're going to use Python on a Mac. Made it way easier to get extra libraries going. Also, http://krillapps.com/coderunner/ makes it way easier to test my scripts. Now I don't have to remember all those virtual environment incantations.