|
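Below is a program I wrote to fetch the latest chapter titles of the novels I'm reading from the Qidian website. Sometimes it runs fine, but sometimes it raises an encoding exception (UnicodeDecodeError: 'gbk' codec can't decode bytes in position 1-2: illegal multibyte sequence), and I can't figure out why.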
- # -*- coding: gbk -*-
- # (was "coding: cp936"; changed to "coding: gbk")
- import urllib2
-
- bookindex = {
-     "巫墓": "http://www.qidian.com/Book/1051839.aspx",
-     "盘龙": "http://www.qidian.com/Book/1017141.aspx",
-     "斗罗大陆": "http://www.qidian.com/Book/1115277.aspx"
- }
-
- def getURL(url):
-     # Return the raw page bytes, or an empty string if the request fails.
-     try:
-         fp = urllib2.urlopen(url)
-     except Exception:
-         print 'get url exception'
-         return ''
-     content = fp.read()
-     fp.close()
-     return content
-
- findstr1 = '''#FF0000'''
- findstr2 = '''</font>'''
- for name in bookindex.keys():
-     raw = getURL(bookindex[name])
-     if not raw:
-         continue  # the download failed, so skip this book
-     content = raw.decode("gbk")  # was decode("cp936"); changed to decode("gbk")
-     begin = content.index(findstr1) + 9  # skip past '#FF0000">' (9 characters)
-     end = content.index(findstr2, begin)
-     print name
-     print content[begin:end]
- raw_input("Press Enter to exit!")
Thanks to the posters on floors 2 and 3 for the tip: after I switched to the gbk encoding, it works normally. |
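In case the error comes back: Python's codec registry lists cp936 as an alias of gbk, so both names select the same decoder, and a page that contains bytes outside GBK would still fail with either one. The sketch below is one possible way to make the decode step more tolerant. The helper name `decode_page` and the choice of falling back to gb18030 are my assumptions, not something confirmed for the Qidian pages. It tries each encoding strictly, and only if all of them fail does it decode with replacement characters so that one bad byte does not abort the whole run.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of the original program): decode page bytes
# without letting a single bad byte raise UnicodeDecodeError.

def decode_page(raw, encodings=("gbk", "gb18030")):
    """Try each encoding strictly; fall back to a lossy decode if all fail."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep the readable text and mark the
    # undecodable bytes with U+FFFD instead of raising.
    return raw.decode(encodings[-1], "replace")


# Small demonstration with made-up bytes (assumed data, not a real page).
sample = u"\u7b2c\u4e00\u7ae0".encode("gbk")  # well-formed GBK bytes
broken = sample + b"\xff"                     # 0xFF is not a valid lead byte

text = decode_page(sample)     # decodes cleanly with gbk
patched = decode_page(broken)  # falls through to the lossy decode
```

In the loop above, this would replace the direct `.decode("gbk")` call, for example `content = decode_page(raw)`. The lossy fallback hides the problem rather than explaining it, so printing the failing URL when it triggers may still be worthwhile.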
|