1. Jsoup.parse(URL) doesn't work
When I wrote a plain Java program to parse a web page (phimchieurap.vn in my project), Jsoup.parse(URL) worked. When the same code ran on a Google App Engine server, it didn't. My solution is:
Jsoup.connect(yourUrl).userAgent(yourUserAgent).get()
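For anyone wondering what userAgent(...) changes: it sets the User-Agent header on the HTTP request. Here is a minimal sketch of the same idea using only the JDK's HttpURLConnection. It is not how Jsoup does it internally; the class name UserAgentRequest, the helper prepare, the 10-second timeouts, and the example URL and agent string are all made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class UserAgentRequest {

    // Hypothetical helper: builds a GET request with an explicit User-Agent header.
    // openConnection() only creates the connection object; nothing is sent
    // until the caller reads the response (e.g. via getInputStream()).
    public static HttpURLConnection prepare(String pageUrl, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", userAgent);
        // Assumed timeouts (10 s each); tune for your own server.
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL and agent string, not taken from the original project.
        HttpURLConnection conn = prepare("http://example.com/", "MyCrawler/1.0");
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

With Jsoup itself, the one-liner above does this for you; the sketch only shows which request header is involved.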
2. Element.html() returns an incorrect string
Instead of the expected string, it returns one where some characters show up as HTML entities (sequences that start with & and end with ;). My solution is:
StringEscapeUtils.unescapeHtml3(String)
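To use StringEscapeUtils, add the Apache Commons Lang JAR to the project. Note that unescapeHtml3 is in the 3.x line (commons-lang3, package org.apache.commons.lang3); the older 2.x commons-lang JAR only has unescapeHtml.
Also remember to set the character encoding to UTF-8 in your servlet.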
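To make the unescaping step concrete, here is a rough JDK-only sketch of what decoding entities means. It is not the Commons Lang implementation and only knows a handful of named entities plus numeric ones; the class name EntityDecoder and the entity table are assumptions for illustration. In a real project, StringEscapeUtils is the better choice.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {

    // Tiny illustrative table; real libraries cover far more named entities.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", "\u00A0");

    // Matches &name; or &#123; or &#x1F;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    public static String unescape(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = decode(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = decode(body.substring(1), 10, m.group());
            } else {
                // Unknown names are left exactly as they were.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Turns a numeric reference into its character, or returns the original text if invalid.
    private static String decode(String digits, int radix, String fallback) {
        try {
            int codePoint = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("Tom &amp; Jerry &lt;3"));
    }
}
```

Numeric references matter for Vietnamese pages: for example, `&#7871;` decodes to the character U+1EBF.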
A tool built with Jsoup is available at http://www.srccodes.com/tools/html/js-extractor