NationStates Jolt Archive


Screenscraping Ahoy! (A List of Scrapable Non-XML Data and Some XML Bugs)

Mac Anu
28-06-2006, 05:40
Not quite suitable for this forum, but there's nowhere else to put it. It's also a sort of request to get this otherwise openly accessible data into the XML feed for easier parsing, if possible...

National:

- Regional Influence (Just Text)
- #2 and #3 Industries (Not as universal a scrapable trait, so maybe not)
- Regional and/or world ranking in the daily UN report, along with the report topic and (optionally) description

Regional:

- HTML-ized Comments (Could be difficult: to be correct, you'd have to include an XML namespace declaration, and the HTML used might not be valid XHTML anyway, which would make the XML technically invalid)
- Fix: Make sure that XML/HTML entities aren't double-escaped. For example, in the Football Forever group, the entity "& #39;" in the description becomes "& amp;#39;" in the XML, so it renders as "& #39;" instead of "'". (n.b. Spaces were added after "&" because vBulletin would otherwise interpret the entities.)
- Regional Power (Just Text)
- Regional Happenings (Too difficult and/or mundane perhaps)
- Civil Headquarters (Again, perhaps too difficult and/or mundane)
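The double-escaping problem, and the validity concern about raw HTML in comments, can be illustrated with a small Python sketch using only the standard library. The description and comment strings are made-up examples, not real feed data, and `correct_xml` / `double_escaped_xml` are hypothetical stand-ins for whatever the feed generator actually does. Escaping comment HTML as plain text is shown as one possible alternative to embedding XHTML with a namespace; it is not necessarily what the feed would use.

```python
from html import unescape
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical description text as stored by the game, containing a
# numeric character reference (&#39;) for an apostrophe.
raw_description = "Football Forever&#39;s home"

def correct_xml(text: str) -> str:
    """Decode existing entities first, then escape exactly once for XML."""
    return "<description>" + escape(unescape(text)) + "</description>"

def double_escaped_xml(text: str) -> str:
    """Suspected bug: already-encoded text is escaped again,
    so the '&' in '&#39;' becomes '&amp;'."""
    return "<description>" + escape(text) + "</description>"

good = ET.fromstring(correct_xml(raw_description)).text
bad = ET.fromstring(double_escaped_xml(raw_description)).text
print(good)  # Football Forever's home
print(bad)   # Football Forever&#39;s home

# Client-side workaround until the feed is fixed: unescape the parsed text once more.
print(unescape(bad))  # Football Forever's home

# Raw HTML that is not valid XHTML (an unclosed <br>) breaks the XML parse...
comment_html = "Welcome!<br>Please read the rules."
try:
    ET.fromstring("<comment>" + comment_html + "</comment>")
except ET.ParseError as err:
    print("not well-formed:", err)

# ...whereas escaping it as text keeps the document well-formed, and the
# parser hands back the original HTML string for the client to render.
safe = ET.fromstring("<comment>" + escape(comment_html) + "</comment>").text
print(safe)  # Welcome!<br>Please read the rules.
```

The trade-off with escaping is that clients get HTML as an opaque string rather than a parsed tree, but no namespace declaration is needed and invalid markup can't break the feed.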

(New) Game-wide Feed:

- Number of living nations (Just Number)
- Number of living regions (Just Number)
- Top 10 in the world ranking of the daily UN report, along with the topic and (optionally) description