Scraping Wikipedia tables with Google Spreadsheets


Fitting in nicely with the discussion on pulling financial data into Google Spreadsheets, the OUseful blog recently demonstrated another Spreadsheet data import function, importHTML(), which allows you to easily link an external HTML table to your workbook.

The Google spreadsheet function =importHTML(“”,”table”,N) will scrape a table from an HMTL web page into a Google spreadsheet. The URL of the target web page, and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 0) as the target table for data scraping.

The author goes on to show you how to pull a country population table from a Wikipedia entry into a spreadsheet, create a graph from it, publish the spreadsheet as a CSV, consume the CSV in Yahoo Pipes, export the Pipe output to KML, and import the KML into a Google Map. Whew!

The importHTML function will accept either “list” or “table” as the second parameter, which allows you to retrieve records from either UL/OL/DL lists or TABLE contents, respectively. If you want to retrieve something that’s not table or list based, the importXML may also come in handy. With importXML, you can pull data from any XML or HTML file using an XPath query to target a specific tag or attribute. For more information on these import functions, consult the official documentation below.

Data Scraping Wikipedia with Google Spreadsheets
Google Docs Documentation: Functions For External Data

HOWTO – track stocks in Google Spreadsheets

Discuss this article with the rest of the community on our Discord server!

I'm a tinkerer and finally reached the point where I fix more things than I break. When I'm not tinkering, I'm probably editing a book for Maker Media.

View more articles by Brian Jepson


Ready to dive into the realm of hands-on innovation? This collection serves as your passport to an exhilarating journey of cutting-edge tinkering and technological marvels, encompassing 15 indispensable books tailored for budding creators.