This tutorial explains how to scrape Google News into Excel using VBA.
You can download the workbook by clicking on the link below.
VBA Code : Google News Scraper
The code below prompts the user to enter the topic for which they want related articles from Google News. It returns the following information about the articles.
- Title
- Source
- Time
- Author
- Link
Sub GetNewsData() Dim query As String Dim url As String Dim xmlHttp As Object Dim htmlDoc As Object Dim articles As Object Dim article As Object Dim links As Collection Dim link As String Dim mytext As String Dim newsText As Collection Dim newsTextSplit As Variant Dim newsData As Collection Dim data As Collection ' Search Query query = InputBox("Please enter topic for news articles:", "User Input") If query = "" Then MsgBox "You didn't enter anything." Exit Sub End If ' Encode special characters in a text string query = EncodeSpecialCharacters(query) url = "https://news.google.com/search?q=" & query & "&hl=en-US&gl=US&ceid=US%3Aen" ' Create a new XML HTTP request Set xmlHttp = CreateObject("MSXML2.XMLHTTP") xmlHttp.Open "GET", url, False xmlHttp.send ' Create a new HTML document Set htmlDoc = CreateObject("htmlfile") htmlDoc.body.innerHTML = xmlHttp.responseText ' Find all articles Set articles = htmlDoc.getElementsByTagName("article") ' Initialize collections Set links = New Collection Set newsData = New Collection ' Loop through each article For Each article In articles ' Get the link link = article.getElementsByTagName("a")(0).href link = Replace(link, "about:", "") link = Replace(link, "./articles/", "https://news.google.com/articles/") links.Add link ' Get the news text mytext = CleanTrim(article.innerText) ' Split the news text into lines newsTextSplit = Split(mytext, "\n") ' Add the data to the collection Set data = New Collection data.Add IIf(UBound(newsTextSplit) >= 2, Trim(newsTextSplit(2)), "Missing") ' Title data.Add Trim(newsTextSplit(0)) ' Source If UBound(newsTextSplit) >= 3 Then data.Add Trim(newsTextSplit(3)) ' Time Else data.Add "Missing" End If If UBound(newsTextSplit) >= 4 Then data.Add Trim(Split(newsTextSplit(4), "By ")(UBound(Split(newsTextSplit(4), "By ")))) ' Author Else data.Add "Missing" End If data.Add link ' Link newsData.Add data Next article Dim ws As Worksheet Dim mydata As Collection Dim i As Integer Dim j As Integer Dim response As VbMsgBoxResult ' Set the active worksheet Set ws = ActiveSheet ' Check if the sheet has data If WorksheetFunction.CountA(ws.UsedRange) > 0 Then ' Prompt the user to replace the data response = MsgBox("The active sheet contains data. Do you want to replace it?", vbQuestion + vbYesNo, "Replace Data") If response = vbYes Then ' Clear the data ws.Cells.ClearContents Else MsgBox "Data was not replaced. No New Data Added.", vbInformation, "Canceled" Exit Sub End If End If With ws .Cells(1, 1).Value = "Title" .Cells(1, 2).Value = "Source" .Cells(1, 3).Value = "Time" .Cells(1, 4).Value = "Author" .Cells(1, 5).Value = "Link" End With ' Get the data from the newsData collection Set mydata = newsData ' Write the data to the worksheet For i = 1 To mydata.Count For j = 1 To mydata(i).Count ws.Cells(i + 1, j).Value = mydata(i)(j) Next j Next i End Sub Function EncodeSpecialCharacters(text As String) As String Dim encodedText As String Dim char As String Dim i As Integer ' Encode the special characters encodedText = "" For i = 1 To Len(text) char = Mid(LCase(text), i, 1) Select Case char Case "&" encodedText = encodedText & "%26" Case "=" encodedText = encodedText & "%3D" Case "+" encodedText = encodedText & "%2B" Case " " encodedText = encodedText & "%20" Case Else encodedText = encodedText & char End Select Next i EncodeSpecialCharacters = encodedText End Function Function CleanTrim(ByVal S As String) As String Dim X As Long, CodesToClean As Variant CodesToClean = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, _ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 129, 141, 143, 144, 157, 160) For X = LBound(CodesToClean) To UBound(CodesToClean) If InStr(S, Chr(CodesToClean(X))) Then S = Replace(S, Chr(CodesToClean(X)), "\n") Next ' Replace multiple "\n" with a single "\n" temp = Replace(S, "\n\n", "\n") While InStr(temp, "\n\n") > 0 temp = Replace(temp, "\n\n", "\n") Wend If Left(temp, 2) = "\n" Then temp = Mid(temp, 3, Len(temp) - 2) CleanTrim = WorksheetFunction.Trim(temp) End Function
Steps to Enter and Run VBA Code
- Open a New Excel Workbook.
- Press Alt + F11 to access the Visual Basic Editor (or click on the 'Visual Basic' icon in the 'Developer' tab)
- Select Insert > Module.
- In the module window that appears, enter the above VBA code.
- Close the Visual Basic Editor window if it's still open.
- In Excel, press Alt + F8 shortcut key and then select the GetNewsData macro and hit 'Run' button to run macro.
How to Customize the above VBA Code
If you don't want the user prompt to enter the topic for articles every time you run the code, you can hard code it by replacing the code -
query = InputBox("Please enter topic for news articles:", "User Input")
with query = "Enter Your Topic"
Explanation
Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
This line creates an XML HTTP request object which is used to fetch news articles' information from Google News.Set htmlDoc = CreateObject("htmlfile")
This object will be used to parse and manipulate the HTML content.htmlDoc.body.innerHTML = xmlHttp.responseText
It fills the HTML document with the content fetched from the Google News.Set articles = htmlDoc.getElementsByTagName("article")
This line extracts all the HTML elements with the tag name "article" from HTML. Every news article is wrapped within this tag.
Share Share Tweet