How to get all the links from a web page
In this article, I will show how to get all the links from a web page. The same approach works for other elements too, but I will use links as the example.
I will use Google as the example site, with Chrome as my browser. I searched Google for web 3.0 and will work with the results page that comes back.
To get the links from this page, I first look at how they appear in the page source. Open the Inspect Elements window in one of the following ways:
- Open it from the menu: View/Developer/Inspect Elements
- Use the shortcut ⌥+⌘+C
- Right-click the page and select Inspect from the context menu
Then select one of the links on the page to see its source code. All the result links are located directly under <div class="yuRUbf">, so I can get every <a> element under a div with the class yuRUbf.
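Before writing the full loop, it can help to check that the selector matches the number of links you expect. The snippet below is a minimal sketch of that check; countMatches is a hypothetical helper name of my own, and class names such as yuRUbf look generated, so the selector may only be valid for the page I inspected.

```javascript
// Hypothetical helper (not part of the original walkthrough): count how many
// elements under `root` match a CSS selector. `root` is anything that has
// querySelectorAll, such as `document` in the browser console.
function countMatches(root, selector) {
  return root.querySelectorAll(selector).length;
}

// In the browser console `document` exists, so this logs the count.
// Outside a browser (for example in Node.js) the check is skipped.
if (typeof document !== 'undefined') {
  console.log(countMatches(document, '.yuRUbf > a'));
}
```

If the count is 0, the class name or the nesting on the page is probably different, and it is worth inspecting a link again before going further.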
Then select the Console tab in the Inspect Elements window.
In the console, we can get the links with the following code:

    // Select every <a> that is a direct child of an element with class yuRUbf
    var links = document.querySelectorAll('.yuRUbf > a');
    var urls = [];
    for (let link of links) {
      urls.push(link.getAttribute('href'));
    }
    // Print one URL per line
    console.log(urls.join('\n'));
The console prints the URLs, one per line.
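Note that getAttribute('href') returns the attribute exactly as it is written in the HTML, which on some pages is a relative path. The sketch below is my own addition, not part of the original walkthrough: it resolves each value against a base URL with the standard URL constructor and drops duplicates. The names collectUrls and baseUrl are illustrative assumptions.

```javascript
// Illustrative helper (an assumption, not from the article): collect href
// values under `root`, resolve them to absolute URLs, and remove duplicates
// while keeping the original order.
function collectUrls(root, selector, baseUrl) {
  const seen = new Set();
  for (const link of root.querySelectorAll(selector)) {
    const raw = link.getAttribute('href');
    if (!raw) continue; // skip anchors without an href
    // Absolute values ignore the base; relative ones are resolved against it.
    seen.add(new URL(raw, baseUrl).href);
  }
  return [...seen];
}

// Browser console usage; skipped when no DOM is available.
if (typeof document !== 'undefined') {
  console.log(collectUrls(document, '.yuRUbf > a', location.href).join('\n'));
}
```

A Set is used here because it keeps insertion order, so the URLs still come out in the order they appear on the page.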
To get other information, change the code accordingly. For example, I also want the title of each link. In the source code, the title is in an <h3> tag inside the <a> tag, so the following code collects the title and the URL together:

    var links = document.querySelectorAll('.yuRUbf > a');
    var rows = [];
    for (let link of links) {
      // The title is the text of the <h3> inside the link; use '' if there is none
      var heading = link.querySelector('h3');
      var title = heading ? heading.innerText : '';
      rows.push(title + '\t' + link.getAttribute('href'));
    }
    // Print the title and URL separated by a tab, one result per line
    console.log(rows.join('\n'));
The console prints one line per result, with the title and the URL separated by a tab.
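If the goal is to paste the results into a spreadsheet, it may be more convenient to build the text once and copy it to the clipboard. This is a sketch under my own assumptions: toTsv is a hypothetical helper, and copy() is the clipboard utility that Chrome's DevTools console provides (it is not available to ordinary page scripts).

```javascript
// Hypothetical helper: turn link elements into tab-separated lines of
// title and URL. Links without an <h3> get an empty title.
function toTsv(links) {
  const rows = [];
  for (const link of links) {
    const heading = link.querySelector('h3');
    const title = heading ? heading.innerText.trim() : '';
    // Tabs or newlines inside a title would break the columns, so flatten them.
    rows.push(title.replace(/\s+/g, ' ') + '\t' + link.getAttribute('href'));
  }
  return rows.join('\n');
}

// In Chrome's console: print the table and also copy it to the clipboard.
if (typeof document !== 'undefined') {
  const tsv = toTsv(document.querySelectorAll('.yuRUbf > a'));
  console.log(tsv);
  if (typeof copy === 'function') copy(tsv); // copy() is a DevTools console utility
}
```

Tab-separated text pastes into most spreadsheet applications as two columns, which is why the tab is kept as the separator.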
I hope this article helps when you want to get some information from a web page.