https://morioh.com/p/b260ff42c61f/how-to-web-scraping-with-node-js-cheerio

In this post, we’ll learn how to use Node.js and friends to perform quick and effective web scraping for single-page applications. This can help us gather and use valuable data that isn’t always available via APIs. Let’s dive in.

What is web scraping?

Web scraping is a technique for extracting data from websites with a script. It automates the laborious work of copying data from various websites by hand, and it is generally used when the target websites don’t expose an API for fetching the data.

Some common web scraping scenarios are:

- Scraping emails from various websites for sales leads.
- Scraping news headlines from news websites.
- Scraping product data from e-commerce websites.

Why do we need web scraping when e-commerce websites expose APIs (Product Advertising APIs) for fetching product data? E-commerce websites expose only some of their product data through APIs, so web scraping is a more effective way to collect as much product data as possible.

Product comparison sites generally rely on web scraping. Even the Google search engine crawls and scrapes pages to index its search results.

What will we need?

Getting started with web scraping is easy, and it can be divided into two simple parts:

1. Fetching data by making an HTTP request
2. Extracting the important data by parsing the HTML DOM

We will be using Node.js for web scraping. If you’re not familiar with Node, check out the article “The only NodeJs introduction you’ll ever need”.

We will also use two open-source npm modules:

- axios: a promise-based HTTP client for the browser and Node.js.
- cheerio: jQuery for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.

You can learn more about comparing popular HTTP request libraries here.
Setup

Our setup is pretty simple. We create a new folder and run this command inside that folder to create a package.json file. Let’s cook the recipe to make our food delicious.

    npm init -y

Before we start cooking, let’s collect the ingredients for our recipe. Add axios and cheerio from npm as our dependencies.

    npm install axios cheerio

Now, require them in our index.js file:

    const axios = require('axios');
    const cheerio = require('cheerio');

Make the Request

We are done collecting the ingredients for our food, so let’s start cooking. We are scraping data from the Hacker News website, so we need to make an HTTP request to get the website’s content. That’s where axios comes into play. Our response will look like this: