In today's video, I will walk through an introduction to the Python package Beautiful Soup 4.
This walkthrough covers the steps involved in web scraping with Beautiful Soup, a Python library for parsing HTML and XML. The basic steps are: send an HTTP request to the URL of the webpage you want to access; the server responds by returning the HTML content of the page; then parse that content so you can work with it. For the full details, see the official Beautiful Soup documentation. In short, with the help of Beautiful Soup and a parser, we can easily navigate, search, scrape, and modify parsed HTML/XML content (for example, the raw bytes returned by an HTTP request) by treating everything in it as a Python object. Web scraping is a powerful tool for any data science professional, and the fundamentals of web scraping with Beautiful Soup can be a great help to a data scientist.
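As a rough sketch of those steps (request, raw bytes, parse, then navigate/search/modify), the snippet below uses the standard library's `urllib.request` for the HTTP request; that choice, the browser-like `User-Agent` header, and the helper name `fetch_html` are assumptions for illustration, not part of the original walkthrough. So that it runs offline, the parsing part works on a small hard-coded page (hypothetical content) instead of calling `fetch_html`.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str) -> bytes:
    """Hypothetical helper: send an HTTP GET request and return the raw body as bytes."""
    # Assumption: a browser-like User-Agent, since some sites reject urllib's default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read()


# Offline stand-in for fetch_html(...) so the sketch runs without a network connection.
raw = b"<html><head><title>Demo page</title></head><body><p class='intro'>Hello, soup!</p></body></html>"

# Parse the bytes with Python's built-in "html.parser" (no extra parser install needed).
soup = BeautifulSoup(raw, "html.parser")

# Navigate: tags are reachable as attributes of the soup object.
print(soup.title.string)        # Demo page

# Search: find() returns the first matching Tag object.
intro = soup.find("p", class_="intro")
print(intro["class"])           # ['intro']

# Modify: Tag objects can be edited in place, then serialized back to HTML.
intro.string = "Goodbye, soup!"
print(soup.p)                   # <p class="intro">Goodbye, soup!</p>
```

Against a live page you would replace `raw` with `fetch_html("...")`; the parsing code stays the same because Beautiful Soup accepts either bytes or a string.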
Beautiful Soup is a MUST HAVE package for anyone interested in web scraping with Python. It provides an easy-to-use way of parsing web data and is a great starting point for more advanced automated web navigation and scraping.
Topics we will cover:
Where to go to find more information about a specific package, including how to look up its functions and how to use them (assuming it is well documented!).
---Beautiful Soup Documentation:
How to view and interpret the source code of a webpage, and how to go back and forth between the visible page and the underlying HTML to better understand how the page is built.
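Alongside the online documentation, a minimal sketch of two in-interpreter helpers may be useful: the standard `inspect` module (or the built-in `help()`) to look up a function's signature and docstring, and Beautiful Soup's `prettify()` to print parsed HTML one tag per line so it is easier to match against the visible page. The HTML snippet here is a hypothetical example.

```python
import inspect

from bs4 import BeautifulSoup

# Look up a method's signature and docstring without leaving the interpreter
# (help(BeautifulSoup.find_all) shows similar information interactively).
print(inspect.signature(BeautifulSoup.find_all))
print(inspect.getdoc(BeautifulSoup.find_all))

# prettify() re-indents the parsed HTML with one tag per line, which is easier
# to compare against the rendered page than a single minified line of source.
snippet = "<div id='stats'><table><tr><td>Row 1</td></tr></table></div>"
soup = BeautifulSoup(snippet, "html.parser")
pretty = soup.prettify()
print(pretty)
```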
I will be using the stats website Basketball-Reference to demonstrate the capabilities of Beautiful Soup, specifically the players' per-game stats for the 2019-2020 season. Link below.
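As a hedged sketch of what extracting a per-game stats table could look like, the function below turns each table row into a dictionary keyed by a `data-stat` cell attribute. The table id `per_game_stats`, the `data-stat` names, and the repeated header rows are assumptions about the page's markup (check the actual page source before relying on them), and the two rows are placeholder values, not real statistics. `parse_per_game_table` is a hypothetical helper name.

```python
from __future__ import annotations

from bs4 import BeautifulSoup

# Hypothetical markup: table id and data-stat names are assumptions, rows are placeholders.
SAMPLE_TABLE = """
<table id="per_game_stats">
  <thead><tr><th data-stat="player">Player</th><th data-stat="pts_per_g">PTS</th></tr></thead>
  <tbody>
    <tr><td data-stat="player">Player A</td><td data-stat="pts_per_g">20.1</td></tr>
    <tr class="thead"><th data-stat="player">Player</th><th data-stat="pts_per_g">PTS</th></tr>
    <tr><td data-stat="player">Player B</td><td data-stat="pts_per_g">15.4</td></tr>
  </tbody>
</table>
"""


def parse_per_game_table(html: str, table_id: str = "per_game_stats") -> list[dict[str, str]]:
    """Return one {data-stat: cell text} dict per data row of the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    body = table.find("tbody") or table  # fall back to the table if there is no <tbody>
    rows = []
    for tr in body.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:  # skip header rows repeated inside the body (only <th> cells)
            continue
        rows.append({td["data-stat"]: td.get_text(strip=True) for td in cells})
    return rows


players = parse_per_game_table(SAMPLE_TABLE)
for row in players:
    print(row["player"], row["pts_per_g"])
```

The values stay as strings here; converting columns such as `pts_per_g` to `float` (or loading `players` into a DataFrame) would be a natural next step.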