Selenium is an extremely powerful tool for web data scraping; however, it has some flaws, which is fair because it was built mainly for testing web applications. BeautifulSoup, on the other hand, was developed for data scraping, and it is extremely powerful at that job.
However, even BeautifulSoup has its limits: it is of little use when the required data sits behind a “wall”, that is, when reaching the data requires a user login or some other action from the user.
That’s where Selenium comes in: we can use it to automate user interactions with the website, and then use BeautifulSoup to scrape the data once we are inside the “wall”.
Integrating Selenium with BeautifulSoup makes an extremely powerful web scraping tool.
Although you could use Selenium both to automate user interactions and to extract the data, BeautifulSoup is much more effective at the extraction part.
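To make the division of labour concrete, here is a minimal sketch of the hand-off between the two libraries: Selenium supplies the HTML of the page it has reached, and BeautifulSoup parses it. The function name extract_titles, the h2 tag, and the class name "movie-title" are made up for illustration and are not Amazon's real markup.

```python
from bs4 import BeautifulSoup


def extract_titles(page_html):
    """Return the text of every movie-title element found in the HTML."""
    soup = BeautifulSoup(page_html, "html.parser")
    # "movie-title" is a hypothetical class name used only for this sketch.
    tags = soup.find_all("h2", class_="movie-title")
    return [tag.get_text(strip=True) for tag in tags]


# In the real flow the HTML would come from Selenium after signing in:
#     page_html = driver.page_source
sample_html = """
<div><h2 class="movie-title"> Movie A </h2></div>
<div><h2 class="movie-title">Movie B</h2></div>
"""
print(extract_titles(sample_html))
```

Keeping the parsing in a function that only receives a string means you can try it on saved HTML without opening a browser at all.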
We will use BeautifulSoup and Selenium to extract movie information such as name, description, and rating for the comedy category on Amazon Prime Video, and then filter the movies by their IMDb ratings.
So, let’s start.
First, let’s import the necessary modules: Selenium to drive the browser, BeautifulSoup to parse the HTML, and pandas to store the results.
For this program to work, you need a Chrome WebDriver (ChromeDriver). Make sure the driver you download matches the version of your Chrome browser.
Then, let’s write a function open_site() that opens the Amazon Prime sign-in page.
In this case, the if statement keeps only the movies with a rating above 8.0 and below 10.0; the upper bound is there just to make sure.
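The check itself might look something like the sketch below. The helper name has_high_rating and the sample movies are illustrative; we also assume the rating arrives as text scraped from the page, so it is converted to a number first.

```python
def has_high_rating(rating_text, low=8.0, high=10.0):
    """True when the rating parses as a number strictly between low and high."""
    try:
        rating = float(rating_text)
    except (TypeError, ValueError):
        # Missing or non-numeric ratings are treated as not matching.
        return False
    return low < rating < high


# Made-up sample data standing in for scraped movies.
movies = [
    {"name": "Movie A", "rating": "8.4"},
    {"name": "Movie B", "rating": "7.9"},
    {"name": "Movie C", "rating": None},
]
top_movies = [m for m in movies if has_high_rating(m["rating"])]
print([m["name"] for m in top_movies])
```

Both bounds are exclusive here, so a movie rated exactly 8.0 is left out; change the comparison to <= if you want to include it.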
Now, it’s time to create a pandas DataFrame to store all the movie data in.
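A minimal sketch of that step follows, assuming each movie has already been collected as a dictionary. The column names and the sample rows are placeholders, not real scraped data.

```python
import pandas as pd

# Assumed column layout, based on the fields we set out to extract.
COLUMNS = ["Name", "Description", "Rating"]


def build_movie_frame(movies):
    """Turn a list of movie dicts into a DataFrame with a fixed column order."""
    return pd.DataFrame(movies, columns=COLUMNS)


# Placeholder rows standing in for scraped movies.
movies = [
    {"Name": "Movie A", "Description": "Placeholder synopsis.", "Rating": 8.4},
    {"Name": "Movie B", "Description": "Placeholder synopsis.", "Rating": 8.1},
]
df = build_movie_frame(movies)
print(df)
```

From here, df.to_csv("movies.csv", index=False) writes the data to a file you can open as a sheet; df.to_excel() does the same for Excel if an engine such as openpyxl is installed.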
Now let’s call the functions.
Your result won’t look exactly like this, because we formatted the sheet a bit by adjusting the column widths and wrapping the text. Other than that, it should look like the one shown here.
BeautifulSoup and Selenium work well together and give good results, but there are other modules that are equally powerful.
If you have any other questions about scraping Amazon Prime Video data, you can always contact X-Byte!