
Apps have changed how we interact with the world. Shopping, music, news, and dating are just a few of the things you can now do from your phone. If you can think of it, there's probably an app for it. Some apps are better than others, and you can learn what people like and dislike about an app by analyzing the language of its user reviews. Sentiment analysis and topic modeling are two areas of Natural Language Processing (NLP) that can help with this, but not if you don't have any reviews to examine!

Before we get ahead of ourselves, we need to scrape and store some reviews. This blog will show you how to do just that with Python, using the google-play-scraper and PyMongo packages. There are several options for storing your scraped reviews; here we'll use MongoDB.

google-play-scraper provides real-time APIs for crawling the Google Play Store. It can be used to obtain:

  • App information, such as the app's title and description, its price, genre, and current version.

  • App reviews

You can use the app function to retrieve app information, and the reviews or reviews_all functions to get reviews. We'll cover the app function briefly before concentrating on how to get the most out of reviews. While reviews_all is convenient in some situations, we prefer working with reviews; once we get there, we'll explain why and how, with plenty of code.

Getting Started with Google-Play-Scraper

Step 1: Obtain App IDs

To scrape a mobile app, you'll need one piece of information: the app's ID code. You can find it in the URL of the app's page on the Google Play Store. The part you need comes right after “id=”.

Sometimes the URL ends with the app ID, and you can simply take everything after “id=”. In other cases, more parameters follow it, and you only need the section between “id=” and “&”.
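If you'd rather not copy the ID out of the URL by hand, a small helper can do it for you. This is just a convenience sketch using Python's standard library; the parse_app_id name is our own, not part of any package:

## Extract the value of the 'id' query parameter from a Play Store URL
from urllib.parse import urlparse, parse_qs

def parse_app_id(play_store_url):
    query = parse_qs(urlparse(play_store_url).query)
    return query['id'][0]

parse_app_id('https://play.google.com/store/apps/details?id=com.aurahealth&hl=en')
## Returns: 'com.aurahealth'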

For this project, we'll work with a collection of apps for mental health, mindfulness, and self-care. While exploring apps, we kept track of various details in a spreadsheet, which made it a natural place to record each app's ID.

Step 2: Installing and Importing

Here we'll install and import everything used below, including PyMongo. You'll also need to install MongoDB itself; the official MongoDB documentation has a guide for installing the Community Edition.
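The Python packages can all be installed with pip; something like the following should cover everything used below (these are the package names as published on PyPI):

pip install google-play-scraper pymongo pandas tzlocal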

With those packages in place, all of the following imports should succeed:

# for data manipulation
import pandas as pd

# for scraping app info and reviews from Google Play
from google_play_scraper import app, Sort, reviews

# for pretty printing data structures
from pprint import pprint

# for storing in MongoDB
import pymongo
from pymongo import MongoClient

# for keeping track of timing
import datetime as dt
from tzlocal import get_localzone

# for building in wait times
import random
import time

With MongoDB installed and running, we'll connect a client, create a new database for the project, and add new collections (essentially MongoDB's equivalent of tables in relational databases): one collection for app info and another for app reviews.

## Set up Mongo client
client = MongoClient(host='localhost', port=27017)

## Database for project
app_proj_db = client['app_proj_db']

## Set up new collection within project db for app info
info_collection = app_proj_db['info_collection']

## Set up new collection within project db for app reviews
review_collection = app_proj_db['review_collection']

MongoDB creates databases and collections lazily. This means that none of them actually exist until we start inserting documents (MongoDB's equivalent of rows in relational database tables) into our collections.
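You can see this laziness for yourself: the new database won't show up until the first document is inserted. A quick check, assuming the client set up above:

## Nothing has been inserted yet, so the project db doesn't exist
print('app_proj_db' in client.list_database_names())  # False
## After the first insert_many call below, this same check returns True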

Scraping App Data

Now we're ready to scrape. First, we need the list of app IDs. Download a CSV copy of the spreadsheet and read it into a Pandas DataFrame:

## Read in file containing app names and IDs
app_df = pd.read_csv('Data/app_ids.csv')
app_df.head()

From the DataFrame, we can pull lists of the app names and app IDs to loop through during scraping:

## Get list of app names and app IDs
app_names = list(app_df['app_name'])
app_ids = list(app_df['android_appID'])

We'll use the app names later, when we scrape reviews. To scrape general app information with the app function, all we need are the app IDs. The code below loops over each app, scraping its information from the Google Play Store and saving it in a list.

## Loop through app IDs to get app info
app_info = []
for i in app_ids:
    info = app(i)
    del info['comments']
    app_info.append(info)

## Pretty print the data for the first app
pprint(app_info[0])

The last line pretty-prints a dictionary with various details about our first app. The following is a shortened version of that output:

{'adSupported': None,
 'androidVersion': '4.1',
 'androidVersionText': '4.1 and up',
 'appId': 'com.aurahealth',
 'containsAds': False,
 'contentRating': 'Everyone',
 'contentRatingDescription': None,
 'currency': 'USD',
 'description': '<b>Find peace everyday with Aura</b> – discover thousands of ' … (truncated),
 'descriptionHTML': '<b>Find peace everyday with Aura</b> – discover ' … (truncated),
 'developer': 'Aura Health – Mindfulness, Sleep, Meditations',
 'developerAddress': '2 Embarcadero Center, Fl 8\nSan Francisco, CA 94111',
 'developerEmail': 'hello@aurahealth.io',
 'developerId': 'Aura+Health+-+Mindfulness,+Sleep,+Meditations',
 'developerInternalID': '8194778368040078712',
 'developerWebsite': 'http://www.aurahealth.io',
 'editorsChoice': False,
 'free': True,
 'genre': 'Health & Fitness',

Let's use PyMongo's insert_many method to save the app details in our info_collection. insert_many expects a list of dictionaries, which is exactly what we've just created.

## Insert app details into info_collection
info_collection.insert_many(app_info)
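If you'd like confirmation that the write went through, you can capture insert_many's return value instead and count the IDs MongoDB generated (a quick sanity check, not a required step):

## Variant: capture the result object to confirm the write
result = info_collection.insert_many(app_info)
print(f'Inserted {len(result.inserted_ids)} app info documents')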

Whenever you want to start working with the data, you can query the collection straight into a DataFrame with a single line of code!

## Query the collection and create DataFrame from the list of dicts
info_df = pd.DataFrame(list(info_collection.find({})))
info_df.head()
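One small note: MongoDB attaches its own _id field to every document, so the DataFrame above will include an _id column of ObjectIds. If you don't need it, it's easy to drop:

## Drop MongoDB's internal _id column if it just gets in the way
info_df = info_df.drop(columns=['_id'])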

Scraping App Reviews

It's usually better to work with the reviews function instead of reviews_all. Here are the reasons:

  • If you truly want all the reviews, you can still acquire them.
  • Instead of doing everything for a single app all at once, you can break the procedure into batches for each app. This is beneficial because it gives you options (see the batched sketch at the end of this post). You can:
    • Get regular updates on how many reviews you've scraped.
    • Save scraped data as you go, instead of waiting until the end.

Anatomy of the “Reviews” Function

The reviews function returns two variables. The first holds the review data we're after. The second is a continuation token, which is required when you want to scrape more reviews than count allows in a single call.

rvws, token = reviews(
    'co.thefabulous.app',  # app's ID, found in app's url
    lang='en',             # defaults to 'en'
    country='us',          # defaults to 'us'
    sort=Sort.NEWEST,      # defaults to Sort.MOST_RELEVANT
    filter_score_with=5,   # defaults to None (get all scores)
    count=100              # defaults to 100
    # , continuation_token=token
)
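That commented-out continuation_token argument is the key to batching: pass the token returned by one call into the next call, and scraping resumes where it left off. Below is a minimal sketch of the batched approach described above, assuming the imports and review_collection set up earlier. It scrapes in chunks, stores each chunk in Mongo as it goes, and sleeps briefly between requests; the batch size, wait times, and the app_name field we attach are our own choices, not requirements of the package:

## Scrape one app's reviews in batches, saving to Mongo as we go
app_name = 'Fabulous'
app_id = 'co.thefabulous.app'

## First batch; no continuation token yet
rvws, token = reviews(app_id, lang='en', country='us',
                      sort=Sort.NEWEST, count=200)

while rvws:
    # tag each review with the app it came from, then store the batch
    for r in rvws:
        r['app_name'] = app_name
    review_collection.insert_many(rvws)
    print(f'{dt.datetime.now(get_localzone())}: '
          f'stored {len(rvws)} reviews for {app_name}')

    # build in a polite, randomized wait before the next request
    time.sleep(random.uniform(1, 5))

    # resume where the last call left off
    rvws, token = reviews(app_id, continuation_token=token)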

Visit here: How Web Scraping Is Used To Scrape Google Play Store Data?
