
Apps have changed how we interact with the world. Shopping, music, news, and dating are just a few of the things you can now do from your phone. If you can think of it, there's probably an app for it. Some apps are better than others, and you can learn what people like and dislike about an app by analyzing the language of its user reviews. Sentiment analysis and topic modeling are two areas of Natural Language Processing (NLP) that can help with this, but not if you don't have any reviews to examine!

Before we get ahead of ourselves, we need to scrape and store some reviews. This blog will show you how to do just that with Python, using the google-play-scraper and PyMongo packages. There are several options for storing your scraped reviews; here we'll use MongoDB.

google-play-scraper provides real-time APIs for crawling the Google Play Store. It can be used to obtain:

  • App information, such as the app's title and description, its price, genre, and current version.

  • App reviews

You can use the app function to retrieve app information, and the reviews or reviews_all functions to get reviews. We'll cover the app function briefly before concentrating on how to get the most out of reviews. While reviews_all is convenient in some situations, we prefer working with reviews; once we get there, we'll explain why and how, with plenty of code.

Getting Started with Google-Play-Scraper

Step 1: Obtain App IDs

To scrape a mobile app, you'll need one piece of information: the app's ID code. You can find it in the URL of the app's page on the Google Play Store. The part you need comes right after “id=”.

Sometimes the URL ends with the app ID, and you can simply take everything after “id=”. In other cases, more parameters follow it, and you only need the section between “id=” and “&”.
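If you'd rather not copy the ID out of the URL by hand, a small helper can do it for you. This is just a convenience sketch using Python's standard library; the parse_app_id name is our own, not part of any package:

## Extract the value of the 'id' query parameter from a Play Store URL
from urllib.parse import urlparse, parse_qs

def parse_app_id(play_store_url):
    query = parse_qs(urlparse(play_store_url).query)
    return query['id'][0]

parse_app_id('https://play.google.com/store/apps/details?id=com.aurahealth&hl=en')
## Returns: 'com.aurahealth'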

For this project, we'll work with a collection of apps for mental health, mindfulness, and self-care. While exploring apps, we kept track of various details in a spreadsheet, which made it a natural place to record each app's ID.

Step 2: Installing and Importing

Here we'll install and import everything used below, including PyMongo. You'll also need to install MongoDB itself; the official MongoDB documentation has a guide for installing the Community Edition.
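The Python packages can all be installed with pip; something like the following should cover everything used below (these are the package names as published on PyPI):

pip install google-play-scraper pymongo pandas tzlocal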

With those packages in place, all of the following imports should succeed:

# for data manipulation
import pandas as pd

# for scraping app info and reviews from Google Play
from google_play_scraper import app, Sort, reviews

# for pretty printing data structures
from pprint import pprint

# for storing in MongoDB
import pymongo
from pymongo import MongoClient

# for keeping track of timing
import datetime as dt
from tzlocal import get_localzone

# for building in wait times
import random
import time

With MongoDB installed and running, we'll connect a client, create a new database for the project, and add new collections (essentially MongoDB's equivalent of tables in relational databases): one collection for app info and another for app reviews.

## Set up Mongo client
client = MongoClient(host='localhost', port=27017)

## Database for project
app_proj_db = client['app_proj_db']

## Set up new collection within project db for app info
info_collection = app_proj_db['info_collection']

## Set up new collection within project db for app reviews
review_collection = app_proj_db['review_collection']

MongoDB creates databases and collections lazily. This means that none of them actually exist until we start inserting documents (MongoDB's equivalent of rows in relational database tables) into our collections.
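You can see this laziness for yourself: the new database won't show up until the first document is inserted. A quick check, assuming the client set up above:

## Nothing has been inserted yet, so the project db doesn't exist
print('app_proj_db' in client.list_database_names())  # False
## After the first insert_many call below, this same check returns True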

Scraping App Data

Now we're ready to scrape. First, we need the list of app IDs. Download a CSV copy of the spreadsheet and read it into a Pandas DataFrame:

## Read in file containing app names and IDs
app_df = pd.read_csv('Data/app_ids.csv')
app_df.head()

From the DataFrame, we can pull lists of the app names and app IDs to loop through during scraping:

## Get list of app names and app IDs
app_names = list(app_df['app_name'])
app_ids = list(app_df['android_appID'])

We'll use the app names later, when we scrape reviews. To scrape general app information with the app function, all we need are the app IDs. The code below loops over each app, scraping its information from the Google Play Store and saving it in a list.

## Loop through app IDs to get app info
app_info = []
for i in app_ids:
    info = app(i)
    del info['comments']
    app_info.append(info)

## Pretty print the data for the first app
pprint(app_info[0])

The last line pretty-prints a dictionary with various details about our first app. The following is a shortened version of that output:

{'adSupported': None,
 'androidVersion': '4.1',
 'androidVersionText': '4.1 and up',
 'appId': 'com.aurahealth',
 'containsAds': False,
 'contentRating': 'Everyone',
 'contentRatingDescription': None,
 'currency': 'USD',
 'description': '<b>Find peace everyday with Aura</b> – discover thousands of ' … (truncated),
 'descriptionHTML': '<b>Find peace everyday with Aura</b> – discover ' … (truncated),
 'developer': 'Aura Health – Mindfulness, Sleep, Meditations',
 'developerAddress': '2 Embarcadero Center, Fl 8\nSan Francisco, CA 94111',
 'developerEmail': 'hello@aurahealth.io',
 'developerId': 'Aura+Health+-+Mindfulness,+Sleep,+Meditations',
 'developerInternalID': '8194778368040078712',
 'developerWebsite': 'http://www.aurahealth.io',
 'editorsChoice': False,
 'free': True,
 'genre': 'Health & Fitness',

Let's use PyMongo's insert_many method to save the app details in our info_collection. insert_many expects a list of dictionaries, which is exactly what we've just created.

## Insert app details into info_collection
info_collection.insert_many(app_info)
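If you'd like confirmation that the write went through, you can capture insert_many's return value instead and count the IDs MongoDB generated (a quick sanity check, not a required step):

## Variant: capture the result object to confirm the write
result = info_collection.insert_many(app_info)
print(f'Inserted {len(result.inserted_ids)} app info documents')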

Whenever you want to start working with the data, you can query the collection straight into a DataFrame with a single line of code!

## Query the collection and create DataFrame from the list of dicts
info_df = pd.DataFrame(list(info_collection.find({})))
info_df.head()
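One small note: MongoDB attaches its own _id field to every document, so the DataFrame above will include an _id column of ObjectIds. If you don't need it, it's easy to drop:

## Drop MongoDB's internal _id column if it just gets in the way
info_df = info_df.drop(columns=['_id'])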

Scraping App Reviews

It's usually better to work with the reviews function instead of reviews_all. Here are the reasons:

  • If you truly want all the reviews, you can still acquire them.
  • Instead of doing everything for a single app all at once, you can break the procedure into batches for each app. This is beneficial because it gives you options (see the batched sketch at the end of this post). You can:
    • Get regular updates on how many reviews you've scraped.
    • Save scraped data as you go, instead of waiting until the end.

Anatomy of the “Reviews” Function

The reviews function returns two variables. The first holds the review data we're after. The second is a continuation token, which is required when you want to scrape more reviews than count allows in a single call.

rvws, token = reviews(
    'co.thefabulous.app',  # app's ID, found in app's url
    lang='en',             # defaults to 'en'
    country='us',          # defaults to 'us'
    sort=Sort.NEWEST,      # defaults to Sort.MOST_RELEVANT
    filter_score_with=5,   # defaults to None (get all scores)
    count=100              # defaults to 100
    # , continuation_token=token
)
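That commented-out continuation_token argument is the key to batching: pass the token returned by one call into the next call, and scraping resumes where it left off. Below is a minimal sketch of the batched approach described above, assuming the imports and review_collection set up earlier. It scrapes in chunks, stores each chunk in Mongo as it goes, and sleeps briefly between requests; the batch size, wait times, and the app_name field we attach are our own choices, not requirements of the package:

## Scrape one app's reviews in batches, saving to Mongo as we go
app_name = 'Fabulous'
app_id = 'co.thefabulous.app'

## First batch; no continuation token yet
rvws, token = reviews(app_id, lang='en', country='us',
                      sort=Sort.NEWEST, count=200)

while rvws:
    # tag each review with the app it came from, then store the batch
    for r in rvws:
        r['app_name'] = app_name
    review_collection.insert_many(rvws)
    print(f'{dt.datetime.now(get_localzone())}: '
          f'stored {len(rvws)} reviews for {app_name}')

    # build in a polite, randomized wait before the next request
    time.sleep(random.uniform(1, 5))

    # resume where the last call left off
    rvws, token = reviews(app_id, continuation_token=token)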

Visit here: How Web Scraping Is Used To Scrape Google Play Store Data?
