Top 20 Python Pandas Interview Questions with Answers (2024)

1. What is Pandas?

Answer: Pandas is an open-source data manipulation and analysis library for Python. It provides data structures like Series and DataFrame, along with a wide range of functions for cleaning, transforming, and analyzing structured (tabular) data.

2. Explain the difference between Series and DataFrame in Pandas.

Answer: A Series is a one-dimensional labeled array capable of holding any data type, while a DataFrame is a two-dimensional labeled data structure resembling a table, where data is stored in rows and columns.
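A minimal sketch with made-up values, showing the difference in dimensionality and the fact that a single DataFrame column is itself a Series:

```python
import pandas as pd

# A Series: one-dimensional, with a single index of labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame: two-dimensional, with a row index and column labels
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (2, 2)

# Selecting one column of a DataFrame returns a Series
ages = df["age"]
print(type(ages).__name__)  # Series
```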

3. How can you read data from a CSV file into a DataFrame?

Answer: Pandas provides the read_csv function to read data from a CSV file into a DataFrame. For example:

import pandas as pd
df = pd.read_csv('filename.csv')
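read_csv also accepts options that save cleanup later. The sketch below uses io.StringIO with hypothetical CSV text in place of a real file, and the column names are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a real file path
csv_text = """id,name,signup_date,score
1,Ann,2024-01-05,88
2,Bob,2024-02-11,
3,Cara,2024-03-20,92
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",               # use the 'id' column as the row index
    parse_dates=["signup_date"],  # parse this column as datetime64
)
print(df.dtypes)  # the empty score becomes NaN, so 'score' is float64
```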

4. What is the purpose of the loc and iloc indexers?

Answer: loc is label-based indexing, and iloc is integer-position-based indexing. Use loc to select rows and columns by their index labels (label slices include the end label), and iloc to select them by integer position (slices exclude the end position, as in standard Python).
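The difference is easiest to see when the row labels are integers that do not match the row positions. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data with integer labels that differ from row positions
df = pd.DataFrame({"item": ["a", "b", "c"], "qty": [5, 8, 2]}, index=[10, 20, 30])

# loc: select by label; a label slice includes its end label
print(df.loc[20, "item"])             # b
print(df.loc[10:20, "qty"].tolist())  # [5, 8]

# iloc: select by integer position; a position slice excludes its end
print(df.iloc[0, 0])                  # a
print(df.iloc[0:2, 1].tolist())       # [5, 8]

# df.loc[0] would raise KeyError here, because 0 is a position, not a label
```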

5. Explain the concept of missing values in Pandas.

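Answer: Missing values in Pandas are usually represented by NaN (Not a Number); None in object columns, NaT for datetimes, and pd.NA in nullable dtypes are also treated as missing. Pandas provides methods like isnull() (alias isna()) to detect missing values, dropna() to remove them, and fillna() to replace them.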
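A short sketch of detecting, dropping, and filling missing values. The data is made up, and filling the numeric column with its mean is just one common choice, not the only correct one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [None, "x", "y"]})

print(df.isnull().sum())  # missing count per column: a 1, b 1
print(len(df.dropna()))   # 1: only the last row has no missing values

# Fill per column: the mean for the numeric column, a placeholder for the text column
filled = df.fillna({"a": df["a"].mean(), "b": "unknown"})
print(filled)
```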

6. How can you handle duplicate values in a DataFrame?

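Answer: To handle duplicate values, you can use the drop_duplicates() method, which removes duplicate rows from the DataFrame. By default, a row counts as a duplicate only if it matches an earlier row in every column, and the first occurrence is kept; the subset and keep parameters change this behavior.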

df.drop_duplicates(inplace=True)
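A sketch with hypothetical columns, contrasting a full-row duplicate check with deduplication on a single column. Reassigning the result, as below, is an alternative to inplace=True (which returns None):

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com", "a@x.com"],
    "plan":  ["free", "pro", "free", "pro"],
})

# Flags rows identical to an earlier row across all columns
print(df.duplicated().tolist())  # [False, False, True, False]

# Consider only 'email' and keep the last occurrence of each value
latest = df.drop_duplicates(subset=["email"], keep="last")
print(latest)
```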

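7. What is the use of the groupby() method in Pandas?

Answer: The groupby() method splits data into groups based on one or more keys (for example, the values of a column), applies a function such as sum() or mean() to each group independently, and combines the results. This is often called the split-apply-combine pattern.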
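A minimal split-apply-combine sketch on made-up sales data: agg() reduces each group to summary values, while transform() returns a result aligned with the original rows:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "amount": [100, 200, 150, 50, 250],
})

# One row per group, several aggregations at once
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(summary)

# transform keeps the original shape: each row's share of its region's total
sales["share"] = sales["amount"] / sales.groupby("region")["amount"].transform("sum")
print(sales)
```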

8. How do you concatenate two DataFrames in Pandas?

Answer: The concat() function concatenates two or more DataFrames along a particular axis: row-wise (axis=0, the default) or column-wise (axis=1).

result = pd.concat([df1, df2], axis=1) # Concatenate along columns
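A sketch with two small hypothetical frames, comparing the default row-wise concatenation with the column-wise form used above:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "val": ["c", "d"]})

# axis=0 (the default) stacks rows; ignore_index renumbers the result 0..n-1
rows = pd.concat([df1, df2], ignore_index=True)
print(rows.shape)  # (4, 2)

# axis=1 places the frames side by side, aligning them on the row index
cols = pd.concat([df1, df2], axis=1)
print(cols.shape)  # (2, 4)
```

Without ignore_index, the row-wise result would keep the original labels and contain the index values 0 and 1 twice.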

9. Explain the purpose of the merge function in Pandas.

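Answer: The merge function combines two DataFrames based on one or more common columns or index levels, similar to SQL joins. The how parameter selects the join type: 'inner' (the default), 'left', 'right', or 'outer', among others.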

merged_df = pd.merge(df1, df2, on='common_column')
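To see how the join type changes the result, here is a sketch with hypothetical orders and customers tables. The indicator=True option adds a _merge column recording which side each row came from:

```python
import pandas as pd

orders = pd.DataFrame({"cust_id": [1, 2, 2, 4], "total": [50, 20, 35, 10]})
customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ann", "Bob", "Cara"]})

inner = pd.merge(orders, customers, on="cust_id")             # only matching keys
left = pd.merge(orders, customers, on="cust_id", how="left")  # every order, NaN name if unmatched
outer = pd.merge(orders, customers, on="cust_id", how="outer", indicator=True)

print(len(inner), len(left), len(outer))  # 3 4 5
print(outer["_merge"].value_counts())     # both 3, left_only 1, right_only 1
```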

10. How can you apply a function to all elements in a DataFrame?

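Answer: The apply() method applies a function along an axis of a DataFrame: to each column (axis=0, the default) or to each row (axis=1). For a truly element-wise function, use DataFrame.map() in pandas 2.1 and later (called applymap() in earlier versions).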

df.apply(lambda x: x*2) # Example: Multiply all elements by 2
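A small sketch with made-up numbers, showing what the function receives under each axis setting:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series
col_sums = df.apply(lambda col: col.sum())
print(col_sums.tolist())  # [6, 60]

# axis=1: the function receives each row as a Series
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)
print(row_sums.tolist())  # [11, 22, 33]

# For plain arithmetic, a vectorized expression gives the same result as apply
doubled = df * 2
```

The x*2 example above works on every element only because multiplying a whole column Series by 2 is itself vectorized; for simple arithmetic, writing df * 2 directly is typically faster than apply.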

11. What is the purpose of the pivot_table function in Pandas?

Answer: The pivot_table function is used to create a spreadsheet-style pivot table as a DataFrame, providing a convenient way to summarize and analyze data.

pivot_df = pd.pivot_table(df, values='value', index='index', columns='columns', aggfunc='sum')
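A concrete version of the call above with hypothetical month, product, and units columns. One month/product combination has no rows, which is where fill_value comes in:

```python
import pandas as pd

df = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "product": ["A", "A", "A", "B", "B"],
    "units":   [5, 3, 2, 4, 6],
})

pivot = pd.pivot_table(
    df, values="units", index="month", columns="product",
    aggfunc="sum",
    fill_value=0,  # Jan/B has no rows; show 0 instead of NaN
)
print(pivot)
```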

12. How can you change the data type of a column in a DataFrame?

Answer: The astype() method can be used to change the data type of a column in a DataFrame.

df['column_name'] = df['column_name'].astype('new_data_type')
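In practice, astype raises an error if any value cannot be converted. The sketch below, with hypothetical columns, pairs astype with pd.to_numeric(errors="coerce"), a common alternative that turns unparseable values into NaN:

```python
import pandas as pd

df = pd.DataFrame({"qty": ["1", "2", "3"], "price": ["9.5", "n/a", "4.0"]})

# Works because every value in 'qty' parses cleanly as an integer
df["qty"] = df["qty"].astype("int64")

# astype would raise on "n/a"; to_numeric with errors="coerce" yields NaN instead
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Converting to the 'category' dtype can save memory for repeated values
df["qty_cat"] = df["qty"].astype("category")
print(df.dtypes)
```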

13. Explain the resample function in Pandas.

Answer: The resample function is used for frequency conversion and resampling of time-series data. It can be used to upsample or downsample time-series data.

df.resample('D').sum() # Resample data to daily frequency (requires a DatetimeIndex)
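A sketch of both directions on a synthetic hourly series (the values are just 0 to 47): downsampling aggregates many rows into fewer, and upsampling creates new timestamps that must be filled:

```python
import pandas as pd

# Two days of hourly readings on a DatetimeIndex
idx = pd.date_range("2024-01-01", periods=48, freq="h")
ts = pd.Series(range(48), index=idx)

# Downsample: 48 hourly values -> 2 daily totals
daily = ts.resample("D").sum()
print(daily.tolist())  # [276, 852]

# Upsample: daily -> 12-hourly, forward-filling the new timestamps
half_daily = daily.resample("12h").ffill()
print(half_daily.tolist())  # [276, 276, 852]
```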

14. What is the purpose of the pd.cut function in Pandas?

Answer: The pd.cut function is used to segment and sort data values into bins, allowing for easier analysis of continuous data.

bins = [0, 25, 50, 75, 100]
labels = ['Low', 'Medium', 'High', 'Very High']
df['category'] = pd.cut(df['values'], bins=bins, labels=labels)
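Bin edges are right-inclusive by default, which matters at the boundaries. A sketch reusing the bins and labels above on made-up values:

```python
import pandas as pd

df = pd.DataFrame({"values": [5, 25, 26, 60, 99, 0]})
bins = [0, 25, 50, 75, 100]
labels = ["Low", "Medium", "High", "Very High"]

# Intervals are (0, 25], (25, 50], ...: 25 lands in 'Low', 0 falls outside (NaN)
df["category"] = pd.cut(df["values"], bins=bins, labels=labels)

# include_lowest=True makes the first interval closed on the left, so 0 -> 'Low'
df["category_incl"] = pd.cut(df["values"], bins=bins, labels=labels, include_lowest=True)
print(df)
```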

15. How do you handle outliers in a DataFrame?

Answer: Outliers can be handled by filtering data based on a defined range or using statistical methods like Z-score to identify and remove outliers.

mean = df['column'].mean()
std = df['column'].std()
df_no_outliers = df[(df['column'] > mean - 2 * std) & (df['column'] < mean + 2 * std)]
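The filter above keeps values within two standard deviations of the mean. A common alternative, swapped in here as a different technique, is the interquartile-range rule (Tukey's fences), which is less sensitive to the outliers themselves. A sketch on made-up data, showing both removal and capping:

```python
import pandas as pd

df = pd.DataFrame({"column": [10, 12, 11, 13, 12, 95]})

q1 = df["column"].quantile(0.25)
q3 = df["column"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop rows outside the fences
df_no_outliers = df[df["column"].between(lower, upper)]

# Option 2: cap extreme values at the fences instead of dropping rows
capped = df["column"].clip(lower=lower, upper=upper)
```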

16. Explain the concept of hierarchical indexing in Pandas.

Answer: Hierarchical indexing, or MultiIndex, enables the creation of DataFrames with multiple index levels, allowing for more complex and hierarchical data structures.
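A minimal sketch of a two-level index built from (year, quarter) pairs; the revenue figures are made up. It shows selecting an outer level, a full key, and an inner level with xs():

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
df = pd.DataFrame({"revenue": [100, 120, 130, 150]}, index=idx)

print(df.loc["2024"])                     # all quarters of 2024 (outer level)
print(df.loc[("2023", "Q2"), "revenue"])  # 120 (full key)
print(df.xs("Q1", level="quarter"))       # Q1 across all years (inner level)
print(df.groupby(level="year").sum())     # aggregate by one index level
```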

17. How can you handle datetime data in Pandas?

Answer: Pandas provides the to_datetime function to convert a column to datetime format. Once converted, the .dt accessor exposes datetime-related properties and methods (year, month, day_name(), and so on) for extracting information from the column, and subtracting datetimes produces Timedelta values.

df['datetime_column'] = pd.to_datetime(df['datetime_column'])
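A sketch on hypothetical timestamps, including one invalid string to show errors="coerce", which converts unparseable values to NaT instead of raising:

```python
import pandas as pd

df = pd.DataFrame({"datetime_column": ["2024-01-15 08:30", "2024-03-02 17:45", "not a date"]})

df["datetime_column"] = pd.to_datetime(df["datetime_column"], errors="coerce")

# Extract components through the .dt accessor (NaT rows yield missing values)
df["year"] = df["datetime_column"].dt.year
df["weekday"] = df["datetime_column"].dt.day_name()

# Subtracting two timestamps gives a Timedelta
gap = df["datetime_column"].iloc[1] - df["datetime_column"].iloc[0]
print(df)
print(gap)  # 47 days 09:15:00
```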

18. What is the purpose of the cumsum() method in Pandas?

Answer: The cumsum() method calculates the cumulative sum of a column, returning a Series of running totals; assigning that Series to a new column stores the running total alongside the original data.

df['cumulative_sum'] = df['column'].cumsum()
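cumsum() also combines with groupby() when the running total should restart for each group. A sketch with a hypothetical store column; note that totals follow the current row order, so sort first if order matters:

```python
import pandas as pd

df = pd.DataFrame({
    "store":  ["A", "A", "B", "A", "B"],
    "column": [10, 5, 7, 3, 1],
})

# Running total over all rows
df["cumulative_sum"] = df["column"].cumsum()

# Running total that restarts within each store
df["store_running"] = df.groupby("store")["column"].cumsum()
print(df)
```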

19. How do you export data from a DataFrame to a CSV file?

Answer: The to_csv method is used to export data from a DataFrame to a CSV file.

df.to_csv('output_file.csv', index=False)
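A round-trip sketch: io.StringIO stands in for the output file so the written text can be inspected, and the data is made up. With index=False the row labels are not written, and missing values are written as empty fields by default:

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "score": [88.5, None]})

buffer = io.StringIO()  # stands in for a path such as 'output_file.csv'
df.to_csv(buffer, index=False)
text = buffer.getvalue()
print(text)  # header 'id,score', then '1,88.5' and '2,'

# Reading the text back recovers the same shape, with NaN for the empty field
back = pd.read_csv(io.StringIO(text))
```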

20. Explain the stack and unstack functions in Pandas.

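Answer: The stack() method pivots the columns of a DataFrame into the innermost level of the row index (producing a Series when the columns have a single level), while unstack() does the reverse, pivoting the innermost row index level back into columns.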

stacked_df = df.stack()
unstacked_df = stacked_df.unstack()
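A small sketch on a made-up score table, showing the two-level index that stack() creates and that unstack() restores the original layout:

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 75], "art": [60, 85]}, index=["Ann", "Bob"])

# Columns move into the row index: a Series keyed by (student, subject)
stacked = df.stack()
print(stacked.index.nlevels)    # 2
print(stacked[("Bob", "art")])  # 85

# The innermost index level moves back into columns
unstacked = stacked.unstack()
print(unstacked)
```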

Mastering these Python Pandas interview questions will help you not only showcase your expertise but also handle real-world data manipulation and analysis tasks effectively. Keep practicing and exploring more advanced features of Pandas to stay ahead in the dynamic field of data science.