Python — Spreadsheet Data Query

Jupyter Notebook and Pandas library

shah.hassan
2 min readOct 23, 2021

You know what, I can’t run away from SPREADSHEET. It’s following me doesn’t matter where I am or how well I structurally store the data at the database level; at the end of the day the same data will hit my table in the spreadsheet file format 😂 I met Pandas back in 2013 and this Python library is an awesome weapon — with active community. We can write query, chop and combine the data to get the expected outcome with Pandas. Check this simple use case (countries & population query) as your future reference and let’s fight spreadsheet in more heroic way. So.., who needs SQL? 😛

Query — Single Column

Objective : Select country in ASIA.

Query — Multiple Column

Objective : Select country in ASIA with population greater than 30M.

Full Script

import pandas as pdimport numpy as npfile_path = “/Users/johndoe/Desktop/Data Science/Countries of the world.xls”pd.set_option(‘display.float_format’, ‘{:.0f}’.format)df = pd.read_excel(file_path, skiprows=3)display(df)byregion = df[df[‘Region’].str.contains(“ASIA”, na=False)]display(byregion)df[‘Population’] = df[‘Population’].fillna(0)df[‘Population’] = df[‘Population’].astype(int)byregion = df[df[‘Region’].str.contains(“ASIA”, na=False) & df[‘Population’].ge(30000000)]display(byregion)

--

--

shah.hassan

Software engineering and film. All are work-in-progress.