Create and Modify Excel File Using Python

Almas Myrzatay
DataDrivenInvestor
Published in
8 min readOct 5, 2022

--

Some might disagree, but I think the world runs on Excel. Whether it is government agencies, corporation small or large, each of them use spreadsheet and most notably Microsoft product — Excel.

In particular, I want to show how to create editable charts. Previously, I showed how we can generate dynamic graphs and charts using Python, which in turn can be exported. However, they are non-editable. In this case, I want to highlight how we can automate the process of Excel file generation and create editable charts.

Let’s begin with discussion of more in-depth motivation of why using programming language like Python is useful to generate Excel file.

Table of Content

  1. Motivation
  2. Environment & Workspace Set Up
  3. Create and/or Read Excel File
  4. Two Ways to Add or Get Data
  5. Generate Editable Chart using Python
  6. Conclusion

1. Motivation

So why create Excel file in programming language like Python if we can manually create the file?

First, in this day and age we tend to work with large datasets. Furthermore, they are constantly growing and changing (and often in real time). As a result, we are in need of tools that will allow us connecting to databases and/or fetching data via from somewhere (e.g. API calls). In sum, we need to process large amounts of data that are constantly being updated and pulling data from Excel file directly could be both cumbersome and slow.

Second, we can automate the process of Excel file generation. In cases when one has large models, this might not be the case. But imagine a use case where we may want to automate Excel file generation with one click of a button. The process will allow us automation of file generation and will make it easy to make adjustments in the future.

2. Environment & Workspace Set Up

In the previous article, I have shown how to set up virtual environment in Python and install dependencies. Follow the link here and refer to Section 2. Set up Python Environment & Plotly Installation

Now, clone this GitHub repo into your local computer using your IDE (e.g. VSCode):

git clone https://github.com/AlmasM/Medium.git

Then navigate to directory:

py_sample_excel_spreadsheet

You will see requirements.txt file. Install dependencies into your virtual environment:

pip install -r requirements.txt

This will install all the necessary dependencies we use in this tutorial

At this point you have workstation set up and we are ready to go into next steps.

Create and/or Read Excel File

Navigate to excel_file.py within cloned repo and you will that we are creating Python class ExcelBook. Let’s break down process line by line:

Lines 1–8: important statements we use including chart creation, openpyxl library that let’s us create and modify Excel file using Python.

Line 11–17: we are creating Python class ExcelBook with constructor that contains variables that will be accessible within this class.

Lines 19–24: create auxiliary method that checks if file with given name exists and return boolean value.

If you want to invoke this class, you can create main.py file and write the following:

import os
from excel_file import ExcelBook
cwd = os.getcwd() # get current working directory
fileName = "SampleSpreadsheet.xlsx" # give it a name
filePath = os.path.join(cwd, fileName) # create file path
excelObject = ExcelBook(filePath) # instantiate object
isFileExists = excelObject.fileExists

The following code will output whether if file with given name exists. Note that we are just validating file exists, but not yet whether it is an Excel file

Now, let’s create two methods getFile and createFile, which will take as an input and either fetch Excel file or create if one doesn’t exist. In this case, we will ensure the file is loadable (aka is valid Excel file)

Both methods getFile and createFile use aux method fileExists we created above. Depending on the outcome of the aux method, we will load the notebook (Excel file) or create one with openpyxl methods load_worksheet or create_sheet, respectively.

In case of getFile, we take as a parameter name of the worksheet. Remember that Excel file can have more than one worksheet. In this method, we will first see if requested sheet exists. If sheet DNE, we will create one.

In the createFile, we verified file DNE, and then we will try to create file and worksheet as shown on lines 50–57.

Now, let’s use instantiated object excelBook and create or get file.

workSheetName = "Sheet-List"
currentWorkSheet = excelObject.getFile(workSheetName)
if not currentWorkSheet:
isFileCreated = excelObject.createFile(workSheetName)

Recall that we require sheet name provided for either getting or creating the file. The snippet will get file if one exists with given sheet name and if not, we will create file. Also note that we have already created excelObject at this point and there is no need to recreate it again.

Two Ways to Add or Get Data

There are at least two ways to add data into Excel worksheet. If you have ever worked with Python and data, chances that you have seen pandas library is pretty high. In this section we will add into Excel using random data structure and pandas dataframe. We will also use both methods to fetch the data.

You may wonder, why even use openpyxl library if pandas gives us methods like to_excel() or read_excel() ?

It is true that you can use pandas.to_excel() method to convert pandas dataframe into Excel format. You can also use pandas.read_excel() method to read Excel file as Pandas dataframe. In general, pandas provides methods to fetch and manipulate Excel file to do some calculations. However, these methods are limited in functionality. In other words, pandas doesn’t let you create charts or various formulas into the spreadsheet like openpyxl does.

Let’s start with adding the data. Lines 62–65 show how we are takin a list of lists and adding each list as a new row into the spreadsheet.

Lines 67–76: library openpyxl provides method dataframe_to_rows() which automatically converts pandas dataframe into Excel file along with header and index column. We can also specify styling as in lines 73–74.

Note in both cases we need to .save() the workbook we are working on, or else data won’t be displayed.

You can invoke both methods above on our excelObject as follows:

data = [ ['Technology Type', 'Market Share(%)'], ['Phone', 50], ['PC', 20], ['Tablets', 20],['Smart Watch', 10]]df = pd.DataFrame(data)

In the snippet above, we created sample dataset and dataframe using pandas.

Now, we can write either:

excelObject.addData(data)ORexcelObject.addDataFromDataFrame(df)

We may need to use one or the other depending on how our dataset is incoming and being preprocessed. Ultimately it comes down to your project requirements when going with one or the other

Fetching the data will be similar process. Use excelObject such as this:

data_result = excelObject.getDataRowByRow()ORdf_result = excelObject.getDataAsDataFrame()

Note that we haven’t demonstrated methods getDataRowByRow() or getDataAsDataFrame() yet. The point was to illustrate the benefit of creating a class and how we access it’s values when using a class.

In general, reading data using openpyxl library is pretty straightforward.

Lines 78–85: we iterate row by row of the spreadsheet and create list of lists to store the data

Lines 87–91: we use built in method worksheet.values to fetch only the values of the worksheet where we are working. Note that it is openpyxl library .values that gives us only values which in turn are converted into dataframe.

The last part of this tutorial is to create editable graph in Python and be exported in Excel. I believe all of the work was leading up to this feature.

Generate Editable Chart using Python

Again, we could have used pandas helper methods to work with Excel file, however, this is the benefit of using openpyxl library. We can create editable charts in the Excel. In fact, we can do much more and I recommend checking out their documentation to view their capabilities (e.g. pivot tables, formulas, styling, etc).

In the snippet below we are creating pie chart and passing in data to display. Lines 95–96 define labels and data that will be passed into the chart itself. We can also define where the chart can start on line 101 by setting start cell to be D1. Lastly, ensure save the changes or otherwise it won’t be reflected.

To invoke this file we can use sample data we created above and use excelObject again. All we need to do is pass in data and write following invocation:

excelObject.createPieChart(data)

The final view will be editable pie chart along with inserted data as shown below:

You can find working source code in GitHub page here.

Conclusion

In article we created and modified Excel file using Python. Why do it outside of Excel file itself? Because datasets we work with are growing in size and/or there might be a need to automate the process of Excel file generation.

In general, if you need to create complex Excel files for budgeting with many formulas, this might not be the most useful path. However, imagine scenario where user may want to export their dashboard chart in an editable format with just one click. In this case, you won’t be able to generate Excel file manually for each user. As a result, using programming language like Python to automate the process might come in handy.

Thanks for reading my article!

Check out my other stories and make sure to follow me for more beginner friendly tech content!

If you liked it, or have any comments/questions, let me know! Feel free to connect on social media: Instagram and LinkedIn

Subscribe to DDIntel Here.

Join our network here: https://datadriveninvestor.com/collaborate

--

--