The XLRD python module for reading data in spreadsheets

The XLRD python module for reading data in spreadsheets - Quickstart

In the previous post we have seen, how to install xlrd module. If you already have xlrd module installed on your computer then go ahead with this section other wise install the module having a look from here: The xlrd python module for reading excel spreadsheets - Installation.

Eager to get started?This post will give you a good introduction to get started with the xlrd.

First make sure that

xlrd is installed on your computer.$ pip freeze. Check if xlrd is present in the list.
xlrd is up-to-date. In IDLE xlrd.info.__VERSION__ to see the version and check if you are up-to-date.

Lets get started with something simple.

Terminology

Excel files are also called as workbooks or simply books. Also there can be n number of sheets in a given workbook.

Sheets in Excel File or workbook

A cell is a box which contains only one entry or a single piece of data.

Cell in Excel Worksheet

A row is a horizontal group of cells.

Row in Excel worksheet

A column is a vertical group of cells.

Column in excel worksheet

Working with workbook

Lets start with a small python script(script#1).

import xlrd

#Opening the Workbook
workbook = xlrd.open_workbook('roc.xlsx')

#Opening the workbook based on the sheet name
sheet = workbook.sheet_by_name('Sheet1')

#Reading the value of a cell
#sheet.cell_value(row,col)
cell_2_3 = sheet.cell_value(2,3)

#Print the cell value
print(cell_2_3)

If you will run the above script then you will get the following output:

3.0

So what actually happened in the code?

First we have imported the xlrd module.
We have opened our workbook using open_workbook.
We then opened the sheet in the workbook using its name with the help of sheet_by_name
After we have opened the sheet we have accessed the value of the cell in row 2 and column 3 using sheet.col_value(rowx,coly)
Finally we have printed the value of the variable cell_2_3.

As simple as it looks.
Now we will see some functions in details so that by the end of this post you will be comfortable to program with xlrd.

open_workbook

When you will call this function with the on_demand argument then the user will save time and memory by only loading the sheets which are required and releasing the sheets that are not required.

By default on_demand = False.
If you will use on_demand = True with BIFF <5.0 then you will get an error message and on_demand is changed to the default False
If you will use on_demand = True with BIFF version ≥ 5.0 then all the global data is loaded and when the users requests a required sheets, the required sheet is opened and all the other sheets are released, thus saving memory and time.

So we can use open_workbook function as follows:

xlrd.open_workbook(filename='yourfilename.xls', ondemand=True)

An another important argument you need to know is the encoding_override. If you have read the introduction, then we have discussed that xlrd by default uses Unicode encoding. But when your excel file is not having the Unicode encoding then xlrd will try to convert the encoding your excel file has to Unicode. But sometimes python(xlrd) will not be able to convert all the encoding to Unicode and it will raise an error. At that time you will have to add this argument to the open_workbook function and you will be good to go.

book = xlrd.open_workbook(filename='yourfilename.xls',encoding_override="cp1252")

Lets dig deep... Open your interpreter and we will see few other important things that we can do with the book object that we have created earlier. Let's start again from scratch.

>>> import xlrd

Opening the workbook

>>> book = xlrd.open_workbook(filename='roc.xlsx', on_demand=True)

To see who created the workbook(or the author of the workbook):

>>> book.user_name

To see the number of sheets the workbook posses:

>>> book.nsheets

To see the names of the sheets:

>>> book.sheet_names()

To open the sheet based on the name of the sheet:

>>> book.sheet_by_name('Name of sheet')

Sheets in excel sheets are indexed as 0,1,2,3... corresponding to first sheet, second sheet, third sheet... respectively. So to open the sheet based on the index:

>>> book.sheet_by_index(0/1/2....)

To check if the sheet is loaded:

>>> book.sheet_loaded(0/1/2..) #Check the loaded sheet by index
>>> book.sheet_loaded('Sheet name') #Check the loaded sheet by its name

To Unload the sheet which we have loaded or opened:

>>> book.unload_sheet(0/1/2...) #Based on index
>>> book.unload_sheet('Sheet name') #Based on its name

Even though we have loaded the workbook all our data is stored in the form of sheets and thus we will work on sheets now.
But unlike just calling the function we will store the value returned by the function sheet_by_index or sheet_by_name after loading the sheet. See the following:

>>> sheet = book.sheet_by_index(0) #Enter the index of the sheet

>>> sheet = book.sheet_by_name('Sheet1') #Enter the name of the sheet

See the name of the sheet:

>>> sheet.name

Check if the sheet is visible:

>>> sheet.visibility

0 means visible
1 means hidden

Find the number of columns(Total number of vertical columns in the sheet):

>>> sheet.ncols

Find the number of rows in the sheet(Total number of horizontal rows in the sheet):

>>> sheet.nrows

To get the length of a given row:

>>> sheet.row_len(row_number)

Get the cell value:

>>> sheet.cell_value(row_number, column_number)

Get the cell value including the data type:

>>> sheet.cell(row_num, col_num)

You can also access the value of the cell using sheet.cell method as follows:

>>> sheet.cell(row_num,col_num).value

To find the data type of the cell:

>>> sheet.cell_type(row_num,col_num)

But if you will call the above function you will only get numbers which correspond to the following table:

Table 1
Number	Data type
0	Empty String
1	String which is not empty
2	float - number
3	Date as a float number
4	Boolean Where 1 - True; 0 - False
5	int - number

To get all the column values:

>>> sheet.col_values(col_num)

Get the values of the column after slicing:

>>> sheet.col(col_num)

To get the column values along with the data type:

>>> sheet.col_slice(col_num, start_rowx=some-value, end_rowx=some-value)

If the values of start_row and end_row are not given the default values are assumed. The default values are start_rowx=0, end_rowx=None. So if you will call the function without passing all the parameters, then you will get all the values of the column.

>>> sheet.col_slice(col_num) #works sames as sheet.col_values(col_num)

Find the data types of the column:

>>> sheet.col_types(col_num)

Running the above code will give you a list with numbers. See table 1 to find the data type of each and every element in the column.

To get the row values:

>>> sheet.row_values(row_num)

To get the row values along with the data type:

>>> sheet.row(row_num)

Get the values of the row after slicing:

>>> sheet.row_slice(row_num)

row_slice works similar to col_slice. But here the optional parameters are start_colx and end_colx which by default are equal to 0 and None respectively.

To find the data types of the row:

>>> sheet.row_types(row_num)

To get the list of merged cells in the spreadsheet:

>>> sheet.merged_cells

According to official documentation this will only work if open_workbook() which we call to open the workbook, is called with parameter formatting_info=True. But when I tried it without passing the argument value to True, it worked!

Summary

We have seen all the basic functions and operations which we can do with the xlrd module. Using xlrd we can also go further and get all the UI info of the sheet. If you want me to add about the xf class for getting all the UI information, please do comment in the comment box below and I will be glad to add it in this post.

As always I have tried to write each and every part in this post so that it is easy for everyone to understand. But if you have any doubt or didn't understand anything then please do comment in the comment box below. I will reply to you as soon as possible. You can contact me if you want to from here: Contact me

Also please do comment on how I can improve this post so that it will be helpful even for beginners. Please don't hesitate to comment if I have made any typo or you want me to add anything to this post.

"Gaining knowledge is the first step to wisdom,
Sharing knowledge is the first step to humanity. "

Thank you, have a nice day :)

References:

How can I get a list of locally installed python modules
Official xlrd module documentation

Problem 60 Project Euler Solution with python

Prime pair sets The primes 3, 7, 109, and 673, are quite remarkable. By taking any two primes and concatenating them in any order the result will always be prime. For example, taking 7 and 109, both 7109 and 1097 are prime. The sum of these four primes, 792, represents the lowest sum for a set of four primes with this property. Find the lowest sum for a set of five primes for which any two primes concatenate to produce another prime. This problem is j u st a brute force problem. If you have come here because you don't know the limit upto which you will h ave to gener ate the prime numbers t hen go ahe ad and t r y with 10,000 . When I first start ed solving the problem I chose 1 million(beca use most of the problem s on project E uler have this limit ), but it took very long for the computer to fin d the solution. After searching on the internet then I found many people choosing 10, 000 so I have changed my in put f rom 1 million to 10000 and the output was f ast. He...

RADIUS OF CIRCLE

Search This Blog