In this chapter you will learn about one of the most essential data structures for data analysis and cleaning: the DataFrame.
DataFrame
A DataFrame is a two-dimensional, labelled data structure with rows and columns. Tabular datasets such as Excel spreadsheets and SQL tables can be seen as examples of DataFrames.
A DataFrame is made up of one or more Series, with each column being a Series.
[Table: report cards of different students, listed by registration number and name]
This is an example of a tabular dataset, or DataFrame, with multiple rows and columns. Here the table organises the report cards of different students by their registration number and name.
A DataFrame is very convenient and efficient for organising data and studying it.
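The basic way to create a DataFrame is to call its constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
Here data is the content of the table, index gives the row labels, columns gives the column labels, dtype forces a data type, and copy controls whether the input data is copied.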
In the example, pandas is imported under the alias pd, and the pd.DataFrame() function is called to create a pandas DataFrame. The data from which the DataFrame is built is passed to this function. Here the dictionary {"Roll No.": [1, 2, 3, 4, 5, 6], "Name": ["Ali", "Raj", "Karan", "Mohan", "Tina", "Reena"], "Marks(%)": [98, 90, 87, 88, 76, 79]} is passed as the data to pd.DataFrame().
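The call described above can be written out as the following minimal sketch. The dictionary is the one given in the text; the variable names data and df are illustrative assumptions, not taken from the original.

```python
import pandas as pd

# Dictionary from the text: each key becomes a column label,
# and each list holds the values of that column.
data = {
    "Roll No.": [1, 2, 3, 4, 5, 6],
    "Name": ["Ali", "Raj", "Karan", "Mohan", "Tina", "Reena"],
    "Marks(%)": [98, 90, 87, 88, 76, 79],
}

# `df` is an illustrative name. No index is passed, so pandas
# assigns the default integer row labels starting at 0.
df = pd.DataFrame(data)
print(df)
```

Because all three lists have the same length, each one lines up as a column of six rows.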
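OUTPUT
| | Roll No. | Name | Marks(%) |
|---|---|---|---|
| 0 | 1 | Ali | 98 |
| 1 | 2 | Raj | 90 |
| 2 | 3 | Karan | 87 |
| 3 | 4 | Mohan | 88 |
| 4 | 5 | Tina | 76 |
| 5 | 6 | Reena | 79 |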
Here each column (Roll No., Name, Marks(%)) is a pandas Series. The columns hold different data types: Roll No. and Marks(%) contain integers, while Name contains strings.
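One way to check this is to select a single column and inspect its type, then look at the data type of every column. This is a small sketch that rebuilds the same DataFrame so it runs on its own; the names df and name_col are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Roll No.": [1, 2, 3, 4, 5, 6],
    "Name": ["Ali", "Raj", "Karan", "Mohan", "Tina", "Reena"],
    "Marks(%)": [98, 90, 87, 88, 76, 79],
})

# Selecting one column with square brackets returns a pandas Series.
name_col = df["Name"]
print(type(name_col))

# dtypes lists the data type of each column: integer types for the
# numeric columns and a string/object type for the Name column
# (the exact label depends on the pandas version).
print(df.dtypes)
```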
The Pandas DataFrame provides immense help in organising, viewing and studying data.
We can use the index and columns parameters of pd.DataFrame() to set the row labels and column labels of the DataFrame explicitly.
In the example, the two-dimensional list [[1, 2, 3], [11, 22, 33], [111, 222, 333]] is passed as the data for the DataFrame, index = ["1st", "2nd", "3rd"] is passed as its row labels, and columns = ["a", "b", "c"] is passed as its column labels.
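The call described above can be sketched as follows. The data, index and columns values are those given in the text; the variable name df2 and the final label lookup are illustrative additions.

```python
import pandas as pd

# Each inner list becomes one row of the DataFrame.
df2 = pd.DataFrame(
    [[1, 2, 3], [11, 22, 33], [111, 222, 333]],
    index=["1st", "2nd", "3rd"],   # row labels
    columns=["a", "b", "c"],       # column labels
)
print(df2)

# The custom labels can then be used to look up a single value.
print(df2.loc["2nd", "b"])
```

The number of index labels must match the number of rows, and the number of column labels must match the length of each inner list.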
OUTPUT