Welcome back to our series on using Pandas. In this part, we continue the analysis with more Pandas functions and techniques, using the Titanic dataset for data exploration.
Recap
Before diving in, let's quickly recap what we've done so far. We loaded the Titanic dataset into a Pandas DataFrame, examined its structure and data types, and performed some basic data exploration.
import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head()
type(df)
df.dtypes
df.describe()
Exploring Pandas Series
In this blog post, we'll delve into the Pandas library and explore Series, one of its fundamental data structures. We'll go through various operations and methods to understand Series better.
Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type. It consists of two arrays: one containing the data (values) and the other containing the corresponding labels (index).
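To make the values/index pairing concrete, here is a minimal sketch that builds a small Series by hand. The data and the variable names (ages, values, labels) are made up for illustration and are not taken from the Titanic dataset.

```python
import pandas as pd

# Hypothetical toy data: three values, each paired with a string label.
ages = pd.Series([22, 38, 26], index=["a", "b", "c"], name="age")

values = ages.values        # NumPy array holding the data
labels = list(ages.index)   # the labels that identify each value

print(values)
print(labels)
print(ages["b"])            # look up a value by its label
```

The same two parts exist on any Series, including one taken from a DataFrame column.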
Let's start by creating a Series from the 'Name' column of the Titanic dataset.
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.columns
s = df['Name'][0:10]
Basic Operations on Series
Let's explore some basic operations on Series:
len(s)
type(s)
s[0]
l = ['sudh','b','c','d','e','f','g','h','i','j']
s1 = pd.Series(list(s), index=l)
s1
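Once a Series has custom string labels like s1, it helps to be explicit about whether a lookup is by label or by position. The sketch below uses hypothetical names and labels; it relies only on the standard .loc and .iloc accessors. Recent pandas versions warn when a plain integer such as s1[0] is used on a string-labeled Series, so the explicit accessors are the safer habit.

```python
import pandas as pd

# Hypothetical sample values with custom string labels.
names = pd.Series(["Alice", "Bob", "Carol"], index=["x", "y", "z"])

by_label = names.loc["y"]          # lookup by index label
by_position = names.iloc[1]        # lookup by integer position
label_slice = names.loc["x":"y"]   # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```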
Combining Series
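We can combine two or more Series with pd.concat(). Older tutorials use the Series.append() method, but it was deprecated in pandas 1.4 and removed in pandas 2.0.
s2 = pd.concat([s1, s])
s2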
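Combining Series keeps each original index label, which can leave duplicates in the result. The following sketch, using made-up data, shows that behavior and how the ignore_index option of pd.concat() renumbers the result instead.

```python
import pandas as pd

# Hypothetical data: both Series use the labels 0 and 1.
first = pd.Series(["a", "b"], index=[0, 1])
second = pd.Series(["c", "d"], index=[0, 1])

kept = pd.concat([first, second])                            # labels preserved
renumbered = pd.concat([first, second], ignore_index=True)   # fresh 0..n-1 labels

print(list(kept.index))
print(kept.index.is_unique)
print(list(renumbered.index))
```

Whether to keep or reset the labels depends on whether they carry meaning; for a plain row counter, resetting is usually the less surprising choice.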
Mathematical Operations on Series
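Series support element-wise mathematical operations. Pandas aligns the operands on their index labels before computing, so a label that appears in only one of the two Series produces NaN in the result.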
s4 = pd.Series([3, 4, 5, 6, 6], index=[2, 4, 5, 6, 1])
s5 = pd.Series([34, 345, 45, 45, 454], index=[9, 4, 5, 6, 7])
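s6 = pd.concat([s4, s5])  # Series.append() was removed in pandas 2.0
s6
s6[4]  # label 4 appears twice, so this returns both matching entries
s6.iloc[0:5]  # first five entries by position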
s4 * s5
s4 + s5
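The sketch below repeats the s4 and s5 definitions from above to show what alignment does in practice. Only labels 4, 5, and 6 exist in both Series, so the other labels come out as NaN. The add() method with fill_value=0 is shown as one possible way to treat a missing label as zero; whether that is appropriate depends on the analysis.

```python
import pandas as pd

s4 = pd.Series([3, 4, 5, 6, 6], index=[2, 4, 5, 6, 1])
s5 = pd.Series([34, 345, 45, 45, 454], index=[9, 4, 5, 6, 7])

total = s4 + s5                     # NaN wherever a label is missing on one side
filled = s4.add(s5, fill_value=0)   # treat a missing label as 0 instead

print(total)
print(filled)
```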
So far, we've explored Pandas Series: how to create them, perform basic operations, combine multiple Series, and apply mathematical operations. Series are a versatile data structure, and they play a central role in data manipulation and analysis with Pandas.
Selecting Data with iloc and loc
Pandas provides two primary accessors for selecting subsets of data:
- iloc: integer-location based indexing.
- loc: label-based indexing.
iloc
iloc is integer-location based, meaning it selects data by the numerical position of rows and columns, counting from 0. Slices follow the usual Python convention, so the end position is excluded.
df.iloc[0:2, [0, 1, 2]]
df.iloc[0:2, df.columns.get_loc('PassengerId'):df.columns.get_loc('Pclass')+1]
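The two iloc calls above select the same block of data. Here is a minimal sketch that checks this on a small stand-in DataFrame, so it runs without downloading anything. The column names mirror the Titanic dataset, but df_small and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame (made-up values).
df_small = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 2],
    "Fare": [10.0, 20.0, 30.0],
})

# Select rows 0-1 and the first three columns by explicit positions.
by_list = df_small.iloc[0:2, [0, 1, 2]]

# Same selection, with the column positions looked up from their names.
start = df_small.columns.get_loc("PassengerId")
stop = df_small.columns.get_loc("Pclass") + 1   # +1 because the end is excluded
by_range = df_small.iloc[0:2, start:stop]

print(by_list)
print(by_list.equals(by_range))
```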
loc
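loc, on the other hand, is label-based. It selects data by the labels of rows and columns. Unlike iloc, a loc slice includes both endpoints, so with the default integer row labels, 0:2 returns the rows labeled 0, 1, and 2.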
df.loc[0:2, ['PassengerId', 'Survived', 'Pclass']]
df.loc[0:2, 'PassengerId':'Pclass']
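The difference in slice endpoints is easy to miss, so here is a short sketch comparing the two accessors on the same slice. As before, df_small is a hypothetical stand-in with made-up values and Titanic-style column names.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame (made-up values).
df_small = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Fare": [10.0, 20.0, 30.0, 40.0],
})

by_label = df_small.loc[0:2, "PassengerId":"Pclass"]   # rows 0, 1, 2 (end included)
by_position = df_small.iloc[0:2, 0:3]                  # rows 0, 1 (end excluded)

print(by_label.shape)
print(by_position.shape)
```

The column slice behaves the same way: 'PassengerId':'Pclass' includes the Pclass column, while the positional slice 0:3 stops before position 3.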
In this part of our exploration, we've learned more about using Pandas for data analysis, including iloc and loc for selecting specific subsets of data. These techniques are fundamental for exploring and understanding any dataset using Pandas.
Stay tuned for more insights and analysis!