
Welcome back to working with Pandas. In this part, we'll continue our analysis using various Pandas functions and techniques, again exploring the Titanic dataset.

Recap

Before diving in, let's quickly recap what we've done so far. We loaded the Titanic dataset into a Pandas DataFrame, examined its structure and data types, and performed some basic data exploration.

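import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head()      # first five rows
type(df)       # pandas.core.frame.DataFrame
df.dtypes      # data type of each column
df.describe()  # summary statistics for the numeric columns

Exploring Pandas Series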

In this blog post, we'll delve into the Pandas library and explore Series, one of its fundamental data structures. We'll go through various operations and methods to understand Series better.

Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type. It consists of two arrays: one containing the data (values) and the other containing the corresponding labels (index).
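
To make the two parts concrete, here is a minimal sketch that builds a small Series and inspects its values and its index separately. The three names are hypothetical sample values standing in for the Titanic 'Name' column, so the snippet runs without downloading anything.

```python
import pandas as pd

# Hypothetical sample values standing in for the Titanic 'Name' column.
names = pd.Series(["Braund", "Cumings", "Heikkinen"])

values = list(names.values)  # the data held by the Series
labels = list(names.index)   # the labels; by default the integers 0, 1, 2

print(values)
print(labels)
```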

Let's start by creating a Series from the 'Name' column of the Titanic dataset.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.columns
s = df['Name'][0:10]

Basic Operations on Series

Let's explore some basic operations on Series:

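len(s)   # number of elements in the Series
type(s)  # pandas.core.series.Series
s[0]     # value stored under index label 0
labels = ['sudh','b','c','d','e','f','g','h','i','j']
s1 = pd.Series(list(s), index=labels)  # same values, custom string labels
s1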
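
Once a Series carries custom labels, it helps to be explicit about whether a lookup is by label or by position. The sketch below uses hypothetical sample names and labels (not the real dataset) and relies on the .loc and .iloc accessors to keep the two kinds of lookup apart.

```python
import pandas as pd

# Hypothetical values and labels, chosen only for illustration.
names = pd.Series(["Braund", "Cumings", "Heikkinen"], index=["a", "b", "c"])

by_label = names.loc["b"]    # look up the element labelled 'b'
by_position = names.iloc[1]  # look up the second element, whatever its label

print(by_label, by_position)
```

Both lookups return the same element here because 'b' happens to be the label of the second position.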

Combining Series

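We can combine two or more Series with pd.concat(). Older tutorials use the Series.append() method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0.

s2 = pd.concat([s1, s])
s2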
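
Combining Series keeps each one's original labels, so the result can contain repeated labels. The following sketch, using two small made-up Series, shows that behaviour with pd.concat and how ignore_index=True renumbers the result instead.

```python
import pandas as pd

# Two small illustrative Series that share the label 'y'.
first = pd.Series(["a", "b"], index=["x", "y"])
second = pd.Series(["c", "d"], index=["y", "z"])

combined = pd.concat([first, second])                       # labels kept, 'y' repeats
renumbered = pd.concat([first, second], ignore_index=True)  # fresh labels 0..3

both_y = combined.loc["y"]  # a repeated label returns every matching element

print(combined)
print(renumbered)
```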

Mathematical Operations on Series

Series support arithmetic operations, and they align values by index label rather than by position. A label that appears in only one of the two Series produces NaN in the result.

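s4 = pd.Series([3, 4, 5, 6, 6], index=[2, 4, 5, 6, 1])
s5 = pd.Series([34, 345, 45, 45, 454], index=[9, 4, 5, 6, 7])
s6 = pd.concat([s4, s5])  # Series.append() was removed in pandas 2.0
s6
s6[4]     # label 4 occurs in both inputs, so two values come back
s6[0:5]   # slice of the first five elements
s4 * s5   # multiplied where labels match (4, 5, 6)
s4 + s5   # added where labels match (4, 5, 6)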
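
To see the alignment in action, this sketch reuses s4 and s5 from above and separates the matched labels from the unmatched ones. The add() call with fill_value=0 is one possible way to treat a missing label as zero; whether that is appropriate depends on the analysis.

```python
import pandas as pd

s4 = pd.Series([3, 4, 5, 6, 6], index=[2, 4, 5, 6, 1])
s5 = pd.Series([34, 345, 45, 45, 454], index=[9, 4, 5, 6, 7])

total = s4 + s5           # aligned on labels; unmatched labels give NaN
matched = total.dropna()  # only labels 4, 5 and 6 exist in both Series

# Treat a label missing from one Series as 0 instead of producing NaN.
filled = s4.add(s5, fill_value=0)

print(total)
print(filled)
```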

We've now explored Pandas Series: how to create them, run basic operations on them, combine several of them, and perform mathematical operations. Series are a versatile data structure in Pandas, and they play a central role in data manipulation and analysis.

Pandas provides two primary accessors for selecting subsets of data:
  • iloc: integer-location based indexing.
  • loc: label-based indexing.

iloc

iloc selects rows and columns by their integer position, counting from 0. As with ordinary Python slicing, the end of an iloc slice is excluded.

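# Rows at positions 0 and 1, columns at positions 0, 1 and 2
df.iloc[0:2, [0, 1, 2]]
# Same rows; column positions looked up from the column names
df.iloc[0:2, df.columns.get_loc('PassengerId'):df.columns.get_loc('Pclass')+1]

loc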

loc, on the other hand, is label-based. It selects data by the labels of rows and columns, and a loc slice includes both of its end points.

df.loc[0:2, ['PassengerId', 'Survived', 'Pclass']]
df.loc[0:2, 'PassengerId':'Pclass']
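
The same slice 0:2 returns a different number of rows with each accessor. The sketch below shows this on a small hypothetical frame that borrows three Titanic column names; the values are made up so the example runs offline.

```python
import pandas as pd

# Hypothetical frame with three of the Titanic column names; values are made up.
df_demo = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 1],
    "Pclass": [3, 1, 3, 1],
})

# iloc: positions 0 and 1 only, because the end of the slice is excluded.
by_position = df_demo.iloc[0:2, [0, 1, 2]]

# loc: labels 0, 1 and 2, because the end of the slice is included.
by_label = df_demo.loc[0:2, "PassengerId":"Pclass"]

print(by_position.shape)
print(by_label.shape)
```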

In this part of our exploration, we've learned more about using Pandas for data analysis, including iloc and loc for selecting specific subsets of data. These techniques are fundamental for exploring and understanding any dataset using Pandas.

Stay tuned for more insights and analysis!