Data Science Practical Exam Programs
slip 1 & slip 11
Write a Python program to create a Pie plot to get the frequency of the three species of the Iris data (Use iris.csv)
import pandas as pd
df2 = pd.read_csv("Iris.csv")
df2
Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | |
---|---|---|---|---|---|---|
0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
... | ... | ... | ... | ... | ... | ... |
145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
iris['Species'].value_counts().plot.pie()
plt.title("Iris Species %")
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
pie = iris['Species'].value_counts()
pie
Iris-setosa 50 Iris-versicolor 50 Iris-virginica 50 Name: Species, dtype: int64
iris["PetalWidthCm"].value_counts()
0.2 28 1.3 13 1.8 12 1.5 12 1.4 8 2.3 8 1.0 7 0.4 7 0.3 7 0.1 6 2.1 6 2.0 6 1.2 5 1.9 5 1.6 4 2.5 3 2.2 3 2.4 3 1.1 3 1.7 2 0.6 1 0.5 1 Name: PetalWidthCm, dtype: int64
B) Write a Python program to view basic statistical details of the data. (Use winequality-red.csv)
wine = pd.read_csv("wineequality-red.csv")  # the file is semicolon-delimited, so without sep=";" every row loads as one column
wine
fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality" | |
---|---|
0 | 7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5 |
1 | 7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5 |
2 | 7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;... |
3 | 11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58... |
4 | 7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5 |
... | ... |
1594 | 6.2;0.6;0.08;2;0.09;32;44;0.9949;3.45;0.58;10.5;5 |
1595 | 5.9;0.55;0.1;2.2;0.062;39;51;0.99512;3.52;0.76... |
1596 | 6.3;0.51;0.13;2.3;0.076;29;40;0.99574;3.42;0.7... |
1597 | 5.9;0.645;0.12;2;0.075;32;44;0.99547;3.57;0.71... |
1598 | 6;0.31;0.47;3.6;0.067;18;42;0.99549;3.39;0.66;... |
1599 rows × 1 columns
wine.describe()
fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality" | |
---|---|
count | 1599 |
unique | 1359 |
top | 7.2;0.36;0.46;2.1;0.074;24;44;0.99534;3.4;0.85... |
freq | 4 |
wine.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1599 entries, 0 to 1598 Data columns (total 1 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality" 1599 non-null object dtypes: object(1) memory usage: 12.6+ KB
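The single-column result above is what pandas returns when a semicolon-delimited file is read with the default comma separator. A minimal sketch of the likely fix, assuming the file really uses `;` as its delimiter (as the header line suggests). The inline text stands in for the real file and holds three rows copied from the output above; the name `wine_split` is only illustrative.

```python
import io
import pandas as pd

# Three rows copied from the output above; this stands in for the real CSV file.
sample = io.StringIO(
    'fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";'
    '"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";'
    '"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
    "6.2;0.6;0.08;2;0.09;32;44;0.9949;3.45;0.58;10.5;5\n"
)

# sep=";" splits each line into its 12 numeric columns.
wine_split = pd.read_csv(sample, sep=";")

print(wine_split.shape)
print(wine_split.describe())  # count, mean, std, min, quartiles, max per column
```

With the real file, the same `sep=";"` argument can be passed to the `pd.read_csv` call that loads it, after which `describe()` should report numeric statistics instead of `unique`/`top`/`freq`.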
slip 2 and slip 6
Q.2 A) Write a Python program for Handling Missing Value. Replace missing value of salary, age column with mean of that column.(Use Data.csv file).
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Data.csv")
data
name | age | salary | |
---|---|---|---|
0 | swapnil | 22.0 | 300.0 |
1 | raj | NaN | 233.0 |
2 | ajay | NaN | NaN |
3 | vijay | 32.0 | 234.0 |
4 | saurabh | 23.0 | NaN |
5 | sonny | NaN | 234.0 |
data["age"] = data['age'].fillna(data['age'].mean())
data
name | age | salary | |
---|---|---|---|
0 | swapnil | 22.000000 | 300.0 |
1 | raj | 25.666667 | 233.0 |
2 | ajay | 25.666667 | NaN |
3 | vijay | 32.000000 | 234.0 |
4 | saurabh | 23.000000 | NaN |
5 | sonny | 25.666667 | 234.0 |
data["salary"] = data['salary'].fillna(data['salary'].mean())
data
name | age | salary | |
---|---|---|---|
0 | swapnil | 22.000000 | 300.00 |
1 | raj | 25.666667 | 233.00 |
2 | ajay | 25.666667 | 250.25 |
3 | vijay | 32.000000 | 234.00 |
4 | saurabh | 23.000000 | 250.25 |
5 | sonny | 25.666667 | 234.00 |
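The two `fillna` calls above can also be written as one step. The sketch below is an alternative, not the exam's required form; it rebuilds the same six rows inline so it runs without Data.csv, and the name `people` is hypothetical.

```python
import numpy as np
import pandas as pd

# Same values as the Data.csv rows shown above, typed inline.
people = pd.DataFrame({
    "name": ["swapnil", "raj", "ajay", "vijay", "saurabh", "sonny"],
    "age": [22, np.nan, np.nan, 32, 23, np.nan],
    "salary": [300, 233, np.nan, 234, np.nan, 234],
})

cols = ["age", "salary"]
# fillna accepts a Series of per-column fill values; here, each column's mean.
people[cols] = people[cols].fillna(people[cols].mean())

print(people)
```

Because `mean()` skips missing values by default, each gap is filled with the average of the values that are present in that column.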
Q.2 B) Write a Python program to generate a line plot of name Vs salary [5]
plt.plot(data['name'],data['salary'])
plt.xlabel("name")
plt.ylabel("salary")
plt.show()
Download the heights and weights dataset and load the dataset from a given csv file into a dataframe. Print the first 10 rows, the last 10 rows and 20 random rows. Also display the shape of the dataset.
hw = pd.read_csv("height_weight.csv")
hw
height | weight | |
---|---|---|
0 | 7 | 56 |
1 | 6 | 45 |
2 | 5 | 45 |
3 | 5 | 46 |
4 | 4 | 75 |
5 | 6 | 67 |
6 | 6 | 36 |
7 | 4 | 35 |
8 | 8 | 75 |
9 | 6 | 56 |
10 | 5 | 47 |
11 | 4 | 88 |
12 | 8 | 90 |
13 | 5 | 56 |
14 | 3 | 45 |
15 | 5 | 46 |
16 | 4 | 75 |
17 | 6 | 67 |
18 | 6 | 36 |
19 | 4 | 35 |
20 | 8 | 75 |
21 | 6 | 56 |
hw.head(10)
height | weight | |
---|---|---|
0 | 7 | 56 |
1 | 6 | 45 |
2 | 5 | 45 |
3 | 5 | 46 |
4 | 4 | 75 |
5 | 6 | 67 |
6 | 6 | 36 |
7 | 4 | 35 |
8 | 8 | 75 |
9 | 6 | 56 |
hw.tail(10)
height | weight | |
---|---|---|
12 | 8 | 90 |
13 | 5 | 56 |
14 | 3 | 45 |
15 | 5 | 46 |
16 | 4 | 75 |
17 | 6 | 67 |
18 | 6 | 36 |
19 | 4 | 35 |
20 | 8 | 75 |
21 | 6 | 56 |
hw.sample(20)
height | weight | |
---|---|---|
12 | 8 | 90 |
1 | 6 | 45 |
14 | 3 | 45 |
16 | 4 | 75 |
3 | 5 | 46 |
8 | 8 | 75 |
17 | 6 | 67 |
13 | 5 | 56 |
21 | 6 | 56 |
0 | 7 | 56 |
2 | 5 | 45 |
5 | 6 | 67 |
11 | 4 | 88 |
6 | 6 | 36 |
7 | 4 | 35 |
4 | 4 | 75 |
20 | 8 | 75 |
15 | 5 | 46 |
10 | 5 | 47 |
19 | 4 | 35 |
hw.shape
(22, 2)
slip 3
Write a Python program to create box plots to see how each feature i.e. Sepal Length, Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv dataset)
iris
Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | |
---|---|---|---|---|---|---|
0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
... | ... | ... | ... | ... | ... | ... |
145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
plt.boxplot(iris["SepalLengthCm"])
plt.show()
plt.boxplot(iris["SepalWidthCm"])
plt.show()
plt.boxplot(iris["PetalLengthCm"])
plt.show()
plt.boxplot(iris["PetalWidthCm"])
plt.show()
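The question asks how each feature is distributed across the three species, which needs one box per species rather than one box per column. A minimal sketch of one way to do that with pandas' `DataFrame.boxplot` and its `by` argument. It uses six rows copied from the table above (the excerpt shows no versicolor rows); `iris_small` and the output file name are hypothetical, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Six rows copied from the iris table above.
# With the full file, use pd.read_csv("Iris.csv") instead.
iris_small = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.0, 2.5, 3.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 5.2, 5.0, 5.2],
    "PetalWidthCm": [0.2, 0.2, 0.2, 2.3, 1.9, 2.0],
    "Species": ["Iris-setosa"] * 3 + ["Iris-virginica"] * 3,
})

features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# One subplot per feature, one box per species inside each subplot.
iris_small.boxplot(column=features, by="Species", figsize=(10, 8))
fig = plt.gcf()
fig.suptitle("Feature distribution by species")
fig.savefig("iris_boxplots.png")
```

In a notebook, `plt.show()` can replace the `savefig` call.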
Write a Python program to view basic statistical details of the data. (Use Heights and Weights Dataset)
hw.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 22 entries, 0 to 21 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 height 22 non-null int64 1 weight 22 non-null int64 dtypes: int64(2) memory usage: 480.0 bytes
hw.describe()
height | weight | |
---|---|---|
count | 22.00000 | 22.000000 |
mean | 5.50000 | 56.909091 |
std | 1.40577 | 17.146176 |
min | 3.00000 | 35.000000 |
25% | 4.25000 | 45.000000 |
50% | 5.50000 | 56.000000 |
75% | 6.00000 | 73.000000 |
max | 8.00000 | 90.000000 |
slip 4 & slip 5
Generate a random array of 50 integers and display them using a line chart, scatter
plot, histogram and box plot. Apply appropriate color, labels and styling options
# This program generates only a single random integer.
import numpy as np
import random as rn
arr = rn.randint(1,100)
arr
89
# This program generates a random array of 50 integers.
from numpy import random
arr = random.randint(1,100,50)
arr
array([84, 49, 52, 66, 81, 34, 30, 2, 15, 14, 23, 85, 9, 22, 58, 69, 67, 48, 3, 92, 54, 43, 79, 82, 70, 9, 19, 90, 57, 19, 18, 59, 97, 25, 99, 74, 62, 38, 10, 9, 49, 7, 34, 53, 89, 74, 54, 1, 49, 67])
import matplotlib.pyplot as plt
plt.plot(arr,color = "red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("line chart")
plt.show()
arr2 = random.randint(1,100,50)
plt.scatter(arr,arr2,color= 'red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("scatter plot")
plt.show()
plt.hist(arr,bins=[0,20,40,60,80,100],color= 'red')  # start the bin edges at 0 so values below 20 are counted too
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("histogram")
plt.show()
plt.boxplot(arr)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("box plot")
plt.show()
Write a Python program to print the shape, number of rows-columns, data types, feature names and the description of the data(Use User_Data.csv)
userdata = pd.read_csv("csvdata2.csv")
userdata
id | name | city | phone | |
---|---|---|---|---|
0 | 11 | swapnil | pune | 12344 |
1 | 22 | raj | mumbai | 1234 |
2 | 33 | vijay | patas | 2344 |
3 | 44 | jay | baramti | 87 |
4 | 55 | ajay | roti | 8427 |
userdata.shape
(5, 4)
userdata.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 5 entries, 0 to 4 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 5 non-null int64 1 name 5 non-null object 2 city 5 non-null object 3 phone 5 non-null int64 dtypes: int64(2), object(2) memory usage: 288.0+ bytes
userdata.dtypes
id int64 name object city object phone int64 dtype: object
userdata.describe()
id | phone | |
---|---|---|
count | 5.000000 | 5.000000 |
mean | 33.000000 | 4887.200000 |
std | 17.392527 | 5267.582624 |
min | 11.000000 | 87.000000 |
25% | 22.000000 | 1234.000000 |
50% | 33.000000 | 2344.000000 |
75% | 44.000000 | 8427.000000 |
max | 55.000000 | 12344.000000 |
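The question also asks for the number of rows and columns and for the feature names, which the cells above do not print separately. A minimal sketch that collects all five items in one place; the frame is rebuilt inline from the table above, and `users` is a hypothetical name.

```python
import pandas as pd

# Same rows as the table above, typed inline so the sketch runs on its own.
users = pd.DataFrame({
    "id": [11, 22, 33, 44, 55],
    "name": ["swapnil", "raj", "vijay", "jay", "ajay"],
    "city": ["pune", "mumbai", "patas", "baramti", "roti"],
    "phone": [12344, 1234, 2344, 87, 8427],
})

n_rows, n_cols = users.shape
print("shape:", users.shape)
print("rows:", n_rows, "columns:", n_cols)
print("feature names:", list(users.columns))
print(users.dtypes)
# include="all" adds the text columns (name, city) to the summary.
print(users.describe(include="all"))
```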
slip 7
Write a Python program to perform the following tasks: a. Apply one-hot encoding on the Country column. b. Apply label encoding on the Purchased column. (Data.csv has two categorical columns: the Country column and the Purchased column.)
data = pd.read_csv("Data1.csv")
data
Country | Age | Salary | Purchased | |
---|---|---|---|---|
0 | France | 44 | 72000 | No |
1 | Spain | 27 | 48000 | Yes |
2 | Germany | 30 | 54000 | No |
3 | Spain | 38 | 61000 | No |
4 | Germany | 40 | Yes | NaN |
5 | France | 35 | 58000 | Yes |
6 | Spain | 52000 | No | NaN |
7 | France | 48 | 79000 | Yes |
8 | Germany | 50 | 83000 | No |
9 | France | 37 | 67000 | Yes |
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
data = pd.DataFrame(ct.fit_transform(data))
data
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
1 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
2 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
3 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
4 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
5 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
6 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
7 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
8 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
9 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data.iloc[:,-1] = le.fit_transform(data.iloc[:,-1])
data
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
1 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
2 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
3 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
4 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
5 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
6 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
7 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
8 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
9 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
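For comparison, the same two encodings can be sketched with pandas alone, which keeps readable column names. This swaps in `pd.get_dummies` and `pd.factorize` for the scikit-learn encoders used above, so it is an alternative and not the same API. The five rows are the well-formed ones from the Data1.csv table above; `shop` and `encoded` are hypothetical names.

```python
import pandas as pd

# Rows 0, 1, 2, 3 and 5 of the Data1.csv table above (the rows without shifted values).
shop = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "France"],
    "Age": [44, 27, 30, 38, 35],
    "Salary": [72000, 48000, 54000, 61000, 58000],
    "Purchased": ["No", "Yes", "No", "No", "Yes"],
})

# One-hot encoding: one 0/1 indicator column per country.
encoded = pd.get_dummies(shop, columns=["Country"], dtype=int)

# Label encoding: map each class to an integer code.
# sort=True orders the classes alphabetically, so "No" -> 0 and "Yes" -> 1.
codes, classes = pd.factorize(shop["Purchased"], sort=True)
encoded["Purchased"] = codes

print(list(classes))
print(encoded)
```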
Write a program in python to perform following task : [15] Standardizing Data (transform them into a standard Gaussian distribution with a mean of 0 and a standard deviation of 1) (Use winequality-red.csv)
#standardize the values in each column
df_new = (data-data.mean())/data.std()
df_new
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
1 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
2 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
3 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
4 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
5 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
6 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
7 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
8 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
9 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
df_new.mean()
0 4.440892e-17 1 -4.440892e-17 2 4.440892e-17 3 -4.440892e-17 4 2.220446e-17 dtype: float64
df_new.std()
0 1.0 1 1.0 2 1.0 3 1.0 4 1.0 dtype: float64
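The formula above divides by the pandas default standard deviation, which is the sample version (`ddof=1`). A short sketch of the same z-score on a few wine values, next to the population version (`ddof=0`) that scikit-learn's `StandardScaler` uses. The three columns are taken from rows of the wine output earlier in this post; `wine_small` is a hypothetical name.

```python
import pandas as pd

# Three columns from three rows of the wine output shown earlier.
wine_small = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 6.2],
    "pH": [3.51, 3.2, 3.45],
    "alcohol": [9.4, 9.8, 10.5],
})

# z-score with the sample standard deviation (pandas default, ddof=1).
z_sample = (wine_small - wine_small.mean()) / wine_small.std()

# z-score with the population standard deviation (ddof=0).
z_population = (wine_small - wine_small.mean()) / wine_small.std(ddof=0)

print(z_sample.mean().round(10))  # each column mean is 0 up to rounding error
print(z_sample.std())             # each column std is 1
print(z_population)
```

The two versions differ only by a constant factor per column, so either gives mean 0; the choice matters mainly when results are compared against another library.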
Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a pie chart.
import matplotlib.pyplot as plt
subnames = ["math","marathi","english","stat","python","java"]
marks = [78,56,45,67,97,75]
plt.pie(marks,labels = subnames)
plt.show()
Write a program in Python to perform the following task (Use winequality-red.csv) [5] Import the dataset and do the following: a) Describe the dataset b) Show the shape of the dataset c) Display the first 3 rows of the dataset
wine = pd.read_csv("Data1.csv")
wine
Country | Age | Salary | Purchased | |
---|---|---|---|---|
0 | France | 44 | 72000 | No |
1 | Spain | 27 | 48000 | Yes |
2 | Germany | 30 | 54000 | No |
3 | Spain | 38 | 61000 | No |
4 | Germany | 40 | Yes | NaN |
5 | France | 35 | 58000 | Yes |
6 | Spain | 52000 | No | NaN |
7 | France | 48 | 79000 | Yes |
8 | Germany | 50 | 83000 | No |
9 | France | 37 | 67000 | Yes |
wine.describe()
Age | |
---|---|
count | 10.000000 |
mean | 5234.900000 |
std | 16431.582824 |
min | 27.000000 |
25% | 35.500000 |
50% | 39.000000 |
75% | 47.000000 |
max | 52000.000000 |
wine.shape
(10, 4)
wine.head(3)
Country | Age | Salary | Purchased | |
---|---|---|---|---|
0 | France | 44 | 72000 | No |
1 | Spain | 27 | 48000 | Yes |
2 | Germany | 30 | 54000 | No |
slip 10
Write a python program to Display column-wise mean, and median for SOCRHeightWeight dataset.
hw = pd.read_csv("height_weight.csv")
hw
height | weight | |
---|---|---|
0 | 7 | 56 |
1 | 6 | 45 |
2 | 5 | 45 |
3 | 5 | 46 |
4 | 4 | 75 |
5 | 6 | 67 |
6 | 6 | 36 |
7 | 4 | 35 |
8 | 8 | 75 |
9 | 6 | 56 |
10 | 5 | 47 |
11 | 4 | 88 |
12 | 8 | 90 |
13 | 5 | 56 |
14 | 3 | 45 |
15 | 5 | 46 |
16 | 4 | 75 |
17 | 6 | 67 |
18 | 6 | 36 |
19 | 4 | 35 |
20 | 8 | 75 |
21 | 6 | 56 |
hw.mean()
height 5.500000 weight 56.909091 dtype: float64
hw.median()
height 5.5 weight 56.0 dtype: float64
hw["height"].mean()
5.5
Write a python program to compute sum of Manhattan distance between all pairs of points.
# Using scipy to Calculate the Manhattan Distance
from scipy.spatial.distance import cityblock
x1 = [1,2,3,4,5,6]
x2 = [10,20,30,1,2,3]
print(cityblock(x1, x2))
# Returns: 63
63
print(cityblock(hw["height"],hw["weight"]))
1131
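`cityblock` above returns the distance between two vectors. The question asks for the sum over all pairs of points, which can be sketched in plain Python as follows. The helper names and the sample points are hypothetical.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan (city block) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def sum_pairwise_manhattan(points):
    """Sum of Manhattan distances over every unordered pair of points."""
    return sum(manhattan(p, q) for p, q in combinations(points, 2))


points = [(1, 2), (3, 4), (5, 6)]
total = sum_pairwise_manhattan(points)
print(total)  # pairs contribute 4 + 8 + 4 = 16
```

`combinations(points, 2)` visits each pair once, so the loop does n(n-1)/2 distance computations for n points.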
slip 12
Write a Python program to create a data frame containing the columns name, salary and department. Add 10 rows with some missing and duplicate values to the data frame. Also drop all null and empty values. Print the modified data frame.
import pandas as pd
import numpy as np
arr = np.array([['swapnil',50399,'airforce'],
['nihan',50399,'airforce'],
['vijay',24499,'math'],
['jay','cs',"NaN"],
['ajay',None,'airforce'],
['vinu',50399,None],
['bharat',50399,'airforce'],
['suahas','airforce',None],
['nihan',50399,'airforce'],
['kemal',50399,'airforce']
])
df = pd.DataFrame(arr, columns =["name","salary","dapartment"])
df
name | salary | dapartment | |
---|---|---|---|
0 | swapnil | 50399 | airforce |
1 | nihan | 50399 | airforce |
2 | vijay | 24499 | math |
3 | jay | cs | NaN |
4 | ajay | None | airforce |
5 | vinu | 50399 | None |
6 | bharat | 50399 | airforce |
7 | suahas | airforce | None |
8 | nihan | 50399 | airforce |
9 | kemal | 50399 | airforce |
df.isnull()
name | salary | dapartment | |
---|---|---|---|
0 | False | False | False |
1 | False | False | False |
2 | False | False | False |
3 | False | False | False |
4 | False | True | False |
5 | False | False | True |
6 | False | False | False |
7 | False | False | True |
8 | False | False | False |
9 | False | False | False |
df.duplicated()
0 False 1 False 2 False 3 False 4 False 5 False 6 False 7 False 8 True 9 False dtype: bool
# Note: df.drop without parentheses only references the method and removes nothing; use df.dropna() to drop the null rows.
df
name | salary | dapartment | |
---|---|---|---|
0 | swapnil | 50399 | airforce |
1 | nihan | 50399 | airforce |
2 | vijay | 24499 | math |
3 | jay | cs | NaN |
4 | ajay | None | airforce |
5 | vinu | 50399 | None |
6 | bharat | 50399 | airforce |
7 | suahas | airforce | None |
8 | nihan | 50399 | airforce |
9 | kemal | 50399 | airforce |
# df.dropna(inplace=True) would modify df itself; without inplace=True, dropna() returns a new frame and leaves df unchanged
df.dropna()
name | salary | dapartment | |
---|---|---|---|
0 | swapnil | 50399 | airforce |
1 | nihan | 50399 | airforce |
2 | vijay | 24499 | math |
3 | jay | cs | NaN |
6 | bharat | 50399 | airforce |
8 | nihan | 50399 | airforce |
9 | kemal | 50399 | airforce |
df
name | salary | dapartment | |
---|---|---|---|
0 | swapnil | 50399 | airforce |
1 | nihan | 50399 | airforce |
2 | vijay | 24499 | math |
3 | jay | cs | NaN |
4 | ajay | None | airforce |
5 | vinu | 50399 | None |
6 | bharat | 50399 | airforce |
7 | suahas | airforce | None |
8 | nihan | 50399 | airforce |
9 | kemal | 50399 | airforce |
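In the output above, row 3 survives `dropna()` because its "NaN" is an ordinary string, and the duplicate row is still present. A minimal sketch of one way to handle both, assuming that the string "NaN" and empty strings should count as missing. The shortened frame and the names `staff` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

staff = pd.DataFrame(
    [
        ["swapnil", 50399, "airforce"],
        ["nihan", 50399, "airforce"],
        ["jay", 24499, "NaN"],          # the string "NaN", not a real missing value
        ["ajay", None, "airforce"],     # real missing salary
        ["vinu", 50399, ""],            # empty string
        ["nihan", 50399, "airforce"],   # duplicate of row 1
    ],
    columns=["name", "salary", "department"],
)

cleaned = (
    staff.replace({"NaN": np.nan, "": np.nan})  # turn placeholder strings into real NaN
    .dropna()                                   # drop rows with any missing value
    .drop_duplicates()                          # keep the first copy of repeated rows
    .reset_index(drop=True)
)

print(cleaned)
```

Assigning the result to a new name keeps the original frame available for comparison.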
slip 13
Write a Python program to create a graph to find the relationship between the petal length and petal width. (Use iris.csv dataset)
iris
Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | |
---|---|---|---|---|---|---|
0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
... | ... | ... | ... | ... | ... | ... |
145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor',ax=fig)
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica', ax=fig)
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title(" Petal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(12,8)
plt.show()
<AxesSubplot:xlabel='PetalLengthCm', ylabel='PetalWidthCm'>
#or
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor')
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica')
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title(" Petal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(12,8)
plt.show()
Write a Python program to find the maximum and minimum value of a given flattened array.
arr1 = np.array([[1,2,4],[7,5,9]])
arr1.max()
9
arr1 = np.array([[1,2,4],[7,5,9]])
arr1.min()
1
slip 16
Write a python program to create a data frame for students’ information such as name, graduation percentage and age. Display average age of students, average of graduation percentage.
import pandas as pd
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df
name | percentage | age | |
---|---|---|---|
0 | swapnil | 88 | 22 |
1 | om | 88 | 12 |
2 | jay | 98 | 32 |
3 | sai | 45 | 25 |
4 | didi | 83 | 22 |
5 | swapnil | 88 | 22 |
avg = df["age"].mean()
avg
22.5
per = df["percentage"].mean()
per
81.66666666666667
Write a python program to create two lists, one representing subject names and the other
representing marks obtained in those subjects. Display the data in a pie chart and bar chart.
import matplotlib.pyplot as plt
#sname = [["math"],["bio"],["sci"],["ds"],["hist"],["eng"],["stat"]]
sname = ["math","bio","sci","ds","hist","eng","stat"]
marks = [78,76,58,98,45,66,90]
plt.pie(marks , labels= sname,autopct = '%1.0f%%')
plt.title("pie chart")
plt.show()
sname = ["math","bio","sci","ds","hist","eng","stat"]
marks = [78,76,58,98,45,66,90]
plt.bar(sname,marks)
plt.title("bar plot")
plt.show()
slip 17
Write a Python program to draw scatter plots to compare two features of the iris dataset
import pandas as pd
iris = pd.read_csv("Iris.csv")
iris
Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | |
---|---|---|---|---|---|---|
0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
... | ... | ... | ... | ... | ... | ... |
145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
plt.scatter(iris['SepalLengthCm'],iris['SepalWidthCm'],color ="red")
plt.title("scatter plot")
plt.xlabel("SepalLengthCm")
plt.ylabel("SepalWidthCm")
plt.show()
import pandas as pd
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df
name | percentage | age | |
---|---|---|---|
0 | swapnil | 88 | 22 |
1 | om | 88 | 12 |
2 | jay | 98 | 32 |
3 | sai | 45 | 25 |
4 | didi | 83 | 22 |
5 | swapnil | 88 | 22 |
slip 18
Write a Python program to create box plots to see how each feature i.e. Sepal Length, Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv dataset)
plt.boxplot(iris["SepalLengthCm"])
plt.show()
plt.boxplot(iris["SepalWidthCm"])
plt.show()
plt.boxplot(iris["PetalLengthCm"])
plt.show()
plt.boxplot(iris["PetalWidthCm"])
plt.show()
Use the heights and weights dataset and load the dataset from a given csv file into a dataframe. Print the first 5 rows, the last 5 rows and 10 random rows.
hw = pd.read_csv("height_weight.csv")
hw.head(5)
height | weight | |
---|---|---|
0 | 7 | 56 |
1 | 6 | 45 |
2 | 5 | 45 |
3 | 5 | 46 |
4 | 4 | 75 |
hw.tail(5)
height | weight | |
---|---|---|
17 | 6 | 67 |
18 | 6 | 36 |
19 | 4 | 35 |
20 | 8 | 75 |
21 | 6 | 56 |
hw.sample(10)
height | weight | |
---|---|---|
12 | 8 | 90 |
11 | 4 | 88 |
5 | 6 | 67 |
13 | 5 | 56 |
20 | 8 | 75 |
17 | 6 | 67 |
16 | 4 | 75 |
0 | 7 | 56 |
21 | 6 | 56 |
3 | 5 | 46 |
slip 19
Write a Python program [15]
- To create a dataframe containing columns name, age and percentage. Add 10 rows to the dataframe. View the dataframe.
- To print the shape, number of rows-columns, data types, feature names and the description of the data
- To Add 5 rows with duplicate values and missing values. Add a column ‘remarks’ with empty values. Display the data.
import pandas as pd
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df
name | percentage | age | |
---|---|---|---|
0 | swapnil | 88 | 22 |
1 | om | 88 | 12 |
2 | jay | 98 | 32 |
3 | sai | 45 | 25 |
4 | didi | 83 | 22 |
5 | swapnil | 88 | 22 |
df.shape
(6, 3)
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 6 entries, 0 to 5 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 6 non-null object 1 percentage 6 non-null int64 2 age 6 non-null int64 dtypes: int64(2), object(1) memory usage: 272.0+ bytes
df.dtypes
name object percentage int64 age int64 dtype: object
df.describe()
percentage | age | |
---|---|---|
count | 6.000000 | 6.000000 |
mean | 81.666667 | 22.500000 |
std | 18.618987 | 6.442049 |
min | 45.000000 | 12.000000 |
25% | 84.250000 | 22.000000 |
50% | 88.000000 | 22.000000 |
75% | 88.000000 | 24.250000 |
max | 98.000000 | 32.000000 |
df.loc[6]=["om",88,12]
df
name | percentage | age | |
---|---|---|---|
0 | swapnil | 88 | 22 |
1 | om | 88 | 12 |
2 | jay | 98 | 32 |
3 | sai | 45 | 25 |
4 | didi | 83 | 22 |
5 | swapnil | 88 | 22 |
6 | om | 88 | 12 |
df.loc[7] = [None,None,49]
df.loc[8] = ["rohit",None,49]
df.loc[9] = ["didi",83,22]
df.loc[10] = [None,None,None]
df
name | percentage | age | |
---|---|---|---|
0 | swapnil | 88.0 | 22.0 |
1 | om | 88.0 | 12.0 |
2 | jay | 98.0 | 32.0 |
3 | sai | 45.0 | 25.0 |
4 | didi | 83.0 | 22.0 |
5 | swapnil | 88.0 | 22.0 |
6 | om | 88.0 | 12.0 |
7 | None | NaN | 49.0 |
8 | rohit | NaN | 49.0 |
9 | didi | 83.0 | 22.0 |
10 | None | NaN | NaN |
df["remark"] =None
df
name | percentage | age | remark | |
---|---|---|---|---|
0 | swapnil | 88.0 | 22.0 | None |
1 | om | 88.0 | 12.0 | None |
2 | jay | 98.0 | 32.0 | None |
3 | sai | 45.0 | 25.0 | None |
4 | didi | 83.0 | 22.0 | None |
5 | swapnil | 88.0 | 22.0 | None |
6 | om | 88.0 | 12.0 | None |
7 | None | NaN | 49.0 | None |
8 | rohit | NaN | 49.0 | None |
9 | didi | 83.0 | 22.0 | None |
10 | None | NaN | NaN | None |
slip 20
Add two outliers to the above data and display the box plot.
import numpy as np
arr = np.array([1,2,3,4,5,6,100,150])
plt.boxplot(arr)
plt.show()
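To check which values the box plot draws as outlier points, the usual 1.5 x IQR rule can be applied by hand. This is a sketch under the assumption that the plot uses matplotlib's default whisker length of 1.5 times the interquartile range; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 100, 150])

# Quartiles and interquartile range.
q1, q3 = np.percentile(arr, [25, 75])
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = arr[(arr < lower) | (arr > upper)]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower, upper)
print("outliers:", outliers)
```

Here the two added values, 100 and 150, are the only ones that fall outside the fences.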
slip 21
Import dataset “iris.csv”. Write a Python program to create a Bar plot to get the frequency of the three species of the Iris data.
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
iris['Species'].value_counts().plot.bar()
plt.title("Iris Species %")
plt.show()
Write a Python program to create a histogram of the three species of the Iris data.
slip 24 Q2
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
#iris['Species'].value_counts().plot.hist(bins=[10,20,30,40,50])
#plt.hist(iris['SepalLengthCm'],bins=20)
plt.hist(iris['Species'],bins=20)
plt.title("Iris Species %")
plt.show()
iris
Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | |
---|---|---|---|---|---|---|
0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
... | ... | ... | ... | ... | ... | ... |
145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
#Import dataset “iris.csv”. Write a Python program to create a Bar plot to get the
#frequency of the three species of the Iris data.
counts = iris['Species'].value_counts()  # frequency of each species
plt.bar(counts.index, counts.values)     # one bar per species, height = frequency
plt.show()
slip 30, 26, 25, 20, 15, 12, 9, 4
Generate a random array of 50 integers and display them using a line chart, scatter plot, histogram and box plot. Apply appropriate color, labels and styling options.
from numpy import random
arr = random.randint(1,100,50)
print(arr)
import matplotlib.pyplot as plt
plt.plot(arr, color='red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("line chart")
plt.show()
arr2 = random.randint(1,100,50)
arr2
array([ 2, 30, 30, 4, 72, 57, 35, 32, 55, 86, 54, 21, 83, 72, 51, 21, 49, 99, 8, 87, 68, 16, 40, 74, 57, 10, 68, 78, 49, 43, 3, 73, 18, 44, 6, 51, 81, 86, 16, 41, 86, 54, 99, 10, 65, 43, 18, 93, 74, 30])
import matplotlib.pyplot as plt
plt.scatter(arr,arr2,color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("scatter plot")
plt.show()
import matplotlib.pyplot as plt
plt.hist(arr,bins=[0,25,50,75,100],color="red") # bins lists the edges of each bin, here four bins of width 25
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("histogram ")
plt.show()
import matplotlib.pyplot as plt
plt.boxplot(arr)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("box plot")
plt.show()
Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a bar chart.
Subject = ['English', 'Maths', 'Science','history','data science']
marks = [ 90,80,70,97,40]
import matplotlib.pyplot as plt
plt.bar(Subject,marks,color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("bar chart ")
plt.show()
Slip 28
# Q.1 Write a Python Program to create a dataframe containing columns name, height, and weight.
# Add 10 rows to the dataframe. view the dataframe
import pandas as pd
df = pd.DataFrame(columns = ['Name','Height','Weight'])
df.loc[0] = ['Nil' , 7 , 58]
df.loc[1] = [ None , 6 , 49]
df.loc[2] = ['Emma' , 6 , 45]
df.loc[3] = ['Swapnil' , 5 , 56]
df.loc[4] = ['Swamiraj' , None , 56]
df.loc[5] = ['Vaishu' , 4 , 49]
df.loc[6] = ['Snehal' , 5 , 58]
df.loc[7] = ['Vaishu' , 4 , 49]
df.loc[8] = ['Navin' , 5 , None]
df.loc[9] = ['Shreya' , 6 , 56]
df
Name | Height | Weight | |
---|---|---|---|
0 | Nil | 7 | 58 |
1 | NaN | 6.0 | 49.0 |
2 | Emma | 6 | 45 |
3 | Swapnil | 5 | 56 |
4 | Swamiraj | None | 56 |
5 | Vaishu | 4 | 49 |
6 | Snehal | 5 | 58 |
7 | Vaishu | 4 | 49 |
8 | Navin | 5 | None |
9 | Shreya | 6 | 56 |
# Q.2 write a python program to find shape, size, datatypes of the dataframe object.
df.shape
(10, 3)
df.size
30
df.dtypes
Name object Height object Weight object dtype: object
# Q.3 write a python program to view basic statistical details of the data
df.describe()
Name | Height | Weight | |
---|---|---|---|
count | 9 | 9.0 | 9.0 |
unique | 8 | 4.0 | 4.0 |
top | Vaishu | 6.0 | 49.0 |
freq | 2 | 3.0 | 3.0 |
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 10 entries, 0 to 9 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 9 non-null object 1 Height 9 non-null object 2 Weight 9 non-null object dtypes: object(3) memory usage: 320.0+ bytes
# Q.4 write a python program to get the number of observations, missing values and nan values.
df.isnull()
Name | Height | Weight | |
---|---|---|---|
0 | False | False | False |
1 | True | False | False |
2 | False | False | False |
3 | False | False | False |
4 | False | True | False |
5 | False | False | False |
6 | False | False | False |
7 | False | False | False |
8 | False | False | True |
9 | False | False | False |
df.dropna()
Name | Height | Weight | |
---|---|---|---|
0 | Nil | 7 | 58 |
2 | Emma | 6 | 45 |
3 | Swapnil | 5 | 56 |
5 | Vaishu | 4 | 49 |
6 | Snehal | 5 | 58 |
7 | Vaishu | 4 | 49 |
9 | Shreya | 6 | 56 |
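`isnull()` and `dropna()` above show where the gaps are, but Q.4 asks for counts. A minimal sketch that reports the number of observations and missing values; the shortened frame and the names `people`, `n_observations` and so on are hypothetical.

```python
import pandas as pd

# A shortened version of the frame above, with one gap in each column.
people = pd.DataFrame({
    "Name": ["Nil", None, "Emma", "Swamiraj", "Navin"],
    "Height": [7, 6, 6, None, 5],
    "Weight": [58, 49, 45, 56, None],
})

n_observations = len(people)                 # number of rows
missing_per_column = people.isnull().sum()   # None and NaN both count as missing
total_missing = int(missing_per_column.sum())
complete_rows = len(people.dropna())         # rows with no missing value

print("observations:", n_observations)
print(missing_per_column)
print("total missing:", total_missing)
print("complete rows:", complete_rows)
```

pandas treats `None` and `NaN` the same way in `isnull()`, so a single count covers both "missing" and "nan" values.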
# Q.5 write a python program to add a column to dataframe "BMI" which is calculated as weight/height^2
df['BMI']=df.Weight/(df.Height**2)
df
Name | Height | Weight | BMI | |
---|---|---|---|---|
0 | Nil | 7 | 58 | 1.183673 |
1 | NaN | 6.0 | 49.0 | 1.361111 |
2 | Emma | 6 | 45 | 1.25 |
3 | Swapnil | 5 | 56 | 2.24 |
4 | Swamiraj | None | 56 | NaN |
5 | Vaishu | 4 | 49 | 3.0625 |
6 | Snehal | 5 | 58 | 2.32 |
7 | Vaishu | 4 | 49 | 3.0625 |
8 | Navin | 5 | None | NaN |
9 | Shreya | 6 | 56 | 1.555556 |
# Q.6 write a python program to find the maximum and minimum BMI
df.BMI.max()
3.0625
df.BMI.min()
1.183673469387755
# Q.7 write a python program to generate a Scatter plot of height and weight
import matplotlib.pyplot as plt
df.plot.scatter( x = 'Height', y = 'Weight' )
<AxesSubplot:xlabel='Height', ylabel='Weight'>