Data Science Practical Exam Programs
slip 1 & slip 11
Write a Python program to create a Pie plot to get the frequency of the three species of the Iris data (Use iris.csv)
import pandas as pd
df2 = pd.read_csv("Iris.csv")
df2

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
iris['Species'].value_counts().plot.pie()
plt.title("Iris Species %")
plt.show()

pie = iris['Species'].value_counts()
pie

Iris-setosa 50
Iris-versicolor 50
Iris-virginica 50
Name: Species, dtype: int64

iris["PetalWidthCm"].value_counts()

0.2 28
1.3 13
1.8 12
1.5 12
1.4 8
2.3 8
1.0 7
0.4 7
0.3 7
0.1 6
2.1 6
2.0 6
1.2 5
1.9 5
1.6 4
2.5 3
2.2 3
2.4 3
1.1 3
1.7 2
0.6 1
0.5 1
Name: PetalWidthCm, dtype: int64
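B) Write a Python program to view basic statistical details of the data. (Use winequality-red.csv)
# Note: the file name below is the one used in the original run.
# The file is semicolon-delimited, so reading it without sep=";" puts every row
# into a single text column, which is what the output below shows.
wine = pd.read_csv("wineequality-red.csv")
wine

| | fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality" |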
|---|---|
| 0 | 7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5 |
| 1 | 7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5 |
| 2 | 7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;... |
| 3 | 11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58... |
| 4 | 7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5 |
| ... | ... |
| 1594 | 6.2;0.6;0.08;2;0.09;32;44;0.9949;3.45;0.58;10.5;5 |
| 1595 | 5.9;0.55;0.1;2.2;0.062;39;51;0.99512;3.52;0.76... |
| 1596 | 6.3;0.51;0.13;2.3;0.076;29;40;0.99574;3.42;0.7... |
| 1597 | 5.9;0.645;0.12;2;0.075;32;44;0.99547;3.57;0.71... |
| 1598 | 6;0.31;0.47;3.6;0.067;18;42;0.99549;3.39;0.66;... |
1599 rows × 1 columns
wine.describe()

| | fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality" |
|---|---|
| count | 1599 |
| unique | 1359 |
| top | 7.2;0.36;0.46;2.1;0.074;24;44;0.99534;3.4;0.85... |
| freq | 4 |
wine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"  1599 non-null  object
dtypes: object(1)
memory usage: 12.6+ KB
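The single-column result above comes from the delimiter: the wine file separates fields with semicolons, while `pd.read_csv` splits on commas by default. A minimal sketch of the fix, assuming the same layout as the rows shown above (the three sample rows are copied from the table; `wine_sample` is a name introduced here for illustration):

```python
import io
import pandas as pd

# Three rows copied from the table above, in the file's own semicolon-delimited layout.
raw = '''fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
6.2;0.6;0.08;2;0.09;32;44;0.9949;3.45;0.58;10.5;5
'''

# sep=";" tells pandas to split on semicolons instead of commas.
# With the real file this would be pd.read_csv(<path to the csv>, sep=";").
wine_sample = pd.read_csv(io.StringIO(raw), sep=";")

print(wine_sample.shape)       # rows and columns after correct parsing
print(wine_sample.describe())  # numeric summary for every column
```

With the delimiter set, `describe()` reports count, mean, std and quartiles per column instead of the `unique`/`top`/`freq` summary that pandas gives for text.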
slip 2 and slip 6
Q.2 A) Write a Python program for Handling Missing Value. Replace missing value of salary, age column with mean of that column.(Use Data.csv file).
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Data.csv")
data

| | name | age | salary |
|---|---|---|---|
| 0 | swapnil | 22.0 | 300.0 |
| 1 | raj | NaN | 233.0 |
| 2 | ajay | NaN | NaN |
| 3 | vijay | 32.0 | 234.0 |
| 4 | saurabh | 23.0 | NaN |
| 5 | sonny | NaN | 234.0 |
data["age"] = data['age'].fillna(data['age'].mean())
data

| | name | age | salary |
|---|---|---|---|
| 0 | swapnil | 22.000000 | 300.0 |
| 1 | raj | 25.666667 | 233.0 |
| 2 | ajay | 25.666667 | NaN |
| 3 | vijay | 32.000000 | 234.0 |
| 4 | saurabh | 23.000000 | NaN |
| 5 | sonny | 25.666667 | 234.0 |
data["salary"] = data['salary'].fillna(data['salary'].mean())
data

| | name | age | salary |
|---|---|---|---|
| 0 | swapnil | 22.000000 | 300.00 |
| 1 | raj | 25.666667 | 233.00 |
| 2 | ajay | 25.666667 | 250.25 |
| 3 | vijay | 32.000000 | 234.00 |
| 4 | saurabh | 23.000000 | 250.25 |
| 5 | sonny | 25.666667 | 234.00 |
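The two `fillna` steps above can also be done in one pass. This is a small sketch, not the author's code: it rebuilds the six rows of the table by hand (an assumption, since Data.csv itself is not available here) and fills both columns with their own means.

```python
import numpy as np
import pandas as pd

# Same six rows as the Data.csv table shown above.
data = pd.DataFrame({
    "name": ["swapnil", "raj", "ajay", "vijay", "saurabh", "sonny"],
    "age": [22.0, np.nan, np.nan, 32.0, 23.0, np.nan],
    "salary": [300.0, 233.0, np.nan, 234.0, np.nan, 234.0],
})

cols = ["age", "salary"]
# data[cols].mean() is a Series of per-column means; fillna matches it by column name.
data[cols] = data[cols].fillna(data[cols].mean())
print(data)
```

The means are computed from the non-missing values only, which is why age is filled with 25.666667 and salary with 250.25, matching the tables above.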
Q.2 B) Write a Python program to generate a line plot of name Vs salary [5]
plt.plot(data['name'], data['salary'])
plt.xlabel("name")
plt.ylabel("salary")
plt.show()

Download the heights and weights dataset and load the dataset from a given csv file into a dataframe. Print the first 10 rows, the last 10 rows and 20 random rows; also display the shape of the dataset.

hw = pd.read_csv("height_weight.csv")
hw

| | height | weight |
|---|---|---|
| 0 | 7 | 56 |
| 1 | 6 | 45 |
| 2 | 5 | 45 |
| 3 | 5 | 46 |
| 4 | 4 | 75 |
| 5 | 6 | 67 |
| 6 | 6 | 36 |
| 7 | 4 | 35 |
| 8 | 8 | 75 |
| 9 | 6 | 56 |
| 10 | 5 | 47 |
| 11 | 4 | 88 |
| 12 | 8 | 90 |
| 13 | 5 | 56 |
| 14 | 3 | 45 |
| 15 | 5 | 46 |
| 16 | 4 | 75 |
| 17 | 6 | 67 |
| 18 | 6 | 36 |
| 19 | 4 | 35 |
| 20 | 8 | 75 |
| 21 | 6 | 56 |
hw.head(10)

| | height | weight |
|---|---|---|
| 0 | 7 | 56 |
| 1 | 6 | 45 |
| 2 | 5 | 45 |
| 3 | 5 | 46 |
| 4 | 4 | 75 |
| 5 | 6 | 67 |
| 6 | 6 | 36 |
| 7 | 4 | 35 |
| 8 | 8 | 75 |
| 9 | 6 | 56 |
hw.tail(10)

| | height | weight |
|---|---|---|
| 12 | 8 | 90 |
| 13 | 5 | 56 |
| 14 | 3 | 45 |
| 15 | 5 | 46 |
| 16 | 4 | 75 |
| 17 | 6 | 67 |
| 18 | 6 | 36 |
| 19 | 4 | 35 |
| 20 | 8 | 75 |
| 21 | 6 | 56 |
hw.sample(20)

| | height | weight |
|---|---|---|
| 12 | 8 | 90 |
| 1 | 6 | 45 |
| 14 | 3 | 45 |
| 16 | 4 | 75 |
| 3 | 5 | 46 |
| 8 | 8 | 75 |
| 17 | 6 | 67 |
| 13 | 5 | 56 |
| 21 | 6 | 56 |
| 0 | 7 | 56 |
| 2 | 5 | 45 |
| 5 | 6 | 67 |
| 11 | 4 | 88 |
| 6 | 6 | 36 |
| 7 | 4 | 35 |
| 4 | 4 | 75 |
| 20 | 8 | 75 |
| 15 | 5 | 46 |
| 10 | 5 | 47 |
| 19 | 4 | 35 |
hw.shape

(22, 2)
slip 3
Write a Python program to create box plots to see how each feature i.e. Sepal Length, Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv dataset)
iris

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
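The task asks how each feature is distributed across the species, which means one box per species for every feature. The following is only a sketch under stated assumptions: it uses six rows copied from the table above (so only two species appear), and `iris_sample`, `features`, `names` and `values` are names introduced here. With the full file, `iris_sample` would simply be the dataframe read from Iris.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Six rows taken from the table above (the excerpt shows only two of the three species).
iris_sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.0, 2.5, 3.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 5.2, 5.0, 5.2],
    "PetalWidthCm": [0.2, 0.2, 0.2, 2.3, 1.9, 2.0],
    "Species": ["Iris-setosa"] * 3 + ["Iris-virginica"] * 3,
})

features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

for ax, feature in zip(axes.ravel(), features):
    # One array of values per species, so each species gets its own box.
    grouped = iris_sample.groupby("Species")[feature]
    names = [name for name, _ in grouped]
    values = [group.to_numpy() for _, group in grouped]
    ax.boxplot(values)
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_title(feature)

fig.tight_layout()
plt.show()
```

Plotting a whole column with `plt.boxplot(iris["SepalLengthCm"])` pools all species into one box, so the grouping step is what makes the species comparison visible.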
# Box plot of each feature with all species pooled
# (Id is a row identifier, not a feature, so it is not plotted)
plt.boxplot(iris["SepalLengthCm"])
plt.show()
plt.boxplot(iris["SepalWidthCm"])
plt.show()
plt.boxplot(iris["PetalLengthCm"])
plt.show()
plt.boxplot(iris["PetalWidthCm"])
plt.show()

Write a Python program to view basic statistical details of the data. (Use the Heights and Weights dataset)
hw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   height  22 non-null     int64
 1   weight  22 non-null     int64
dtypes: int64(2)
memory usage: 480.0 bytes

hw.describe()

| | height | weight |
|---|---|---|
| count | 22.00000 | 22.000000 |
| mean | 5.50000 | 56.909091 |
| std | 1.40577 | 17.146176 |
| min | 3.00000 | 35.000000 |
| 25% | 4.25000 | 45.000000 |
| 50% | 5.50000 | 56.000000 |
| 75% | 6.00000 | 73.000000 |
| max | 8.00000 | 90.000000 |
slip 4 & slip 5
Generate a random array of 50 integers and display them using a line chart, scatter plot, histogram and box plot. Apply appropriate color, labels and styling options.

# This snippet generates only a single random integer
import numpy as np
import random as rn
arr = rn.randint(1,100)
arr

89

# This snippet generates a random array of 50 integers
from numpy import random
arr = random.randint(1,100,50)
arr

array([84, 49, 52, 66, 81, 34, 30, 2, 15, 14, 23, 85, 9, 22, 58, 69, 67,
       48, 3, 92, 54, 43, 79, 82, 70, 9, 19, 90, 57, 19, 18, 59, 97, 25,
       99, 74, 62, 38, 10, 9, 49, 7, 34, 53, 89, 74, 54, 1, 49, 67])

import matplotlib.pyplot as plt
plt.plot(arr, color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("line chart")
plt.show()

arr2 = random.randint(1,100,50)
plt.scatter(arr, arr2, color='red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("scatter plot")
plt.show()

# bins lists the bin edges; starting at 0 keeps values below 20 in the histogram
plt.hist(arr, bins=[0,20,40,60,80,100], color='red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("histogram")
plt.show()

plt.boxplot(arr)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("box plot")
plt.show()

Write a Python program to print the shape, number of rows-columns, data types, feature names and the description of the data. (Use User_Data.csv)
userdata = pd.read_csv("csvdata2.csv")
userdata

| | id | name | city | phone |
|---|---|---|---|---|
| 0 | 11 | swapnil | pune | 12344 |
| 1 | 22 | raj | mumbai | 1234 |
| 2 | 33 | vijay | patas | 2344 |
| 3 | 44 | jay | baramti | 87 |
| 4 | 55 | ajay | roti | 8427 |
userdata.shape

(5, 4)

userdata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      5 non-null      int64
 1   name    5 non-null      object
 2   city    5 non-null      object
 3   phone   5 non-null      int64
dtypes: int64(2), object(2)
memory usage: 288.0+ bytes

userdata.dtypes

id int64
name object
city object
phone int64
dtype: object

userdata.describe()

| | id | phone |
|---|---|---|
| count | 5.000000 | 5.000000 |
| mean | 33.000000 | 4887.200000 |
| std | 17.392527 | 5267.582624 |
| min | 11.000000 | 87.000000 |
| 25% | 22.000000 | 1234.000000 |
| 50% | 33.000000 | 2344.000000 |
| 75% | 44.000000 | 8427.000000 |
| max | 55.000000 | 12344.000000 |
slip 7
Write a Python program to perform the following tasks: a. Apply one-hot encoding on the Country column. b. Apply label encoding on the Purchased column. (Data.csv has two categorical columns: the Country column and the Purchased column.)
data = pd.read_csv("Data1.csv")
data

| | Country | Age | Salary | Purchased |
|---|---|---|---|---|
| 0 | France | 44 | 72000 | No |
| 1 | Spain | 27 | 48000 | Yes |
| 2 | Germany | 30 | 54000 | No |
| 3 | Spain | 38 | 61000 | No |
| 4 | Germany | 40 | Yes | NaN |
| 5 | France | 35 | 58000 | Yes |
| 6 | Spain | 52000 | No | NaN |
| 7 | France | 48 | 79000 | Yes |
| 8 | Germany | 50 | 83000 | No |
| 9 | France | 37 | 67000 | Yes |
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
data = pd.DataFrame(ct.fit_transform(data))
data

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 1 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 2 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 3 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 4 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 5 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 7 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 8 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 9 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data.iloc[:,-1] = le.fit_transform(data.iloc[:,-1])
data

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
| 1 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
| 2 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
| 3 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
| 4 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
| 5 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
| 7 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
| 8 | 1.0 | 0.0 | 1.0 | 0.0 | 1 |
| 9 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
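For comparison, the same two encodings can be sketched with pandas alone. This is an alternative to the scikit-learn code above, not a reproduction of it: it uses four complete rows copied from the Data1.csv table, and the names `df` and `encoded` are introduced here for illustration.

```python
import pandas as pd

# Four complete rows from the Data1.csv table above.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44, 27, 30, 38],
    "Salary": [72000, 48000, 54000, 61000],
    "Purchased": ["No", "Yes", "No", "No"],
})

# One-hot encoding: one 0/1 column per country.
encoded = pd.get_dummies(df, columns=["Country"], dtype=int)

# Label encoding: categories are sorted alphabetically, so No -> 0 and Yes -> 1.
encoded["Purchased"] = encoded["Purchased"].astype("category").cat.codes

print(encoded)
```

Keeping the column names (`Country_France`, `Country_Germany`, `Country_Spain`) makes the result easier to read than the numbered columns that come back from wrapping a transformed array in a new dataframe.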
Write a program in python to perform following task : [15] Standardizing Data (transform them into a standard Gaussian distribution with a mean of 0 and a standard deviation of 1) (Use winequality-red.csv)
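# Standardize the values in each column
# Note: data here is still the encoded frame from the previous program, not winequality-red.csv
df_new = (data - data.mean()) / data.std()
df_new

| | 0 | 1 | 2 | 3 | 4 |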
|---|---|---|---|---|---|
| 0 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
| 1 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
| 2 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
| 3 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
| 4 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
| 5 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
| 6 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
| 7 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
| 8 | 0.774597 | -0.774597 | 0.774597 | -0.774597 | 0.774597 |
| 9 | -1.161895 | 1.161895 | -1.161895 | 1.161895 | -1.161895 |
df_new.mean()

0 4.440892e-17
1 -4.440892e-17
2 4.440892e-17
3 -4.440892e-17
4 2.220446e-17
dtype: float64

df_new.std()

0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
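Standardizing rescales each column to mean 0 and standard deviation 1; it does not change the shape of the distribution, so a skewed column stays skewed. A minimal sketch on wine values, assuming three columns from three rows of the wine table shown earlier (`wine_sample` and `standardized` are names introduced here):

```python
import pandas as pd

# Three columns from three rows of the wine table shown earlier.
wine_sample = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 6.2],
    "volatile acidity": [0.7, 0.88, 0.6],
    "alcohol": [9.4, 9.8, 10.5],
})

# z-score: subtract the column mean, divide by the column standard deviation.
# ddof=0 gives the population standard deviation; the pandas default is ddof=1.
standardized = (wine_sample - wine_sample.mean()) / wine_sample.std(ddof=0)

print(standardized.mean().round(10))
print(standardized.std(ddof=0).round(10))
```

The choice of `ddof` matters when comparing results: `data.std()` as used above divides by n-1, while scikit-learn's `StandardScaler` divides by n, so the two give slightly different values on small samples.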
Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a pie chart.
import matplotlib.pyplot as plt
subnames = ["math","marathi","english","stat","python","java"]
marks = [78,56,45,67,97,75]
plt.pie(marks, labels=subnames)
plt.show()
Write a program in python to perform the following task (Use winequality-red.csv) [5] Import the dataset and do the following: a) Describe the dataset b) Shape of the dataset c) Display the first 3 rows from the dataset
# Note: the original run loaded Data1.csv here rather than winequality-red.csv,
# so the outputs below describe Data1.csv
wine = pd.read_csv("Data1.csv")
wine

| | Country | Age | Salary | Purchased |
|---|---|---|---|---|
| 0 | France | 44 | 72000 | No |
| 1 | Spain | 27 | 48000 | Yes |
| 2 | Germany | 30 | 54000 | No |
| 3 | Spain | 38 | 61000 | No |
| 4 | Germany | 40 | Yes | NaN |
| 5 | France | 35 | 58000 | Yes |
| 6 | Spain | 52000 | No | NaN |
| 7 | France | 48 | 79000 | Yes |
| 8 | Germany | 50 | 83000 | No |
| 9 | France | 37 | 67000 | Yes |
wine.describe()

| | Age |
|---|---|
| count | 10.000000 |
| mean | 5234.900000 |
| std | 16431.582824 |
| min | 27.000000 |
| 25% | 35.500000 |
| 50% | 39.000000 |
| 75% | 47.000000 |
| max | 52000.000000 |
wine.shape

(10, 4)

wine.head(3)

| | Country | Age | Salary | Purchased |
|---|---|---|---|---|
| 0 | France | 44 | 72000 | No |
| 1 | Spain | 27 | 48000 | Yes |
| 2 | Germany | 30 | 54000 | No |
slip 10
Write a python program to Display column-wise mean, and median for SOCRHeightWeight dataset.
hw = pd.read_csv("height_weight.csv")
hw

| | height | weight |
|---|---|---|
| 0 | 7 | 56 |
| 1 | 6 | 45 |
| 2 | 5 | 45 |
| 3 | 5 | 46 |
| 4 | 4 | 75 |
| 5 | 6 | 67 |
| 6 | 6 | 36 |
| 7 | 4 | 35 |
| 8 | 8 | 75 |
| 9 | 6 | 56 |
| 10 | 5 | 47 |
| 11 | 4 | 88 |
| 12 | 8 | 90 |
| 13 | 5 | 56 |
| 14 | 3 | 45 |
| 15 | 5 | 46 |
| 16 | 4 | 75 |
| 17 | 6 | 67 |
| 18 | 6 | 36 |
| 19 | 4 | 35 |
| 20 | 8 | 75 |
| 21 | 6 | 56 |
hw.mean()

height 5.500000
weight 56.909091
dtype: float64

hw.median()

height 5.5
weight 56.0
dtype: float64

hw["height"].mean()

5.5
Write a python program to compute sum of Manhattan distance between all pairs of points.
# Using scipy to calculate the Manhattan distance between two vectors
from scipy.spatial.distance import cityblock
x1 = [1,2,3,4,5,6]
x2 = [10,20,30,1,2,3]
print(cityblock(x1, x2))  # Returns: 63

63

print(cityblock(hw["height"], hw["weight"]))

1131
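`cityblock` above measures the distance between two vectors. The task wording, "sum of Manhattan distance between all pairs of points", reads more like treating each row as a point and adding up the distance for every pair. A small sketch of that reading, assuming the first four (height, weight) rows of the table as points; `points`, `manhattan` and `total` are names introduced here:

```python
from itertools import combinations

# First four (height, weight) rows of the table above, treated as 2-D points.
points = [(7, 56), (6, 45), (5, 45), (5, 46)]


def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


# combinations(points, 2) yields every unordered pair exactly once.
total = sum(manhattan(p, q) for p, q in combinations(points, 2))
print(total)  # 12 + 13 + 12 + 1 + 2 + 1 = 41
```

For larger datasets the pairwise loop grows quadratically, so a vectorised routine is usually preferable; the loop form is kept here because it mirrors the definition directly.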
slip 12
Write a Python program to create data frame containing column name, salary, department add 10 rows with some missing and duplicate values to the data frame. Also drop all null and empty values. Print the modified data frame
import pandas as pd
import numpy as np
arr = np.array([['swapnil',50399,'airforce'],
                ['nihan',50399,'airforce'],
                ['vijay',24499,'math'],
                ['jay','cs',"NaN"],
                ['ajay',None,'airforce'],
                ['vinu',50399,None],
                ['bharat',50399,'airforce'],
                ['suahas','airforce',None],
                ['nihan',50399,'airforce'],
                ['kemal',50399,'airforce']])
df = pd.DataFrame(arr, columns=["name","salary","dapartment"])
df

| | name | salary | dapartment |
|---|---|---|---|
| 0 | swapnil | 50399 | airforce |
| 1 | nihan | 50399 | airforce |
| 2 | vijay | 24499 | math |
| 3 | jay | cs | NaN |
| 4 | ajay | None | airforce |
| 5 | vinu | 50399 | None |
| 6 | bharat | 50399 | airforce |
| 7 | suahas | airforce | None |
| 8 | nihan | 50399 | airforce |
| 9 | kemal | 50399 | airforce |
df.isnull()

| | name | salary | dapartment |
|---|---|---|---|
| 0 | False | False | False |
| 1 | False | False | False |
| 2 | False | False | False |
| 3 | False | False | False |
| 4 | False | True | False |
| 5 | False | False | True |
| 6 | False | False | False |
| 7 | False | False | True |
| 8 | False | False | False |
| 9 | False | False | False |
df.duplicated()

0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 True
9 False
dtype: bool

# Note: df.drop without parentheses only returns the bound method; no rows are removed
df.drop

<bound method DataFrame.drop of name salary dapartment
0 swapnil 50399 airforce
1 nihan 50399 airforce
2 vijay 24499 math
3 jay cs NaN
4 ajay None airforce
5 vinu 50399 None
6 bharat 50399 airforce
7 suahas airforce None
8 nihan 50399 airforce
9 kemal 50399 airforce>
df

| | name | salary | dapartment |
|---|---|---|---|
| 0 | swapnil | 50399 | airforce |
| 1 | nihan | 50399 | airforce |
| 2 | vijay | 24499 | math |
| 3 | jay | cs | NaN |
| 4 | ajay | None | airforce |
| 5 | vinu | 50399 | None |
| 6 | bharat | 50399 | airforce |
| 7 | suahas | airforce | None |
| 8 | nihan | 50399 | airforce |
| 9 | kemal | 50399 | airforce |
# df.dropna(inplace=True)  <--- use this form to delete the rows permanently
df.dropna()

| | name | salary | dapartment |
|---|---|---|---|
| 0 | swapnil | 50399 | airforce |
| 1 | nihan | 50399 | airforce |
| 2 | vijay | 24499 | math |
| 3 | jay | cs | NaN |
| 6 | bharat | 50399 | airforce |
| 8 | nihan | 50399 | airforce |
| 9 | kemal | 50399 | airforce |
df

| | name | salary | dapartment |
|---|---|---|---|
| 0 | swapnil | 50399 | airforce |
| 1 | nihan | 50399 | airforce |
| 2 | vijay | 24499 | math |
| 3 | jay | cs | NaN |
| 4 | ajay | None | airforce |
| 5 | vinu | 50399 | None |
| 6 | bharat | 50399 | airforce |
| 7 | suahas | airforce | None |
| 8 | nihan | 50399 | airforce |
| 9 | kemal | 50399 | airforce |
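Because `df.dropna()` was not assigned back, `df` above is unchanged, the duplicate row is still present, and the text `"NaN"` in row 3 survives because it is a string rather than a missing value. One possible way to finish the task is sketched below. It assumes that empty strings and the literal text "NaN" should count as missing, spells the column `department` for readability, and introduces the name `cleaned`.

```python
import pandas as pd

rows = [
    ["swapnil", 50399, "airforce"],
    ["nihan", 50399, "airforce"],
    ["vijay", 24499, "math"],
    ["jay", "cs", "NaN"],          # the text "NaN" is a string, not a missing value
    ["ajay", None, "airforce"],
    ["vinu", 50399, None],
    ["bharat", 50399, "airforce"],
    ["suahas", "airforce", None],
    ["nihan", 50399, "airforce"],  # duplicate of row 1
    ["kemal", 50399, "airforce"],
]
df = pd.DataFrame(rows, columns=["name", "salary", "department"])

# Assumption: empty strings and the literal text "NaN" should be treated as missing.
placeholders = ["", "NaN"]
cleaned = (
    df.mask(df.isin(placeholders))  # turn placeholder cells into real missing values
      .dropna()                     # drop rows with any missing value
      .drop_duplicates()            # drop repeated rows, keeping the first
)
print(cleaned)
```

Each method returns a new dataframe, so the result has to be assigned (as here) or the calls made with `inplace=True` for the change to persist.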
slip 13
Write a Python program to create a graph to find the relationship between the petal length and petal width. (Use iris.csv dataset)
iris

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
# plot.scatter returns an Axes; passing it as ax= draws all three species on the same plot
fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='blue', label='versicolor', ax=fig)
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='green', label='virginica', ax=fig)
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title("Petal Length VS Width")
fig = plt.gcf()
fig.set_size_inches(12,8)
plt.show()
# or
# Note: without ax=fig, each species is drawn on its own separate figure
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='blue', label='versicolor')
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='green', label='virginica')
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title("Petal Length VS Width")
fig = plt.gcf()
fig.set_size_inches(12,8)
plt.show()

Write a Python program to find the maximum and minimum value of a given flattened array.
arr1 = np.array([[1,2,4],[7,5,9]])
arr1.max()

9

arr1 = np.array([[1,2,4],[7,5,9]])
arr1.min()

1
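The task mentions a flattened array, so a version that makes the flattening step explicit may be closer to what is asked. A minimal sketch using the same values (`flat` is a name introduced here):

```python
import numpy as np

arr1 = np.array([[1, 2, 4], [7, 5, 9]])
flat = arr1.flatten()  # 1-D copy of the 2-D array: [1 2 4 7 5 9]

print(flat.max())  # 9
print(flat.min())  # 1
```

The result is the same as calling `max()` and `min()` on the 2-D array, because without an `axis` argument NumPy reduces over all elements.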
slip 16
Write a python program to create a data frame for students’ information such as name, graduation percentage and age. Display average age of students, average of graduation percentage.
import pandas as pd
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df

| | name | percentage | age |
|---|---|---|---|
| 0 | swapnil | 88 | 22 |
| 1 | om | 88 | 12 |
| 2 | jay | 98 | 32 |
| 3 | sai | 45 | 25 |
| 4 | didi | 83 | 22 |
| 5 | swapnil | 88 | 22 |
avg = df["age"].mean()
avg

22.5

per = df["percentage"].mean()
per

81.66666666666667

Write a python program to create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a pie chart and bar chart.
import matplotlib.pyplot as plt
#sname = [["math"],["bio"],["sci"],["ds"],["hist"],["eng"],["stat"]]
sname = ["math","bio","sci","ds","hist","eng","stat"]
marks = [78,76,58,98,45,66,90]
plt.pie(marks, labels=sname, autopct='%1.0f%%')
plt.title("pie chart")
plt.show()

sname = ["math","bio","sci","ds","hist","eng","stat"]
marks = [78,76,58,98,45,66,90]
plt.bar(sname, marks)
plt.title("bar plot")
plt.show()

slip 17
Write a Python program to draw scatter plots to compare two features of the iris dataset
import pandas as pd
iris = pd.read_csv("Iris.csv")
iris

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
plt.scatter(iris['SepalLengthCm'], iris['SepalWidthCm'], color="red")
plt.title("scatter plot")
plt.xlabel("SepalLengthCm")
plt.ylabel("SepalWidthCm")
plt.show()

import pandas as pd
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df

| | name | percentage | age |
|---|---|---|---|
| 0 | swapnil | 88 | 22 |
| 1 | om | 88 | 12 |
| 2 | jay | 98 | 32 |
| 3 | sai | 45 | 25 |
| 4 | didi | 83 | 22 |
| 5 | swapnil | 88 | 22 |
slip 18
Write a Python program to create box plots to see how each feature i.e. Sepal Length, Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv dataset)
plt.boxplot(iris["SepalLengthCm"])
plt.show()
plt.boxplot(iris["SepalWidthCm"])
plt.show()
plt.boxplot(iris["PetalLengthCm"])
plt.show()
plt.boxplot(iris["PetalWidthCm"])
plt.show()

Use the heights and weights dataset and load the dataset from a given csv file into a dataframe. Print the first 5 rows, the last 5 rows and 10 random rows.
hw = pd.read_csv("height_weight.csv")
hw.head(5)

| | height | weight |
|---|---|---|
| 0 | 7 | 56 |
| 1 | 6 | 45 |
| 2 | 5 | 45 |
| 3 | 5 | 46 |
| 4 | 4 | 75 |
hw.tail(5)

| | height | weight |
|---|---|---|
| 17 | 6 | 67 |
| 18 | 6 | 36 |
| 19 | 4 | 35 |
| 20 | 8 | 75 |
| 21 | 6 | 56 |
hw.sample(10)

| | height | weight |
|---|---|---|
| 12 | 8 | 90 |
| 11 | 4 | 88 |
| 5 | 6 | 67 |
| 13 | 5 | 56 |
| 20 | 8 | 75 |
| 17 | 6 | 67 |
| 16 | 4 | 75 |
| 0 | 7 | 56 |
| 21 | 6 | 56 |
| 3 | 5 | 46 |
slip 19
Write a Python program [15]
- To create a dataframe containing columns name, age and percentage. Add 10 rows to the dataframe. View the dataframe.
- To print the shape, number of rows-columns, data types, feature names and the description of the data
- To Add 5 rows with duplicate values and missing values. Add a column ‘remarks’ with empty values. Display the data.
import pandas as pd
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df

| | name | percentage | age |
|---|---|---|---|
| 0 | swapnil | 88 | 22 |
| 1 | om | 88 | 12 |
| 2 | jay | 98 | 32 |
| 3 | sai | 45 | 25 |
| 4 | didi | 83 | 22 |
| 5 | swapnil | 88 | 22 |
df.shape

(6, 3)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   name        6 non-null      object
 1   percentage  6 non-null      int64
 2   age         6 non-null      int64
dtypes: int64(2), object(1)
memory usage: 272.0+ bytes

df.dtypes

name object
percentage int64
age int64
dtype: object

df.describe()

| | percentage | age |
|---|---|---|
| count | 6.000000 | 6.000000 |
| mean | 81.666667 | 22.500000 |
| std | 18.618987 | 6.442049 |
| min | 45.000000 | 12.000000 |
| 25% | 84.250000 | 22.000000 |
| 50% | 88.000000 | 22.000000 |
| 75% | 88.000000 | 24.250000 |
| max | 98.000000 | 32.000000 |
df.loc[6] = ["om",88,12]
df

| | name | percentage | age |
|---|---|---|---|
| 0 | swapnil | 88 | 22 |
| 1 | om | 88 | 12 |
| 2 | jay | 98 | 32 |
| 3 | sai | 45 | 25 |
| 4 | didi | 83 | 22 |
| 5 | swapnil | 88 | 22 |
| 6 | om | 88 | 12 |
df.loc[7] = [None,None,49]
df.loc[8] = ["rohit",None,49]
df.loc[9] = ["didi",83,22]
df.loc[10] = [None,None,None]
df

| | name | percentage | age |
|---|---|---|---|
| 0 | swapnil | 88.0 | 22.0 |
| 1 | om | 88.0 | 12.0 |
| 2 | jay | 98.0 | 32.0 |
| 3 | sai | 45.0 | 25.0 |
| 4 | didi | 83.0 | 22.0 |
| 5 | swapnil | 88.0 | 22.0 |
| 6 | om | 88.0 | 12.0 |
| 7 | None | NaN | 49.0 |
| 8 | rohit | NaN | 49.0 |
| 9 | didi | 83.0 | 22.0 |
| 10 | None | NaN | NaN |
df["remark"] = None
df

| | name | percentage | age | remark |
|---|---|---|---|---|
| 0 | swapnil | 88.0 | 22.0 | None |
| 1 | om | 88.0 | 12.0 | None |
| 2 | jay | 98.0 | 32.0 | None |
| 3 | sai | 45.0 | 25.0 | None |
| 4 | didi | 83.0 | 22.0 | None |
| 5 | swapnil | 88.0 | 22.0 | None |
| 6 | om | 88.0 | 12.0 | None |
| 7 | None | NaN | 49.0 | None |
| 8 | rohit | NaN | 49.0 | None |
| 9 | didi | 83.0 | 22.0 | None |
| 10 | None | NaN | NaN | None |
slip 20
Add two outliers to the above data and display the box plot.
import numpy as np
arr = np.array([1,2,3,4,5,6,100,150])
plt.boxplot(arr)
plt.show()

slip 21
Import dataset “iris.csv”. Write a Python program to create a Bar plot to get the frequency of the three species of the Iris data.
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
# Use the pandas .plot accessor here; writing .plt.bar() instead raises
# AttributeError: 'Series' object has no attribute 'plt'
iris['Species'].value_counts().plot.bar()
plt.title("Iris Species %")
plt.show()
Write a Python program to create a histogram of the three species of the Iris data.
slip 24 Q2
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv("Iris.csv")
#iris['Species'].value_counts().plot.hist(bins=[10,20,30,40,50])
#plt.hist(iris['SepalLengthCm'],bins=20)
plt.hist(iris['Species'], bins=20)
plt.title("Iris Species %")
plt.show()

iris

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 6 columns
# Import dataset "iris.csv". Write a Python program to create a Bar plot to get the
# frequency of the three species of the Iris data.
counts = iris['Species'].value_counts()
plt.bar(counts.index, counts.values)  # one bar per species, height = number of rows
plt.show()
slip 30, 26, 25, 20, 15, 12, 9, 4
Generate a random array of 50 integers and display them using a line chart, scatter plot, histogram and box plot. Apply appropriate color, labels and styling options.
from numpy import random
arr = random.randint(1,100,50)
print(arr)
import matplotlib.pyplot as plt
plt.plot(arr, color='red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("line chart")
plt.show()

arr2 = random.randint(1,100,50)
arr2

array([ 2, 30, 30, 4, 72, 57, 35, 32, 55, 86, 54, 21, 83, 72, 51, 21, 49,
       99, 8, 87, 68, 16, 40, 74, 57, 10, 68, 78, 49, 43, 3, 73, 18, 44,
       6, 51, 81, 86, 16, 41, 86, 54, 99, 10, 65, 43, 18, 93, 74, 30])

import matplotlib.pyplot as plt
plt.scatter(arr, arr2, color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("scatter plot")
plt.show()

import matplotlib.pyplot as plt
plt.hist(arr, bins=[0,25,50,75,100], color="red")  # bins lists the edges of each block
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("histogram")
plt.show()

import matplotlib.pyplot as plt
plt.boxplot(arr)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("box plot")
plt.show()

Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a bar chart.
Subject = ['English', 'Maths', 'Science','history','data science']
marks = [90,80,70,97,40]
import matplotlib.pyplot as plt
plt.bar(Subject, marks, color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("bar chart")
plt.show()

Slip 28
# Q.1 Write a Python Program to create a dataframe containing columns name, height, and weight.
# Add 10 rows to the dataframe. View the dataframe.
import pandas as pd
df = pd.DataFrame(columns = ['Name','Height','Weight'])
df.loc[0] = ['Nil', 7, 58]
df.loc[1] = [None, 6, 49]
df.loc[2] = ['Emma', 6, 45]
df.loc[3] = ['Swapnil', 5, 56]
df.loc[4] = ['Swamiraj', None, 56]
df.loc[5] = ['Vaishu', 4, 49]
df.loc[6] = ['Snehal', 5, 58]
df.loc[7] = ['Vaishu', 4, 49]
df.loc[8] = ['Navin', 5, None]
df.loc[9] = ['Shreya', 6, 56]
df

| | Name | Height | Weight |
|---|---|---|---|
| 0 | Nil | 7 | 58 |
| 1 | NaN | 6.0 | 49.0 |
| 2 | Emma | 6 | 45 |
| 3 | Swapnil | 5 | 56 |
| 4 | Swamiraj | None | 56 |
| 5 | Vaishu | 4 | 49 |
| 6 | Snehal | 5 | 58 |
| 7 | Vaishu | 4 | 49 |
| 8 | Navin | 5 | None |
| 9 | Shreya | 6 | 56 |
# Q.2 Write a python program to find shape, size, datatypes of the dataframe object.
df.shape

(10, 3)

df.size

30

df.dtypes

Name object
Height object
Weight object
dtype: object

# Q.3 Write a python program to view basic statistical details of the data
df.describe()

| | Name | Height | Weight |
|---|---|---|---|
| count | 9 | 9.0 | 9.0 |
| unique | 8 | 4.0 | 4.0 |
| top | Vaishu | 6.0 | 49.0 |
| freq | 2 | 3.0 | 3.0 |
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    9 non-null      object
 1   Height  9 non-null      object
 2   Weight  9 non-null      object
dtypes: object(3)
memory usage: 320.0+ bytes

# Q.4 Write a python program to get the number of observations, missing values and nan values.
df.isnull()

| | Name | Height | Weight |
|---|---|---|---|
| 0 | False | False | False |
| 1 | True | False | False |
| 2 | False | False | False |
| 3 | False | False | False |
| 4 | False | True | False |
| 5 | False | False | False |
| 6 | False | False | False |
| 7 | False | False | False |
| 8 | False | False | True |
| 9 | False | False | False |
df.dropna()

| | Name | Height | Weight |
|---|---|---|---|
| 0 | Nil | 7 | 58 |
| 2 | Emma | 6 | 45 |
| 3 | Swapnil | 5 | 56 |
| 5 | Vaishu | 4 | 49 |
| 6 | Snehal | 5 | 58 |
| 7 | Vaishu | 4 | 49 |
| 9 | Shreya | 6 | 56 |
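Q.4 asks for counts, while `isnull()` and `dropna()` above show the cells and the surviving rows. A short sketch of how the counts themselves could be obtained, assuming the same ten rows as the dataframe above (rebuilt here so the example is self-contained; the three result names are introduced for illustration):

```python
import pandas as pd

# Same ten rows as the dataframe above.
df = pd.DataFrame({
    "Name": ["Nil", None, "Emma", "Swapnil", "Swamiraj", "Vaishu", "Snehal", "Vaishu", "Navin", "Shreya"],
    "Height": [7, 6, 6, 5, None, 4, 5, 4, 5, 6],
    "Weight": [58, 49, 45, 56, 56, 49, 58, 49, None, 56],
})

n_observations = len(df)                                # number of rows
missing_per_column = df.isnull().sum()                  # None and NaN both count as missing
rows_with_missing = int(df.isnull().any(axis=1).sum())  # rows that dropna() would remove

print(n_observations)
print(missing_per_column)
print(rows_with_missing)
```

Here each column has one missing value and three different rows are affected, which is why `dropna()` above leaves seven of the ten rows.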
# Q.5 Write a python program to add a column "BMI" to the dataframe, calculated as weight/height^2
df['BMI'] = df.Weight/(df.Height**2)
df

| | Name | Height | Weight | BMI |
|---|---|---|---|---|
| 0 | Nil | 7 | 58 | 1.183673 |
| 1 | NaN | 6.0 | 49.0 | 1.361111 |
| 2 | Emma | 6 | 45 | 1.25 |
| 3 | Swapnil | 5 | 56 | 2.24 |
| 4 | Swamiraj | None | 56 | NaN |
| 5 | Vaishu | 4 | 49 | 3.0625 |
| 6 | Snehal | 5 | 58 | 2.32 |
| 7 | Vaishu | 4 | 49 | 3.0625 |
| 8 | Navin | 5 | None | NaN |
| 9 | Shreya | 6 | 56 | 1.555556 |
# Q.6 Write a python program to find the maximum and minimum BMI
df.BMI.max()

3.0625

df.BMI.min()

1.183673469387755

# Q.7 Write a python program to generate a Scatter plot of height and weight
import matplotlib.pyplot as plt
df.plot.scatter(x='Height', y='Weight')

<AxesSubplot:xlabel='Height', ylabel='Weight'>