Data Science Practical Exam Programs

slip 1 & slip 11

Write a Python program to create a Pie plot to get the frequency of the three species of the Iris data (Use iris.csv)

In [32]:

import pandas as pd
df2 = pd.read_csv("Iris.csv")
df2

Out[32]:

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa
...	...	...	...	...	...	...
145	146	6.7	3.0	5.2	2.3	Iris-virginica
146	147	6.3	2.5	5.0	1.9	Iris-virginica
147	148	6.5	3.0	5.2	2.0	Iris-virginica
148	149	6.2	3.4	5.4	2.3	Iris-virginica
149	150	5.9	3.0	5.1	1.8	Iris-virginica

150 rows × 6 columns

In [28]:

import pandas as pd
import matplotlib.pyplot as plt

iris = pd.read_csv("Iris.csv")

# One wedge per species, sized by the number of rows of that species
iris['Species'].value_counts().plot.pie()

plt.title("Iris Species %")
plt.show()

In [31]:

import pandas as pd
import matplotlib.pyplot as plt

iris = pd.read_csv("Iris.csv")

pie = iris['Species'].value_counts()
pie

Out[31]:

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64

In [33]:

iris["PetalWidthCm"].value_counts()
​

Out[33]:

0.2    28
1.3    13
1.8    12
1.5    12
1.4     8
2.3     8
1.0     7
0.4     7
0.3     7
0.1     6
2.1     6
2.0     6
1.2     5
1.9     5
1.6     4
2.5     3
2.2     3
2.4     3
1.1     3
1.7     2
0.6     1
0.5     1
Name: PetalWidthCm, dtype: int64

B) Write a Python program to view basic statistical details of the data. (Use winequality-red.csv)

In [35]:

wine = pd.read_csv("wineequality-red.csv")
wine

Out[35]:

	fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
0	7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
1	7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
2	7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;...
3	11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58...
4	7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
...	...
1594	6.2;0.6;0.08;2;0.09;32;44;0.9949;3.45;0.58;10.5;5
1595	5.9;0.55;0.1;2.2;0.062;39;51;0.99512;3.52;0.76...
1596	6.3;0.51;0.13;2.3;0.076;29;40;0.99574;3.42;0.7...
1597	5.9;0.645;0.12;2;0.075;32;44;0.99547;3.57;0.71...
1598	6;0.31;0.47;3.6;0.067;18;42;0.99549;3.39;0.66;...

1599 rows × 1 columns

In [36]:

​
wine.describe()

Out[36]:

	fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
count	1599
unique	1359
top	7.2;0.36;0.46;2.1;0.074;24;44;0.99534;3.4;0.85...
freq	4

In [37]:

wine.info()
​

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 1 columns):
 #   Column                                                                                                                                                                   Non-Null Count  Dtype 
---  ------                                                                                                                                                                   --------------  ----- 
 0   fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"  1599 non-null   object
dtypes: object(1)
memory usage: 12.6+ KB
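The output above shows a single column whose name is every header joined by semicolons, which suggests the file is semicolon-delimited and was split on commas instead. A minimal sketch of the likely fix, assuming that delimiter, passes `sep=";"` to `read_csv`. To stay self-contained it parses the header and two rows copied from the output above through `io.StringIO` (a stand-in for the real file); `sample` and `wine_fixed` are illustrative names.

```python
import io
import pandas as pd

# Header and two data rows copied from the output above; a stand-in for the real file.
sample = io.StringIO(
    'fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";'
    '"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# sep=";" tells pandas to split fields on semicolons instead of commas.
wine_fixed = pd.read_csv(sample, sep=";")

print(wine_fixed.shape)       # 12 columns instead of 1
print(wine_fixed.describe())  # numeric summary: count, mean, std, min, quartiles, max
```

With the real file the same call would take the file name in place of `sample`, for example `pd.read_csv("winequality-red.csv", sep=";")`, using whatever name the file has on disk.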

slip 2 and slip 6

Q.2 A) Write a Python program for handling missing values. Replace the missing values in the salary and age columns with the mean of each column. (Use the Data.csv file.)

In [57]:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("Data.csv")
data

Out[57]:

	name	age	salary
0	swapnil	22.0	300.0
1	raj	NaN	233.0
2	ajay	NaN	NaN
3	vijay	32.0	234.0
4	saurabh	23.0	NaN
5	sonny	NaN	234.0

In [59]:

data["age"] = data['age'].fillna(data['age'].mean())
data

Out[59]:

	name	age	salary
0	swapnil	22.000000	300.0
1	raj	25.666667	233.0
2	ajay	25.666667	NaN
3	vijay	32.000000	234.0
4	saurabh	23.000000	NaN
5	sonny	25.666667	234.0

In [60]:

data["salary"] = data['salary'].fillna(data['salary'].mean())
data

Out[60]:

	name	age	salary
0	swapnil	22.000000	300.00
1	raj	25.666667	233.00
2	ajay	25.666667	250.25
3	vijay	32.000000	234.00
4	saurabh	23.000000	250.25
5	sonny	25.666667	234.00
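The two cells above fill one column at a time. As a hedged alternative, both columns can be filled in a single call, because `fillna` accepts a Series of per-column values. The sketch below rebuilds the table shown above in memory so it runs without Data.csv; `means` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same values as the Data.csv table shown above.
data = pd.DataFrame({
    "name": ["swapnil", "raj", "ajay", "vijay", "saurabh", "sonny"],
    "age": [22, np.nan, np.nan, 32, 23, np.nan],
    "salary": [300, 233, np.nan, 234, np.nan, 234],
})

# Column means; NaN values are skipped by default.
means = data[["age", "salary"]].mean()

# Passing a Series fills each column with the value stored under its own label.
filled = data.fillna(means)
print(filled)
```

The result should match Out[60] above: missing ages become 25.666667 and missing salaries become 250.25.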

Q.2 B) Write a Python program to generate a line plot of name Vs salary [5]

In [66]:

plt.plot(data['name'],data['salary'])
plt.xlabel("name")
plt.ylabel("salary")
plt.show()

Download the heights and weights dataset and load the dataset from a given csv file into a dataframe. Print the first 10 rows, the last 10 rows and 20 random rows; also display the shape of the dataset.

In [69]:

hw = pd.read_csv("height_weight.csv")
hw

Out[69]:

	height	weight
0	7	56
1	6	45
2	5	45
3	5	46
4	4	75
5	6	67
6	6	36
7	4	35
8	8	75
9	6	56
10	5	47
11	4	88
12	8	90
13	5	56
14	3	45
15	5	46
16	4	75
17	6	67
18	6	36
19	4	35
20	8	75
21	6	56

In [70]:

hw.head(10)
​

Out[70]:

	height	weight
0	7	56
1	6	45
2	5	45
3	5	46
4	4	75
5	6	67
6	6	36
7	4	35
8	8	75
9	6	56

In [71]:

hw.tail(10)
​

Out[71]:

	height	weight
12	8	90
13	5	56
14	3	45
15	5	46
16	4	75
17	6	67
18	6	36
19	4	35
20	8	75
21	6	56

In [74]:

hw.sample(20)
​

Out[74]:

	height	weight
12	8	90
1	6	45
14	3	45
16	4	75
3	5	46
8	8	75
17	6	67
13	5	56
21	6	56
0	7	56
2	5	45
5	6	67
11	4	88
6	6	36
7	4	35
4	4	75
20	8	75
15	5	46
10	5	47
19	4	35

In [76]:

hw.shape
​

Out[76]:

(22, 2)

slip 3

Write a Python program to create box plots to see how each feature i.e. Sepal Length, Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv dataset)

In [88]:

​
iris

Out[88]:

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa
...	...	...	...	...	...	...
145	146	6.7	3.0	5.2	2.3	Iris-virginica
146	147	6.3	2.5	5.0	1.9	Iris-virginica
147	148	6.5	3.0	5.2	2.0	Iris-virginica
148	149	6.2	3.4	5.4	2.3	Iris-virginica
149	150	5.9	3.0	5.1	1.8	Iris-virginica

150 rows × 6 columns

In [97]:

​
plt.boxplot(iris["Id"])

Out[97]:

{'whiskers': [<matplotlib.lines.Line2D at 0x1b41b07fd90>,
  <matplotlib.lines.Line2D at 0x1b41b090160>],
 'caps': [<matplotlib.lines.Line2D at 0x1b41b0904f0>,
  <matplotlib.lines.Line2D at 0x1b41b0908b0>],
 'boxes': [<matplotlib.lines.Line2D at 0x1b41b07fa60>],
 'medians': [<matplotlib.lines.Line2D at 0x1b41b090c40>],
 'fliers': [<matplotlib.lines.Line2D at 0x1b41b090fd0>],
 'means': []}

In [98]:

​
plt.boxplot(iris["SepalLengthCm"])

Out[98]:

{'whiskers': [<matplotlib.lines.Line2D at 0x1b41b0f8370>,
  <matplotlib.lines.Line2D at 0x1b41b0f8700>],
 'caps': [<matplotlib.lines.Line2D at 0x1b41b0f8a90>,
  <matplotlib.lines.Line2D at 0x1b41b0f8e20>],
 'boxes': [<matplotlib.lines.Line2D at 0x1b41b0e9f70>],
 'medians': [<matplotlib.lines.Line2D at 0x1b41b1011f0>],
 'fliers': [<matplotlib.lines.Line2D at 0x1b41b101580>],
 'means': []}

In [99]:

plt.boxplot(iris["SepalWidthCm"])
​

Out[99]:

{'whiskers': [<matplotlib.lines.Line2D at 0x1b41b15e9d0>,
  <matplotlib.lines.Line2D at 0x1b41b15ed00>],
 'caps': [<matplotlib.lines.Line2D at 0x1b41b16c0d0>,
  <matplotlib.lines.Line2D at 0x1b41b16c460>],
 'boxes': [<matplotlib.lines.Line2D at 0x1b41b15e640>],
 'medians': [<matplotlib.lines.Line2D at 0x1b41b16c7f0>],
 'fliers': [<matplotlib.lines.Line2D at 0x1b41b16cbb0>],
 'means': []}

In [100]:

plt.boxplot(iris["PetalLengthCm"])
​

Out[100]:

{'whiskers': [<matplotlib.lines.Line2D at 0x1b41b32a490>,
  <matplotlib.lines.Line2D at 0x1b41b32a820>],
 'caps': [<matplotlib.lines.Line2D at 0x1b41b32abb0>,
  <matplotlib.lines.Line2D at 0x1b41b32aee0>],
 'boxes': [<matplotlib.lines.Line2D at 0x1b41b32a100>],
 'medians': [<matplotlib.lines.Line2D at 0x1b41b3372b0>],
 'fliers': [<matplotlib.lines.Line2D at 0x1b41b337640>],
 'means': []}
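The cells above draw one box per feature over the whole dataset, while the task asks how each feature is distributed across the species. One possible way to do that, sketched here with plain matplotlib, is to draw one subplot per feature and one box per species inside it. To stay self-contained the sketch uses only the ten rows visible in the output above (two species); with the full file, `iris_sample` would be replaced by `pd.read_csv("Iris.csv")`. The names `iris_sample`, `features` and `species` are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Only the ten rows visible in the output above (two species); a stand-in for Iris.csv.
iris_sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6, 3.0, 2.5, 3.0, 3.4, 3.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4, 5.2, 5.0, 5.2, 5.4, 5.1],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2, 2.3, 1.9, 2.0, 2.3, 1.8],
    "Species": ["Iris-setosa"] * 5 + ["Iris-virginica"] * 5,
})

features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
species = sorted(iris_sample["Species"].unique())

fig, grid = plt.subplots(2, 2, figsize=(10, 8))
axes = list(grid.ravel())

for ax, feature in zip(axes, features):
    # One array of values per species, so each species gets its own box.
    groups = [
        iris_sample.loc[iris_sample["Species"] == s, feature].to_numpy()
        for s in species
    ]
    ax.boxplot(groups)
    ax.set_xticks(range(1, len(species) + 1))
    ax.set_xticklabels(species)
    ax.set_title(feature)

fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show() here.
```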

Write a Python program to view basic statistical details of the data (Use Heights and Weights Dataset)

In [101]:

hw.info()
​

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   height  22 non-null     int64
 1   weight  22 non-null     int64
dtypes: int64(2)
memory usage: 480.0 bytes

In [102]:

hw.describe()
​

Out[102]:

	height	weight
count	22.00000	22.000000
mean	5.50000	56.909091
std	1.40577	17.146176
min	3.00000	35.000000
25%	4.25000	45.000000
50%	5.50000	56.000000
75%	6.00000	73.000000
max	8.00000	90.000000

slip 4 & slip 5
​
Generate a random array of 50 integers and display them using a line chart, scatter plot, histogram and box plot. Apply appropriate color, labels and styling options.

In [112]:

# This cell generates only a single random integer

import numpy as np
import random as rn
arr = rn.randint(1,100)
arr

Out[112]:

In [113]:

# This cell generates a random array of 50 integers

from numpy import random

# 50 integers from 1 to 99 (the upper bound 100 is excluded)
arr = random.randint(1,100,50)
arr

Out[113]:

array([84, 49, 52, 66, 81, 34, 30,  2, 15, 14, 23, 85,  9, 22, 58, 69, 67,
       48,  3, 92, 54, 43, 79, 82, 70,  9, 19, 90, 57, 19, 18, 59, 97, 25,
       99, 74, 62, 38, 10,  9, 49,  7, 34, 53, 89, 74, 54,  1, 49, 67])

In [120]:

import matplotlib.pyplot as plt
​
plt.plot(arr,color = "red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("line chart")

Out[120]:

Text(0.5, 1.0, 'line chart')

In [122]:

​
arr2 = random.randint(1,100,50)
​
plt.scatter(arr,arr2,color= 'red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("scatter plot")
plt.show()

In [127]:

​
plt.hist(arr,bins=[0,20,40,60,80,100],color= 'red')   # bins start at 0 so values below 20 are counted too
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("histogram")
plt.show()

In [130]:

plt.boxplot(arr)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("box plot")
plt.show()

Write a Python program to print the shape, number of rows-columns, data types, feature names and the description of the data(Use User_Data.csv)

In [133]:

userdata = pd.read_csv("csvdata2.csv")
userdata

Out[133]:

	id	name	city	phone
0	11	swapnil	pune	12344
1	22	raj	mumbai	1234
2	33	vijay	patas	2344
3	44	jay	baramti	87
4	55	ajay	roti	8427

In [134]:

userdata.shape

Out[134]:

(5, 4)

In [135]:

userdata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      5 non-null      int64 
 1    name   5 non-null      object
 2    city   5 non-null      object
 3    phone  5 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 288.0+ bytes

In [137]:

userdata.dtypes
​

Out[137]:

id         int64
 name     object
 city     object
 phone     int64
dtype: object

In [145]:

userdata.describe()
​

Out[145]:

	id	phone
count	5.000000	5.000000
mean	33.000000	4887.200000
std	17.392527	5267.582624
min	11.000000	87.000000
25%	22.000000	1234.000000
50%	33.000000	2344.000000
75%	44.000000	8427.000000
max	55.000000	12344.000000

slip 7

Write a Python program to perform the following tasks: a. Apply one-hot encoding on the Country column. b. Apply label encoding on the Purchased column. (Data.csv has two categorical columns: the Country column and the Purchased column.)

In [147]:

data = pd.read_csv("Data1.csv")
data

Out[147]:

	Country	Age	Salary	Purchased
0	France	44	72000	No
1	Spain	27	48000	Yes
2	Germany	30	54000	No
3	Spain	38	61000	No
4	Germany	40	Yes	NaN
5	France	35	58000	Yes
6	Spain	52000	No	NaN
7	France	48	79000	Yes
8	Germany	50	83000	No
9	France	37	67000	Yes

In [156]:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
​
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
data = pd.DataFrame(ct.fit_transform(data))
data

Out[156]:

	0	1	2	3	4
0	0.0	1.0	0.0	1.0	0.0
1	1.0	0.0	1.0	0.0	1.0
2	1.0	0.0	1.0	0.0	1.0
3	1.0	0.0	1.0	0.0	1.0
4	1.0	0.0	1.0	0.0	1.0
5	0.0	1.0	0.0	1.0	0.0
6	1.0	0.0	1.0	0.0	1.0
7	0.0	1.0	0.0	1.0	0.0
8	1.0	0.0	1.0	0.0	1.0
9	0.0	1.0	0.0	1.0	0.0

In [157]:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data.iloc[:,-1] = le.fit_transform(data.iloc[:,-1])
data

Out[157]:

	0	1	2	3	4
0	0.0	1.0	0.0	1.0	0
1	1.0	0.0	1.0	0.0	1
2	1.0	0.0	1.0	0.0	1
3	1.0	0.0	1.0	0.0	1
4	1.0	0.0	1.0	0.0	1
5	0.0	1.0	0.0	1.0	0
6	1.0	0.0	1.0	0.0	1
7	0.0	1.0	0.0	1.0	0
8	1.0	0.0	1.0	0.0	1
9	0.0	1.0	0.0	1.0	0
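The encoded output above no longer shows which column is which, so here is a hedged pandas-only sketch of the same two steps that keeps the column names. It swaps in `pd.get_dummies` for `OneHotEncoder` and category codes for `LabelEncoder`; these are stand-ins, not the scikit-learn calls used above. The data is rows 0 to 3 of the table shown earlier (the rows without shifted or missing values), and `encoded` is an illustrative name.

```python
import pandas as pd

# Rows 0-3 of the Data1.csv table shown above (rows without shifted or missing values).
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44, 27, 30, 38],
    "Salary": [72000, 48000, 54000, 61000],
    "Purchased": ["No", "Yes", "No", "No"],
})

# One-hot encoding: one 0/1 indicator column per distinct country.
encoded = pd.get_dummies(df, columns=["Country"], dtype=int)

# Label encoding: categories are sorted alphabetically, so "No" -> 0 and "Yes" -> 1.
encoded["Purchased"] = encoded["Purchased"].astype("category").cat.codes

print(encoded)
```

The result keeps Age and Salary untouched and adds the columns Country_France, Country_Germany and Country_Spain.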

Write a program in Python to perform the following task: [15] Standardizing Data (transform them into a standard Gaussian distribution with a mean of 0 and a standard deviation of 1) (Use winequality-red.csv)

In [159]:

#standardize the values in each column
​
df_new = (data-data.mean())/data.std()
df_new

Out[159]:

	0	1	2	3	4
0	-1.161895	1.161895	-1.161895	1.161895	-1.161895
1	0.774597	-0.774597	0.774597	-0.774597	0.774597
2	0.774597	-0.774597	0.774597	-0.774597	0.774597
3	0.774597	-0.774597	0.774597	-0.774597	0.774597
4	0.774597	-0.774597	0.774597	-0.774597	0.774597
5	-1.161895	1.161895	-1.161895	1.161895	-1.161895
6	0.774597	-0.774597	0.774597	-0.774597	0.774597
7	-1.161895	1.161895	-1.161895	1.161895	-1.161895
8	0.774597	-0.774597	0.774597	-0.774597	0.774597
9	-1.161895	1.161895	-1.161895	1.161895	-1.161895

In [160]:

df_new.mean()
​

Out[160]:

0    4.440892e-17
1   -4.440892e-17
2    4.440892e-17
3   -4.440892e-17
4    2.220446e-17
dtype: float64

In [162]:

df_new.std()
​

Out[162]:

0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64

Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a pie chart.

In [167]:

import matplotlib.pyplot as plt
​
subnames = ["math","marathi","english","stat","python","java"]
marks = [78,56,45,67,97,75]
​
plt.pie(marks,labels = subnames)

Out[167]:

([<matplotlib.patches.Wedge at 0x1b41e568eb0>,
  <matplotlib.patches.Wedge at 0x1b41e574460>,
  <matplotlib.patches.Wedge at 0x1b41e574940>,
  <matplotlib.patches.Wedge at 0x1b41e574e20>,
  <matplotlib.patches.Wedge at 0x1b41e57f340>,
  <matplotlib.patches.Wedge at 0x1b41e57f820>],
 [Text(0.9163353394095168, 0.608547077677024, 'math'),
  Text(-0.024799968250606833, 1.099720401545215, 'marathi'),
  Text(-0.7748890343653689, 0.7807349002192141, 'english'),
  Text(-1.0984780301579335, -0.057844768651503314, 'stat'),
  Text(-0.30990577210017567, -1.055442282845914, 'python'),
  Text(0.9298224053418628, -0.5877331831063078, 'java')])

Write a program in Python to perform the following task (Use winequality-red.csv) [5] Import the dataset and do the following: a) Describe the dataset b) Show the shape of the dataset c) Display the first 3 rows of the dataset

In [171]:

wine = pd.read_csv("Data1.csv")
wine
​

Out[171]:

	Country	Age	Salary	Purchased
0	France	44	72000	No
1	Spain	27	48000	Yes
2	Germany	30	54000	No
3	Spain	38	61000	No
4	Germany	40	Yes	NaN
5	France	35	58000	Yes
6	Spain	52000	No	NaN
7	France	48	79000	Yes
8	Germany	50	83000	No
9	France	37	67000	Yes

In [173]:

wine.describe()
​

Out[173]:

	Age
count	10.000000
mean	5234.900000
std	16431.582824
min	27.000000
25%	35.500000
50%	39.000000
75%	47.000000
max	52000.000000

In [176]:

wine.shape
​

Out[176]:

(10, 4)

In [177]:

​
wine.head(3)

Out[177]:

	Country	Age	Salary	Purchased
0	France	44	72000	No
1	Spain	27	48000	Yes
2	Germany	30	54000	No

slip 10

Write a python program to Display column-wise mean, and median for SOCRHeightWeight dataset.

In [179]:

hw = pd.read_csv("height_weight.csv")
hw
​

Out[179]:

	height	weight
0	7	56
1	6	45
2	5	45
3	5	46
4	4	75
5	6	67
6	6	36
7	4	35
8	8	75
9	6	56
10	5	47
11	4	88
12	8	90
13	5	56
14	3	45
15	5	46
16	4	75
17	6	67
18	6	36
19	4	35
20	8	75
21	6	56

In [236]:

hw.mean()

Out[236]:

height     5.500000
weight    56.909091
dtype: float64

In [237]:

hw.median()
​

Out[237]:

height     5.5
weight    56.0
dtype: float64

In [239]:

hw["height"].mean()
​

Out[239]:

5.5

Write a python program to compute sum of Manhattan distance between all pairs of points.

In [183]:

# Using scipy to Calculate the Manhattan Distance
from scipy.spatial.distance import cityblock
x1 = [1,2,3,4,5,6]
x2 = [10,20,30,1,2,3]
print(cityblock(x1, x2))
​
# Returns: 63

In [240]:

print(cityblock(hw["height"],hw["weight"]))
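The cells above compute the Manhattan distance between two vectors. If the task is read as "treat each row as a point and add up the distance over every pair of points", a minimal sketch in plain Python could look like the following. The helper names and the sample points are hypothetical and not taken from the dataset.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def sum_pairwise_manhattan(points):
    """Add up the Manhattan distance over every unordered pair of points."""
    return sum(manhattan(p, q) for p, q in combinations(points, 2))


# Hypothetical sample points, not taken from the dataset.
points = [(1, 1), (2, 3), (4, 0)]
print(sum_pairwise_manhattan(points))  # 3 + 4 + 5 = 12
```

For the heights and weights data, the points could be built with `list(zip(hw["height"], hw["weight"]))`.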
​

slip 12

Write a Python program to create a data frame containing the columns name, salary and department. Add 10 rows with some missing and duplicate values to the data frame. Also drop all null and empty values. Print the modified data frame.

In [229]:

import pandas as pd
import numpy as np
​
arr = np.array([['swapnil',50399,'airforce'],
                ['nihan',50399,'airforce'],
                ['vijay',24499,'math'],
                ['jay','cs',"NaN"],
                ['ajay',None,'airforce'],
                ['vinu',50399,None],
                ['bharat',50399,'airforce'], 
                ['suahas','airforce',None],
                ['nihan',50399,'airforce'],
                ['kemal',50399,'airforce']
               ])
​
​
df = pd.DataFrame(arr, columns =["name","salary","dapartment"])
df
​

Out[229]:

	name	salary	dapartment
0	swapnil	50399	airforce
1	nihan	50399	airforce
2	vijay	24499	math
3	jay	cs	NaN
4	ajay	None	airforce
5	vinu	50399	None
6	bharat	50399	airforce
7	suahas	airforce	None
8	nihan	50399	airforce
9	kemal	50399	airforce

In [241]:

df.isnull()
​

Out[241]:

	name	salary	dapartment
0	False	False	False
1	False	False	False
2	False	False	False
3	False	False	False
4	False	True	False
5	False	False	True
6	False	False	False
7	False	False	True
8	False	False	False
9	False	False	False

In [242]:

df.duplicated()
​

Out[242]:

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8     True
9    False
dtype: bool

In [243]:

df.drop_duplicates()

Out[243]:

	name	salary	dapartment
0	swapnil	50399	airforce
1	nihan	50399	airforce
2	vijay	24499	math
3	jay	cs	NaN
4	ajay	None	airforce
5	vinu	50399	None
6	bharat	50399	airforce
7	suahas	airforce	None
9	kemal	50399	airforce

In [244]:

df
​

Out[244]:

	name	salary	dapartment
0	swapnil	50399	airforce
1	nihan	50399	airforce
2	vijay	24499	math
3	jay	cs	NaN
4	ajay	None	airforce
5	vinu	50399	None
6	bharat	50399	airforce
7	suahas	airforce	None
8	nihan	50399	airforce
9	kemal	50399	airforce

In [245]:

# df.dropna(inplace=True) would modify df itself; plain dropna() returns a new DataFrame and leaves df unchanged

df.dropna()

Out[245]:

	name	salary	dapartment
0	swapnil	50399	airforce
1	nihan	50399	airforce
2	vijay	24499	math
3	jay	cs	NaN
6	bharat	50399	airforce
8	nihan	50399	airforce
9	kemal	50399	airforce

In [246]:

df
​

Out[246]:

	name	salary	dapartment
0	swapnil	50399	airforce
1	nihan	50399	airforce
2	vijay	24499	math
3	jay	cs	NaN
4	ajay	None	airforce
5	vinu	50399	None
6	bharat	50399	airforce
7	suahas	airforce	None
8	nihan	50399	airforce
9	kemal	50399	airforce
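In the `dropna()` output above, row 3 survives because its "NaN" is a text string, not a real missing value, and the duplicate row 8 is still present. A hedged sketch of one way to finish the task: convert such placeholder strings to real missing values, then drop nulls and duplicates. Treating "NaN" and the empty string as missing is an assumption about what the task means by "empty values"; `rows` and `cleaned` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same rows as the array above, built from a plain list of lists.
rows = [
    ["swapnil", 50399, "airforce"],
    ["nihan", 50399, "airforce"],
    ["vijay", 24499, "math"],
    ["jay", "cs", "NaN"],
    ["ajay", None, "airforce"],
    ["vinu", 50399, None],
    ["bharat", 50399, "airforce"],
    ["suahas", "airforce", None],
    ["nihan", 50399, "airforce"],
    ["kemal", 50399, "airforce"],
]
df = pd.DataFrame(rows, columns=["name", "salary", "dapartment"])

# Assumption: the literal text "NaN" and empty strings count as missing values.
cleaned = df.replace(["NaN", ""], np.nan)

# Drop rows with any missing value, then drop repeated rows.
cleaned = cleaned.dropna().drop_duplicates()
print(cleaned)
```

Only rows 0, 1, 2, 6 and 9 remain. Note that this does not catch values placed in the wrong column, such as the salary "cs" on a row that is dropped here for another reason.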

slip 13

Write a Python program to create a graph to find the relationship between the petal length and petal width. (Use iris.csv dataset)

In [250]:

iris
​

Out[250]:

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa
...	...	...	...	...	...	...
145	146	6.7	3.0	5.2	2.3	Iris-virginica
146	147	6.3	2.5	5.0	1.9	Iris-virginica
147	148	6.5	3.0	5.2	2.0	Iris-virginica
148	149	6.2	3.4	5.4	2.3	Iris-virginica
149	150	5.9	3.0	5.1	1.8	Iris-virginica

150 rows × 6 columns

In [252]:

import pandas as pd
import matplotlib.pyplot as plt

iris = pd.read_csv("Iris.csv")

# Draw the first species, then pass its Axes (ax=fig) so all three species share one plot
fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor',ax=fig)
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica', ax=fig)

fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title(" Petal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(12,8)
plt.show()

Out[252]:

<AxesSubplot:xlabel='PetalLengthCm', ylabel='PetalWidthCm'>

In [253]:

# Alternative: without ax=fig, each species is drawn in its own separate figure
​
import pandas as pd
import matplotlib.pyplot as plt
​
iris = pd.read_csv("Iris.csv")
​
fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor')
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica')
​
​
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title(" Petal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(12,8)
plt.show()
​

Write a Python program to find the maximum and minimum value of a given flattened array.

In [258]:

​
arr1 = np.array([[1,2,4],[7,5,9]])
arr1.max()

Out[258]:

9

In [260]:

​
arr1 = np.array([[1,2,4],[7,5,9]])
arr1.min()

Out[260]:

1
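The cells above call `max()` and `min()` on the 2-D array directly, which already reduces over all elements. Since the task mentions a flattened array, here is a small sketch that makes the flattening step explicit; `flat` is an illustrative name.

```python
import numpy as np

arr1 = np.array([[1, 2, 4], [7, 5, 9]])

# flatten() returns a 1-D copy of the 2-D array.
flat = arr1.flatten()

print(flat)        # [1 2 4 7 5 9]
print(flat.max())  # 9
print(flat.min())  # 1
```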

slip 16

Write a python program to create a data frame for students’ information such as name, graduation percentage and age. Display average age of students, average of graduation percentage.

In [1]:

import pandas as pd
​
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
​
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df

Out[1]:

	name	percentage	age
0	swapnil	88	22
1	om	88	12
2	jay	98	32
3	sai	45	25
4	didi	83	22
5	swapnil	88	22

In [8]:

avg = df["age"].mean()
avg

Out[8]:

22.5

In [10]:

per = df["percentage"].mean()
per

Out[10]:

81.66666666666667

Write a python program to create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a pie chart and bar chart.

In [32]:

import matplotlib.pyplot as plt
​
​
#sname = [["math"],["bio"],["sci"],["ds"],["hist"],["eng"],["stat"]]
sname = ["math","bio","sci","ds","hist","eng","stat"]
marks = [78,76,58,98,45,66,90]
​
plt.pie(marks , labels= sname,autopct = '%1.0f%%')
plt.title("pie chart")
plt.show()
​

In [31]:

​
sname = ["math","bio","sci","ds","hist","eng","stat"]
marks = [78,76,58,98,45,66,90]
​
plt.bar(sname,marks)
plt.title("bar plot")
plt.show()

slip 17

Write a Python program to draw scatter plots to compare two features of the iris dataset

In [36]:

import pandas as pd
iris = pd.read_csv("Iris.csv")
iris

Out[36]:

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa
...	...	...	...	...	...	...
145	146	6.7	3.0	5.2	2.3	Iris-virginica
146	147	6.3	2.5	5.0	1.9	Iris-virginica
147	148	6.5	3.0	5.2	2.0	Iris-virginica
148	149	6.2	3.4	5.4	2.3	Iris-virginica
149	150	5.9	3.0	5.1	1.8	Iris-virginica

150 rows × 6 columns

In [54]:

plt.scatter(iris['SepalLengthCm'],iris['SepalWidthCm'],color ="red")
plt.title("scatter plot")
plt.xlabel("SepalLengthCm")
plt.ylabel("SepalWidthCm")
plt.show()

slip 18

Write a Python program to create box plots to see how each feature i.e. Sepal Length, Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv dataset)

In [59]:

​
plt.boxplot(iris["SepalLengthCm"])
plt.show()

In [61]:

plt.boxplot(iris["SepalWidthCm"])
plt.show()

In [62]:

plt.boxplot(iris["PetalLengthCm"])
plt.show()

In [63]:

plt.boxplot(iris["PetalWidthCm"])
plt.show()

Use the heights and weights dataset and load the dataset from a given csv file into a dataframe. Print the first 5 rows, the last 5 rows and 10 random rows.

In [65]:

hw = pd.read_csv("height_weight.csv")
hw.head(5)

Out[65]:

	height	weight
0	7	56
1	6	45
2	5	45
3	5	46
4	4	75

In [66]:

hw.tail(5)
​

Out[66]:

	height	weight
17	6	67
18	6	36
19	4	35
20	8	75
21	6	56

In [67]:

hw.sample(10)
​

Out[67]:

	height	weight
12	8	90
11	4	88
5	6	67
13	5	56
20	8	75
17	6	67
16	4	75
0	7	56
21	6	56
3	5	46

slip 19

Write a Python program [15]

To create a dataframe containing columns name, age and percentage. Add 10 rows to the dataframe. View the dataframe.
To print the shape, number of rows-columns, data types, feature names and the description of the data
To add 5 rows with duplicate values and missing values. Add a column ‘remarks’ with empty values. Display the data.

In [69]:

import pandas as pd
​
data = [['swapnil',88,22],['om',88,12],['jay',98,32],['sai',45,25],['didi',83,22],['swapnil',88,22]]
​
df = pd.DataFrame(data, columns = ["name","percentage","age"])
df

Out[69]:

	name	percentage	age
0	swapnil	88	22
1	om	88	12
2	jay	98	32
3	sai	45	25
4	didi	83	22
5	swapnil	88	22

In [71]:

df.shape
​

Out[71]:

(6, 3)

In [73]:

df.info()
​

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   name        6 non-null      object
 1   percentage  6 non-null      int64 
 2   age         6 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 272.0+ bytes

In [74]:

df.dtypes
​

Out[74]:

name          object
percentage     int64
age            int64
dtype: object

In [76]:

df.describe()

Out[76]:

	percentage	age
count	6.000000	6.000000
mean	81.666667	22.500000
std	18.618987	6.442049
min	45.000000	12.000000
25%	84.250000	22.000000
50%	88.000000	22.000000
75%	88.000000	24.250000
max	98.000000	32.000000

In [77]:

df.loc[6]=["om",88,12]
df

Out[77]:

	name	percentage	age
0	swapnil	88	22
1	om	88	12
2	jay	98	32
3	sai	45	25
4	didi	83	22
5	swapnil	88	22
6	om	88	12

In [79]:

df.loc[7] = [None,None,49]
df.loc[8] = ["rohit",None,49]
df.loc[9] = ["didi",83,22]
df.loc[10] = [None,None,None]
df

Out[79]:

	name	percentage	age
0	swapnil	88.0	22.0
1	om	88.0	12.0
2	jay	98.0	32.0
3	sai	45.0	25.0
4	didi	83.0	22.0
5	swapnil	88.0	22.0
6	om	88.0	12.0
7	None	NaN	49.0
8	rohit	NaN	49.0
9	didi	83.0	22.0
10	None	NaN	NaN

In [81]:

df["remark"] =None
df

Out[81]:

	name	percentage	age	remark
0	swapnil	88.0	22.0	None
1	om	88.0	12.0	None
2	jay	98.0	32.0	None
3	sai	45.0	25.0	None
4	didi	83.0	22.0	None
5	swapnil	88.0	22.0	None
6	om	88.0	12.0	None
7	None	NaN	49.0	None
8	rohit	NaN	49.0	None
9	didi	83.0	22.0	None
10	None	NaN	NaN	None

slip 20

Add two outliers to the above data and display the box plot.

In [91]:

import numpy as np
​
arr = np.array([1,2,3,4,5,6,100,150])
​
plt.boxplot(arr)
plt.show()

slip 21

Import dataset “iris.csv”. Write a Python program to create a Bar plot to get the frequency of the three species of the Iris data.

In [93]:

import pandas as pd
import matplotlib.pyplot as plt

iris = pd.read_csv("Iris.csv")

# One bar per species, with height equal to its row count
iris['Species'].value_counts().plot.bar()

plt.title("Iris Species Frequency")
plt.show()

Write a Python program to create a histogram of the three species of the Iris data.
slip 24 Q2

In [106]:

import pandas as pd
import matplotlib.pyplot as plt

iris = pd.read_csv("Iris.csv")

#iris['Species'].value_counts().plot.hist(bins=[10,20,30,40,50])
#plt.hist(iris['SepalLengthCm'],bins=20)
plt.hist(iris['Species'],bins=20)

plt.title("Iris Species %")
plt.show()

In [96]:

iris

Out[96]:

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa
...	...	...	...	...	...	...
145	146	6.7	3.0	5.2	2.3	Iris-virginica
146	147	6.3	2.5	5.0	1.9	Iris-virginica
147	148	6.5	3.0	5.2	2.0	Iris-virginica
148	149	6.2	3.4	5.4	2.3	Iris-virginica
149	150	5.9	3.0	5.1	1.8	Iris-virginica

150 rows × 6 columns

In [112]:

#Import dataset “iris.csv”. Write a Python program to create a Bar plot to get the
#frequency of the three species of the Iris data.

# Count the rows per species first, then draw one bar per species
counts = iris['Species'].value_counts()
plt.bar(counts.index, counts.values)

Out[112]:

<BarContainer object of 3 artists>

slip 30, 26, 25, 20, 15, 12, 9, 4

Generate a random array of 50 integers and display them using a line chart, scatter plot, histogram and box plot. Apply the appropriate color, labels and styling options.

In [5]:

from numpy import random
arr = random.randint(1,100,50)
print(arr)

In [25]:

import matplotlib.pyplot as plt
plt.plot(arr, color='red')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("line chart")
plt.show()

In [12]:

arr2 = random.randint(1,100,50)
arr2

Out[12]:

array([ 2, 30, 30,  4, 72, 57, 35, 32, 55, 86, 54, 21, 83, 72, 51, 21, 49,
       99,  8, 87, 68, 16, 40, 74, 57, 10, 68, 78, 49, 43,  3, 73, 18, 44,
        6, 51, 81, 86, 16, 41, 86, 54, 99, 10, 65, 43, 18, 93, 74, 30])

In [26]:

import matplotlib.pyplot as plt
plt.scatter(arr,arr2,color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("scatter plot")
plt.show()

In [27]:

import matplotlib.pyplot as plt
plt.hist(arr,bins=[0,25,50,75,100],color="red")    # bins lists the edges of the bars: 0-25, 25-50, 50-75, 75-100
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("histogram ")
plt.show()

import matplotlib.pyplot as plt
plt.boxplot(arr)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("box plot")
plt.show()

Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a bar chart.

In [31]:

​
import matplotlib.pyplot as plt

Subject = ['English', 'Maths', 'Science','history','data science']
marks = [ 90,80,70,97,40]

plt.bar(Subject,marks,color="red")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("bar chart")
plt.show()

Slip 28

In [11]:

# Q.1 Write a Python Program to create a dataframe containing columns name, height, and weight.
#       Add 10 rows to the dataframe. view the dataframe
​
import pandas as pd
df = pd.DataFrame(columns = ['Name','Height','Weight'])
df.loc[0] = ['Nil' , 7 , 58]
df.loc[1] = [ None , 6 , 49]
df.loc[2] = ['Emma' , 6 , 45]
df.loc[3] = ['Swapnil' , 5 , 56]
df.loc[4] = ['Swamiraj' , None , 56]
df.loc[5] = ['Vaishu' , 4 , 49]
df.loc[6] = ['Snehal' , 5 , 58]
df.loc[7] = ['Vaishu' , 4 , 49]
df.loc[8] = ['Navin' , 5 , None]
df.loc[9] = ['Shreya' , 6 , 56]
df

Out[11]:

	Name	Height	Weight
0	Nil	7	58
1	NaN	6.0	49.0
2	Emma	6	45
3	Swapnil	5	56
4	Swamiraj	None	56
5	Vaishu	4	49
6	Snehal	5	58
7	Vaishu	4	49
8	Navin	5	None
9	Shreya	6	56

In [12]:

# Q.2 write a python program to find shape, size, datatypes of the dataframe object.
df.shape

Out[12]:

(10, 3)

In [13]:

df.size

Out[13]:

30

In [14]:

df.dtypes

Out[14]:

Name      object
Height    object
Weight    object
dtype: object

In [15]:

# Q.3 write a python program to view basic statistical details of the data
df.describe()

Out[15]:

	Name	Height	Weight
count	9	9.0	9.0
unique	8	4.0	4.0
top	Vaishu	6.0	49.0
freq	2	3.0	3.0
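
All three columns are stored as object, which is why describe() above reports count, unique, top and freq instead of the mean and standard deviation. A minimal sketch, assuming numeric statistics are wanted: convert the measurement columns with pd.to_numeric before calling describe(). The small frame here is a stand-in that imitates the object columns above, not the notebook's own cell.

```python
import pandas as pd

# Stand-in frame with object columns, imitating the dataframe built above.
sample = pd.DataFrame(
    {
        'Name': ['Nil', None, 'Swamiraj'],
        'Height': [7, 6, None],
        'Weight': [58, 49, 56],
    },
    dtype=object,
)

# Convert the measurement columns to numeric dtypes; None becomes NaN.
numeric = sample[['Height', 'Weight']].apply(pd.to_numeric, errors='coerce')

print(numeric.dtypes)

# With numeric dtypes, describe() reports mean, std, min, quartiles and max.
stats = numeric.describe()
print(stats)
```

The same conversion could be assigned back to the original columns if the numeric dtypes should persist.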

In [16]:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    9 non-null      object
 1   Height  9 non-null      object
 2   Weight  9 non-null      object
dtypes: object(3)
memory usage: 320.0+ bytes

In [17]:

# Q.4 Write a Python program to get the number of observations, missing values, and NaN values.

df.isnull()

Out[17]:

	Name	Height	Weight
0	False	False	False
1	True	False	False
2	False	False	False
3	False	False	False
4	False	True	False
5	False	False	False
6	False	False	False
7	False	False	False
8	False	False	True
9	False	False	False
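
The boolean table above marks where values are missing, but the question asks for counts. A minimal sketch of one way to get them, rebuilding the same ten rows so that the block runs on its own. The variable names are illustrative, not from the notebook.

```python
import pandas as pd

# The same ten rows as the dataframe above, rebuilt so this block is self-contained.
data = pd.DataFrame({
    'Name': ['Nil', None, 'Emma', 'Swapnil', 'Swamiraj',
             'Vaishu', 'Snehal', 'Vaishu', 'Navin', 'Shreya'],
    'Height': [7, 6, 6, 5, None, 4, 5, 4, 5, 6],
    'Weight': [58, 49, 45, 56, 56, 49, 58, 49, None, 56],
})

# Number of observations (rows).
n_observations = len(data)

# Missing values per column; isnull() treats both None and NaN as missing.
missing_per_column = data.isnull().sum()

# Total number of missing cells, and number of rows with at least one missing cell.
total_missing = int(missing_per_column.sum())
rows_with_missing = int(data.isnull().any(axis=1).sum())

print("Observations:", n_observations)
print(missing_per_column)
print("Total missing:", total_missing)
print("Rows with a missing value:", rows_with_missing)
```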

In [18]:

df.dropna()

Out[18]:

	Name	Height	Weight
0	Nil	7	58
2	Emma	6	45
3	Swapnil	5	56
5	Vaishu	4	49
6	Snehal	5	58
7	Vaishu	4	49
9	Shreya	6	56

In [20]:

# Q.5 Write a Python program to add a column "BMI" to the dataframe, calculated as weight/height^2.

df['BMI']=df.Weight/(df.Height**2)

In [21]:

df

Out[21]:

	Name	Height	Weight	BMI
0	Nil	7	58	1.183673
1	NaN	6.0	49.0	1.361111
2	Emma	6	45	1.25
3	Swapnil	5	56	2.24
4	Swamiraj	None	56	NaN
5	Vaishu	4	49	3.0625
6	Snehal	5	58	2.32
7	Vaishu	4	49	3.0625
8	Navin	5	None	NaN
9	Shreya	6	56	1.555556

In [25]:

# Q.6 Write a Python program to find the maximum and minimum BMI.

df.BMI.max()

Out[25]:

3.0625

In [26]:

df.BMI.min()

Out[26]:

1.183673469387755
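
max() and min() return only the values. A minimal sketch, assuming the matching rows are also of interest: idxmax() and idxmin() give the index labels of the extreme values, which loc can then look up. The frame is rebuilt here with numeric columns so the block runs on its own.

```python
import pandas as pd

# The same ten rows as above, rebuilt with numeric Height and Weight columns.
people = pd.DataFrame({
    'Name': ['Nil', None, 'Emma', 'Swapnil', 'Swamiraj',
             'Vaishu', 'Snehal', 'Vaishu', 'Navin', 'Shreya'],
    'Height': [7, 6, 6, 5, None, 4, 5, 4, 5, 6],
    'Weight': [58, 49, 45, 56, 56, 49, 58, 49, None, 56],
})

# BMI as defined in Q.5: weight / height^2 (NaN wherever an input is missing).
people['BMI'] = people['Weight'] / people['Height'] ** 2

# idxmax/idxmin skip NaN and return the label of the first extreme value.
max_row = people.loc[people['BMI'].idxmax()]
min_row = people.loc[people['BMI'].idxmin()]

print("Maximum BMI:", max_row['Name'], max_row['BMI'])
print("Minimum BMI:", min_row['Name'], min_row['BMI'])
```

When several rows share the extreme value, as rows 5 and 7 do here, only the first one is returned.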

In [27]:

# Q.7 Write a Python program to generate a scatter plot of height and weight.

import matplotlib.pyplot as plt
df.plot.scatter(x='Height', y='Weight')

Out[27]:

<AxesSubplot:xlabel='Height', ylabel='Weight'>
