Practical slips programs : Machine Learning
Savitribai Phule Pune University
M.Sc.-II (Comp. Sci.) Sem-III Practical Examination -2024-25
Practical Paper (CS-605-MJP) Lab course on CS-602-MJ Machine Learning
Slip 1 :
Q.1. Use the Apriori algorithm on the groceries dataset to find which items are bought together. Use minimum support = 0.25.
Steps to Follow
Install Required Libraries:
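pip install mlxtend pandas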
Prepare the Dataset: The groceries dataset should be in transactional format, where each row represents a transaction and each column represents an item. A value of 1 indicates that the item was purchased in that transaction, and 0 indicates it was not. (A small example dataset in this format appears in the code below.)
Python Code:
Here’s the code to apply the Apriori algorithm on the dataset with a minimum support of 0.25:
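Below is a minimal sketch. A tiny inline one-hot dataset is used for illustration; in practice, replace it by loading your own groceries.csv as described above.
# Import necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Example grocery data in one-hot transactional format (illustrative only;
# replace with: data = pd.read_csv("groceries.csv"))
data = pd.DataFrame({
    'Milk':   [1, 0, 1, 1, 0, 1],
    'Bread':  [1, 1, 1, 0, 1, 1],
    'Butter': [0, 1, 1, 1, 0, 1],
    'Eggs':   [1, 1, 0, 1, 1, 0]
}).astype(bool)  # mlxtend works best with boolean columns
# Generate frequent itemsets with minimum support of 0.25
frequent_itemsets = apriori(data, min_support=0.25, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)
# Generate association rules from the frequent itemsets
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])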
Explanation of Code:
- Data Loading: This code assumes you have a groceries dataset in the correct format. If using a CSV file, load it with pd.read_csv("groceries.csv").
- Apriori Algorithm: The apriori() function from mlxtend generates frequent itemsets that meet the specified minimum support of 0.25. Setting use_colnames=True shows item names in the resulting frequent itemsets instead of column indices.
- Association Rules: The association_rules() function generates rules from the frequent itemsets. metric="lift" and min_threshold=1 are set to keep meaningful association rules where items are likely bought together.
Sample Output:
The output should display the frequent itemsets with a support of at least 0.25 and association rules that show which items are often bought together, including metrics like confidence and lift.
Slip 2 :
Q.1. Write a python program to implement simple Linear Regression for predicting house price. First find all null values in a given dataset and remove them
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Load the dataset
# Replace 'house_prices.csv' with your actual dataset file
data = pd.read_csv('house_prices.csv')
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(data.head())
# Step 1: Find and remove null values
print("\nChecking for null values:")
print(data.isnull().sum()) # Check for null values in each column
# Drop rows with any null values
data = data.dropna()
print("\nData after removing null values:")
print(data.isnull().sum())
# Step 2: Select feature and target variable
# Assuming the dataset has a 'SquareFootage' column as the feature and 'Price' as the target variable
X = data[['SquareFootage']] # Input feature (independent variable)
y = data['Price'] # Target variable (dependent variable)
# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 5: Make predictions on the test set
y_pred = model.predict(X_test)
# Step 6: Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"\nMean Squared Error: {mse}")
# Display the slope (coefficient) and intercept of the regression line
print(f"Slope (Coefficient): {model.coef_[0]}")
print(f"Intercept: {model.intercept_}")
# Step 7: Plot the data and the regression line
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, model.predict(X), color='red', linewidth=2, label='Regression Line')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.title('House Price Prediction using Linear Regression')
plt.legend()
plt.show()
Slip 3 :
Q.1. Write a python program to implement multiple Linear Regression for a house price dataset. Divide the dataset into training and testing data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the dataset
# Replace 'house_prices.csv' with the actual path to your dataset
data = pd.read_csv('house_prices.csv')
# Display the first few rows of the dataset to understand its structure
print("First few rows of the dataset:")
print(data.head())
# Step 2: Data Preprocessing
# Check for any null values and handle them
print("\nChecking for null values:")
print(data.isnull().sum())
# Drop rows with any missing values
data = data.dropna()
# Select features and target variable
# Assume the dataset contains columns like 'SquareFootage', 'Bedrooms', 'Bathrooms', and 'Price'
# Adjust these column names based on the actual dataset
features = ['SquareFootage', 'Bedrooms', 'Bathrooms'] # Independent variables
X = data[features]
y = data['Price'] # Dependent variable
# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Create and train the Multiple Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 5: Make predictions on the test set
y_pred = model.predict(X_test)
# Step 6: Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error: {mse}")
print(f"R-squared: {r2}")
# Display the model's coefficients and intercept
print("\nModel Coefficients:")
for feature, coef in zip(features, model.coef_):
    print(f"{feature}: {coef}")
print(f"Intercept: {model.intercept_}")
# Step 7: Test a sample prediction (optional)
sample_input = [[2000, 3, 2]] # Example: 2000 sqft, 3 bedrooms, 2 bathrooms
predicted_price = model.predict(sample_input)
print(f"\nPredicted Price for {sample_input[0]}: {predicted_price[0]}")
Slip 4 :
Q.1. Write a python program to implement k-means algorithm on a mall_customers dataset.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Step 1: Load the dataset
# Replace 'mall_customers.csv' with the actual path to your dataset
data = pd.read_csv('mall_customers.csv')
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(data.head())
# Step 2: Preprocess the data
# We'll select two features (e.g., 'Annual Income' and 'Spending Score') for clustering
X = data[['Annual Income (k$)', 'Spending Score (1-100)']]
# Standardize the data for better clustering performance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Use the Elbow Method to find the optimal number of clusters
inertia = []
K_range = range(1, 11)
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)
# Plot the Elbow curve
plt.figure(figsize=(8, 4))
plt.plot(K_range, inertia, marker='o')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal K')
plt.show()
# Based on the Elbow plot, choose the optimal number of clusters
optimal_k = 5 # Adjust this based on the plot
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
kmeans.fit(X_scaled)
# Step 4: Assign the clusters to the original data
data['Cluster'] = kmeans.labels_
# Display the first few rows of the dataset with cluster assignments
print("\nDataset with Cluster Assignments:")
print(data.head())
# Step 5: Visualize the clusters
plt.figure(figsize=(10, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=kmeans.labels_, cmap='viridis', marker='o', edgecolor='k')
plt.xlabel('Annual Income (scaled)')
plt.ylabel('Spending Score (scaled)')
plt.title('K-means Clustering of Mall Customers')
plt.colorbar(label='Cluster')
plt.show()
Q.2. Write a python program to Implement Simple Linear Regression for predicting house price.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the dataset
# Replace 'house_prices.csv' with the actual path to your dataset
# Assume the dataset has columns 'SquareFootage' and 'Price'
data = pd.read_csv('house_prices.csv')
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(data.head())
# Step 2: Preprocess the data
# Check for null values and remove them if any
print("\nChecking for null values:")
print(data.isnull().sum())
data = data.dropna()
# Step 3: Define the feature (e.g., SquareFootage) and target (Price) variables
X = data[['SquareFootage']] # Feature (independent variable)
y = data['Price'] # Target (dependent variable)
# Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 5: Create and train the Simple Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 6: Make predictions on the test set
y_pred = model.predict(X_test)
# Step 7: Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R2) Score: {r2:.2f}")
# Display model coefficients
print("\nModel Coefficients:")
print(f"Slope (Coefficient for SquareFootage): {model.coef_[0]:.2f}")
print(f"Intercept: {model.intercept_:.2f}")
# Step 8: Visualize the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Actual Prices')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Regression Line')
plt.xlabel('Square Footage')
plt.ylabel('House Price')
plt.title('Simple Linear Regression for House Price Prediction')
plt.legend()
plt.show()
Slip 5 :
Q.1. Write a python program to implement Multiple Linear Regression for Fuel Consumption dataset.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Step 1: Load the dataset
# Replace 'fuel_consumption.csv' with the actual path to your dataset
# Assume the dataset contains columns like 'Engine Size', 'Cylinders', 'Fuel Consumption', and 'CO2 Emissions'
data = pd.read_csv('fuel_consumption.csv')
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(data.head())
# Step 2: Preprocess the data
# Checking for null values and removing them if any
print("\nChecking for null values:")
print(data.isnull().sum())
data = data.dropna()
# Step 3: Define the features and target variable
# Selecting multiple features for multiple linear regression
X = data[['Engine Size', 'Cylinders', 'Fuel Consumption']]
y = data['CO2 Emissions']
# Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 5: Train the Multiple Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 6: Make predictions on the test set
y_pred = model.predict(X_test)
# Step 7: Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R2) Score: {r2:.2f}")
# Display model coefficients
print("\nModel Coefficients:")
for feature, coef in zip(X.columns, model.coef_):
    print(f"{feature}: {coef:.2f}")
print(f"Intercept: {model.intercept_:.2f}")
# Step 8: Plotting the actual vs predicted CO2 Emissions
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linewidth=2)
plt.xlabel('Actual CO2 Emissions')
plt.ylabel('Predicted CO2 Emissions')
plt.title('Actual vs Predicted CO2 Emissions')
plt.show()
Slip 6 :
Q.1. Write a python program to implement Polynomial Linear Regression for Boston Housing Dataset.
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Step 1: Load the Boston Housing dataset
boston = load_boston()  # Note: load_boston was removed in scikit-learn 1.2; this requires an older scikit-learn version
data = pd.DataFrame(data=boston.data, columns=boston.feature_names)
data['PRICE'] = boston.target # Add the target variable (House prices)
# Display the first few rows of the dataset
print("First few rows of the Boston Housing dataset:")
print(data.head())
# Step 2: Define features (X) and target (y)
X = data.drop('PRICE', axis=1) # All features except the target 'PRICE'
y = data['PRICE'] # Target variable (house price)
# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Apply Polynomial features
# We can experiment with the degree of the polynomial (e.g., degree=2)
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
# Step 5: Train a Polynomial Linear Regression model
model = LinearRegression()
model.fit(X_train_poly, y_train)
# Step 6: Make predictions
y_pred = model.predict(X_test_poly)
# Step 7: Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R²) Score: {r2:.2f}")
# Step 8: Visualize the results (optional, for better understanding of relationships)
# Here, we will plot a comparison of actual vs predicted values for the first feature
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linewidth=2) # y=x line for reference
plt.xlabel('Actual House Prices')
plt.ylabel('Predicted House Prices')
plt.title('Actual vs Predicted House Prices (Polynomial Regression)')
plt.show()
Slip 7 :
Q.1. Fit the simple linear regression model to Salary_positions.csv data. Predict the salary of level 11 and level 12 employees.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Step 1: Load the dataset
# Assuming Salary_positions.csv has columns 'Level' and 'Salary'
data = pd.read_csv('Salary_positions.csv')
# Step 2: Preprocess the data
# Inspect the data (optional)
print(data.head())
# Extracting the relevant columns
X = data[['Level']] # Independent variable (employee level)
y = data['Salary'] # Dependent variable (salary)
# Step 3: Fit the Simple Linear Regression Model
model = LinearRegression()
model.fit(X, y)
# Step 4: Predict salary for level 11 and level 12 employees
levels = np.array([11, 12]).reshape(-1, 1) # Reshape to match the model's input format
predictions = model.predict(levels)
# Output the predictions
print(f"Predicted salary for level 11 employee: ${predictions[0]:,.2f}")
print(f"Predicted salary for level 12 employee: ${predictions[1]:,.2f}")
# Step 5: Plot the data and the regression line
plt.scatter(X, y, color='blue') # Plot the actual data points
plt.plot(X, model.predict(X), color='red') # Plot the regression line
plt.title('Salary vs Level')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.show()
Slip 8 :
Q.1. Write a python program to categorize the given news text into one of the available 20 categories of news groups, using multinomial Naïve Bayes machine learning model.
# Import necessary libraries
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# Step 1: Load the 20 Newsgroups dataset
newsgroups = fetch_20newsgroups(subset='all') # 'all' loads all the data
X = newsgroups.data # Text data
y = newsgroups.target # Target labels (categories)
# Step 2: Preprocess the data using TF-IDF Vectorization
# TF-IDF Vectorizer converts text into numerical representation
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_tfidf = vectorizer.fit_transform(X)
# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.3, random_state=42)
# Step 4: Train a Multinomial Naive Bayes model
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)
# Step 5: Evaluate the model
y_pred = nb_classifier.predict(X_test)
# Print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of the Multinomial Naive Bayes model: {accuracy * 100:.2f}%')
# Print the classification report
print('\nClassification Report:')
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))
# Step 6: Categorize a new sample news text
sample_news = [
"NASA's Perseverance rover on Mars has successfully collected its first sample of Martian rock."
]
# Transform the new sample using the same vectorizer
sample_tfidf = vectorizer.transform(sample_news)
# Predict the category of the new sample
predicted_category = nb_classifier.predict(sample_tfidf)
print(f'\nPredicted Category for the sample news: {newsgroups.target_names[predicted_category[0]]}')
Q.2. Write a python program to implement a Decision Tree classifier to decide whether or not to play Tennis.
# Import necessary libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder
# Step 1: Prepare the dataset
data = {
'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Mild', 'Cool', 'Cool', 'Mild', 'Mild', 'Mild', 'Mild', 'Mild', 'Mild', 'Hot'],
'Humidity': ['High', 'High', 'High', 'High', 'Low', 'Low', 'Low', 'High', 'Low', 'Low', 'High', 'Low', 'Low', 'High'],
'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Weak', 'Strong', 'Weak', 'Weak', 'Strong', 'Weak', 'Strong', 'Strong', 'Weak'],
'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
# Convert the data into a DataFrame
df = pd.DataFrame(data)
# Step 2: Encode categorical variables into numeric values
label_encoders = {}
for column in ['Outlook', 'Temperature', 'Humidity', 'Wind', 'PlayTennis']:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le
# Step 3: Split the data into features and target
X = df.drop('PlayTennis', axis=1) # Features
y = df['PlayTennis'] # Target variable
# Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 5: Train a Decision Tree Classifier
dt_classifier = DecisionTreeClassifier(criterion='entropy', random_state=42)
dt_classifier.fit(X_train, y_train)
# Step 6: Make predictions
y_pred = dt_classifier.predict(X_test)
# Step 7: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of the Decision Tree model: {accuracy * 100:.2f}%')
# Print the classification report
print('\nClassification Report:')
print(classification_report(y_test, y_pred))
# Step 8: Make a prediction for new data (e.g., sunny, mild temperature, high humidity, weak wind)
new_data = pd.DataFrame({
'Outlook': [label_encoders['Outlook'].transform(['Sunny'])[0]],
'Temperature': [label_encoders['Temperature'].transform(['Mild'])[0]],
'Humidity': [label_encoders['Humidity'].transform(['High'])[0]],
'Wind': [label_encoders['Wind'].transform(['Weak'])[0]]
})
# Predict whether to play tennis
prediction = dt_classifier.predict(new_data)
print(f'\nPrediction for new data (Sunny, Mild, High Humidity, Weak Wind): {"Play" if prediction[0] == 1 else "Do not Play"}')
Slip 9 :
Q.1. Implement Ridge Regression and Lasso Regression models using boston_houses.csv, taking only the ‘RM’ and ‘Price’ columns of the houses. Divide the data into training and testing data, fit a line using Ridge Regression, find the price of a house if it contains 5 rooms, and compare the results.
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_squared_error
# Step 1: Load the Boston Housing dataset from sklearn
from sklearn.datasets import load_boston
# Load the dataset
boston = load_boston()  # Note: load_boston was removed in scikit-learn 1.2; with newer versions, load boston_houses.csv directly instead
df = pd.DataFrame(boston.data, columns=boston.feature_names)
# Step 2: Select only the 'RM' (average number of rooms) and 'Price' (house price) columns
df = df[['RM']]
df['Price'] = boston.target
# Step 3: Split the data into training and testing sets
X = df[['RM']] # Features (number of rooms)
y = df['Price'] # Target (house price)
# Split the dataset into 80% training data and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Train Ridge Regression model
ridge_regressor = Ridge(alpha=1.0) # Alpha is the regularization strength
ridge_regressor.fit(X_train, y_train)
# Step 5: Train Lasso Regression model
lasso_regressor = Lasso(alpha=0.1) # Alpha is the regularization strength
lasso_regressor.fit(X_train, y_train)
# Step 6: Predict house prices for both models
y_pred_ridge = ridge_regressor.predict(X_test)
y_pred_lasso = lasso_regressor.predict(X_test)
# Step 7: Compare the models' performance using Mean Squared Error (MSE)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
# Print the MSE for both models
print(f'Mean Squared Error for Ridge Regression: {mse_ridge:.2f}')
print(f'Mean Squared Error for Lasso Regression: {mse_lasso:.2f}')
# Step 8: Predict the price of a house with 5 rooms using both models
rooms = 5
price_ridge = ridge_regressor.predict([[rooms]]) # Predict using Ridge model
price_lasso = lasso_regressor.predict([[rooms]]) # Predict using Lasso model
print(f'Predicted price for a house with {rooms} rooms using Ridge Regression: ${price_ridge[0]:.2f}')
print(f'Predicted price for a house with {rooms} rooms using Lasso Regression: ${price_lasso[0]:.2f}')
Q.2. Write a python program to implement Linear SVM using UniversalBank.csv [15 M]
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Step 1: Load the dataset
# Replace 'UniversalBank.csv' with the actual path to the dataset
df = pd.read_csv('UniversalBank.csv')
# Step 2: Data Preprocessing
# Check for missing values
print(df.isnull().sum())
# Handling missing values if necessary (this is just an example)
# df = df.fillna(df.mean()) # Or any other imputation strategy
# Convert categorical variables to numerical (if required)
# Assuming 'Personal.Loan' is the target variable
# If there are categorical features, we may need to encode them (e.g. 'Gender' or 'Education')
df = pd.get_dummies(df, drop_first=True)
# Step 3: Define Features (X) and Target (y)
# Assuming 'Personal.Loan' is the target variable (binary classification)
X = df.drop('Personal.Loan', axis=1) # Features
y = df['Personal.Loan'] # Target variable (whether the person has taken a loan or not)
# Step 4: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 5: Feature Scaling (important for SVM)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 6: Train Linear SVM model
svm_model = SVC(kernel='linear', random_state=42)
svm_model.fit(X_train_scaled, y_train)
# Step 7: Make Predictions
y_pred = svm_model.predict(X_test_scaled)
# Step 8: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
# Confusion Matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# Classification Report (Precision, Recall, F1-Score)
print("Classification Report:")
print(classification_report(y_test, y_pred))
Slip 10 :
Q.1. Write a python program to transform data with Principal Component Analysis (PCA). Use iris dataset.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # Target labels (Iris species)
# Step 2: Standardize the data (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Apply PCA
# We'll reduce the data to 2 components for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Step 4: Visualize the PCA result
# Plot the transformed data
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k', s=50)
plt.title('PCA of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Target Class')
plt.show()
# Optionally, you can print the explained variance ratio of each component
print(f'Explained variance ratio for each principal component: {pca.explained_variance_ratio_}')
Slip 11 :
Q.1. Write a python program to implement Polynomial Regression for Boston Housing Dataset
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the Boston Housing Dataset
boston = load_boston()  # Note: load_boston was removed in scikit-learn 1.2; this requires an older scikit-learn version
X = boston.data # Features (e.g., crime rate, property tax, etc.)
y = boston.target # Target variable (house price)
# Step 2: Preprocess the data
# We can split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Polynomial Feature Transformation
# We'll use PolynomialFeatures to create polynomial features from the original data
degree = 2 # You can experiment with different degrees (e.g., 3, 4)
poly = PolynomialFeatures(degree=degree)
# Transform the features to include polynomial terms
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
# Step 4: Fit the Linear Regression Model
# Now we can apply linear regression to the polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)
# Step 5: Evaluate the model
# Predict on the test data
y_pred = model.predict(X_test_poly)
# Calculate the mean squared error and R-squared score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error (MSE): {mse}')
print(f'R-Squared: {r2}')
# Step 6: Visualize the results (optional, works best with one feature)
# Since the dataset is multi-dimensional, we'll plot predictions vs actual values
plt.scatter(y_test, y_pred)
plt.xlabel('True Values (Prices)')
plt.ylabel('Predicted Values (Prices)')
plt.title('Polynomial Regression: Predicted vs Actual')
plt.show()
Slip 12 :
Q.1. Write a python program to implement k-nearest Neighbors ML algorithm to build prediction model (Use iris Dataset)
Steps:
- Load the Iris dataset from sklearn.datasets.
- Preprocess the data: split the data into training and testing sets.
- Train the k-NN model: use the KNeighborsClassifier from sklearn.neighbors.
- Make predictions and evaluate the model.
Python Code:
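A minimal sketch following the steps above (k = 3 neighbors is an illustrative choice):
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Target labels (species)
# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Train the k-NN model (k = 3 neighbors)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Step 4: Make predictions and evaluate the model
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the k-NN model: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))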
Slip 13 :
Q.1. Create an RNN model and analyze the Google stock price dataset. Find out whether the stock price shows an increasing or decreasing trend for the next day.
Python Code Example:
Step 1: Install Required Libraries
Make sure you have the required libraries installed:
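For example (yfinance is used below as one way to download the stock data; any equivalent data source works):
pip install yfinance tensorflow scikit-learn pandas numpy matplotlib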
Step 2: Import Necessary Libraries
Step 3: Download Google Stock Price Data
Step 4: Preprocess the Data
We'll use the Closing Price of Google stock to predict the trends (increase or decrease) for the next day.
Step 5: Build the RNN Model
We will create an RNN using LSTM (Long Short-Term Memory) layers, which are good for sequential data like stock prices.
Step 6: Model Evaluation
After training the model, we'll evaluate it on the test data and check its accuracy.
Step 7: Make Predictions for the Next Day
Now, we can use the trained model to predict the trend (increase or decrease) for the next day.
Step 8: Visualize the Results
You can plot the stock prices and predictions for a better understanding of the model’s performance.
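A minimal end-to-end sketch covering Steps 2–7 follows. The ticker, date range, 60-day window, and network size are illustrative assumptions, and yfinance is assumed as the data source; plotting (Step 8) can be added with matplotlib on top of the predictions.
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Download Google stock prices (dates are illustrative)
data = yf.download("GOOG", start="2020-01-01", end="2023-12-31")
close = data["Close"].values.reshape(-1, 1)
# Scale the closing prices to [0, 1]
scaler = MinMaxScaler()
close_scaled = scaler.fit_transform(close)
# Build 60-day windows; the label is 1 if the next day's close is higher than the last day in the window
window = 60
X, y = [], []
for i in range(window, len(close_scaled)):
    X.append(close_scaled[i - window:i, 0])
    y.append(1 if close_scaled[i, 0] > close_scaled[i - 1, 0] else 0)
X = np.array(X).reshape(-1, window, 1)
y = np.array(y)
# Chronological train/test split (80% / 20%)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
# LSTM-based RNN for binary trend classification (increase vs decrease)
model = Sequential([
    LSTM(50, input_shape=(window, 1)),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
# Evaluate on the held-out test data
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
# Predict the trend for the next day from the most recent 60 days
last_window = close_scaled[-window:].reshape(1, window, 1)
prob_up = model.predict(last_window)[0, 0]
print("Predicted trend for the next day:", "Increasing" if prob_up > 0.5 else "Decreasing")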
Slip 14 :
Q.1. Create a CNN model and train it on the MNIST handwritten digit dataset. Using the model, find out the digit written by hand in a given image. Import the MNIST dataset from tensorflow.keras.datasets.
- Import necessary libraries.
- Load the MNIST dataset from tensorflow.keras.datasets.
- Preprocess the data: normalize the images and reshape them for the CNN model.
- Build the CNN model: Define the architecture of the CNN.
- Compile and train the model.
- Evaluate the model on the test dataset.
- Use the trained model to predict digits in new images.
Below is the Python code to accomplish this task using TensorFlow/Keras:
Step 1: Install Necessary Libraries
If you don't have TensorFlow installed, you can install it using:
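pip install tensorflow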
Step 2: Python Program for CNN on MNIST Dataset
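A minimal sketch follows; the architecture shown (two convolutional blocks and a small dense layer) is one reasonable illustrative choice, not the only one.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
# Step 1: Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Step 2: Preprocess: normalize pixel values and add a channel dimension
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Step 3: Build the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')  # 10 output classes (digits 0-9)
])
# Step 4: Compile and train the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.1)
# Step 5: Evaluate the model on the test dataset
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")
# Step 6: Predict the digit in a given image (here, the first test image)
sample = X_test[0:1]
prediction = np.argmax(model.predict(sample), axis=1)
print(f"Predicted digit: {prediction[0]} (actual: {y_test[0]})")
plt.imshow(X_test[0].reshape(28, 28), cmap='gray')
plt.title(f"Predicted digit: {prediction[0]}")
plt.show()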
Q.2. Write a python program to find all null values in a given dataset and remove them. Create your own dataset.
import pandas as pd
import numpy as np
# Step 1: Create a sample dataset (DataFrame)
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', np.nan],
'Age': [25, 30, np.nan, 22, 23],
'City': ['New York', 'Los Angeles', 'Chicago', np.nan, 'Houston'],
'Salary': [50000, 60000, 55000, 45000, np.nan]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Step 2: Display the original dataset
print("Original Dataset:")
print(df)
# Step 3: Identify null values
print("\nNull Values in the Dataset:")
print(df.isnull())
# Step 4: Remove rows with any null values
df_cleaned = df.dropna()
# Step 5: Display the cleaned dataset
print("\nDataset after removing rows with null values:")
print(df_cleaned)
Slip 15 :
Q.1. Create an ANN, train it on a house price dataset, and classify whether the house price is above average or below average.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_boston
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
# Step 1: Load the dataset (Boston Housing dataset)
boston = load_boston()  # Note: load_boston was removed in scikit-learn 1.2; this requires an older scikit-learn version
X = boston.data # Features
y = boston.target # Target variable (house prices)
# Step 2: Calculate the average house price
average_price = np.mean(y)
# Step 3: Convert house prices to binary classification (Above average = 1, Below average = 0)
y_class = np.where(y > average_price, 1, 0)
# Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_class, test_size=0.2, random_state=42)
# Step 5: Normalize the features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 6: Define the ANN model
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu')) # First hidden layer
model.add(Dense(32, activation='relu')) # Second hidden layer
model.add(Dense(1, activation='sigmoid')) # Output layer (binary classification)
# Step 7: Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Step 8: Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
# Step 9: Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy*100:.2f}%")
# Step 10: Predict the class (above or below average) on test set
predictions = model.predict(X_test)
predictions = (predictions > 0.5).astype(int) # Convert probabilities to binary class (0 or 1)
# Print first 10 predictions
print("Predictions for the first 10 houses:")
print(predictions[:10].flatten())
# Optionally: You can print the actual test labels for comparison
print("Actual labels for the first 10 houses:")
print(y_test[:10])  # y_test is a NumPy array here, so no .values attribute is needed
Slip 16 :
Q.1. Create a two layered neural network with relu and sigmoid activation function. [15 M]
# Import necessary libraries
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
# Step 1: Create a simple binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42)
# Step 2: Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 4: Build the neural network model
model = Sequential()
# First layer (Hidden Layer): Using ReLU activation
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
# Second layer (Output Layer): Using Sigmoid activation for binary classification
model.add(Dense(1, activation='sigmoid'))
# Step 5: Compile the model
model.compile(loss='binary_crossentropy', # For binary classification
optimizer=Adam(learning_rate=0.001), # Optimizer with learning rate
metrics=['accuracy'])
# Step 6: Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))
# Step 7: Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")
# Step 8: Make predictions (Example)
predictions = model.predict(X_test[:5])
print("Predictions for the first 5 samples:", predictions)
Slip 17 :
Q.1. Implement ensemble ML algorithms on the Pima Indians Diabetes Database with bagging (random forest), boosting, voting, and stacking methods, and display the analysis accordingly. Compare the results.
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
# Step 1: Load the Pima Indians Diabetes Dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=columns)
# Step 2: Split the data into features and target variable
X = data.drop('Outcome', axis=1)
y = data['Outcome']
# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Standardize the features (important for some models like SVM)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 5: Bagging - Random Forest Classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
# Step 6: Boosting - AdaBoost Classifier
ab = AdaBoostClassifier(n_estimators=100, random_state=42)
ab.fit(X_train, y_train)
ab_pred = ab.predict(X_test)
ab_accuracy = accuracy_score(y_test, ab_pred)
# Step 7: Voting - Hard Voting Classifier
voting_clf = VotingClassifier(estimators=[('rf', rf), ('ab', ab)], voting='hard')
voting_clf.fit(X_train, y_train)
voting_pred = voting_clf.predict(X_test)
voting_accuracy = accuracy_score(y_test, voting_pred)
# Step 8: Stacking - Stacking Classifier
estimators = [('rf', rf), ('ab', ab), ('knn', KNeighborsClassifier())]
stacking_clf = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stacking_clf.fit(X_train, y_train)
stacking_pred = stacking_clf.predict(X_test)
stacking_accuracy = accuracy_score(y_test, stacking_pred)
# Step 9: Display Results
print(f"Random Forest Accuracy: {rf_accuracy:.4f}")
print(f"AdaBoost Accuracy: {ab_accuracy:.4f}")
print(f"Voting Classifier Accuracy: {voting_accuracy:.4f}")
print(f"Stacking Classifier Accuracy: {stacking_accuracy:.4f}")
# Step 10: Visualization of Comparison
methods = ['Random Forest', 'AdaBoost', 'Voting', 'Stacking']
accuracies = [rf_accuracy, ab_accuracy, voting_accuracy, stacking_accuracy]
plt.figure(figsize=(10, 6))
plt.barh(methods, accuracies, color='skyblue')
plt.xlabel('Accuracy')
plt.title('Comparison of Ensemble Methods on Pima Indians Diabetes Dataset')
plt.show()
Q.2. Write a python program to implement multiple linear regression for a house price dataset.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler
# Step 1: Load the Dataset (Example Dataset - Replace with your own dataset)
# Assuming the dataset has columns 'Size', 'Bedrooms', 'Age', and 'Price'
# Here, 'Price' is the target variable.
data = pd.read_csv('house_prices.csv')  # Replace with the actual path to a dataset containing these columns
# Step 2: Preprocess Data
# Check for missing values
print("Missing Values:\n", data.isnull().sum())
# Split the data into features (X) and target (y)
X = data[['Size', 'Bedrooms', 'Age']] # Features (independent variables)
y = data['Price'] # Target variable (dependent variable)
# Step 3: Split Data into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Feature Scaling (if necessary)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 5: Create and Train the Multiple Linear Regression Model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
# Step 6: Make Predictions
y_pred = model.predict(X_test_scaled)
# Step 7: Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R²):", r2)
# Step 8: Visualizing the predictions vs actual prices
plt.scatter(y_test, y_pred)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red') # Line of perfect fit
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices')
plt.show()
Slip 18 :
Q.1. Write a python program to implement k-means algorithm on a Diabetes dataset.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_diabetes
# Step 1: Load the Diabetes dataset
# For this example, we're using the dataset available from sklearn datasets
diabetes_data = load_diabetes()
X = diabetes_data.data # Features (independent variables)
y = diabetes_data.target # Target (dependent variable)
# Step 2: Preprocess the Data
# We will scale the features for better clustering performance using StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Apply K-Means Clustering
# We will try clustering into 3 clusters (this can be adjusted based on the dataset)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)
# Step 4: Evaluate the Clusters
# Get the cluster labels and centers
labels = kmeans.labels_
centers = kmeans.cluster_centers_
# Step 5: Visualize the Clusters
# We'll reduce the dimensions to 2D for easy visualization using PCA (Principal Component Analysis)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Create a scatter plot of the clustered data
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', s=50)
# Project the cluster centers into the same 2D PCA space before plotting
centers_pca = pca.transform(centers)
plt.scatter(centers_pca[:, 0], centers_pca[:, 1], c='red', s=200, alpha=0.75, marker='x')
plt.title('K-Means Clustering of Diabetes Dataset')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.show()
# Display the cluster centers (means of the features)
print("Cluster Centers:\n", centers)
Q.2. Write a python program to implement Polynomial Linear Regression for salary_positions dataset.
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Step 1: Load the Salary Positions Dataset (Example dataset)
# Here, we're creating a sample dataset for illustration
data = {
'Position Level': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
'Salary': [45000, 50000, 60000, 75000, 90000, 110000, 130000, 150000, 180000, 200000, 220000, 250000]
}
df = pd.DataFrame(data)
# Step 2: Preprocess the Data
X = df['Position Level'].values.reshape(-1, 1) # Independent variable
y = df['Salary'].values # Dependent variable
# Step 3: Create Polynomial Features
poly = PolynomialFeatures(degree=4) # Creating 4th degree polynomial features
X_poly = poly.fit_transform(X)
# Step 4: Fit the Polynomial Regression Model
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
# Step 5: Visualize the Polynomial Regression Curve
# Plotting original data points
plt.scatter(X, y, color='blue')
# Plotting the polynomial regression line
X_grid = np.arange(min(X), max(X), 0.1) # Creating a smooth curve
X_grid = X_grid.reshape((len(X_grid), 1))
plt.plot(X_grid, lin_reg.predict(poly.transform(X_grid)), color='red')
plt.title('Polynomial Linear Regression (Salary vs Position Level)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
# Step 6: Predict salaries for Level 11 and Level 12
level_11 = np.array([[11]])
level_12 = np.array([[12]])
salary_11 = lin_reg.predict(poly.transform(level_11))
salary_12 = lin_reg.predict(poly.transform(level_12))
print(f"Predicted Salary for Level 11: {salary_11[0]}")
print(f"Predicted Salary for Level 12: {salary_12[0]}")
# Step 7: Calculate Mean Squared Error (MSE) for evaluation
y_pred = lin_reg.predict(X_poly)
mse = mean_squared_error(y, y_pred)
print(f"Mean Squared Error: {mse}")
Slip 19 :
Q.1. Fit the simple linear regression and polynomial linear regression models to Salary_positions.csv data. Find which one is more accurately fitting to the given data. Also predict the salaries of level 11 and level 12 employees
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
# Step 1: Load the Salary Positions Dataset
df = pd.read_csv('Salary_positions.csv') # Make sure the CSV file is in the correct directory
# Step 2: Preprocess the Data
X = df['Position Level'].values.reshape(-1, 1) # Independent variable (Position Level)
y = df['Salary'].values # Dependent variable (Salary)
# Step 3: Simple Linear Regression
simple_linear_reg = LinearRegression()
simple_linear_reg.fit(X, y)
# Step 4: Polynomial Linear Regression
poly = PolynomialFeatures(degree=4) # 4th degree polynomial features
X_poly = poly.fit_transform(X)
poly_linear_reg = LinearRegression()
poly_linear_reg.fit(X_poly, y)
# Step 5: Evaluate the Models using Mean Squared Error (MSE)
y_pred_simple = simple_linear_reg.predict(X)
y_pred_poly = poly_linear_reg.predict(X_poly)
mse_simple = mean_squared_error(y, y_pred_simple)
mse_poly = mean_squared_error(y, y_pred_poly)
print(f"Mean Squared Error for Simple Linear Regression: {mse_simple}")
print(f"Mean Squared Error for Polynomial Linear Regression: {mse_poly}")
# Step 6: Predict salaries for Level 11 and Level 12 using both models
level_11 = np.array([[11]])
level_12 = np.array([[12]])
# Simple Linear Regression Predictions
salary_11_simple = simple_linear_reg.predict(level_11)
salary_12_simple = simple_linear_reg.predict(level_12)
# Polynomial Linear Regression Predictions
salary_11_poly = poly_linear_reg.predict(poly.transform(level_11))
salary_12_poly = poly_linear_reg.predict(poly.transform(level_12))
print(f"Predicted Salary for Level 11 (Simple Linear Regression): {salary_11_simple[0]}")
print(f"Predicted Salary for Level 12 (Simple Linear Regression): {salary_12_simple[0]}")
print(f"Predicted Salary for Level 11 (Polynomial Linear Regression): {salary_11_poly[0]}")
print(f"Predicted Salary for Level 12 (Polynomial Linear Regression): {salary_12_poly[0]}")
# Step 7: Visualize the Results
# Plotting Simple Linear Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, y_pred_simple, color='red')
plt.title('Simple Linear Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
# Plotting Polynomial Linear Regression results
plt.scatter(X, y, color='blue')
X_grid = np.arange(min(X), max(X), 0.1) # To create a smooth curve
X_grid = X_grid.reshape((len(X_grid), 1))
plt.plot(X_grid, poly_linear_reg.predict(poly.transform(X_grid)), color='red')
plt.title('Polynomial Linear Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
Slip 20 :
Q.1. Implement Ridge Regression and Lasso Regression models using boston_houses.csv, taking only the ‘RM’ and ‘Price’ columns of the houses. Divide the data into training and testing data, fit a line using Ridge Regression, find the price of a house if it contains 5 rooms, and compare the results.
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_squared_error
# Step 1: Load the Dataset
# Assuming the dataset is in CSV format and located in the current directory
df = pd.read_csv('boston_houses.csv') # Replace with your actual dataset path
# Step 2: Select Features
df = df[['RM', 'Price']] # Selecting only 'RM' (rooms) and 'Price' (house price)
# Step 3: Preprocess the Data
# Split the data into features (X) and target (y)
X = df[['RM']] # 'RM' represents the number of rooms
y = df['Price'] # 'Price' represents the house price
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Apply Ridge and Lasso Regression
# Ridge Regression
ridge = Ridge(alpha=1.0) # You can tune alpha for regularization strength
ridge.fit(X_train, y_train)
# Lasso Regression
lasso = Lasso(alpha=0.1) # You can tune alpha for regularization strength
lasso.fit(X_train, y_train)
# Step 5: Predict House Price for 5 Rooms
rooms = np.array([[5]]) # Predict for a house with 5 rooms
ridge_pred = ridge.predict(rooms)
lasso_pred = lasso.predict(rooms)
# Step 6: Compare Results
print(f"Ridge Regression Prediction for 5 rooms: ${ridge_pred[0]:.2f}")
print(f"Lasso Regression Prediction for 5 rooms: ${lasso_pred[0]:.2f}")
# Optional: Evaluate the models on test data
ridge_test_pred = ridge.predict(X_test)
lasso_test_pred = lasso.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_test_pred)
lasso_mse = mean_squared_error(y_test, lasso_test_pred)
print(f"Ridge Regression MSE on Test Data: {ridge_mse:.2f}")
print(f"Lasso Regression MSE on Test Data: {lasso_mse:.2f}")
Q.2. Write a python program to implement a Decision Tree classifier to decide whether or not to play Tennis.
# Import necessary libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
# Step 1: Create the dataset
data = {
'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild', 'Mild', 'Mild'],
'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'High'],
'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong'],
'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
# Step 2: Convert to DataFrame
df = pd.DataFrame(data)
# Step 3: Encode categorical variables using LabelEncoder
encoder = LabelEncoder()
df['Outlook'] = encoder.fit_transform(df['Outlook'])
df['Temperature'] = encoder.fit_transform(df['Temperature'])
df['Humidity'] = encoder.fit_transform(df['Humidity'])
df['Wind'] = encoder.fit_transform(df['Wind'])
df['PlayTennis'] = encoder.fit_transform(df['PlayTennis']) # Target variable
# Step 4: Split the data into features (X) and target (y)
X = df.drop('PlayTennis', axis=1) # Features
y = df['PlayTennis'] # Target
# Step 5: Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 6: Train the Decision Tree model
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
# Step 7: Predict using the trained model
y_pred = dtree.predict(X_test)
# Step 8: Evaluate the model accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of Decision Tree Model: {accuracy * 100:.2f}%')
# Step 9: Print the Decision Tree rules
from sklearn.tree import export_text
tree_rules = export_text(dtree, feature_names=list(X.columns))
print("\nDecision Tree Rules:\n")
print(tree_rules)
Slip 21 :
Q.1. Create a multiple linear regression model for a house price dataset. Divide the dataset into training and testing data before giving it to the model, and predict house prices.
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the dataset
# You can replace this with your own dataset
# For this example, let's assume we're working with a dataset named 'house_prices.csv'
df = pd.read_csv('house_prices.csv')
# Step 2: Preprocess the data
# Assuming the dataset has columns like 'Size', 'Bedrooms', 'Age', 'Price'
# Replace missing values or handle categorical variables if necessary
df = df.dropna() # Remove rows with missing values
# Step 3: Split the data into features (X) and target (y)
X = df[['Size', 'Bedrooms', 'Age']] # Features
y = df['Price'] # Target
# Step 4: Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 5: Train the Multiple Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 6: Make predictions using the trained model
y_pred = model.predict(X_test)
# Step 7: Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R² Score: {r2}")
# Step 8: Predict prices for a new house (example input)
new_house = pd.DataFrame({'Size': [2500], 'Bedrooms': [4], 'Age': [10]})
predicted_price = model.predict(new_house)
print(f"Predicted Price for the new house: ${predicted_price[0]:,.2f}")
Slip 22 :
Q.1. Write a python program to implement simple Linear Regression for predicting house price.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the dataset
# For this example, let's assume we have a dataset 'house_prices.csv'
# The dataset contains two columns: 'Size' (in square feet) and 'Price' (in dollars)
df = pd.read_csv('house_prices.csv')
# Step 2: Preprocess the data
# Check for missing values
print(df.isnull().sum())
# Drop any rows with missing values (if needed)
df = df.dropna()
# Features and target variable
X = df[['Size']] # Feature (e.g., Size of the house in square feet)
y = df['Price'] # Target variable (Price of the house)
# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 5: Make predictions on the test data
y_pred = model.predict(X_test)
# Step 6: Evaluate the model
# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
# Calculate R-squared value
r2 = r2_score(y_test, y_pred)
# Displaying the evaluation metrics
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
# Step 7: Visualize the results
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, y_pred, color='red', label='Regression Line')
plt.xlabel('Size of House (in sq ft)')
plt.ylabel('Price of House (in dollars)')
plt.title('Simple Linear Regression for House Price Prediction')
plt.legend()
plt.show()
# Example: Predict the price of a house with 1500 sq ft size
predicted_price = model.predict([[1500]])
print(f"The predicted price for a 1500 sq ft house is ${predicted_price[0]:,.2f}")
Slip 23 :
Q.1. Fit the simple linear regression and polynomial linear regression models to Salary_positions.csv data. Find which one is more accurately fitting to the given data. Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
# Step 1: Load the dataset
# Assuming 'Salary_positions.csv' has columns 'Level' and 'Salary'
data = pd.read_csv('Salary_positions.csv')
# Step 2: Explore the data
print(data.head())
X = data['Level'].values.reshape(-1, 1) # Feature: Level
y = data['Salary'].values # Target: Salary
# Step 3: Fit Simple Linear Regression model
lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Step 4: Fit Polynomial Regression model (degree 2 or 3)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)
# Step 5: Compare models
# Make predictions using both models
y_pred_lin = lin_reg.predict(X)
y_pred_poly = poly_reg.predict(X_poly)
# Calculate RMSE for both models
rmse_lin = np.sqrt(mean_squared_error(y, y_pred_lin))
rmse_poly = np.sqrt(mean_squared_error(y, y_pred_poly))
print(f"RMSE for Simple Linear Regression: {rmse_lin}")
print(f"RMSE for Polynomial Linear Regression: {rmse_poly}")
# Step 6: Predict Salaries for Level 11 and Level 12 employees
level_11 = np.array([[11]])
level_12 = np.array([[12]])
salary_pred_lin_11 = lin_reg.predict(level_11)
salary_pred_lin_12 = lin_reg.predict(level_12)
salary_pred_poly_11 = poly_reg.predict(poly.transform(level_11))
salary_pred_poly_12 = poly_reg.predict(poly.transform(level_12))
print(f"Predicted Salary for Level 11 (Linear): {salary_pred_lin_11}")
print(f"Predicted Salary for Level 12 (Linear): {salary_pred_lin_12}")
print(f"Predicted Salary for Level 11 (Polynomial): {salary_pred_poly_11}")
print(f"Predicted Salary for Level 12 (Polynomial): {salary_pred_poly_12}")
# Step 7: Plot the results
plt.scatter(X, y, color='red')
plt.plot(X, y_pred_lin, label='Linear Regression', color='blue')
plt.plot(X, y_pred_poly, label='Polynomial Regression (degree=2)', color='green')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title('Linear vs Polynomial Regression')
plt.legend()
plt.show()
Slip 24 :
Q.1. Write a python program to Implement Decision Tree classifier model on Data which is extracted from images that were taken from genuine and forged banknote-like specimens. (refer UCI dataset https://archive.ics.uci.edu/dataset/267/banknote+authentication)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
# Step 1: Load the dataset
# URL of the dataset from UCI repository (or local file path)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.csv"
column_names = ['variance', 'skewness', 'curtosis', 'entropy', 'class']
# Load the dataset into a pandas DataFrame
data = pd.read_csv(url, names=column_names)
# Step 2: Preprocess the data
# Checking for null values
print("Checking for null values:")
print(data.isnull().sum()) # Should be zero for all columns
# Split the data into features (X) and target (y)
X = data.drop('class', axis=1) # Features (all columns except 'class')
y = data['class'] # Target (the 'class' column)
# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Standardize the features (optional; decision trees are not sensitive to feature scaling, kept here for consistency with the other models)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 4: Train the Decision Tree Classifier model
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)
# Step 5: Make predictions
y_pred = dt_classifier.predict(X_test)
# Step 6: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy of Decision Tree Classifier: {accuracy * 100:.2f}%")
# Classification report for more detailed metrics
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
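For the exam demo it can help to show the learned splits. A small optional sketch using scikit-learn's plot_tree (it assumes the fitted dt_classifier from above; the class labels "genuine"/"forged" for 0/1 are an assumption about the dataset's encoding):
# Optional: visualize the fitted decision tree (limit depth for readability)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
plt.figure(figsize=(16, 8))
plot_tree(dt_classifier,
          feature_names=['variance', 'skewness', 'curtosis', 'entropy'],
          class_names=['genuine', 'forged'],  # assumed mapping for class 0/1
          filled=True, max_depth=3)
plt.show()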
Q.2. Write a python program to implement linear SVM using UniversalBank.csv. [15 M]
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
# Step 1: Load the dataset
# Assuming 'UniversalBank.csv' is located in the same directory
url = 'UniversalBank.csv' # Replace with the actual file path or URL
data = pd.read_csv(url)
# Step 2: Preprocess the data
# Check the first few rows of the dataset
print("First few rows of the dataset:")
print(data.head())
# Checking for null values
print("\nChecking for null values:")
print(data.isnull().sum())
# Dropping any rows with missing values (if any)
data = data.dropna()
# Assume the target variable is 'PersonalLoan' and the rest are features
X = data.drop(columns=['PersonalLoan']) # Features
y = data['PersonalLoan'] # Target variable (whether the customer took a loan)
# Encode categorical variables (if any)
# For example, if you have 'education' column or 'zip code', convert them to numeric
X = pd.get_dummies(X, drop_first=True) # Convert categorical features to numerical if necessary
# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Standardize the data (Scaling the features)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 5: Train the Linear SVM model
svm_model = SVC(kernel='linear', random_state=42) # Using linear kernel
svm_model.fit(X_train, y_train)
# Step 6: Make predictions
y_pred = svm_model.predict(X_test)
# Step 7: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy of Linear SVM: {accuracy * 100:.2f}%")
# Classification report for more detailed metrics
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
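A confusion matrix is another quick way to see how the linear SVM splits loan vs. no-loan customers. A short optional addition (assumes y_test and y_pred from the code above):
# Optional: confusion matrix for the linear SVM
from sklearn.metrics import confusion_matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))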
Slip 25 :
Q.1. Write a python program to implement Polynomial Regression for house price dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the dataset
# Assuming the dataset has 'SquareFeet' and 'Price' columns (change based on actual dataset)
url = 'house_price_dataset.csv' # Replace with your actual dataset path
data = pd.read_csv(url)
# Step 2: Preprocess the Data
print("First few rows of the dataset:")
print(data.head())
# Assuming 'SquareFeet' is the feature and 'Price' is the target variable
X = data['SquareFeet'].values.reshape(-1, 1) # Reshaping to make it a 2D array for the model
y = data['Price'].values
# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Polynomial Feature Transformation (degree=4, you can change it)
poly = PolynomialFeatures(degree=4)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)
# Step 5: Fit the Polynomial Regression Model (Linear Regression on transformed features)
model = LinearRegression()
model.fit(X_poly_train, y_train)
# Step 6: Predict house prices
y_pred_train = model.predict(X_poly_train)
y_pred_test = model.predict(X_poly_test)
# Step 7: Evaluate the Model
print("\nTrain Mean Squared Error:", mean_squared_error(y_train, y_pred_train))
print("Test Mean Squared Error:", mean_squared_error(y_test, y_pred_test))
print("\nTrain R2 Score:", r2_score(y_train, y_pred_train))
print("Test R2 Score:", r2_score(y_test, y_pred_test))
# Step 8: Visualize the Polynomial Regression results
# Plotting the training data and model prediction
plt.scatter(X_train, y_train, color='blue', label='Training Data')
plt.plot(X_train, y_pred_train, color='red', label='Polynomial Regression Line (train)')
# Plotting the testing data and model prediction
plt.scatter(X_test, y_test, color='green', label='Test Data')
plt.plot(X_test, y_pred_test, color='orange', label='Polynomial Regression Line (test)')
plt.title('Polynomial Regression for House Price Prediction')
plt.xlabel('Square Feet')
plt.ylabel('Price')
plt.legend()
plt.show()
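Note that plt.plot() on unsorted training points draws a zigzag line. Evaluating the fitted model on a sorted grid of square-footage values gives a cleaner curve; a minimal sketch that reuses X, y, poly and model from the code above:
# Optional: plot the fitted polynomial as a smooth curve over a sorted grid
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_grid, y_grid, color='red', label='Polynomial fit (degree=4)')
plt.title('Polynomial Regression (smooth curve)')
plt.xlabel('Square Feet')
plt.ylabel('Price')
plt.legend()
plt.show()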
Q.2. Create a two layered neural network with relu and sigmoid activation function. [15 M]
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
# Step 1: Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Step 2: Scale the data (important for neural networks)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Step 4: Define the model
model = Sequential()
# First hidden layer with ReLU activation function
model.add(Dense(units=64, input_dim=X_train.shape[1], activation='relu'))
# Output layer with Sigmoid activation function for binary classification
model.add(Dense(units=1, activation='sigmoid'))
# Step 5: Compile the model
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])
# Step 6: Train the model
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))
# Step 7: Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test)
# Output results
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')
# Step 8: Make predictions (optional)
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5) # Convert probabilities to binary (0 or 1)
Slip 26 :
Q.1. Create KNN model on Indian diabetes patient’s database and predict whether a new patient is diabetic (1) or not (0). Find optimal value of K.
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
# Step 1: Load the dataset (use your local dataset or the following URL for Indian Diabetes dataset)
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
# Load dataset into a pandas dataframe
data = pd.read_csv(url, header=None, names=columns)
# Step 2: Preprocess the data
# Handle missing values: in this dataset a zero in the medical measurement columns
# means the value is missing, so replace zeros with NaN there and fill with the column mean.
# Do NOT do this for 'Pregnancies' or 'Outcome', where 0 is a valid value.
cols_with_missing = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
data[cols_with_missing] = data[cols_with_missing].replace(0, np.nan)
data.fillna(data.mean(), inplace=True)
# Step 3: Split the data into features (X) and target (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']
# Step 4: Standardize the features (important for KNN)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 5: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Step 6: Train KNN model and evaluate the performance for different values of K
# Function to find the optimal K
def optimal_k(X_train, X_test, y_train, y_test):
    accuracies = []
    for k in range(1, 21):  # Test K values from 1 to 20
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        accuracies.append(knn.score(X_test, y_test))
    # Plotting K vs accuracy
    plt.plot(range(1, 21), accuracies, marker='o')
    plt.xlabel('Value of K')
    plt.ylabel('Accuracy')
    plt.title('Accuracy vs K')
    plt.show()
    # Return the K with the highest test accuracy
    best_k = accuracies.index(max(accuracies)) + 1
    return best_k, max(accuracies)
# Find the optimal value of K
optimal_k_value, max_accuracy = optimal_k(X_train, X_test, y_train, y_test)
print(f"Optimal value of K: {optimal_k_value} with accuracy: {max_accuracy}")
# Step 7: Train the KNN model with the optimal K and evaluate it
knn_optimal = KNeighborsClassifier(n_neighbors=optimal_k_value)
knn_optimal.fit(X_train, y_train)
# Evaluate on the test data
y_pred = knn_optimal.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
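Choosing K from a single train/test split can be noisy; cross-validation gives a more stable choice. A small optional sketch using scikit-learn's cross_val_score (assumes X_scaled and y from the code above):
# Optional: pick K with 5-fold cross-validation instead of a single split
from sklearn.model_selection import cross_val_score
cv_scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores.append(cross_val_score(knn, X_scaled, y, cv=5).mean())
best_k_cv = cv_scores.index(max(cv_scores)) + 1
print(f"Best K by 5-fold CV: {best_k_cv} (mean accuracy {max(cv_scores):.4f})")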
Q.2. Use Apriori algorithm on groceries dataset to find which items are brought together. Use minimum support =0.25
# Import necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Step 1: Load the dataset
# For this example, you can use an example groceries dataset or replace with your dataset
# You can load the dataset in the following way:
# groceries_df = pd.read_csv("groceries.csv", header=None)
# For demonstration, we will use a sample dataset.
# Sample Dataframe (for illustration purposes)
data = {'TransactionID': [1, 2, 3, 4, 5, 6],
'Items': [['Milk', 'Eggs', 'Bread'],
['Milk', 'Diaper', 'Beer', 'Eggs'],
['Bread', 'Milk', 'Diaper', 'Beer'],
['Milk', 'Eggs', 'Bread', 'Diaper'],
['Milk', 'Bread', 'Diaper', 'Beer'],
['Eggs', 'Bread', 'Beer']]}
# Convert to a dataframe
groceries_df = pd.DataFrame(data)
# Step 2: Preprocess the data into one-hot encoded format
# Convert the data to a format suitable for Apriori (a list of lists for each transaction)
# Create a list of all unique items in the transactions
all_items = list(set([item for sublist in groceries_df['Items'] for item in sublist]))
# Create an empty DataFrame with one row per transaction and one column per item
basket = pd.DataFrame(0, index=groceries_df['TransactionID'], columns=all_items)
# Fill in the DataFrame: mark 1 for every item bought in each transaction
# (index by TransactionID, which is the basket's index, not the positional row number)
for tid, items in zip(groceries_df['TransactionID'], groceries_df['Items']):
    for item in items:
        basket.at[tid, item] = 1
# Convert the 0/1 flags to booleans (newer versions of mlxtend expect a boolean DataFrame)
basket = basket.astype(bool)
# Step 3: Apply the Apriori algorithm to find frequent itemsets
# Minimum support of 0.25 means that we are looking for itemsets that appear in at least 25% of the transactions
frequent_itemsets = apriori(basket, min_support=0.25, use_colnames=True)
# Step 4: Generate the association rules from frequent itemsets
# We use lift > 1 to get meaningful association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# Step 5: Display the results
print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)
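The manual one-hot encoding above can also be done with mlxtend's TransactionEncoder, which is the usual helper for this step. A short optional sketch (assumes groceries_df and the apriori import from the code above):
# Optional: the same encoding using mlxtend's TransactionEncoder
from mlxtend.preprocessing import TransactionEncoder
transactions = groceries_df['Items'].tolist()
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
basket_te = pd.DataFrame(te_array, columns=te.columns_)
print(apriori(basket_te, min_support=0.25, use_colnames=True))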
Slip 27 :
Q.1. Create a multiple linear regression model for the house price dataset. Divide the dataset into training and test data when fitting the model, and predict the prices of houses.
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the dataset (You can replace this with your actual dataset)
# For illustration, we will use a sample dataset.
# Example: 'House Price Dataset' with features like Area, Rooms, and other factors
# Replace this with your actual dataset file, such as 'house_prices.csv'
# For illustration, creating a sample dataset
data = {
'Area': [1500, 1800, 2400, 3000, 3500, 4000],
'Rooms': [3, 4, 4, 5, 5, 6],
'Age': [10, 15, 20, 25, 30, 35],
'Price': [400000, 450000, 600000, 650000, 700000, 750000] # Target variable (Price)
}
# Convert to pandas DataFrame
df = pd.DataFrame(data)
# Step 2: Preprocess the data
# We will separate features (independent variables) and target (dependent variable)
X = df[['Area', 'Rooms', 'Age']] # Independent variables
y = df['Price'] # Dependent variable (house price)
# Step 3: Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Create a Linear Regression model
model = LinearRegression()
# Step 5: Train the model on the training data
model.fit(X_train, y_train)
# Step 6: Make predictions on the test data
y_pred = model.predict(X_test)
# Step 7: Evaluate the model's performance
# Calculate Mean Squared Error and R-squared (R2)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Print results
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R2): {r2}")
# Step 8: Predict prices of houses (Example: predicting for new data)
# For a new house with 2500 sqft, 4 rooms, and 15 years old:
new_house_data = np.array([[2500, 4, 15]]) # New data (Area, Rooms, Age)
predicted_price = model.predict(new_house_data)
print(f"Predicted Price for the new house: ${predicted_price[0]:,.2f}")
Slip 28 :
Q.1. Write a python program to categorize the given news text into one of the available 20 categories of news groups, using multinomial Naïve Bayes machine learning model.
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Step 1: Load the 20 Newsgroups dataset
newsgroups = fetch_20newsgroups(subset='all') # Load both training and test data
X = newsgroups.data # News articles
y = newsgroups.target # Corresponding categories
# Step 2: Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Convert the text data into numeric feature vectors using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
# Step 4: Train a Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
# Step 5: Make predictions on the test set
y_pred = model.predict(X_test_tfidf)
# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Step 7: Display detailed performance metrics
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# Step 8: Example: Predicting a new text article
new_text = ["This is an example of a news article about technology and innovation."]
new_text_tfidf = vectorizer.transform(new_text)
prediction = model.predict(new_text_tfidf)
print(f"\nPredicted Category for the new text: {newsgroups.target_names[prediction[0]]}")
Slip 29 :
Q.1. Take iris flower dataset and reduce 4D data to 2D data using PCA. Then train the model and predict new flower with given measurements.
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# Step 1: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data # Features: sepal length, sepal width, petal length, petal width
y = iris.target # Labels: species of iris flowers
# Step 2: Standardize the features (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Apply PCA to reduce 4D data to 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Step 4: Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)
# Step 5: Train the SVM classifier on the reduced data
svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train, y_train)
# Step 6: Predict the flower species on the test set
y_pred = svm.predict(X_test)
# Step 7: Evaluate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the SVM model with PCA-reduced data: {accuracy * 100:.2f}%")
# Step 8: Predict flower species for a new flower with given measurements
# Example new flower data (sepal length, sepal width, petal length, petal width)
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
# Standardize the new flower data
new_flower_scaled = scaler.transform(new_flower)
# Apply PCA transformation to the new flower
new_flower_pca = pca.transform(new_flower_scaled)
# Predict using the trained SVM model
predicted_class = svm.predict(new_flower_pca)
predicted_class_name = iris.target_names[predicted_class][0]
print(f"Predicted flower species for the input data {new_flower[0]}: {predicted_class_name}")