Feature Selection on Dirac
Device: Dirac-1
Introduction
In machine learning problems, we often have to start with a large number of features. We need a feature selection technique that can discover a relatively small subset of the most relevant features. In what follows we present a tutorial on using QCi's technology to select a set of features by minimizing their inter-correlation. This approach can be used with any unsupervised machine learning approach such as anomaly detection and clustering algorithms.
Methodology
Let us have a dataset with samples and features, represented by a matrix . Moreover, let us each column of by where , representing all samples for feature . We can choose a subset of features of size () such that the inter-correlation of the subset is minimal. We have
where
where denotes a correlation function such as the Pearson correlation, and is a binary variable indicating inclusion or exclusion of feature . Obviously, . The above minimization problem is subject to a constraint,
We can exclude the diagonal elements of as they always add up to . In the matrix form we have,
subject to the above constraint. Note that is a identity matrix. Note too that we have assumed that the reduced dimension is assumed to be given in the above approach.
Medicare Prescription Data
We implemented this approach using a publically available set of data on prescription of opioids in the United States. The dataset can be found at https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-provider-utilization-payment-data/part-d-prescriber
Clean data
We start by cleaning the dataset.
In [2]:
import pandas as pd# Input INP_FILE = "Medicare_Provider_Utilization_and_Payment_Data__Part_D_Prescriber_Summary_Table_CY2014__50001-NNN__ANON.csv"OUT_FILE = "cleaned_medicare_data.csv"CON_VARS = ["total_claim_count","total_30_day_fill_count","total_drug_cost","total_day_supply","bene_count","total_claim_count_ge65","total_30_day_fill_count_ge65","total_drug_cost_ge65","total_day_supply_ge65","bene_count_ge65","brand_claim_count","brand_drug_cost","generic_claim_count","generic_drug_cost","other_claim_count","other_drug_cost","mapd_claim_count","mapd_drug_cost","pdp_claim_count","pdp_drug_cost","lis_claim_count","lis_drug_cost","nonlis_claim_count","nonlis_drug_cost","opioid_claim_count","opioid_drug_cost","opioid_day_supply","opioid_bene_count","opioid_prescriber_rate","antibiotic_claim_count","antibiotic_drug_cost","antibiotic_bene_count","hrm_claim_count_ge65","hrm_drug_cost_ge65","hrm_bene_count_ge65","antipsych_claim_count_ge65","antipsych_drug_cost_ge65","antipsych_bene_count_ge65","average_age_of_beneficiaries","beneficiary_age_less_65_count","beneficiary_age_65_74_count","beneficiary_age_75_84_count","beneficiary_age_greater_84_count","beneficiary_female_count","beneficiary_male_count","beneficiary_race_white_count","beneficiary_race_black_count","beneficiary_race_asian_pi_count","beneficiary_race_hispanic_count","beneficiary_race_nat_ind_count","beneficiary_race_other_count","beneficiary_nondual_count","beneficiary_dual_count","beneficiary_average_risk_score",]VALID_PROVIDER_MI = ["A","M","J","L","R","S","E","D","C","B","K","P","W","H","T","G","F","N","V","I","O","Y","Z","U", "Q","X",]VALID_GEN = ["F", "M", "Other", "Unknown"]VALID_ENTITIES = ["I", "O"]VALID_DESC_FLAGS = ["S", "T"]VALID_ENROLLS = ["E", "N", "O"]# Some utilities def convert_to_float(x):try:return float(x)except:return Nonedef convert_to_int(x):try:return int(float(x))except:return None# Read data df = pd.read_csv(INP_FILE, on_bad_lines = "skip", low_memory=False)# Clean categorical variables df["nppes_provider_mi"] = df["nppes_provider_mi"].fillna("Unknown")df["nppes_provider_mi"] = df["nppes_provider_mi"].apply(lambda x: x if x in VALID_PROVIDER_MI else "Unknown")df["nppes_credentials"] = df["nppes_credentials"].fillna("Unknown")df["nppes_credentials"] = df["nppes_credentials"].apply(lambda x: str(x).replace(".", ""))cred_hash = {"MEDICAL DOCTOR": "MD","NURSE PRACTITIONER": "NP",}df["nppes_credentials"] = df["nppes_credentials"].apply(lambda x: cred_hash[x] if x in cred_hash else x,)df["nppes_provider_gender"] = df["nppes_provider_gender"].fillna("Unknown")df["nppes_provider_gender"] = df["nppes_provider_gender"].apply(lambda x: x if x in VALID_GEN else "Other",)df["nppes_entity_code"] = df["nppes_entity_code"].apply(lambda x: x if x in VALID_ENTITIES else "Unknown",)df["nppes_provider_zip5"] = df["nppes_provider_zip5"].fillna("Unknown")df["nppes_provider_country"] = df["nppes_provider_country"].apply(lambda x: "US" if x == "US" else "Other",)df["description_flag"] = df["description_flag"].apply(lambda x: x if x in VALID_DESC_FLAGS else "Unknown",)df["medicare_prvdr_enroll_status"] = df["medicare_prvdr_enroll_status"].apply(lambda x: x if x in VALID_ENROLLS else "Unknown",)# Treat missing beneficiary count as it cannot be zero df["bene_count"] = df["bene_count"].apply(convert_to_int).fillna(-1)tmp_df = df.groupby("specialty_description", as_index=False,)["bene_count"].mean()bene_count_hash = dict(zip(tmp_df["specialty_description"],tmp_df["bene_count"],))df["bene_count"] = df.apply(lambda x: x["bene_count"] if x["bene_count"] > 0 else bene_count_hash[x["specialty_description"]],axis=1,)# Treat continuous variables for item in CON_VARS:df[item] = df[item].apply(convert_to_float).fillna(0.0)# Filter out invalid states df = df[~df["nppes_provider_state"].isin(["XX", "E", "N", "S"])]# Output df.to_csv(OUT_FILE, index=False)
Generate features
We then generate features. The categorical features are encoded using the average value of a few important variables in each category.
In [3]:
import pandas as pd# Input INP_FILE = "cleaned_medicare_data.csv"OUT_FILE = "medicare_features.csv"# Set some parameters CAT_VARS = ["nppes_provider_mi",#"nppes_credentials", # This is rather messy, so ignoring it. "nppes_provider_gender","nppes_entity_code","nppes_provider_city","nppes_provider_zip5",#"nppes_provider_country", # Almost all cases are US "specialty_description","medicare_prvdr_enroll_status","nppes_provider_state",]CON_VARS = ["total_claim_count","total_30_day_fill_count","total_drug_cost","total_day_supply","bene_count","total_claim_count_ge65","total_30_day_fill_count_ge65","total_drug_cost_ge65","total_day_supply_ge65","bene_count_ge65","brand_claim_count","brand_drug_cost","generic_claim_count","generic_drug_cost","other_claim_count","other_drug_cost","mapd_claim_count","mapd_drug_cost","pdp_claim_count","pdp_drug_cost","lis_claim_count","lis_drug_cost","nonlis_claim_count","nonlis_drug_cost","opioid_claim_count","opioid_drug_cost","opioid_day_supply","opioid_bene_count","antibiotic_claim_count","antibiotic_drug_cost","antibiotic_bene_count","hrm_claim_count_ge65","hrm_drug_cost_ge65","hrm_bene_count_ge65","antipsych_claim_count_ge65","antipsych_drug_cost_ge65","antipsych_bene_count_ge65","average_age_of_beneficiaries","beneficiary_age_less_65_count","beneficiary_age_65_74_count","beneficiary_age_75_84_count","beneficiary_age_greater_84_count","beneficiary_female_count","beneficiary_male_count","beneficiary_race_white_count","beneficiary_race_black_count","beneficiary_race_asian_pi_count","beneficiary_race_hispanic_count","beneficiary_race_nat_ind_count","beneficiary_race_other_count","beneficiary_nondual_count","beneficiary_dual_count","beneficiary_average_risk_score",]# Read and clean data df = pd.read_csv(INP_FILE, low_memory=False)# Embed categorical features embedded_cat_features = []for item in CAT_VARS:tmp_df = df.groupby(item, as_index=False).agg({"opioid_claim_count": "mean","opioid_drug_cost": "mean","opioid_day_supply": "mean","opioid_bene_count": "mean","opioid_prescriber_rate": "mean",}).rename(columns={"opioid_claim_count": "%s_opioid_claim_count" % item,"opioid_drug_cost": "%s_opioid_drug_cost" % item,"opioid_day_supply": "%s_opioid_day_supply" % item,"opioid_bene_count": "%s_opioid_bene_count" % item,"opioid_prescriber_rate": "%s_opioid_prescriber_rate" % item,})df = df.merge(tmp_df, how="left", on=item)embedded_cat_features += ["%s_opioid_claim_count" % item,"%s_opioid_drug_cost" % item,"%s_opioid_day_supply" % item,"%s_opioid_bene_count" % item,"%s_opioid_prescriber_rate" % item,]# Drop unembedded categorical variables and some others df = df[["npi"] + CON_VARS + embedded_cat_features]# Write features file df.to_csv(OUT_FILE, index=False)
Feature Selection
Once the features are generated, we can implement the above-mentioned feature selection algorithm. We start by importing some libraries, setting some parameters, and loading the features into a Pandas dataframe.
In [1]:
# Import libsimport sysimport osimport timeimport numpy as npimport pandas as pdfrom qci_client import QciClient# Define some parametersFEATURES_FILE = "medicare_features.csv"REDUCED_DIM = 10# Read featuresdf = pd.read_csv(FEATURES_FILE, low_memory=False)
We now print the feature names and get the total count of features in the dataset,
In [2]:
feature_names = list(set(df.columns) - {"npi"})orig_dim = len(feature_names)print("Original dimension is %d; reduced dimension will be %d" % (orig_dim,REDUCED_DIM,))
Out [ ]:
Original dimension is 93; reduced dimension will be 10
We should now create the objective matrix ,
In [14]:
# Generate the objective matrixX = np.array(df[feature_names])C = abs(np.corrcoef(X, rowvar=False))# Make correlation symmetric to machine precisionC = 0.5 * (C + C.transpose())objective = C - np.eye(orig_dim)objective = np.array(objective, dtype=np.float32)
And create the constraint matrix,
In [15]:
# Generate the constraintcons_lhs = np.ones(shape=(orig_dim), dtype=np.float32)cons_lhs = cons_lhscons_rhs = np.array([-REDUCED_DIM])constraints = np.hstack([cons_lhs, cons_rhs])print(constraints.shape)
Out [ ]:
(94,)
We now solve the above quadratic binary problem using QCi's Dirac-1,
In [16]:
# Create json objectsobjective_json = {"file_name": "objective_tutorial.json","file_config": {"objective": {"data": objective, "num_variables": orig_dim},} }constraint_json = {"file_name": "constraints_tutorial.json","file_config": {"constraints": {"data": constraints, #"num_variables": orig_dim,#"num_constraints": 1,}}}# Solve the optimizzation problemqci = QciClient()objective_file_id = qci.upload_file(objective_json)["file_id"]constraint_file_id = qci.upload_file(constraint_json)["file_id"]# Setup job jsonjob_params = {"sampler_type": "dirac-1", "alpha": 5.0, "nsamples": 20,}body = qci.build_job_body(job_type="sample-constraint", job_params=job_params,constraints_file_id=constraint_file_id, objective_file_id=objective_file_id,job_name=f"tutorial_eqc1",job_tags=["tutorial_eqc1"],)# Run the jobjob_response_json = qci.process_job(job_type="sample-constraint", job_body=body)print(job_response_json)results = list(job_response_json["results"]["file_config"].values())[0]energies = results["energies"]samples = results["solutions"]is_feasibles = results["feasibilities"]if True:print("Energies:", energies) # Pick a feasible solution with lowest energy # The sample solutions are sorted by energy sol = Nonefor i, item in enumerate(samples):sol = itemis_feasible = is_feasibles[i]if is_feasible:breakif not is_feasible:print("Solution is not feasible!")assert sol is not None, "No feasible solution found!"assert len(sol) == orig_dim, "Inconsistent solution size!"assert sum(sol) == REDUCED_DIM, "Solution is not feasible!"
Out [ ]:
Dirac allocation balance = 0 s (unmetered) Job submitted job_id='65f330d79e39850da64a4143'-: 2024/03/14 10:16:07 RUNNING: 2024/03/14 10:16:08 COMPLETED: 2024/03/14 10:20:49 Dirac allocation balance = 0 s (unmetered) {'job_info': {'job_id': '65f330d79e39850da64a4143', 'job_submission': {'job_name': 'tutorial_eqc1', 'job_tags': ['tutorial_eqc1'], 'problem_config': {'quadratic_linearly_constrained_binary_optimization': {'constraints_file_id': '65f330d638d25ec78cae744b', 'objective_file_id': '65f330d638d25ec78cae7449', 'alpha': 5, 'atol': 1e-10}}, 'device_config': {'dirac-1': {'num_samples': 20}}}, 'job_status': {'submitted_at_rfc3339nano': '2024-03-14T17:16:07.37Z', 'queued_at_rfc3339nano': '2024-03-14T17:16:07.37Z', 'running_at_rfc3339nano': '2024-03-14T17:16:08.384Z', 'completed_at_rfc3339nano': '2024-03-14T17:20:49.229Z'}, 'job_result': {'file_id': '65f331f138d25ec78cae744e', 'device_usage_s': 39}, 'details': {'status': 'COMPLETED'}}, 'results': {'file_id': '65f331f138d25ec78cae744e', 'num_parts': 1, 'num_bytes': 14228, 'file_name': 'tutorial_eqc1', 'file_config': {'quadratic_linearly_constrained_binary_optimization_results': {'counts': [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'energies': [-497.892578125, -497.892578125, -497.888671875, -497.888671875, -497.8837890625, -497.876953125, -497.85546875, -497.85546875, -497.8515625, -497.849609375, -497.8486328125, -497.8466796875, -497.84375, -497.8349609375, -497.833984375, -497.826171875, -497.7958984375, -497.7802734375], 'feasibilities': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True], 'objective_values': [2.107582171214744, 2.107582171214744, 2.111151851946488, 2.1112560788169503, 2.1162998459767546, 2.122956631937995, 2.144297275925055, 2.144297275925055, 2.148616509046405, 2.1505788440699685, 2.1509259993908922, 2.1532203237293284, 2.1561004637042056, 2.16536722460296, 2.165533141582273, 2.173614846891723, 2.204232932417653, 2.219193771714344], 'solutions': [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]}}}} Energies: [-497.892578125, -497.892578125, -497.888671875, -497.888671875, -497.8837890625, -497.876953125, -497.85546875, -497.85546875, -497.8515625, -497.849609375, -497.8486328125, -497.8466796875, -497.84375, -497.8349609375, -497.833984375, -497.826171875, -497.7958984375, -497.7802734375]
Finally, we can print the list of selected variables,
In [17]:
selected_vars = []for i in range(orig_dim):if sol[i] > 0:selected_vars.append(feature_names[i])print(selected_vars)
Out [ ]:
['specialty_description_opioid_drug_cost', 'beneficiary_race_black_count', 'beneficiary_race_asian_pi_count', 'nppes_entity_code_opioid_day_supply', 'nppes_provider_city_opioid_drug_cost', 'beneficiary_race_nat_ind_count', 'beneficiary_average_risk_score', 'nppes_provider_gender_opioid_claim_count', 'nppes_provider_mi_opioid_drug_cost', 'antipsych_drug_cost_ge65']