First post in the community (congrats or "I am sorry" are in order :-)). I have provided some code below for survey data I am trying to analyze. I am trying to capture the rows that have the value 1 in any of the columns listed in the filter. The values were read as floats; I converted them to integers and the filter did not work. I also tried putting the value in quotes, and that did not work either. Any advice?

# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import json
from pprint import pprint
import requests
import time
from scipy import stats
import seaborn as sn
%matplotlib inline
    
# Read csv
us_path = "us_Data.csv"
us_responses = pd.read_csv(us_path)

# Created filtered data frame.    
preexisting_us = us_responses

# Filter data.
preexisting_us = us_responses[us_responses["diabetes"] == "1" | us_responses(us_responses["cardiovascular_disorders"] == "1") | us_responses(us_responses["obesity"] == "1") | us_responses(us_responses["respiratory_infections"] == "1") | us_responses(us_responses["respiratory_disorders_exam"] == "1") | us_responses(us_responses["gastrointestinal_disorders"] == "1") | us_responses(us_responses["chronic_kidney_disease"] == "1") | us_responses(us_responses["autoimmune_disease"] == "1") | us_responses(us_responses["chronic_fatigue_syndrome_a"] == "1")]
nladkins
  • I am sorry… that you did not read [this](https://stackoverflow.com/help/minimal-reproducible-example) and [that](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) :p please make your question more explicit and welcome to SO ;) – mozway Oct 24 '21 at 17:30
  • @mozway - Can you be more explicit? What is missing? Feel free to pass on to other posts if this doesn't help you. – nladkins Oct 24 '21 at 17:39
  • If you ask the question, I guess you haven't read the links above ;) – mozway Oct 24 '21 at 18:02

1 Answer

First of all, you should probably define your new DataFrame as a copy of the original one, e.g. df = us_responses.copy(). This way you can be sure that the original DataFrame will not be modified (I suggest having a look at the documentation).
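To make the difference concrete, here is a minimal sketch on toy data (the frame and its values are made up for illustration, not taken from the survey file). Plain assignment only gives the same object a second name, while .copy() creates an independent DataFrame:

```python
import pandas as pd

# Toy stand-in for the survey data (hypothetical values).
original = pd.DataFrame({"diabetes": [1.0, 0.0], "obesity": [0.0, 0.0]})

alias = original               # plain assignment: same object, nothing is copied
independent = original.copy()  # separate object with its own data

# Writing to the copy leaves the original untouched...
independent.loc[0, "diabetes"] = 99.0

# ...while writing through the alias changes the original as well.
alias.loc[1, "obesity"] = 1.0

print(alias is original)                # the alias is the very same object
print(original.loc[0, "diabetes"])      # unchanged by the write to the copy
print(original.loc[1, "obesity"])       # changed by the write through the alias
```

In the question, preexisting_us = us_responses is the alias case, so any later in-place change to one name would show up under the other.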

Now, to filter the DataFrame, there are simpler approaches than the one in your code. For example:

cols_to_check = ['diabetes', 'cardiovascular_disorders', ... ]
df_filtered = df.loc[df[cols_to_check].sum(axis=1) > 0, :]

This sums the selected columns row by row. Assuming those columns hold only 0 and 1, the sum is greater than 0 whenever at least one of them has the value 1, and the corresponding row is kept in the filtered DataFrame.
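If the columns might contain values other than 0 and 1, a variant that tests for the value 1 explicitly may be safer than summing. The sketch below is not part of the original answer; it uses a made-up two-column frame plus a hypothetical "age" column, and relies on DataFrame.eq and DataFrame.any:

```python
import pandas as pd

# Toy data (hypothetical): two condition columns and one unrelated column.
df = pd.DataFrame({
    "diabetes": [1.0, 0.0, 0.0, None],
    "obesity": [0.0, 0.0, 1.0, None],
    "age": [34, 51, 47, 29],
})

cols_to_check = ["diabetes", "obesity"]

# True for every row where at least one selected column equals 1.
# Missing values compare as False, so rows with only NaN are dropped.
mask = df[cols_to_check].eq(1).any(axis=1)

df_filtered = df.loc[mask]
print(df_filtered)
```

Here rows 0 and 2 are kept. Because the test is an equality check, it matches both the float 1.0 and the integer 1, so no dtype conversion is needed first.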

However, if you really want to keep your code the way it is (which I would not suggest), you need to make some corrections: compare against the number 1 rather than the string "1", index the DataFrame with square brackets instead of calling it like a function, and wrap every comparison in its own parentheses, because | binds more tightly than ==:

preexisting_us = preexisting_us[(preexisting_us["diabetes"] == 1) | (preexisting_us["cardiovascular_disorders"] == 1) | (preexisting_us["obesity"] == 1) | (preexisting_us["respiratory_infections"] == 1) | (preexisting_us["respiratory_disorders_exam"] == 1) | (preexisting_us["gastrointestinal_disorders"] == 1) | (preexisting_us["chronic_kidney_disease"] == 1) | (preexisting_us["autoimmune_disease"] == 1) | (preexisting_us["chronic_fatigue_syndrome_a"] == 1)]
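Two details decide whether a filter of this shape works: the type of the value being compared, and operator precedence. The following sketch illustrates both on made-up data with only two of the columns; it is an illustration, not the original survey file:

```python
import pandas as pd

# Toy data (hypothetical): float columns, as described in the question.
df = pd.DataFrame({"diabetes": [1.0, 0.0, 0.0], "obesity": [0.0, 0.0, 1.0]})

# Comparing a float column with the string "1" matches no rows at all.
string_matches = int((df["diabetes"] == "1").sum())

# `|` binds more tightly than `==`; plain integers show the effect.
unparenthesized = (2 == 2 | 1)   # parsed as 2 == (2 | 1), i.e. 2 == 3
parenthesized = ((2 == 2) | 1)   # parsed as True | 1

# With numeric comparisons, each in its own parentheses, the filter works.
mask = (df["diabetes"] == 1) | (df["obesity"] == 1)
matching_rows = df[mask].index.tolist()

print(string_matches, unparenthesized, parenthesized, matching_rows)
```

The same precedence rule applies to every term of the long filter, including the first one, which is why each comparison needs its own parentheses.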

If you are interested in more info about filtering with .loc, see the pandas documentation for DataFrame.loc.

Please follow @mozway's suggestions for posting clearer questions next time.

Alessandro