I'm not able to drop the rows with no values in my dataframe. Why is that? I have tried many ways already and none of them work. My dataframe came from a JSON file, and my "priority" column has many empty values. I want to drop the rows that have no value, or at least fill them with something, so I can do some training.

I tried:

raw_df = pd.read_json("pd_calls.json")
df = raw_df.rename(columns=raw_df.iloc[0]).drop(0)

df.dropna()
feature_cols = ['date_time','FullAddress','call_type','priority','lat','long']
x = df[feature_cols]

x.dropna()
x = x[pd.notnull(x['priority'])]
df = df.dropna(how='any') 
df

It is still displaying the rows with no values for "priority".

In my JSON data set, priority is the "L" key. It should be from 1 to 4; the empty "L" values are " ".

{"A": "P17070009161", "B": "7/6/2017 8:35", "C": "5", "D": "36200", "E": "", "F": "BLADEN AV MURRIETTA", "G": "", "H": "36200 BLADEN AV MURRIETTA, San Diego, CA", "I": "1182", "J": "CAN", "K": "999", "L": "2", "M": "33.5569955", "N": "-117.1839502"},
{"A": "P17070042146", "B": "7/25/2017 10:58", "C": "3", "D": "42100", "E": "", "F": "ALEXANDRA DR MURRIETTA", "G": "", "H": "42100 ALEXANDRA DR MURRIETTA, San Diego, CA", "I": "WARRANT", "J": "W", "K": "999", "L": "3", "M": "32.903892", "N": "-117.219352"},
{"A": "P17040037291", "B": "4/22/2017 16:35", "C": "7", "D": "42100", "E": "", "F": "ORANGE BOSSOM TEMECULA", "G": "", "H": "42100 ORANGE BOSSOM TEMECULA, San Diego, CA", "I": "AU1", "J": "W", "K": "999", "L": "1", "M": "33.5121386", "N": "-117.1350924"},
{"A": "P17050008176", "B": "5/5/2017 19:57", "C": "6", "D": "59800", "E": "", "F": "ALBEMARLE", "G": "ST", "H": "59800 ALBEMARLE ST, San Diego, CA", "I": "MCTSTP", "J": "O", "K": "438", "L": "1", "M": "32.676946", "N": "-117.0652657"},
{"A": "P17020026861", "B": "2/16/2017 12:08", "C": "5", "D": "60300", "E": "", "F": "AVE DE LAS VISTAS", "G": "", "H": "60300 AVE DE LAS VISTAS, San Diego, CA", "I": "MCTSTP", "J": "O", "K": "725", "L": "1", "M": "32.5822973", "N": "-117.0092733"},
{"A": "P17010005533", "B": "1/4/2017 13:57", "C": "4", "D": "75700", "E": "", "F": "SARANAC", "G": "", "H": "75700 SARANAC, San Diego, CA", "I": "FU", "J": "K", "K": "326", "L": "2", "M": "32.7707775", "N": "-117.0492753"},
{"A": "P17050025037", "B": "5/15/2017 16:10", "C": "2", "D": "104400", "E": "", "F": "BLACK MOUNTAIN", "G": "RD", "H": "104400 BLACK MOUNTAIN RD, San Diego, CA", "I": "REPORT", "J": "W", "K": "242", "L": "", "M": "32.9390205", "N": "-117.1284305"},
{"A": "P17070034017", "B": "7/20/2017 14:46", "C": "5", "D": "104400", "E": "", "F": "BLACK MOUNTAIN", "G": "RD", "H": "104400 BLACK MOUNTAIN RD, San Diego, CA", "I": "REPORT", "J": "W", "K": "242", "L": "", "M": "32.9390205", "N": "-117.1284305"},
{"A": "P17080000540", "B": "8/1/2017 9:55", "C": "3", "D": "123300", "E": "", "F": "SALMON RIVER", "G": "RD", "H": "123300 SALMON RIVER RD, San Diego, CA", "I": "REPORT", "J": "W", "K": "233", "L": "", "M": "32.9540226", "N": "-117.1207721"}
]

In pandas, the dataframe looks like this:

    date_time        FullAddress                call_type  priority  lat         long
1   6/14/2017 21:54  10 14TH ST, San Diego, CA  1151       2         32.7054489  -117.1518696
2   3/29/2017 22:24  10 14TH ST, San Diego, CA  1016       2         32.7054489  -117.1518696
3   6/3/2017 18:04   10 14TH ST, San Diego, CA  1016       2         32.7054489  -117.1518696
4   3/17/2017 10:57  10 14TH ST, San Diego, CA  1151       2         32.7054489  -117.1518696
5   3/3/2017 23:45   10 15TH ST, San Diego, CA  911P       2         32.7057215  -117.1503498
6   2/10/2017 8:23   10 15TH ST, San Diego, CA  AU2        2         32.7057215  -117.1503498
7   4/11/2017 4:57   10 15TH ST, San Diego, CA  5150       2         32.7057215  -117.1503498
8   3/28/2017 6:30   10 15TH ST, San Diego, CA  1146                 32.7057215  -117.1503498
9   6/22/2017 10:19  10 15TH ST, San Diego, CA  242        1         32.7057215  -117.1503498
10  6/5/2017 19:27   10 15TH ST, San Diego, CA  5150       2         32.7057215  -117.1503498
11  6/28/2017 11:51  10 15TH ST, San Diego, CA  415        2         32.7057215  -117.1503498
12  6/28/2017 12:28  10 15TH ST, San Diego, CA  911P       2         32.7057215  -117.1503498
13  7/7/2017 21:59   10 15TH ST, San Diego, CA  647F       2         32.7057215  -117.1503498
14  8/11/2017 7:07   10 15TH ST, San Diego, CA  1140OD               32.7057215  -117.1503498
David Arriaga
  • `df.dropna()` does nothing unless you assign the result back, like `df = df.dropna()`. Same for `x.dropna()`. – anky Mar 27 '19 at 15:41
  • Just tried it and it still does not drop the rows. – David Arriaga Mar 27 '19 at 15:49
  • You can create a reproducible example. Take a [look](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and then inform jez :) – anky Mar 27 '19 at 15:51
  • What do you mean? – David Arriaga Mar 27 '19 at 15:52
  • @jezrael hey, what's your problem? It's not a duplicate. – David Arriaga Mar 27 '19 at 16:05
  • Are you sure the empty values are in fact nulls, and not whitespace or something like that? Like the person above said, if you could provide the actual dataframe or a way to reproduce it, we could work with it. Can you share the JSON file? – chitown88 Mar 27 '19 at 16:11
  • @chitown88 just did. This is how the empty values look in my JSON: "L": " ". The non-empty ones look like "L":"1". – David Arriaga Mar 27 '19 at 16:27
  • Do this with your pandas df: `df = df[df['priority'] != '']`. Let me know if that works. I also agree, this shouldn't be flagged as a duplicate. – chitown88 Mar 27 '19 at 16:29
  • @chitown88 wow bro, it worked, thank you. Can you explain it? Is it "if not equal to ' '"? – David Arriaga Mar 27 '19 at 16:34
  • That line filters your dataframe. It says: keep only the rows where the `priority` column is NOT EQUAL to `""`. Basically, pandas recognizes those values as empty strings, not as nulls. So there are no nulls to drop when using `.dropna`, but there are empty strings. You can also read a little bit [here](https://stackoverflow.com/questions/29314033/python-pandas-dataframe-remove-empty-cells). – chitown88 Mar 27 '19 at 16:37
  • If you `print(df['priority'] != "")`, it'll show True or False for every row: True if the value is not equal to an empty string. Then we're saying keep all the True rows with `df[df['priority'] != ""]`. – chitown88 Mar 27 '19 at 16:39
  • The other option is to replace all empty strings in your dataframe with `np.nan`. Then you could use `dropna`. Either way works. – chitown88 Mar 27 '19 at 16:41
  • Damn, thanks man, now it makes sense. Thank you, I wish I could upvote your answer. – David Arriaga Mar 27 '19 at 17:03
  • So sorry, added the correct dupe. – jezrael Mar 27 '19 at 17:15
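
The two approaches from the comments can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual data: the small frame is a made-up sample, and the `.str.strip()` call and the `^\s*$` regex are assumptions added here so that whitespace-only values such as `" "` are caught as well as `""`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: missing priorities arrive
# as empty or whitespace-only strings, not as NaN.
df = pd.DataFrame(
    {
        "call_type": ["1151", "1146", "242", "1140OD", "415"],
        "priority": ["2", "", "1", " ", "2"],
    }
)

# dropna() finds nothing to drop, because "" and " " are ordinary strings.
unchanged = df.dropna()

# Option 1: boolean mask. strip() makes whitespace-only values count as empty.
mask = df["priority"].str.strip() != ""
filtered = df[mask]

# Option 2: turn blank strings into NaN, then dropna() has something to drop.
# Both replace() and dropna() return new frames, so the results are assigned.
as_nan = df.replace(r"^\s*$", np.nan, regex=True)
dropped = as_nan.dropna(subset=["priority"])

print(len(unchanged), len(filtered), len(dropped))
```

Option 2 is probably the handier one if the goal is to fill the gaps instead of dropping them, since `fillna` can then be used on the same column.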

0 Answers