How to get only the columns I want to move to new dataframe with this code?

Question

I am trying to select only 2 columns from a csv file: Body and CreatedDate. CreatedDate looks like this: 2018-08-07T12:36:11.000Z. Body is just text describing the work being done. Some Body cells are empty, so I only want the rows that have data in them.

I have tried using the code below to just get only the 2 desired columns:

import pandas as pd
df = pd.read_csv("file.csv")
df1= df['CreatedDate'].map(str) + ' ' + df['Body'].map(str)
print(df1)

I am getting the entire df printed twice. I see this:

[10 rows x 15 columns] & [15 rows x 10 columns]

at the bottom of each print. I am expecting to only see my 2 chosen columns. Why am I seeing all of df twice on the console?

score 0 · Accepted Answer · answered Jun 12 '19 at 20:25

There are many options for indexing a dataframe. This particular selection can be done on a single line with .loc, which takes a boolean row mask and a list of column names.

import pandas as pd
# read the csv into df
df = pd.read_csv("file.csv")
# take only the rows where 'Body' has a value and only columns ['Body', 'CreatedDate']
df = df.loc[df['Body'].notnull(), ['Body', 'CreatedDate']]
print(df)

You may also want to read up on pandas.DataFrame.dropna.
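
As a rough sketch of the dropna route, the same result can be had by selecting the two columns first and then dropping rows where Body is missing. The inline CSV text below is made-up sample data standing in for file.csv (the Id column and the Body values are assumptions, not from the original file), and this only catches cells that pandas reads as NaN, not cells containing blank strings.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv; the real file has more columns.
csv_text = """Id,CreatedDate,Body
1,2018-08-07T12:36:11.000Z,Replaced filter
2,2018-08-08T09:10:00.000Z,
3,2018-08-09T15:45:30.000Z,Checked wiring
"""

df = pd.read_csv(io.StringIO(csv_text))

# Select the two columns of interest, then drop rows where 'Body' is missing (NaN).
result = df[['Body', 'CreatedDate']].dropna(subset=['Body'])
print(result)
```

Passing subset=['Body'] keeps rows that are missing only CreatedDate; leave subset out if a row should be dropped when either column is empty.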

pnovotnyq

That did it! Thanks! – Dave Jun 12 '19 at 20:33
