
I am trying to select only 2 columns from a csv file: Body and CreatedDate. CreatedDate looks like this: 2018-08-07T12:36:11.000Z. Body is just text describing the work being done. Some Body cells are empty, so I only want the rows where Body has data in it.

I have tried using the code below to get only the 2 desired columns:

import pandas as pd
df = pd.read_csv("file.csv")
df1 = df['CreatedDate'].map(str) + ' ' + df['Body'].map(str)
print(df1)

I am getting the entire df printed twice. I see this:

[10 rows x 15 columns] & [15 rows x 10 columns]

at the bottom of each print. I am expecting to only see my 2 chosen columns. Why am I seeing all of df twice on the console?

Das_Geek
Dave

1 Answer


There are many options for indexing a dataframe. In this case, .loc with a boolean mask for the rows and a list of column names does both the filtering and the column selection on a single line.

import pandas as pd
# read the csv into df
df = pd.read_csv("file.csv")
# take only the rows where 'Body' has a value and only columns ['Body', 'CreatedDate']
df = df.loc[df['Body'].notnull(), ['Body', 'CreatedDate']]
print(df)

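You may also want to read up on pandas.DataFrame.dropna, which drops rows that contain missing values and can be limited to specific columns through its subset argument.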
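For comparison, here is a minimal sketch of the same selection written with dropna. The in-memory CSV text, the extra "Other" column and the row values are made up for illustration and stand in for file.csv; only the Body and CreatedDate column names come from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for file.csv; the "Other" column and all row values are invented.
csv_text = """CreatedDate,Body,Other
2018-08-07T12:36:11.000Z,Replaced filter,a
2018-08-08T09:00:00.000Z,,b
2018-08-09T15:45:30.000Z,Checked wiring,c
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the two columns of interest, then drop the rows where 'Body' is missing.
result = df[["Body", "CreatedDate"]].dropna(subset=["Body"])
print(result)
```

This treats a cell as empty only when pandas reads it as NaN. A Body cell that holds just spaces would still count as having data, so it may need stripping first if that occurs in the real file.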

pnovotnyq