3
import csv

f = csv.reader(open('lmt.csv','r'))         # open input file for reading
Date, Open, Hihh, mLow, Close, Volume = zip(*f)    #s plit it into separate columns

ofile = open("MYFILEnew1.csv", "wb")            # output csv file
c = csv.writer(ofile)

item = Date   
item2 = Volume
rows = zip(item, item)
i = 0
for row in item2:
    print row
    writer = csv.writer(ofile, delimiter='\t')
    writer.writerow([row])

ofile.close() 

Above is what I have produced so far.

As you can see in the 3rd line, I have extracted 6 columns from a spreadsheet. I want to create a .csv file under the name of MYFILEnew1.csv which only has two columns, Date and Volume.

What I have above creates a .csv that only writes Volume column into the first column of the new .csv file. How would you go about placing Date into the second column?

For example

Date       Open   High      Low     Close    Volume
17-Feb-16   210   212.97    209.1   212.74  1237731

is what i have. and Id like to produce a new csv file such that it has

Date           Volume
17-Feb-16      1237731
stratofortress
  • 453
  • 1
  • 9
  • 21

3 Answers3

3

If I understand you question correctly, you can achieve that very easily using panda's read_csv and to_csv (@downvoter: Could you explain your downvote, please!?); the final solution to your problem can be found below EDIT2:

import pandas as pd

# this assumes that your file is comma separated
# if it is e.g. tab separated you should use pd.read_csv('data.csv', sep = '\t')
df = pd.read_csv('data.csv')

# select desired columns
df = df[['Date', 'Volume']]

#write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)

So, if your data.csv file looks like this:

Date,Open,Hihh,mLow,Close,Volume
1,5,9,13,17,21
2,6,10,14,18,22
3,7,11,15,19,23
4,8,12,16,20,24

The the MYFILEnew1.csv would look like this after running the script above:

Date    Volume
1   21
2   22
3   23
4   24

EDIT

Using your data (tab separated, stored in the file data3.csv):

Date    Open    Hihh    mLow    Close   Volume
17-Feb-16   210 212.97  209.1   212.74  1237731

Then

import pandas as pd

df = pd.read_csv('data3.csv', sep='\t') 

# select desired columns
df = df[['Date', 'Volume']]

# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)

gives the desired output

Date    Volume
17-Feb-16   1237731

EDIT2

Since your header in your input csv file seems to be messed up (as discussed in the comments), you have to rename the first column. The following now works fine for me using your entire dataset:

import pandas as pd

df = pd.read_csv('lmt.csv', sep=',') 

# get rid of the wrongly formatted column name
df.rename(columns={df.columns[0]: 'Date' }, inplace=True)

# select desired columns
df = df[['Date', 'Volume']]

# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
Cleb
  • 25,102
  • 20
  • 116
  • 151
  • i implemented your script above but i received the following error message "['Date'] not in index" – stratofortress Feb 20 '16 at 19:45
  • Is your input csv file tab or comma separated? If it is tab separated, you need `sep='\t'` as additional optin in the `read_csv` command. But it should work fine with the example csv I provided. How does `df` look like after reading in the file? – Cleb Feb 20 '16 at 19:47
  • csv i guess it means its comma seperated? (sorry im not good with this kindof stuff..) – stratofortress Feb 20 '16 at 19:49
  • Usually yes (Comma Separated Values = csv). You could print `df` after reading in the the file and see how it looks like. I edit my question to account for the tab separated option. – Cleb Feb 20 '16 at 19:51
  • "print df" after "df = pd.read_csv('lmt.csv')" prints out all columns nicely but i think its "df = df[['Date', 'Volume']]" that fails – stratofortress Feb 20 '16 at 19:53
  • Could you print `df.columns`? This should be an array that contains your column headers (and you should see `Date` among those). – Cleb Feb 20 '16 at 19:55
  • Index([u'Date', u'Open', u'High', u'Low', u'Close', u'Volume'], dtype='object') – stratofortress Feb 20 '16 at 19:57
  • i noticed that if i do "df = df['Volume']" it works perfectly, but if i do df = df['Date'] i get an error – stratofortress Feb 20 '16 at 19:58
  • Ok, columns look like intended, the the above solution should work fine. I quickly try with your data. – Cleb Feb 20 '16 at 19:59
  • ['Date', 'Volume'] if i do this, it just puts the two columns in the same column of the new csv file actually... – stratofortress Feb 20 '16 at 20:01
  • I edited my answer again. It works fine for me. So my guess is that the problem is due to the separator in your input file. You should try tab, comma and white space. You can also use `sep=None` then it will automatically try to determine the separator. – Cleb Feb 20 '16 at 20:11
  • https://drive.google.com/a/cornell.edu/file/d/0B-oiy8SOP3pQUkoxZmM2MGVBZnM/view?usp=sharing – stratofortress Feb 20 '16 at 20:16
  • above is the file if you wanna test it out. it doesnt work for me – stratofortress Feb 20 '16 at 20:16
  • did you send a link to your file? i dont see any – stratofortress Feb 20 '16 at 20:21
  • https://drive.google.com/open?id=0B6o7Uh6laLVAVm9vMEtVdU9fdlE does this work now? – stratofortress Feb 20 '16 at 20:22
  • http://finance.yahoo.com/q/hp?s=LMT scroll to the bottom and download the spreadsheet – stratofortress Feb 20 '16 at 20:25
  • Ok, I detected the problem: When you print `df.columns[0]` it returns `\xef\xbb\xbfDate` instead of `Date`. That is some weird formatting issue in the csv; I'll try to fix that... – Cleb Feb 20 '16 at 20:27
  • also, if i do df = df[['Low', 'Volume']], it puts both columns in the same column of the new file.. – stratofortress Feb 20 '16 at 20:28
  • also, i was gonna calculate the number of days between two given dates, using "datetime" module, i only know how to do it when the format of dates is "%m/%d/%Y" do you know how i can implement "datetime" module for this csv file? – stratofortress Feb 20 '16 at 20:36
  • i will be gone for the next few hours. thank you so much for your help so far. – stratofortress Feb 20 '16 at 20:38
  • it does correctly pick out the two columns but they get written in the same column of the new csv file... – stratofortress Feb 20 '16 at 20:44
  • How do you open your csv? Do you specify that the values are tab-separated when you open it? If not, you should do this (usually you can choose the separator when you double-click to document to open it). – Cleb Feb 20 '16 at 20:46
  • i open it with "df = pd.read_csv('lmt.csv', sep=',') " (if this is what you are asking me..) – stratofortress Feb 20 '16 at 20:48
  • i changed your last line from "df.to_csv('MYFILEnew10.csv', sep='\t', index=False)" to df.to_csv('MYFILEnew10.csv', sep=',', index=False) this seems to have solved the issue thanks! – stratofortress Feb 20 '16 at 20:49
  • OK, the you open it with the comma separator; in your question you use the tab separator that's why I chose it as well. But glad it is solved now! Then please accept the answer by clicking on the small check next to the answer which then turns green. – Cleb Feb 20 '16 at 20:51
  • sorry for accepting the answer so late. i was gone for a few hours and i didnt know about the check mark. Thank you so much! – stratofortress Feb 21 '16 at 03:43
  • No problem. I also answered your second question about subtracting dates. :) – Cleb Feb 21 '16 at 09:35
2

Here I would suggest using the csv module's csv.DictReader object to read and write from the files. To read the file, you would do something like

import csv
fieldnames=('Date', 'Open', 'High', 'mLow', 'Close', 'Volume')
with open('myfilename.csv') as f:
    reader = csv.DictReader(f, fieldnames=fieldnames)

Beyond this, you will just need to filter out the keys you don't want from each row and similarly use the csv.DictWriter class to write to your export file.

cderwin
  • 425
  • 4
  • 11
2

You were so close:

import csv

f = csv.reader(open('lmt.csv','rb')) # csv is binary
Date, Open, Hihh, mLow, Close, Volume = zip(*f)

rows = zip(Date, Volume)

ofile = open("MYFILEnew1.csv", "wb")  
writer = csv.writer(ofile)
for row in rows:
    writer.writerow(row) # row is already a tuple so no need to make it a list

ofile.close()
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223