I am having trouble writing the whitespace-stripped stop_headsign values back into stop_times before exporting to CSV. Alternatively, is there a way to apply .rstrip() to the entire stop_headsign column at once?

Here is the stop_times.txt Gist.

Here is the pandas rstrip reference.

Below is my code:

import pandas as pd

stop_times = pd.read_csv('csv/stop_times.txt')

for x in stop_times['stop_headsign']:
    if type(x) == str:
        x = x.rstrip()
        # figure out how to pass store new value
    if type(x) == float:
        pass

stop_times['distance'] = 0

stop_times.to_csv('csv/stop_times.csv', index=False)

Below is what the csv output shows:

trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,stop_headsign,distance
568036,,,00382,26,0,0,78 UO                                             ,0
568036,,,00396,7,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00398,8,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00400,9,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00404,10,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00407,11,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00412,13,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00413,14,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00416,15,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00418,16,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00419,17,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00422,18,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00423,19,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00425,20,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00427,21,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,01006,2,0,0,78 UO <> 78 via 18th AVE                          ,0
  • Where is your "write to CSV file" code? In the loop as posted, you only assign the stripped value to x and never store it anywhere. – Phuc Tran Dec 17 '14 at 07:38
  • I've tried various methods, so here I just wanted to poll to find the best way to pass x back. That is why I didn't include an assignment. Also, since it's a huge dataset, there are some assignment methods I tried that took too long. –  Dec 17 '14 at 07:39
  • Try reading the whole file into a list; then you can loop over the list, strip the data, and write it back to a new file. Look at this: http://stackoverflow.com/questions/18776370/converting-a-csv-file-into-a-list-of-tuples-with-python – Phuc Tran Dec 17 '14 at 08:19

1 Answer

pandas has a handy accessor on Series objects for this, .str, which applies string methods to the whole column at once:

stop_times["stop_headsign"] = stop_times["stop_headsign"].str.rstrip()

In fact, the reference you linked describes exactly this: .str is of type StringMethods.

The basics documentation has a section on this, Vectorized String Methods, which links to the fuller guide, Working with Text Data.
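As a minimal, self-contained sketch of the approach above: the sample rows below are a hypothetical stand-in for stop_times.txt (not the actual Gist data), and `dtype={"stop_id": str}` is an added assumption to keep leading zeros. It illustrates that `.str.rstrip()` leaves missing values (NaN) alone, so the `type(x) == str` check from the question should not be needed.

```python
import io

import pandas as pd

# Hypothetical sample standing in for stop_times.txt: two padded headsigns
# and one missing headsign, which pandas reads as a missing value (NaN).
raw = (
    "trip_id,stop_id,stop_headsign\n"
    "568036,00382,78 UO      \n"
    "568036,00396,78 UO <> 78 via 18th AVE     \n"
    "568036,00398,\n"
)

# Assumption: read stop_id as text so the leading zeros survive the round trip.
stop_times = pd.read_csv(io.StringIO(raw), dtype={"stop_id": str})

# Vectorized right-strip over the whole column; missing entries stay missing.
stop_times["stop_headsign"] = stop_times["stop_headsign"].str.rstrip()

# Write to an in-memory buffer instead of a file to inspect the result.
out = io.StringIO()
stop_times.to_csv(out, index=False)
print(out.getvalue())
```

Because the result is assigned back to the column, the stripped values are what `to_csv` writes; the loop in the question only rebinds the local name `x`, which is why nothing changed in the output.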

filmor
  • OK, thanks. I wasn't sure how to call `strip()` given what is stated in the documentation. Perhaps I've missed some critical pandas documentation along the way. –  Dec 17 '14 at 09:00