
Very simple problem. I am reading in CSV files organized in a specific way. There's no header, the file shape is a rectangle, and there are no missing or corrupt entries. I read each CSV file in using pandas and convert it to a NumPy array.

The problem is, when I print the first column, the last entry is missing. The printed output ends at the second-to-last value.

import glob

import pandas as pd
import numpy as np

filenames = glob.glob(r'\my\filepath\*csv')

def data(filename):
    # Read one headerless CSV file and return it as a NumPy array
    out = pd.read_csv(filename, sep=',', header=None).to_numpy()
    return out

alldata = data(filenames[0])
column1 = alldata[0:-1, 0]
print(column1)

I expect the print command to print the entire column, but the output ends at the second-to-last value. I have the CSV file open in Excel, and the printed output is clearly missing the last value. However, if I do

print(alldata)

I can see the expected last value of column1 in the printed table. What's happening? The 0:-1 should span the entire column, correct?

John
    Incorrect, `0:-1` does not span the entire column. Try it out: `print(alldata[0:-1])` prints everything up to, but not including, the last entry. Check out this [Q/A](https://stackoverflow.com/questions/509211/how-slicing-in-python-works) for more discussion on slicing. – Michael Ruth Aug 08 '23 at 20:02
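The half-open behaviour described in the comment is easy to see on a small array. This is a minimal sketch: the 4x2 array and its values are made up to stand in for the CSV data, and `without_last` / `whole_column` are illustrative names only.

```python
import numpy as np

# Made-up 4x2 array standing in for the CSV data
alldata = np.array([[1, 10],
                    [2, 20],
                    [3, 30],
                    [4, 40]])

# Slices are half-open: start is included, stop is excluded,
# so 0:-1 stops *before* the last row.
without_last = alldata[0:-1, 0]
print(without_last)      # [1 2 3]

# Omitting the stop (or both bounds) keeps every row.
whole_column = alldata[:, 0]
print(whole_column)      # [1 2 3 4]

# A plain negative index (not a slice) selects the last row itself.
print(alldata[-1, 0])    # 4
```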

1 Answer


Mate, the problem is caused by the slicing: alldata[0:-1, 0] selects rows from the first (included) up to the last (excluded), so the final row is dropped. To take every row of the first column, leave out the start and stop. Try this:

import glob

import pandas as pd

filenames = glob.glob(r'\my\filepath\*csv')

def data(filename):
    # Read one headerless CSV file and return it as a NumPy array
    out = pd.read_csv(filename, sep=',', header=None).to_numpy()
    return out

alldata = data(filenames[0])
column1 = alldata[:, 0]  # Select all rows in the first column
print(column1)
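To check the fix without files on disk, here is a sketch that feeds an in-memory CSV through the same `read_csv(header=None).to_numpy()` pipeline. It assumes `io.StringIO` as a stand-in for the real file path, and the 3x3 values are made up. It also shows selecting the column in pandas before converting: with `header=None`, pandas labels the columns with integers starting at 0.

```python
import io

import pandas as pd

# Made-up CSV contents: no header, 3 rows x 3 columns
csv_text = "1,2,3\n4,5,6\n7,8,9\n"

def data(source):
    # read_csv accepts a path or any file-like object;
    # header=None because the file has no header row
    return pd.read_csv(source, sep=",", header=None).to_numpy()

alldata = data(io.StringIO(csv_text))

column1 = alldata[:, 0]   # every row of the first column
print(column1)            # [1 4 7]

# Equivalent: pick column label 0 in pandas, then convert
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)
column1_pd = df[0].to_numpy()
print(column1_pd)         # [1 4 7]
```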
notarealgreal
    Great, thank you. I suspected something like that, but -1 meaning "up to and including the last element" is hardwired into me from other languages. – John Aug 09 '23 at 15:57
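For anyone used to inclusive ranges, one trap is worth spelling out: converting an inclusive end of -1 by simply adding 1 gives a stop of 0, which produces an empty slice. A minimal sketch on a plain Python list; `inclusive_slice` is a hypothetical helper for illustration, not a library function.

```python
def inclusive_slice(seq, start, end):
    """Return seq[start..end] with *end* included (hypothetical helper).

    Naively writing seq[start:end + 1] breaks for end == -1, because
    -1 + 1 == 0 and seq[start:0] is empty. A stop of None means
    "run to the end of the sequence".
    """
    stop = end + 1
    if stop == 0:
        stop = None
    return seq[start:stop]

values = [10, 20, 30, 40]
print(values[0:-1 + 1])                # [] (the naive conversion)
print(inclusive_slice(values, 0, -1))  # [10, 20, 30, 40]
print(inclusive_slice(values, 1, 2))   # [20, 30]
```

The same slicing rules apply to NumPy arrays, so the helper works on `alldata[:, 0]`-style selections too.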