It looks like a List but I can't index into it: ValueError: Length of values (2) does not match length of index (279999)

Question

I am importing the CSV file from here: https://raw.githubusercontent.com/kwartler/Harvard_DataMining_Business_Student/master/BookDataSets/LaptopSales.csv

This code works:

from dfply import *
import pandas as pd
df = pd.read_csv("LaptopSales.csv")
(df >> select(X["Date"]) >> mutate(AdjDate = (X.Date.str.split(" "))) >> head(3))

and produces this result:

    Date                AdjDate
0   01-01-2008 00:01    [01-01-2008, 00:01]
1   01-01-2008 00:02    [01-01-2008, 00:02]
2   01-01-2008 00:04    [01-01-2008, 00:04]

But when I try to extract the first element in the list:

from dfply import *
import pandas as pd
df = pd.read_csv("LaptopSales.csv")
(df >> select(X["Date"]) >> mutate(AdjDate = (X.Date.str.split(" ")[0])) >> head(3))

I get a wall of errors culminating in:

ValueError: Length of values (2) does not match length of index (279999)

In `X.Date.str.split(" ")[0]` try changing `[0]` to `.apply(lambda row: row[0])` — woblob, Dec 17 '22 at 08:11
@woblob I get "TypeError: 'float' object is not subscriptable" — nicomp, Dec 18 '22 at 12:23

score 0 · Answer 1 · answered Dec 17 '22 at 19:43

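AdjDate = (X.Date.str.split(" ")[0])

does not take the first element of each row's list. The `[0]` is applied to the whole series returned by the split, so it returns only the 2-element list of the first row, whereas the new column needs one value per row of the original series.

A value of length 2 cannot be stored in a column of length 279999, so pandas raises the error.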
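
The length mismatch can be reproduced in plain pandas. This is only a minimal sketch: it uses a three-row stand-in for the Date column (values copied from the question's output) and assumes that dfply's `X.Date...` expression resolves to the same Series operations shown here.

```python
import pandas as pd

# Tiny stand-in for the Date column of LaptopSales.csv.
dates = pd.Series(["01-01-2008 00:01", "01-01-2008 00:02", "01-01-2008 00:04"])

parts = dates.str.split(" ")   # a Series holding one list per row

first_row = parts[0]           # label lookup: the list stored at index label 0
print(first_row)               # ['01-01-2008', '00:01']
print(len(first_row))          # 2, the "Length of values (2)" in the error

per_row = parts.str[0]         # element-wise: first item of every row's list
print(per_row.tolist())        # ['01-01-2008', '01-01-2008', '01-01-2008']
```

The `.str[0]` accessor is the element-wise counterpart of `[0]`: it yields one value per row, so the result has the same length as the index.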

Alireza75

This doesn't fix the problem – nicomp Dec 18 '22 at 12:12
@nicomp I described the reason for the error. I forgot to write the correct way to do this. If you still need it, I can write it. – Alireza75 Dec 19 '22 at 04:58

nicomp · Accepted Answer · 2022-12-18T15:09:20.577

The answer is that one of the rows in the CSV file has NaN in the Date column. That value can't be split on " ". NaN is a float, so the split does not produce a list for that row, and the indexing operation then fails. It's row 2913 in the .CSV file: ",51,SE14 6LA,SE8 3JD,460,15,4,2,1.5,Yes,80,Yes,536682,177068,537175,177885"
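
One way to cope with such a row, sketched here in plain pandas on a made-up three-row stand-in for the Date column (not the real file), is to use the `.str[0]` accessor, which leaves missing rows as NaN instead of subscripting a float:

```python
import numpy as np
import pandas as pd

# Stand-in for the Date column, with one missing value like the row described above.
dates = pd.Series(["01-01-2008 00:01", np.nan, "01-01-2008 00:04"])

parts = dates.str.split(" ")   # the NaN row stays NaN instead of becoming a list

# .str[0] takes the first element of each list and propagates NaN,
# so no per-row lambda ever has to subscript a float.
adj_date = parts.str[0]
print(adj_date.tolist())

# Locate the rows whose Date is missing.
missing_rows = dates[dates.isna()].index.tolist()
print(missing_rows)
```

In the dfply pipeline the same idea would presumably read `mutate(AdjDate = X.Date.str.split(" ").str[0])`; that form is an assumption and was not run against the full data set. Dropping the incomplete row first with `df.dropna(subset=["Date"])` is another option.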

I didn't simply delete the question because the data set is publicly available and appears to be part of a course available through Harvard University: https://github.com/kwartler/Harvard_DataMining_Business_Student
