I am trying to read a CSV file using pandas. The file is structured as follows:

Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15z, a43, "55, 10", 330

The code I am using is as follows:

import pandas as pd
import matplotlib.pyplot as plt

fname = "data.csv"
data = pd.read_csv(fname,sep=",", header=None, skiprows=1)
data.columns = ["Timestamp", "UTC", "Callsign", "Position", "Speed", "Direction"]

t = data["Timestamp"]
utc = data["UTC"]
acid = data["Callsign"]
pos = data["Position"]
spd = ["Speed"]

plt.plot(t,spd)
plt.show()

How do I deal with the loc column holding two values inside double quotes ("52, 13"), so that I can, for example, plot the timestamp against spd?

When I try plot(t, id), it works fine, but when I try plot(t, spd), I get ValueError: x and y must have same first dimension, but have shapes (466,) and (1,).

Does anyone know a workaround for this?

thereiswaldo
  • What does this have to do with `pip` or with installing modules? – DeepSpace Sep 03 '22 at 18:52
  • Do you have in the spd column only one value in the first row and no other values? Check out the CSV file if it is the case and post here for example the first five lines of it (not as image, but as text please). – Claudio Sep 03 '22 at 18:58
  • Please, provide a minimal reproducible example. https://stackoverflow.com/help/minimal-reproducible-example https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – alec_djinn Sep 03 '22 at 18:59
  • Hi all, sorry for not adding my code, I have updated my question – thereiswaldo Sep 04 '22 at 08:11

It looks like you just need to treat the columns as strings, remove the quotation marks and convert to integers:

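# Position holds ' "52': drop the leading space and the opening quote
data["Position"] = data.Position.str[2:].astype(int)
# Speed holds ' 13"': drop the closing quote
data["Speed"] = data.Speed.str[:-1].astype(int)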

Note that the first line uses [2:] because I found a space before the opening quote: the raw value is ' "52', so two characters have to be dropped.
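To see where those offsets come from, here is a minimal sketch that reproduces the question's read_csv call on the four sample rows (embedded with io.StringIO instead of data.csv, which is an assumption about the real file's layout). Because a space precedes each opening quote, pandas does not treat "52, 13" as one quoted field, so each row splits into six fields; that is also why the question needs six column names. Instead of slicing by offset, it uses str.strip to remove spaces and quote characters from both ends, a variant of the slicing above rather than a replacement for it.

```python
import io

import pandas as pd

# The four sample rows from the question, embedded so the sketch runs without
# data.csv (assumption: the real file has the same layout).
CSV_TEXT = """Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15z, a43, "55, 10", 330
"""

# Same call as in the question. Because a space precedes each opening quote,
# "52, 13" is not read as one quoted field, so every row splits into six fields.
data = pd.read_csv(io.StringIO(CSV_TEXT), sep=",", header=None, skiprows=1)
data.columns = ["Timestamp", "UTC", "Callsign", "Position", "Speed", "Direction"]

print(repr(data.loc[0, "Position"]))  # ' "52'
print(repr(data.loc[0, "Speed"]))     # ' 13"'

# Variant of the slicing: strip spaces and quote characters from both ends,
# so the result does not depend on exact character offsets.
data["Position"] = data["Position"].str.strip(' "').astype(int)
data["Speed"] = data["Speed"].str.strip(' "').astype(int)

print(data[["Timestamp", "Position", "Speed", "Direction"]])
```

With these six names, "Speed" holds the second number from loc, while the values from the file's spd column (320, 321, ...) end up in "Direction".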

The plotting error happens because you wrote spd = ["Speed"] instead of spd = data["Speed"]. ["Speed"] is just a list containing one string, so t has the full length of the DataFrame (466 rows) while spd has length 1, which is exactly the (466,) vs (1,) shape mismatch in the error.
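Putting both points together, here is a minimal sketch of an alternative that avoids the slicing altogether, again assuming the four sample rows embedded via io.StringIO. Passing skipinitialspace=True to pd.read_csv skips the space after each comma, so the opening quote starts the field and "52, 13" is parsed as one value; the header row then gives the five columns Timestamp, UTC, id, loc, spd. The names loc_1 and loc_2 for the two halves of loc are hypothetical, and the matplotlib calls are left as comments so the sketch runs without it.

```python
import io

import pandas as pd

# Same four sample rows as in the question (assumption: data.csv matches them).
CSV_TEXT = """Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15z, a43, "55, 10", 330
"""

# skipinitialspace=True drops the space after each comma, so the opening quote
# starts the field and "52, 13" is parsed as a single quoted value. The header
# row is then usable as-is.
data = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# Split loc into two integer columns; "loc_1"/"loc_2" are hypothetical names.
loc_parts = data["loc"].str.split(",", expand=True)
data["loc_1"] = loc_parts[0].str.strip().astype(int)
data["loc_2"] = loc_parts[1].str.strip().astype(int)

# Select the columns from the DataFrame (not ["spd"], which is a one-element
# list), so x and y have the same length.
t = data["Timestamp"]
spd = data["spd"]
print(len(t), len(spd), len(["spd"]))  # 4 4 1

# With matplotlib installed, this now plots without the shape error:
# import matplotlib.pyplot as plt
# plt.plot(t, spd)
# plt.show()
```

Here spd really is the file's spd column, so no renaming or extra "Direction" column is needed.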

Rawson