
This code worked on my Raspberry Pi, but after I moved to an Ubuntu PC it no longer works. I guess I need to install csvformat somehow, but I'm not sure how.

Python code:

import os
import subprocess
import pandas
import pandas as pd

subprocess.call("csvformat -t plates.txt >> plates.csv", shell=True)

f=pd.read_csv("plates.csv")

keep_col = [11,13]
new_f = f[keep_col]
new_f.to_csv("new_plates.csv", index=False)

subprocess.call("sudo rm plates.csv", shell=True)

df = pd.read_csv("new_plates.csv", names=["Plates", "Name"])
df["Plates"] = df["Plates"].str.split().apply("".join)

print(df)

Error message:

/bin/sh: 1: csvformat: not found
Traceback (most recent call last):
  File "script.py", line 14, in <module>
    new_f = f[keep_col]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2934, in __getitem__
    raise_missing=True)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [Int64Index([11, 13], dtype='int64')] are in the [columns]
Gino Mempin
  • if you have to install something on Ubuntu then first check `apt search ...` - like `apt search csvkit` - and later you can use `apt install ...` – furas Jul 11 '19 at 13:32
  • As an aside, [you want to avoid `shell=True`](/questions/3172470/actual-meaning-of-shell-true-in-subprocess); `with open('plates.csv', 'a') as out: subprocess.call(["csvformat","-t", "plates.txt"], stdout=out, shell=False)` – tripleee Jul 11 '19 at 13:38
  • Possible duplicate of https://stackoverflow.com/questions/19034959/installing-python-modules-on-ubuntu – tripleee Jul 11 '19 at 13:42
  • Your title and question description is misleading because you have 2 different, unrelated problems. 1 is the "_csvformat: not found_" error and 1 is the "_KeyError: "None of [Int64Index([11, 13], dtype='int64')] are in the [columns]_" error. I've updated [my answer](https://stackoverflow.com/a/56990373/2745495) to try to address both. – Gino Mempin Jul 11 '19 at 23:53

1 Answer

You have 2 different, unrelated problems.

1st problem:

/bin/sh: 1: csvformat: not found

Your script is looking for the csvformat command-line tool, which is part of csvkit.

You can install csvkit with pip:

1.2. Installing csvkit

Installing csvkit is easy:

sudo pip install csvkit

Note:

If you’re familiar with virtualenv, it is better to install csvkit in its own environment. If you are doing this, then you should leave off the sudo in the previous command.

As noted in the csvkit documentation (and in the comments), running sudo pip install is generally considered bad practice. See What are the risks of running 'sudo pip'?. Using a virtual environment is the recommended approach.

After installing csvkit, check that you now have csvformat:

csvformat -h

Now, when you run your script, make sure that the pip you used to install csvkit belongs to the same Python (python, python3, ..) that you use to run your script. For example, if you use python to run your script:

$ python -m pip list | grep csvkit
csvkit          1.0.4

If csvkit does not appear, then you installed it somewhere else.
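The same check can also be done from inside the script, so a missing tool fails with a clear message instead of a confusing pandas error further down. This is only a minimal sketch: the helper name convert_tsv_to_csv is made up for illustration, and it assumes csvformat should be found on the PATH of whatever environment runs the script. It also follows the suggestion from the comments to avoid shell=True.

```python
import shutil
import subprocess


def convert_tsv_to_csv(src, dst, tool="csvformat"):
    """Convert a tab-delimited file to CSV using csvkit's csvformat.

    Hypothetical helper for illustration; raises FileNotFoundError
    early if the tool cannot be found on PATH.
    """
    tool_path = shutil.which(tool)
    if tool_path is None:
        raise FileNotFoundError(
            f"{tool} not found on PATH; is csvkit installed in this environment?"
        )

    # Arguments are passed as a list, so no shell is involved.
    # Mode "w" overwrites dst (the question's ">>" appended instead).
    # check=True raises CalledProcessError if csvformat exits with an error.
    with open(dst, "w") as out:
        subprocess.run([tool_path, "-t", src], stdout=out, check=True)
```

In the question's script, the first subprocess.call line could then be replaced with convert_tsv_to_csv("plates.txt", "plates.csv").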

2nd problem:

KeyError: "None of [Int64Index([11, 13], dtype='int64')] are in the [columns]

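You are getting this because f[keep_col] with a list selects columns by label, and none of the columns in f are labeled with the integers 11 or 13. The return of read_csv is a pandas DataFrame, and by default read_csv takes the column labels from the first row of the file, so they are typically strings rather than integer positions. To select columns by label, use loc (label-based indexing) with the actual column names; to select them by position, use iloc (integer position-based indexing).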

I don't know what you are actually trying to do with keep_col = [11,13], but you should do something like this instead:

f=pd.read_csv("test.csv")
#     1  ..  11  12  13  ..
# 0   x  ..  x   x   x   ..
# 1   x  ..  x   x   x   ..
# 2   x  ..  x   x   x   ..
# ..

f_slice_1 = f.loc[:, '11':'13']
#     11  12  13 
# 0   x   x   x  
# 1   x   x   x  
# 2   x   x   x   
# ..

f_slice_2 = f.iloc[:, 11:13]
#     12  13 
# 0   x   x  
# 1   x   x  
# 2   x   x 
# ..
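If the goal of keep_col = [11,13] was to keep the columns at positions 11 and 13 (counting from 0), a list of positions can be passed to iloc. The sketch below uses made-up in-memory data (14 columns named c0 to c13) rather than the real plates.csv, so the column names and values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for plates.csv: a header row plus two data rows.
header = ",".join(f"c{i}" for i in range(14))
row1 = ",".join(f"r1v{i}" for i in range(14))
row2 = ",".join(f"r2v{i}" for i in range(14))
csv_text = "\n".join([header, row1, row2])

# Default read: labels come from the header row and are strings,
# so f[[11, 13]] would raise the KeyError from the question.
f = pd.read_csv(io.StringIO(csv_text))

# Select by position with a list of integer positions (0-based).
picked = f.iloc[:, [11, 13]]

# Alternative: with header=None the labels are the integers 0..13,
# so f[[11, 13]] style selection works, but the header row becomes data.
g = pd.read_csv(io.StringIO(csv_text), header=None)
picked_by_label = g[[11, 13]]
```

Which variant fits depends on whether the file produced by csvformat actually has a header row, which the question does not show.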
Gino Mempin
  • Installing it via sudo is dangerous, as it can overwrite any version installed via the distribution's package manager. Either without sudo (installing specific to the user) or in a virtualenv is the preferred way. – Jan Henke Jul 11 '19 at 13:38