-1

I have a .txt dataset like this:

user_000044 2009-04-24  13:47:07    Spandau Ballet  Through The Barricades 

I have to read the last two columns, Spandau Ballet as unique and Through The Barricades as unique. How can I do this?

I need to create two arrays, artists = [] and tracks = [], in which I put data in a loop, but I can't work out how to extract that portion of text from each line.

Can someone help me?

RobJan
Jessica Martini
  • 1
    Apparently your fields are separated by tabs--you should state that, since those tabs cannot be seen. Are there many rows, with the same format as the one you show? What do you mean by "read... as unique"? You say you "put data in a loop"--please show your code attempt. – Rory Daulton Jul 19 '18 at 20:24
  • 2
    This is a TSV—tab-separated values—file, which is just a CSV (comma-separated values) with tabs instead of commas as the delimiters. You can use the stdlib's `csv` module to read these, or third-party libraries (including NumPy and Pandas), or (if you know there are never any tab characters or quotes or escapes within the fields) just call `line.split('\t')` on each line. – abarnert Jul 19 '18 at 20:26
  • The stackoverflow site is mostly concerned with helping people correct their code. Since there's no code in the question it will likely be closed. To re-open it, have a try at implementing the suggestions in the comment. Alternatively, ask another question with the code in it. Good luck! – holdenweb Jul 20 '18 at 05:37

3 Answers

1

If the columns in your file are separated by tabs, you can use np.loadtxt (a NumPy function) as follows:

import numpy as np
artists, tracks = np.loadtxt("myfile.txt", delimiter="\t", dtype=str, usecols=[3, 4], unpack=True)

This returns two NumPy arrays. Optionally, you can convert them into plain Python lists of strings as follows:

artists = [ str(s) for s in artists ]
tracks = [ str(s) for s in tracks ]
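
To try the call without a file on disk, here is a minimal sketch that feeds np.loadtxt an in-memory sample. The two-row `sample` string is made up for illustration, and it assumes five tab-separated columns (date and time in separate fields), which is what usecols = [3, 4] implies. If "unique" in the question means deduplicated values, np.unique is one way to get them; that reading is an assumption.

```python
import io

import numpy as np

# Hypothetical sample standing in for the real file: five tab-separated fields per row.
sample = (
    "user_000044\t2009-04-24\t13:47:07\tSpandau Ballet\tThrough The Barricades\n"
    "user_000044\t2009-04-24\t13:50:00\tSpandau Ballet\tGold\n"
)

# Columns 3 and 4 (zero-based) are the artist and the track.
artists, tracks = np.loadtxt(
    io.StringIO(sample),
    delimiter="\t",
    dtype=str,
    usecols=[3, 4],
    unpack=True,
)

# Assumption: "unique" means each distinct value once (np.unique also sorts them).
unique_artists = np.unique(artists)

print(artists.tolist())
print(tracks.tolist())
print(unique_artists.tolist())
```

Note that with a single-row file loadtxt returns scalars rather than arrays when unpacking, so the sketch uses two rows.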
Kefeng91
0

An option using plain Python and no third-party packages:

artists = []
tracks = []

# Assumes the columns are separated by two spaces, as in the sample line.
with open('dataset.txt', 'r') as f:
    for line in f:
        artist, track = line.split(' '*2)[-2::]
        artists.append(artist.strip())
        tracks.append(track.strip())

print(artists)
print(tracks)

output:

['Spandau Ballet']
['Through The Barricades']

[-2::] takes the last two columns of each line; adjust the slice to get other columns if needed.
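
If the file turns out to be tab-separated, as the comments on the question suggest, the same loop can be written with the standard library's csv module. This is only a sketch: the function name `read_artists_tracks` and the one-line `sample` are made up for illustration, and it assumes the fields contain no quoting or embedded tabs.

```python
import csv
import io


def read_artists_tracks(lines):
    """Collect the last two tab-separated fields of every row."""
    artists, tracks = [], []
    # QUOTE_NONE: treat quote characters as ordinary text (assumes unquoted fields).
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        artist, track = row[-2:]
        artists.append(artist.strip())
        tracks.append(track.strip())
    return artists, tracks


# Hypothetical sample line standing in for the real file.
sample = "user_000044\t2009-04-24\t13:47:07\tSpandau Ballet\tThrough The Barricades \n"

artists, tracks = read_artists_tracks(io.StringIO(sample))
print(artists)
print(tracks)
```

With a real file, pass the open file object instead of the StringIO, for example `with open('dataset.txt', newline='') as f: artists, tracks = read_artists_tracks(f)`.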

0

You are probably better off using the pandas module to load the content of the .txt into a pandas DataFrame and proceeding from there. If you're unfamiliar with it, a DataFrame is about as close to an Excel sheet as you can get in Python. pandas handles reading the lines for you, so you don't have to write your own loop.

Assuming your text file has four tab-separated columns, this would look like:

# IPython for demo:
import pandas as pd

df = pd.read_csv('ballet.txt', sep='\t', header=None, names=['artists', 'tracks'], usecols=[2, 3])
# usecols here limits the DataFrame to the 3rd and 4th columns of your .txt

Your DataFrame then could look like:

df
# Out: 
          artists                  tracks
0  Spandau Ballet  Through The Barricades
1   Berlin Ballet               Swan Lake

Access single columns by column names:

df.artists  # or by their index e.g. df.iloc[:, 0]
# Out: 
0    Spandau Ballet
1     Berlin Ballet
Name: artists, dtype: object

You can still put the data into an array at this point, but I can't think of a reason you would really want to if you're aware of the alternatives.
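
If you do want plain lists or arrays, a minimal sketch follows. The in-memory `sample` is made up to match the demo output above and assumes four tab-separated columns with date and time in one field.

```python
import io

import pandas as pd

# Hypothetical sample: four tab-separated columns, matching the demo rows above.
sample = (
    "user_000044\t2009-04-24 13:47:07\tSpandau Ballet\tThrough The Barricades\n"
    "user_000045\t2009-04-24 13:50:00\tBerlin Ballet\tSwan Lake\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["artists", "tracks"],
    usecols=[2, 3],
)

artists = df["artists"].tolist()   # plain Python list of str
tracks = df["tracks"].to_numpy()   # NumPy array of Python str objects

print(artists)
print(tracks)
```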

Darkonaut